A weblog by Bryan

A collection of thoughts, articles and stuff by Bryan

Fixing a Boiling Hot Macbook Retina 2015

Problem statement

My main machine is a 2015 macbook pro retina on which I run Linux natively (not in a VM)

It’s a beautiful machine with a high-density pixel display. It’s also an oven whenever you start to do some serious work ™ on it. What do I mean by serious work ?

  • Compiling clang
  • Compiling Android
  • Compiling a meaty Linux kernel

In tandem with say having a Google Hangouts call, watching youtube or streaming some music over whatever music streaming service it us you use. [offtopic:] I spotify BTW.

Background

Q: In the first instance - what type of processor is in this MacBook ?

cat /proc/cpuinfo
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 70
model name : Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
stepping : 1
microcode : 0xf
cpu MHz : 3403.257
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt dtherm ida arat pln pts
bugs :
bogomips : 5188.50
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

So it’s a Intel® Core™ i7-4960HQ CPU @ 2.60GHz with four cores and eight threads.

Looking at the output of the ‘sensors’ application in a quiescent system we can see the following.

$ sensors
BAT0-virtual-0
Adapter: Virtual device
temp1: +33.7°C

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +60.0°C (high = +84.0°C, crit = +100.0°C)
Core 0: +59.0°C (high = +84.0°C, crit = +100.0°C)
Core 1: +60.0°C (high = +84.0°C, crit = +100.0°C)
Core 2: +59.0°C (high = +84.0°C, crit = +100.0°C)
Core 3: +58.0°C (high = +84.0°C, crit = +100.0°C)

applesmc-isa-0300
Adapter: ISA adapter
Left side : 2158 RPM (min = 2160 RPM, max = 6156 RPM)
Right side : 2001 RPM (min = 2000 RPM, max = 5700 RPM)

  • Four ‘Core #’ sensors reporting a temperature
  • One ‘Physical id 0’ reporting a temperature
  • A listing for a Left side fan and a Right side fan

In the default quiescent state - which for the purposes of this blog is an Ubuntu system booted to a desktop with two terminal windows open we see:

  • Physical id 0 @ 55 c
  • Core 0 @ 52
  • Core 1 @ 53
  • Core 2 @ 53
  • Core 3 @ 54
  • Left-Fan @ 2165/6156 RPM
  • Right side @ 1999/5700 RPM

Note there are four ‘Core’ sensors one for each of the cores previously identified for the 4960HQ and one ‘Physical id 0’ - physical id refers to the ‘package’ i.e. the four cores sit on one physical chip that gets soldered down to the board and this chip has its own individual sensor.

Take a look at the inside of the mac - the thing marked “CPU package” - that is what ‘Physical id 0’ referes to. The individual Cores are inside of that package.

Lets look inside the Macbook for starters.

So here we can see two big fans “Left fan” (Left side) and “Right fan” (Right side), Air vents right beside those fans, a GPU (graphics processing unit) and CPU package (remember we said four cores inside of one ‘package’) plus a heat-strip to dissipate the heat from the CPU and GPU.

Now that strip is really not much to draw away any excess heat generated especially when you consider this is how the 4950HQ (a desktop cousin of the 4960HQ) is tooled up for cooling.

How bad is it really ?

Are you just whinging - how bad can it really be ?

Let’s run a test. Initial conditions are

  • A freshly booted ubuntu system
  • Two windows open one to build Linux one to monitor ‘sensors’
  • A running Webcam capture (to crank up the GPU)

What will we monitor ? For the purposes of this test we will monitor the CPU temperature and the kernel log.

What will the test be ? Against Linux commit 520eccdfe187 (“Linux 4.13-rc2”) we will run this command with a Webcam stream active

1
make clean; make mrproper; make distclean; make allyesconfig && time make bzImage -j 14

This will switch all kernel options for an x86_64 system to ‘y’ and then run a build of up-to 14 concurrent build processes.

To begin with the build seems OK recall the initial conditions

  • Core 0 @ 52
  • Core 1 @ 53
  • Core 2 @ 53
  • Core 3 @ 54
  • Left-Fan @ 2165/6156 RPM
  • Right side @ 1999/5700 RPM

Viewed against our mid-build metrics

  • Core 0 @ 75 C
  • Core 1 @ 76 C
  • Core 2 @ 76 C
  • Core 3 @ 80 C
  • Left-Fan @ 6096/6156 RPM
  • Right side @ 5652/5700 RPM

Here we see the Core temperature is hot yes but below the ‘high’ threshold and way below critical. The fans are working at the top of their limit to keep the processor in that state.

However the fans keep spinning and eventually we get this kernel message indicating the temperature sensors have detected a prolonged ‘critical’ temperature.

This is bad, very bad. Intel rates most of its processors at 105 degrees celcius maximum i.e. if your processor gets this hot a component inside of it will drive an exception to shut down the system entirely before things start to melt. In this case we can see that the ACPI descriptors provided by the BIOS have told Linux that 100 degrees celcius is the maximum.

In any case - it would be bad to allow things to continue on this way - IT WOULD BE BAD.

What the hell can we do about it ?

Helpfully I’m not the first person to loose the rag with the over-heating Macs. Somebody else has already taken the extreme step of drilling their Mac

I should say - my Dad did the marking and drilling here - since he has more experience/skill at this type of thing than I do :)

The first thing to do is to mark out where the drill holes will go.

Then drill the holes

Then the second set of holes - this time marked in a box shape

And finally the finished product

Results

So that’s nice - you’ve drilled a hole in your €2,500 macbook - congratulations. Does it make a difference to temperature and Machine-check-exceptions ?

In short - yes; whereas before the holes were drilled @ around 5900 RPM Left Side - the fan couldn’t spin faster to evacuate out more air - so the processor just overheated.

Running the same test again we find that the Machine check exceptions have gone.

The hottest I’ve seen the processor get is 91 C - hot but still 9 degrees off max. Note how for a similar fan RPM we can operate at a much higher temperature.

Now with more airflow - the fans still aren’t maxed out when the temperature rises to 90+ degrees, and can allow the processor to draw more power - and rotate even faster to compensate.

Thus far - despite doing a few builds while watching some youtube, I haven’t encountered the same error again.

It looks like

Screaming Fast GPIO With Intel Galileo

Foreword

Intel Galileo and the Quark X1000 SoC that power the Galileo, are projects on which I was the leading Linux engineer. One problem we had when getting the SoC up-and running as an Arduino host - was that the GPIOs didn’t provide the set of functionality that the Arudino base-libraries provided. For that reason a Cypress CY8C9520A I2C GPIO expander was added. The Cypress though is an I2C based device and correspondingly has pretty disappointing GPIO performance - in comparsion to the base Arduino which was able to toggle GPIOs in the MegaHertz range - we were achieving the hundreds of KiloHertz.

Here is a post I made on the various different ways of driving GPIO on the Galileo subsequent to the original release of the project in 2013.

Screaming fast GPIO on Intel Galileo

Galileo - has two pins IO2 and IO3 through which we can drive significant data rates. By default these two pins are routed to the Cypress.

There are three methods to communicate with these pins - which have increasing throughput

digitalWrite()

digitalWrite(register uint8_t pin, register uint8_t val)

Using this method it is possible to toggle an individual pin in a tight loop @ about 477 kHz

1
2
3
4
pinMode(2, OUTPUT_FAST);
pinMode(3, OUTPUT_FAST);
digitalWrite(2, 1);
digitalWrite(2, 0);

This is a read-modify-write operation meaning we first read the value of the GPIO, then we modify the value, then we write the new value back.

This method isn’t especially fast since we are ultimately going through GPIOLib in user-space - which involves a context switch to kernel and back in order to toggle a GPIO.

Example digitalWrite() outputs 477kHz waveform on IO2:

1
2
3
4
5
6
7
8
9
10
11
12
setup(){
  pinMode(2, OUTPUT_FAST);
}

loop() {
  register int x = 0;

  while(1){
      digitalWrite(2, x);
      x =!x;
  }
}

fastGpioDigitalWrite()

fastGpioDigitalWrite(register uint8_t gpio, register uint8_t val)

This function actually lets you write directly to the registers - without going through the code around digitalWrite() and consequently has better performance than a straight digitalWrite(). Unlike digitalWrite() we won’t context-switch when updating the register associated with the GPIO. Only GPIO 2 and GPIO 3 are supported using this method.

Using this method it is possible to toggle an individual pin (GPIO_FAST_IO2, GPIO_FAST_IO3) at about 680 kHz.

1
2
3
4
pinMode(2, OUTPUT_FAST);
pinMode(3, OUTPUT_FAST);
fastGpioDigitalWrite(GPIO_FAST_IO2, 1);
fastGpioDigitalWrite(GPIO_FAST_IO3, 0);

Again this uses read/modify/write - and can toggle one GPIO at a time

Example fastGpioDigitalWrite - outputs 683kHz waveform on IO3:

1
2
3
4
5
6
7
8
9
10
11
12
13
setup(){
  pinMode(3, OUTPUT_FAST);
}

loop() {

  register int x = 0;

  while(1){
      fastGpioDigitalWrite(GPIO_FAST_IO3, x);
      x =!x;
  }
}

fastGpioDigitalWriteDestructive()

fastGpioDigitalWriteDestructive(register uint8_t gpio_mask)

Using this method it is possible to achieve 2.93 Mhz data toggle rate on IO2/IO3 individually or simultaneously

1
2
pinMode(2, OUTPUT_FAST);
pinMode(3, OUTPUT_FAST);

It is the responsibility of the application to maintain the state of the GPIO registers directly. To enable this a function called fastGpioDigitalLatch() is provided - which allows the calling logic to latch the initial state of the GPIOs - before updating later. This method just writes GPIO bits straight to the GPIO register - i.e. a destructive write - for this reason it is approximately 2 x faster then read/modify/write. Twice as fast for a given write - means four times faster for a given wave form - hence ~700kHz (680kHz) becomes ~2.8Mhz (2.93 MHz)

Example fastGpioDigitalWriteDestructive outputs 2.93MHz waveform on IO3:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
uint32_t latchValue;

setup(){

  pinMode(3, OUTPUT_FAST);
  latchValue = fastGpioDigitalLatch();

}

loop() {

  while(1){
      fastGpioDigitalWriteDestructive(latchValue);
      latchValue ^= GPIO_FAST_IO3;
  }

}

Example-4 fastGpioDigitalWriteDestructive outputs 2.93MHz waveform on both IO2 and IO3:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
uint32_t latchValue;

setup(){
  pinMode(2, OUPUT_FASTMODE);
  pinMode(3, OUPUT_FASTMODE);

  latchValue = fastGpioDigitalLatch(); // latch initial state

}

loop() {

  while(1) {

      fastGpioDigitalWriteDestructive(latchValue);
      if(latchValue & GPIO_FAST_IO3){

          latchValue |= GPIO_FAST_IO2;
          latchValue &= ~ GPIO_FAST_IO3;

      }else{
          latchValue |= GPIO_FAST_IO3;
          latchValue &= GPIO_FAST_IO2;

      }
  }
}

In other words the responsibility lies with the application designer in cases 3 and 4 to ensure the GPIO register values are correct - assuming - these values matter to the application use-case.