
Once upon a time, in a fairy land of servers, I was upgrading the BMC firmware on my home server. And it went freaking bitter.


It goes like this, the fourth, the fifth

The process falls, the watchdog trips. Just look at the log:

Supermicro Update Manager (for UEFI BIOS) 2.13.0-p7 (2023/12/20) (x86_64)
Copyright(C) 2013-2023 Super Micro Computer, Inc. All rights reserved.
If the FW update fails, Please try again....
Uploading FW image....
Upload part 0   261456 bytes, [Ok]   
Upload part 1 15630336 bytes, [Ok]   
Upload part 2  2140296 bytes, [Ok]   
Upload part 3  8691744 bytes, [Ok]   
Upload part 4   262144 bytes, [Ok]   

********************************<<<<<ERROR>>>>>*********************************

ExitCode                = 149
Description             = IPMI execution failed
Program Error Code      = 286.14
Error message:
        Update package verification failed!

********************************************************************************

Shortly after, within some 10-15 minutes, the server disappears, never to be seen again. And the cherry on top - the BMC becomes unresponsive. After detaching the power cords, waiting 5 minutes, and reattaching them, my nasty boy slowly comes back, but without the BMC:

badboi:~$ ipmitool lan print
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

It is widely known among engineers that once released from the device, the magical white smoke cannot be put back in. Luckily, no smoke was seen, and the BMC is still there. At this point I’m assuming the BMC is sleeping and I just need to wake it up.

The secret of sun-landing is to land at night

After a couple of teapots and some extensive google-fu:
This BMC model is affected by CVE-2019-6260, which allows me to do some forbidden and nasty things like drinking bleach [citation needed] or arbitrarily reading and writing its memory directly from the host. Of course, the vulnerability has long been patched, but here’s the catch - the patch lives in the firmware, and it only takes effect once the BMC actually boots that firmware. Eureka!

The BMC is not booted, but it’s not physically broken. That means, if I can read its memory, I might get a hint about why it’s not booting.

We’ll use a tool called culvert to query the BMC’s state over the host’s PCIe/LPC interface and pull out the firmware contents.
Dumping the persistent memory is fairly easy with that tool:

badboi:~$ /tmp/culvert -vv read firmware > /tmp/fwbad.bin
[*] ahb_readl: 0x1e6e2070: 0xf102d286
[*] ahb_readl: 0x1e620010: 0x00000400
...
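Before trusting a dump, it’s worth a quick check that it has the size of the flash chip. Here’s a minimal sketch, assuming the chip is 64 MiB (that’s what U-Boot reports later in this post); `expect_size` is a made-up helper name, not something culvert provides.

```shell
# expect_size FILE MIB: succeed only if FILE is exactly MIB mebibytes long.
# Hypothetical helper for illustration; not part of culvert.
expect_size() {
    # Arithmetic expansion strips any padding that wc may print around the number.
    actual=$(( $(wc -c < "$1") ))
    expected=$(( $2 * 1024 * 1024 ))
    if [ "$actual" -eq "$expected" ]; then
        echo "size ok: $expected bytes"
    else
        echo "size mismatch: got $actual, expected $expected" >&2
        return 1
    fi
}
```

For the dump above that would be `expect_size /tmp/fwbad.bin 64`. A short file would suggest the read was cut off before the end of the flash.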

Now, let’s see what’s going on. Luckily, we can emulate the dumped firmware binary with QEMU:

qemu-system-arm -M ast2500-evb,fmc-model=mx66l51235f -nographic -drive file=fwbad.bin,format=raw,if=mtd

And what do we have here? Nothing but a U-Boot banner, sitting there for eternity:


U-Boot 2013.01 (Feb 12 2019 - 15:34:40)



Seems like it’s a super early stage of the boot process. It clearly can’t move any further, but why is it stuck there? No debug info, no nothing. We need to go deeper. Let’s check the dump itself, with hexedit.

 000022f0:  b4 30 c5 e1 02 36 a0 e3  00 30 85 e5 0b 30 a0 e3  .0...6...0...0..
 00002300:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
 00002310:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
<...>
 000023e0:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
 000023f0:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
 00002400:  2a 00 00 1a 01 3b a0 e3  b4 30 c5 e1 01 33 a0 e3  *....;...0...3..
 00002410:  20 00 5c e3 00 30 85 e5  01 30 a0 e3 3c 10 80 05   .\..0...0..<...

What we see here: the dump looks like a slice of Swiss cheese. Every now and then there are 2048 bytes of zeroes mixed in with the data.
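Scrolling through hexedit to find every hole gets old fast, so here is a rough sketch that lists them instead. `zero_runs` is a hypothetical helper built on plain `od` and `awk`. It works at 16-byte granularity, and the 256-byte default threshold is my assumption, picked because the hole in the excerpt above spans 0x2300-0x23ff.

```shell
# zero_runs FILE [MIN_BYTES]: list runs of zero bytes at least MIN_BYTES long
# (default 256) as "hex-offset length". Hypothetical helper for illustration.
zero_runs() {
    # od prints 16 bytes per line; -v keeps repeated lines instead of folding them into "*".
    od -v -Ad -tx1 "$1" | awk -v min="${2:-256}" '
        function flush() {
            if (len >= min) printf "0x%08x %d\n", start, len
            len = 0
        }
        NF >= 2 {
            zero = 1
            for (i = 2; i <= NF; i++) if ($i != "00") { zero = 0; break }
            if (zero) {
                if (len == 0) start = $1 + 0
                len += NF - 1
            } else {
                flush()
            }
        }
        END { flush() }
    '
}
```

Running `zero_runs fwbad.bin` prints one line per hole. Genuinely empty regions of the flash would show up too if they are zero-filled, so the list is a starting point, not a verdict.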
That’s not bussin’ at all. Let’s compare the dump with the original firmware binary:

diff -y <(head -c $(( 2**20 )) ~/avc/fwbad.bin | xxd ) <(head -c $(( 2**20 )) 2UNPACKED_SMT_H12AST2500_64M_30902_V.bin | xxd ) | grep '\s|\s' | less

and

00001000: 0000 0000 0000 0000 0000 0000 0000 0000  .......... | 00001000: 3840 2de9 3830 9fe5 0840 93e5 6dff ffeb  8@-.80...@
00001010: 0000 0000 0000 0000 0000 0000 0000 0000  .......... | 00001010: 0257 14e2 0600 000a 241a a0e1 0710 01e2  .W......$.
00001020: 0000 0000 0000 0000 0000 0000 0000 0000  .......... | 00001020: 0110 81e2 0111 a0e1 b1b1 00eb 0050 a0e1  ..........
00001030: 0000 0000 0000 0000 0000 0000 0000 0000  .......... | 00001030: 0100 00ea 0c00 9fe5 a845 00eb 0500 a0e1  .........E
00001040: 0000 0000 0000 0000 0000 0000 0000 0000  .......... | 00001040: 3880 bde8 0020 6e1e d5f7 0200 0840 2de9  8.... n...

Yep, that’s what I thought. The dump’s first 32K is crippled. That explains why the firmware can’t load.
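Eyeballing a side-by-side diff works, but the extent of the damage can also be pinned down mechanically. A small sketch, assuming both files start at the same flash offset; `diff_extent` is a hypothetical name, and it only looks at the part the two files have in common.

```shell
# diff_extent A B: report the first and last byte offsets at which A and B differ.
# Hypothetical helper for illustration.
diff_extent() {
    # cmp -l prints one line per differing byte: a 1-based decimal offset,
    # then the two byte values in octal.
    cmp -l "$1" "$2" 2>/dev/null | awk '
        NR == 1 { first = $1 }
        { last = $1 }
        END {
            if (NR) {
                printf "first=0x%x last=0x%x differing=%d\n", first - 1, last - 1, NR
            } else {
                print "no differing bytes in the common prefix"
            }
        }
    '
}
```

To mirror the comparison above, cut the first megabyte of each image into its own file with `head -c` and feed the two pieces to `diff_extent`. Comparing the whole images would also flag the config areas, which are supposed to differ.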

Transplant

Luckily, the first megabyte doesn’t contain any variable data, meaning we can just take a copy of the dump, overwrite its first 32K with the vendor image, and see if that helps:

cp ~/avc/fwbad.bin ~/avc/cured.bin
dd if=~/avc/SMT_H12AST2500_64M_30902_V.bin of=~/avc/cured.bin bs=256 count=128 conv=notrunc
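Since this image is about to be flashed onto real hardware, a paranoid check doesn’t hurt. The sketch below wraps the transplant in a hypothetical `graft_head` helper and then confirms two things: the first N bytes of the result match the vendor image, and everything after them is still the dump, byte for byte. The helper name and the messages are mine, not from any vendor tool.

```shell
# graft_head REF DUMP OUT N: copy DUMP to OUT, overwrite the first N bytes of OUT
# with the first N bytes of REF, then verify both halves of the result.
# Hypothetical helper for illustration.
graft_head() {
    ref=$1; dump=$2; out=$3; n=$4
    cp "$dump" "$out" || return 1
    # conv=notrunc leaves everything past the first n bytes of $out in place.
    dd if="$ref" of="$out" bs="$n" count=1 conv=notrunc 2>/dev/null || return 1

    tmp=$(mktemp) || return 1
    # The first n bytes of the result must equal the first n bytes of the reference.
    head -c "$n" "$ref" > "$tmp"
    if ! head -c "$n" "$out" | cmp -s - "$tmp"; then
        rm -f "$tmp"; echo "head differs from reference" >&2; return 1
    fi
    # Everything after that must still be the dump, byte for byte.
    tail -c "+$((n + 1))" "$dump" > "$tmp"
    if ! tail -c "+$((n + 1))" "$out" | cmp -s - "$tmp"; then
        rm -f "$tmp"; echo "tail differs from dump" >&2; return 1
    fi
    rm -f "$tmp"
    echo "graft verified"
}
```

With the files from this post, the call would be `graft_head ~/avc/SMT_H12AST2500_64M_30902_V.bin ~/avc/fwbad.bin ~/avc/cured.bin 32768`. The second check is the one that matters for keeping the BMC configs intact.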

Checking:

qemu-system-arm -M ast2500-evb,fmc-model=mx66l51235f -nographic -drive file=cured.bin,format=raw,if=mtd

U-Boot 2013.01 (Feb 12 2019 - 15:34:40)

DRAM:  496 MiB
Flash: 64 MiB
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
COM: port1 and port2
PWM:   port[ABCDH]

<...>

Kernel trace log detected!
+ Dump FW analysis pack

starting pid 1123, tty '': '-/bin/sh'


BusyBox v1.23.1 (2019-12-16 13:49:06 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/ # 

Yippee ki-yay, motherboard-flicker!

Flashing!

With another tool we can write the healed firmware back to the broken BMC.

badboi:~$ /tmp/socflash_x64 ./cured.bin 

Phew! The BMC rebooted and is fully operational.

Necromancer’s footnote

Yes, I could have just overwritten the broken firmware with the original vendor firmware - but that would mean losing all the configs on that BMC, and where’s the sport in that? After the flashing, the vulnerability is gone, so I can’t do anything like this anymore. But that’s not needed, for now.

The real pickle is this: you can’t do a full BMC dump unless it’s bricked, so you can’t actually have a backup.
Yes, you could back up the configs before the firmware upgrade (and you honestly should), but hey, I’ve written a dozen papers on the importance of backups and still trusted the vendor’s tool too much.

Probably, the upgrade would’ve gone smoother if I’d done it through the web interface - but I usually find those to be more broken than the ol’ reliable CLI tools.

That’s all for now, thank you for coming to my TED talk.
Cya
