README



Here we run attacks that try to reproduce, in the logic analyser, the same
traces as an execution of the code in the BRAM.

To run the attacks from gdb, run the following commands (with OpenOCD running):

$ gdb-multiarch -ex "set architecture armv7" -ex "target extended-remote localhost:3333" --command=exec_from_bram.gdb
$ gdb-multiarch -ex "set architecture armv7" -ex "target extended-remote localhost:3333" --command=attack_0.gdb
[...]

To observe the traces in the logic analyser, run the following commands (there
is no need to run the attacks first as we saved the data from the logic
analyser):

$ make -C ./hw/logic_analyzer/decode clean
$ make -C ./hw/logic_analyzer/decode


In the attacks, we read from the BRAM so that the read accesses produce the
same traces as an instruction fetch. We also add indirect branches and
reconfigure CoreSight and the MMU to try to obtain the same "trace_data" on the
TPIU.

When we run the attacks from gdb, we go through OpenOCD.

0)
The first step is to fetch and execute the code in the BRAM, and to save the
signals for later comparison. It is recommended to hard-reset the board between
tries so that CoreSight behaves deterministically.
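The saved signals are the reference for every later attempt. As a minimal
sketch of such a comparison, assuming (hypothetically) that each capture has
been decoded into a list of per-sample values, the first diverging sample can
be located as follows. `first_mismatch` is a hypothetical helper, not part of
the decode tools, and the usage values are toy data.

```python
from typing import Optional, Sequence


def first_mismatch(reference: Sequence[int], attempt: Sequence[int]) -> Optional[int]:
    """Return the index of the first sample where two captures differ.

    Hypothetical helper: each capture is assumed to be a list of sampled
    values (e.g. one integer per logic-analyser sample). Returns None when
    the captures are identical.
    """
    for index, (ref, att) in enumerate(zip(reference, attempt)):
        if ref != att:
            return index
    # One capture is a prefix of the other: they diverge where the shorter ends.
    if len(reference) != len(attempt):
        return min(len(reference), len(attempt))
    return None


# Usage with toy values: a delayed read shows up as a shifted sample.
fetch = [0x1, 0x3, 0x3, 0x2]
attack = [0x1, 0x3, 0x0, 0x3]
print(first_mismatch(fetch, attack))  # prints 2
```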

1)
In the first attack, we show that we must use 9 registers to read eight 32-bit
words from the BRAM. Otherwise, the same signals are not reproduced.
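One plausible reading of the 9-register count, stated here as an assumption
(the exact instructions are in attack_1.gdb and are not reproduced here): a
single load-multiple reads the eight 32-bit words in one burst, which needs one
base register holding the BRAM address plus eight destination registers. The
hypothetical sketch below only makes that count explicit.

```python
WORD_BYTES = 4        # 32-bit words
WORDS_PER_READ = 8    # eight 32-bit words per read, as in the first attack


def registers_per_read(words: int = WORDS_PER_READ) -> int:
    """Assumed register cost of one burst read: 1 base + 1 per word."""
    base_register = 1
    destination_registers = words
    return base_register + destination_registers


def load_multiple(base: int = 0, words: int = WORDS_PER_READ) -> str:
    """Build an illustrative ARM load-multiple, e.g. "ldm r0!, {r1-r8}".

    The "!" (writeback) advances the base register by words * 4 bytes,
    so the next read continues where this one stopped.
    """
    first, last = base + 1, base + words
    return f"ldm r{base}!, {{r{first}-r{last}}}"


print(registers_per_read())          # 9
print(load_multiple())               # ldm r0!, {r1-r8}
print(WORDS_PER_READ * WORD_BYTES)   # 32 bytes read per burst
```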

2)
In the second attack, we show that if we execute "add r10, pc, #16" and
"mov pc, r10" after each read from the BRAM, this introduces a delay between the
reads, and hence a mismatch with an execution of the code in the BRAM.

3)
In the third attack, we show that if we first prepare several registers with
the destination addresses, and then only execute "mov pc, rX" (where rX is one
of the prepared registers) after each read from the BRAM, we obtain the same
traces for a read access as for a fetch. There is a limit, however, since the
number of registers is fixed by the architecture.

We still have a mismatch on the "trace_data" in the TPIU. This is because the
destination addresses are not the same; if we configure the MMU to output the
same addresses, we will have a match.
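The difference between the second and third attacks is what runs between two
reads. The sketch below is a hypothetical model, not the content of the gdb
scripts: it reuses the mnemonics quoted above, assumes each read is
"ldm r0!, {r1-r8}", and assumes the registers left free by that read (r9-r12,
SP and LR) hold the prepared destination addresses.

```python
from typing import List

READ = "ldm r0!, {r1-r8}"  # assumed 8-word read (r0 = BRAM address)
# Registers not used by READ; PC is excluded since it cannot hold a target.
PREPARED = ["r9", "r10", "r11", "r12", "sp", "lr"]


def attack2_pack() -> List[str]:
    """Second attack: the branch target is computed after every read."""
    return [READ, "add r10, pc, #16", "mov pc, r10"]


def attack3_pack(index: int) -> List[str]:
    """Third attack: branch through a register prepared before the attack."""
    if not 0 <= index < len(PREPARED):
        raise ValueError(f"only {len(PREPARED)} prepared targets available")
    return [READ, f"mov pc, {PREPARED[index]}"]


def instructions_after_read(pack: List[str]) -> int:
    """Instructions issued between this read and the next one."""
    return len(pack) - 1


print(instructions_after_read(attack2_pack()))   # 2: add + mov
print(instructions_after_read(attack3_pack(0)))  # 1: mov only
```

In this model, the seventh pack would raise an error: the attacker has run out
of prepared targets and must fall back to the pattern of the second attack.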

4)
In the fourth attack, we reconfigure the MMU and re-run the code in the BRAM
with the same destination addresses as in the third attack.
Once again, it is recommended to hard-reset the board between tries so that
CoreSight behaves deterministically.

Note: the third attack might need to be run several times to match the signals
from a fetch. Eventually, however, we do get a match: this means that the
hardware monitor can be tricked.

The solution is to have enough indirect branches in the code in the BRAM so that
an attacker runs out of registers to prepare the attack.

ARMv7 processors have thirteen general-purpose 32-bit registers, R0 to R12,
plus three 32-bit registers with special uses: SP, LR and PC (see the ARMv7
Architecture Reference Manual).

The attacker cannot use PC as a register for the attack, but (s)he can use SP
and LR, so 15 registers are available.
The attacker must use 9 registers to read from the BRAM and simulate a fetch.
This leaves 15 - 9 = 6 registers, so the attacker can only prepare 6 indirect
branches before running the attack.

So, to defend against this, the code in the BRAM must contain at least 7 packs
of 8 words of code, each with an indirect branch.
This forces the attacker to recompute the destination address after the 6th
read, which introduces a delay just like in the second attack.
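The register budget above can be checked with a few lines of arithmetic. The
sketch assumes, as in the text, that SP and LR are usable, that PC is not, and
that each read costs 9 registers; the names are illustrative.

```python
GENERAL_PURPOSE = [f"r{i}" for i in range(13)]  # R0-R12
USABLE_SPECIAL = ["sp", "lr"]  # PC cannot be used to hold a prepared target
REGISTERS_PER_READ = 9         # from the first attack


def prepared_branch_budget() -> int:
    """Indirect branches the attacker can prepare in advance."""
    available = len(GENERAL_PURPOSE) + len(USABLE_SPECIAL)  # 15
    return available - REGISTERS_PER_READ                    # 6


def min_defensive_packs() -> int:
    """Packs of 8 words (each with an indirect branch) that exhaust the budget."""
    return prepared_branch_budget() + 1


def attacker_must_recompute(packs: int) -> bool:
    """True if some destination address must be computed during the attack."""
    return packs > prepared_branch_budget()


print(prepared_branch_budget())     # 6
print(min_defensive_packs())        # 7
print(attacker_must_recompute(7))   # True
```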