Intro to Cryptanalysis

Posted by Ya Xiao

In this blog, I will briefly introduce cryptanalysis and learn about one kind of cryptanalysis method which is called side channel attacks.

Cryptography and cryptanalysis are the two major branches of cryptology. Cryptanalysis refers to the study of analyzing information systems in order to study the hidden aspects of the systems. While cryptography try to build more secure communication systems, the objective of cryptanalysis is to evaluate the security level of certain cipher algorithms, as well as find the weakness in the design of them. Thus, cryptography and cryptanalysis are antagonistic and mutually reinforcing.

In the branch of cryptanalysis, there are lots of method used to attack cipher algorithms, which can be classified as two categories. One is the mathematical cryptanalysis, the other is the main topic of this blog, side channel cryptanalysis/attacks. (I prefer the word “attacks” because these attacks can be actually conducted while the mathematical cryptanalysis is just about the theoretical analysis of the complexity of successful attack.)

Branches of crypology

Figure 1. Branches of Crypotology

The mathematical cryptanalysis is the more traditional work in this field, which considers that the cryptographic device is an abstract machine and target primarily the weaknesses of the cryptographic algorithm by taking advantage of the input and output data. In contrast, side channel attacks exploit the fact that the cryptographic device itself leaks physical information during the processing of a cryptographic algorithm which can be measured externally. These measurements (e.g. power consumption, electromagnetic emanation …) can then be used to compromise secret keys of cryptographic algorithms by some statistical methods.

Intro to Mathematical Cryptanalysis

First of all, I want to introduce one kind of the mathematical cryptanalysis so that you can gain the basic understanding about how the cipher algorithm will be broken by mathematical cryptanalysis.

Among numerous methods applied to cryptanalysis, differential analysis and linear analysis are two methods most extensively used. Therefore, I want to show you the basic principles of differential analysis on DES.

Differential cryptanalysis is a chosen paintext attack on cipher algorithm. It analyzes the effect of particular differences in plaintext pairs on the differences of the resultant ciphertext pairs.

Some Terminology

First, I want to begin with some terminology. (Since this blog post just aims to understand brief rationale of this method, I will not focus on the usage of rigorous mathematical expression.)

Difference

For two binary streams $Y$ and $Y’$, $Y \oplus Y’$ can be regarded as their difference.

Differential characteristic

In the iterated cipher algorithm DES, Figure 2 shows its structure. For two inputs (plaintexts) $L_0R_0$ and $L_0’R_0’$, they will generate a sequence of differences in every round. This sequence of differences is called differential characteristic.

Differential Characteristic

Figure 2. Differential Characteristic

Right Pair and Wrong Pair

Since differential caracteristic appear with probability (the probability is not 1 mostly), it means that if the fixed plaintext pair which satisfy the input difference is given, there is possible for both situations that the following differences satisfy the characteristics or not. If the sequence of differences the plaintext pair actually generate is the same with the characteristic, this plaintext pair will be called a right pair. Otherwise, it is a wrong pair.

Difference Distribution Table (DDT)

If we fix input difference of one round, what will be the output difference? We can easily understand that for linear part of this round, the output difference is determined by the input differnce. However, for the non-linear operation, the relationship between the input difference and output difference is undefined. To find out this relationship, we can build the DDT to describe this relationship.

Round Function

Figure 3. Round Function

Obviously, the S boxes is the only part we need to analyze for their relationship between input differences and output differnces. Thanks to the short length of each S box, we can just exhuast every possible situation following these steps:

enumerate every possible value (totally $2^6$ different values)as the fixed input difference, for every value, go step 2
exhaust every possible combination of two input ($2^6$), calculate the correspinding outputs and the difference of the two output

For example, we can obtain that in the first S box $S_1$, given the fixed input difference $0x110100$, different situation can be summerized as follows:

Possible Inputs with Fixed Input Difference

Figure 4. Possible Inputs with Fixed Input Difference

The right column possible input just shows one input value of the input pair (the other is determined by this one input and input difference).

The probability information also can be organized as a difference distribution table, which summarizes the probabilities of every output differece derived from the fixed input difference. Observing this table, we can find that there is some input difference-output difference combination with relatively high probablity, which make the attack feasible.

Key recovery attack in one round

For one round encryption, if we know the input pair of j-th Sbox before Key addition as $(E_j,E_j’)$, the output differential of S boxes as $\Delta{C_j}$, the possible values of partial key bits $K_j$. (In this part, we omit the E extension and P permutation operation because it is the linear part so you can just calculate the difference value after this operation)

Since all the possible combination of input differences and output differences of j-th Sbox can been showed in the DDT, if we know the value of $\Delta{C_j}$, we can find all $(B_j,B_j’)$ (inputs of Sbox afte key addition) which have the possibility to generate $\Delta{C_j}$. If we also have the knowledge of $E_j,E_j’$, the partial key bits used before this S box can be calculated as $K_j=E_j\oplus B_j$. we donote this process as $test_j(E_j,E_j’,\Delta{C_j})$. It reveals that the knowledge of $E_j,E_j’,\Delta{C_j}$ can help you get some information of key bits.

Since the differential analysis on 16-round DES contain lots of details need to be considered in the whole attack, to simplify it, this blog will only focus on a simpler instance of differential attack on the reduced round DES to show the basic rationale of this kind of cryptanalysis.

Differential Analysis on 5-round DES

3-round differential characteristic

Attack Path of 5-round DES Attack

Figure 5. Differential Attact Path of 5-round DES

From the DDT, this 3-round high probability characteristic can be easily found.
The attack path looks like the Figure 5.

Attact Process

The attack steps can be described as:

Construct plaintext pairs which satisfy difference $L_0R_0$
For every plaintext pairs, input them into encryption algorithm, and obtain their ciphertext differences.
For the last round, calculate all possible round key $K$ suggested by this paintext pair following $test(E,E’,\Delta C)$. In detail, the $(E,E’)$ can be known from the ciphertext pairs $(L_5R_5,L_5’R_5’)$.The output difference of round function F in last round can be calculated as $\Delta{L_5}\oplus\Delta{R_3}$.
Construct a counter for all values of the partial key bits we try to recover with the initial values as 0. When the value of partial key bits are suggested in step 3, add 1 to its counter. As we know, for right pairs, the suggested key values generated in step 3 will do contain the right key value. For wrong pairs, the suggested key values can be seen as being chosen randomly. Since the right pairs appear with the possibility of 1/16, the possibility of counter increment of right key value is also 1/16, which is significantly higher than the random possiblity of counter increment of other values. Consequently, the advantage of right value of the partial key bits can be distinguished from other values when there are enough chosen paintext pairs.

Side Channel Cryptanalysis

In the above, we know that traditional cryptanalysis leverages mathmatical feature (differential characteristic with high probability in the case we give above) to distinguish right subkey from wrong values. However, in side channel cryptanalysis, the information leaked by the cryptographic device is noticed, which is called side channel information(such as time and power consuming). When side channel information is applied to cryptanalysis, the significant threat is throwed to the security of cryptography. It even gives access to actually obtain the key, rather than just obtaining the complexity of theoretical attacks which we do in the mathematical cryptanalysis.

We can classify these attacks into two categories: passive attacks and active attacks. Passive attack refers to these attacks which observe the physical characteristics of the physical device but will not distrub the operation of the device. Timing attacks and Power Analysis Attacks are two common methods of passive attacks. Active attacks will control or tamper something of the cryptographic device, like fault attacks.

Power Analysis

Power analysis attacks exploit the fact that the instantaneous power consumption of the cryptographic device depends on the data processed and the operations performed by the device.

In a software implemention of AES in a 8051 microcontroller, the voltage during the encryption process can be measured:

Power Trace of AES

Figure 6. Power Trace of AES under Microprocessor

This vlotage curve can be regared as the power trace. From this figure, it apperantly showed the close relationship between power consuption and operation. We can recognize every rounds of AES by the 10 negtive peaks around every 0.4 ms. The last round seems shorter because there isn’t mixcolumn operation in last round.

Simple Power Analysis(SPA) attacks

Different operation and different data of the internal state will lead to different height and shape of the peaks. Sometimes, the observation results of one power trace can directly come to the key values, which is called simple power analysis.

This is a instance of exploiting SPA to attack RSA:
When we write the exponent(key) as binary string, the exponentiation computation of RSA can be processed as if the exponent bit is 1, there is a square and mutiplication operation while there is only square operation when the exponent bit is 0. Given the knowledge that the power comsuption of square operation is higher than muliplication operation we can obtain the key bits directly from the power trace.

Power Trace of RSA under Microprocessor

Figure 7. Power Trace of RSA under Microprocessor

Figure 7 is the power trace of RSA under microprocessor, we can obtain the types of operation as:

SMSMSMSSSSSSSSSMSMSMSMSSSSSSSSSSSSSMSMSMSMSM

S denotes square operation and M denotes multiplication.
As we know, SM appears when the key bit is 1 which S appears when key bit is 0.
Therefore, we can obtain the key string above the plot.

Differential Power Analysis(DPA) attacks

Compared with simple power analysis, differential power anslysis is more popular due to its ability of attacking without the detail knowledge of the device.

I will show you one instance of DPA on DES.

For the DES algorithm, the DPA selection function $D(C,b,K_s)$ is defined as computing the value of bit b ($0\leq b< 32$) of the DES intermediate $L_15$ for ciphertext C, where the 6 key bits entering the S box corresponding to bit $b$ are represented by $0\leq K_s < 2^6$. Note that if K_s is incorrect, evaluating $D(C,b,K_s)$ will yield the correct value for bit $b$ with probability $P\approx \frac{1}{2}$ for each ciphertext.

To implement the DPA attack, an attacker following these steps:

Encrypt $m$ plaintexts to get their corresponding ciphertexts $$C_{1 \cdots m}$$and captures power traces denoted as $$T_{1 \cdots m}[1 \cdots k]$$ containing $k$ samples each.
For each ciphertext $C_i$, try a possible $K_s$ to calculate corresponding $D(C,b,K_s)$. If $D(C,b,K_s)=0$, put this power trace in set 1. Otherwise, put this power trace in set 2.
When the m power traces are seperated into 2 sets, calculate the average of the traces for every set. Then, calculate the difference of the two average traces denoted as $\Delta_D[1\cdots k]$.
Go back to step 2, change the value of $K_s$ and repeat step 2 and step 3 until exhuasing all values of $K_s$.
Show all the $\Delta_D[1\cdots k]$, the trace with relatively high peaks is the trace under right $K_s$.

when the target bit $b$ is changed, the other fraction of K can be deduced by the same steps.

DPA Traces

Figure 8. DPA Traces

This figure is generated using 1000 samples($m=1000$)
In the above process, we can analyze that if $K_s$ is incorrect, the bit computed using $D$ will differ from the actual target bit for about half of the ciphertexts $C_i$. The selection function $D(C_i,b,K_s)$ is thus effectively uncorrelated to what was actually computed by the target device. If $K_s$ is correct, however, the computed value for $D(C_i,b,K_s)$ will equal the actual value of target bit $b$. The selection function is thus correlated to the value of the bit manipulated in the 16th round. As a result, the $\Delta_D[j]$ approaches the effect of the target bit on the power consumption as $m\rightarrow \infty$. Other data values, measurement errors, are not correlated to D approach zero. Therefore, the plot of $\Delta_D$ will be flat with spikes in regions where bit $b$ is processed. Obviously, the second curve is under the right key.

In the above instance, the relationship of the data processed and the power comsuption is described as bit model, which simply regards that the power comsuption of processing 0 is different from processing 1. However, this model to evaluate power comsuption is ineffective in some cases of implemention. So, there are many different process to achieve DPA attack.
In general, its process can be summerised as:

Select an intermediate value of the algorithm as target value.
- This intermediate value must be a function $f(d,k)$, where d is known variables can be calculated from plaintexts or ciphertexts, k is a fraction of Key.
Measure the actual power comsuption
- In this step, some $d$ values are selected to conduct the encryption of decryption to obtain some power traces. these values of $d$ can be denoted as a vector $\textbf{d} = (d_1,d_2,\cdots ,d_D)’$, where $d_i$ denote the i-th encryption or decryption of d value. Its power trace is denoted as $$\textbf{t_i'}=(t_{i,1} ,\cdots ,t_{i,T})$$ For every $d_i$, there is a power trace. Thus, the D power traces can be recorded as a $D \times T$ matrix $\textbf{T}$
Calculate the supposed value of the target interstate under the key hypotheses.
- This step need to exhaust every possible $k$ value. All the hypothetiacal k values are denoted as a vector $\textbf{k}=(k_1,\cdots ,k_K)$, where $K$ is the number of all possible values of $k$. Given the $\textbf{d}$ and $\textbf{k}$, attackers can easily calculate the hypothetical intermediate value as a $D \times K$ matrix $\textbf{V}$. The j-th column of $\textbf{V}$ is the hypothetical intermediate values under the hypothetical key $k_j$
Select proper model to map the supposed values of step 3 into hypothetical power consumption value.
- This step aims to map hypothetical intermediate values matrix $\textbf{V}$ to power comsuption value matrix $\textbf{H}$
- The common used simulation models include hamming distance model and hamming weight model
  - Hamming distance model is often used to simulated the power comsuption of bus and register. It regards that the power comsuption of changing data on bus from $v_0$ to $v_1$ is proportional to $HD(v_0, v_1) = HW(v_0 \oplus v_1)$.
  - Hamming weight model is often used when the hamming distance is not accessible. It regards that the power comsuption is proportional to the hamming weight of the processed data.
Compare these hypothetical power consumption values and actual power trace to find one hypothetical power consumption of the most relevant to actual power trace.
- This step will compare every column of matrix $\textbf{H}$ $\textbf{h}_i$ with every column of matrix $\textbf{T}$ $\textbf{t}_j$, that is, compare every hypothetical power comsuption under the corresponding hypothetical key with power traces on every position. There are also some method to compare the two curves. One of them is calculatiing their correlation coefficients and select the result with highest correlation coefficient.

Fault Attacks

In the fault attacks, attackers introduce external factors to inject the fault, so as to achieve the attack effect.
There are some assumptions of the ability of attackers:
The attackers can change some intermediate value in the algorithm by the fault.
The attackers can obtain the right ciphertexts and the wrong ciphertexts under the influence of the fault.

There are numerous ways to inject fault, however, to control the accuracy is difficult.

There is an instance of fault attacks on AES:

Fault Model

In this attacks, it deal with random faults, in the sense that the faulty value is random and uniformaly distributed. Then, it assumes that attackers can inject a fault to change one byte of a certain column before the mixcolumn in the 9th round of AES.

Spread of Fault

Figure 9. Spread of Fault

Figure 9 shows the spread of the fault. The shadow box represents the byte changed in 9th round. Then, after the mixcolumn operation, there are 4 bytes changed. As a result, the ciphertext bytes $C_0$, $C_7$,$C_{11}$,$C_{13}$ have been changed, where $C_i$ denotes the (i+1)-th byte of ciphertext.

Attack process

The attack process can be summerized as follows:

1.Count $255\times 4$ (4 possible bytes, every with 255 possible differences)all possible differences of the faulty column in 9-th round. And then computed the spread after mixcolumn. Store all the 1020 possible column value after mixcolumn in a list $D$
2.If the bytes $i$,$j$,$k$,$l$ denote the 4 faulty bytes in ciphertext. The corresponding key bytes are $K_i^r$, $K_j^r$, $K_k^r$, and $K_l^r$. Attacker can exhaust the value of the four bytes subkey and calculate(Where $C’$ is the faulty ciphertext while $C$ is its corresponding right ciphertext):$$\delta_i = SB^{-1}(C_i \oplus K_i^r)\oplus SB^{-1}(C_i’ \oplus K_i^r)$$

$$\delta_j = SB^{-1}(C_j \oplus K_j^r)\oplus SB^{-1}(C_j’ \oplus K_j^r)$$

$$\delta_k = SB^{-1}(C_k \oplus K_k^r)\oplus SB^{-1}(C_k’ \oplus K_k^r)$$

$$\delta_l = SB^{-1}(C_l \oplus K_l^r)\oplus SB^{-1}(C_l’ \oplus K_l^r)$$

3.match the $(\delta_i,\delta_j,\delta_k,\delta_l)$ with the list $D$, and then store the matched value of $(K_i^r,K_j^r,K_k^r,K_l^r)$ in List $L$.

According to the experient, every ciphertext pair $(C,C’)$ will generated around 1036 elements in List $L$ for 4 bytes subkey. When the second fault is injected and conducted like the above steps, the only one right subkey of these 4 bytes can be determined with probability 98%. If we change the column where fault is located, we can obtain other 4 byte subkeys. Consequently, the last round key can be recovered through 8 times fault injection.

CS/ECE 5584: Network Security, Fall 2017, Ning Zhang