I Corrupted 5G Data Without Touching the Key

It was 1 AM on a Tuesday and I had a wireless security project due in three weeks. I opened the 5G standard specification on my second monitor, the one with the dead pixel in the corner, and started scrolling. The plan was to skim enough to sound credible, find a diagram to copy, write something about AES being unbreakable, and call it done. Graduate school has a way of turning ambitious ideas into survival exercises.

Four hours later I was still reading. I had found something that did not sit right, and I could not move on until I understood why it was allowed.

The 5G standard uses strong encryption. The session key is derived through a chain of cryptographic operations rooted in a secret stored on your SIM card. The algorithms that generate the keystream, whether SNOW 3G or AES in counter mode, are considered secure. No known attack breaks them in practice. And yet I could sit between two vehicles communicating over 5G, reach into an encrypted packet flying through the air, flip a specific bit, and the receiving vehicle would decrypt it, run its integrity check, and hand the application a corrupted number. Wrong in a specific and predictable way. No error. No alert. Clean pass.

I spent the next two weeks building this on a real open-source 5G stack using real highway driving data. This post is everything I learned, explained from first principles. You need to know what a bit is. That is all.

One thing to keep in your head through the whole post: vehicle A drives down a highway and broadcasts its position, velocity, and acceleration over 5G to vehicle B behind it. Vehicle B uses that data to maintain a safe following distance automatically, adjusting its speed based on what A reports. This system is called Cooperative Adaptive Cruise Control, CACC for short. It is one of the core applications 5G was designed to support, and it is exactly the kind of system where corrupted data does not trigger an error message. It triggers a collision.

Who the Attacker Is

Before any physics or cryptography, you need a precise picture of what the attacker can and cannot do. The attack only makes sense once you understand the constraints.

The attacker positions themselves between vehicle A and the base station, or between the base station and vehicle B. They have radio hardware capable of receiving the 5G signal, processing it down to the raw bytes at the encryption layer, modifying specific bytes, re-encoding everything, and retransmitting it in time for vehicle B to receive it as if it came from vehicle A directly. This class of attack is called a man-in-the-middle.

The attacker can intercept the physical-layer radio signal. They can decode it down to the encrypted PDCP payload. They can flip any bits they want in that payload. They can retransmit the modified packet.

The attacker cannot decrypt the ciphertext. The key that encrypts the data was never sent over the air. It was derived independently on both sides using a shared secret stored on the SIM card, and the derivation process involves cryptographic operations that cannot be reversed without knowing that secret. So the attacker sees bytes that look random. They cannot tell which bytes correspond to the acceleration field or the velocity field by reading the ciphertext.

The critical point, and the entire premise of what follows: the attacker does not need to read the plaintext to corrupt it in a targeted way. They only need to know the format of the message. Format is public. Format is in the protocol specification. The acceleration field always lives at the same byte offset, in the same 4-byte IEEE 754 floating-point structure, with the same bit layout. Flipping one specific bit in the exponent of that structure scales the value by a known power of two regardless of what the value is. The attacker does not need to know the current acceleration reading. They only need to know where the field lives and how one flipped bit changes a floating-point number.
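
To make the premise concrete, here is a sketch of my own (not the actual attack tooling): flipping the lowest exponent bit of an IEEE 754 single-precision float doubles the value whenever that bit is currently 0, no matter what the mantissa holds.

```python
import struct

def flip_bit(f, bit):
    """Flip one bit (0 = least significant) in the IEEE 754 binary32 encoding of f."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", f))
    (out,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return out

# Bit 23 is the lowest exponent bit. For accelerations in the 2 to 4 m/s^2
# range that bit is 0, so flipping it doubles the value.
doubled = flip_bit(2.3, 23)   # ≈ 4.6
```

The same flip halves a value whose exponent bit is already 1, so the corruption is always a known power of two, never random noise.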

Previous attacks of this type on LTE and early 5G required the attacker to know the plaintext. If you wanted to redirect a DNS reply to a malicious IP address, you needed to know the original IP address so you could compute exactly which bits to flip. This attack does not require that. Format knowledge alone is sufficient for numerical sensor data, and format knowledge is public information in any standardized protocol.

How a Radio Signal Travels and Why It Makes 5G Complex

To understand why 5G is built the way it is, and why the security layer sits at the specific layer it does, you need to understand what happens to a radio signal between the moment it leaves an antenna and the moment it arrives somewhere else. This is not background reading. It directly determines the structure of the attack.

Radio waves are electromagnetic waves. They propagate at the speed of light, roughly 300,000 kilometers per second, and they spread outward in all directions from the transmitting antenna. As they spread, the power per unit area decreases. If you double the distance from the transmitter, the power density at the receiver drops by a factor of four. This is the inverse square law. It is why radio signals get weaker with distance, and it is completely unavoidable.

In perfectly empty space with nothing in the way, the relationship between transmitted power $P_t$ and received power $P_r$ at distance $d$ is called the Friis transmission equation:

\[P_r = P_t \cdot G_t \cdot G_r \cdot \left(\frac{\lambda}{4\pi d}\right)^2\]

Let me walk through each piece. $G_t$ is the gain of the transmitting antenna. Gain here does not mean amplification. It means how much the antenna focuses energy in one direction compared to a theoretical antenna that spreads power perfectly equally in all directions. A directional antenna pointed at you has high gain from your perspective. $G_r$ is the same concept for the receiving antenna. $\lambda$ is the wavelength of the signal, which is the physical distance between two consecutive peaks of the electromagnetic wave. You calculate it as the speed of light divided by the frequency. At 3.5 GHz, which is the main mid-band frequency used in 5G deployments, $\lambda = (3 \times 10^8) / (3.5 \times 10^9) \approx 0.086$ meters, about 8.6 centimeters. At 28 GHz for millimeter-wave 5G, $\lambda \approx 1.07$ centimeters.
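
The equation is easy to sanity-check in code. A minimal sketch with numbers of my own choosing (1 W transmit power, unity-gain antennas, mid-band 5G at 500 meters):

```python
import math

def friis_rx_power(pt_w, gt, gr, freq_hz, d_m):
    """Free-space received power in watts, straight from the Friis equation."""
    lam = 3e8 / freq_hz                              # wavelength in meters
    return pt_w * gt * gr * (lam / (4 * math.pi * d_m)) ** 2

# 1 W transmitter, unity-gain antennas, 3.5 GHz carrier, 500 m separation.
pr = friis_rx_power(1.0, 1.0, 1.0, 3.5e9, 500.0)    # ≈ 1.9e-10 W, about -67 dBm
```

Even in empty space, half a kilometer turns a full watt into a fifth of a nanowatt. Real channels are worse.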

The $(\lambda / 4\pi d)^2$ term is the free-space path loss factor. It gets smaller as distance $d$ grows, meaning more power is lost. It also gets smaller as $\lambda$ decreases, meaning higher frequency signals lose more power over the same distance. This is why millimeter-wave 5G cells cover only a few hundred meters and require many more base stations per square kilometer than sub-6 GHz cells.

Before going further, you need to understand decibels because they appear everywhere in wireless engineering. A decibel is a way of expressing a ratio using logarithms. The formula is $\text{dB} = 10 \log_{10}(P_1 / P_2)$. If power $P_1$ is ten times $P_2$, then $P_1$ is 10 dB greater. If $P_1$ is one hundred times $P_2$, it is 20 dB greater. If $P_1$ is one thousand times $P_2$, it is 30 dB greater. The pattern: each factor of ten adds 10 dB. The reason engineers use this scale is that multiplication in linear scale becomes addition in log scale. Path loss over a long link involves multiplying many small factors together. In dB, you just add them.

dBm means decibels relative to one milliwatt. 0 dBm is 1 milliwatt. 30 dBm is 1 watt. -90 dBm is $10^{-12}$ watts, one picowatt, which is a typical received signal level for 5G at the edge of a cell.
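
Since every quantity from here on is quoted in dB or dBm, here are the two conversions as code (a trivial sketch, but worth pinning down):

```python
import math

def to_db(power_ratio):
    """Express a power ratio in decibels."""
    return 10 * math.log10(power_ratio)

def dbm_to_watts(dbm):
    """Convert decibels-relative-to-one-milliwatt back to watts."""
    return 1e-3 * 10 ** (dbm / 10)

# Each factor of ten adds 10 dB: to_db(10) = 10, to_db(1000) = 30.
# Anchors from the text: 0 dBm = 1 mW, 30 dBm = 1 W, -90 dBm = 1 pW.
```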

Back to the problem. Highways are not empty space. There are other vehicles, overpasses, guard rails, buildings, and varying terrain. In real environments the path loss does not follow the clean inverse square law. It follows a steeper empirical model:

\[P_r(d)_{\mathrm{dBm}} = P_{t,\mathrm{dBm}} + P_L(d_0)_{\mathrm{dB}} - 10n \log_{10}\!\left(\frac{d}{d_0}\right)\]

$d_0$ is a reference distance you measure from, typically 100 meters for urban microcells. $P_L(d_0)$ is the path loss at that reference distance, computed from the Friis equation. The last term is the additional loss as you move beyond $d_0$. The exponent $n$ controls how steep the loss is. In free space, $n = 2$. In urban macrocells with heavy building obstruction, $n$ ranges from 2.7 to 3.5. In a street with a base station at road level and a clear line of sight down the street, $n$ can drop to 1.8 because the buildings and road surface act like a waveguide that channels energy along the street.

On top of this average path loss, buildings and hills cause slow random variations called shadowing. When a large building sits between the transmitter and receiver, it absorbs and scatters energy and the received power drops below what the average model predicts. When the path is clear, it rises above the average. These variations follow a log-normal distribution, meaning in dB they look like a bell curve centered at zero. Typical standard deviations for shadowing are 4 to 10 dB depending on the environment.
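
Here is the model as a function. The 83.2 dB reference loss is my own number (free-space loss at 100 m for a 3.5 GHz carrier); the exponent and shadowing deviation are assumptions picked from the ranges above:

```python
import math
import random

def rx_power_dbm(pt_dbm, d_m, d0_m=100.0, pl_d0_db=83.2, n=3.0, shadow_sigma_db=6.0):
    """Log-distance path loss with log-normal shadowing, all in dB/dBm."""
    avg_loss = pl_d0_db + 10 * n * math.log10(d_m / d0_m)
    shadowing = random.gauss(0.0, shadow_sigma_db)   # slow random variation, in dB
    return pt_dbm - avg_loss + shadowing

# One random draw at 1 km from a 30 dBm (1 W) transmitter: around -83 dBm,
# plus or minus a shadowing term that can easily swing 10 dB either way.
p = rx_power_dbm(30.0, 1000.0)
```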

For the attacker, this matters because their interception requires a clean signal. An attacker parked behind a concrete building relative to vehicle A sits in a shadow. The signal they receive may not be strong enough to decode reliably. Physical positioning is a real practical constraint on the attack.

The Signal That Arrives Is Not the Signal That Was Sent

Even when the attacker is well-positioned and receives the signal cleanly, the signal that arrives is not the signal that left vehicle A. It arrives as a superposition of dozens of copies of itself, each having bounced off something different.

When vehicle A’s antenna transmits, the radio wave does not travel only in a straight line to the base station. It scatters off buildings, diffracts over highway overpasses, reflects off other vehicles, and arrives at the receiver through many independent paths simultaneously. Each path has a different length, so each copy arrives at a slightly different time. Each copy has a different amplitude because it traveled a different distance and bounced off surfaces with different absorption characteristics. Each copy has a different phase, because the wave’s oscillation continues during propagation and a longer path means more oscillations completed before arrival. This phenomenon is called multipath propagation.

The received signal $r(t)$ at any moment is the sum of all these arriving copies:

\[r(t) = \mathrm{Re}\left\{ \sum_{n=0}^{N(t)} \alpha_n(t)\, u\!\left(t - \tau_n(t)\right) e^{-j2\pi f_c \tau_n(t) + j\phi_{D_n}(t)} \right\}\]

This equation looks dense but each piece has a physical meaning. $u(t)$ is the original transmitted signal in its baseband form, meaning the signal before it gets shifted up to the 3.5 GHz carrier frequency. Think of it as the data waveform itself. The $e^{-j2\pi f_c \tau_n(t)}$ term is a phase rotation caused by the propagation delay $\tau_n(t)$ of path $n$. To understand what phase rotation means: imagine the transmitted signal is a sine wave oscillating at 3.5 billion cycles per second. By the time a copy of this wave arrives after traveling path $n$, the original transmitter has continued oscillating. The copy is now out of step with where the transmitter currently is. How out of step depends on the path length. A path length difference of exactly one wavelength (8.6 cm at 3.5 GHz) means the copy is exactly back in step, a full rotation of $2\pi$ radians. A path length difference of exactly half a wavelength (4.3 cm) means the copy is exactly opposite in phase, a rotation of $\pi$ radians. The $\alpha_n(t)$ term is the amplitude of path $n$, which depends on how much power was lost along that path. The $e^{j\phi_{D_n}(t)}$ term is the Doppler phase shift, which we will get to shortly.

Now here is what matters. When two copies arrive with opposite phase (half-wavelength path difference), they partially or fully cancel each other. The received power drops. When they arrive in phase, they add constructively and the received power is higher than either copy alone. As vehicle A moves even a few centimeters, the lengths of all paths change. The phase relationships between the copies change continuously. The received power at vehicle B fluctuates rapidly as the geometry shifts. This rapid fluctuation is called small-scale fading.

To make this concrete: at 3.5 GHz, a path length difference of just 4.3 centimeters, roughly the width of your hand, shifts two copies by exactly 180 degrees and makes them destructively interfere. A vehicle moving at 30 km/h covers 4.3 cm in about 5.2 milliseconds. So the channel can go from constructive to destructive interference in under 6 milliseconds at urban speeds.
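
The hand-width claim is quick to verify (my own arithmetic, same numbers as the text):

```python
import math

lam = 3e8 / 3.5e9                    # wavelength at 3.5 GHz, ≈ 8.57 cm

def phase_shift_rad(path_diff_m):
    """Extra phase a copy accumulates from an extra path length."""
    return 2 * math.pi * path_diff_m / lam

half_wave = phase_shift_rad(lam / 2)     # exactly pi radians: full cancellation
v = 30 / 3.6                             # 30 km/h in m/s
t_flip_ms = 1e3 * (lam / 2) / v          # ≈ 5.1 ms to move half a wavelength
```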

Doppler Shift and Why Moving Fast Makes the Channel Worse

The $e^{j\phi_{D_n}(t)}$ term in the multipath equation accounts for the Doppler effect. You already know the Doppler effect from sound: an ambulance siren sounds higher pitched when the vehicle approaches and lower pitched when it moves away. Radio waves experience the same thing. When vehicle A moves toward the base station, the radio waves it transmits get compressed slightly in the direction of motion. The receiver sees a slightly higher frequency than was transmitted. When vehicle A moves away, the waves stretch and the receiver sees a lower frequency.

The Doppler frequency shift for a single path is:

\[f_{D_n} = \frac{v \cos\theta_n}{\lambda}\]

$v$ is the relative velocity between the transmitter and receiver along the direction of the path. $\theta_n$ is the angle between the velocity vector and the direction of path $n$. $\lambda$ is the wavelength. At 3.5 GHz with a vehicle moving at 120 km/h (about 33 m/s) directly toward the base station ($\cos\theta = 1$), the Doppler shift is $33 / 0.086 \approx 384$ Hz. That sounds tiny compared to 3.5 GHz, but it matters enormously because it continuously changes the phase of every arriving path, which continuously changes how those paths interfere at the receiver.

The maximum Doppler shift sets something called the coherence time of the channel. Coherence time $T_c$ is approximately $1 / (4 f_{D,\text{max}})$. It is the time over which the channel stays approximately constant, meaning the phase relationships between multipath components do not change much. At 120 km/h on a 3.5 GHz carrier, $T_c \approx 1 / (4 \times 384) \approx 0.65$ milliseconds. Within any 0.65 ms window, the channel is approximately frozen. Across multiple such windows, it changes.
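
Both numbers from this section drop out of two one-line functions (same assumptions as the text: 3.5 GHz carrier, motion straight toward the base station):

```python
import math

def doppler_hz(v_mps, freq_hz, theta_rad=0.0):
    """Doppler shift for one path: v * cos(theta) / wavelength."""
    lam = 3e8 / freq_hz
    return v_mps * math.cos(theta_rad) / lam

def coherence_time_s(fd_max_hz):
    """Approximate time over which the channel stays frozen."""
    return 1.0 / (4.0 * fd_max_hz)

fd = doppler_hz(120 / 3.6, 3.5e9)   # ≈ 389 Hz (the text rounds 33.3 m/s to 33 and gets 384)
tc = coherence_time_s(fd)           # ≈ 0.64 ms: the attacker's retransmission window
```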

This directly constrains the attacker. After intercepting vehicle A’s transmission, the attacker needs to retransmit the modified packet. If they wait too long, the channel between the attacker and vehicle B has changed. The modified packet arrives through a different set of phase relationships than vehicle B’s receiver expects. This can expose the attack if vehicle B is doing coherent channel estimation, which modern 5G receivers do. The attacker has a window of roughly one coherence time to complete the interception and retransmission. At 120 km/h that window is under a millisecond. It is tight but achievable with properly designed software-defined radio hardware.

When All the Copies Arrive at Different Times: Delay Spread and ISI

The multiple propagation paths create another problem beyond just amplitude fluctuations. Each path has a different length, so each copy arrives at a different time. The earliest copy might be the direct line-of-sight path. Later copies are reflections off distant buildings. The time between the earliest and latest significant arriving copy is called the delay spread.

Formally, the rms delay spread $\sigma_{T_m}$ is:

\[\sigma_{T_m} = \sqrt{\frac{\int_0^\infty (\tau - \mu_{T_m})^2 A_c(\tau)\,d\tau}{\int_0^\infty A_c(\tau)\,d\tau}}\]

Break this down. $A_c(\tau)$ is the power delay profile: the average received power from paths arriving at delay $\tau$. Think of it as a bar chart where the horizontal axis is arrival time and the bar height is how much power arrived at that time. $\mu_{T_m}$ is the average delay, computed as the average of all arrival times weighted by their power. $\sigma_{T_m}$ is the standard deviation of those arrival times around the average. It quantifies the spread. For outdoor urban environments, $\sigma_{T_m}$ is typically around 1 to 5 microseconds. For suburban highway environments, it is in the range of 0.3 to 1 microsecond.
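
The integral above becomes a power-weighted standard deviation once the power delay profile is sampled into discrete taps. A sketch with a hypothetical three-tap suburban profile of my own invention:

```python
import math

def rms_delay_spread(delays_s, powers):
    """Discrete rms delay spread: power-weighted std deviation of arrival times."""
    total = sum(powers)
    mean = sum(t * p for t, p in zip(delays_s, powers)) / total
    var = sum((t - mean) ** 2 * p for t, p in zip(delays_s, powers)) / total
    return math.sqrt(var)

# Hypothetical profile: a strong direct path, a mid reflection, a weak late one.
taps_t = [0.0, 0.4e-6, 1.0e-6]             # arrival delays in seconds
taps_p = [1.0, 0.4, 0.1]                   # relative linear power per tap
sigma = rms_delay_spread(taps_t, taps_p)   # ≈ 0.28 microseconds
```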

Now here is the problem delay spread creates. If you transmit symbols faster than $1/\sigma_{T_m}$, consecutive symbols start overlapping at the receiver. A copy of symbol $k$ arriving on a long path overlaps in time with symbol $k+1$ arriving on a short path. The receiver cannot tell which copy belongs to which symbol. This is called intersymbol interference, ISI. It corrupts data at the receiver before any decryption happens.

For a single-carrier system with $\sigma_{T_m} = 5$ microseconds, the maximum symbol rate before ISI becomes destructive is about $1/5\mu\text{s} = 200,000$ symbols per second, or 200 kbaud. Even at 8 bits per symbol (256-QAM), that is only 1.6 Mbps total. That is far too slow for 5G, which needs hundreds of megabits per second. There needs to be a fundamentally different approach to transmitting symbols through a channel with multipath delay spread. That approach is OFDM.

OFDM: How 5G Turns a Broken Channel Into Many Clean Ones

OFDM stands for Orthogonal Frequency Division Multiplexing. The name sounds complicated but the idea is precise. Instead of transmitting data on a single carrier frequency at a high symbol rate where ISI destroys everything, OFDM splits the available bandwidth into a large number of narrow sub-channels called subcarriers, and transmits one slow symbol on each subcarrier simultaneously. The ISI problem disappears because each subcarrier’s symbol period is long enough that multipath copies of previous symbols arrive before the next symbol begins.

To understand why this works, think about the math. If the total bandwidth available is $B$ Hz and you want to avoid ISI, you need your symbol period to be much longer than the delay spread. With a single carrier using all $B$ Hz of bandwidth, the symbol period is $1/B$, which might be shorter than the delay spread. But if you split $B$ into $N$ subcarriers each $\Delta f = B/N$ Hz wide, each subcarrier’s symbol period is $T_s = 1/\Delta f = N/B$. You can choose $N$ large enough that $T_s \gg \sigma_{T_m}$, and ISI becomes negligible on every subcarrier.

The mathematical form of one OFDM symbol in the time domain is:

\[s(kT) = \sum_{m=0}^{M-1} d_m\, e^{j2\pi m \Delta f\, kT}\]

Here $d_m$ is the complex data symbol on subcarrier $m$. It is a point from a QAM constellation, which we will explain shortly. $T$ is the sampling interval, $k$ is the sample index running from 0 to $N-1$, and $e^{j2\pi m \Delta f\, kT}$ is a complex exponential oscillating at frequency $m \Delta f$. The sum adds together $M$ sinusoids, one per subcarrier, each carrying a different data symbol.

If you look at the formula carefully, you will notice it matches the definition of the Inverse Discrete Fourier Transform. The IDFT takes a set of frequency-domain values and produces a time-domain signal. Here the subcarrier data symbols $d_m$ are the frequency-domain values, and $s(kT)$ is the time-domain OFDM symbol. This means the transmitter can generate the OFDM signal entirely in software using an IFFT algorithm, with no analog oscillators needed. The receiver reverses this with an FFT to extract the subcarrier data symbols.
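
You can watch the IDFT/DFT pair do its job in a few lines. This is a bare sketch of the math, not a 5G transmitter (no cyclic prefix, no channel):

```python
import cmath

def ofdm_modulate(d):
    """IDFT: subcarrier data symbols (frequency domain) -> time-domain samples."""
    N = len(d)
    return [sum(d[m] * cmath.exp(2j * cmath.pi * m * k / N) for m in range(N)) / N
            for k in range(N)]

def ofdm_demodulate(s):
    """DFT: time-domain samples -> recovered subcarrier data symbols."""
    N = len(s)
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * m * k / N) for k in range(N))
            for m in range(N)]

# Eight subcarriers each carrying one QPSK constellation point.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j] * 2
recovered = ofdm_demodulate(ofdm_modulate(qpsk))   # matches qpsk to rounding error
```

A production stack uses an FFT for speed, but the algebra is identical.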

The orthogonality condition is the key. Two subcarriers are orthogonal if their product integrates to zero over the symbol period. This happens when their frequency separation is exactly $\Delta f = 1/T_s$. This means even though the subcarrier spectra overlap in frequency, you can mathematically separate any one subcarrier from all others by correlating the received signal with that subcarrier’s frequency over one symbol period. The result is only that subcarrier’s data symbol, with zero contribution from all others.

The Cyclic Prefix: Turning Multipath Into Nothing

OFDM with long symbol periods reduces ISI but does not eliminate it. A multipath copy of the previous OFDM symbol can still extend into the current symbol’s DFT window and corrupt the orthogonality. The cyclic prefix, CP, eliminates this problem entirely.

Before transmitting each OFDM symbol, the transmitter takes the last $T_{cp}$ seconds of the symbol and copies them to the beginning. The result is a guard interval prepended to the symbol. $T_{cp}$ is chosen to be longer than the maximum expected delay spread. When multipath copies of the previous symbol arrive during the CP interval, they do not overlap with the meaningful part of the current symbol’s DFT window.

There is a subtlety worth understanding. Prepending the end of the symbol to the beginning makes the channel’s convolution with the OFDM signal circular rather than linear. In signal processing, a circular convolution in time corresponds to simple multiplication in frequency. This means each subcarrier at the receiver sees its transmitted data symbol multiplied by a single complex number (the channel’s frequency response at that subcarrier’s frequency), rather than a complicated mixture with neighboring symbols. The receiver just divides by that complex number, a one-multiplier operation per subcarrier, to recover the data.

The design rule for CP length: it should be 2 to 4 times the rms delay spread of the channel. The OFDM symbol period should be 5 to 6 times the CP length. The reason for the ratio: the CP carries no data, it is overhead. The fraction of capacity wasted on CP is $T_{cp} / (T_s + T_{cp})$. If the symbol period is only twice the CP, you waste 33% of capacity. At 5 to 6 times the CP, overhead drops to 14 to 17%. That is the practical balance between ISI suppression and spectral efficiency.
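
The overhead arithmetic in one line (ratios straight from the design rule above):

```python
def cp_overhead(t_cp, t_sym):
    """Fraction of airtime the cyclic prefix consumes."""
    return t_cp / (t_sym + t_cp)

short = cp_overhead(1.0, 2.0)   # symbol only 2x the CP: a third of capacity wasted
good = cp_overhead(1.0, 6.0)    # symbol 6x the CP: about 14% overhead
```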

QAM: Packing Multiple Bits Into One Symbol

Each OFDM subcarrier carries one complex data symbol per symbol period. The question is how many bits that symbol can represent. The answer depends on the modulation order, and 5G uses adaptive modulation to choose the best order for each subcarrier based on channel quality.

A complex number has two components: real and imaginary. In signal processing terms, these correspond to the in-phase (I) and quadrature (Q) components of the transmitted waveform. Quadrature Amplitude Modulation, QAM, encodes data by choosing specific combinations of I and Q values, called constellation points.

In QPSK (Quadrature Phase Shift Keying), there are 4 constellation points arranged at equal angles. Each point encodes 2 bits. In 16-QAM, there are 16 points arranged in a 4x4 grid, encoding 4 bits per symbol. In 64-QAM, 64 points in an 8x8 grid, 6 bits per symbol. In 256-QAM, 256 points, 8 bits per symbol.

The tradeoff: more constellation points means more bits per symbol, but the points are closer together in the I-Q plane. The receiver has to correctly identify which point was sent. If the channel adds noise (and it always does), the received point may be displaced from the transmitted point. If the displacement is large enough, the receiver picks the wrong point and a bit error occurs. Higher-order QAM is more spectrally efficient but more sensitive to noise.
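
The tradeoff can be quantified. For a square M-QAM grid scaled to the same average transmit power, the nearest-neighbor spacing (the noise margin per decision) shrinks as M grows. The sketch below uses the standard result that a square grid with levels ±1, ±3, ... has average symbol energy 2(M−1)/3:

```python
import math

def min_distance_unit_power(M):
    """Nearest-neighbor spacing of square M-QAM, normalized to unit average power."""
    spacing = 2.0                          # grid spacing before normalization
    avg_energy = 2.0 * (M - 1) / 3.0       # average symbol energy of the raw grid
    return spacing / math.sqrt(avg_energy)

spacings = {M: min_distance_unit_power(M) for M in (4, 16, 64, 256)}
# QPSK ≈ 1.41, 16-QAM ≈ 0.63, 64-QAM ≈ 0.31, 256-QAM ≈ 0.15:
# every extra 2 bits per symbol roughly halves the margin against noise.
```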

5G’s adaptive modulation engine measures the received signal-to-noise ratio (SNR) on each subcarrier every few milliseconds and picks the highest QAM order that keeps the bit error rate acceptable. A subcarrier in a good location with strong signal uses 256-QAM and carries 8 bits per symbol period. A subcarrier experiencing a fade uses QPSK and carries only 2 bits, but delivers those bits reliably. This per-subcarrier adaptation is one of the main reasons OFDM is efficient in frequency-selective fading channels.

Fading Statistics: Rayleigh, Rician, and Nakagami

When you look at the received signal power over time at a moving vehicle, it fluctuates. Sometimes it is strong, sometimes it drops by a factor of 100 or 1000 in a fraction of a millisecond. The statistical distribution of these fluctuations depends on the geometry of the environment.

In a dense urban environment with no direct line-of-sight path between transmitter and receiver, the received signal is the sum of many scattered copies arriving from all directions. Each copy has a random amplitude and phase. When you add together many random complex numbers, the Central Limit Theorem says the real and imaginary parts of their sum both approach Gaussian distributions with zero mean. The envelope of a zero-mean complex Gaussian signal follows the Rayleigh distribution:

\[p(r) = \frac{r}{\sigma^2}\exp\!\left(-\frac{r^2}{2\sigma^2}\right), \quad r \geq 0\]

Here $r$ is the signal envelope (the magnitude of the received complex signal), and $\sigma^2$ is the average power in one quadrature component. The Rayleigh distribution has no lower limit. Its fades can reach 20 to 30 dB below the average power, meaning the signal power drops by a factor of 100 to 1000 compared to the average. During a deep Rayleigh fade, the receiver cannot reliably decode any data.

On a highway with a clear line-of-sight path between vehicles, the picture changes. There is one dominant component traveling directly from transmitter to receiver, plus many scattered components. The dominant component has a fixed amplitude $A$ and the scattered components are random as before. The envelope follows a Rician distribution:

\[p(r) = \frac{r}{\sigma^2}\exp\!\left(-\frac{r^2 + A^2}{2\sigma^2}\right) I_0\!\left(\frac{rA}{\sigma^2}\right), \quad r \geq 0\]

$I_0$ is the modified Bessel function of the first kind of order zero. Do not worry about computing it; the important parameter is the K-factor, $K = A^2 / (2\sigma^2)$, which is the ratio of the power in the dominant component to the total power in the scattered components. A K-factor of 0 dB means the dominant and scattered components have equal power. A K-factor of 10 dB means the dominant path carries ten times more power than all scattered paths combined. Highway V2X links typically have K-factors between 3 and 10 dB. Higher K means shallower fades because the dominant path gives a stable floor of received power.

The Nakagami-m distribution is a more general model:

\[p(r) = \frac{2m^m r^{2m-1}}{\Gamma(m)\Omega^m} \exp\!\left(-\frac{m r^2}{\Omega}\right), \quad r \geq 0\]

$\Omega = E[r^2]$ is the average received power (the expected value of $r^2$). $m$ is a shape parameter you fit to measured data. When $m = 1$, this reduces exactly to the Rayleigh distribution. As $m$ increases, the fading gets shallower. As $m \to \infty$, the channel becomes constant with no fading at all. The Gamma function $\Gamma(m)$ in the denominator is a generalization of factorial to non-integer values; for a positive integer $n$, $\Gamma(n) = (n-1)!$. A Rician channel with K-factor $K$ corresponds approximately to Nakagami with $m = (K+1)^2 / (2K+1)$. The Nakagami model is convenient for analysis because many integrals involving it have closed-form solutions, which matters when you need to compute theoretical bit error rates.
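
The K-to-m mapping is worth a two-line sanity check (conversion formula from the text; numbers mine):

```python
def nakagami_m_from_rician_k(k_linear):
    """Approximate Nakagami shape parameter matching a Rician K-factor (linear, not dB)."""
    return (k_linear + 1) ** 2 / (2 * k_linear + 1)

m_rayleigh = nakagami_m_from_rician_k(0.0)    # K = 0: no dominant path, m = 1 (Rayleigh)
m_highway = nakagami_m_from_rician_k(10.0)    # K = 10 dB in linear terms: m ≈ 5.8
```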

How Long Does a Fade Last

Knowing the distribution of signal power tells you how deep fades get. Knowing the level crossing rate and average fade duration tells you how often they happen and how long they last. These are practical numbers for the attacker.

The level crossing rate $L_R$ is the expected number of times per second the received signal envelope crosses a threshold $R$ in the downward direction:

\[L_R = \sqrt{2\pi} \cdot f_{D,\text{max}} \cdot \rho \cdot e^{-\rho^2}\]

$f_{D,\text{max}}$ is the maximum Doppler frequency. $\rho = R / R_\text{rms}$ is the threshold normalized by the rms signal level. At $\rho = 0.1$ (10% of rms, a fairly deep fade), and $f_{D,\text{max}} = 384$ Hz (vehicle at 120 km/h), $L_R \approx \sqrt{2\pi} \cdot 384 \cdot 0.1 \cdot e^{-0.01} \approx 95$ crossings per second.

The average fade duration at threshold $R$ is:

\[\bar{t} = \frac{e^{\rho^2} - 1}{\sqrt{2\pi} \cdot f_{D,\text{max}} \cdot \rho}\]

At the same numbers, $\bar{t} \approx (e^{0.01} - 1) / (\sqrt{2\pi} \cdot 384 \cdot 0.1) \approx 0.01 / 96 \approx 0.1$ milliseconds. Fades at this depth happen about 95 times per second and last about 0.1 milliseconds each. A 5G packet at mid-band takes roughly 0.5 milliseconds to transmit. An attacker experiences brief channel outages frequently but they are short. Most packets are receivable.
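
Both formulas in code, with the same 120 km/h numbers (my own quick check of the figures above):

```python
import math

def level_crossing_rate(fd_max_hz, rho):
    """Expected downward crossings per second of the threshold rho * rms level."""
    return math.sqrt(2 * math.pi) * fd_max_hz * rho * math.exp(-rho ** 2)

def avg_fade_duration_s(fd_max_hz, rho):
    """Average time the envelope spends below the threshold, per fade."""
    return (math.exp(rho ** 2) - 1) / (math.sqrt(2 * math.pi) * fd_max_hz * rho)

lcr = level_crossing_rate(384.0, 0.1)     # ≈ 95 crossings per second
afd = avg_fade_duration_s(384.0, 0.1)     # ≈ 0.1 ms per fade
```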

Duplexing and the DIRECTION Bit

When vehicle A transmits to the base station, that is an uplink transmission. When the base station transmits to vehicle B, that is downlink. Both need to happen, but using the same frequency at the same time for both directions would cause the transmitter and receiver to interfere with each other. 5G uses Time Division Duplex, TDD, in the mid-band spectrum. Uplink and downlink take turns using the same carrier frequency, switching between them according to a slot configuration defined in the 5G NR standard.

A slot in 5G NR contains 14 OFDM symbols. Slots are grouped into frames of 10 milliseconds each, with 10 subframes per frame. At 30 kHz subcarrier spacing (the most common for mid-band), one slot is 0.5 milliseconds. The TDD pattern specifies which slots carry downlink, which carry uplink, and which carry a guard period while hardware switches direction.

This matters for the attack because the NEA keystream generation function takes a DIRECTION bit as one of its inputs: 0 for uplink, 1 for downlink. If the attacker retransmits a packet that was originally uplink as if it were downlink, or with the wrong DIRECTION value, the receiving side generates a different keystream. After XOR decryption with the wrong keystream, the output is garbage and the receiver discards the packet immediately. The attacker must preserve the DIRECTION bit correctly, which means they must observe the TDD slot timing to know whether the packet they are intercepting is an uplink or downlink transmission.
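
To see why the DIRECTION bit is unforgiving, here is a toy stream cipher. I am deliberately not implementing a real NEA algorithm; SHA-256 stands in for the keystream generator, but it consumes the same inputs NEA does (key, COUNT, BEARER, DIRECTION), which is the property that matters here:

```python
import hashlib

def toy_keystream(key, count, bearer, direction, length):
    """Toy keystream generator (NOT real NEA): hash the same inputs NEA consumes."""
    out = b""
    block = 0
    while len(out) < length:
        msg = (key + count.to_bytes(4, "big") + bytes([bearer, direction])
               + block.to_bytes(4, "big"))
        out += hashlib.sha256(msg).digest()
        block += 1
    return out[:length]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key = b"\x11" * 16
plaintext = b"accel=+2.30 m/s2"
ct = xor_bytes(plaintext, toy_keystream(key, 7, 1, 0, len(plaintext)))

right = xor_bytes(ct, toy_keystream(key, 7, 1, 0, len(ct)))   # recovers the plaintext
wrong = xor_bytes(ct, toy_keystream(key, 7, 1, 1, len(ct)))   # garbage: DIRECTION flipped
```

One flipped input bit changes the entire keystream, so the attacker must reproduce DIRECTION exactly.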

The 5G Protocol Stack and Where Encryption Lives

Now that you understand how bits travel through the air, you can understand the layered structure that processes those bits before and after transmission. Every network communication system uses layers. Each layer handles one specific concern and hands the result to the layer above or below.

At the top is the Application layer. This is where the vehicle application serializes its acceleration, velocity, and position readings into bytes and passes them down.

The Transport layer adds the UDP header. UDP (User Datagram Protocol) is the protocol vehicle telemetry typically uses because it prioritizes low latency over guaranteed delivery. A late correction is often worse than no correction in a control loop. UDP adds an 8-byte header containing the source port number, destination port number, the total length of the UDP segment, and a checksum. The checksum is the error detection mechanism we will break. We will spend a lot of time on the checksum.
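
Since the checksum is the mechanism we will eventually defeat, it is worth seeing how it is computed. The sketch is mine, but the algorithm is the standard ones'-complement sum over 16-bit words from RFC 1071:

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit big-endian words (RFC 1071), as UDP uses."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)     # fold in the end-around carry
    return (~total) & 0xFFFF
```

The real UDP computation also covers a pseudo-header of IP addresses, but the arithmetic is exactly this.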

The Network layer adds the IP header, 20 bytes containing source and destination IP addresses, a time-to-live counter that prevents packets from circulating forever, and the protocol identifier telling the next layer what kind of payload it carries.

SDAP (Service Data Adaptation Protocol) is the first 5G-specific layer. Its job is to mark each packet with a QoS flow identifier that tells the network how urgently this packet needs to be delivered. Vehicle telemetry gets a latency-sensitive QoS class. A background file download gets a different one. The scheduler in the MAC layer uses these identifiers when deciding which user’s data to transmit in which time slot.

PDCP (Packet Data Convergence Protocol) is the security layer. This is where encryption and integrity protection happen. PDCP takes the full IP packet (headers and payload together), runs it through the NEA encryption algorithm, and produces a ciphertext of the same length. An eavesdropper who intercepts the packet at the RLC or MAC layer below PDCP sees only the ciphertext and cannot reconstruct the plaintext without the session key. The PDCP layer also adds a sequence number header, 2 bytes, from which the COUNT parameter used in keystream generation is derived.

RLC (Radio Link Control) handles segmentation. Large PDCP PDUs get split into smaller units that fit within the radio scheduler’s allocations. For latency-critical traffic like vehicle telemetry, RLC operates in unacknowledged mode: dropped packets are not retransmitted. Retransmission would introduce variable delay that the CACC control loop cannot tolerate.

MAC (Medium Access Control) is the scheduler. It decides which resource blocks in the OFDM time-frequency grid each vehicle occupies at each moment. The MAC layer reads the QoS identifiers from SDAP and allocates resources accordingly.

PHY (Physical layer) takes the bits from MAC and maps them through everything described in the previous sections: QAM modulation, OFDM symbol construction, IFFT, cyclic prefix insertion, digital-to-analog conversion, and upconversion to the carrier frequency.

The attack surface is at PDCP. The attacker intercepts the signal at the PHY layer, decodes upward through MAC and RLC, extracts the PDCP ciphertext, modifies two bytes, re-encodes downward through MAC and PHY, and retransmits. Vehicle B’s PHY receives and decodes, passes the ciphertext up through RLC to PDCP, which decrypts with the correct keystream and produces a modified plaintext that passes all checks.

How a Number Becomes Bits: IEEE 754 Floating Point

The vehicle’s acceleration sensor produces a number like 2.3 m/s². To transmit this number, the application must convert it to a fixed-length sequence of bits that can be packed into a byte array. The standard encoding for real numbers in almost every programming language and processor is IEEE 754 single-precision floating point, which uses exactly 32 bits.

Start from the beginning. A bit is the most basic unit of information in computing. It can be 0 or 1. Eight bits form a byte. A byte can represent 256 different values (0 through 255). Four bytes form 32 bits. With 32 bits you can represent $2^{32} = 4,294,967,296$ different values.

The question is: which 4,294,967,296 values should 32 bits represent? For integers from 0 to 4,294,967,295, the answer is straightforward: you just write the number in binary (base-2 counting). But for real numbers, you need fractions and negative numbers and a wide range of magnitudes. You cannot just write 2.3 in binary directly.

IEEE 754 borrows the idea of scientific notation. In decimal, you write large or small numbers as a coefficient times a power of ten: $2.3 \times 10^0$, or $0.00023 = 2.3 \times 10^{-4}$, or $230,000 = 2.3 \times 10^5$. The coefficient captures the significant digits and the exponent captures the scale. You can represent a huge range of magnitudes with only a few digits by shifting the decimal point.

IEEE 754 does this in base 2. Any non-zero number can be written as $1.\text{something} \times 2^n$, where the coefficient always starts with 1 before the binary point. The 32-bit single-precision format stores three fields:

Bit 31      Bits 30-23      Bits 22-0
[ sign ]    [ exponent ]    [ mantissa ]
  1 bit        8 bits          23 bits

The sign bit is 0 for positive numbers and 1 for negative numbers.

The 8-bit exponent field stores the power of 2, but with a trick. The actual exponent can be negative (for numbers smaller than 1) or positive (for numbers larger than 1). An 8-bit field normally stores values 0 to 255. To handle negative exponents, the stored value is the actual exponent plus 127. This offset is called the bias. So if the actual exponent is $+1$, you store $1 + 127 = 128$. If the actual exponent is $-3$, you store $-3 + 127 = 124$. To recover the actual exponent from the stored byte, subtract 127.

The 23-bit mantissa stores the fractional digits after the leading 1. Since the leading digit is always 1, you do not need to store it. The mantissa stores only the part after the binary point. This gives you effectively 24 bits of precision from 23 stored bits. This is called the hidden bit.
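The three fields can be pulled apart with shifts and masks. Here is a quick sketch (plain Python, matching the bit layout above) that extracts the fields and reconstructs the value from the formula:

```python
import struct

def fields(f):
    # Reinterpret the float's 32 bits as an unsigned integer (big-endian).
    bits = struct.unpack('>I', struct.pack('>f', f))[0]
    sign     = bits >> 31              # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30-23 (biased by 127)
    mantissa = bits & 0x7FFFFF         # bits 22-0 (fraction after the hidden 1)
    return sign, exponent, mantissa

def reconstruct(sign, exponent, mantissa):
    # value = (-1)^s * 1.mantissa * 2^(exponent - 127), for normal numbers
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)

s, e, m = fields(2.3)
print(s, e, m)               # 0 128 1258291  (2.3 = 1.15 x 2^1, stored exponent 1+127)
print(reconstruct(s, e, m))  # 2.2999999523162842 (2.3 after float32 rounding)
```

Note that 2.3 is not exactly representable in 23 mantissa bits, which is why the round trip shows the float32-rounded value rather than 2.3 itself.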

The full value encoded by a bit pattern is:

\[\text{value} = (-1)^{s} \times 1.\text{mantissa}_2 \times 2^{(\text{stored exponent} - 127)}\]

Let me walk through $2.0$. In binary scientific notation, $2 = 1.0 \times 2^1$. The sign is positive so $s = 0$. The actual exponent is 1, so the stored exponent is $1 + 127 = 128$. In 8-bit binary, 128 is $10000000$. The mantissa is the fractional part after the leading 1, which is $.0$, so all 23 mantissa bits are zero.

0  10000000  00000000000000000000000
   ^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^
   exponent       mantissa
^
sign

Verification: $(-1)^0 \times 1.0 \times 2^{128-127} = 1.0 \times 2^1 = 2.0$. Correct.

Now look at bit 23 counting from position 0 on the right. The exponent field occupies bits 30 down to 23. Bit 23 is the least significant bit of the exponent. Flipping bit 23 changes the stored exponent from $128 = 10000000_2$ to $129 = 10000001_2$. New value: $1.0 \times 2^{129-127} = 1.0 \times 2^2 = 4.0$. One bit flip. The number doubles.

This is not a coincidence specific to 2.0. Adding 1 to the exponent always doubles the value. Subtracting 1 always halves it. The least significant exponent bit is a factor-of-2 toggle for any nonzero number.

import struct

# struct.pack('>f', f) converts the Python float f into 4 raw bytes,
# big-endian (most significant byte first, which matches how we read bits left to right).
# struct.unpack('>I', ...) reads those same 4 bytes back as an unsigned 32-bit integer.
# This lets us see and manipulate the raw bit pattern of a float.
def float_to_bits(f):
    return struct.unpack('>I', struct.pack('>f', f))[0]

def bits_to_float(b):
    return struct.unpack('>f', struct.pack('>I', b))[0]

original = float_to_bits(2.0)

# The expression (1 << 23) shifts the number 1 leftward by 23 bit positions.
# In binary: 00000000 10000000 00000000 00000000
# This creates a mask with exactly bit 23 set and all other bits zero.
# XOR (the ^ operator) flips a bit when the mask has a 1, and leaves it unchanged when the mask has a 0.
# So original ^ (1 << 23) flips only bit 23, leaving all 31 other bits exactly as they were.
flipped = original ^ (1 << 23)

print(f"2.0 in bits: {original:032b}")
print(f"4.0 in bits: {flipped:032b}")
print(f"decoded back: {bits_to_float(flipped)}")
2.0 in bits: 01000000000000000000000000000000
4.0 in bits: 01000000100000000000000000000000
decoded back: 4.0

The only bit that changed is bit 23. Everything else is identical.

Now connect this to the attack. The vehicle telemetry message contains three floats: position at byte offset 0, velocity at byte offset 4, acceleration at byte offset 8. After IP and UDP headers, the acceleration field starts at byte offset 36 from the beginning of the IP packet. Bit 23 of those 4 bytes is bit 7, the most significant bit, of the second byte of the float (byte offset 37), because the float is serialized most significant byte first. That bit is at a fixed, known, public byte and bit position in every telemetry packet. Flipping the corresponding bit in the ciphertext will, after decryption, double whatever acceleration was being transmitted.

image

XOR: The Operation That Makes Encryption Malleable

XOR is a bitwise operation that compares two bits and outputs 1 if they differ and 0 if they match. The truth table:

0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0

When applied to bytes or 32-bit integers, XOR operates independently on each bit position. For example:

  10110011  (179 in decimal)
  01100101  (101 in decimal)
XOR
  11010110  (214 in decimal)

Each output bit is determined only by the two input bits at that same position. Bit 0 of the output depends only on bit 0 of the two inputs. Bit 7 of the output depends only on bit 7 of the two inputs.

The property that makes XOR the foundation of symmetric encryption is self-inverse: $A \oplus B \oplus B = A$ for any values $A$ and $B$. You can verify this from the truth table: XOR-ing with 1 flips a bit, and XOR-ing with 1 again flips it back. XOR-ing with 0 leaves a bit unchanged. So XOR-ing with any value $B$ and then XOR-ing with the same $B$ again always returns you to $A$.

A stream cipher generates a long sequence of pseudorandom bits called the keystream $K$. The keystream looks completely random to anyone who does not know the key, but both the sender and receiver can regenerate the exact same sequence given the same key and the same initialization parameters. Encryption is $C = P \oplus K$: XOR the plaintext with the keystream to get ciphertext. Decryption is $C \oplus K = (P \oplus K) \oplus K = P \oplus (K \oplus K) = P \oplus 0 = P$: XOR the ciphertext with the same keystream and the keystream cancels out, leaving the plaintext.

Security comes from the keystream being indistinguishable from random without the key. An attacker who sees ciphertext $C = P \oplus K$ but does not know $K$ cannot compute $P$, because the ciphertext is the XOR of the plaintext and a pseudorandom sequence, and removing a pseudorandom sequence requires knowing it.

Now define a flip mask $\Delta$: a bit string with 1s at positions you want to flip and 0s everywhere else. The attacker takes the intercepted ciphertext $C$ and computes $C’ = C \oplus \Delta$. The receiver decrypts:

\[C' \oplus K = (C \oplus \Delta) \oplus K = (P \oplus K \oplus \Delta) \oplus K = P \oplus (K \oplus K) \oplus \Delta = P \oplus \Delta\]

The keystream $K$ cancels out entirely. The receiver decrypts $C’$ and gets $P \oplus \Delta$: the original plaintext with exactly the bits specified by $\Delta$ flipped. The attacker never touched the key. The attacker never decrypted anything. The modification to the plaintext is exact and predictable.
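The whole malleability argument fits in a few lines. This toy sketch uses `os.urandom` as a stand-in for the NEA keystream and flips bit 23 of an encrypted float, exactly the exponent-LSB attack described later:

```python
import os, struct

plaintext = struct.pack('>f', 2.0)         # the acceleration field, big-endian
keystream = os.urandom(4)                  # stand-in for the NEA keystream
ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))

# Attacker: flip bit 23 (the exponent LSB) without knowing key or plaintext.
delta = (1 << 23).to_bytes(4, 'big')       # 00 80 00 00
modified = bytes(c ^ d for c, d in zip(ciphertext, delta))

# Receiver decrypts with the correct keystream; the delta passes straight through.
decrypted = bytes(m ^ k for m, k in zip(modified, keystream))
print(struct.unpack('>f', decrypted)[0])   # 4.0
```

Run it with any keystream: the output is always the original value with bit 23 flipped, which for 2.0 means 4.0.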

This is called malleability. A cipher is malleable if you can modify ciphertexts in a way that produces predictable modifications to the corresponding plaintexts without knowing the key. XOR-based encryption is malleable by construction, because the XOR operation passes through any additive modification directly.

This property holds for AES in counter mode (AES-CTR) for the same reason. AES-CTR uses AES to generate the keystream and then XORs it with the plaintext. The AES core is strong and the keystream is computationally indistinguishable from random, but the XOR step makes the ciphertext malleable regardless of how strong AES is. Malleability is a structural property of the XOR combination step, not a weakness of AES itself.

The formal defense against malleability is a Message Authentication Code (MAC): a short tag appended to each packet, computed from the key and the entire packet content, that the receiver verifies. Any modification to the ciphertext changes the tag. We will cover this when we get to NIA.

How 5G Generates the Keystream: NEA, SNOW 3G, AES-CTR, ZUC

The keystream for a given packet in 5G is generated by a function called NEA (NR Encryption Algorithm). NEA takes five inputs:

NEA(KEY, COUNT, BEARER, DIRECTION, LENGTH) -> KEYSTREAM

KEY is the 128-bit session encryption key $K_{UPenc}$, derived from the SIM card secret during authentication. COUNT is a 32-bit packet counter, incremented for each new packet. BEARER is a 5-bit number identifying which radio bearer (logical channel) this packet belongs to. DIRECTION is the 1-bit uplink/downlink flag. LENGTH is how many bits of keystream are needed to cover this packet.

The reason COUNT increments with every packet: if the same keystream were reused for two different packets, an attacker who intercepts both ciphertexts can XOR them together. The two keystreams cancel: $(P_1 \oplus K) \oplus (P_2 \oplus K) = P_1 \oplus P_2$. The result is the XOR of the two plaintexts. From the XOR of two plaintexts, an attacker who knows something about one plaintext (such as its format) can often recover both. Using a fresh keystream for every packet closes this attack. But freshness does not stop malleability: the flip mask $\Delta$ is a constant that works regardless of which keystream is used.
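The keystream-reuse problem is easy to demonstrate. In this sketch the same keystream (again `os.urandom` as a stand-in) encrypts two packets, and an eavesdropper recovers the XOR of the plaintexts from ciphertext alone:

```python
import os

keystream = os.urandom(16)     # the same keystream reused for two packets: the bug
p1 = b'HELLO VEHICLE B '
p2 = b'SPEED 25.0 M/S  '
c1 = bytes(a ^ b for a, b in zip(p1, keystream))
c2 = bytes(a ^ b for a, b in zip(p2, keystream))

# The eavesdropper XORs the two ciphertexts: both copies of the keystream
# cancel, leaving the XOR of the two plaintexts, with no key involved.
leaked = bytes(a ^ b for a, b in zip(c1, c2))
assert leaked == bytes(a ^ b for a, b in zip(p1, p2))
```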

5G supports three cipher algorithms for NEA.

128-NEA1 uses SNOW 3G. SNOW 3G is a stream cipher built from two components: a 16-stage linear feedback shift register (LFSR) where each stage stores a 32-bit word, and a finite state machine (FSM) that mixes values from the LFSR with nonlinear operations. The LFSR advances using a feedback polynomial over $GF(2^{32})$, which is a mathematical structure (Galois Field) where arithmetic wraps around in a specific way. The FSM uses two 32-bit registers and applies S-boxes borrowed from AES. The LFSR provides statistical uniformity and long period; the FSM breaks the linear structure so algebraic attacks on the LFSR cannot predict the keystream. The combination produces 32 bits of keystream per clock cycle.

128-NEA2 uses AES in counter mode. AES is a block cipher operating on 128-bit blocks. It takes a 128-bit key and a 128-bit input and produces a 128-bit pseudorandom output through 10 rounds of SubBytes (byte substitution using a fixed lookup table called the S-box), ShiftRows (permuting bytes within each row of the 4x4 state matrix), MixColumns (linear mixing within each column), and AddRoundKey (XOR with a round-specific key derived from the main key). In counter mode, instead of encrypting actual data through AES, you encrypt a sequence of counter values and use the outputs as the keystream. Counter values are formed by packing COUNT, BEARER, DIRECTION, and padding into a 128-bit block, incrementing for each 128-bit chunk of keystream needed. Because each counter block is independent, keystream generation can run in parallel across many processor cores.
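The counter-block packing can be sketched from the description above: COUNT in the top 32 bits, then BEARER, DIRECTION, zero padding, and a 64-bit block counter in the low half. This is a sketch of the layout, not OAI's implementation:

```python
def initial_counter_block(count, bearer, direction):
    # Upper 64 bits: COUNT (32) | BEARER (5) | DIRECTION (1) | 26 zero bits.
    # Lower 64 bits: per-chunk block counter, starting at 0.
    upper = (count << 32) | (bearer << 27) | (direction << 26)
    return upper << 64

def counter_blocks(count, bearer, direction, n_blocks):
    # Each 128-bit chunk of keystream is AES(key, block); blocks just increment,
    # which is why the chunks can be computed in parallel.
    base = initial_counter_block(count, bearer, direction)
    return [(base + i).to_bytes(16, 'big') for i in range(n_blocks)]

blocks = counter_blocks(count=0x12345678, bearer=5, direction=1, n_blocks=2)
print(blocks[0].hex())   # 123456782c0000000000000000000000
```

Feeding these blocks through AES-128 with $K_{UPenc}$ would yield the keystream; the AES step is omitted here to keep the sketch dependency-free.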

128-NEA3 uses ZUC, named after the fifth-century Chinese mathematician Zu Chongzhi who computed $\pi$ to seven decimal places using methods that were not matched in Europe for over a thousand years. ZUC uses a 16-stage LFSR over $GF(2^{31}-1)$, a bit reorganization step, and a nonlinear function $F$ structurally similar to SNOW 3G’s FSM. It was designed by the Chinese Academy of Sciences and included in 5G for regulatory reasons in jurisdictions requiring a domestic cipher.

All three produce keystreams that are computationally indistinguishable from random without the key. The malleability attack works against all three, because all three produce the keystream and then XOR it with the plaintext.

How the Session Key Gets Established Without Being Transmitted

You have seen $K_{UPenc}$ mentioned several times. Now you need to understand where it comes from, because understanding the key establishment is what confirms the attacker cannot compute the keystream. They know COUNT (visible in the PDCP header as the sequence number). They know BEARER and DIRECTION from the packet structure. The only thing blocking them is $K_{UPenc}$, which was never transmitted.

Your vehicle’s SIM card stores a 128-bit root key $K$, programmed by the mobile operator at the factory. The operator stores the same $K$ in their backend authentication server (called the UDM, Unified Data Management). $K$ never leaves the SIM card or the UDM.

When the vehicle connects to the network, the authentication procedure begins. The UDM generates a 128-bit random number called RAND. It also generates an Authentication Token (AUTN) that proves the challenge came from a legitimate network. Both are sent to the vehicle. The AUTN check prevents fake base stations from tricking the vehicle into authenticating to an attacker.

The vehicle’s SIM runs the Milenage algorithm, which is a set of five functions all built on AES-128, using $K$ and RAND as inputs. The outputs include a session cipher key $CK$, an integrity key $IK$, and a response $RES$ that proves to the network that the SIM holds the correct $K$. The UDM runs the same computation and generates the same $CK$, $IK$, and an expected response $XRES$. When the vehicle sends $RES$ back and it matches $XRES$, mutual authentication is complete. Neither side transmitted the actual keys.

From $CK$ and $IK$, a tree of key derivations produces increasingly specific keys using HMAC-SHA-256, which is a keyed hash function that takes a key and a message and produces a fixed-length output that cannot be reversed to find the input:

\[K \xrightarrow{\text{Milenage, RAND}} CK, IK \xrightarrow{\text{HMAC-SHA-256}} K_{AUSF} \to K_{SEAF} \to K_{AMF} \to K_{gNB} \to K_{UPenc}, K_{UPint}\]

$K_{UPenc}$ is used only for encryption. $K_{UPint}$ is used only for integrity protection. They are derived in parallel from $K_{gNB}$ and are independent. Compromising one does not expose the other.

The attacker cannot compute $K_{UPenc}$ from what they can observe. The COUNT in the PDCP header gives them the counter. The BEARER and DIRECTION are inferable from context. But $K_{UPenc}$ requires $K_{gNB}$, which requires $K_{AMF}$, which requires $K_{SEAF}$, which requires $K_{AUSF}$, which requires $CK$ and $IK$ from Milenage, which requires the root key $K$ from the SIM card. Each step in the chain is a one-way function. The chain is computationally unbreakable.
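The one-way chain can be sketched with HMAC-SHA-256, which is the keyed hash 5G actually uses for these derivations. The labels below are illustrative; the spec defines exact input strings for each step:

```python
import hmac, hashlib, os

def kdf(key, label):
    # Stand-in for the 3GPP key derivation function: HMAC-SHA-256 over a
    # spec-defined input string (the labels here are invented for illustration).
    return hmac.new(key, label, hashlib.sha256).digest()

CK_IK = os.urandom(32)                    # output of Milenage
K_AUSF  = kdf(CK_IK,  b'K_AUSF')
K_SEAF  = kdf(K_AUSF, b'K_SEAF')
K_AMF   = kdf(K_SEAF, b'K_AMF')
K_gNB   = kdf(K_AMF,  b'K_gNB')
K_UPenc = kdf(K_gNB,  b'K_UPenc')[:16]    # 128-bit encryption key
K_UPint = kdf(K_gNB,  b'K_UPint')[:16]    # 128-bit integrity key, derived in parallel

assert K_UPenc != K_UPint                 # independent siblings of K_gNB
```

Each arrow in the chain is one HMAC call. Reversing any step would mean inverting HMAC-SHA-256, which is computationally infeasible.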

This is why the malleability attack matters: it bypasses the key entirely. The attacker does not compute the keystream. They do not need to. They add a constant to the ciphertext and the XOR structure propagates that constant through the decryption.

What the Packet Looks Like Before Encryption

The vehicle application serializes three 32-bit floats: position, velocity, acceleration. Concatenated, that is 12 bytes of payload. UDP wraps this in an 8-byte header. IP wraps the UDP segment in a 20-byte header. PDCP receives the full 40-byte IP packet and encrypts it.

The full structure before encryption:

Bytes 0-19:  IP header (source IP, destination IP, TTL, protocol=UDP, total length, ...)
Bytes 20-21: UDP source port
Bytes 22-23: UDP destination port
Bytes 24-25: UDP length
Bytes 26-27: UDP checksum      <-- the attacker touches this
Bytes 28-31: position (float)
Bytes 32-35: velocity (float)
Bytes 36-39: acceleration (float)  <-- the attacker touches bit 23 of these 4 bytes

Bit 23 of the acceleration float: counting from bit 0 at the rightmost position, bit 23 is the least significant bit of the exponent field. Within the 4 bytes at offsets 36-39, bits 30-23 are the exponent. In network byte order, byte 37 (the second byte of the float) carries bits 23-16, so bit 23 occupies bit position 7, the most significant bit, of byte 37.
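A quick self-check of the layout, assuming the floats are packed most significant byte first as described above (a sketch; the header bytes are zeroed placeholders):

```python
import struct

payload = struct.pack('>fff', 300.0, 25.0, 2.0)   # position, velocity, acceleration
HEADERS = 28                                       # IP (20) + UDP (8) bytes precede the payload

accel_offset = HEADERS + 8       # byte 36 of the IP packet
bit23_byte   = accel_offset + 1  # byte 37: second byte of the float, whose MSB is bit 23

packet = bytearray(b'\x00' * HEADERS + payload)
packet[bit23_byte] ^= 0x80       # flip bit 23: the exponent LSB

_, _, accel = struct.unpack('>fff', bytes(packet[HEADERS:]))
print(accel)   # 4.0
```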

The CACC context: vehicle B receives this packet 100 times per second and feeds the acceleration value into a Kalman filter that tracks vehicle A’s state. A Kalman filter is an algorithm that maintains a probabilistic estimate of an object’s position and velocity and updates it with each new measurement. When acceleration reads 4.0 instead of 2.0, the filter updates differently than it should. Over many consecutive corrupted packets, the filter converges to a systematically wrong trajectory prediction. When vehicle A decelerates, vehicle B brakes harder than necessary; when vehicle A accelerates, vehicle B speeds up more aggressively than is safe, closing the gap under the wrong assumption that A is pulling away faster than it is.

The UDP Checksum: How It Works and How to Break It

The UDP checksum is the last line of defense between the attacker’s ciphertext modification and a corrupted value reaching the application. Understanding it precisely is necessary to understand the bypass.

RFC 768, published in 1980, defines the UDP checksum. It uses one’s complement 16-bit arithmetic. Before explaining what one’s complement means, let me explain why checksums exist at all.

Radio channels introduce bit errors. Even after coding and modulation, there is a nonzero probability that a bit flips between transmission and reception. Higher layers of the protocol stack need a way to detect when a received packet has been corrupted in transit. The checksum is a simple error detection code: a number computed from the packet contents that the receiver recomputes and compares. If the packet arrives corrupted, the receiver’s recomputed value should differ from the sender’s stored value, flagging the error.

Normal binary addition: if you add two 16-bit numbers and the sum exceeds 65535 (more than 16 bits can hold), the carry bit falls off and the result wraps around. $60000 + 10000 = 70000$, which in 16-bit arithmetic wraps to $70000 - 65536 = 4464$. The carry bit is lost.

One’s complement addition differs in one way: the carry bit is not lost. Instead, it wraps around and gets added back to the least significant bit. This is the end-around carry. Let me show an example:

  1111000011110000  (61680 decimal)
+ 0001000000001111  ( 4111 decimal)
  ────────────────
 10000000011111111  (sum has 17 bits: bit 16 is the carry)
  ────────────────
  0000000011111111  (lower 16 bits = 255)
+                1  (the carry bit gets added back)
  ────────────────
  0000000100000000  (256 decimal, one's complement result)

The checksum computation: take the pseudo-header (source IP, destination IP, a zero byte, the protocol number 17, and the UDP length), concatenate the UDP header with the checksum field set to zero, and the UDP data. Pad to an even number of bytes if needed. Split into 16-bit words. Sum all words using one’s complement addition. Take the bitwise complement of the result (flip every bit) and store it as the checksum.

Verification at the receiver: sum all 16-bit words of the pseudo-header, UDP header including the stored checksum, and UDP data using one’s complement addition. If the result is 0xFFFF (all 16 bits are 1), the packet is intact. Why 0xFFFF? Because the stored checksum is defined as the complement of the sum of everything else. Adding a number to its complement in one’s complement arithmetic always gives 0xFFFF.
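The compute-and-verify cycle fits in a few lines of Python. This sketch operates on a list of 16-bit words standing in for the pseudo-header, header, and data:

```python
def ones_complement_sum(words):
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

def udp_checksum(words):
    # Sum everything (with the checksum field treated as zero),
    # then store the bitwise complement of the result.
    return ~ones_complement_sum(words) & 0xFFFF

data = [0xF0F0, 0x100F, 0x1234]
csum = udp_checksum(data)

# Receiver: summing everything including the stored checksum must give 0xFFFF.
assert ones_complement_sum(data + [csum]) == 0xFFFF
```

The final assertion is exactly the receiver-side check: a value plus its one's complement is always 0xFFFF.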

The checksum computation includes both the IP pseudo-header and the UDP payload. Any single-bit flip anywhere in the covered region changes the sum and causes the receiver to see a result different from 0xFFFF. The packet gets dropped silently.

The attacker needs to flip the acceleration exponent LSB at bit 23 of the acceleration float and have the packet still pass checksum verification. Flipping just one bit in the payload causes a checksum failure. The attacker needs to compensate.

image

Why the Checksum Bypass Works (and Exactly When It Fails)

Here is the bypass. The checksum is computed over 16-bit words. Position, velocity, and acceleration each span 4 bytes, which is two 16-bit words. The checksum field itself is a 16-bit word. The key insight: flipping a bit at position $k$ within its 16-bit word changes that word’s contribution to the checksum sum by $+2^k$ or $-2^k$, depending on whether the bit was 0 or 1 before the flip. To maintain the sum at 0xFFFF, you need a compensating change somewhere else that contributes an equal and opposite amount.

The compensation: flip the bit at the same position $k$ within the checksum field’s 16-bit word. Now both words have a bit at position $k$ flipped. The change to the acceleration word’s contribution is $(-1)^{d} \cdot 2^k$ where $d$ is the original bit value (0 or 1). The change to the checksum word’s contribution is $(-1)^{c} \cdot 2^k$ where $c$ is the original checksum bit value.

For these two changes to cancel, you need $(-1)^d = -(-1)^c$, which means $(-1)^d + (-1)^c = 0$, which means $d \neq c$. The two bits must have different values in the plaintext. If the acceleration payload bit is 0 and the corresponding checksum bit is 1, or vice versa, the changes cancel. If they are both 0 or both 1, the changes add instead of canceling, and the checksum fails.

The attacker flips two ciphertext bits: the acceleration exponent LSB at its ciphertext position, and the aligned checksum bit at its ciphertext position. After decryption, the receiver sees both plaintext bits flipped. The checksum passes if the original plaintext values of those two bits differ (one 0, one 1). The checksum fails if they match (both 0 or both 1).

The attacker cannot see the plaintext. They do not know the original bit values. The checksum field encodes a value that depends on the IP addresses, port numbers, message length, and the payload. These combine in a way that makes the checksum bit approximately pairwise independent of any single payload bit. This means the probability that the checksum bit equals the payload bit is approximately 0.5. About half the attacks pass checksum, half fail. On failure, the packet is silently dropped at vehicle B. Vehicle B treats it as a normal packet loss, exactly like what happens routinely in a lossy wireless channel.
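Because the flip mask passes straight through the keystream, the parity coincidence can be simulated directly on plaintext words. This sketch flips bit 7 of one payload word and bit 7 of the checksum over many random packets and measures the pass rate:

```python
import random

def ones_complement_sum(words):
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

random.seed(7)
TRIALS = 10000
passes = 0
for _ in range(TRIALS):
    data = [random.getrandbits(16) for _ in range(6)]   # random plaintext words
    checksum = ~ones_complement_sum(data) & 0xFFFF      # valid checksum

    # Attacker flips bit 7 of a payload word and bit 7 of the checksum field.
    data[3] ^= 0x80
    checksum ^= 0x80

    if ones_complement_sum(data + [checksum]) == 0xFFFF:
        passes += 1

print(passes / TRIALS)   # close to 0.5
```

The two flips cancel exactly when the original bits differ, which for random words happens half the time, matching the measured 2-bit rates below.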

One critical implementation note: OAI (the open-source 5G platform used for the experiment) inverts all checksum bits after calculation. This flips the success-failure parity pattern compared to the theoretical prediction. Where theory predicts success when the two bits differ, OAI’s implementation succeeds when the two bits match. The mathematics are identical; the bit convention is inverted. If you run this on OAI and observe the opposite of what you expect, this is why.

Measured success rates across 500 packets per vehicle:

Flips    Vehicle 1    Vehicle 2    Vehicle 3
2-bit    0.514        0.482        0.482
4-bit    0.280        0.258        0.284
6-bit    0.110        0.090        0.108
8-bit    0.064        0.034        0.054

Each additional pair of bit flips requires its own independent parity coincidence. The success probabilities multiply approximately as $2^{-n/2}$ where $n$ is the total number of flipped bits. At 2 flips: $2^{-1} = 0.5$. At 4 flips: $2^{-2} = 0.25$. At 8 flips: $2^{-4} = 0.0625$. The measured data tracks this theoretical exponential decay closely.

image

The Payload-Only Variant

The second attack strategy flips two bits both within the data payload, targeting two floats rather than one float and the checksum. For this to pass checksum verification, the two payload bits must sit at the same bit position within their respective 16-bit words and have different original values. If they match (both 0 or both 1), both flips move the checksum sum in the same direction by the same amount, and the sum changes. If they differ, one flip adds $2^k$ and the other subtracts it, the contributions cancel, and the sum is unchanged.

Success rates depend on the data distribution. On a highway, vehicles accelerate more often than they brake (at least on the dataset used), so the sign bit of the acceleration float is usually 0, and the velocity sign bit is always 0. Two sign bits that are both 0 cannot satisfy the different-value condition, so the sign-bit attack succeeds less often than chance. Mantissa bits distribute more uniformly across 0 and 1 because the fractional parts of real measurements are less predictable, so mantissa attacks get closer to 50%.

Positions attacked                                   Success rate (Vehicle 4)
Checksum + Acceleration exponent LSB                 0.530
Acceleration sign + Velocity sign                    0.412
Acceleration exponent MSB + Velocity exponent MSB    0.276
Acceleration mantissa MSB + Velocity mantissa MSB    0.436

When the data distribution pushes the bits toward equal values, success rate falls below 50%. When the distribution makes different values more likely, success rate can exceed 50%. For different applications or data types with different distributions, the payload-only attack may be more or less favorable than these numbers suggest.

The Experiment

The experiment ran on OpenAirInterface (OAI), an open-source implementation of the complete 5G NR protocol stack that follows 3GPP specifications closely enough to interoperate with commercial hardware. Vehicle trajectory data came from the NGSIM dataset, recorded by the US Federal Highway Administration using roadside cameras on US-101 in Los Angeles and I-80 in Emeryville, California. The data contains position in feet, velocity in feet per second, and acceleration in feet per second squared at 0.1-second intervals for dozens of vehicles.

The attack code: two XOR operations inserted into OAI’s PDCP transmit function deliver_pdu_drb_ue in openair2/LAYER2/nr_pdcp/nr_pdcp_oai_api.c, executed immediately after nea_encrypt() produced the ciphertext:

// Flip the most significant bit of the checksum field's low byte (byte 27).
// 0x80 in binary is 10000000. XOR with this flips only the leftmost bit of that byte,
// which is bit 7 of the checksum's 16-bit word.
// ^= is compound assignment: reads the byte, XORs with 0x80, writes back.
ciphertext[CHECKSUM_BYTE_OFFSET] ^= 0x80;

// Flip the most significant bit of the second byte of the acceleration float (byte 37).
// This targets bit 23 of the 32-bit float: the least significant bit of the exponent.
// The acceleration float occupies bytes 36-39 of the IP packet in network byte order,
// so byte 37 carries bits 23-16 and its most significant bit is bit 23.
ciphertext[ACCEL_BYTE_OFFSET] ^= 0x80;

Ciphertext bytes before and after:

  • Checksum: 0xc354 changed to 0xc3d4
  • Acceleration: 0xe2c2ce32 changed to 0xe242ce32

Vehicle B decrypted, ran the checksum, found 0xFFFF, and delivered the packet to the application, which logged: Received Data: (300.0, 25.0, 4.0). Vehicle A had sent (300.0, 25.0, 2.0). The acceleration doubled. Clean pass.

image

For the payload-only variant, modifying position and velocity ciphertext bytes:

image

Why This Works on CACC Specifically

Packet loss in CACC is not a safety problem. The Kalman filter in vehicle B’s CACC controller treats each received packet as a noisy measurement. When a packet is missing, the filter simply predicts the state forward using the previous estimate and its motion model. The uncertainty in the prediction grows while measurements are absent, but the vehicle handles this gracefully by increasing its following distance slightly.

What the Kalman filter cannot distinguish is a measurement that arrives with a systematic bias. When 50% of packets report acceleration = 4.0 m/s² instead of 2.0 m/s², the filter’s measurement model tells it this is a valid reading. It updates its state estimate accordingly. Over 10 seconds at 100 Hz, the filter processes 1000 measurements, roughly 500 of which are biased. The estimated state of vehicle A converges toward a value that reflects the biased data. Vehicle B’s controller produces speed setpoints based on this estimate. If vehicle A is decelerating at 2 m/s² but vehicle B estimates 4 m/s², vehicle B brakes more aggressively than necessary. If vehicle A is accelerating at 2 m/s² but vehicle B estimates 4 m/s², vehicle B may increase its following speed more aggressively than safe.
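The bias effect can be reproduced with a toy scalar Kalman filter tracking a constant acceleration. The noise levels and corruption model here are invented for illustration (every corrupted packet is delivered, half of all packets doubled), not taken from the experiment:

```python
import random

random.seed(1)
TRUE_ACCEL = 2.0
estimate, variance = 0.0, 10.0     # initial state estimate and uncertainty
MEAS_VAR = 0.04                    # assumed sensor noise variance

for _ in range(1000):              # 10 seconds at 100 Hz
    measurement = TRUE_ACCEL + random.gauss(0, 0.2)
    if random.random() < 0.5:      # ~50% of packets carry the doubled value
        measurement *= 2
    # Standard scalar Kalman update for a constant-state model.
    gain = variance / (variance + MEAS_VAR)
    estimate += gain * (measurement - estimate)
    variance *= (1 - gain)

print(round(estimate, 2))   # biased toward ~3.0 instead of the true 2.0
```

The filter has no way to tell a doubled measurement from a legitimate one, so its estimate settles near the mean of the mixed stream, roughly halfway between the true and doubled values.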

With the shuffling defense described below in place, the small residual success rate (around 3%) corrupts random fields of random packets. The filter sees these as outliers scattered across position, velocity, and acceleration without pattern. Its outlier rejection or robustness mechanisms flag them as noise. The systematic bias disappears. The attack loses its power to drive the filter toward a wrong estimate.

Prior Work: What Made This Different

Research on bit-flipping attacks in network protocols goes back over a decade. Paterson and Yau showed in 2006 that IPSec-protected datagrams could be manipulated by exploiting the block cipher mode used at the time. In the LTE context, Rupprecht et al. introduced the ALTER attack in 2019, which flipped bits in LTE user-plane traffic. Their setup required the attacker to know the original plaintext to compute the correct mask for targeted modifications. Changing a DNS reply to point to a malicious IP address requires knowing the legitimate IP address so you can compute the XOR difference. Tan et al. analyzed bit-flipping on 5G data-plane packets but focused on fields with predictable content like IP header fields, where the plaintext is structurally known.

The IEEE 754 floating-point structure changes this. You do not need to know the acceleration value to double it. You need to know that bytes 36-39 carry a 32-bit float representing acceleration, that the most significant bit of byte 37 is the least significant exponent bit, and that incrementing the exponent doubles the value. None of this is secret. It is in the protocol specification. Format knowledge alone is sufficient for a targeted attack on numerical sensor data.

The Defense: Shuffling the Ciphertext With Its Own Keystream

The attack requires knowing which ciphertext byte position corresponds to which plaintext byte position. After decryption, ciphertext byte $i$ becomes plaintext byte $i$. If the attacker flips ciphertext byte 38, plaintext byte 38 flips. The byte mapping is identity. Format knowledge translates directly into targeted position knowledge.

The defense breaks this identity mapping by randomly rearranging the ciphertext bytes before transmission. The receiver knows the same rearrangement and reverses it before decryption. An attacker who flips a byte in the transmitted (shuffled) ciphertext does not know which original ciphertext byte position they modified, so they do not know which plaintext byte will be corrupted.

The problem: the rearrangement must be secret from the attacker but known to the receiver. This sounds like it requires a separate secret, but 5G already generates one for each packet: the NEA keystream itself. The keystream is derived from $K_{UPenc}$, which the attacker cannot compute. The receiver generates the same keystream because it holds the same key. You can use the keystream bytes as the randomness source for the permutation, at no additional cost.

The permutation algorithm is Fisher-Yates, specifically Durstenfeld's in-place variant from 1964. It produces a uniformly random permutation (every possible ordering of $n$ elements is equally likely) in $O(n)$ time with exactly $n-1$ swaps. Here it is:

Start with an array $T = [0, 1, 2, \ldots, n-1]$. For each position $i$ from $n-1$ down to 1: pick a random index $j$ uniformly from 0 to $i$ using keystream byte $i$ as the source of randomness ($j = \text{keystream}[i] \bmod (i+1)$). Swap $T[i]$ and $T[j]$.

After the loop, $T[i]$ is the destination position in the shuffled array for the original byte at position $i$.

def build_permutation_table(n, keystream_bytes):
    # Start with the identity: T[i] = i means "byte i goes to position i", no movement.
    T = list(range(n))
    for i in range(n - 1, 0, -1):
        # keystream_bytes[i] is one byte (0 to 255).
        # (i + 1) is the size of the range we're choosing from: 0 to i inclusive.
        # Modulo maps the keystream byte to a valid index in that range.
        # For small n like 40 bytes, the modulo bias is negligible.
        j = keystream_bytes[i] % (i + 1)
        # Swap the destination entries for positions i and j.
        T[i], T[j] = T[j], T[i]
    # After all swaps, T[i] holds the destination position for original byte i.
    return T

def invert_permutation(T):
    # Build T_inv such that T_inv[T[i]] = i.
    # If T[i] = d, it means "original byte i went to position d in the shuffled array."
    # T_inv[d] = i means "the byte at shuffled position d came from original position i."
    T_inv = [0] * len(T)
    for i, dest in enumerate(T):
        T_inv[dest] = i
    return T_inv

The sender’s procedure, using the same keystream for both encryption and shuffling:

SENDER(Plaintext P, key K_UPenc, COUNT, BEARER, DIRECTION):
    K_stream = NEA(K_UPenc, COUNT, BEARER, DIRECTION, len(P) * 8)
    C = P XOR K_stream                              // standard NEA encryption
    T = build_permutation_table(len(C), K_stream)   // build permutation from same keystream
    T_inv = invert_permutation(T)
    C_shuffled = [C[T_inv[i]] for i in range(len(C))]  // original byte i lands at position T[i]
    transmit(C_shuffled)
RECEIVER(C_shuffled, key K_UPenc, COUNT, BEARER, DIRECTION):
    K_stream = NEA(K_UPenc, COUNT, BEARER, DIRECTION, len(C_shuffled) * 8)
    T = build_permutation_table(len(C_shuffled), K_stream)
    C = [C_shuffled[T[i]] for i in range(len(C_shuffled))]  // unshuffle: original byte i sits at T[i]
    P = C XOR K_stream                              // standard NEA decryption

No additional bytes transmitted. No additional key material. No change to the NEA function or the key derivation. The only overhead is one pass through the ciphertext bytes to apply the permutation, which on 40 bytes takes nanoseconds.
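To see the whole pipeline at once, here is a toy end-to-end sketch in Python. It is illustrative only: the real NEA keystream is replaced by SHA-256 output over a stand-in key and counter (an assumption for the demo, not the 5G key derivation), and the payload is a 40-byte stand-in for the CACC message:

```python
import hashlib

def toy_keystream(key: bytes, count: int, n: int) -> bytes:
    # Stand-in for NEA; the real keystream comes from SNOW 3G, AES-CTR, or ZUC.
    out = b""
    block = 0
    while len(out) < n:
        out += hashlib.sha256(key + count.to_bytes(4, "big")
                              + block.to_bytes(4, "big")).digest()
        block += 1
    return out[:n]

def build_permutation_table(n, keystream_bytes):
    # Durstenfeld Fisher-Yates, seeded by keystream bytes (as in the post).
    T = list(range(n))
    for i in range(n - 1, 0, -1):
        j = keystream_bytes[i] % (i + 1)
        T[i], T[j] = T[j], T[i]
    return T

def sender(plaintext: bytes, key: bytes, count: int) -> bytes:
    ks = toy_keystream(key, count, len(plaintext))
    c = bytes(p ^ k for p, k in zip(plaintext, ks))   # XOR encryption
    T = build_permutation_table(len(c), ks)
    shuffled = bytearray(len(c))
    for i, byte in enumerate(c):
        shuffled[T[i]] = byte       # original ciphertext byte i lands at T[i]
    return bytes(shuffled)

def receiver(shuffled: bytes, key: bytes, count: int) -> bytes:
    ks = toy_keystream(key, count, len(shuffled))
    T = build_permutation_table(len(shuffled), ks)    # same key -> same T
    c = bytes(shuffled[T[i]] for i in range(len(shuffled)))  # unshuffle
    return bytes(x ^ k for x, k in zip(c, ks))        # XOR decryption

key = b"demo-key-not-K_UPenc"
packet = bytes(range(40))                    # stand-in 40-byte CACC payload
wire = sender(packet, key, 7)
assert receiver(wire, key, 7) == packet      # honest round trip succeeds

tampered = bytearray(wire)
tampered[38] ^= 0x01                         # attacker flips a bit at wire offset 38
corrupted = receiver(bytes(tampered), key, 7)
diff = [i for i in range(40) if corrupted[i] != packet[i]]
# Exactly one plaintext byte differs, but its position is set by the secret
# permutation, not by the wire offset the attacker chose.
```

Running this repeatedly with different `count` values moves the corrupted position around, which is exactly the per-packet scattering the defense relies on.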

Algorithm: Keystream-based Shuffling Implementation

Shared: Cipher Key k, control parameters paras for each transmit message,
        Pseudorandom Permutation Generator PRP()

1:  procedure SENDER(Plaintext P)
2:      K ← NEA(k, paras)                 ▷ Generate Keystream
3:      C ← P ⊕ K                         ▷ XOR Ciphering
4:      T ← PRP(K)                        ▷ Permutation Table
5:      create C_s s.t. |C_s| = |C|
6:      for i ← 0 to len(C)-1 do
7:          C_s[T[i]] = C[i]              ▷ Shuffle according to T
8:      end for
9:      send(C_s)
10: end procedure

11: procedure RECEIVER(Shuffled Ciphertext C_s)
12:     K ← NEA(k, paras)
13:     T ← PRP(K)
14:     create C_u s.t. |C_u| = |C_s|
15:     for i ← 0 to len(C_s)-1 do
16:         C_u[i] = C_s[T[i]]            ▷ Invert the shuffle
17:     end for
18:     P ← C_u ⊕ K                       ▷ XOR(·, K) is its own inverse
19: end procedure

For the attacker: they know byte offset 38 in the original plaintext. After shuffling, that byte could be at any of the 40 positions in the shuffled ciphertext. They do not know which one, because the permutation was derived from $K_{UPenc}$. Flipping any two bytes of the shuffled ciphertext flips two random original ciphertext bytes, which after decryption land at two random plaintext positions.

For the bypass to succeed, the two randomly chosen plaintext positions must land in the same 16-bit checksum column, and they must have different original values. In this 40-byte packet layout the alignment condition holds for roughly $1/16$ of random position pairs, and given alignment, the original values differ with probability approximately $1/2$. Combined: $1/16 \times 1/2 = 1/32 \approx 0.031$.

Measured rates with shuffling:

Flips    Vehicle 1    Vehicle 2    Vehicle 3
2-bit    0.036        0.022        0.028
4-bit    0.004        0.008        0.004
6-bit    0.000        0.000        0.002
8-bit    0.002        0.000        0.000

Success rate dropped from roughly 50% to roughly 3% for 2-bit flips. At 4 flips it falls below 1%. At 6 and 8 flips it is effectively zero.

And for the 3% that do succeed: the attacker has no control over which field gets corrupted. The modified bits land at random plaintext positions determined by the permutation, which changes every packet. Across many packets, corruptions scatter randomly across all fields. The CACC filter sees random single-field noise with no systematic bias. This pattern is statistically identical to normal wireless channel errors. The attack’s power to drive the filter toward a systematically wrong state disappears entirely.

The defense was implemented in OAI at nr_pdcp_entity.c, in functions nr_pdcp_entity_process_sdu (sender side, shuffle after encryption) and nr_pdcp_entity_recv_pdu (receiver side, unshuffle before decryption). Both functions already had the NEA cipher context, so no new state was needed.

NIA: The Defense That Would Have Stopped This Completely

5G includes a formal defense against ciphertext modification called NIA, NR Integrity Algorithm. NIA computes a 32-bit Message Authentication Code (MAC-I) from the entire packet content and the key $K_{UPint}$, appends it to each packet, and the receiver verifies it after decryption.

The three NIA algorithms mirror the NEA algorithms: 128-NIA1 uses SNOW 3G configured as a pseudorandom function, 128-NIA2 uses AES in CMAC (Cipher-based MAC) mode, and 128-NIA3 uses ZUC.

AES-CMAC processes the message through AES in cipher block chaining mode. Each 16-byte block of the message is XORed with the previous AES output before being fed into AES. The final AES output is the MAC. Because each block depends on the previous block’s output, CMAC cannot be computed in parallel. The entire message must be processed sequentially before the MAC is available.
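The chaining structure is easy to see in a sketch. This toy replaces the AES block cipher with a keyed SHA-256 stand-in and omits CMAC's subkey derivation and 10* padding rules, so it is not interoperable AES-CMAC; it only shows why the computation is inherently sequential:

```python
import hashlib

BLOCK = 16

def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # Stand-in for one AES-128 encryption (real NIA2 uses AES here).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def toy_cmac(key: bytes, message: bytes) -> bytes:
    # Zero-pad to whole blocks. Real CMAC uses 10* padding plus a derived
    # subkey XORed into the final block; both are omitted in this sketch.
    if len(message) % BLOCK:
        message += b"\x00" * (BLOCK - len(message) % BLOCK)
    state = b"\x00" * BLOCK
    for i in range(0, len(message), BLOCK):
        # Each block is XORed with the previous cipher output before
        # encryption: block i cannot start until block i-1 finishes,
        # which is what puts the MAC on the latency critical path.
        mixed = bytes(a ^ b for a, b in zip(message[i:i + BLOCK], state))
        state = toy_block_cipher(key, mixed)
    return state[:4]   # NIA truncates the result to a 32-bit MAC-I

mac = toy_cmac(b"K_UPint-stand-in", b"position,velocity,acceleration")
# Flipping any message bit changes every subsequent chaining value,
# so the recomputed tag differs from the transmitted one.
```

Because `state` threads through every iteration, there is no way to parallelize across blocks, unlike the counter-mode encryption used for confidentiality.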

Any modification to the ciphertext, whether one bit or a hundred, produces a different plaintext after decryption, which causes the receiver's recomputed MAC-I to differ from the transmitted MAC-I. A randomly modified packet passes verification with probability $2^{-32}$, about 1 in 4 billion, a figure set by the 32-bit tag length. And an attacker cannot do better than random chance: deliberately forging a valid MAC-I requires computing the keyed MAC, which is computationally infeasible without $K_{UPint}$.

3GPP TS 33.501 mandates that all 5G devices support NIA, but it leaves NIA's use on user-plane traffic optional and operator-configured.

The reason NIA gets disabled: it appends 4 bytes to every packet and requires computing a MAC for every transmitted and received packet. For a vehicle telemetry system at 100 Hz, that is 400 bytes per second of MAC overhead per vehicle. For 200 vehicles sharing one base station, the base station processes 80 kilobytes per second of MAC data on top of all normal decryption. For AES-CMAC specifically, the sequential processing requirement means the MAC computation sits on the critical path for packet delivery latency. Research measuring the actual throughput cost of user-plane integrity protection on commercial 5G hardware (published at USENIX BigMAC Workshop) found significant reductions in achievable throughput when NIA2 was enabled.

For IoT sensors sending 10-byte readings at high rates, 4 extra bytes per packet is a 40% overhead increase. For industrial control actuators with microsecond timing requirements, sequential MAC computation adds latency that may violate real-time constraints.

The applications where this attack causes the most damage, high-frequency vehicle telemetry, industrial control, drone navigation, are exactly the applications most likely to have NIA disabled. They have the tightest latency requirements, the highest packet rates, the most constrained hardware, and the most severe consequences when their data is silently corrupted.

Shuffling costs zero bytes of overhead and zero additional latency on the critical path. Its protection is probabilistic rather than cryptographic, which is a real limitation. A determined attacker with enough patience can still succeed on 3% of packets. But those successes land at random positions and cannot be targeted. The systematic bias that would corrupt a Kalman filter disappears. For CACC and similar control applications where the threat is sustained targeted bias rather than individual packet corruption, shuffling addresses the actual threat at the actual cost.

References

[1] Tian, B. (2024). Wireless Communications. De Gruyter / China Science Publishing & Media. ISBN 978-3-11-075135-2.

[2] 3rd Generation Partnership Project. (2025). TS 33.501: Security architecture and procedures for 5G system, v19.3.0.

[3] 3rd Generation Partnership Project. (2024). TS 38.211: NR; Physical channels and modulation, v17.6.0.

[4] 3rd Generation Partnership Project. (2024). TS 38.323: NR; Packet Data Convergence Protocol (PDCP) specification, v17.4.0.

[5] Ekdahl, P., & Johansson, T. (2002). A New Version of the Stream Cipher SNOW. SAC 2002, pp. 47-61.

[6] National Institute of Standards and Technology. (2001). Advanced Encryption Standard (AES). FIPS PUB 197.

[7] ZUC Design Team. (2011). Specification of the 3GPP Confidentiality and Integrity Algorithms 128-EEA3 and 128-EIA3. ETSI/SAGE, v1.6.

[8] Postel, J. (1980). User Datagram Protocol. RFC 768. IETF.

[9] IEEE. (2019). IEEE Standard for Floating-Point Arithmetic. IEEE 754-2019.

[10] Fisher, R. A., & Yates, F. (1938). Statistical Tables for Biological, Agricultural and Medical Research. Oliver and Boyd.

[11] Durstenfeld, R. (1964). Algorithm 235: Random permutation. Communications of the ACM, 7(7), p. 420.

[12] Rupprecht, D., Kohls, K., Holz, T., & Pöpper, C. (2019). Breaking LTE on Layer Two. IEEE Symposium on Security and Privacy, pp. 1121-1136.

[13] Schmitt, L., Heindl, L., & Schotten, H. D. (2022). Potential of 5G User-Plane Integrity Protection. USENIX BigMAC Workshop.

[14] Paterson, K. G., & Yau, A. K. (2006). Cryptography in Theory and Practice: The Case of Encryption in IPsec. EUROCRYPT 2006, pp. 12-29.

[15] Tan, Z. (2022). System Security in 5G/4G/xG Mobile Networks: New Attacks and Countermeasures. UCLA.

[16] U.S. Department of Transportation FHWA. (2016). Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data. doi:10.21949/1504477.

[17] Rappaport, T. S. (2002). Wireless Communications: Principles and Practice, 2nd ed. Prentice Hall.

[18] Goldsmith, A. (2005). Wireless Communications. Cambridge University Press.

[19] OAI implementation: github.com/DerekDuan615/5G-Bit-Flipping-Attack-Exploration (branch: keystream-based-shuffle)

This post is licensed under CC BY 4.0 by the author.