warn

This is experimental and shouldn't be used in production. Don't transmit random stuff on ISM bands without understanding local duty cycle limits, ERP restrictions and other regulations!

In this cute post :3
  • Capturing raw IQ data with RTL-SDR
  • Building a GNU Radio flowchart for BPSK demodulation
  • Writing a Rust program to decode framed BPSK packets

Part 5 - BPSK demodulation

Introduction

All good things come to an end, including time spent not using programs I don't understand. Today I managed to convince myself to use GNU Radio to try to demodulate the framed BPSK signal I implemented in the previous part of the series! The first step was to actually understand what I'm trying to achieve rather than blindly copy and paste an example GNU Radio flowchart. I tried that several times before and failed each time, because I just didn't understand AT ALL what I was even working with - what the properties of the signal are, how and why demodulation works, and why it all has to look this very specific way.

I fired up the following command to get raw data from my RTL-SDR radio, during the execution of which I connected my STM32WLE5JC board with bpsk-tx example already flashed:

lusia@lusia-laptop ~> rtl_sdr -f 868100000 -s 250000 -g 40 capture.raw
Found 1 device(s):
  0:  Realtek, RTL2838UHIDIR, SN: 00000001

Using device 0: Generic RTL2832U OEM
Found Rafael Micro R820T tuner
Exact sample rate is: 250000.000414 Hz
Enabled direct sampling mode, input 2
Sampling at 250000 S/s.
[R82XX] PLL not locked!
Disabled direct sampling mode
Tuned to 868100000 Hz.
Tuner gain set to 40.20 dB.
Reading samples in async mode...
^CSignal caught, exiting!

User cancel, exiting...

Okay, so I got the raw bytes, and my beautiful framed signal is somewhere there. What now?

Tf am I supposed to do??

Inside the .raw file, there's just a bunch of bytes. But one detail is important and made everything click for me after reading about it for some time! These bytes are actually ordered in a special way:

I0 Q0 I1 Q1 I2 Q2 ...

Bytes come in pairs called I - in-phase and Q - quadrature. SDR receivers measure the signal along these two axes so that the amplitude and instantaneous phase information are preserved.

This is rather easy to understand, but I didn't understand one thing - how can ANYTHING useful be encoded in these pairs? How do they even work? How do I get the pretty waterfall I'm seeing in SDR programs?

Each IQ pair isn't just an abstract number, it's a measurement taken at a precise moment in time! If the sample rate is set to, let's say, 250 ksps (thousands of samples per second), then a new sample arrives every 1/250000 s = 4 µs. So the data really means: sample 0 -> t = 0, sample 1 -> t = 4 µs, sample 2 -> t = 8 µs. Each pair can be combined into a complex value z = I + jQ, which now represents a point on a 2D plane. When plotting one sample after another, a constant-frequency signal becomes a rotating point, where higher frequency means faster rotation!
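To make the "rotating point" idea concrete, here's a minimal sketch that measures how far the point turns between neighbouring samples and converts that into Hz. The function names and the synthetic test tone are made up for illustration, and it assumes a clean single tone (real captures are noisier).

```rust
use std::f64::consts::PI;

/// Generate `count` IQ samples of a pure tone at `freq_hz`
/// (relative to the tuned center frequency).
fn make_tone(freq_hz: f64, sample_rate: f64, count: usize) -> Vec<(f64, f64)> {
    (0..count)
        .map(|n| {
            let phase = 2.0 * PI * freq_hz * n as f64 / sample_rate;
            (phase.cos(), phase.sin())
        })
        .collect()
}

/// Estimate the frequency from the average rotation between neighbouring
/// samples. Needs at least two samples.
fn estimate_frequency(samples: &[(f64, f64)], sample_rate: f64) -> f64 {
    let mut total_rotation = 0.0;
    for pair in samples.windows(2) {
        let (i0, q0) = pair[0];
        let (i1, q1) = pair[1];
        // z1 * conj(z0): its angle is the phase change between the two samples
        let re = i1 * i0 + q1 * q0;
        let im = q1 * i0 - i1 * q0;
        total_rotation += im.atan2(re);
    }
    let radians_per_sample = total_rotation / (samples.len() - 1) as f64;
    // radians per sample -> cycles per second
    radians_per_sample * sample_rate / (2.0 * PI)
}

fn main() {
    let sample_rate = 250_000.0;
    let tone = make_tone(1_000.0, sample_rate, 1_000);
    println!("estimated: {:.1} Hz", estimate_frequency(&tone, sample_rate));
}
```

A tone below the center frequency rotates the other way, so the estimate comes out negative.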

So the frequency isn't stored directly, but it emerges naturally when thinking about these samples in the complex domain. Getting a pretty waterfall with all the frequencies and strengths now becomes rather straightforward to understand: the FFT analyzes a number of samples within a given time window and measures how much energy (how long the arrows are) exists at different rotation speeds (frequencies). The result becomes a spectrum plot!
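Here's a rough sketch of that idea using a naive DFT (the slow O(n²) cousin of the FFT, same result): each bin correlates the window against a point rotating at a given speed and reports the energy. All function names are hypothetical, and a real waterfall would also apply a window function and use a proper FFT library.

```rust
use std::f64::consts::PI;

/// Naive DFT power spectrum: bin k measures the energy at k rotations per window.
fn power_spectrum(samples: &[(f64, f64)]) -> Vec<f64> {
    let n = samples.len();
    (0..n)
        .map(|k| {
            let (mut re, mut im) = (0.0, 0.0);
            for (t, &(i, q)) in samples.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f64 / n as f64;
                let (s, c) = angle.sin_cos();
                // (i + jq) * (c + js)
                re += i * c - q * s;
                im += i * s + q * c;
            }
            re * re + im * im
        })
        .collect()
}

/// Convert a bin index into Hz; the upper half of the bins are negative frequencies.
fn bin_to_hz(bin: usize, n: usize, sample_rate: f64) -> f64 {
    let signed = if bin < n / 2 { bin as f64 } else { bin as f64 - n as f64 };
    signed * sample_rate / n as f64
}

/// Index of the bin holding the most energy.
fn strongest_bin(spectrum: &[f64]) -> usize {
    let mut best = 0;
    for (k, &power) in spectrum.iter().enumerate() {
        if power > spectrum[best] {
            best = k;
        }
    }
    best
}

/// Test tone that makes exactly `cycles` rotations over `n` samples.
fn tone_with_cycles(cycles: f64, n: usize) -> Vec<(f64, f64)> {
    (0..n)
        .map(|t| {
            let phase = 2.0 * PI * cycles * t as f64 / n as f64;
            (phase.cos(), phase.sin())
        })
        .collect()
}

fn main() {
    let n = 64;
    let spectrum = power_spectrum(&tone_with_cycles(5.0, n));
    let peak = strongest_bin(&spectrum);
    println!("peak in bin {peak} = {} Hz", bin_to_hz(peak, n, 250_000.0));
}
```

Stacking one such spectrum per time window, row after row, is what produces the waterfall.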

Damn now what?

I'm gonna show you the final version of the flowchart I made first, and then explain step by step what happens (I have 0 RF background though, some stuff might be wrong, but I really prepared myself and tried to grasp it mentally!).

The final flowchart
nothing is that bad when you somewhat understand it!

The RX side of BPSK transmission is responsible for several things. On input, the receiver gets raw bytes coming from an SDR. It knows nothing - when the signal starts or ends, what it contains, how fast the bitrate is, nothing. It's also noisy and, in short, useless without doing any digital signal processing.

The first step is to convert the bytes back to the IQ form of the signal. The .raw file is read with the File Source block, followed by Uchar To Float, and then 127 is subtracted so that the whole signal gets "centered" - 0.0 is the center now. This is followed by Deinterleave, which puts all Is and all Qs on comfy separate outputs. These two values are then combined into a complex value, Throttled to get back real timings, and then put into a Virtual Sink containing the reconstructed signal in proper IQ form.
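Outside of GNU Radio, the same bytes-to-IQ step can be sketched in a few lines. This assumes the unsigned 8-bit interleaved format that rtl_sdr writes and the same 127 offset as above; the function name is made up.

```rust
/// Convert interleaved unsigned 8-bit IQ bytes (I0 Q0 I1 Q1 ...) into
/// centered (I, Q) float pairs. A trailing odd byte is ignored.
fn bytes_to_iq(raw: &[u8]) -> Vec<(f32, f32)> {
    raw.chunks_exact(2)
        .map(|pair| (pair[0] as f32 - 127.0, pair[1] as f32 - 127.0))
        .collect()
}

fn main() {
    // Stand-in for the contents of a capture file
    let raw = [127u8, 127, 255, 0, 0, 255, 42];
    for (n, (i, q)) in bytes_to_iq(&raw).iter().enumerate() {
        println!("sample {n}: I = {i}, Q = {q}");
    }
}
```

For a real capture, the byte slice would come from something like std::fs::read("capture.raw").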

The next step is to throw out the trash to make the job of subsequent processing steps easier. It's done by Frequency Xlating FIR Filter with firdes.low_pass(1, samp_rate, 2000, 500) taps - that's gain 1, a 2 kHz cutoff and a 500 Hz transition width, so basically it tells the filter to cut off anything that's further than 2 kHz from the center of the signal (my 600 bps BPSK signal is around 1.2 kHz wide, but apparently it's crucial to not cut off too much stuff around it). I also chose the center frequency to be at +1 kHz, because the STM32WLE5JC board would offset my signals by that much for some reason. No idea why though.

Besides the low-pass filtering that removes high-frequency content, the same block moves the signal to baseband and decimates it (I chose 25, so 250 ksps becomes 10 ksps), which reduces the number of samples the rest of the program has to work on. Together these produce a filtered signal with fewer samples that didn't get fucked over by random high frequency noise.
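The three jobs of that block (shift, low-pass, decimate) can be sketched like this. Big caveat: the low-pass here is a plain block average standing in for properly designed FIR taps, so it is NOT what firdes.low_pass produces, and all names are made up for illustration.

```rust
use std::f64::consts::PI;

/// Shift the signal down by `offset_hz`, average each block of `decim` samples
/// (a crude low-pass) and emit one output sample per block.
fn xlate_filter_decimate(
    samples: &[(f64, f64)],
    sample_rate: f64,
    offset_hz: f64,
    decim: usize,
) -> Vec<(f64, f64)> {
    // 1. Mix: multiply by a point rotating the opposite way, so the wanted
    //    signal stops rotating and lands at 0 Hz (baseband)
    let mixed: Vec<(f64, f64)> = samples
        .iter()
        .enumerate()
        .map(|(n, &(i, q))| {
            let angle = -2.0 * PI * offset_hz * n as f64 / sample_rate;
            let (s, c) = angle.sin_cos();
            (i * c - q * s, i * s + q * c)
        })
        .collect();

    // 2 + 3. Low-pass and decimate in one go: fast-rotating content averages
    //        out towards zero, slow content survives
    mixed
        .chunks_exact(decim)
        .map(|block| {
            let (sum_i, sum_q) = block
                .iter()
                .fold((0.0, 0.0), |acc, s| (acc.0 + s.0, acc.1 + s.1));
            (sum_i / decim as f64, sum_q / decim as f64)
        })
        .collect()
}

/// Pure test tone at `freq_hz`.
fn make_tone(freq_hz: f64, sample_rate: f64, count: usize) -> Vec<(f64, f64)> {
    (0..count)
        .map(|n| {
            let phase = 2.0 * PI * freq_hz * n as f64 / sample_rate;
            (phase.cos(), phase.sin())
        })
        .collect()
}

fn main() {
    let fs = 250_000.0;
    let wanted = xlate_filter_decimate(&make_tone(1_000.0, fs, 2_500), fs, 1_000.0, 25);
    println!("{} samples out, first = {:?}", wanted.len(), wanted[0]);
}
```

With decimation 25, every 2500 input samples turn into 100 output samples, and a tone sitting exactly at the +1 kHz offset comes out as a constant (non-rotating) value.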

The filtered signal is then passed through an AGC2 - it simply normalizes the signal strength.
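A minimal sketch of what "normalizing the signal strength" means: a feedback loop nudges a gain value until the output magnitude settles around a reference level. This is a simplified single-rate AGC with made-up names, while GNU Radio's AGC2 uses separate attack and decay rates.

```rust
/// Tiny feedback AGC: scale each sample by `gain`, then adjust `gain` so the
/// output magnitude drifts towards `reference`.
fn agc(samples: &[(f64, f64)], rate: f64, reference: f64) -> Vec<(f64, f64)> {
    let mut gain = 1.0;
    samples
        .iter()
        .map(|&(i, q)| {
            let out = (i * gain, q * gain);
            let magnitude = (out.0 * out.0 + out.1 * out.1).sqrt();
            // Too quiet -> raise the gain, too loud -> lower it
            gain += rate * (reference - magnitude);
            out
        })
        .collect()
}

/// Weak rotating test signal with a constant magnitude of `amplitude`.
fn weak_signal(amplitude: f64, count: usize) -> Vec<(f64, f64)> {
    (0..count)
        .map(|n| {
            let phase = 0.1 * n as f64;
            (amplitude * phase.cos(), amplitude * phase.sin())
        })
        .collect()
}

fn main() {
    let out = agc(&weak_signal(0.05, 5_000), 0.5, 1.0);
    let (i, q) = out[out.len() - 1];
    println!("final magnitude: {:.4}", (i * i + q * q).sqrt());
}
```

The phase of each sample is untouched, only the length of the arrow changes, which is why it's safe to do this before the Costas loop.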

After that, the actual demodulating happens! The first block here is a Costas Loop, which is a PLL (phase-locked loop) used for phase recovery. Ideally the BPSK signal would be received as clean 0° <-> 180° phase transitions, but thanks to a whole bunch of variables (unknown initial phase offset, Doppler effect, etc.), it never does that. It's rotating! So looking at the complex plane, you'd see a circle (or worse, multiple rotating states at once) made of points, just like this one (AGC2 output):

Circle of points in the constellation output
so ugly!

The Costas loop fixes that issue by estimating the phase error and frequency offset, and by rotating the signal back. It constantly tries to rotate the points until the constellation stops drifting - it basically recreates the carrier wave locally and locks onto it. Neat fucking stuff :3
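Here's a sketch of a BPSK Costas loop as a second-order loop, with I*Q as the phase error. It's a textbook-style version, not a copy of GNU Radio's block; the gain formulas are one common choice, and the loop bandwidth and the synthetic test signal are assumptions picked for the demo.

```rust
use std::f64::consts::{FRAC_1_SQRT_2, PI};

struct CostasLoop {
    phase: f64, // current carrier phase estimate, radians
    freq: f64,  // current carrier frequency estimate, radians per sample
    alpha: f64, // proportional gain
    beta: f64,  // integral gain
}

impl CostasLoop {
    fn new(loop_bw: f64) -> Self {
        let damping = FRAC_1_SQRT_2;
        let denom = 1.0 + 2.0 * damping * loop_bw + loop_bw * loop_bw;
        CostasLoop {
            phase: 0.0,
            freq: 0.0,
            alpha: 4.0 * damping * loop_bw / denom,
            beta: 4.0 * loop_bw * loop_bw / denom,
        }
    }

    /// Derotate one sample and update the phase/frequency estimates.
    fn process(&mut self, sample: (f64, f64)) -> (f64, f64) {
        let (i, q) = sample;
        // Rotate the input back by the current phase estimate
        let (s, c) = (-self.phase).sin_cos();
        let out = (i * c - q * s, i * s + q * c);
        // For BPSK, I*Q is zero when the points sit on the I axis and its
        // sign tells which way the constellation is tilted
        let error = (out.0 * out.1).clamp(-1.0, 1.0);
        self.freq += self.beta * error;
        self.phase += self.freq + self.alpha * error;
        // Keep the phase within -pi..pi
        if self.phase > PI {
            self.phase -= 2.0 * PI;
        } else if self.phase < -PI {
            self.phase += 2.0 * PI;
        }
        out
    }
}

/// +1/-1 symbols with a starting phase offset and a slow constant rotation.
fn rotating_bpsk(count: usize, phase0: f64, drift: f64) -> Vec<(f64, f64)> {
    (0..count)
        .map(|n| {
            let symbol = if (n / 3 + n / 7) % 2 == 0 { 1.0 } else { -1.0 };
            let phase = phase0 + drift * n as f64;
            (symbol * phase.cos(), symbol * phase.sin())
        })
        .collect()
}

fn main() {
    let mut costas = CostasLoop::new(0.0628);
    let fixed: Vec<(f64, f64)> = rotating_bpsk(2_000, 0.7, 0.01)
        .into_iter()
        .map(|s| costas.process(s))
        .collect();
    let (i, q) = fixed[fixed.len() - 1];
    println!("last sample: I = {i:.3}, Q = {q:.3}, freq = {:.4}", costas.freq);
}
```

Once locked, the Q component collapses towards zero and the points sit at roughly +1 and -1 on the I axis. The error term can't tell +1 from -1, which is exactly where the 180° ambiguity comes from.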

The last part of the puzzle is Symbol Sync, which is kinda similar, but instead of recovering the phase, it recovers the timings. Even with a derotated signal, the timings between transmitted symbols are still not known yet, so this algorithm fixes it by carefully choosing when to sample a symbol (e.g. not during a phase transition, but rather when it's most stable).
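To get a feel for "choosing when to sample", here's a deliberately simplified stand-in: try every possible sampling offset and keep the one where the signal is strongest on average. The real Symbol Sync block uses a timing error detector inside a tracking loop and copes with non-integer samples per symbol; this sketch assumes a fixed integer number of samples per symbol, and all names are made up.

```rust
use std::f64::consts::PI;

/// Pick the offset (0..sps) whose samples have the highest average magnitude,
/// i.e. the spot furthest away from the phase transitions.
fn best_sampling_offset(samples: &[f64], sps: usize) -> usize {
    let mut best = (0, f64::MIN);
    for offset in 0..sps {
        let picked: Vec<f64> = samples.iter().skip(offset).step_by(sps).map(|x| x.abs()).collect();
        if picked.is_empty() {
            continue;
        }
        let average = picked.iter().sum::<f64>() / picked.len() as f64;
        if average > best.1 {
            best = (offset, average);
        }
    }
    best.0
}

/// Take one sample per symbol, starting at `offset`.
fn sample_symbols(samples: &[f64], sps: usize, offset: usize) -> Vec<f64> {
    samples.iter().skip(offset).step_by(sps).copied().collect()
}

/// Test signal: each symbol is a half-sine bump (zero at the symbol edges,
/// strongest in the middle), preceded by `delay` samples of silence.
fn shaped_symbols(symbols: &[f64], sps: usize, delay: usize) -> Vec<f64> {
    let mut out = vec![0.0; delay];
    for &symbol in symbols {
        for k in 0..sps {
            out.push(symbol * (PI * k as f64 / sps as f64).sin());
        }
    }
    out
}

fn main() {
    let symbols = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0];
    let signal = shaped_symbols(&symbols, 8, 3);
    let offset = best_sampling_offset(&signal, 8);
    println!("offset {offset}: {:?}", sample_symbols(&signal, 8, offset));
}
```

In this test signal the symbols start 3 samples late and peak 4 samples into each symbol, so the search lands on offset 7.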

A filtered signal waterfall at the top and a constellation diagram at the bottom
so pretty! filtered signal waterfall (top), constellation diagram (bottom)

Finally, the complex IQ signal is converted to real numbers: now all the BPSK data lies on the in-phase axis, so it's safe to discard the quadrature component, which only contains noise. Every recovered symbol is just a negative or positive value of the I component. Then, Binary Slicer does one thing: converts >=0 to 1s and the rest to 0s.

info

There's a 50/50 chance that the Costas loop will lock in such a way that the phase ends up rotated by an additional 180° - it's not wrong, the bits are just inverted then! It's a normal property of BPSK modulation.

Pack K Bits combines these separate bits and puts them into bytes. Finally, File Sink saves it to a file!
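The slicing and packing steps are small enough to sketch directly. This assumes the bits are packed MSB-first (which is how the decoder later in this post reads them); the function names are made up.

```rust
/// Binary slicer: >= 0 becomes a 1 bit, everything else a 0 bit.
fn slice_bits(soft: &[f32]) -> Vec<u8> {
    soft.iter().map(|&x| (x >= 0.0) as u8).collect()
}

/// Pack groups of 8 bits into bytes, first bit = most significant bit.
/// Leftover bits that don't fill a whole byte are dropped.
fn pack_bits_msb_first(bits: &[u8]) -> Vec<u8> {
    bits.chunks_exact(8)
        .map(|chunk| chunk.iter().fold(0u8, |byte, &bit| (byte << 1) | (bit & 1)))
        .collect()
}

fn main() {
    // Alternating symbols, like a preamble
    let soft = [0.9, -0.8, 0.7, -0.6, 0.5, -0.4, 0.3, -0.2];
    let bytes = pack_bits_msb_first(&slice_bits(&soft));
    println!("{:02x?}", bytes);

    // The same symbols after a 180° flip give the inverted byte
    let flipped: Vec<f32> = soft.iter().map(|x| -x).collect();
    println!("{:02x?}", pack_bits_msb_first(&slice_bits(&flipped)));
}
```

An alternating symbol pattern packs into 0xaa, and the phase-flipped version of it into 0x55.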

That was a fuckton to unwrap!!!

What's in the file???

Reading the file with xxd reveals what it contains:

lusia@lusia-laptop ~> xxd output.bin
00000000: 196c 936d b692 5a5b 5b4b 4b49 6d2d 696b  .l.m..Z[[KKIm-ik
00000010: 5294 a95a a54a b52a 5294 ad4a d5ea a955  R..Z.J.*R..J...U
00000020: 5aa9 55ea a855 5555 55aa aaaa aaaa aaaa  Z.U..UUUU.......
00000030: 5550 aaa5 556a aa55 2ad5 2ab5 4a95 a95a  UP..Uj.U*.*.J..Z
00000040: 9529 4b4b 4a5b 4b69 2da4 92db 696d 2d2d  .)KKJ[Ki-...im--
00000050: 6969 4b4a 5a94 ad6b 52b5 2aaa aaaa aaaa  iiKJZ..kR.*.....
00000060: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa  ................
00000070: aaaa aaaa aaaa aaaa aabc 194a dce4 bf3a  ...........J...:
00000080: a1db ce25 61b5 71c8 616c b334 55bf 7654  ...%a.q.al.4U.vT
00000090: aee7 88d4 4a00 e4b9 c55d f0e9 400c c05f  ....J....]..@.._
000000a0: eb78 065c d9ca 788e 9433 129a daf8 cad1  .x.\..x..3......
000000b0: 4f9d 8ab2 2ddf c8cb 0da1 cf18 2e5e c9e3  O...-........^..
000000c0: fdea 9394 d61c 340f 06b9 442c 01d6 23e1  ......4...D,..#.
000000d0: b500 4ad6 891b 656e 9017 fedc ff05 1d00  ..J...en........
000000e0: 0a6d b365 2e2b 2d80 97ba b88a 696c 2a3c  .m.e.+-.....il*<
000000f0: 1142 0b7c 22b0 9fef 8cb2 add5 ea02 46f0  .B.|".........F.
00000100: 7f36 a5ac b06f dff8 fb42 38ac a609 a4a2  .6...o...B8.....
00000110: 4600 f488 f07e 0c3e 410a 1ed8 dac0 b825  F....~.>A......%
00000120: f7bb 7786 4356 ad5a a954 aa95 5512 8555  ..w.CV.Z.T..U..U
00000130: 556a aaa5 5aad 56aa d54a a956 ab55 2ab5  Uj..Z.V..J.V.U*.
00000140: 5aa9 54a9 52a9 56aa 54a9 4a52 d2d2 d2d2  Z.T.R.V.T.JR....
00000150: 5a4b 6969 6d24 b6da 4924 9249 2492 4924  ZKiim$..I$.I$.I$
00000160: 9249 25b6 4936 c993 3399 9999 9999 98c6  .I%.I6..3.......
00000170: 39c6 318c 718e 38e7 8e1c 7078 7870 f0f0  9.1.q.8...pxxp..
00000180: 783f 01fe 01f8 3fc0 fc07 f803 ff80 007f  x?....?.........
00000190: ffff d000 3fe0 1fc0 fc0f c1f0 3e07 c1f0  ....?.......>...
000001a0: fc0f 0e1e 3861 c631 8e71 8c63 1ce6 6333  ....8a.1.q.c..c3
000001b0: 2666 4c9b 2649 36db 696d 694a 54ab 5555  &fL.&I6.imiJT.UU
000001c0: 556a ab54 ab5a 94a4 b6d9 3331 c61f 0000  Uj.T.Z....31....

I was weirded out at first, because I saw the aaaa preamble bytes, but I didn't see any header - neither the normal 1f35 one nor the inverse of it. It turns out that it's normal and expected, because it'd be a miracle if the demodulation somehow figured out that hmmm yeah let's align perfectly to the original bytes.

The goal now was to go bit by bit and detect the header (normal or inverted) in that data. I decided to modify my stm32wl-subghz crate for that! After all, why not use the existing BPSK-related infrastructure and expose it? I added a hal feature (enabled by default), which the user can disable to get the non-STM32-hardware-related stuffs.

The thing I added that's gated by the inverted hal feature (NO HAL, YES STD!!!) was a decode function impl for BpskPacket. It does very similar stuff to the to_bytes function, but kinda in reverse.

Decoding!!! Decoding!!!

    #[cfg(not(feature = "hal"))]
    pub fn decode(&self, data: &[u8]) -> Vec<DecodeResult> {
        match self {
            // Can't do much with Raw packets so just return the same data and accept as valid
            BpskPacket::Raw => vec![DecodeResult {
                bit_offset: 0,
                inverted: false,
                payload: data.to_vec(),
                crc_valid: true,
            }],
            BpskPacket::Framing {
                preamble_len: _,
                sync_word,
                sync_word_len,
                crc_type,
                whitening,
                whitening_seed,
            } => {
                let sync_word = &sync_word[..*sync_word_len];
                let sync_bits = sync_word_len * 8;
                let total_bits = data.len() * 8;
                let mut results = Vec::new();

                let crc_len = match crc_type {
                    CrcType::None => 0,
                    CrcType::Crc8 => 1,
                    CrcType::Crc16 => 2,
                };

                // Go through every bit, giving space to the sync word length
                for i in 0..total_bits.saturating_sub(sync_bits) {
                    let mut matching: usize = 0;
                    for j in 0..sync_bits {
                        // Compare the sync word bit-by-bit against the data stream
                        let data_bit = (data[(i + j) / 8] >> (7 - ((i + j) % 8))) & 1;
                        let sync_bit = (sync_word[j / 8] >> (7 - (j % 8))) & 1;

                        if data_bit == sync_bit {
                            matching += 1;
                        }
                    }

                    // Allow up to 2 bit errors: if at most 2 bits mismatch it's a normal
                    // match, if at most 2 bits match it's a phase-inverted one
                    let inverted = if sync_bits - matching <= 2 {
                        false
                    } else if matching <= 2 {
                        true
                    } else {
                        continue;
                    };

                    // Extract bytes that happen after the sync word is found at any bit offset
                    let bit_start = i + sync_bits;
                    let remaining_bytes = (total_bits - bit_start) / 8;
                    if remaining_bytes == 0 {
                        continue;
                    }

                    // Reassemble bytes from arbitrary bit positions after sync word is found at offset `i`
                    let mut raw = vec![0u8; remaining_bytes];
                    for b in 0..remaining_bytes {
                        let mut byte = 0u8;
                        for bit in 0..8 {
                            let idx = bit_start + b * 8 + bit;
                            let mut val = (data[idx / 8] >> (7 - (idx % 8))) & 1;
                            // Handle the 180 degree phase ambiguity
                            if inverted {
                                val ^= 1;
                            }

                            byte |= val << (7 - bit);
                        }
                        raw[b] = byte;
                    }

                    // De-whiten by applying the same whitening operation again (it's reversible)
                    whitening.apply(*whitening_seed, &mut raw);

                    // Extract payload length
                    let (payload_start, payload_len) = (1, raw[0] as usize);

                    if payload_len == 0 {
                        continue;
                    }

                    if payload_start + payload_len + crc_len > raw.len() {
                        continue;
                    }

                    // Verify CRC over len field and payload
                    let crc_data = &raw[..payload_start + payload_len];
                    let (crc_computed, _) = crc_type.compute(crc_data);

                    let crc_valid = match crc_type {
                        CrcType::None => true,
                        CrcType::Crc8 => raw[payload_start + payload_len] == crc_computed as u8,
                        CrcType::Crc16 => {
                            // Assemble u16 from two bytes
                            let received = ((raw[payload_start + payload_len] as u16) << 8)
                                | raw[payload_start + payload_len + 1] as u16;
                            received == crc_computed
                        }
                    };

                    results.push(DecodeResult {
                        bit_offset: i,
                        inverted,
                        payload: raw[payload_start..payload_start + payload_len].to_vec(),
                        crc_valid,
                    });
                }

                results
            }
        }
    }

First of all, it starts at bit position 0 and tries to find the sync word (normal and inverted version at the same time!). If it can't find it starting at bit position 0, it moves to bit 1, and so on. I'm letting the algorithm tolerate up to 2 bit errors, in case the signal was noisy and the sync word got corrupted slightly. If <=2 bits match, the data was probably inverted, and it should be accepted. Similarly, if <=2 bits don't match, the data is probably correct and didn't get inverted.
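The same sync search can be sketched more compactly with XOR and count_ones (the Hamming distance). This is an illustrative alternative, not the crate's code: it hardcodes a 16-bit sync word (using the 1f35 header mentioned earlier), and the type and function names are made up.

```rust
#[derive(Debug, PartialEq)]
enum SyncMatch {
    Normal,
    Inverted,
    NoMatch,
}

/// Read 16 bits MSB-first starting at an arbitrary bit offset.
fn read_u16_at(data: &[u8], bit_offset: usize) -> u16 {
    (0..16).fold(0u16, |acc, j| {
        let idx = bit_offset + j;
        let bit = (data[idx / 8] >> (7 - (idx % 8))) & 1;
        (acc << 1) | bit as u16
    })
}

/// Compare a 16-bit window against the sync word, tolerating a few bit errors.
fn classify(window: u16, sync_word: u16, max_errors: u32) -> SyncMatch {
    // XOR leaves a 1 wherever the bits differ
    let mismatches = (window ^ sync_word).count_ones();
    if mismatches <= max_errors {
        SyncMatch::Normal
    } else if 16 - mismatches <= max_errors {
        // Nearly every bit differs: same word, flipped by 180°
        SyncMatch::Inverted
    } else {
        SyncMatch::NoMatch
    }
}

/// Slide over the data one bit at a time and return the first hit.
fn find_sync(data: &[u8], sync_word: u16, max_errors: u32) -> Option<(usize, SyncMatch)> {
    let last_start = (data.len() * 8).checked_sub(16)?;
    (0..=last_start).find_map(|offset| {
        match classify(read_u16_at(data, offset), sync_word, max_errors) {
            SyncMatch::NoMatch => None,
            hit => Some((offset, hit)),
        }
    })
}

fn main() {
    // 0x1f35 hidden 3 bits into the stream
    let data = [0xA3, 0xE6, 0xA0];
    println!("{:?}", find_sync(&data, 0x1f35, 2));
}
```

Looking at the bytes alone (a3 e6 a0) there's no 1f35 in sight, which is the same effect as in the xxd dump above.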

Then, the program extracts the bytes that come after the sync word, which can be found at any bit offset. To do that, it has to unpack the bits and repack them properly, so that everything is byte-aligned again. If the signal was inverted, a simple XOR bit flip is made to correct for that effect.
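For illustration, the same realignment can also be done a whole byte at a time with shifts instead of going bit by bit. This is a sketch with a made-up function name, not the crate's code, and it assumes MSB-first bit order like the decoder above.

```rust
/// Rebuild whole bytes starting at an arbitrary bit position, optionally
/// undoing the 180° phase inversion.
fn realign(data: &[u8], bit_start: usize, inverted: bool) -> Vec<u8> {
    let whole_bytes = (data.len() * 8).saturating_sub(bit_start) / 8;
    let first = bit_start / 8;
    let shift = bit_start % 8;

    (0..whole_bytes)
        .map(|b| {
            let k = first + b;
            let byte = if shift == 0 {
                data[k]
            } else {
                // Low bits of this byte become the high bits of the output,
                // the rest comes from the top of the next byte
                (data[k] << shift) | (data[k + 1] >> (8 - shift))
            };
            if inverted { !byte } else { byte }
        })
        .collect()
}

fn main() {
    // 0x1f35 hidden 3 bits into the stream
    let data = [0xA3, 0xE6, 0xA0];
    println!("{:02x?}", realign(&data, 3, false));
    println!("{:02x?}", realign(&data, 3, true));
}
```

Inverting a whole byte with `!` is the same as XOR-ing each of its bits with 1.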

The prepared data is then ready for further processing! The first step is to de-whiten the data by applying the exact same whitening operation that was used when encoding in to_bytes - whitening is completely reversible, so the result of a double whiten is just the same data. This allows the program to extract and validate the payload length byte and calculate the proper CRC over the length field and the payload. This CRC value is then compared with the received CRC, providing a very simple and fast way of telling if every single bit was demodulated and decoded properly! The result (might be wrong, I'm including it anyways!) is then pushed as a DecodeResult to the Vec<DecodeResult>.
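Both ideas fit in a small standalone sketch. Loud assumptions: the whitening below is a hypothetical 9-bit LFSR and the CRC is CRC-16/CCITT-FALSE, picked only to demonstrate the principle; the crate's actual whitening sequence and CRC parameters may differ. The frame layout (length byte, payload, big-endian CRC) follows the decode function above, and every name is made up.

```rust
/// Hypothetical whitening: XOR every byte with a pseudo-random mask from a
/// 9-bit LFSR. XOR-ing with the same mask twice gives back the original.
fn whiten(seed: u16, data: &mut [u8]) {
    let mut lfsr = seed & 0x1FF;
    for byte in data.iter_mut() {
        let mut mask = 0u8;
        for _ in 0..8 {
            let new_bit = (lfsr ^ (lfsr >> 5)) & 1;
            mask = (mask << 1) | (lfsr & 1) as u8;
            lfsr = (lfsr >> 1) | (new_bit << 8);
        }
        *byte ^= mask;
    }
}

/// CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF.
fn crc16_ccitt_false(data: &[u8]) -> u16 {
    let mut crc: u16 = 0xFFFF;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

/// Build [len, payload..., crc_hi, crc_lo] with the CRC over len + payload.
fn build_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = vec![payload.len() as u8];
    frame.extend_from_slice(payload);
    let crc = crc16_ccitt_false(&frame);
    frame.extend_from_slice(&crc.to_be_bytes());
    frame
}

/// Return the payload if the length byte and CRC check out.
fn check_frame(raw: &[u8]) -> Option<&[u8]> {
    let len = *raw.first()? as usize;
    if len == 0 {
        return None;
    }
    let body = raw.get(..1 + len)?;
    let crc_bytes = raw.get(1 + len..1 + len + 2)?;
    let received = u16::from_be_bytes([crc_bytes[0], crc_bytes[1]]);
    if received == crc16_ccitt_false(body) { Some(&body[1..]) } else { None }
}

fn main() {
    let mut frame = build_frame(b"hi :3");
    whiten(0x1FF, &mut frame); // what the transmitter would send
    whiten(0x1FF, &mut frame); // receiver applies the same thing again
    println!("{:?}", check_frame(&frame).map(String::from_utf8_lossy));
}
```

Flipping even a single bit anywhere in the frame makes the CRC comparison fail, which is why false sync word hits get filtered out so reliably.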

This function lets the end user not think too much about data decoding, assuming the encoding was done in the exact way the decoder expects. Here's a simple decoding program I then prepared!

use std::{env, fs};

use stm32wl_subghz::modulations::bpsk::BpskPacket;

fn main() {
    let help_str = "Usage: bpsk-rx <file.bin> [options]\nOptions:\n--show-all: shows packets with invalid CRC";
    let path = env::args().nth(1).expect(help_str);
    let show_all = matches!(env::args().nth(2).unwrap_or_default().as_str(), "--show-all");

    let data = fs::read(&path).unwrap();

    let packet = BpskPacket::default();
    let decode_results = packet.decode(&data);

    println!("Found {} packets!!", decode_results.len());

    for result in decode_results {
        if !result.crc_valid && !show_all {
            continue;
        }
        println!("offset {}\ninverted {}\ncrc valid {}\npayload hex {:x?}\npayload utf-8 {:?}\n\n",
            result.bit_offset,
            result.inverted,
            result.crc_valid,
            result.payload,
            String::from_utf8_lossy(&result.payload)
        );
    }
}

It's soooooooo shrimple and short! And pretty! I tested it quickly on my output.bin file from GNU Radio, and holy fuck:

lusia@lusia-laptop ~/P/bpsk-rx (master)> cargo run -- ~/output.bin
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/bpsk-rx /home/lusia/output.bin`
Found 7 packets!!
offset 971
inverted true
crc valid true
payload hex [68, 69, 69, 69, 69, 69, 20, 68, 65, 6c, 6c, 6f, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 74, 68, 69, 73, 20, 69, 73, 20, 61, 20, 6c, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 6e, 67, 20, 74, 65, 78, 74, 21, 20, 76, 65, 72, 79, 20, 6c, 6f, 6e, 67, 20, 3a, 3e, 20, 61, 6e, 64, 20, 63, 75, 74, 65, 21, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 3a, 33, 20, 75, 6d, 6d, 6d, 6d, 20, 75, 72, 67, 68, 68, 68, 20, 61, 77, 77, 77, 6f, 6f, 6f, 6f, 6f, 6f, 6f, 20, 77, 6f, 6f, 66, 20, 77, 6f, 6f, 6f, 6f, 6f, 66, 20, 77, 6f, 6f, 66]
payload utf-8 "hiiiii hello :3 :3 :3 this is a looooooooooooooooong text! very long :> and cute! :3 :3 :3 :3 :3 :3 :3 :3 :3 :3 :3 :3 :3 :3 ummmm urghhh awwwooooooo woof wooooof woof"
info

It says Found 7 packets!! but it only shows one - that's because it found other places with something that looks like a header (with <=2 bit errors), but the CRC didn't match up. To show all the packets in this example, you'd have to run it with the --show-all flag!

I actually got the original data back! The whole framing -> sending -> receiving -> demodulating -> decoding pipeline WORKS WELL and it's now possible to send well-made BPSK packets with STM32WLE5JC chips!

Conclusions

For a change, it was less about code today, and more about telecommunications, which is nice I think! I enjoyed learning about these concepts a lot today and it felt just euphoric to see the same data I was sending, after putting it through many steps of processing. SDRs are sooooo cool and I guess GNU Radio isn't that bad after all...

Ofc the source is available here, as usual!

See ya :3

References