Audio Analysis on Zephyr - Nucleo G474RE
A project I built to learn Zephyr RTOS by doing something more involved than blinking an LED. It samples audio via ADC at 44.1 kHz using hardware timer triggering and DMA, runs a real-time FFT with CMSIS-DSP, and streams results over UART to a Python live plotter.
I've used FreeRTOS before and wanted to understand how Zephyr handles devicetree, Kconfig, and the kernel primitives. This project ended up touching all of those plus direct STM32 LL/HAL register work, which was a good way to see where Zephyr's abstractions end and the hardware begins.
Hardware
| Component | Details |
|---|---|
| Board | ST Nucleo G474RE |
| MCU | STM32G474RE (Cortex-M4F, 170 MHz, FPU) |
| Audio Input | Analog signal on PA0 (Arduino A0) |
| Console | LPUART1 via onboard ST-Link VCP (PA2/PA3) |
| ADC Trigger | TIM6 TRGO at 44,098.6 Hz |
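The odd-looking 44,098.6 Hz comes from dividing the timer clock by an integer. A basic timer's update (TRGO) rate is clk / ((PSC + 1) * (ARR + 1)). The sketch below assumes TIM6 is clocked at the full 170 MHz and that PSC = 0. Under those assumptions it reproduces the table's rate with ARR = 3854. The actual register values used in audio_capture.c are not shown here, so treat them as an assumption. TIM6 is a 16-bit timer, so ARR has to stay below 65536.

```python
TIM_CLK_HZ = 170_000_000   # assumption: TIM6 is clocked at the full 170 MHz
TARGET_HZ = 44_100

def trgo_rate(psc, arr, clk_hz=TIM_CLK_HZ):
    """Update-event (TRGO) rate of a basic timer: clk / ((PSC + 1) * (ARR + 1))."""
    return clk_hz / ((psc + 1) * (arr + 1))

# With no prescaler, choose the auto-reload value whose rate is closest to 44.1 kHz.
psc = 0
arr = round(TIM_CLK_HZ / TARGET_HZ) - 1
rate = trgo_rate(psc, arr)
print(f"PSC={psc} ARR={arr} -> {rate:.1f} Hz ({rate - TARGET_HZ:+.1f} Hz from 44.1 kHz)")
```

A 1.4 Hz offset from 44.1 kHz is irrelevant for spectrum analysis. It only matters that the firmware and the plotter use the same fs when converting bins to Hz.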
For testing I used a waveform generator feeding a sine wave into PA0. Any analog source that stays within 0-3.3 V works; a bipolar generator output needs a DC offset so it never swings below 0 V.
How It Works
TIM6 overflows at ~44.1 kHz and triggers an ADC1 conversion through its hardware TRGO output. DMA transfers each sample into a ping-pong buffer (2 x 1024 samples). On the half-transfer and transfer-complete interrupts, a semaphore is given to wake the processing thread. That thread runs a 1024-point real FFT with CMSIS-DSP on the half that just filled, computes RMS and the peak bin, and sends the results over UART.
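The ping-pong handoff can be simulated on a PC to check which half the CPU should read at each interrupt. In this host-side sketch, dma_store and proc_poll are hypothetical names, and a threading.Semaphore stands in for the Zephyr semaphore. In the firmware, the "interrupt" side runs in the DMA ISR and the consumer is a separate thread. Here both run in one loop so the ordering is deterministic.

```python
import threading
from collections import deque

HALF = 1024                    # samples per half (2 x 1024 ping-pong buffer)
dma_buf = [0] * (2 * HALF)     # one contiguous buffer, DMA in circular mode
ready = deque()                # offsets of halves that have just filled
sem = threading.Semaphore(0)   # stands in for the Zephyr semaphore

def dma_store(sample_count, sample):
    """Simulate one DMA transfer plus the HT/TC interrupts it can raise."""
    idx = sample_count % (2 * HALF)   # circular mode wraps the write index
    dma_buf[idx] = sample
    if idx == HALF - 1:               # half-transfer: first half just filled
        ready.append(0)
        sem.release()
    elif idx == 2 * HALF - 1:         # transfer-complete: second half just filled
        ready.append(HALF)
        sem.release()

def proc_poll(results):
    """Simulated processing thread: handle every half that is ready."""
    while sem.acquire(blocking=False):
        off = ready.popleft()
        block = dma_buf[off:off + HALF]   # a Python slice copies; firmware would just use a pointer
        results.append((off, block[0], block[-1]))

results = []
for n in range(3 * HALF):   # three halves' worth of samples
    dma_store(n, n)         # sample value = its index, so ordering is visible
    proc_poll(results)
print(results)
```

The third entry shows half 0 already holding samples 2048-3071. This is why each half has to be processed before DMA wraps around to it again.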
TIM6 (44.1 kHz) -> ADC1 conversion -> DMA -> ping-pong buffer
-> DMA half/full IRQ
-> proc_thread wakes
-> FFT + RMS + peak detect
-> UART output
In Zephyr's thread analyzer, the processing thread's CPU share rounds to 0%, and the idle thread accounts for 98% of cycles. The Cortex-M4F at 170 MHz handles a 1024-point float FFT in well under a millisecond, far inside the 23.2 ms it takes each half-buffer to fill.
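One detail of arm_rfft_fast_f32 that is easy to trip over is its output layout. As I read the CMSIS-DSP documentation, N real inputs produce N floats. out[0] holds the DC term, out[1] holds the real Nyquist term (where Im(DC) would otherwise go), and the remaining entries are re/im pairs for bins 1 to N/2 - 1. The sketch below emulates that layout on the host with NumPy. It is an illustration, not the project's audio_process.c. Function names are hypothetical.

```python
import numpy as np

def rfft_packed_like_cmsis(x):
    """Host-side emulation of the packed layout documented for arm_rfft_fast_f32.

    out[0]              = Re(X[0])   (DC)
    out[1]              = Re(X[N/2]) (Nyquist, stored where Im(X[0]) would be)
    out[2k], out[2k+1]  = Re(X[k]), Im(X[k]) for k = 1 .. N/2 - 1
    """
    x = np.asarray(x, dtype=np.float32)
    n = len(x)
    spectrum = np.fft.rfft(x)            # N/2 + 1 complex bins, unnormalized forward FFT
    out = np.empty(n, dtype=np.float32)
    out[0] = spectrum[0].real
    out[1] = spectrum[n // 2].real
    out[2::2] = spectrum[1:n // 2].real
    out[3::2] = spectrum[1:n // 2].imag
    return out

def bin_magnitudes(packed):
    """Magnitudes for bins 0 .. N/2 - 1, unpacking bin 0 separately.

    A plain complex-magnitude pass over the whole packed array would combine
    DC and Nyquist into a meaningless bin-0 value.
    """
    mags = np.empty(len(packed) // 2, dtype=np.float32)
    mags[0] = abs(packed[0])
    mags[1:] = np.hypot(packed[2::2], packed[3::2])
    return mags

N = 1024
idx = np.arange(N)
tone = np.sin(2 * np.pi * 23 * idx / N)   # exactly 23 cycles per frame -> energy in bin 23
mags = bin_magnitudes(rfft_packed_like_cmsis(tone))
print(int(np.argmax(mags)), round(float(mags[23])))
```

If the firmware passes the whole packed buffer to arm_cmplx_mag_f32, only bin 0 is affected. Skipping bin 0 during peak search (as DC usually is anyway) sidesteps the issue.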
Project Structure
audio_analysis/
├── src/
│ ├── main.c # thread setup, init sequence
│ ├── audio_capture.c/.h # TIM6 + ADC1 + DMA (STM32 LL)
│ ├── audio_process.c/.h # CMSIS-DSP FFT, RMS, peak detection
│ └── audio_output.c/.h # UART formatting (summary, CSV, raw)
├── boards/
│ └── nucleo_g474re.overlay # ADC channel + TIM6 devicetree config
├── prj.conf # Kconfig
├── plotter.py # Python live plotter (matplotlib + pyserial)
└── serial_debug.py # Quick serial diagnostic tool
Build and Flash
west build -p
west flash
The board is set in CMakeLists.txt, so no -b flag is needed.
Serial Monitor
Open a terminal on the ST-Link VCP at 115200 baud. Output looks like:
RAW:2048,2100,2200,...
FFT:0.12,0.45,3.21,...
RMS: 0.3412 | Peak: 1000.0 Hz (bin 23)
Min: -0.8123 | Max: 0.7945
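The summary lines can be reproduced on the host for a known test tone. The sketch below computes RMS, the peak bin (skipping DC), and min/max for one 1024-sample frame. It assumes samples are already converted to floats around zero; how the firmware scales raw 12-bit ADC counts is not shown here. It reports the bin-center frequency k * fs / N. With fs of about 44,098.6 Hz, each bin is about 43.1 Hz wide, so a 1 kHz tone lands in bin 23, whose center is about 990.5 Hz. The firmware's own frequency calculation may differ from this simple bin-center estimate.

```python
import numpy as np

FS = 170_000_000 / 3855   # ~44,098.6 Hz; assumes TIM6 clock / (ARR + 1) as in the hardware table
N = 1024                  # samples per frame

def analyze(frame):
    """Return (rms, peak_bin, peak_hz, min, max) for one frame of float samples."""
    x = np.asarray(frame, dtype=np.float64)
    rms = float(np.sqrt(np.mean(x * x)))
    mags = np.abs(np.fft.rfft(x))[: N // 2]   # bins 0 .. N/2 - 1
    mags[0] = 0.0                             # skip DC so an offset can't win the peak search
    k = int(np.argmax(mags))
    return rms, k, k * FS / N, float(x.min()), float(x.max())

t = np.arange(N) / FS
frame = 0.8 * np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone in +/-1 units
rms, k, hz, lo, hi = analyze(frame)
print(f"RMS: {rms:.4f} | Peak: {hz:.1f} Hz (bin {k})")
print(f"Min: {lo:.4f} | Max: {hi:.4f}")
```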
With the thread analyzer enabled, the output also includes:
Thread analyze:
0x20000150 : STACK: unused 3608 usage 488 / 4096 (11 %); CPU: 0 %
: Total CPU cycles used: 200757869
thread_analyzer : STACK: unused 512 usage 512 / 1024 (50 %); CPU: 1 %
: Total CPU cycles used: 1373467186
sysworkq : STACK: unused 808 usage 216 / 1024 (21 %); CPU: 0 %
: Total CPU cycles used: 1446
idle : STACK: unused 320 usage 64 / 384 (16 %); CPU: 98 %
: Total CPU cycles used: 129000604158
ISR0 : STACK: unused 1832 usage 216 / 2048 (10 %)
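The unnamed thread at 0x20000150 is the processing thread; the analyzer prints an address when a thread has no name. This view is very useful for seeing how threads behave, whether any stack is over-allocated, and where the bottlenecks are.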
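To track these numbers across builds, the raw cycle counts can be turned into percentages on the host. This sketch parses the output above. It assumes each thread's share is its cycle count divided by the sum of all listed threads' cycles, truncated to an integer (ISR0 prints no cycle count). That assumption reproduces the printed 0 / 1 / 0 / 98 here, but Zephyr's own denominator may include other time.

```python
import re

SAMPLE = """\
0x20000150 : STACK: unused 3608 usage 488 / 4096 (11 %); CPU: 0 %
 : Total CPU cycles used: 200757869
thread_analyzer : STACK: unused 512 usage 512 / 1024 (50 %); CPU: 1 %
 : Total CPU cycles used: 1373467186
sysworkq : STACK: unused 808 usage 216 / 1024 (21 %); CPU: 0 %
 : Total CPU cycles used: 1446
idle : STACK: unused 320 usage 64 / 384 (16 %); CPU: 98 %
 : Total CPU cycles used: 129000604158
ISR0 : STACK: unused 1832 usage 216 / 2048 (10 %)
"""

STACK_RE = re.compile(r"^\s*(\S+)\s*: STACK: unused (\d+) usage (\d+) / (\d+)")
CYCLES_RE = re.compile(r"Total CPU cycles used: (\d+)")

def parse(text):
    """Map thread name -> {used, size, cycles}; a cycles line belongs to the thread above it."""
    threads, current = {}, None
    for line in text.splitlines():
        m = STACK_RE.match(line)
        if m:
            current = m.group(1)
            threads[current] = {"used": int(m.group(3)), "size": int(m.group(4)), "cycles": None}
            continue
        m = CYCLES_RE.search(line)
        if m and current is not None:
            threads[current]["cycles"] = int(m.group(1))
    return threads

def cpu_percent(threads):
    """Integer CPU share per thread, relative to the sum of reported cycles (an assumption)."""
    timed = {name: t["cycles"] for name, t in threads.items() if t["cycles"] is not None}
    total = sum(timed.values())
    return {name: 100 * c // total for name, c in timed.items()}

threads = parse(SAMPLE)
for name, pct in cpu_percent(threads).items():
    t = threads[name]
    print(f"{name:16} CPU {pct:3d} %  stack {t['used']}/{t['size']}")
```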
Python Plotter
pip install pyserial matplotlib numpy PyQt5
python plotter.py COM5
Shows a live time-domain waveform and frequency spectrum. Data is decimated (every 4th raw sample, first 128 FFT bins) to fit within UART bandwidth at 115200 baud.
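Reading serial data on a background thread keeps the matplotlib loop from blocking on I/O. This sketch shows one way to parse the RAW: and FFT: lines into a queue; parse_line, reader, and start_reader are hypothetical names, not necessarily what plotter.py uses. reader takes any callable that returns one line as bytes, which keeps it testable without hardware.

```python
import queue
import threading

def parse_line(raw):
    """Turn b'RAW:2048,2100' or b'FFT:0.12,0.45' into (tag, [floats]).

    Returns None for summary lines (RMS/Min/Max), analyzer output, or lines
    cut off mid-transfer.
    """
    text = raw.decode("ascii", errors="replace").strip()
    tag, sep, body = text.partition(":")
    if not sep or tag not in ("RAW", "FFT"):
        return None
    try:
        return tag, [float(v) for v in body.split(",") if v]
    except ValueError:
        return None

def reader(readline, out_q):
    """Blocking read loop meant to run on a daemon thread.

    With pyserial, readline would be port.readline on a port opened without a
    timeout, so it blocks until a full line arrives. The loop ends when
    readline() returns b"" (EOF).
    """
    for raw in iter(readline, b""):
        item = parse_line(raw)
        if item is not None:
            out_q.put(item)

def start_reader(readline):
    """Start the reader thread and return the queue the GUI should poll."""
    out_q = queue.Queue()
    threading.Thread(target=reader, args=(readline, out_q), daemon=True).start()
    return out_q
```

In a plotter, the readline argument would be the readline method of an open serial.Serial port. The animation callback would drain the queue with get_nowait() and redraw only when new data has arrived. If the port is opened with a timeout, readline() returns b"" on timeout, which would end this loop, so the loop condition would need to change.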
What I Learned
Devicetree and Kconfig - Devicetree describes what hardware exists (ADC channel on PA0, TIM6 as a basic timer). Kconfig enables software features (CMSIS-DSP, FPU, thread analyzer). They answer different questions and you need both.
Where Zephyr stops and HAL starts - Zephyr's ADC API doesn't expose hardware timer triggering. For the TIM6 -> ADC1 trigger routing and DMA setup, I had to use STM32 LL functions directly. The devicetree still handles clock enablement and pin configuration, but the actual peripheral interconnection is done in C with register-level calls.
DMA ping-pong buffering - One contiguous buffer, DMA in circular mode, half-transfer and transfer-complete interrupts. While one half fills, the CPU processes the other. No memcpy, just pointer swapping.
CMSIS-DSP on Cortex-M4F - arm_rfft_fast_f32 is fast, and the FPU matters. I needed to enable the specific CMSIS-DSP Kconfig modules (TRANSFORM, COMPLEXMATH, STATISTICS) for each function family used.
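UART is the bottleneck - At 115200 baud with 8N1 framing (10 bits per byte) you can push about 11.5 KB/s. Sending 1024 raw samples as ASCII text (about 5 bytes each, e.g. "2048,") takes roughly 0.44 s on its own, and adding the FFT bins pushes a frame past the 500 ms between frames. Had to decimate the output and move serial reading to a background thread in Python.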
Thread analyzer - Adding a few Kconfig lines gives you per-thread CPU% and stack usage. My processing thread uses 11% of its stack and rounds to 0% CPU. The idle thread runs 98% of the time. Good to know before adding more features.
Timing - One sample period is 22.7 us (the ADC trigger tick). One buffer of 1024 samples is 23.2 ms (the processing deadline). These are easy to confuse. The per-sample timing is handled entirely in hardware; the CPU only needs to keep up at the buffer level.
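The timing and UART budgets above follow from a few constants. This sketch assumes the timer and CPU both run at 170 MHz and that the timer divides by 3855 (PSC = 0, ARR = 3854), which matches the 44,098.6 Hz in the hardware table. It also assumes about 5 ASCII bytes per transmitted value. These are assumptions, not values read from the firmware.

```python
TIM_CLK_HZ = 170_000_000  # assumption: TIM6 and the CPU both run at 170 MHz
TIM_PERIOD = 3855         # assumption: PSC = 0, ARR = 3854 -> ~44,098.6 Hz
N = 1024                  # samples per ping-pong half

fs = TIM_CLK_HZ / TIM_PERIOD
sample_period_us = 1e6 / fs          # per-sample tick, handled entirely by hardware
buffer_deadline_ms = 1e3 * N / fs    # time the CPU has to process one half
cycles_per_buffer = N * TIM_PERIOD   # CPU cycles per half, if CPU clock == timer clock

BAUD = 115_200
uart_bytes_per_s = BAUD / 10         # 8N1: start bit + 8 data bits + stop bit

def uart_seconds(n_values, bytes_per_value=5):
    """Rough time to send n comma-separated ASCII values ('2048,' is 5 bytes)."""
    return n_values * bytes_per_value / uart_bytes_per_s

print(f"sample period   {sample_period_us:.1f} us")
print(f"buffer deadline {buffer_deadline_ms:.1f} ms ({cycles_per_buffer} cycles)")
print(f"1024 raw values {uart_seconds(1024):.2f} s, 256 decimated {uart_seconds(256):.2f} s")
```

Each half-buffer gives the CPU roughly 3.9 million cycles. Even a generous FFT cost is a small fraction of that, which is consistent with the analyzer's near-zero CPU figure.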