Embedded systems do not have virtual memory, swap files, or an operating system that politely terminates a process when it runs out of RAM. A single stack overflow or heap fragmentation bug can silently corrupt data, crash the firmware hours later, or create a safety hazard. In this lesson you will build a UART command processor that avoids dynamic allocation entirely by using FreeRTOS static allocation APIs. You will measure stack usage with high-water marks, compare the five FreeRTOS heap schemes to understand their tradeoffs, and configure the memory protection unit (MPU) to catch an intentional buffer overrun at the hardware level.
What We Are Building
Memory-Safe Command Processor
A UART-driven command processor where every FreeRTOS object (tasks, queues, semaphores) is statically allocated. The system accepts text commands over serial, parses them, and executes actions like toggling LEDs or reporting memory statistics. Stack watermarking monitors each task’s peak usage. A deliberate buffer overflow triggers an MPU fault, demonstrating hardware-level memory protection.
Project specifications:
Parameter | Value
MCU | STM32 Blue Pill (STM32F103C8T6) or ESP32 DevKit
RTOS | FreeRTOS with static allocation enabled
Allocation mode | configSUPPORT_STATIC_ALLOCATION = 1
Heap schemes tested | heap_1, heap_2, heap_3, heap_4, heap_5
Stack monitoring | uxTaskGetStackHighWaterMark() on all tasks
MPU demo | Deliberate out-of-bounds write triggers HardFault
Interface | UART serial at 115200 baud
Components | MCU board only (reuse existing hardware)
Parts List
Ref | Component | Quantity | Notes
U1 | STM32 Blue Pill or ESP32 DevKit | 1 | Reuse from prior courses
- | USB-to-Serial adapter | 1 | If not using USB CDC (reuse from prior courses)
- | Jumper wires | Several | For UART connections if needed
Why Memory Matters in Embedded Systems
On a desktop PC, the operating system gives each process its own virtual address space backed by gigabytes of physical RAM and disk-based swap. If a process allocates too much memory, the OS can page out other processes, kill the offender, or simply slow everything down. None of this exists on a microcontroller. The STM32F103C8T6 has 20 KB of SRAM. That is the total. Every task stack, every queue buffer, every global variable, and the FreeRTOS kernel data structures all share that 20 KB with no protection between them.
Three failure modes dominate embedded memory bugs:
Stack overflow. Each FreeRTOS task has its own stack, allocated at task creation. If a function call chain goes too deep, or a local array is too large, the stack pointer grows past the allocated region and silently overwrites whatever sits below it in memory. This might be another task’s stack, a queue buffer, or the kernel’s internal state. The corruption may not cause a visible crash for minutes or hours, making it extremely difficult to diagnose.
Heap fragmentation. If your application repeatedly allocates and frees different-sized blocks, the heap gradually breaks into small non-contiguous fragments. Eventually a perfectly reasonable allocation fails because no single contiguous block is large enough, even though the total free memory is sufficient. On a desktop this might cause a minor slowdown. On a microcontroller it means your firmware stops working in the field.
Memory leaks. Forgetting to free allocated memory is especially costly on an embedded system, because the firmware never exits and nothing ever reclaims the lost blocks. The leak accumulates across days or weeks of continuous operation until the heap is exhausted.
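The fragmentation failure described above can be reproduced with a toy model on a host PC. The sketch below is not a real allocator: it treats the heap as eight equal units (HEAP_UNITS and all function names are made up for illustration), fills it with one-unit blocks, frees every second block, and then compares the total free space with the largest contiguous run.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_UNITS 8   /* toy heap: 8 equal units (illustrative size) */

/* Total number of free units in the toy heap. */
size_t total_free(const bool used[HEAP_UNITS])
{
    size_t n = 0;
    for (size_t i = 0; i < HEAP_UNITS; i++) {
        if (!used[i]) {
            n++;
        }
    }
    return n;
}

/* Longest run of adjacent free units, i.e. the largest single
 * allocation that could still succeed. */
size_t largest_free_run(const bool used[HEAP_UNITS])
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < HEAP_UNITS; i++) {
        if (!used[i]) {
            run++;
            if (run > best) {
                best = run;
            }
        } else {
            run = 0;
        }
    }
    return best;
}

/* Fill the heap with one-unit blocks, then free every second block. */
void fragment(bool used[HEAP_UNITS])
{
    for (size_t i = 0; i < HEAP_UNITS; i++) {
        used[i] = true;
    }
    for (size_t i = 0; i < HEAP_UNITS; i += 2) {
        used[i] = false;
    }
}
```

After fragment() runs, half of the heap is free, yet a request for two adjacent units cannot be satisfied because no free run is longer than one unit.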
STM32F103 Memory Map (20 KB SRAM)
──────────────────────────────────
0x2000_5000 ┌──────────────────┐
            │ FreeRTOS Heap    │ configTOTAL_HEAP_SIZE
            │ (task stacks,    │ = 10 KB
            │ queues, TCBs)    │
0x2000_2800 ├──────────────────┤
            │ .bss + .data     │ Global variables
            │ (static alloc)   │ SSD1306 framebuf
0x2000_0400 ├──────────────────┤
            │ MSP (main        │ Used before
            │ stack pointer)   │ scheduler starts
0x2000_0000 └──────────────────┘
Every byte counts. A stack overflow in one
task silently corrupts another task's data.
The solution used by most safety-critical embedded systems is straightforward: avoid dynamic allocation entirely. Allocate everything at startup, use fixed-size buffers, and verify at compile time that everything fits. FreeRTOS supports this approach through its static allocation API, and this lesson is built around that philosophy.
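One way to make the "verify that everything fits" step mechanical is a C11 static assertion over the sizes of the statically allocated buffers. The sketch below is an illustration only: the buffer names, the stack depths, and the RESERVED_BYTES allowance are assumptions, not values taken from the project. Only the 20 KB SRAM size and the 8 x 64 byte pool come from this lesson.

```c
#include <stddef.h>
#include <stdint.h>

#define SRAM_BYTES     (20u * 1024u)  /* STM32F103C8T6 total SRAM */
#define RESERVED_BYTES (4u * 1024u)   /* assumed allowance: MSP stack, libc, other globals */

/* Hypothetical application buffers; depths are illustrative only. */
uint32_t uart_rx_stack[256];
uint32_t command_stack[256];
uint32_t monitor_stack[192];
uint8_t  cmd_pool[8][64];

#define STATIC_BYTES (sizeof(uart_rx_stack) + sizeof(command_stack) + \
                      sizeof(monitor_stack) + sizeof(cmd_pool))

/* The build stops here if the buffers outgrow the budget. */
_Static_assert(STATIC_BYTES + RESERVED_BYTES <= SRAM_BYTES,
               "static allocations exceed the SRAM budget");

/* Runtime view of the same number, e.g. for a status command. */
size_t static_bytes_used(void)
{
    return STATIC_BYTES;
}
```

The assertion only covers the buffers it names, so the linker map file remains the authoritative record of total RAM usage.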
FreeRTOS Heap Schemes
FreeRTOS does not use the standard C library malloc and free by default. Instead, it provides five heap implementation files in the portable/MemMang/ directory. You link exactly one of these into your project, and all FreeRTOS internal allocations (tasks, queues, semaphores created with the dynamic API) go through it.
Scheme | Allocate | Free | Coalescing | Best For
heap_1 | Yes | No | N/A | Systems that create all objects at startup and never delete them. Simplest, fully deterministic, zero fragmentation risk.
heap_2 | Yes | Yes | No | Systems that create and delete objects of the same size. Uses best-fit algorithm. Risk: fragmentation if block sizes vary.
heap_3 | Yes | Yes | Depends on libc | Wraps standard malloc/free with scheduler suspension for thread safety. Useful when you need compatibility with third-party libraries that call malloc.
heap_4 | Yes | Yes | Yes | General-purpose embedded use. Combines adjacent free blocks to reduce fragmentation. The most common choice for non-safety-critical applications.
heap_5 | Yes | Yes | Yes | Like heap_4 but supports non-contiguous memory regions. Required when your MCU has multiple RAM banks (e.g., STM32F4 with CCM and main SRAM).
Heap Scheme Decision Tree
────────────────────────────────────
Create objects at startup only?
  YES ──► heap_1 (no free, simplest)
  NO  ──► Same-size blocks?
            YES ──► heap_2 (no coalesce)
            NO  ──► Multiple RAM banks?
                      YES ──► heap_5
                      NO  ──► Need libc malloc?
                                YES ──► heap_3
                                NO  ──► heap_4 (best general)
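The decision tree reads naturally as a chain of early returns. The function below is simply the tree written as C, with the questions asked in the same order. The type and function names are invented for this illustration and are not part of FreeRTOS.

```c
#include <stdbool.h>

typedef enum { HEAP_1 = 1, HEAP_2, HEAP_3, HEAP_4, HEAP_5 } heap_scheme_t;

/* Answers are checked in the same order as the decision tree. */
heap_scheme_t choose_heap_scheme(bool startup_only,
                                 bool same_size_blocks,
                                 bool multiple_ram_banks,
                                 bool need_libc_malloc)
{
    if (startup_only) {
        return HEAP_1;      /* allocate only, never free */
    }
    if (same_size_blocks) {
        return HEAP_2;      /* free without coalescing */
    }
    if (multiple_ram_banks) {
        return HEAP_5;      /* heap spans several regions */
    }
    if (need_libc_malloc) {
        return HEAP_3;      /* thread-safe wrapper around malloc/free */
    }
    return HEAP_4;          /* general purpose, coalescing */
}
```

Because the checks are ordered, a system that both needs multiple RAM banks and uses libc malloc lands on heap_5, exactly as it does in the tree.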
Selecting a Heap Scheme
In your build system, you include exactly one of the five source files. For example, in a Makefile:
# Choose ONE of these:
SRCS += $(FREERTOS)/portable/MemMang/heap_1.c # Allocate only
# SRCS += $(FREERTOS)/portable/MemMang/heap_2.c # Alloc + free, no coalescing
If you use PlatformIO with the STM32Cube framework, the heap scheme is typically configured through the build flags or by placing the desired heap_X.c file in your source tree.
Heap Statistics
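When using heap_1, heap_2, heap_4, or heap_5, you can query the free heap at runtime with xPortGetFreeHeapSize. heap_4 and heap_5 additionally track the lowest free level seen so far:
char buf[64];
size_t free_heap = xPortGetFreeHeapSize();
size_t min_ever = xPortGetMinimumEverFreeHeapSize(); /* heap_4 and heap_5 only */
snprintf(buf, sizeof(buf), "Heap free: %u bytes, min ever: %u bytes\r\n",
         (unsigned)free_heap, (unsigned)min_ever);
uart_send_string(buf);
The xPortGetMinimumEverFreeHeapSize function returns the smallest amount of free heap that has existed since boot. If this number approaches zero, your system is close to running out of memory. Only heap_4 and heap_5 implement it, so drop that call when building with heap_1 or heap_2.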
Static Allocation
The safest approach to memory on embedded systems is to eliminate dynamic allocation entirely. FreeRTOS supports this through its static allocation API. When configSUPPORT_STATIC_ALLOCATION is set to 1 in FreeRTOSConfig.h, you can create tasks, queues, semaphores, and timers using caller-provided buffers. The kernel never calls pvPortMalloc for these objects.
Enabling Static Allocation
FreeRTOSConfig.h
#define configSUPPORT_STATIC_ALLOCATION 1
/* You can disable dynamic allocation entirely if desired: */
#define configSUPPORT_DYNAMIC_ALLOCATION 0
When static allocation is enabled, FreeRTOS requires you to provide memory for the idle task and (if software timers are used) the timer task. You do this by implementing two callback functions:
/* Required when configSUPPORT_STATIC_ALLOCATION == 1 */
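A minimal sketch of the two callbacks follows. The callback names and the out-parameter pattern are the ones FreeRTOS expects. Treat the rest as illustrative: the `#ifndef` block contains host-only stand-ins so the sketch also compiles off-target, and the last parameter is written as `uint32_t *` as in classic kernel releases, whereas some newer releases use a different integer type there, so match the prototype in your own task.h and timers.h.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include("FreeRTOS.h")
#    include "FreeRTOS.h"
#    include "task.h"
#    define SKETCH_HAVE_FREERTOS 1
#  endif
#endif

#ifndef SKETCH_HAVE_FREERTOS
/* Host-only stand-ins (NOT the real definitions) so the sketch
 * compiles without the kernel headers. */
typedef struct { uint32_t opaque[32]; } StaticTask_t;
typedef uint32_t StackType_t;
#define configMINIMAL_STACK_SIZE     128
#define configTIMER_TASK_STACK_DEPTH 128
#endif

/* Idle task: TCB and stack live in .bss and show up in the map file. */
void vApplicationGetIdleTaskMemory(StaticTask_t **ppxIdleTaskTCBBuffer,
                                   StackType_t **ppxIdleTaskStackBuffer,
                                   uint32_t *pulIdleTaskStackSize)
{
    static StaticTask_t xIdleTaskTCB;
    static StackType_t  uxIdleTaskStack[configMINIMAL_STACK_SIZE];

    *ppxIdleTaskTCBBuffer   = &xIdleTaskTCB;
    *ppxIdleTaskStackBuffer = uxIdleTaskStack;
    *pulIdleTaskStackSize   = configMINIMAL_STACK_SIZE;     /* in words */
}

/* Timer service task: only needed when configUSE_TIMERS is 1. */
void vApplicationGetTimerTaskMemory(StaticTask_t **ppxTimerTaskTCBBuffer,
                                    StackType_t **ppxTimerTaskStackBuffer,
                                    uint32_t *pulTimerTaskStackSize)
{
    static StaticTask_t xTimerTaskTCB;
    static StackType_t  uxTimerTaskStack[configTIMER_TASK_STACK_DEPTH];

    *ppxTimerTaskTCBBuffer   = &xTimerTaskTCB;
    *ppxTimerTaskStackBuffer = uxTimerTaskStack;
    *pulTimerTaskStackSize   = configTIMER_TASK_STACK_DEPTH; /* in words */
}
```

The buffers are declared static inside each function so they persist after the callback returns. A stack-allocated buffer here would be a bug.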
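The key benefit is that all memory is visible in the linker map file. You can verify at build time that your total static allocations fit in the available SRAM: if they do not, the linker reports a RAM region overflow instead of the firmware failing in the field. No runtime surprises.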
Stack Watermarking
Every FreeRTOS task has a fixed-size stack allocated at creation time. If you allocate too little, the task overflows its stack and corrupts adjacent memory. If you allocate too much, you waste precious SRAM. Stack watermarking helps you find the right balance.
How It Works
When FreeRTOS creates a task, it fills the entire stack with a known pattern: 0xA5A5A5A5 (the exact value depends on the port). As the task runs, function calls and local variables overwrite this pattern from the top of the stack downward. The uxTaskGetStackHighWaterMark function scans from the bottom of the stack upward, counting how many words still contain the fill pattern. The result is the high-water mark: the minimum number of unused stack words since the task started.
            │ ---- watermark ----   │ <- Deepest point ever reached
            │ local variables       │
            │ saved registers       │
            │ return addresses      │
Low address │ stack pointer (SP)    │ <- Current top of stack
            └───────────────────────┘
A high-water mark of 20 words means the task came within 20 words (80 bytes on a 32-bit platform) of overflowing. As a rule of thumb, you want at least 20 to 30% of the stack to remain unused as a safety margin.
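The scan itself is easy to model on a host PC. The sketch below imitates the mechanism on a plain array: fill with the pattern, overwrite words from the high-address end to simulate a descending stack, then count untouched words from the low-address end. It is a simplified model, not the kernel's code, and every name in it is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64
#define FILL_WORD   0xA5A5A5A5u

/* Fill the whole stack with the known pattern, as done at task creation. */
void stack_fill(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = FILL_WORD;
    }
}

/* Simulate a descending stack that grew `depth` words down from the top. */
void stack_use(uint32_t *stack, size_t words, size_t depth)
{
    for (size_t i = 0; i < depth && i < words; i++) {
        stack[words - 1 - i] = 0xDEADBEEFu;   /* any non-pattern value */
    }
}

/* Count words at the low-address end that still hold the pattern. */
size_t stack_high_water_mark(const uint32_t *stack, size_t words)
{
    size_t unused = 0;
    while (unused < words && stack[unused] == FILL_WORD) {
        unused++;
    }
    return unused;
}
```

With a 64-word stack and a peak depth of 44 words the function returns 20, which is the 80-byte margin from the example above. A later, shallower call chain does not raise the number, because the pattern is never restored.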
Monitoring Task
This task prints the high-water mark for every task in the system at regular intervals:
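static void vMonitorTask(void *pvParameters) {
    char buf[64];
    (void)pvParameters;
    for (;;) {
        uart_send_string("\r\n--- Stack High-Water Marks ---\r\n");
        /* NULL queries the calling task; pass a task handle to query another task */
        snprintf(buf, sizeof(buf), " Monitor  : %u words\r\n",
                 (unsigned)uxTaskGetStackHighWaterMark(NULL));
        uart_send_string(buf);
        snprintf(buf, sizeof(buf), " Heap min : %u bytes\r\n",
                 (unsigned)xPortGetMinimumEverFreeHeapSize());
        uart_send_string(buf);
        vTaskDelay(pdMS_TO_TICKS(5000));
    }
}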
Using vTaskList for a Formatted Task Table
FreeRTOS can produce a formatted table of all tasks if you enable configUSE_TRACE_FACILITY and configUSE_STATS_FORMATTING_FUNCTIONS in FreeRTOSConfig.h:
#define configUSE_TRACE_FACILITY 1
#define configUSE_STATS_FORMATTING_FUNCTIONS 1
Then call vTaskList:
static char task_list_buf[512]; /* static: keeps 512 bytes off the caller's stack */
vTaskList(task_list_buf);
uart_send_string("Name State Prio Stack Num\r\n");
uart_send_string(task_list_buf);
The output looks like this:
Name      State  Prio  Stack  Num
UartRx    R      2     82     1
Command   B      2     104    2
Monitor   R      1     156    3
IDLE      R      0     58     4
State codes: R = Ready, B = Blocked, S = Suspended, D = Deleted. The Stack column shows the high-water mark in words.
Stack Overflow Detection
Stack watermarking tells you how close a task came to overflow, but it is passive. You check it periodically and hope you catch problems before they cause damage. FreeRTOS also provides active stack overflow detection that catches overflows as they happen (or shortly after).
Method 1: Stack Pointer Check
Set configCHECK_FOR_STACK_OVERFLOW to 1 in FreeRTOSConfig.h:
#define configCHECK_FOR_STACK_OVERFLOW 1
At every context switch, the kernel checks whether the current task’s stack pointer has gone past the end of its allocated stack. If it has, the kernel calls vApplicationStackOverflowHook. This method is fast (one comparison per context switch) but can miss overflows that happen and recover within a single time slice. If the stack briefly overflows during a deep function call but returns before the next context switch, the damage is done but the check does not catch it.
Method 2: Pattern Check
Set configCHECK_FOR_STACK_OVERFLOW to 2:
#define configCHECK_FOR_STACK_OVERFLOW 2
In addition to the stack pointer check, the kernel verifies that the last few words at the end of the stack still contain the fill pattern (0xA5A5A5A5). The guard region is 16 or 20 bytes, depending on the kernel version and the stack growth direction. If any of those bytes have been overwritten, the overflow hook is called. This catches more cases than Method 1 because even transient overflows that corrupt the guard region are detected. The overhead is slightly higher (comparing a handful of words instead of one pointer).
The Overflow Hook
You must implement this function. The kernel calls it during a context switch (on Cortex-M ports that is inside the PendSV handler), so it runs in interrupt context and must not call any blocking functions:
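A minimal sketch of such a hook is shown here. The hook name and its two parameters follow the FreeRTOS convention, although older kernel releases declare the name as `signed char *`. The record_overflow helper, the g_overflow_task buffer, and the host-only stand-ins in the `#ifndef` block are assumptions added for illustration.

```c
#include <stdint.h>
#include <string.h>

#if defined(__has_include)
#  if __has_include("FreeRTOS.h")
#    include "FreeRTOS.h"
#    include "task.h"
#    define SKETCH_HAVE_FREERTOS 1
#  endif
#endif

#ifndef SKETCH_HAVE_FREERTOS
/* Host-only stand-ins (NOT the real definitions). */
typedef void *TaskHandle_t;
#define taskDISABLE_INTERRUPTS() ((void)0)
#endif

#define OVERFLOW_NAME_LEN 16

/* Hypothetical record of the offending task, readable from a debugger. */
static char g_overflow_task[OVERFLOW_NAME_LEN];

/* Copy the task name with guaranteed termination; no blocking calls. */
void record_overflow(const char *name)
{
    strncpy(g_overflow_task, name ? name : "?", OVERFLOW_NAME_LEN - 1);
    g_overflow_task[OVERFLOW_NAME_LEN - 1] = '\0';
}

void vApplicationStackOverflowHook(TaskHandle_t xTask, char *pcTaskName)
{
    (void)xTask;
    record_overflow(pcTaskName);
    taskDISABLE_INTERRUPTS();
    for (;;) {
        /* Halt here: memory is already corrupted, so do not try to
         * continue. A production build would typically reset instead. */
    }
}
```

The hook deliberately does very little. By the time it runs, the overflow has already damaged memory next to the stack, so anything more elaborate than recording the name and stopping may itself misbehave.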
When the overflow is detected, you will see the hook fire with the offending task’s name. The recursion depth at which it fires depends on the task’s stack size and the overhead per frame.
Memory Protection Unit (MPU)
The Memory Protection Unit is a hardware feature that restricts which memory regions a piece of code can access. When a task tries to read or write an address outside its permitted regions, the MPU generates a fault exception (MemManage or HardFault), stopping the offending code immediately rather than letting it silently corrupt memory.
STM32F1 Limitation
The STM32F103C8T6 (Cortex-M3) used on the Blue Pill does not include an MPU. The Cortex-M3 architecture defines the MPU as optional, and ST chose not to include it on this part. If you need hardware memory protection, you will need a Cortex-M4 or Cortex-M7 device (STM32F4, F7, H7) or an ESP32, all of which include an MPU or equivalent memory protection.
FreeRTOS MPU Port
FreeRTOS provides an MPU-aware port for Cortex-M devices with an MPU. In this configuration, the idle task and kernel code run in privileged mode with full memory access. User tasks run in unprivileged mode and can only access:
Their own stack
Up to three additional memory regions explicitly granted at task creation
Shared read-only regions (like flash for code and constants)
A task that tries to access another task’s stack or an ungranted peripheral register triggers a MemManage fault.
Conceptual Configuration
While we cannot demonstrate the full MPU port on the Blue Pill (it lacks the hardware), here is how you would configure it on an STM32F4:
/* Task definition with MPU regions (Cortex-M4/M7 only) */
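On the FreeRTOS side, the granted regions are passed in the xRegions array of the TaskParameters_t structure given to xTaskCreateRestricted. Whatever regions you grant must also satisfy the ARMv7-M MPU hardware rules used by Cortex-M3, M4, and M7: the size is a power of two of at least 32 bytes, and the base address is aligned to that size. The helper below is a host-side sketch of that check, with invented names, which you might run over a region table before handing it to the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define MPU_MIN_REGION_BYTES 32u   /* smallest ARMv7-M MPU region */

static bool is_power_of_two(uint32_t x)
{
    return x != 0u && (x & (x - 1u)) == 0u;
}

/* True if (base, size_bytes) can be programmed as one ARMv7-M MPU region. */
bool mpu_region_is_valid(uint32_t base, uint32_t size_bytes)
{
    if (size_bytes < MPU_MIN_REGION_BYTES || !is_power_of_two(size_bytes)) {
        return false;
    }
    /* The base address must be a multiple of the region size. */
    return (base & (size_bytes - 1u)) == 0u;
}
```

For example, a 1 KB buffer at 0x20000400 qualifies, while the same buffer at 0x20000200 does not, which is why buffers shared through the MPU are usually declared with an explicit alignment attribute.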
Even without an MPU, the Cortex-M3 still faults on some invalid accesses. Writing to an address that does not map to any peripheral or SRAM region triggers a BusFault, which escalates to a HardFault if BusFault is not explicitly enabled. We can demonstrate this:
static void trigger_hardfault(void) {
    uart_send_string("Writing to invalid address 0x60000000...\r\n");
    /* Small delay so the UART output completes */
    vTaskDelay(pdMS_TO_TICKS(50));
    /* This address is not mapped to any memory or peripheral on STM32F1 */
    volatile uint32_t *bad_ptr = (volatile uint32_t *)0x60000000;
    *bad_ptr = 0xDEADBEEF;
}

void HardFault_Handler(void) {
    /* In production: log registers, reset the device */
    taskDISABLE_INTERRUPTS();
    for (;;);
}
The key takeaway: on devices with an MPU, every task gets its own sandbox. On devices without one (like the Blue Pill), you rely on defensive programming, static analysis, and stack overflow detection to catch bugs before deployment.
Memory Pools
Dynamic allocation (even with heap_4 coalescing) carries fragmentation risk when block sizes vary. A memory pool eliminates this risk entirely by pre-allocating a fixed number of identically sized blocks. Allocation and deallocation are both O(1), with zero fragmentation by design.
Pool Using a FreeRTOS Queue
The simplest way to implement a memory pool in FreeRTOS is to use a queue of pointers. At initialization, you create N buffers and push their addresses into a queue. To allocate, a task takes a pointer from the queue. To free, it gives the pointer back. The queue handles all synchronization automatically.
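/* Allocate a block; xPoolQueue was preloaded with N block pointers at init */
void *pool_alloc(TickType_t timeout) {
    void *ptr = NULL;
    if (xQueueReceive(xPoolQueue, &ptr, timeout) == pdPASS) {
        return ptr;
    }
    return NULL; /* Pool exhausted */
}

/* Free a block back to the pool */
void pool_free(void *ptr) {
    xQueueSend(xPoolQueue, &ptr, 0);
}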
Why This Works
The queue enforces a maximum of N outstanding allocations. Every block is the same size, so there is no fragmentation. The queue’s built-in blocking means a task that tries to allocate from an empty pool can either wait for a block to be returned or time out and handle the failure gracefully. ISR-safe variants (xQueueReceiveFromISR, xQueueSendFromISR) let you use the pool from interrupt context too.
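The bookkeeping behind a pointer-queue pool can be checked on a host PC with a small model. In the sketch below a ring buffer of pointers stands in for the FreeRTOS queue. It is single-threaded and has no blocking, so it illustrates only the allocation logic, not the synchronization the real queue provides. All names are invented, and the 8 x 64 byte geometry mirrors the pool used in this project.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     8
#define POOL_BLOCK_SIZE 64

static uint8_t pool_storage[POOL_BLOCKS][POOL_BLOCK_SIZE];

/* Host stand-in for the queue: a ring buffer of free-block pointers. */
static void  *free_slots[POOL_BLOCKS];
static size_t head, tail, count;

/* Push the address of every block into the ring, like preloading the queue. */
void pool_model_init(void)
{
    head = tail = count = 0;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        free_slots[tail] = pool_storage[i];
        tail = (tail + 1) % POOL_BLOCKS;
        count++;
    }
}

/* Take one pointer from the ring; NULL means the pool is exhausted. */
void *pool_model_alloc(void)
{
    if (count == 0) {
        return NULL;
    }
    void *p = free_slots[head];
    head = (head + 1) % POOL_BLOCKS;
    count--;
    return p;
}

/* Give a pointer back; returns -1 for NULL or if the ring is already full. */
int pool_model_free(void *p)
{
    if (p == NULL || count == POOL_BLOCKS) {
        return -1;
    }
    free_slots[tail] = p;
    tail = (tail + 1) % POOL_BLOCKS;
    count++;
    return 0;
}
```

Both operations touch one slot and a couple of indices, which is where the O(1) cost comes from, and the ring can never hold more than POOL_BLOCKS pointers, which is what caps the number of outstanding allocations.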
In the complete project below, command buffers are allocated from a memory pool. Each UART command string gets a pool block, passes through the command queue, and returns to the pool after processing.
Complete Project: Memory-Safe Command Processor
This is the full application. All FreeRTOS objects are statically allocated. Three tasks cooperate: a UART receiver assembles incoming characters into command strings, a command processor parses and executes them, and a monitor task prints stack and heap statistics every 5 seconds. Command buffers come from a memory pool built on a queue of pointers.
Supported commands:
Command | Action
led on | Turn on the onboard LED (PC13)
led off | Turn off the onboard LED (PC13)
status | Print stack watermarks and heap statistics
stress | Allocate and free memory blocks to demonstrate fragmentation
overflow | Trigger a deliberate stack overflow for demonstration
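The commands in the table can be recognized with a small lookup table and no dynamic memory. The sketch below is one possible parser, not the project's own code. The enum and function names are invented, matching is case-sensitive, and the input line is assumed to have its trailing CR/LF already stripped by the receiver.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    CMD_UNKNOWN,
    CMD_LED_ON,
    CMD_LED_OFF,
    CMD_STATUS,
    CMD_STRESS,
    CMD_OVERFLOW
} command_t;

/* Map a complete command line to its identifier. */
command_t parse_command(const char *line)
{
    /* The table lives in flash (const) and never changes at runtime. */
    static const struct { const char *text; command_t id; } table[] = {
        { "led on",   CMD_LED_ON   },
        { "led off",  CMD_LED_OFF  },
        { "status",   CMD_STATUS   },
        { "stress",   CMD_STRESS   },
        { "overflow", CMD_OVERFLOW },
    };

    if (line == NULL) {
        return CMD_UNKNOWN;
    }
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(line, table[i].text) == 0) {
            return table[i].id;
        }
    }
    return CMD_UNKNOWN;
}
```

The caller can then switch on the returned value, which keeps string handling in one place and makes unknown input an explicit case.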
ESP_LOGI(TAG, "Commands: led on, led off, status, stress, overflow");
/* ESP-IDF scheduler is already running; app_main can return */
}
Note the ESP32 differences: stack sizes are much larger (ESP32 tasks typically need 2048+ bytes), the UART driver is provided by ESP-IDF rather than bare-metal register access, and app_main returns normally because the ESP-IDF scheduler starts before app_main is called. The vApplicationGetIdleTaskMemory and vApplicationGetTimerTaskMemory callbacks are not needed because ESP-IDF handles idle and timer task allocation internally.
Data Flow
┌──────────────┐  char *ptr  ┌──────────────┐
│ UartRx       │────────────>│ Command      │
│ (assembles   │  xCmdQueue  │ (parses and  │
│ commands)    │  depth = 8  │ executes)    │
│ Priority 2   │             │ Priority 2   │
└──────┬───────┘             └──────┬───────┘
       │                            │
       │ pool_alloc()               │ pool_free()
       v                            v
┌──────────────────────────────────────────┐
│ Memory Pool (8 x 64 bytes)               │
│ Queue of pointers: take = alloc,         │
│ give = free                              │
└──────────────────────────────────────────┘
┌──────────────┐
│ Monitor      │ Prints stack watermarks and
│ (periodic)   │ heap statistics every 5 seconds
│ Priority 1   │
└──────────────┘
The UartRx task allocates a buffer from the pool when the first character of a new command arrives. Once the command is complete (newline received), the buffer pointer is sent through xCmdQueue. The Command task receives the pointer, processes the command, and returns the buffer to the pool. Ownership of each buffer is always clear: exactly one task holds a reference at any time.
Project Structure
memory-safe-cmdproc/
├── src/
│   ├── main.c
│   ├── uart.c
│   ├── uart.h
│   ├── clock.c
│   └── clock.h
├── include/
│   └── FreeRTOSConfig.h
├── Makefile
└── platformio.ini
FreeRTOSConfig.h Key Settings
These settings must be present for static allocation, stack monitoring, and overflow detection:
FreeRTOSConfig.h
#define configUSE_PREEMPTION 1
#define configTICK_RATE_HZ 1000
#define configMAX_PRIORITIES 5
#define configMINIMAL_STACK_SIZE 128
#define configTOTAL_HEAP_SIZE ((size_t)(4 * 1024))
/* Static allocation (the core of this lesson) */
#define configSUPPORT_STATIC_ALLOCATION 1
#define configSUPPORT_DYNAMIC_ALLOCATION 1 /* Keep for heap stats demo */
/* Software timers (needed for timer task memory callback) */
#define configUSE_TIMERS 1
#define configTIMER_TASK_STACK_DEPTH 128
#define configTIMER_TASK_PRIORITY 3
#define configTIMER_QUEUE_LENGTH 5
The total heap size is set to 4 KB. Since all tasks, queues, and the mutex are statically allocated, the heap is only used for the stress test demonstration. In a production system with configSUPPORT_DYNAMIC_ALLOCATION set to 0, you would not need the heap at all.
PlatformIO Configuration
; platformio.ini
[env:bluepill]
platform = ststm32
board = bluepill_f103c8
framework = stm32cube
build_flags =
    -DUSE_HAL_DRIVER
    -DSTM32F103xB
[env:esp32]
platform = espressif32
board = esp32dev
framework = espidf
Experiments
Switch Heap Schemes and Compare
Build the project four times, linking heap_1, heap_2, heap_4, and heap_5 in turn. Run the “stress” command with each scheme. With heap_1 the stress test will fail (free is a no-op). With heap_2 and heap_4, compare the free heap reported after the stress cycle. Log your findings and note which scheme recovers all the memory.
Implement a Custom Allocator
Write a simple bump allocator that hands out memory from a fixed byte array, incrementing a pointer for each allocation and never freeing. Integrate it as a replacement for pvPortMalloc by defining your own pvPortMalloc and vPortFree functions. Measure how much faster it is than heap_4 by timing 1000 allocations.
Add a Memory Leak Detector
Modify the pool allocator to track which task allocated each block (store the task handle alongside the pointer). In the monitor task, print any blocks that have been held for longer than 10 seconds. This simulates a simple leak detector that identifies tasks holding resources too long.
Tune Stack Sizes with Watermarks
Start every task with a generous stack (512 words). Run all commands including “stress”. Check the high-water marks from the “status” output. Reduce each task’s stack to its measured peak usage plus a 30% margin. Rebuild and verify no overflows occur. Document the before and after SRAM usage.
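The sizing arithmetic in this experiment is easy to get wrong by hand, so a tiny helper can do it. The function below is a sketch with an invented name: it derives the peak usage from the allocated depth and the reported high-water mark, adds the margin as a percentage, and rounds up to a whole word.

```c
#include <stdint.h>

/* Recommended stack depth in words: peak usage plus margin_percent,
 * rounded up. Peak usage = allocated depth - high-water mark. */
uint32_t recommended_stack_words(uint32_t allocated_words,
                                 uint32_t high_water_words,
                                 uint32_t margin_percent)
{
    uint32_t peak = 0;
    if (high_water_words < allocated_words) {
        peak = allocated_words - high_water_words;
    }
    /* Integer ceiling of peak * (100 + margin) / 100. */
    return (peak * (100u + margin_percent) + 99u) / 100u;
}
```

For a task created with 512 words that reports a high-water mark of 356, peak usage is 156 words and the 30% margin gives a new depth of 203 words.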