Overview
Modern high-performance computing hardware generates significant thermal loads that, if unmanaged, lead to throttling, reduced component lifespan, and system instability. This project developed a closed-loop thermal management system using cascade PID control logic implemented in embedded C on an STM32 microcontroller.
Problem Statement
The target system, a prototype edge-computing module, reached peak thermal loads of 85 W in a chassis limited to 1.2 L in volume. Passive cooling was insufficient, and naive fan speed control created unacceptable acoustic signatures under variable workloads.
Approach
A multi-zone temperature sensing array (8× NTC thermistors) fed into a cascade PID controller. The primary loop regulated junction temperature; a secondary loop modulated airflow across the heat exchanger. Initial gains were set using the Ziegler–Nichols method and refined through iterative hardware-in-the-loop testing.
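The cascade structure described above can be sketched roughly as follows. This is a minimal illustration, not the project's firmware: the type and function names (`pid_ctrl_t`, `pid_tune_zn`, `cascade_step`) are hypothetical, airflow is assumed to be normalized to 0..1 and measured by some means such as a fan tachometer, and the tuning helper assumes the classic closed-loop Ziegler-Nichols PID rule (the write-up does not say which variant was used).

```c
#include <assert.h>

/* Hypothetical PID state; field names are illustrative only. */
typedef struct {
    float kp, ki, kd;        /* controller gains */
    float integral;          /* accumulated error * dt */
    float prev_error;        /* error seen on the previous step */
    float out_min, out_max;  /* output limits */
} pid_ctrl_t;

/* Classic Ziegler-Nichols PID rule from ultimate gain ku and
 * oscillation period tu (seconds): Kp = 0.6 Ku, Ti = Tu/2, Td = Tu/8. */
static void pid_tune_zn(pid_ctrl_t *c, float ku, float tu)
{
    assert(ku > 0.0f && tu > 0.0f);
    c->kp = 0.6f * ku;
    c->ki = 1.2f * ku / tu;     /* Kp / Ti */
    c->kd = 0.075f * ku * tu;   /* Kp * Td */
}

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* One PID update; dt is the sample period in seconds. */
static float pid_step(pid_ctrl_t *c, float error, float dt)
{
    assert(dt > 0.0f);
    c->integral += error * dt;
    float derivative = (error - c->prev_error) / dt;
    c->prev_error = error;
    float u = c->kp * error + c->ki * c->integral + c->kd * derivative;
    return clampf(u, c->out_min, c->out_max);
}

/* Cascade: the outer loop turns junction-temperature error into an
 * airflow setpoint; the inner loop turns airflow error into fan duty. */
static float cascade_step(pid_ctrl_t *outer, pid_ctrl_t *inner,
                          float t_junction_c, float t_setpoint_c,
                          float airflow_meas, float dt)
{
    /* Cooling is reverse-acting: hotter than setpoint asks for more air. */
    float airflow_sp = pid_step(outer, t_junction_c - t_setpoint_c, dt);
    return pid_step(inner, airflow_sp - airflow_meas, dt);
}
```

In a sketch like this, the inner loop would normally run faster than the outer one, since airflow responds more quickly than junction temperature. Gains from `pid_tune_zn` serve only as a starting point for further tuning.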
Key design decisions included:
- Anti-windup clamping to prevent integrator saturation during prolonged thermal load steps
- Feed-forward compensation using workload telemetry from the host CPU
- Hysteresis bands around fan speed setpoints to suppress limit cycling
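The three decisions above could look something like the sketch below. It is an illustration under stated assumptions rather than the project's code: the names (`pi_ff_t`, `pi_ff_step`, `fan_hysteresis`) are hypothetical, feed-forward is assumed to be a simple gain on a reported load in watts, and the hysteresis band is interpreted as "hold the current fan command until the request moves outside a band around it".

```c
#include <assert.h>

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical PI controller with a clamped integrator and an
 * additive feed-forward term. */
typedef struct {
    float kp, ki;
    float integral;          /* integral term, already scaled by ki */
    float i_min, i_max;      /* anti-windup limits on the integral term */
    float kff;               /* feed-forward gain: output per watt (assumed) */
    float out_min, out_max;  /* actuator limits */
} pi_ff_t;

static float pi_ff_step(pi_ff_t *c, float error, float load_w, float dt)
{
    assert(dt > 0.0f);
    /* Anti-windup clamping: bound the integral term so it cannot keep
     * growing while the actuator is saturated during a long load step. */
    c->integral = clampf(c->integral + c->ki * error * dt,
                         c->i_min, c->i_max);
    /* Feed-forward: respond to reported workload before the
     * temperature error has had time to develop. */
    float u = c->kp * error + c->integral + c->kff * load_w;
    return clampf(u, c->out_min, c->out_max);
}

/* Hysteresis: keep the current fan duty unless the requested duty
 * differs from it by more than the band. */
static float fan_hysteresis(float current_duty, float requested_duty,
                            float band)
{
    assert(band >= 0.0f);
    float delta = requested_duty - current_duty;
    if (delta > band || delta < -band) {
        return requested_duty;
    }
    return current_duty;
}
```

Because the integral term is bounded, it starts unwinding as soon as the error changes sign instead of first having to burn off a large accumulated value.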
Results
- Peak junction temperature reduced by 14 °C under sustained load
- Thermal throttling events reduced by 31% over a 4-hour test cycle
- Fan acoustic output reduced by ~6 dB(A) versus the prior on/off control strategy
- System settling time under 200 ms to 90% setpoint recovery following a step input
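For readers who want to reproduce a settling-time figure like the last one from logged data, the sketch below shows one possible way to compute it. The write-up does not describe how the metric was measured, so this interprets "90% setpoint recovery" as the time from the peak deviation until the error stays within 10% of that peak. The function name `recovery_time_ms` and the fixed sample period are assumptions.

```c
#include <assert.h>
#include <stddef.h>

static float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/* Returns the time in ms from the peak deviation until the error stays
 * within 10% of that peak, or -1 if the log ends before it settles. */
static long recovery_time_ms(const float *temp_c, size_t n,
                             float setpoint_c, unsigned sample_period_ms)
{
    assert(temp_c != NULL && n > 0 && sample_period_ms > 0);

    /* Find the largest deviation from the setpoint and where it occurs. */
    size_t peak_idx = 0;
    float peak = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = absf(temp_c[i] - setpoint_c);
        if (d > peak) {
            peak = d;
            peak_idx = i;
        }
    }
    if (peak == 0.0f) {
        return 0;  /* never left the setpoint */
    }

    /* Find the last sample after the peak that is outside the 10% band. */
    float band = 0.1f * peak;
    size_t last_outside = peak_idx;
    for (size_t i = peak_idx; i < n; i++) {
        if (absf(temp_c[i] - setpoint_c) > band) {
            last_outside = i;
        }
    }
    if (last_outside == n - 1) {
        return -1;  /* still outside the band at the end of the log */
    }
    return (long)((last_outside + 1 - peak_idx) * sample_period_ms);
}
```

The resolution of the result is limited by the sample period, so a log sampled every 50 ms can only report settling times in 50 ms steps.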
Key Learnings
Thermal systems exhibit significant transport lag: a change in fan speed takes time to show up at the sensors, so the error persists and the integrator keeps accumulating in the meantime. Integrator windup was therefore a persistent challenge until anti-windup clamping was introduced. The multi-zone sensing approach proved critical; single-point measurement consistently missed hotspot migration as the workload distribution shifted across processor cores.
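One plausible way to use a multi-zone array so that a migrating hotspot is not missed is to control on the hottest zone at each sample. The sketch below assumes that approach; the write-up does not state how the eight readings were combined, and `NUM_ZONES` and `hottest_zone` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ZONES 8  /* matches the 8 thermistors; array layout is assumed */

/* Returns the index of the hottest zone and writes its temperature
 * (degrees C) to *t_max_c. Ties resolve to the lowest index. */
static size_t hottest_zone(const float temps_c[NUM_ZONES], float *t_max_c)
{
    assert(temps_c != NULL && t_max_c != NULL);
    size_t idx = 0;
    for (size_t i = 1; i < NUM_ZONES; i++) {
        if (temps_c[i] > temps_c[idx]) {
            idx = i;
        }
    }
    *t_max_c = temps_c[idx];
    return idx;
}
```

With a single fixed sensor, a controller would keep regulating the same location even after the peak temperature had moved to another zone. Taking the maximum across zones follows the peak wherever it goes.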