Autonomous Thermal Management System

Overview

Modern high-performance computing hardware generates significant thermal loads that, if unmanaged, lead to throttling, reduced component lifespan, and system instability. This project developed a closed-loop thermal management system using cascade PID control logic implemented in embedded C on an STM32 microcontroller.

Problem Statement

The target system, a prototype edge-computing module, sustained peak thermal loads of 85 W in a chassis limited to 1.2 L in volume. Passive cooling was insufficient, and naive fan speed control created unacceptable acoustic signatures under variable workloads.

Approach

A multi-zone temperature sensing array (8× NTC thermistors) fed into a cascade PID controller. The primary loop regulated junction temperature; a secondary loop modulated airflow across the heat exchanger. Initial gains were set using the Ziegler–Nichols method and refined through iterative hardware-in-the-loop testing.
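
The cascade structure described above can be sketched in C roughly as follows. This is a minimal illustration, not the project's firmware: the type pid_ctrl_t, the functions zn_classic_pid, pid_step and cascade_step, and the normalised airflow measurement (for example, derived from a fan tachometer) are all assumptions made for the example. The gain helper uses the classic Ziegler–Nichols PID rule, taking the ultimate gain and oscillation period as inputs.

```c
/* Hypothetical PID state; names and limits are illustrative only. */
typedef struct {
    float kp, ki, kd;        /* proportional, integral, derivative gains */
    float integral;          /* accumulated error * dt */
    float prev_error;        /* previous error, used for the derivative */
    float out_min, out_max;  /* output limits */
} pid_ctrl_t;

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Classic Ziegler-Nichols PID rule: ku = ultimate gain, tu = ultimate
 * oscillation period in seconds. Only a starting point for tuning. */
static void zn_classic_pid(pid_ctrl_t *c, float ku, float tu)
{
    c->kp = 0.6f * ku;
    c->ki = 1.2f * ku / tu;    /* Kp / Ti with Ti = tu / 2 */
    c->kd = 0.075f * ku * tu;  /* Kp * Td with Td = tu / 8 */
}

/* One PID update on a precomputed error; dt is the sample period (s).
 * Anti-windup is omitted here to keep the cascade wiring visible. */
static float pid_step(pid_ctrl_t *c, float error, float dt)
{
    c->integral += error * dt;
    float derivative = (error - c->prev_error) / dt;
    c->prev_error = error;

    float out = c->kp * error + c->ki * c->integral + c->kd * derivative;
    return clampf(out, c->out_min, c->out_max);
}

/* Cascade update: the primary (outer) loop turns junction temperature
 * error into an airflow demand; the secondary (inner) loop turns airflow
 * error into a fan duty cycle. Cooling is reverse-acting, so the outer
 * error is measured minus target (hotter means more airflow). */
static float cascade_step(pid_ctrl_t *outer, pid_ctrl_t *inner,
                          float t_target_c, float t_junction_c,
                          float airflow_meas, float dt)
{
    float airflow_demand = pid_step(outer, t_junction_c - t_target_c, dt);
    return pid_step(inner, airflow_demand - airflow_meas, dt);
}
```

In a layout like this the inner loop would normally run at a faster rate than the outer loop, since airflow responds much more quickly than junction temperature; the single shared dt here is a simplification.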

Key design decisions included:

  • Anti-windup clamping to prevent integrator saturation during prolonged thermal load steps
  • Feed-forward compensation using workload telemetry from the host CPU
  • Hysteresis bands around fan speed setpoints to suppress limit cycling
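
The three decisions above might look something like the following in C. This is a hedged sketch under stated assumptions: the type pi_ctrl_t, the functions pi_step_ff, workload_feed_forward and apply_hysteresis, the gain k_ff, and the normalised 0 to 1 duty range are hypothetical names chosen for illustration. Hysteresis is interpreted here as holding the current fan command until the demand leaves a band around it, which is one of several reasonable readings.

```c
/* Hypothetical PI controller state with explicit integrator limits. */
typedef struct {
    float kp, ki;
    float integral;          /* accumulated error * dt */
    float i_min, i_max;      /* anti-windup limits on the integrator */
    float out_min, out_max;  /* limits on the final command */
} pi_ctrl_t;

static float clamp_range(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Feed-forward term: maps host CPU load (0..1) to a fan duty offset.
 * k_ff is an assumed, empirically chosen gain. */
static float workload_feed_forward(float cpu_load, float k_ff)
{
    return k_ff * clamp_range(cpu_load, 0.0f, 1.0f);
}

/* PI update with anti-windup clamping: the integrator is limited to
 * [i_min, i_max], so it cannot grow without bound while the output is
 * saturated during a long load step. */
static float pi_step_ff(pi_ctrl_t *c, float error, float feed_forward,
                        float dt)
{
    c->integral = clamp_range(c->integral + error * dt,
                              c->i_min, c->i_max);
    float out = c->kp * error + c->ki * c->integral + feed_forward;
    return clamp_range(out, c->out_min, c->out_max);
}

/* Hysteresis band: keep the current fan command unless the new demand
 * differs from it by more than `band`. Small fluctuations in demand
 * therefore do not cause audible speed hunting. */
static float apply_hysteresis(float current_cmd, float demand, float band)
{
    float diff = demand - current_cmd;
    if (diff > band || diff < -band) {
        return demand;
    }
    return current_cmd;
}
```

A control tick would then compute the feed-forward term from telemetry, pass it to pi_step_ff, and run the result through apply_hysteresis before writing the PWM duty. Clamping the integrator directly is the simplest anti-windup scheme; conditional integration and back-calculation are common alternatives.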

Results

  • Reduced peak junction temperature by 14 °C under sustained load
  • 31% reduction in thermal throttling events over a 4-hour test cycle
  • Fan acoustic output reduced by ~6 dB(A) versus the prior on/off control strategy
  • System settling time: < 200 ms to 90% setpoint recovery following a step input

Key Learnings

Thermal systems exhibit significant transport lag: temperature responds to an airflow change only after a delay, so the integrator keeps accumulating error in the meantime. This windup was a persistent challenge before anti-windup clamping was introduced. The multi-zone sensing approach proved critical; single-point measurement consistently missed hotspot migration as workload distribution shifted across processor cores.
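
One simple way to make a controller follow a migrating hotspot is to use the hottest of the eight zones as the process variable on each sample. The sketch below assumes the thermistor readings have already been converted to degrees Celsius; the names ZONE_COUNT and hottest_zone are hypothetical, and the write-up does not state how the zones were actually combined.

```c
#include <stddef.h>

#define ZONE_COUNT 8  /* matches the 8 NTC thermistors in the array */

/* Return the index of the hottest zone. The caller can feed
 * temps_c[index] to the primary loop and log the index to observe
 * how the hotspot moves as workload shifts between cores. */
static size_t hottest_zone(const float temps_c[ZONE_COUNT])
{
    size_t hot = 0;
    for (size_t i = 1; i < ZONE_COUNT; ++i) {
        if (temps_c[i] > temps_c[hot]) {
            hot = i;
        }
    }
    return hot;
}
```

A single fixed sensor corresponds to always reading one index, which is why it can miss a peak that has moved to another zone.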