Hardware Monitoring Live Charts - Guide
✅ Status: Fully Implemented with GPU Detection
Hardware monitoring is now fully functional with stable live charts for all metrics and improved GPU model detection.
Monitoring logic is shared in tools/hardware_monitor.py and used by both
classic benchmark flows and capability-driven agent flows.
📊 Implemented Metrics
GPU Detection and Model Info
The system automatically detects all installed GPUs:
- NVIDIA GPUs
  - Detection: nvidia-smi --query-gpu=name
  - VRAM: nvidia-smi --query-gpu=memory.total
  - Temperature: nvidia-smi --query-gpu=temperature.gpu
  - Power: nvidia-smi --query-gpu=power.draw
- AMD GPUs
  - rocm-smi detection: rocm-smi --showproductname
  - Device ID mapping: lspci -d 1002:{device_id} (1002 is AMD's PCI vendor ID)
  - Example: 1002:150e → "Radeon Graphics (Ryzen 9 7950X3D)"
  - rocm-smi search path: /usr/bin, /usr/local/bin, /opt/rocm-*/bin/
  - VRAM: rocm-smi --showmeminfo vram
  - GTT: rocm-smi --showmeminfo gtt
  - Temperature: rocm-smi --showtemp
- iGPU detection
  - Extract from CPU string: regex r'Radeon\s+(\d+[A-Za-z]*)'
  - Shows integrated Radeon graphics separately
  - Prevents redundancy with dedicated GPUs
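The three detection paths above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in tools/hardware_monitor.py: every function name is hypothetical, nvidia-smi is called with --format=csv,noheader,nounits (nvidia-smi requires an explicit --format whenever --query-gpu is used), and the de-duplication rule in igpu_from_cpu (skip the iGPU if its model number already appears in a detected GPU name) is only one possible reading of "prevents redundancy". The lspci device-ID mapping is not shown.

```python
import glob
import os
import re
import shutil
import subprocess
from typing import Iterable, List, Optional

# --- NVIDIA: one nvidia-smi call for the fields listed above ---------------
NVIDIA_FIELDS = "name,memory.total,temperature.gpu,power.draw"


def _num(field: str) -> Optional[float]:
    """Parse a numeric CSV field; '[N/A]' and similar become None."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_nvidia_csv(text: str) -> List[dict]:
    """Parse `--format=csv,noheader,nounits` output, one GPU per line."""
    gpus = []
    for line in text.strip().splitlines():
        # rsplit from the right so a comma inside the GPU name stays intact
        name, mem_mib, temp_c, power_w = (f.strip() for f in line.rsplit(",", 3))
        gpus.append({"name": name, "vram_total_mib": _num(mem_mib),
                     "temp_c": _num(temp_c), "power_w": _num(power_w)})
    return gpus


def query_nvidia() -> List[dict]:
    """Run nvidia-smi; return [] when it is missing, fails or times out."""
    try:
        result = subprocess.run(
            ["nvidia-smi", f"--query-gpu={NVIDIA_FIELDS}",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=5, check=True)
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_nvidia_csv(result.stdout)


# --- AMD: locate rocm-smi in the search path listed above -------------------
ROCM_DIRS = ["/usr/bin", "/usr/local/bin", "/opt/rocm-*/bin"]


def find_rocm_smi(dirs: Iterable[str] = ROCM_DIRS) -> Optional[str]:
    """Return the first executable rocm-smi found, else fall back to PATH."""
    for pattern in dirs:
        for directory in sorted(glob.glob(pattern)):  # expands /opt/rocm-*
            path = os.path.join(directory, "rocm-smi")
            if os.path.isfile(path) and os.access(path, os.X_OK):
                return path
    return shutil.which("rocm-smi")


# --- iGPU: model number from the CPU name ----------------------------------
IGPU_RE = re.compile(r'Radeon\s+(\d+[A-Za-z]*)')


def igpu_from_cpu(cpu_name: str, known_gpus: List[str]) -> Optional[str]:
    """Return e.g. 'Radeon 780M', or None if absent or already detected."""
    match = IGPU_RE.search(cpu_name)
    if match is None:
        return None
    model = match.group(1)
    if any(model in name for name in known_gpus):
        return None  # already listed, e.g. via rocm-smi or lspci
    return f"Radeon {model}"
```

Note that memory.total is reported in MiB. A CPU string such as "AMD Ryzen 9 7950X3D 16-Core Processor" contains no "Radeon <number>" token, so igpu_from_cpu returns None for it; that iGPU has to be named through the lspci device-ID mapping instead.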
GPU Metrics
- 🌡️ GPU Temperature (°C) - Red
  - NVIDIA: nvidia-smi --query-gpu=temperature.gpu
  - AMD: rocm-smi --showtemp
  - Intel: intel-gpu-top (if available)
- ⚡ GPU Power (W) - Blue
  - NVIDIA: nvidia-smi --query-gpu=power.draw
  - AMD: rocm-smi (Current Socket Graphics Package Power)
  - Intel: alternative measurement methods
- 💾 GPU VRAM Usage (GB) - Green
  - NVIDIA: nvidia-smi --query-gpu=memory.used (in MiB)
  - AMD: rocm-smi --showmeminfo vram (in bytes)
- 🧠 GPU GTT Usage (GB) - Purple
  - AMD only: rocm-smi --showmeminfo gtt
  - GTT (Graphics Translation Table) is system RAM that the GPU can map and use like VRAM
  - Example: 2GB VRAM + 46GB GTT = 48GB of memory usable by the GPU
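The VRAM + GTT arithmetic in the example can be reproduced from rocm-smi output. This sketch assumes --showmeminfo prints lines of the form "GPU[0] : GTT Total Memory (B): <bytes>" and "GPU[0] : GTT Total Used Memory (B): <bytes>"; the helper names are hypothetical, only the first GPU's value is read, and "GB" is taken to mean binary gigabytes (GiB).

```python
import re
from typing import Optional

BYTES_PER_GB = 1024 ** 3  # binary GB (GiB); rocm-smi reports raw bytes


def meminfo_bytes(output: str, field: str) -> Optional[int]:
    """Read 'Total Memory' or 'Total Used Memory' from rocm-smi --showmeminfo.

    Assumed line format: GPU[0] : VRAM Total Used Memory (B): 123456
    Only the first match (first GPU) is returned.
    """
    match = re.search(rf'{re.escape(field)} \(B\):\s*(\d+)', output)
    return int(match.group(1)) if match else None


def effective_gb(vram_bytes: int, gtt_bytes: int) -> float:
    """VRAM plus GTT, in GB: all memory the GPU can address."""
    return (vram_bytes + gtt_bytes) / BYTES_PER_GB
```

With 2 GiB of VRAM and 46 GiB of GTT, effective_gb returns 48.0, matching the example above. The pattern "Total Memory (B)" does not match the "Total Used Memory (B)" line, so the two fields can be read from the same output.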
System Metrics (with --enable-profiling)
- 🖥️ CPU Usage (%) - Orange
  - psutil.cpu_percent(interval=0.1) (blocks for 0.1 s per sample)
  - 0-100% range
  - System-wide, not per process
- 💾 System RAM Usage (GB) - Cyan
  - psutil.virtual_memory().used
  - Smoothing: moving average over 3 samples
  - Dampens spikes from cache/buffer fluctuations
  - Produces very stable curves
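A 3-sample moving average like the one described above can be sketched with a bounded deque. The MovingAverage class is hypothetical; the psutil call appears only in a comment because psutil is a third-party package.

```python
from collections import deque
from typing import Deque


class MovingAverage:
    """Mean of the most recent `window` samples (window=3 as described above)."""

    def __init__(self, window: int = 3) -> None:
        # maxlen makes the deque drop the oldest sample automatically
        self._samples: Deque[float] = deque(maxlen=window)

    def add(self, value: float) -> float:
        """Record a raw sample and return the smoothed value to plot."""
        self._samples.append(value)
        return sum(self._samples) / len(self._samples)


# Typical use with psutil (third-party, not imported here):
#   ram = MovingAverage(3)
#   smoothed_gb = ram.add(psutil.virtual_memory().used / 1024**3)
```

A single outlier is not removed, only reduced to a third of its size for the three samples it stays in the window; a median filter would reject it outright, but that is a different technique from the one this guide uses.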
🔧 Activation
Hardware monitoring is automatically enabled with:
# WebApp with hardware monitoring
./run.py --webapp
# CLI with hardware monitoring
./run.py --enable-profiling
# Only with specific models
./run.py --limit 2 --enable-profiling
📝 Logger Output
When --enable-profiling is active, the benchmark prints metrics every second:
🌡️ GPU Temp: 45.3°C
⚡ GPU Power: 125.5W
💾 GPU VRAM: 8.2GB
🧠 GPU GTT: 0.0GB
🖥️ CPU: 35.2%
💾 RAM: 18.5GB
These outputs are:
- ✅ Saved in ~/.local/share/lm-studio-bench/logs/benchmark_YYYYMMDD_HHMMSS.log
- ✅ Shown in the WebApp terminal
- ✅ Visualized as charts
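To turn these log lines into chart data, the WebApp's parse_hardware_metrics() is described only as using regex patterns. The following is a hypothetical equivalent that matches the six line formats shown above; the metric keys and function name are assumptions.

```python
import re
from typing import Optional, Tuple

# One pattern per log label; \b keeps "RAM" from matching inside "VRAM"
METRIC_PATTERNS = {
    "gpu_temp": re.compile(r"GPU Temp:\s*(\d+(?:\.\d+)?)"),
    "gpu_power": re.compile(r"GPU Power:\s*(\d+(?:\.\d+)?)"),
    "gpu_vram": re.compile(r"GPU VRAM:\s*(\d+(?:\.\d+)?)"),
    "gpu_gtt": re.compile(r"GPU GTT:\s*(\d+(?:\.\d+)?)"),
    "cpu": re.compile(r"\bCPU:\s*(\d+(?:\.\d+)?)"),
    "ram": re.compile(r"\bRAM:\s*(\d+(?:\.\d+)?)"),
}


def parse_hardware_line(line: str) -> Optional[Tuple[str, float]]:
    """Return (metric, value) for a hardware log line, or None for other output."""
    for metric, pattern in METRIC_PATTERNS.items():
        match = pattern.search(line)
        if match:
            return metric, float(match.group(1))
    return None
```

Because search() is used rather than match(), the patterns also work when a logging formatter prepends a timestamp and level to each line.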
🎯 Data Flow
Backend (cli/benchmark.py / agents/benchmark.py)
↓
Shared Module (tools/hardware_monitor.py)
↓
HardwareMonitor._monitor_loop()
├─ _get_temperature()
├─ _get_power_draw()
├─ _get_vram_usage()
├─ _get_gtt_usage()
├─ _get_cpu_usage()
└─ _get_ram_usage()
↓
logger.info() → stdout + log file
↓
WebApp Backend (app.py)
├─ _consume_output() Task (blocking readline)
├─ parse_hardware_metrics() (Regex patterns)
└─ hardware_history dict
↓
WebSocket
└─ Sends every 2 seconds (last 60 entries)
↓
Frontend (dashboard.html.jinja)
└─ 6 Plotly.js charts with live updates
Before each profiling run, HardwareMonitor.start() calls
_reset_measurements(). This clears prior temperature, power, VRAM, GTT,
CPU and RAM samples, so chart data and exported min/max/avg values only
reflect the current run.
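The reset-then-record lifecycle described above, together with the min/max/avg statistics and the 60-entry window sent to the charts, can be sketched as a small sample store. SampleStore, MetricStats and the metric keys are hypothetical; the real HardwareMonitor in tools/hardware_monitor.py is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

METRICS = ("temperature", "power", "vram", "gtt", "cpu", "ram")


@dataclass
class MetricStats:
    minimum: float
    maximum: float
    average: float


class SampleStore:
    """Hypothetical per-run sample store; not the real HardwareMonitor class."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = {m: [] for m in METRICS}

    def reset(self) -> None:
        """Clear every metric, as _reset_measurements() does before a run."""
        for values in self._samples.values():
            values.clear()

    def add(self, metric: str, value: float) -> None:
        self._samples[metric].append(value)

    def stats(self, metric: str) -> Optional[MetricStats]:
        """Min/max/avg over the current run, or None before the first sample."""
        values = self._samples[metric]
        if not values:
            return None
        return MetricStats(min(values), max(values), sum(values) / len(values))

    def recent(self, metric: str, n: int = 60) -> List[float]:
        """The last n points, like the 60 entries pushed over the WebSocket."""
        return self._samples[metric][-n:]
```

Without the reset, stats() would mix samples from earlier runs into the exported values, which is exactly what calling _reset_measurements() from start() avoids.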
🐛 Fixes and Optimizations
Fix 1: rocm-smi 7.0.1 Format Change
Problem: rocm-smi changed its output format.
Solution: a regex parser extracts the last number from each line:
match = re.search(r'[\d.]+\s*$', line.strip())
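Wrapped into a complete helper, the Fix 1 pattern might look like this. last_number is a hypothetical name, and the sample lines in the checks below assume rocm-smi's "GPU[0] : <label>: <value>" layout. The extra ValueError guard matters because [\d.]+ also matches a lone "." at the end of a sentence.

```python
import re
from typing import Optional

# Pattern from Fix 1: digits/dots at the very end of a line
LAST_NUMBER = re.compile(r'[\d.]+\s*$')


def last_number(line: str) -> Optional[float]:
    """Return the trailing number of a rocm-smi output line, if any."""
    match = LAST_NUMBER.search(line.strip())
    if match is None:  # e.g. a value of "N/A"
        return None
    try:
        return float(match.group())
    except ValueError:  # e.g. "Done." or "1.2.3"
        return None
```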
Fix 2: Logger Routing
Problem: hardware data did not appear in log files
Solution: replace print() with logger.info(), which writes to both stdout and the log file
All hardware metrics are logged using Python's standard logging module:
logger.info(f"🌡️ GPU Temp: {temp:.1f}°C")
logger.info(f"💾 Memory: {vram_mb:.1f}MB VRAM + {gtt_mb:.1f}MB GTT")
This ensures metrics appear in all three places:
- stdout - real-time display in the terminal
- log files - ~/.local/share/lm-studio-bench/logs/benchmark_YYYYMMDD_HHMMSS.log for a permanent record
- WebApp - streamed via WebSocket to the dashboard
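One way to get this stdout + file routing with the standard logging module is to attach two handlers to a single logger. The function and logger names below are hypothetical; only the log directory and file-name pattern come from this guide.

```python
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Tuple

# Default location quoted in this guide
LOG_DIR = Path.home() / ".local" / "share" / "lm-studio-bench" / "logs"


def setup_benchmark_logger(log_dir: Path = LOG_DIR) -> Tuple[logging.Logger, Path]:
    """Attach a stdout handler and a timestamped file handler to one logger."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"benchmark_{datetime.now():%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger("lm_studio_bench")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    fmt = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    # utf-8 so emoji such as 🌡️ survive in the log file
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_path, encoding="utf-8")):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger, log_path
```

Calling the setup function twice would attach a second pair of handlers and duplicate every line, so it belongs in the program's startup path only.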
Fix 3: WebApp Output Streaming
Problem: WebApp showed only 10% of the hardware data
Solution: replace asyncio.wait_for() with a blocking readline() call that runs in an executor
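The Fix 3 pattern, a blocking readline() run in an executor, looks roughly like this with asyncio's loop.run_in_executor(). consume_output and demo are hypothetical stand-ins for app.py's _consume_output() task and the benchmark process; the real task would parse each line and update hardware_history instead of collecting a list.

```python
import asyncio
import subprocess
import sys
from typing import List


async def consume_output(proc: subprocess.Popen) -> List[str]:
    """Collect every stdout line of a running process without dropping any."""
    loop = asyncio.get_running_loop()
    lines: List[str] = []
    while True:
        # readline() blocks until a full line or EOF, so it runs in the
        # default thread-pool executor instead of on the event loop itself
        line = await loop.run_in_executor(None, proc.stdout.readline)
        if not line:  # "" signals EOF
            break
        lines.append(line.rstrip("\n"))  # app.py would parse metrics here
    return lines


async def demo() -> List[str]:
    # Stand-in for the benchmark process started by the WebApp
    proc = subprocess.Popen(
        [sys.executable, "-c", "print('GPU Temp: 45.3'); print('CPU: 35.2%')"],
        stdout=subprocess.PIPE, text=True,
    )
    lines = await consume_output(proc)
    proc.wait()
    proc.stdout.close()
    return lines
```

Running asyncio.run(demo()) returns both lines in order; the event loop stays free for WebSocket traffic while a worker thread waits on the pipe.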
Fix 4: RAM Monitoring Spikes
Problem: the RAM chart jumped between 1.8GB and 28.3GB.
Solution: a moving average over 3 samples produces a very stable curve.
Fix 5: Runtime Counter Does Not Stop
Problem: the runtime counter kept running after the benchmark ended
Solution: call clearInterval(uptimeInterval) when the benchmark completes
Fix 6: WebApp Initialization Race Conditions
Problem: links were not interactive, and the page started in light mode.
Solution: three separate DOMContentLoaded handlers were consolidated into a single handler.
📊 Chart Properties
All charts update every 2 seconds with:
- Min/Max/Avg statistics - real-time calculation
- Last 60 data points - about 2 minutes of history
- Responsive design - adapts to window size
- Dark mode - default for all charts
- Hover tooltips - show exact values on hover