Hardware Monitoring Live Charts - Guide

✅ Status: Fully Implemented with GPU Detection

Hardware monitoring is now fully functional with stable live charts for all metrics and improved GPU model detection.

Monitoring logic is shared in tools/hardware_monitor.py and used by both classic benchmark flows and capability-driven agent flows.

📊 Implemented Metrics

GPU Detection and Model Info

The system automatically detects all installed GPUs:

NVIDIA GPUs
- Detection: nvidia-smi --query-gpu=name
- VRAM: nvidia-smi --query-gpu=memory.total
- Temperature: nvidia-smi --query-gpu=temperature.gpu
- Power: nvidia-smi --query-gpu=power.draw
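
A minimal sketch of how these four fields could be read in one call. The `--query-gpu` and `--format=csv,noheader,nounits` options are standard nvidia-smi; the function names `parse_nvidia_smi_csv` and `query_nvidia_gpus` and the returned dictionary layout are illustrative assumptions, not the code in tools/hardware_monitor.py.

```python
import shutil
import subprocess

# Fields taken from the list above.
FIELDS = ("name", "memory.total", "temperature.gpu", "power.draw")


def parse_nvidia_smi_csv(output: str) -> list[dict]:
    """Parse csv,noheader,nounits output: one GPU per line, values kept as strings."""
    gpus = []
    for line in output.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip lines that do not have exactly one value per field
        gpus.append(dict(zip(FIELDS, values)))
    return gpus


def query_nvidia_gpus() -> list[dict]:
    """Return one dict per NVIDIA GPU, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = [
        "nvidia-smi",
        f"--query-gpu={','.join(FIELDS)}",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=5, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_nvidia_smi_csv(result.stdout)
```

Values are left as strings because nvidia-smi may report a field as unavailable instead of a number; converting to float is left to the caller.
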
AMD GPUs
- rocm-smi detection: rocm-smi --showproductname
- Device ID mapping: lspci -d 1002:{device_id}
- Example: 1002:150e → "Radeon Graphics (Ryzen 9 7950X3D)"
- rocm-smi search path: /usr/bin, /usr/local/bin, /opt/rocm-*/bin/
- VRAM: rocm-smi --showmeminfo vram
- GTT: rocm-smi --showmeminfo gtt
- Temperature: rocm-smi --showtemp
iGPU detection
- Extract from CPU string: regex r'Radeon\s+(\d+[A-Za-z]*)'
- Shows integrated Radeon graphics separately
- Prevents redundancy with dedicated GPUs
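
The regex above can be exercised as follows. This is a small sketch, assuming the CPU model string has already been read elsewhere (for example from /proc/cpuinfo); the function name `extract_igpu_model` and the sample CPU strings are illustrative.

```python
import re
from typing import Optional

# Pattern from the iGPU detection step above.
IGPU_PATTERN = re.compile(r'Radeon\s+(\d+[A-Za-z]*)')


def extract_igpu_model(cpu_name: str) -> Optional[str]:
    """Return e.g. 'Radeon 780M' if the CPU string names an integrated GPU."""
    match = IGPU_PATTERN.search(cpu_name)
    return f"Radeon {match.group(1)}" if match else None


print(extract_igpu_model("AMD Ryzen 7 7840HS w/ Radeon 780M Graphics"))
print(extract_igpu_model("Intel(R) Core(TM) i7-12700K"))
```

Note that the pattern requires a model number after "Radeon", so a CPU string that only says "Radeon Graphics" does not match.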

GPU Metrics

🌡️ GPU Temperature (°C) - Red
- NVIDIA: nvidia-smi --query-gpu=temperature.gpu
- AMD: rocm-smi --showtemp
- Intel: intel_gpu_top (if available)
⚡ GPU Power (W) - Blue
- NVIDIA: nvidia-smi --query-gpu=power.draw
- AMD: rocm-smi (Current Socket Graphics Package Power)
- Intel: alternative measurement methods
💾 GPU VRAM Usage (GB) - Green
- NVIDIA: nvidia-smi --query-gpu=memory.used
- AMD: rocm-smi --showmeminfo vram (in bytes)
🧠 GPU GTT Usage (GB) - Purple
- AMD only: rocm-smi --showmeminfo gtt
- System RAM that is used as VRAM
- Example: 2GB VRAM + 46GB GTT = 48GB effective
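
The VRAM + GTT arithmetic can be sketched like this. It assumes byte counts as reported by rocm-smi and binary gigabytes (1024³ bytes); whether the project divides by 1024³ or 10⁹ is not stated here, and the helper names are hypothetical.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def bytes_to_gb(num_bytes: int) -> float:
    """Convert a byte count to gigabytes."""
    return num_bytes / BYTES_PER_GB


def effective_gpu_memory_gb(vram_bytes: int, gtt_bytes: int) -> float:
    """Dedicated VRAM plus system RAM mapped through GTT, in GB."""
    return bytes_to_gb(vram_bytes) + bytes_to_gb(gtt_bytes)


# The example above: 2 GB VRAM + 46 GB GTT
print(effective_gpu_memory_gb(2 * BYTES_PER_GB, 46 * BYTES_PER_GB))
```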

System Metrics (with --enable-profiling)

🖥️ CPU Usage (%) - Orange
- psutil.cpu_percent(interval=0.1)
- 0-100% range
- System-wide, not per process
💾 System RAM Usage (GB) - Cyan
- psutil.virtual_memory().used
- Smoothing: moving average over 3 samples
- Prevents spikes from cache/buffer fluctuations
- Yields smooth, stable curves
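
A minimal sketch of a 3-sample moving average, assuming each new RAM reading is passed through the smoother before it is logged. The class name `MovingAverage` is illustrative and not taken from tools/hardware_monitor.py.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `window` samples."""

    def __init__(self, window: int = 3) -> None:
        # deque(maxlen=...) drops the oldest sample automatically.
        self._samples = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add a sample and return the current smoothed value."""
        self._samples.append(value)
        return sum(self._samples) / len(self._samples)


smoother = MovingAverage(window=3)
for reading_gb in (18.0, 21.0, 18.0, 18.0):
    print(smoother.update(reading_gb))
```

A short window keeps the curve responsive while still damping a single-sample spike such as the 21.0 reading above.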

🔧 Activation

Hardware monitoring is automatically enabled with:

# WebApp with hardware monitoring
./run.py --webapp

# CLI with hardware monitoring
./run.py --enable-profiling

# Only with specific models
./run.py --limit 2 --enable-profiling

📝 Logger Output

When --enable-profiling is active, the benchmark prints metrics every second:

🌡️ GPU Temp: 45.3°C
⚡ GPU Power: 125.5W
💾 GPU VRAM: 8.2GB
🧠 GPU GTT: 0.0GB
🖥️ CPU: 35.2%
💾 RAM: 18.5GB

These outputs are:

✅ Saved in ~/.local/share/lm-studio-bench/logs/benchmark_YYYYMMDD_HHMMSS.log
✅ Shown in the WebApp terminal
✅ Visualized as charts

🎯 Data Flow

Backend (cli/benchmark.py / agents/benchmark.py)
   ↓
Shared Module (tools/hardware_monitor.py)
  ↓
HardwareMonitor._monitor_loop()
  ├─ _get_temperature()
  ├─ _get_power_draw()
  ├─ _get_vram_usage()
  ├─ _get_gtt_usage()
  ├─ _get_cpu_usage()
  └─ _get_ram_usage()
       ↓
logger.info() → stdout + log file
       ↓
WebApp Backend (app.py)
  ├─ _consume_output() Task (blocking readline)
  ├─ parse_hardware_metrics() (Regex patterns)
  └─ hardware_history dict
       ↓
WebSocket
  └─ Sends every 2 seconds (last 60 entries)
       ↓
Frontend (dashboard.html.jinja)
  └─ 6 Plotly.js charts with live updates
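
The parsing and history steps in the flow above could look roughly like this. The names `parse_hardware_metrics` and `hardware_history` come from the diagram, but the regex patterns, metric keys and helper functions are assumptions derived from the logger output format, not the actual code in app.py.

```python
import re
from collections import deque

HISTORY_LENGTH = 60  # the WebSocket sends the last 60 entries

# Log label -> metric key (keys are illustrative).
LABELS = {
    "GPU Temp": "gpu_temp",
    "GPU Power": "gpu_power",
    "GPU VRAM": "gpu_vram",
    "GPU GTT": "gpu_gtt",
    "CPU": "cpu",
    "RAM": "ram",
}

# The lookbehind keeps "RAM:" from matching inside "VRAM:".
PATTERNS = {
    key: re.compile(rf"(?<![A-Za-z]){re.escape(label)}:\s*(\d+(?:\.\d+)?)")
    for label, key in LABELS.items()
}


def parse_hardware_metrics(line: str) -> dict:
    """Return {metric_key: value} for every metric found in one log line."""
    found = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            found[key] = float(match.group(1))
    return found


def new_history() -> dict:
    """One bounded buffer per metric; old entries fall off automatically."""
    return {key: deque(maxlen=HISTORY_LENGTH) for key in PATTERNS}


def record_line(history: dict, line: str) -> None:
    """Parse one line of benchmark output and append any metrics found."""
    for key, value in parse_hardware_metrics(line).items():
        history[key].append(value)


hardware_history = new_history()
record_line(hardware_history, "🌡️ GPU Temp: 45.3°C")
print({key: list(values) for key, values in hardware_history.items()})
```

Converting each deque to a list, as in the last line, gives a JSON-serializable payload for the WebSocket message.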

Before each profiling run, HardwareMonitor.start() calls _reset_measurements(). This clears prior temperature, power, VRAM, GTT, CPU and RAM samples, so chart data and exported min/max/avg values only reflect the current run.
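
A simplified sketch of the reset-then-collect behavior described above. `MeasurementStore` is a hypothetical stand-in for the sample lists inside HardwareMonitor; the real class and its attribute names may differ.

```python
from statistics import mean

METRICS = ("temperature", "power", "vram", "gtt", "cpu", "ram")


class MeasurementStore:
    """Holds per-metric samples for a single profiling run."""

    def __init__(self) -> None:
        self._samples = {metric: [] for metric in METRICS}

    def reset(self) -> None:
        """Drop all samples so the next run starts from a clean state."""
        for values in self._samples.values():
            values.clear()

    def add(self, metric: str, value: float) -> None:
        self._samples[metric].append(value)

    def summary(self) -> dict:
        """Min/max/avg per metric, skipping metrics without samples."""
        return {
            metric: {"min": min(v), "max": max(v), "avg": mean(v)}
            for metric, v in self._samples.items()
            if v
        }


store = MeasurementStore()
store.add("temperature", 40.0)
store.add("temperature", 50.0)
print(store.summary())
store.reset()  # what start() would trigger before a new run
print(store.summary())
```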

🐛 Fixes and Optimizations

Fix 1: rocm-smi 7.0.1 Format Change

Problem: rocm-smi changed its output format
Solution: regex parser extracts the last number from the line

match = re.search(r'[\d.]+\s*$', line.strip())
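
A small sketch of how this pattern behaves when wrapped in a helper. The function name `last_number` is hypothetical, and the sample lines are invented for illustration; they are not real rocm-smi output.

```python
import re
from typing import Optional

# Same pattern as above: digits and dots at the very end of the line.
TRAILING_NUMBER = re.compile(r'[\d.]+\s*$')


def last_number(line: str) -> Optional[float]:
    """Return the number at the end of a line, or None if there is none."""
    match = TRAILING_NUMBER.search(line.strip())
    if match is None:
        return None
    try:
        return float(match.group())
    except ValueError:
        # The pattern also accepts strings such as "1.2.3" or "."
        return None


print(last_number("some label (unit): 45.0"))
print(last_number("prefix 12 then value: 2048"))
print(last_number("no trailing value here"))
```

Because only the end of the line is inspected, the parser does not depend on how the label portion of the line is formatted.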

Fix 2: Logger Routing

Problem: hardware data did not appear in log files
Solution: print() → logger.info() for stdout + file

All hardware metrics are logged using Python's standard logging module:

logger.info(f"🌡️ GPU Temp: {temp:.1f}°C")
logger.info(f"💾 Memory: {vram_mb:.1f}MB VRAM + {gtt_mb:.1f}MB GTT")

This ensures metrics appear in all three places:

stdout - Real-time display in terminal
log files - ~/.local/share/lm-studio-bench/logs/benchmark_YYYYMMDD_HHMMSS.log for permanent record
WebApp - Streamed via WebSocket to dashboard
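
One way to route logger.info() to both stdout and a log file with the standard logging module is sketched below. The function name `setup_logger`, the logger name and the message-only format are assumptions, not the project's actual configuration.

```python
import logging
import sys
from pathlib import Path


def setup_logger(log_file: Path, name: str = "hardware_monitor") -> logging.Logger:
    """Create a logger that writes every record to stdout and to log_file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # Remove handlers from an earlier call so lines are not written twice.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(message)s")
    handlers = (
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(log_file, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

The WebApp can then read the same lines from the benchmark's stdout, which is why a single logger.info() call is enough to reach the terminal, the log file and the dashboard.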

The dashboard charts also provide:

Min/Max/Avg statistics - real-time calculation
Last 60 data points - about 2 minutes of history
Responsive design - adapts to window size
Dark mode - default for all charts
Hover tooltips - show exact values on hover
