🚀 Quick Start Guide - LM Studio Benchmark Tool
Installation
cd ~/LM-Studio-Bench
# 1) Preview setup (no changes)
./setup.sh --dry-run
# 2) Prepare system + Python environment (recommended)
./setup.sh
# 3) Activate virtual environment
source .venv/bin/activate
If you skip setup.sh, use this manual fallback:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
🌐 Web Dashboard (Recommended)
Start Web UI
./run.py --webapp
✅ Opens browser automatically at http://localhost:8080
✅ Live streaming of benchmark output via WebSocket
✅ Browse all cached results with interactive tables
✅ System info (GPU model detection, LM Studio health, hardware details)
✅ Dark mode by default with 27 theme options
✅ All CLI parameters available as web form with tooltips
✅ Advanced filtering (quantization, architecture, size, context-length)
✅ Separate logs: ~/.local/share/lm-studio-bench/logs/webapp_*.log and ~/.local/share/lm-studio-bench/logs/benchmark_*.log
✅ Linux tray control with dynamic status icon and quick actions
Dashboard Features:
- Start Benchmark: Configure and run benchmarks from the web interface
  - Filter by quantization, architecture, parameter size
  - Rank results by speed, efficiency, TTFT, or VRAM
  - Set hardware limits (max GPU temp, max power draw)
  - Tooltip help for all options
- System Info: OS, Kernel, CPU, GPU (with detailed model names)
- LM Studio Health: Live healthcheck status (HTTP API + CLI fallback)
- Live Output: Real-time streaming with colored logs and progress
- Results Browser: Filter and sort all cached benchmark results
- Export: Download JSON/CSV/PDF/HTML reports
- Network Access: Access from other devices on same network
Linux Tray Control
When GTK/AppIndicator dependencies are installed, a tray controller starts with the web app.
- Dynamic status icon:
  - Gray: idle
  - Green: running
  - Yellow: paused
  - Red: API unreachable/error
- Smart controls:
  - Start enabled in idle/error states
  - Pause/Stop enabled only in running/paused states
- Auto refresh: status and controls refresh every 3 seconds
- Quit behavior: tray Quit triggers a graceful full shutdown
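The icon colors and button rules listed above can be summarized in a small sketch. The names `TrayState`, `ICON_COLORS`, and `enabled_controls` are hypothetical and not taken from the tool's source; only the states, colors, and enable rules come from the list.

```python
from enum import Enum


class TrayState(Enum):
    """Hypothetical state names mirroring the four icon colors."""
    IDLE = "idle"
    RUNNING = "running"
    PAUSED = "paused"
    ERROR = "error"  # API unreachable or another error


# Icon color per state, as listed in this section
ICON_COLORS = {
    TrayState.IDLE: "gray",
    TrayState.RUNNING: "green",
    TrayState.PAUSED: "yellow",
    TrayState.ERROR: "red",
}


def enabled_controls(state: TrayState) -> set:
    """Return which tray actions are clickable in the given state."""
    if state in (TrayState.IDLE, TrayState.ERROR):
        return {"start"}
    return {"pause", "stop"}
```

A real tray controller would re-evaluate both lookups on every 3-second refresh.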
Network Access
# Access dashboard from other devices
http://your-ip:8080
# Example:
http://192.168.1.100:8080
💻 Command Line (CLI)
Simple Benchmark (All Models)
./run.py
✅ Tests all installed models with 3 runs each (~1-2 hours)
✅ Automatically saves results to ~/.local/share/lm-studio-bench/results/
✅ Clean output with emoji icons and formatted model lists
✅ Detailed logs saved to ~/.local/share/lm-studio-bench/logs/benchmark_YYYYMMDD_HHMMSS.log
Monitor Logs in Real-Time
# Watch benchmark execution
tail -f ~/.local/share/lm-studio-bench/logs/benchmark_*.log
# Watch web dashboard
tail -f ~/.local/share/lm-studio-bench/logs/webapp_*.log
# Search for errors
grep ERROR ~/.local/share/lm-studio-bench/logs/benchmark_*.log
Quick Test (3 NEW Models)
./run.py --limit 3 --runs 1
✅ Fast test with 3 NEW untested models (~5-10 minutes)
✅ Already tested models automatically loaded from cache
✅ Limit applies ONLY to new models, all cached models included
Development Mode (Fastest)
./run.py --dev-mode
✅ Automatically selects smallest model
✅ Single run for quick validation (~30 seconds)
✅ Perfect for testing changes
Test Single Model
./run.py --limit 1 --runs 1
✅ Single model benchmark (~1-2 minutes)
Advanced Features
1️⃣ Hardware Profiling (6 Live Charts)
Enable Complete Hardware Monitoring:
./run.py --enable-profiling --runs 1 --limit 3
Monitored Metrics:
- 🌡️ GPU Temperature (°C)
- ⚡ GPU Power (W)
- 💾 GPU VRAM (GB)
- 🧠 GPU GTT (GB) - AMD only
- 🖥️ System CPU usage (%)
- 💾 System RAM usage (GB)
✅ All metrics are displayed live in the WebApp
✅ 6 interactive Plotly.js charts with Min/Max/Avg stats
✅ Moving average for stable RAM curves
✅ Each metric is measured every second
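To illustrate what the Min/Max/Avg stats and the moving average do to a per-second series, here is a minimal sketch. The window size, the function names, and the sample readings are assumptions made for the example; the tool's actual smoothing parameters are not documented here.

```python
from collections import deque
from statistics import mean


def moving_average(samples, window=5):
    """Smooth a series with a trailing window; early points use fewer samples."""
    buf = deque(maxlen=window)
    smoothed = []
    for value in samples:
        buf.append(value)
        smoothed.append(mean(buf))
    return smoothed


def summarize(samples):
    """Min/Max/Avg summary of one metric."""
    return {"min": min(samples), "max": max(samples), "avg": mean(samples)}


# Hypothetical RAM readings in GB, one per second
ram_gb = [10.0, 12.0, 11.0, 13.0]
print(moving_average(ram_gb, window=2))  # → [10.0, 11.0, 11.5, 12.0]
print(summarize(ram_gb))  # → {'min': 10.0, 'max': 13.0, 'avg': 11.5}
```

A larger window gives a flatter curve at the cost of reacting more slowly to real changes.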
With Safety Limits:
./run.py --enable-profiling --max-temp 85 --max-power 350
✅ Interrupts benchmark when limits are exceeded
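A limit check of this kind could look like the sketch below. The function name and the `temp_c`/`power_w` keys are hypothetical, and whether the tool compares with `>` or `>=` is an assumption.

```python
def exceeded_limits(sample, max_temp=None, max_power=None):
    """Return the names of the limits that a single reading exceeds.

    `sample` is a dict with hypothetical keys `temp_c` (°C) and `power_w` (W).
    A limit of None means "not set", matching an omitted CLI flag.
    """
    exceeded = []
    if max_temp is not None and sample["temp_c"] > max_temp:
        exceeded.append("temperature")
    if max_power is not None and sample["power_w"] > max_power:
        exceeded.append("power")
    return exceeded


# Reading above the 85 °C limit but below the 350 W limit
print(exceeded_limits({"temp_c": 86, "power_w": 300}, max_temp=85, max_power=350))
# → ['temperature']
```

A monitoring loop would call this once per sample and stop the benchmark as soon as the returned list is non-empty.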
2️⃣ AMD GTT Support (Shared System RAM)
Enable GTT (Default):
./run.py --limit 3
✅ Automatically uses VRAM + GTT (e.g. 2GB VRAM + 46GB GTT = 48GB)
✅ Enables larger models on AMD APUs/iGPUs
✅ Shown in logs: "💾 Memory: 0.4GB VRAM + 44.7GB GTT = 45.1GB total"
Disable GTT (VRAM-only):
./run.py --disable-gtt --limit 3
✅ Only uses dedicated VRAM
✅ More conservative offload levels
✅ Useful for benchmarking VRAM-only performance
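The memory budget arithmetic behind the two modes is simple enough to sketch. The function name is hypothetical; the numbers are the ones from the log line quoted in this section.

```python
def usable_gpu_memory_gb(vram_gb, gtt_gb, use_gtt=True):
    """Memory budget for offloading: VRAM plus GTT, or VRAM alone."""
    return vram_gb + gtt_gb if use_gtt else vram_gb


# Default behavior: VRAM + GTT
print(round(usable_gpu_memory_gb(0.4, 44.7), 1))  # → 45.1
# Equivalent of --disable-gtt: dedicated VRAM only
print(usable_gpu_memory_gb(0.4, 44.7, use_gtt=False))  # → 0.4
```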
3️⃣ Filtering Models
By Quantization:
./run.py --quants q4,q5 --limit 5
By Architecture:
./run.py --arch llama,mistral --limit 5
By Parameter Size:
./run.py --params 7B,8B --limit 5
By Context Length:
./run.py --min-context 32000 --limit 3
By Model Size:
./run.py --max-size 10 --limit 5
Vision Models Only:
./run.py --only-vision --runs 1
Regex-based Filtering (Include):
# Only Qwen or Phi models
./run.py --include-models "qwen|phi" --runs 1
# Only Llama 7B models
./run.py --include-models "llama.*7b" --runs 1
# Only Q4 quantizations
./run.py --include-models ".*q4.*" --runs 1
Regex-based Filtering (Exclude):
# Exclude uncensored models
./run.py --exclude-models "uncensored" --runs 1
# Exclude Q2 and Q3 quantizations
./run.py --exclude-models "q2|q3" --runs 1
# Exclude all vision models
./run.py --exclude-models ".*vision.*" --runs 1
Combined Filters (AND logic):
# Include llama, exclude q2, only tools
./run.py --include-models "llama" --exclude-models "q2" --only-tools --runs 1
# Vision models, 7B params, max 12GB
./run.py --only-vision --params 7B --max-size 12 --runs 1
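The AND logic of include and exclude patterns can be sketched with Python's `re` module. This is an illustration, not the tool's code: case-insensitive matching and the use of `search` (match anywhere in the name) are assumptions, and the model names are made up.

```python
import re


def filter_models(names, include=None, exclude=None):
    """Keep names that match `include` (if given) AND do not match `exclude`."""
    inc = re.compile(include, re.IGNORECASE) if include else None
    exc = re.compile(exclude, re.IGNORECASE) if exclude else None
    kept = []
    for name in names:
        if inc and not inc.search(name):
            continue  # fails the include pattern
        if exc and exc.search(name):
            continue  # hits the exclude pattern
        kept.append(name)
    return kept


# Hypothetical model identifiers
models = [
    "meta/llama-3-8b-q4_k_m",
    "meta/llama-3-8b-q2_k",
    "qwen/qwen3-8b-q4_k_m",
]
print(filter_models(models, include="llama", exclude="q2"))
# → ['meta/llama-3-8b-q4_k_m']
```

Because `search` matches anywhere in the string, a pattern such as `q4` behaves the same as `.*q4.*`.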
3️⃣ Ranking & Sorting
Sort by Efficiency (Default: Speed):
./run.py --limit 5 --rank-by efficiency
Sort by TTFT (Lower = Better):
./run.py --limit 5 --rank-by ttft
Sort by VRAM Usage (Lower = Better):
./run.py --limit 5 --rank-by vram
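The difference between the ranking modes is the sort direction: speed and efficiency rank highest first, TTFT and VRAM lowest first. A minimal sketch, assuming result dicts keyed by the `--rank-by` value (the key names and sample numbers are hypothetical):

```python
# Metrics where a smaller value ranks higher
LOWER_IS_BETTER = {"ttft", "vram"}


def rank(results, by="speed"):
    """Sort results best-first for the chosen metric."""
    return sorted(results, key=lambda r: r[by], reverse=by not in LOWER_IS_BETTER)


# Hypothetical results
results = [
    {"model": "a", "speed": 8.0, "efficiency": 1.7, "ttft": 420.0, "vram": 5200},
    {"model": "b", "speed": 12.0, "efficiency": 1.2, "ttft": 650.0, "vram": 9800},
]
print([r["model"] for r in rank(results, by="speed")])  # → ['b', 'a']
print([r["model"] for r in rank(results, by="ttft")])   # → ['a', 'b']
```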
4️⃣ Cache Management
View Cached Results:
./run.py --list-cache
✅ Shows all cached models with performance metrics
Force Retest (Ignore Cache):
./run.py --retest --limit 3
✅ Re-runs benchmarks even if cached
Regenerate Reports from Database:
./run.py --export-only
✅ Generates JSON/CSV/PDF/HTML from cached results in <1s
✅ No benchmarking - instant report generation
✅ Supports all filters (--params, --quants, --arch, etc.)
Examples:
# All cached models
./run.py --export-only
# Only 7B models from cache
./run.py --export-only --params 7B
# Q4 quantizations with historical comparison
./run.py --export-only --quants q4 --compare-with latest
Export Cache as JSON:
./run.py --export-cache my_backup.json
✅ Exports entire cache database
Cache Behavior:
- First run: Tests all models (~2 hours for 20 models)
- Second run: Loads from cache (~1 second!)
- Automatic invalidation on parameter changes (prompt, context, temperature)
- Shows "X of Y models cached" before starting
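One common way to get automatic invalidation on parameter changes is to derive the cache key from the parameters themselves, so a changed prompt, context, or temperature simply produces a different key. The sketch below shows that idea; it is an assumption about the approach, not the tool's actual cache schema, and `cache_key` is a hypothetical name.

```python
import hashlib
import json


def cache_key(model, prompt, context, temperature):
    """Build a stable key from the model and the benchmark parameters."""
    payload = json.dumps(
        {
            "model": model,
            "prompt": prompt,
            "context": context,
            "temperature": temperature,
        },
        sort_keys=True,  # stable field order keeps the hash reproducible
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


same = cache_key("qwen/qwen3-8b", "Hello", 2048, 0.7)
changed = cache_key("qwen/qwen3-8b", "Hello", 4096, 0.7)
print(same == cache_key("qwen/qwen3-8b", "Hello", 2048, 0.7))  # → True
print(same == changed)  # → False
```

With this design no explicit invalidation step is needed: old entries stay in the database but are never matched by the new key.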
5️⃣ Historical Comparison & Trends
Compare with Latest Benchmark:
./run.py --limit 3 --runs 1 --compare-with latest
📊 Shows performance delta (%) vs previous run
Compare with Specific Benchmark:
./run.py --limit 3 --runs 1 --compare-with benchmark_results_20260104_170000.json
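The performance delta is presumably the relative change in tokens/s against the reference run. A minimal sketch under that assumption (the function name is hypothetical and the exact formula used by the tool is not confirmed here):

```python
def speed_delta_pct(current, previous):
    """Relative change of `current` vs `previous`, in percent.

    Returns None when there is no usable reference value.
    """
    if not previous:
        return None
    return (current - previous) / previous * 100.0


# 11 tokens/s now vs 10 tokens/s in the reference run
print(round(speed_delta_pct(11.0, 10.0), 1))  # → 10.0
# Slower than before gives a negative delta
print(round(speed_delta_pct(9.0, 10.0), 1))  # → -10.0
```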
6️⃣ Custom Configuration
Adjust Number of Runs:
./run.py --runs 5 --limit 2
Custom Context Length:
./run.py --context 4096 --limit 2 --runs 1
Custom Prompt:
./run.py -P "Your custom prompt here" --limit 2 --runs 1
7️⃣ Presets (Fast Setup)
Show available presets:
./run.py --list-presets
Load a built-in preset:
# Default presets (read-only)
./run.py --preset default_classic # Classic benchmark (default)
./run.py --preset default_compatibility_test # Capability-driven test
# Other presets
./run.py --preset quick_test
./run.py --preset high_quality
./run.py --preset resource_limited
Load preset and override values:
./run.py --preset quick_test --runs 2 --context 2048
./run.py --preset default_classic --runs 5 --context 4096
Backwards Compatibility:
./run.py --preset default # Automatically loads default_classic
Notes:
- Default presets include explicit values for all benchmark form fields, so preset comparisons do not show null values for missing keys.
- default_classic is optimized for full model benchmarking (3 runs)
- default_compatibility_test (alias: default_compatability_test) is optimized for focused capability testing (1 run)
- Capability-driven runs over many installed models continue when a single model fails to load; the failed model is logged and skipped.
- Embedding models are retried automatically without KV-cache offload if LM Studio rejects that load option.
- Legacy keys in imported/user presets are normalized automatically (context_length/top_k/top_p/min_p -> current key names).
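How preset aliases and command-line overrides combine can be sketched as a dictionary lookup followed by a merge. The alias names and run counts come from this section; the `context` values, the two-field preset layout, and the function name `resolve` are placeholders for illustration, not the shipped preset contents.

```python
# Aliases described in this section
PRESET_ALIASES = {
    "default": "default_classic",
    "default_compatability_test": "default_compatibility_test",
}

# Placeholder preset contents; only the run counts are documented
PRESETS = {
    "default_classic": {"runs": 3, "context": 2048},
    "default_compatibility_test": {"runs": 1, "context": 2048},
}


def resolve(preset, **overrides):
    """Load a preset by name or alias, then apply explicit overrides."""
    name = PRESET_ALIASES.get(preset, preset)
    config = dict(PRESETS[name])  # copy so the preset itself stays untouched
    config.update({k: v for k, v in overrides.items() if v is not None})
    return config


# Comparable to: ./run.py --preset default --runs 5 --context 4096
print(resolve("default", runs=5, context=4096))
# → {'runs': 5, 'context': 4096}
```

Copying the preset before merging is what keeps the built-in presets read-only.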
📊 Output Formats
Each benchmark generates 4 files:
JSON Format
{
"model_name": "qwen/qwen3-8b",
"quantization": "q4_k_m",
"avg_tokens_per_sec": 8.15,
"tokens_per_sec_per_gb": 1.74,
"speed_delta_pct": -0.2,
...
}
✅ Structured data for analysis
CSV Format
model_name,quantization,avg_tokens_per_sec,tokens_per_sec_per_gb,speed_delta_pct
qwen/qwen3-8b,q4_k_m,8.15,1.74,-0.2
✅ Excel/Sheets compatible
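For analysis outside a spreadsheet, the CSV can be read with Python's standard `csv` module. The sketch below parses the sample row shown above from an inline string; reading a real export would use `open(path, newline="")` instead. The function name `load_rows` is hypothetical.

```python
import csv
import io

# The sample header and row from this section
SAMPLE = """model_name,quantization,avg_tokens_per_sec,tokens_per_sec_per_gb,speed_delta_pct
qwen/qwen3-8b,q4_k_m,8.15,1.74,-0.2
"""

NUMERIC_COLUMNS = ("avg_tokens_per_sec", "tokens_per_sec_per_gb", "speed_delta_pct")


def load_rows(text):
    """Parse CSV text into dicts and convert the numeric columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for key in NUMERIC_COLUMNS:
            row[key] = float(row[key])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
print(rows[0]["model_name"], rows[0]["avg_tokens_per_sec"])  # → qwen/qwen3-8b 8.15
```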
PDF Report
- Model rankings (sortable)
- Best-of-Quantization analysis
- Quantization comparison tables (Q4 vs Q5 vs Q6)
- Performance statistics & percentiles
- Delta display (Δ% column)
HTML Report (Interactive Plotly)
- Bar chart: Top 10 models
- Scatter plot: Size vs Performance
- Scatter plot: Efficiency analysis
- NEW: Trend chart showing performance over time
- Summary statistics with gradient backgrounds
📈 Feature Showcase
Example: Complete Analysis
./run.py \
--quants q4,q5,q6 \
--limit 5 \
--runs 1 \
--rank-by efficiency \
--compare-with latest
Output:
- ✅ Restricts the run to Q4, Q5, and Q6 quantizations, with at most 5 new (uncached) models
- ✅ Ranks by efficiency (Tokens/s per GB)
- ✅ Shows delta vs previous benchmark
- ✅ Generates all 4 export formats
- ✅ Includes percentile statistics (P50, P95, P99)
- ✅ Shows quantization comparison
- ✅ Displays performance trends if history available
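The percentile statistics (P50, P95, P99) can be reproduced from raw tokens/s values with the standard library. This is a sketch: the interpolation method the tool uses is not documented here, so `method="inclusive"` is an assumption, and `percentiles` is a hypothetical name.

```python
from statistics import quantiles


def percentiles(values):
    """Return P50, P95 and P99 of a list with at least two values."""
    # n=100 yields 99 cut points; index i-1 holds the i-th percentile
    cuts = quantiles(values, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Synthetic data: the integers 1..100
print(percentiles(list(range(1, 101))))
# → {'p50': 50.5, 'p95': 95.05, 'p99': 99.01}
```

With only a handful of benchmarked models, P95 and P99 sit very close to the maximum, so they are most informative on larger result sets.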
🎯 Key Metrics
| Metric | Description | Unit |
|---|---|---|
| Speed | Throughput | tokens/s |
| Efficiency | Speed per GB model size | tokens/s/GB |
| TTFT | Time to First Token | ms |
| Delta | Change vs previous | % |
| VRAM | Memory used | MB |
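As a worked example of the Efficiency row, the sketch below divides throughput by model size. The 4.68 GB size is not stated anywhere in this guide; it is back-computed from the sample values (8.15 tokens/s at 1.74 tokens/s/GB) purely for illustration.

```python
def efficiency(tokens_per_sec, model_size_gb):
    """Efficiency as defined in the table: tokens/s per GB of model size."""
    return tokens_per_sec / model_size_gb


# 8.15 tokens/s on a model of roughly 4.68 GB (assumed size)
print(round(efficiency(8.15, 4.68), 2))  # → 1.74
```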
📁 File Structure
results/
├── benchmark_results_20260104_170000.json
├── benchmark_results_20260104_170000.csv
├── benchmark_results_20260104_170000.pdf
└── benchmark_results_20260104_170000.html
🐛 Troubleshooting
No models found
- Ensure LM Studio is installed and running
- Check lms ls --json output
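The same check can be scripted. The sketch below runs `lms ls --json` and counts the entries; it assumes the command prints a top-level JSON array with one entry per model, which should be verified against your LM Studio version. The function names are hypothetical.

```python
import json
import shutil
import subprocess


def parse_model_list(text):
    """Parse the CLI output; return [] unless it is a JSON array."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    return data if isinstance(data, list) else []


def list_models():
    """Run `lms ls --json`; return [] if the CLI is missing or fails."""
    if shutil.which("lms") is None:
        return []
    proc = subprocess.run(
        ["lms", "ls", "--json"], capture_output=True, text=True, timeout=30
    )
    if proc.returncode != 0:
        return []
    return parse_model_list(proc.stdout)
```

Calling `len(list_models())` gives the number of models the CLI reports; a result of 0 points to the same problem as the "No models found" message.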
Server not responding
- Start LM Studio server manually
- Check ~/.lmstudio/server-logs/
Permission denied on results/
mkdir -p results/
chmod 755 results/
🔗 Related Files
- FEATURES.md - Complete feature list
- PLAN.md - Implementation roadmap
- requirements.txt - Python dependencies
- errors.log - Debug information
Version: 1.0 (Phases 1-4 Complete) | Updated: 2026-01-04