LM Studio Benchmark Documentation
Welcome to the LM Studio Benchmark documentation! This tool helps you measure and compare token/s performance across all your locally installed LLM models and their quantizations.
What is this?
A Python benchmark tool for LM Studio with a modern web dashboard that:
- Automatically tests all local LLM models and quantizations
- Measures token/s speeds with warmup and multiple runs (see the sketch after this list)
- Exports results in JSON, CSV, PDF, and interactive HTML formats
- Detects GPU capabilities (NVIDIA, AMD, Intel) and monitors VRAM usage
- Provides a web dashboard with live charts and filtering options
- Includes Linux tray controls with live status icons and quick actions
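
To give a rough sense of what a single measurement involves, here is a minimal Python sketch of a warmup-plus-averaging loop. It is an illustration, not the tool's actual implementation: it assumes LM Studio's local server is running on its default port (1234) and returns OpenAI-style `usage` counts, and the model name and prompt are placeholders.

```python
# Minimal token/s measurement sketch (illustrative, not the benchmark's code).
# Assumes LM Studio's local server is running on its default port (1234)
# and returns OpenAI-style `usage` counts in the response.
import time
import requests

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address
PROMPT = "Explain what quantization means for LLMs in one paragraph."

def run_once(model: str) -> float:
    """Send one completion request and return tokens per second."""
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 256,
        },
        timeout=300,
    )
    resp.raise_for_status()
    # Note: elapsed time here includes prompt processing, a simplification.
    elapsed = time.perf_counter() - start
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed

def benchmark(model: str, warmup: int = 1, runs: int = 3) -> float:
    """Warm the model up, then average tokens/s over several runs."""
    for _ in range(warmup):
        run_once(model)  # discard warmup results (model load, cold caches)
    speeds = [run_once(model) for _ in range(runs)]
    return sum(speeds) / len(speeds)

if __name__ == "__main__":
    # The model identifier is a placeholder; list yours via GET /v1/models.
    print(f"{benchmark('llama-3.1-8b-instruct'):.1f} tok/s")
```

Discarding warmup runs, as above, keeps one-time costs such as model loading from skewing the average.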
Quick Links
- Quickstart Guide — Get started in 5 minutes
- Configuration Reference — All CLI arguments and config file options
- Architecture Documentation — System architecture with Mermaid diagrams, including testing architecture
- REST API Integration — Advanced features with LM Studio API v1
- Hardware Monitoring — GPU, CPU, RAM tracking
- LLM Metadata Guide — Model capabilities and metadata
- User Data & Configuration — XDG directory structure and config management
- Agent Integration — How to integrate with LM Studio Agents
Features at a Glance
✅ Multi-model benchmarking with intelligent GPU offload
✅ Vision and tool-calling model detection
✅ Progressive VRAM management with automatic fallback
✅ Caching system that skips already-tested models
✅ Filtering by quantization, architecture, parameter count, and context length
✅ Live web dashboard with 27 themes
✅ Linux tray controller with dynamic benchmark status icons
✅ REST API mode with parallel inference support
✅ Download progress tracking, MCP integration, and stateful chats
✅ Response caching with 10,000x+ speedup for repeated prompts (see the sketch below)
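
To give a sense of how a response cache achieves that kind of speedup, here is a minimal in-memory sketch. The class and method names are illustrative, and the tool's actual cache may differ (for example, by persisting results to disk):

```python
# Illustrative sketch of prompt-keyed response caching (not the tool's code):
# identical (model, messages) pairs skip inference entirely, which is where
# the large speedups for repeated prompts come from.
import hashlib
import json

class ResponseCache:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def _key(model: str, messages: list[dict]) -> str:
        # Hash the full request so any change in model or prompt misses the cache.
        raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, model: str, messages: list[dict]) -> str | None:
        return self._store.get(self._key(model, messages))

    def put(self, model: str, messages: list[dict], response: str) -> None:
        self._store[self._key(model, messages)] = response
```

A cache hit costs a dictionary lookup (microseconds) instead of seconds of inference, which is how repeated prompts can see orders-of-magnitude speedups.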
Getting Started
Check out the Quickstart Guide to begin benchmarking your models!