How to Run LM Studio on a Mini PC (2026): Complete Setup Guide
LM Studio is the easiest way to run AI models locally — a graphical app that handles downloading, loading, and chatting with open-source models like Mistral 7B and Qwen3 14B without any command line. On a modern AMD mini PC, it works out of the box with GPU acceleration, no drivers to wrangle. This guide covers everything from install to your first conversation in about 20 minutes.
Download LM Studio from lmstudio.ai → install → open → search for a model → download → load → chat. On AMD Radeon mini PCs (780M/890M), GPU acceleration activates automatically via Vulkan — no manual setup. A 32GB Ryzen AI 9 HX 370 machine runs Mistral 7B at 30–40 tokens/sec, Qwen3 14B at 18–25 t/s, and Qwen3 32B at 8–12 t/s.
- 01 What Is LM Studio and Why Use It on a Mini PC?
- 02 Mini PC Requirements for LM Studio
- 03 How to Install LM Studio — Step by Step
- 04 Enabling AMD GPU Acceleration (Vulkan)
- 05 Best Models for Your Mini PC
- 06 Tokens/Sec Benchmarks by Mini PC
- 07 LM Studio vs Ollama — Which Should You Use?
- 08 Using LM Studio as a Local API Server
- 09 FAQ
What Is LM Studio and Why Use It on a Mini PC?
LM Studio is a free desktop application for downloading and running open-source AI models locally. Unlike Ollama (command-line) or llama.cpp (manual compilation), LM Studio has a full graphical interface — you click, download, and chat. Version 0.4.0 (January 2026) added a headless server mode alongside the GUI.
The key thing LM Studio does that matters for mini PC users: it handles all the complexity of running a large language model — quantization selection, GPU layer offloading, memory management — through a visual interface. You don’t need to know what GGUF, Q4_K_M, or n_gpu_layers means to get started. You click a model, click download, click load, and start chatting.
On AMD mini PCs, LM Studio uses the Vulkan backend to offload model layers to the Radeon iGPU (780M, 890M, or 8060S). This is done automatically when you set GPU offloading to Max in the settings. AMD’s own documentation confirms that LM Studio is officially supported on Ryzen AI and Ryzen AI Max processors as of April 2026.
The 0.4.0 release also added a stateful /v1/chat API endpoint. For mini PC home servers, this makes LM Studio a genuine alternative to Ollama for API-first deployments.
Mini PC Requirements for LM Studio
LM Studio requires a 64-bit Windows, macOS, or Linux system with 8GB RAM minimum. For practical use on a mini PC, 16GB is the real minimum and 32GB is recommended — available RAM directly caps the size of model you can run.
- 32GB RAM: 7B–32B models. Qwen3 32B fits with room for the OS. This is the sweet spot for most mini PC users. ~8–12 t/s on 32B, 30–40 t/s on 7B with Radeon 890M.
- 64GB+ RAM: 70B models become possible (needs ~40GB). Peladn HO5 with 32GB can’t do 70B. GMKtec EVO-X2 (128GB) handles Llama 3.1 70B at 18–25 t/s.
The minimum system requirements per LM Studio’s documentation: 64-bit CPU, 8GB RAM, Windows 10/11 (64-bit), macOS 13.6+ or Ubuntu 22.04+. For AMD GPU acceleration: any Radeon iGPU on a Ryzen 7000 series or newer (including 780M, 890M, 8060S) is supported via Vulkan on Windows. You also need the latest AMD Adrenalin driver installed — get it from amd.com/en/support.
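Before downloading a model, you can sanity-check whether it will fit in your RAM. A minimal sketch — the ~4.85 bits/weight figure for Q4_K_M and the ~2GB reserve for the OS and KV cache are rough working assumptions, not official LM Studio numbers:

```python
def fits_in_ram(params_billion: float, ram_gb: int,
                bits_per_weight: float = 4.85,  # rough average for Q4_K_M
                overhead_gb: float = 2.0) -> bool:
    """Rough check: does a Q4_K_M-quantized model fit in this much RAM?"""
    model_gb = params_billion * bits_per_weight / 8  # weights in GB
    return model_gb + overhead_gb <= ram_gb

for params, ram in [(7, 16), (32, 32), (70, 32), (70, 64)]:
    verdict = "fits" if fits_in_ram(params, ram) else "too big"
    print(f"{params}B model, {ram}GB RAM: {verdict}")
```

This reproduces the article's tiers: 32B just fits in 32GB, while 70B needs a 64GB+ machine.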
How to Install LM Studio — Step by Step
The installation takes under 5 minutes. No command line required. LM Studio installs to your user profile by default — no administrator rights needed on Windows.
Download LM Studio
Go to lmstudio.ai/download and click “Download for Windows”. The installer is approximately 400MB. Save it anywhere and run it — it installs to %APPDATA%\LM Studio by default, no admin required.
Launch and complete setup
On first launch, LM Studio asks where to store models. The default is C:\Users\[you]\.lmstudio\models. If your system drive is small, point it to an external NVMe SSD or a second M.2 drive — models range from 4GB to 40GB+ each.
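Model files add up quickly, so it helps to see what is on disk. A small script can walk the models folder and report sizes — the path below assumes the default location; adjust it if you pointed LM Studio at another drive:

```python
from pathlib import Path

def list_models(models_dir: Path) -> list[tuple[str, float]]:
    """Return (filename, size in GB) for every GGUF file under models_dir,
    largest first."""
    return sorted(
        ((f.name, f.stat().st_size / 1e9) for f in models_dir.rglob("*.gguf")),
        key=lambda item: item[1], reverse=True,
    )

for name, gb in list_models(Path.home() / ".lmstudio" / "models"):
    print(f"{gb:6.1f} GB  {name}")
```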
Update AMD Adrenalin driver (important)
Go to amd.com/en/support, download and install the latest Adrenalin Edition driver for your Radeon iGPU. LM Studio’s Vulkan backend benefits significantly from up-to-date drivers — older drivers can halve inference speeds on AMD hardware. This step takes 5–10 minutes.
Search for and download a model
Click the search bar at the top of LM Studio (or press Ctrl+K) and type “Mistral 7B” or “Qwen3”. You’ll see results from Hugging Face. Select a model and pick the Q4_K_M quantization — this is the best balance of quality and file size. Click “Download”. Wait for the download to complete.
Load the model and start chatting
In “My Models”, click the model you downloaded, then click “Load”. The model loads into RAM/VRAM — this takes 10–30 seconds. Once loaded, go to the Chat tab. Type a message and press Enter. Your first response typically takes 5–10 seconds; subsequent turns are faster as the model is cached.
Enabling AMD GPU Acceleration (Vulkan)
LM Studio auto-detects AMD Radeon iGPUs via Vulkan on Windows. The only setting you need to change is GPU offloading — set it to “Max (GPU)” to push all model layers onto the Radeon iGPU. This improves speeds by 2–4× compared to CPU-only inference.
When you click “Load” on a model, LM Studio shows a model loader dialog. Here’s what to set for maximum AMD iGPU performance:
Open the model loader settings
Click the gear icon next to the “Load” button, or go to Settings → Performance. Make sure “Developer Mode” is enabled (Settings → Developer) to see all options.
Set GPU Offload to Max
Find “GPU Offload” and drag the slider all the way to “Max (GPU)”. On unified memory mini PCs (Radeon 780M, 890M, 8060S), the entire model loads into the unified RAM/VRAM pool. LM Studio will show the estimated memory required — make sure it’s within your available RAM.
Select the Vulkan backend
Under “Inference Backend” (Developer Mode required), select “Vulkan”. On AMD mini PCs this is the recommended backend for Windows. The Metal backend is for macOS only. CUDA is for NVIDIA only. If Vulkan is not listed, update your AMD Adrenalin driver.
Confirm allocation and load
LM Studio shows “X GB will be allocated to GPU”. For a 7B Q4_K_M model this is ~4–5GB, for 14B ~8–9GB, for 32B ~18–20GB. Click Load Model. You should see speeds of 30–40 t/s on a Radeon 890M for 7B models.
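You can verify throughput on your own hardware by timing one generation against LM Studio's local API server (covered in a later section). A stdlib-only sketch — the port is LM Studio's default, the model name is whatever you have loaded, and `usage.completion_tokens` is part of the standard OpenAI-style response:

```python
import json
import time
import urllib.request

def chat_payload(model: str, prompt: str) -> bytes:
    """OpenAI-style chat request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def measure_tps(model: str = "mistral-7b-instruct",
                url: str = "http://localhost:1234/v1/chat/completions") -> float:
    """Time one generation against the local server; return tokens/sec."""
    req = urllib.request.Request(
        url, data=chat_payload(model, "Summarize the water cycle in 200 words."),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    return body["usage"]["completion_tokens"] / elapsed

# With a model loaded and the server running:
# print(f"{measure_tps():.1f} t/s")
```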
Best Models for Your Mini PC in LM Studio
The right model depends on your RAM. For 32GB mini PCs, Qwen3 14B is the best general-purpose model — excellent quality, 18–25 t/s on Radeon 890M. For coding, Qwen3 14B or Qwen3 7B. For fastest responses, Mistral 7B or Qwen3 7B. Always use Q4_K_M quantization on mini PCs unless you have 64GB+.
Tokens/Sec Benchmarks by Mini PC
The following benchmarks represent community-reported speeds using LM Studio with Vulkan backend on AMD mini PCs. They are not MiniPCDeals.net’s own measurements — we do not have test units. Figures are consistent across r/LocalLLaMA and llama.cpp community reports.
| Mini PC (Chip) | Mistral 7B Q4 | Qwen3 14B Q4 | Qwen3 32B Q4 | Llama 70B Q4 |
|---|---|---|---|---|
| Beelink EQ14 (N150 · 16GB) | ~8–12 t/s | ~5–8 t/s | Won’t fit (16GB) | Won’t fit |
| BOSGAME M4 (Ryzen 7 · 32GB · 780M) | ~18–25 t/s | ~12–16 t/s | ~6–9 t/s | Won’t fit (needs 40GB+) |
| SER9 Pro AI / Peladn HO5 (HX 370 · 32GB · 890M) | ~30–40 t/s | ~18–25 t/s | ~8–12 t/s | Won’t fit (needs 40GB+) |
| GMKtec EVO-X2 (AI Max+ 395 · 128GB · 8060S) | ~55–65 t/s | ~35–45 t/s | ~25–35 t/s | ~18–25 t/s |
Speeds with Vulkan backend, GPU offload set to Max. Based on community benchmarks from r/LocalLLaMA, llama.cpp GitHub, April 2026. Results vary by model version, driver version, and system state.
The speed table reveals a clear decision point. For interactive everyday use, anything above 15 t/s feels comfortable — responses appear at roughly typing speed. The BOSGAME M4 at 18–25 t/s on Mistral 7B is fine for chat and coding assistance. The Radeon 890M machines (30–40 t/s) feel noticeably more responsive. Below 10 t/s (N150 machines), the experience feels slow for longer responses but is usable for short tasks.
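To translate t/s into wall-clock feel, divide response length by generation speed. A quick sketch, using speeds in the middle of the ranges reported above:

```python
def response_seconds(tokens: int, tps: float) -> float:
    """Seconds to generate a response of `tokens` length at `tps` tokens/sec."""
    return tokens / tps

# A ~300-token answer (roughly 225 words) at different speeds:
for label, tps in [("N150 CPU", 8), ("780M", 20), ("890M", 35), ("8060S", 60)]:
    print(f"{label:10s} {response_seconds(300, tps):5.1f}s")
```

At 8 t/s that answer takes over half a minute to finish; at 35 t/s it is done in under nine seconds, which is why the 890M-class machines feel so much more responsive.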
LM Studio vs Ollama — Which Should You Use?
LM Studio is better for beginners and anyone who wants a GUI. Ollama is better for developers who want a simple REST API and CLI integration. After LM Studio 0.4.0 added headless/server mode, the gap between them narrowed significantly.
In practice, many users install both: LM Studio for discovering and testing models visually, and Ollama for programmatic access from other applications (its one-line `ollama run llama3` is hard to beat for quick terminal use). Both use the same underlying GGUF model files and both support the OpenAI-compatible API for connecting tools like VS Code extensions, Open WebUI, or custom Python scripts.
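Because both expose OpenAI-compatible endpoints, the same client code works against either server by swapping the base URL: LM Studio listens on port 1234, Ollama's compatibility endpoint on port 11434 under /v1. A stdlib-only sketch, assuming a model is already loaded in whichever server you target (model names below are placeholders):

```python
import json
import urllib.request

BACKENDS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}

def ask(backend: str, model: str, prompt: str) -> str:
    """Send one chat turn to either local server; both speak the OpenAI API."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{BACKENDS[backend]}/chat/completions", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask("lmstudio", "mistral-7b-instruct", "Hello")
# ask("ollama", "llama3", "Hello")
```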
Using LM Studio as a Local API Server
LM Studio exposes an OpenAI-compatible REST API on port 1234. Any app that supports OpenAI’s API can connect to LM Studio instead, using your local model with no cloud dependency. This includes Cursor, Continue.dev, Open WebUI, and any Python app using the openai library.
To start the LM Studio server, click the “↔” server icon in the left sidebar and click “Start Server”. By default it runs on http://localhost:1234. LM Studio 0.4.0 also added the headless llmster daemon — useful for running LM Studio as a background service on a mini PC home server:
```shell
# Install the lms CLI
curl -fsSL https://lmstudio.ai/install.sh | bash

# Start the daemon
lms daemon up

# Load a model
lms load mistral-7b-instruct-v0.3-GGUF/mistral-7b-instruct-v0.3.Q4_K_M.gguf

# Start the API server
lms server start

# Chat in terminal
lms chat
```

From Python, point the official openai client at the local server:

```python
from openai import OpenAI

# Replace localhost with your mini PC's IP for network access
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="mistral-7b-instruct",  # use whatever model is loaded
    messages=[{"role": "user", "content": "Explain quantum entanglement simply."}]
)
print(response.choices[0].message.content)
```

To reach the server from other devices, replace localhost with your mini PC’s IP (e.g., 192.168.1.x) in any client app. LM Studio’s server binds to 0.0.0.0 by default in headless mode, making it accessible from any device on your LAN.
Frequently Asked Questions
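For a more responsive feel, the same endpoint supports streaming, so tokens print as they are generated instead of arriving in one block. A sketch using the openai client's standard streaming mode (the model name is a placeholder for whatever you have loaded):

```python
def stream_chat(prompt: str, model: str = "mistral-7b-instruct") -> None:
    """Print tokens as LM Studio generates them, via the OpenAI streaming API."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()

# With the server running:
# stream_chat("Explain quantum entanglement simply.")
```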
Can I use LM Studio from other devices on my network?
Yes: once the server is running, any device on your LAN can reach it at http://[mini-PC-IP]:1234. You can use Open WebUI (a free web interface for local LLMs) as a browser-based chat interface, accessible from any device on your network without installing anything additional.
Sources: LM Studio version information and features from lmstudio.ai/blog (LM Studio 0.4.0, January 28, 2026). AMD support confirmation from AMD Developer Resources (April 2026, Gemma 4 article). Token speed benchmarks from r/LocalLLaMA community reports and llama.cpp GitHub benchmark threads (Radeon 780M, 890M, 8060S, April 2026). System requirements from official LM Studio documentation. Vulkan backend AMD support confirmed in the LM Studio 0.3.19+ changelog.
