Local AI Guide · April 2026 · 10 min read

How to Run LM Studio on a Mini PC (2026):
Complete Setup Guide

LM Studio is the easiest way to run AI models locally — a graphical app that handles downloading, loading, and chatting with open-source models like Mistral 7B and Qwen3 14B without any command line. On a modern AMD mini PC, it works out of the box with GPU acceleration, no drivers to wrangle. This guide covers everything from install to your first conversation in about 20 minutes.

By MiniPCDeals.net
10 min · LM Studio 0.4.x
ℹ️ This article contains affiliate links. We earn a small commission on qualifying purchases at no extra cost to you.
📌 Quick Answer

Download LM Studio from lmstudio.ai → install → open → search for a model → download → load → chat. On AMD Radeon mini PCs (780M/890M), GPU acceleration activates automatically via Vulkan — no manual setup. A 32GB Ryzen AI 9 HX 370 machine runs Mistral 7B at 30–40 tokens/sec, Qwen3 14B at 18–25 t/s, and Qwen3 32B at 8–12 t/s.

LM Studio version: 0.4.x (Jan 2026 · Free)
Install time: ~5 min (no CLI needed)
Mistral 7B on HX 370: 30–40 t/s (Radeon 890M · Vulkan)
Min RAM: 16GB (32GB recommended)

What Is LM Studio and Why Use It on a Mini PC?

LM Studio is a free desktop application for downloading and running open-source AI models locally. Unlike Ollama (command-line) or llama.cpp (manual compilation), LM Studio has a full graphical interface — you click, download, and chat. Version 0.4.0 (January 2026) added a headless server mode alongside the GUI.

The key thing LM Studio does that matters for mini PC users: it handles all the complexity of running a large language model — quantization selection, GPU layer offloading, memory management — through a visual interface. You don’t need to know what GGUF, Q4_K_M, or n_gpu_layers means to get started. You click a model, click download, click load, and start chatting.

On AMD mini PCs, LM Studio uses the Vulkan backend to offload model layers to the Radeon iGPU (780M, 890M, or 8060S). This is done automatically when you set GPU offloading to Max in the settings. AMD’s own documentation confirms that LM Studio is officially supported on Ryzen AI and Ryzen AI Max processors as of April 2026.

💡 LM Studio 0.4.0 — what changed in January 2026
The major addition in version 0.4.0 is llmster — a headless daemon mode that runs LM Studio’s core without the GUI. This means you can now install LM Studio on a mini PC server and access it from any device on your network via the REST API, without the desktop interface consuming RAM. You also get parallel requests (multiple users simultaneously) and a new /v1/chat stateful API endpoint. For mini PC home servers, this makes LM Studio a genuine alternative to Ollama for API-first deployments.

Mini PC Requirements for LM Studio

LM Studio requires a 64-bit Windows, macOS, or Linux system with 8GB RAM minimum. For practical use on a mini PC, 16GB is the real minimum and 32GB is recommended, because the entire model must fit in memory: available RAM directly caps the size of model you can run.

What your mini PC can run by RAM
16GB RAM: 7B models comfortably (Mistral 7B, Qwen3 7B, Llama 3.2 3B). Best for coding assistance, quick Q&A, and document summarisation. Expect ~8–12 t/s on an N150 and ~30–40 t/s on a Radeon 890M.

32GB RAM: 7B–32B models. Qwen3 32B fits with room for the OS. This is the sweet spot for most mini PC users. ~8–12 t/s on 32B, 30–40 t/s on 7B with Radeon 890M.

64GB+ RAM: 70B models become possible (~40GB needed at Q4), which rules out 32GB machines like the Peladn HO5. The GMKtec EVO-X2 (128GB) handles Llama 3.1 70B at 18–25 t/s.
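The RAM tiers above follow a simple rule of thumb: weight memory is roughly parameters × bits-per-weight ÷ 8, plus a couple of GB of runtime overhead. A minimal sketch (the ~4.5 bits/weight figure for Q4_K_M and the flat overhead are approximations, not LM Studio's own accounting):

```python
def estimate_model_gb(params_b: float, bits_per_weight: float = 4.5,
                      overhead_gb: float = 1.5) -> float:
    """Rough GGUF memory estimate: quantized weights plus runtime/KV-cache overhead."""
    weights_gb = params_b * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)

print(estimate_model_gb(7))   # ≈ 5.4 GB, fits 16GB machines
print(estimate_model_gb(32))  # ≈ 19.5 GB, needs 32GB
print(estimate_model_gb(70))  # ≈ 40.9 GB, needs 64GB+
```

These estimates line up with the sizes LM Studio reports in its model loader for Q4_K_M downloads.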

The minimum system requirements per LM Studio’s documentation: 64-bit CPU, 8GB RAM, Windows 10/11 (64-bit), macOS 13.6+ or Ubuntu 22.04+. For AMD GPU acceleration: any Radeon iGPU on a Ryzen 7000 series or newer (including 780M, 890M, 8060S) is supported via Vulkan on Windows. You also need the latest AMD Adrenalin driver installed — get it from amd.com/en/support.

How to Install LM Studio — Step by Step

The installation takes under 5 minutes. No command line required. LM Studio installs to your user profile by default — no administrator rights needed on Windows.

Step 1: Download LM Studio

Go to lmstudio.ai/download and click “Download for Windows”. The installer is approximately 400MB. Save it anywhere and run it — it installs to %APPDATA%\LM Studio by default, no admin required.

Step 2: Launch and complete setup

On first launch, LM Studio asks where to store models. The default is C:\Users\[you]\.lmstudio\models. If your system drive is small, point it to an external NVMe SSD or a second M.2 drive — models range from 4GB to 40GB+ each.

Step 3: Update AMD Adrenalin driver (important)

Go to amd.com/en/support, download and install the latest Adrenalin Edition driver for your Radeon iGPU. LM Studio’s Vulkan backend benefits significantly from up-to-date drivers — older drivers can halve inference speeds on AMD hardware. This step takes 5–10 minutes.

Step 4: Search for and download a model

Click the search bar at the top of LM Studio (or press Ctrl+K) and type “Mistral 7B” or “Qwen3”. You’ll see results from Hugging Face. Select a model and pick the Q4_K_M quantization — this is the best balance of quality and file size. Click “Download”. Wait for the download to complete.

Step 5: Load the model and start chatting

In “My Models”, click the model you downloaded, then click “Load”. The model loads into RAM/VRAM — this takes 10–30 seconds. Once loaded, go to the Chat tab. Type a message and press Enter. Your first response typically takes 5–10 seconds; subsequent turns are faster as the model is cached.

Enabling AMD GPU Acceleration (Vulkan)

LM Studio auto-detects AMD Radeon iGPUs via Vulkan on Windows. The only setting you need to change is GPU offloading — set it to “Max (GPU)” to push all model layers onto the Radeon iGPU. This improves speeds by 2–4× compared to CPU-only inference.

When you click “Load” on a model, LM Studio shows a model loader dialog. Here’s what to set for maximum AMD iGPU performance:

Step 1: Open the model loader settings

Click the gear icon next to the “Load” button, or go to Settings → Performance. Make sure “Developer Mode” is enabled (Settings → Developer) to see all options.

Step 2: Set GPU Offload to Max

Find “GPU Offload” and drag the slider all the way to “Max (GPU)”. On unified memory mini PCs (Radeon 780M, 890M, 8060S), the entire model loads into the unified RAM/VRAM pool. LM Studio will show the estimated memory required — make sure it’s within your available RAM.

Step 3: Select the Vulkan backend

Under “Inference Backend” (Developer Mode required), select “Vulkan”. On AMD mini PCs this is the recommended backend for Windows. The Metal backend is for macOS only. CUDA is for NVIDIA only. If Vulkan is not listed, update your AMD Adrenalin driver.

Step 4: Confirm allocation and load

LM Studio shows “X GB will be allocated to GPU”. For a 7B Q4_K_M model this is ~4–5GB, for 14B ~8–9GB, for 32B ~18–20GB. Click Load Model. You should see speeds of 30–40 t/s on a Radeon 890M for 7B models.

⚠️ If GPU acceleration doesn't activate
If LM Studio reverts to CPU-only inference (speeds below 5 t/s), try: (1) Update AMD Adrenalin driver to the latest version. (2) In BIOS, increase the iGPU frame buffer allocation to 4–8GB. (3) Restart LM Studio after driver update. (4) In the model loader, explicitly select “Vulkan” backend — do not leave it on “Auto” if Vulkan isn’t being picked automatically. ROCm on Windows is also supported but requires additional AMD ROCm driver installation and is primarily recommended for Linux.

Best Models for Your Mini PC in LM Studio

The right model depends on your RAM. For 32GB mini PCs, Qwen3 14B is the best general-purpose model — excellent quality, 18–25 t/s on Radeon 890M. For coding, Qwen3 14B or Qwen3 7B. For fastest responses, Mistral 7B or Qwen3 7B. Always use Q4_K_M quantization on mini PCs unless you have 64GB+.
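That guidance collapses into a tiny picker. A sketch whose thresholds mirror the RAM tiers in this guide and whose model names are the ones recommended above:

```python
def recommend_model(ram_gb: int) -> str:
    """Map installed RAM to the recommended Q4_K_M model, per the tiers above."""
    if ram_gb >= 64:
        return "Llama 3.1 70B"   # ~40GB at Q4, only for 64GB+ machines
    if ram_gb >= 32:
        return "Qwen3 14B"       # best general-purpose pick for 32GB
    return "Qwen3 7B"            # fast and compact for 16GB machines

print(recommend_model(32))  # Qwen3 14B
```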

Mistral 7B Instruct (Fastest)
The fastest practical model for interactive chat. 7B parameters, fits in ~5GB at Q4_K_M. Best for coding assistance, quick summarisation, and drafting. The go-to choice when response speed matters more than reasoning depth.
~30–40 t/s · Radeon 890M · Q4_K_M
Qwen3 7B (Speed + Quality)
Qwen3 7B matches or exceeds Mistral 7B on most benchmarks at the same size. Particularly strong at code generation and instruction following. Recommended over Mistral 7B for 2026 for most use cases.
~30–38 t/s · Radeon 890M · Q4_K_M
Qwen3 32B (Power Users)
The maximum practical model for a 32GB mini PC. Fits in ~22GB at Q4_K_M. Noticeably slower than 7B/14B but delivers significantly better reasoning on complex tasks. Needs 32GB — will not fully load on 16GB machines.
~8–12 t/s · Radeon 890M · Q4_K_M
🤖 Which quantization to pick?
Q4_K_M is the right default for mini PCs — good quality, roughly 4 bits per weight, best balance of size and accuracy. Q8_0 is higher quality but doubles file size and memory use — only practical if you have 64GB+. Q2_K is very compressed and noticeably worse quality — only use it if a larger model won’t fit at Q4 (e.g., a 70B model at Q2 on a 64GB machine). In LM Studio’s search results, filter by “Q4_K_M” for the best default choice.
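The size trade-off between quantizations is easy to see with approximate bits-per-weight figures. A sketch (the values below are community ballpark numbers for GGUF quants, not exact):

```python
QUANT_BPW = {"Q2_K": 2.6, "Q4_K_M": 4.5, "Q8_0": 8.5}  # approximate bits per weight

def weights_gb(params_b: float, quant: str) -> float:
    """Approximate size of the quantized weights alone (excludes runtime overhead)."""
    return round(params_b * QUANT_BPW[quant] / 8, 1)

print(weights_gb(7, "Q4_K_M"))  # ≈ 3.9 GB
print(weights_gb(7, "Q8_0"))    # ≈ 7.4 GB, roughly double Q4
print(weights_gb(70, "Q2_K"))   # ≈ 22.8 GB, how a 70B squeezes onto a 64GB machine
```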

Tokens/Sec Benchmarks by Mini PC

The following benchmarks represent community-reported speeds using LM Studio with Vulkan backend on AMD mini PCs. They are not MiniPCDeals.net’s own measurements — we do not have test units. Figures are consistent across r/LocalLLaMA and llama.cpp community reports.

| Mini PC (Chip) | Mistral 7B Q4 | Qwen3 14B Q4 | Qwen3 32B Q4 | Llama 70B Q4 |
| --- | --- | --- | --- | --- |
| Beelink EQ14 (N150 · 16GB) | ~8–12 t/s | ~5–8 t/s | Won't fit (16GB) | Won't fit |
| BOSGAME M4 (Ryzen 7 · 32GB · 780M) | ~18–25 t/s | ~12–16 t/s | ~6–9 t/s | Won't fit (needs 40GB+) |
| SER9 Pro AI / Peladn HO5 (HX 370 · 32GB · 890M) | ~30–40 t/s | ~18–25 t/s | ~8–12 t/s | Won't fit (needs 40GB+) |
| GMKtec EVO-X2 (AI Max+ 395 · 128GB · 8060S) | ~55–65 t/s | ~35–45 t/s | ~25–35 t/s | ~18–25 t/s |

Speeds with Vulkan backend, GPU offload set to Max. Based on community benchmarks from r/LocalLLaMA, llama.cpp GitHub, April 2026. Results vary by model version, driver version, and system state.

The speed table reveals a clear decision point. For interactive everyday use, anything above 15 t/s feels comfortable — responses appear at roughly typing speed. The BOSGAME M4 at 18–25 t/s on Mistral 7B is fine for chat and coding assistance. The Radeon 890M machines (30–40 t/s) feel noticeably more responsive. Below 10 t/s (N150 machines), the experience feels slow for longer responses but is usable for short tasks.
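The "comfortable above 15 t/s" rule is just arithmetic on response length. A sketch (the 1-second first-token delay is an assumed placeholder; real prefill time varies with prompt length):

```python
def response_seconds(tokens: int, tokens_per_sec: float,
                     first_token_s: float = 1.0) -> float:
    """Wall-clock time for a reply: assumed prefill delay plus generation time."""
    return first_token_s + tokens / tokens_per_sec

print(response_seconds(300, 15))  # 21.0 s for a long answer, tolerable
print(response_seconds(300, 35))  # under 10 s on a Radeon 890M with a 7B model
```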

LM Studio vs Ollama — Which Should You Use?

LM Studio is better for beginners and anyone who wants a GUI. Ollama is better for developers who want a simple REST API and CLI integration. After LM Studio 0.4.0 added headless/server mode, the gap between them narrowed significantly.

LM Studio

Interface: Full GUI + headless (0.4.0+)
Setup: Click-to-download models
AMD support: Vulkan (auto-detected)
API: OpenAI-compatible + /v1/chat
Best for: Beginners, chat, visual model management
License: Free (personal + commercial)

Ollama

Interface: Command-line first
Setup: One command: ollama run llama3
AMD support: Vulkan + ROCm (Linux)
API: OpenAI-compatible REST API
Best for: Developers, scripts, integrations
License: Free / open source

In practice, many users install both: LM Studio for discovering and testing models visually, and Ollama for programmatic access from other applications. Both use the same underlying GGUF model files and both support the OpenAI-compatible API for connecting tools like VS Code extensions, Open WebUI, or custom Python scripts.
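Because both expose OpenAI-compatible endpoints, client code only needs to swap the base URL. A sketch (the ports shown are each tool's default; model names depend on what you have loaded):

```python
# Default local endpoints; swap one for the other without changing client code.
BACKENDS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}

def base_url(backend: str) -> str:
    """Return the OpenAI-compatible base URL for the chosen local backend."""
    return BACKENDS[backend]

print(base_url("lmstudio"))  # http://localhost:1234/v1
```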

Using LM Studio as a Local API Server

LM Studio exposes an OpenAI-compatible REST API on port 1234. Any app that supports OpenAI’s API can connect to LM Studio instead, using your local model with no cloud dependency. This includes Cursor, Continue.dev, Open WebUI, and any Python app using the openai library.

To start the LM Studio server, click the “↔” server icon in the left sidebar and click “Start Server”. By default it runs on http://localhost:1234. LM Studio 0.4.0 also added the headless llmster daemon — useful for running LM Studio as a background service on a mini PC home server:

Install llmster (headless mode)
```shell
curl -fsSL https://lmstudio.ai/install.sh | bash

# Start the daemon
lms daemon up

# Load a model
lms load mistral-7b-instruct-v0.3-GGUF/mistral-7b-instruct-v0.3.Q4_K_M.gguf

# Start the API server
lms server start

# Chat in terminal
lms chat
```
Connect from Python (any machine on your network)
```python
from openai import OpenAI

# Replace localhost with your mini PC's IP for network access
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="mistral-7b-instruct",  # use whatever model is loaded
    messages=[{"role": "user", "content": "Explain quantum entanglement simply."}],
)
print(response.choices[0].message.content)
```
Set a static IP on your mini PC for network access
If you want to access LM Studio from other devices on your network (phone, laptop, tablet), set a static IP on your mini PC in Windows network settings. Then replace localhost with your mini PC’s IP (e.g., 192.168.1.x) in any client app. LM Studio’s server binds to 0.0.0.0 by default in headless mode, making it accessible from any device on your LAN.
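For a stdlib-only client (no openai package), the same request can be built with urllib. A sketch, where 192.168.1.50 is a placeholder for your mini PC's static LAN IP:

```python
import json
import urllib.request

MINI_PC_IP = "192.168.1.50"  # placeholder: use your mini PC's static LAN IP

def build_chat_request(prompt: str,
                       model: str = "mistral-7b-instruct") -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for LM Studio's server."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"http://{MINI_PC_IP}:1234/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Sending requires the LM Studio server to be running and reachable:
# with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```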
🤖 Best mini PC for LM Studio
Peladn HO5 — Ryzen AI 9 HX 370 · 32GB · 30–40 t/s on Mistral 7B · ~$940
Radeon 890M, Copilot+ 50 TOPS NPU, OCuLink. The best mini PC for LM Studio under $1,000 — fast enough for interactive use with 7B–32B models.
Affiliate link · Commission on qualifying purchases at no extra cost

Frequently Asked Questions

What mini PC specs do I need for LM Studio?
Minimum: 16GB RAM to run 7B models at Q4 quantization. Recommended: 32GB RAM (allows 7B–32B models). A Ryzen AI 9 HX 370 or Ryzen 7 8745HS mini PC with 32GB gives 18–40 tokens/sec depending on model size. LM Studio auto-detects AMD Radeon iGPUs (780M, 890M, 8060S) via Vulkan. GPU acceleration improves speeds 2–4× over CPU-only inference.
Does LM Studio work with AMD Radeon iGPUs?
Yes. LM Studio supports AMD Radeon iGPUs via the Vulkan backend on Windows. On a Radeon 890M (in Ryzen AI 9 HX 370 machines), speeds are 30–40 t/s for Mistral 7B Q4_K_M. On a Radeon 8060S (GMKtec EVO-X2, Strix Halo), speeds reach 55–65 t/s. GPU detection is automatic in LM Studio. Set “GPU Offload” to “Max” in the model loader and select “Vulkan” as the backend in Developer Mode settings.
What's the difference between LM Studio and Ollama?
LM Studio has a full graphical interface — model browser, chat UI, visual settings. Best for beginners. Ollama is command-line first — simpler for developers who need API access. LM Studio 0.4.0 added headless/server mode (llmster) that works without the GUI, closing the gap with Ollama for server deployments. Both use llama.cpp for inference and support the same GGUF model files. Both are free. Many users install both.
How many tokens per second should I expect?
With the Vulkan backend on a Ryzen AI 9 HX 370 mini PC (32GB, Radeon 890M): Mistral 7B Q4_K_M ≈ 30–40 t/s · Qwen3 14B Q4_K_M ≈ 18–25 t/s · Qwen3 32B Q4_K_M ≈ 8–12 t/s. On a GMKtec EVO-X2 (128GB, Radeon 8060S): Mistral 7B ≈ 55–65 t/s · Llama 3.1 70B Q4 ≈ 18–25 t/s. Speeds below 15 t/s (N150 machines) are usable but feel slow for interactive chat.
Is LM Studio free?
Yes. LM Studio is free for both personal and commercial use as of January 2026 (LM Studio team announcement). No license purchase, no subscription, no usage limits. You download it from lmstudio.ai, use it, and the only cost is the hardware and electricity to run it. Enterprise plans exist for teams needing support contracts, but the base software is fully free.
Can other devices on my network use LM Studio?
Yes. Start the LM Studio server (or use llmster in headless mode), set a static IP on your mini PC, and access the API from any device on your LAN using http://[mini-PC-IP]:1234. You can use Open WebUI (a free web interface for local LLMs) as a browser-based chat interface and access it from any device on your network without installing anything else.
Sources
MiniPCDeals.net Editorial Team

LM Studio version information and features from lmstudio.ai/blog (LM Studio 0.4.0, January 28, 2026). AMD support confirmation from AMD Developer Resources (April 2026, Gemma 4 article). Token speed benchmarks from r/LocalLLaMA community reports and llama.cpp GitHub benchmark threads (Radeon 780M, 890M, 8060S, April 2026). LM Studio system requirements from official LM Studio documentation. Vulkan backend AMD support confirmed in LM Studio 0.3.19+ changelog.