How to Run Ollama on a Mini PC — Complete 2026 Setup Guide
Ollama is free, takes 5 minutes to install, and turns any modern mini PC into a private AI assistant. No cloud, no subscription, no data leaving your device. This guide covers the full setup on Windows — including how to enable AMD GPU acceleration with the Vulkan backend on Ryzen AI mini PCs.
Install Ollama in 3 steps: (1) Download OllamaSetup.exe from ollama.com/download and run it. (2) Open PowerShell and type ollama run mistral — your first model downloads and starts. (3) For AMD GPU acceleration on Ryzen AI mini PCs, add the environment variable OLLAMA_VULKAN=1 before restarting. That’s it. Total time: under 5 minutes.
What Is Ollama and Why Use It on a Mini PC?
Ollama is a free, open-source tool that makes running AI language models on your own hardware as simple as typing one command. Once installed, it downloads and manages models locally — nothing is sent to any server, ever.
Think of Ollama as Docker for AI models. You type ollama pull llama3 and it downloads Meta’s Llama model to your machine. You type ollama run mistral and you’re chatting with a local AI. No API key. No subscription. No internet required after the model is downloaded.
Mini PCs are ideal hosts for Ollama for three reasons. First, they run 24/7 at 15–35 watts — far cheaper than leaving a gaming desktop on. Second, their unified memory architecture (especially on AMD Ryzen AI chips) means the GPU and CPU share the same RAM pool, which matters enormously for AI inference. And third, a mini PC behind a monitor is invisible and silent — a dedicated local AI server that takes zero desk space.
Ollama's only network activity is downloading models from ollama.com/library and periodic update checks (which can be disabled). Your prompts, conversations, and responses never leave your machine. After download, all inference is entirely local. This is confirmed in Ollama's published documentation and source code. For a broader look at the privacy landscape for local AI on mini PCs, see our local AI security guide.

What Hardware Do You Need?
Ollama runs on any modern Windows PC with at least 8 GB of RAM. A mini PC with 32 GB RAM and a Ryzen AI 9 HX 370 (Radeon 890M iGPU) is the sweet spot for interactive use with 7B–32B models.
The most important spec for Ollama is RAM, not CPU speed. Model weights have to fit in memory. The table below shows what each RAM tier realistically handles:
| Mini PC RAM | Largest usable model | Speed on 7B (Vulkan) | Example mini PC |
|---|---|---|---|
| 16 GB | 7B–14B (Q4) | ~10–18 t/s | KAMRUI Pinova P2 |
| 32 GB | 32B (Q4) | ~14–20 t/s | Peladn HO5, Beelink SER9 Pro AI |
| 64 GB | 32B full + 70B Q4 | ~18–25 t/s | ACEMAGIC Retro X5 (upgraded) |
| 128 GB | 70B Q4, 235B Q2 | ~55–65 t/s | GMKtec EVO-X2 (Strix Halo) |
Speed figures for Radeon 890M / Radeon 8060S iGPU with Vulkan backend enabled. CPU-only is 2–5× slower. Figures sourced from community benchmarks and published Ollama documentation.
Installing Ollama on Windows — Step by Step
The Windows installation is a standard .exe installer. No admin rights needed, no manual PATH setup, no dependencies. Total time: under 2 minutes.
Step 1: Download the installer. Go to ollama.com/download and download OllamaSetup.exe (approximately 200 MB). No account required.

Step 2: Run the installer. Double-click OllamaSetup.exe. Windows may show a SmartScreen warning — click “More info” then “Run anyway”. Ollama installs to your user folder (C:\Users\YourName\AppData\Local\Programs\Ollama) and does not require administrator privileges. Accept the defaults and click Install.

Step 3: Verify. Open PowerShell (Windows + R, type powershell, press Enter) and run:

```shell
ollama --version
```

You should see ollama version 0.6.x or similar. If you see “command not found”, close PowerShell and reopen it — it needs to reload the PATH after installation. If the problem persists, restart your PC.

Enabling AMD GPU Acceleration (Vulkan Backend)
AMD integrated GPUs (Radeon 890M, Radeon 8060S) are not supported by AMD’s ROCm library on Windows for integrated graphics. The solution is Ollama’s Vulkan backend — an experimental but functional alternative that works with all AMD GPUs and requires no extra driver installation.
This is the most important configuration step for mini PC owners. Without it, Ollama falls back to CPU-only inference — which works but is 2–5× slower. With Vulkan enabled, the Radeon 890M delivers approximately 14–20 tokens/second on Mistral 7B, according to community benchmarks. The Radeon 8060S (GMKtec EVO-X2) is significantly faster at 55–65 t/s.
How to enable Vulkan — Step by step
Step 1: Open the environment variables dialog. Press Windows + R, type sysdm.cpl, press Enter. Click the Advanced tab, then Environment Variables.

Step 2: Add the variable. Under “System variables” (the lower panel), click New and enter:

Variable name: OLLAMA_VULKAN
Variable value: 1

Step 3: Restart. Click OK on each dialog, then restart your PC so the Ollama background service picks up the new variable.
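If you prefer the terminal to the System Properties dialog, the same system variable can be set from PowerShell. This is a sketch, not the official procedure — it assumes you run PowerShell as Administrator, which `setx /M` requires:

```shell
# Set OLLAMA_VULKAN=1 machine-wide (run PowerShell as Administrator).
# Equivalent to adding it under "System variables" in sysdm.cpl.
setx OLLAMA_VULKAN 1 /M

# Note: setx does NOT affect the current session. Open a NEW PowerShell
# window and run  $env:OLLAMA_VULKAN  to confirm it prints 1.
```

Either way, restart your PC afterwards so the Ollama background service sees the variable.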
Finally, verify GPU usage: run ollama run mistral. While it’s generating a response, open Task Manager (Ctrl + Shift + Esc), click the Performance tab, and select your GPU. If the Vulkan backend is working, you will see GPU memory usage increase as the model loads — expect 4–6 GB of GPU memory in use for a 7B model.

Pulling and Running Your First Model
One command pulls a model, one command runs it. After the initial download, models run instantly — no internet required.
Open PowerShell and run the following to download and start Mistral 7B — a good first choice that balances quality and speed:
```shell
# Download Mistral 7B (~4.1 GB) and start chatting immediately
ollama run mistral
```
Ollama will download the model the first time. Progress is shown in the terminal. Once downloaded, you’ll see a chat prompt:
>>> Send a message (/? for help)

Type any message and press Enter. Your mini PC is now running AI locally. To exit, type /bye or press Ctrl + D.
Useful commands to know
```shell
# List all models you've downloaded
ollama list

# Download a model without running it immediately
ollama pull qwen3:14b

# Check which models are currently loaded in memory
ollama ps

# Delete a model to free disk space
ollama rm mistral

# Check Ollama version
ollama --version
```
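ollama run also accepts a prompt as an argument, answering once and returning to the shell instead of opening the interactive chat. A quick sketch, assuming the mistral model is already pulled and the Ollama service is running:

```shell
# One-shot prompt: prints the answer and exits (useful in scripts)
ollama run mistral "Explain in one sentence what a token is in a language model."
```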
By default, models are stored in C:\Users\YourName\.ollama\models. A 7B model takes approximately 4–5 GB; a 32B model takes approximately 20 GB. To move storage to a different drive (e.g. your second M.2 SSD), add a system environment variable: OLLAMA_MODELS = D:\OllamaModels (using your desired path). Restart Ollama after the change. Models downloaded after this change will go to the new location.

Which Model Should You Run?
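The storage move can be sketched in two commands. D:\OllamaModels is a hypothetical path — substitute your own drive and folder (run PowerShell as Administrator for the `/M` flag):

```shell
# 1. Create the new model directory on the second drive
mkdir D:\OllamaModels

# 2. Point Ollama at it system-wide, then restart Ollama
setx OLLAMA_MODELS "D:\OllamaModels" /M

# Existing models in C:\Users\YourName\.ollama\models are not moved
# automatically; copy them across manually or re-pull them.
```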
For a 32 GB mini PC, start with Mistral 7B for general use or Qwen3 14B for better reasoning. The model you choose depends on your RAM, use case, and how much you care about speed vs quality.
| Model | Size | Pull command | Best for | Min RAM |
|---|---|---|---|---|
| Mistral 7B | 4.1 GB | ollama pull mistral | General chat, writing, summaries | 16 GB |
| Llama 3.1 8B | 4.7 GB | ollama pull llama3.1:8b | Balanced quality / speed | 16 GB |
| Qwen3 14B | 9.0 GB | ollama pull qwen3:14b | Reasoning, analysis, code | 16 GB |
| Qwen2.5-Coder 7B | 4.7 GB | ollama pull qwen2.5-coder:7b | Programming assistance | 16 GB |
| DeepSeek-R1 8B | 4.9 GB | ollama pull deepseek-r1:8b | Step-by-step reasoning, maths | 16 GB |
| Qwen3 32B | 20 GB | ollama pull qwen3:32b | High-quality reasoning, complex tasks | 32 GB |
| Phi-4 Mini | 2.5 GB | ollama pull phi4-mini | Light tasks on 8–16 GB RAM | 8 GB |
The full model library is at ollama.com/library. Models listed there include all quantization variants — the default pull (e.g. ollama pull qwen3:14b) downloads the Q4_K_M quantization, which is the standard recommended quality/size trade-off.
For a complete comparison of how these models perform on specific mini PCs — including tokens/second benchmarks for the Peladn HO5, GMKtec EVO-X2, and Beelink SER9 Pro AI — see our best mini PC for local AI 2026 guide.
Add a Chat Interface with Open WebUI
The command line works, but Open WebUI gives you a ChatGPT-style browser interface connected to your local Ollama. It’s free, open-source, and runs on your mini PC alongside Ollama.
Open WebUI is the most popular front-end for Ollama. It runs as a local web application at http://localhost:3000 and automatically connects to your Ollama instance. You get conversation history, model switching, file uploads, and a familiar chat interface — all running locally.
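Open WebUI finds Ollama through Ollama’s local HTTP API, which listens on port 11434 by default. You can query that API directly yourself — a hedged sketch using curl.exe (bundled with Windows 10/11), assuming the mistral model is pulled and Ollama is running:

```shell
# Ask Ollama a question over its local REST API (non-streaming response).
# Returns a JSON object whose "response" field contains the answer.
curl.exe http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Why is the sky blue?", "stream": false}'
```

This is the same endpoint any local tool or script can use, which is why front-ends like Open WebUI need no extra configuration to connect.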
The quickest way to install it is with Docker Desktop. If you don’t have Docker installed, download it from docker.com, then run:
```shell
# Run Open WebUI — connects to Ollama automatically
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```
Open your browser and go to http://localhost:3000. Create an account (local only — no sign-up to any service), and Open WebUI will detect your Ollama models automatically.
To access your local AI from other devices on your network (e.g. a phone or laptop), add a system environment variable: OLLAMA_HOST = 0.0.0.0. This binds Ollama to all network interfaces. Important security note: this exposes Ollama to your entire local network without any authentication. Only do this on a trusted home network, and consider adding a Windows Firewall rule to restrict access to your local IP range (192.168.1.0/24). See our local AI security guide for the full details on safe configuration.

Troubleshooting Common Issues
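The firewall restriction can be sketched with a single PowerShell command. Run it as Administrator, and note that 192.168.1.0/24 is an assumption — replace it with your own subnet if your router uses a different range:

```shell
# Allow inbound connections to Ollama's port 11434 only from the local subnet.
# Windows blocks unsolicited inbound traffic by default, so a scoped Allow
# rule is sufficient; no explicit Block rule is needed.
New-NetFirewallRule -DisplayName "Ollama - LAN only" -Direction Inbound -Protocol TCP -LocalPort 11434 -RemoteAddress 192.168.1.0/24 -Action Allow
```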
GPU not being used (0 MB GPU memory)
The most common issue after enabling Vulkan. Check that: (1) OLLAMA_VULKAN=1 is set as a System variable (not a user variable), (2) you fully restarted your PC after adding it, and (3) your AMD driver is up to date (Adrenalin 24.x or later). On Windows, AMD GPU drivers include Vulkan support by default — no additional installation is needed.
To double-check the variable is active, open PowerShell and type $env:OLLAMA_VULKAN. It should return 1. If it returns nothing, the variable is not set correctly.
“model requires more system memory” error
The model is larger than available RAM. Switch to a smaller quantization: instead of ollama pull qwen3:32b, try ollama pull qwen3:32b-q2_K for a compressed version. Or pull a smaller model entirely — see the model table above.
Very slow inference (2–5 tokens/second)
You’re likely running on CPU only. Confirm Vulkan is enabled (see above). Also check that your BIOS has allocated dedicated VRAM to the iGPU — the default 512 MB is too small for GPU inference. Set it to 4 GB in your BIOS settings.
“ollama: command not found”
Close PowerShell and reopen it — the PATH needs to reload after installation. If the problem persists after reopening, restart your PC. If still not found, go to C:\Users\YourName\AppData\Local\Programs\Ollama and confirm ollama.exe is there.
Slow first response, fast after that
This is expected. The first time you run a model, Ollama loads the entire model from disk into RAM — this takes 5–30 seconds depending on model size and SSD speed. Subsequent responses in the same session are fast because the model stays loaded in memory.
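By default, Ollama unloads a model after about five minutes of inactivity, so the next session pays the load cost again. If you want models to stay resident longer, one option is the OLLAMA_KEEP_ALIVE environment variable, set the same way as OLLAMA_VULKAN (sketch below uses an elevated PowerShell; "1h" is an example value):

```shell
# Keep loaded models in RAM for 1 hour of inactivity instead of ~5 minutes
# (run PowerShell as Administrator; restart Ollama afterwards)
setx OLLAMA_KEEP_ALIVE "1h" /M
```

The trade-off is RAM: a model kept alive occupies its full memory footprint the whole time.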
Frequently Asked Questions
How fast is Ollama on a mini PC?
Enabling the Vulkan backend (OLLAMA_VULKAN=1) accelerates inference to approximately 14–20 tokens/second on the Radeon 890M iGPU, based on community benchmarks. CPU-only mode is functional but noticeably slower for interactive use.

Does Ollama support AMD integrated GPUs on Windows?
Yes — set OLLAMA_VULKAN=1 as a system environment variable and restart. AMD’s ROCm library does not reliably support integrated Radeon graphics on Windows — only select discrete Radeon RX 6800+ and 7000 series cards. Vulkan works for iGPUs like the Radeon 890M and 8060S. No additional AMD driver installation is required on Windows — Vulkan support is included in standard AMD Adrenalin drivers.

Sources
Installation steps verified against Ollama’s official documentation (docs.ollama.com) and the Ollama GitHub repository. Vulkan backend availability confirmed from Phoronix coverage of the Ollama 0.12.6 release and official Ollama GPU documentation. AMD Radeon 890M Vulkan performance figures sourced from community benchmarks published on dasroot.net (March 2026). The claim that ROCm does not support Radeon 890M integrated graphics is based on AMD’s official ROCm GPU support matrix.
