Setup Guide · April 2026 · 10 min read

How to Run Ollama on a Mini PC — Complete 2026 Setup Guide

Ollama is free, takes 5 minutes to install, and turns any modern mini PC into a private AI assistant. No cloud, no subscription, no data leaving your device. This guide covers the full setup on Windows — including how to enable AMD GPU acceleration with the Vulkan backend on Ryzen AI mini PCs.

By MiniPCDeals.net
10 min · ~2,600 words
ℹ️ This article contains affiliate links. We earn a small commission on qualifying purchases — at no extra cost to you.
📌 Quick Answer

Install Ollama in 3 steps: (1) Download OllamaSetup.exe from ollama.com/download and run it. (2) Open PowerShell and type ollama run mistral — your first model downloads and starts. (3) For AMD GPU acceleration on Ryzen AI mini PCs, add the environment variable OLLAMA_VULKAN=1 before restarting. That’s it. Total time: under 5 minutes.
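
For copy-paste convenience, here is the same quick start as PowerShell commands. The last line is an equivalent of the GUI method described later for adding the system variable; it uses the built-in .NET Environment API, needs an elevated (administrator) PowerShell, and still requires a restart afterwards.

# Step 2: download Mistral 7B and start chatting
ollama run mistral

# Step 3 (optional, Ryzen AI mini PCs): enable the Vulkan backend system-wide, then restart
[Environment]::SetEnvironmentVariable("OLLAMA_VULKAN", "1", "Machine")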

What Is Ollama and Why Use It on a Mini PC?

Ollama is a free, open-source tool that makes running AI language models on your own hardware as simple as typing one command. Once installed, it downloads and manages models locally — nothing is sent to any server, ever.

Think of Ollama as Docker for AI models. You type ollama pull llama3 and it downloads Meta’s Llama model to your machine. You type ollama run mistral and you’re chatting with a local AI. No API key. No subscription. No internet required after the model is downloaded.
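
In command form, that whole workflow is just:

# Download Meta's Llama 3 to your machine
ollama pull llama3

# Chat with a local model (Mistral here), entirely offline after the download
ollama run mistral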

Mini PCs are ideal hosts for Ollama for three reasons. First, they run 24/7 at 15–35 watts — far cheaper than leaving a gaming desktop on. Second, their unified memory architecture (especially on AMD Ryzen AI chips) means the GPU and CPU share the same RAM pool, which matters enormously for AI inference. And third, a mini PC behind a monitor is invisible and silent — a dedicated local AI server that takes zero desk space.

🔒
What Ollama does and does not send over the internet
Ollama makes two types of network requests: the initial model download from ollama.com/library, and periodic update checks (which can be disabled). Your prompts, conversations, and responses never leave your machine. After download, all inference is entirely local. This is confirmed in Ollama’s published documentation and source code. For a broader look at the privacy landscape for local AI on mini PCs, see our local AI security guide.

What Hardware Do You Need?

Ollama runs on any modern Windows PC with at least 8 GB of RAM. A mini PC with 32 GB RAM and a Ryzen AI 9 HX 370 (Radeon 890M iGPU) is the sweet spot for interactive use with 7B–32B models.

The most important spec for Ollama is RAM, not CPU speed. Model weights have to fit in memory. The table below shows what each RAM tier realistically handles:

Mini PC RAM | Largest usable model | Speed on 7B (Vulkan) | Example mini PC
16 GB | 7B–14B (Q4) | ~10–18 t/s | KAMRUI Pinova P2
32 GB | 32B (Q4) | ~14–20 t/s | Peladn HO5, Beelink SER9 Pro AI
64 GB | 32B full + 70B Q4 | ~18–25 t/s | ACEMAGIC Retro X5 (upgraded)
128 GB | 70B Q4, 235B Q2 | ~55–65 t/s | GMKtec EVO-X2 (Strix Halo)

Speed figures for Radeon 890M / Radeon 8060S iGPU with Vulkan backend enabled. CPU-only is 2–5× slower. Figures sourced from community benchmarks and published Ollama documentation.
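
A rough way to sanity-check these tiers yourself, as a back-of-the-envelope estimate rather than an official Ollama figure: model memory scales with parameter count and quantization.

# Approximate RAM needed ≈ parameters × bytes per weight + 1–2 GB for context and overhead
# Q4 quantization stores roughly 0.5 bytes per weight:
#   7B  × 0.5 ≈ 3.5 GB  → about 4–5 GB in practice
#   32B × 0.5 ≈ 16 GB   → about 18–20 GB in practice
#   70B × 0.5 ≈ 35 GB   → about 38–42 GB in practice (hence the 64 GB tier)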

💡
Allocate more VRAM in BIOS for better performance
Ryzen AI mini PCs typically default to 512 MB – 2 GB of dedicated VRAM for the integrated GPU. For Ollama, increasing this to 4 GB in your BIOS (Advanced → Advanced CPU Configuration → VRAM) gives the Radeon iGPU more dedicated memory, improving inference speed for larger models. Exact BIOS menu names vary by manufacturer.
🧠
Best mini PC for Ollama under $1,000
Peladn HO5 — 32GB · Radeon 890M · Mistral 7B at 35 t/s · $940
The best value mini PC for running Ollama in 2026. Run models up to 32B, with Vulkan GPU acceleration on the Radeon 890M. See our full local AI guide for all options.
Affiliate link — no extra cost to you.
Check Price

Installing Ollama on Windows — Step by Step

The Windows installation is a standard .exe installer. No admin rights needed, no manual PATH setup, no dependencies. Total time: under 2 minutes.

1
Download the installer
Go to ollama.com/download and click the Windows button. This downloads OllamaSetup.exe (approximately 200 MB). No account required.
2
Run the installer
Double-click OllamaSetup.exe. Windows may show a SmartScreen warning — click “More info” then “Run anyway”. Ollama installs to your user folder (C:\Users\YourName\AppData\Local\Programs\Ollama) and does not require administrator privileges. Accept the defaults and click Install.
Once installed, Ollama starts automatically as a background service. Look for the llama icon in your system tray (bottom-right corner of the taskbar).
3
Verify the installation
Open PowerShell (press Windows + R, type powershell, press Enter) and run:
ollama --version
You should see output like ollama version 0.6.x. If you see “command not found”, close PowerShell and reopen it — it needs to reload the PATH after installation. If the problem persists, restart your PC.
Ollama runs silently in the background
After installation, Ollama is a background service that starts with Windows. It uses minimal CPU and RAM when idle (no model is loaded). Models are only loaded into memory when you actively run one. You can check its status anytime from the system tray icon — right-click it to see options including “Quit Ollama” if you want to stop it entirely.
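
Two optional PowerShell extras. The winget route is an alternative to steps 1 and 2; the package id shown is an assumption, so confirm it with winget search ollama before relying on it.

# Alternative install from the command line (package id assumed to be Ollama.Ollama)
winget install Ollama.Ollama

# Check that the background service is running (process name assumed to be "ollama")
Get-Process ollama -ErrorAction SilentlyContinue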

Enabling AMD GPU Acceleration (Vulkan Backend)

AMD’s ROCm library does not support integrated Radeon GPUs (such as the Radeon 890M and Radeon 8060S) on Windows. The solution is Ollama’s Vulkan backend — an experimental but functional alternative that works with all AMD GPUs and requires no extra driver installation.

This is the most important configuration step for mini PC owners. Without it, Ollama falls back to CPU-only inference — which works but is 3–5× slower. With Vulkan enabled, the Radeon 890M delivers approximately 14–20 tokens/second on Mistral 7B, according to community benchmarks. The Radeon 8060S (GMKtec EVO-X2) is significantly faster at 55–65 t/s.

⚠️
Vulkan is marked “experimental” by Ollama — what that means in practice
Ollama’s Vulkan backend (available since v0.12.6) is marked experimental in the official documentation. In practice, it works reliably for standard inference on Windows with AMD iGPUs — community users report consistent results. “Experimental” here means it may not work on every GPU configuration and may have edge-case issues with very large models or unusual quantizations. For home use with 7B–32B models, it is stable enough to use daily.

How to enable Vulkan — Step by step

1
Open Windows Environment Variables
Press Windows + R, type sysdm.cpl, press Enter. Click the Advanced tab, then Environment Variables. Under “System variables” (the lower panel), click New.
2
Add the OLLAMA_VULKAN variable
In the “New System Variable” dialog, enter:
Variable name:  OLLAMA_VULKAN
Variable value: 1
Click OK three times to close all dialogs. (Prefer the command line? An equivalent PowerShell one-liner is shown after this step list.)
3
Restart your PC
The environment variable only takes effect after a full restart. Signing out and back in is not sufficient — a full reboot is required for Windows system services to pick up the change.
4
Verify GPU is being used
After the restart, run a model with ollama run mistral. While it’s generating a response, open Task Manager (Ctrl + Shift + Esc), click the Performance tab, and select your GPU. If the Vulkan backend is working, you will see GPU memory usage increase as the model loads. You should see 4–6 GB of GPU memory in use for a 7B model.
If GPU memory shows 0 MB, the variable may not have been set as a System variable (not a user variable) — confirm it appears in the “System variables” panel, not the “User variables” panel.
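
If you would rather skip the dialogs, the same system variable can be created from an elevated (administrator) PowerShell using the built-in .NET Environment API; a full reboot is still required afterwards.

# Set OLLAMA_VULKAN=1 as a system-wide (Machine) variable; run PowerShell as administrator
[Environment]::SetEnvironmentVariable("OLLAMA_VULKAN", "1", "Machine")

# After rebooting, confirm new sessions can see it (should print 1)
$env:OLLAMA_VULKAN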

Pulling and Running Your First Model

One command pulls a model, one command runs it. After the initial download, models run instantly — no internet required.

Open PowerShell and run the following to download and start Mistral 7B — a good first choice that balances quality and speed:

# Download Mistral 7B (~4.1 GB) and start chatting immediately
ollama run mistral

Ollama will download the model the first time. Progress is shown in the terminal. Once downloaded, you’ll see a chat prompt:

>>> Send a message (/? for help)

Type any message and press Enter. Your mini PC is now running AI locally. To exit, type /bye or press Ctrl + D.

Useful commands to know

# List all models you've downloaded
ollama list

# Download a model without running it immediately
ollama pull qwen3:14b

# Check which models are currently loaded in memory
ollama ps

# Delete a model to free disk space
ollama rm mistral

# Check Ollama version
ollama --version
💡
Where models are stored — and how to move them to another drive
By default, Ollama stores models at C:\Users\YourName\.ollama\models. A 7B model takes approximately 4–5 GB; a 32B model takes approximately 20 GB. To move storage to a different drive (e.g. your second M.2 SSD), add a system environment variable: OLLAMA_MODELS = D:\OllamaModels (using your desired path). Restart Ollama after the change. Models downloaded after this change will go to the new location.
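
The same change can be made from an elevated PowerShell; the D:\OllamaModels path is just the example used above.

# Point model storage at another drive, then restart Ollama from the system tray
[Environment]::SetEnvironmentVariable("OLLAMA_MODELS", "D:\OllamaModels", "Machine")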

Which Model Should You Run?

For a 32 GB mini PC, start with Mistral 7B for general use or Qwen3 14B for better reasoning. The model you choose depends on your RAM, use case, and how much you care about speed vs quality.

Model | Size | Pull command | Best for | Min RAM
Mistral 7B | 4.1 GB | ollama pull mistral | General chat, writing, summaries | 16 GB
Llama 3.1 8B | 4.9 GB | ollama pull llama3.1:8b | Balanced quality / speed | 16 GB
Qwen3 14B | 9.0 GB | ollama pull qwen3:14b | Reasoning, analysis, code | 16 GB
Qwen2.5-Coder 7B | 4.7 GB | ollama pull qwen2.5-coder:7b | Programming assistance | 16 GB
DeepSeek-R1 8B | 4.9 GB | ollama pull deepseek-r1:8b | Step-by-step reasoning, maths | 16 GB
Qwen3 32B | 20 GB | ollama pull qwen3:32b | High-quality reasoning, complex tasks | 32 GB
Phi-4 Mini | 2.5 GB | ollama pull phi4-mini | Light tasks on 8–16 GB RAM | 8 GB

The full model library is at ollama.com/library. Models listed there include all quantization variants — the default pull (e.g. ollama pull qwen3:14b) downloads the Q4_K_M quantization, which is the standard recommended quality/size trade-off.

For a complete comparison of how these models perform on specific mini PCs — including tokens/second benchmarks for the Peladn HO5, GMKtec EVO-X2, and Beelink SER9 Pro AI — see our best mini PC for local AI 2026 guide.

Add a Chat Interface with Open WebUI

The command line works, but Open WebUI gives you a ChatGPT-style browser interface connected to your local Ollama. It’s free, open-source, and runs on your mini PC alongside Ollama.

Open WebUI is the most popular front-end for Ollama. It runs as a local web application at http://localhost:3000 and automatically connects to your Ollama instance. You get conversation history, model switching, file uploads, and a familiar chat interface — all running locally.

The quickest way to install it is with Docker Desktop. If you don’t have Docker installed, download it from docker.com, then run:

# Run Open WebUI — connects to Ollama automatically
# (PowerShell line continuation uses the backtick `, not the backslash)
docker run -d -p 3000:8080 `
  --add-host=host.docker.internal:host-gateway `
  -v open-webui:/app/backend/data `
  --name open-webui `
  --restart always `
  ghcr.io/open-webui/open-webui:main

Open your browser and go to http://localhost:3000. Create an account (local only — no sign-up to any service), and Open WebUI will detect your Ollama models automatically.
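
If the page at http://localhost:3000 doesn’t load, two standard Docker commands help narrow down the cause.

# Confirm the Open WebUI container is running
docker ps --filter "name=open-webui"

# Inspect its logs for startup errors
docker logs open-webui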

💻
Access Ollama from other devices on your network
If you want to query your mini PC’s models from your phone or laptop on the same WiFi, add a second system environment variable: OLLAMA_HOST = 0.0.0.0. This binds Ollama to all network interfaces. Important security note: this exposes Ollama to your entire local network without any authentication. Only do this on a trusted home network, and consider adding a Windows Firewall rule to restrict access to your local IP range (192.168.1.0/24). See our local AI security guide for the full details on safe configuration.
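
If you do enable network access, the sketch below shows one way to tighten it: setting the variable from an elevated PowerShell and adding an illustrative firewall rule that limits Ollama’s default port (11434) to a typical home subnet. Adjust the subnet to match your own network, and note that this assumes no broader inbound rule for Ollama already exists.

# Bind Ollama to all network interfaces (system-wide variable; restart Ollama afterwards)
[Environment]::SetEnvironmentVariable("OLLAMA_HOST", "0.0.0.0", "Machine")

# Allow inbound connections to Ollama (port 11434) only from the local subnet
New-NetFirewallRule -DisplayName "Ollama - LAN only" -Direction Inbound -Protocol TCP `
  -LocalPort 11434 -RemoteAddress 192.168.1.0/24 -Action Allow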

Troubleshooting Common Issues

GPU not being used (0 MB GPU memory)

The most common issue after enabling Vulkan. Check that: (1) OLLAMA_VULKAN=1 is set as a System variable (not a user variable), (2) you fully restarted your PC after adding it, and (3) your AMD driver is up to date (Adrenalin 24.x or later). On Windows, AMD GPU drivers include Vulkan support by default — no additional installation is needed.

To double-check the variable is active, open PowerShell and type $env:OLLAMA_VULKAN. It should return 1. If it returns nothing, the variable is not set correctly.
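
Both checks can be done from PowerShell; the driver query is a generic WMI lookup, not an Ollama-specific tool.

# Should print 1 if the variable is active in this session
$env:OLLAMA_VULKAN

# List GPU name and installed driver version
Get-CimInstance Win32_VideoController | Select-Object Name, DriverVersion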

“model requires more system memory” error

The model is larger than available RAM. Switch to a smaller quantization: instead of ollama pull qwen3:32b, try ollama pull qwen3:32b-q2_K for a compressed version. Or pull a smaller model entirely — see the model table above.

Very slow inference (2–5 tokens/second)

You’re likely running on CPU only. Confirm Vulkan is enabled (see above). Also check that your BIOS has allocated dedicated VRAM to the iGPU — the default 512 MB is too small for GPU inference. Set it to 4 GB in your BIOS settings.

“ollama: command not found”

Close PowerShell and reopen it — the PATH needs to reload after installation. If the problem persists after reopening, restart your PC. If still not found, go to C:\Users\YourName\AppData\Local\Programs\Ollama and confirm ollama.exe is there.

Slow first response, fast after that

This is expected. The first time you run a model, Ollama loads the entire model from disk into RAM — this takes 5–30 seconds depending on model size and SSD speed. Subsequent responses in the same session are fast because the model stays loaded in memory.

Frequently Asked Questions

Can Ollama run without GPU acceleration?
Yes. Ollama runs on any modern mini PC using CPU-only inference. On a Ryzen AI 9 HX 370 mini PC, CPU-only inference gives approximately 5–10 tokens/second for a 7B model. Enabling the AMD Vulkan backend (OLLAMA_VULKAN=1) accelerates this to approximately 14–20 tokens/second on the Radeon 890M iGPU, based on community benchmarks. CPU-only mode is functional but noticeably slower for interactive use.

Does Ollama work with AMD integrated GPUs on Windows?
Yes, via the experimental Vulkan backend (Ollama v0.12.6+). Set OLLAMA_VULKAN=1 as a system environment variable and restart. AMD’s ROCm library does not reliably support integrated Radeon graphics on Windows — only select discrete Radeon RX 6800+ and 7000 series cards. Vulkan works for iGPUs like the Radeon 890M and 8060S. No additional AMD driver installation is required on Windows — Vulkan support is included in standard AMD Adrenalin drivers.

Is Ollama really free?
Yes. Ollama is open-source software released under the MIT licence. There are no usage fees, no per-token charges, and no subscription. The only cost is the hardware you run it on and the electricity. Open-source models like Mistral, Llama, and Qwen that run through Ollama are also free — though some have licence restrictions for commercial use above certain thresholds. For personal home use, all major models are unrestricted.

Can I still use the mini PC for everyday work while Ollama is running?
Yes. Ollama only loads a model into RAM when you actively send it a prompt. When idle, it uses minimal CPU and essentially zero GPU. On a 32 GB mini PC, running a 7B model (~6 GB RAM) leaves plenty of headroom for Office 365, a browser with 20 tabs, and a video call simultaneously. Larger models (32B, ~20 GB RAM) will leave less headroom — close other applications if you notice slowdowns.

Which mini PC is best for running Ollama?
For models up to 32B: the Peladn HO5 (Ryzen AI 9 HX 370, 32 GB, ~$940) delivers Mistral 7B at 30–40 t/s and Qwen3 32B at 8–12 t/s. For 70B+ models: the GMKtec EVO-X2 128 GB (~$1,999) is the only mini PC with enough memory to run the largest publicly available models. See our full best mini PC for local AI guide for detailed benchmarks and picks.
🧠
Sources & Methodology
MiniPCDeals.net Editorial Team

Installation steps verified against Ollama’s official documentation (docs.ollama.com) and the Ollama GitHub repository. Vulkan backend availability confirmed from Phoronix coverage of Ollama 0.12.6 release and official Ollama GPU documentation. AMD Radeon 890M Vulkan performance figures sourced from community benchmarks published on dasroot.net (March 2026). The claim that ROCm does not support Radeon 890M integrated graphics is based on AMD’s official ROCm GPU support matrix.