Self-Host Ollama + Open WebUI: Run Local LLMs on Your Own Server
Run powerful open-source LLMs entirely on your own hardware. No API fees, zero data leaving your server, and it's way easier to set up than you'd think.
I’ve been messing with LLMs since early 2025, and honestly? The biggest pain point was always cost.
ChatGPT Plus is what, $20/month now? Claude Pro is another $20. If you want coding-specific models, throw in another subscription. Multiply that across a team or family, and you’re bleeding money just to ask a bot to rewrite emails.
I tried the API route too. Pay-per-token seemed smart until I racked up an $80 bill in one weekend because I forgot to set a usage cap. My wallet still hasn’t forgiven me.
So I went looking for a self-hosted alternative. What I found was Ollama + Open WebUI, and honestly, it’s changed how I use AI entirely.
What Are We Building Here?
Two Docker containers, that’s it:
- Ollama — the engine that runs the models. Supports everything from tiny 1B-parameter models (fast, dumb) up to massive 70B models (slow, smart).
- Open WebUI — a beautiful, ChatGPT-like interface sitting on top of Ollama. Web search, file uploads, multi-model chats, the works.
No GPU required if you stick with smaller models. I run mine on a $40/month VPS with just CPU — works fine for coding assistance and writing.
Step 1: Get a Server
Before you do anything, you need a box to run this on. If you already have a homelab, great. If not, grab a VPS somewhere.
I use a VPS with 16GB RAM and 4 vCPUs. It’s overkill for small models but gives me room to run 7B-13B models comfortably. Minimum spec? 4GB RAM, 2 cores, 20GB disk, which is enough for the 1B-3B models. The 7B-8B models in the table further down want 8GB. You’ll be surprised what runs on cheap hardware.
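If you want to sanity-check a box before installing anything, here is a minimal sketch that compares it against the minimums above. The function names are mine, and `read_specs` assumes a Linux/Unix host where `os.sysconf` exposes the page counters.

```python
import os
import shutil

# Minimums quoted in this post: 4 GB RAM, 2 cores, 20 GB disk.
MIN_RAM_GB = 4
MIN_CORES = 2
MIN_DISK_GB = 20


def shortfalls(ram_gb, cores, disk_gb):
    """Return a list of human-readable problems; an empty list means the box qualifies."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if cores < MIN_CORES:
        problems.append(f"CPU cores: {cores} < {MIN_CORES}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"free disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def read_specs(path="/"):
    """Read (ram_gb, cores, free_disk_gb) from the local machine (Linux/Unix only)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    cores = os.cpu_count() or 1
    free_bytes = shutil.disk_usage(path).free
    return ram_bytes / 1024**3, cores, free_bytes / 1024**3


# On the server itself you would run:
#   print(shortfalls(*read_specs()) or "good to go")
```

One caveat: a "4GB" plan usually reports a little under 4 GiB of usable memory, so treat a near-miss on RAM as a pass.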
Step 2: Install Docker + Docker Compose
If you’ve hung around this blog, you probably already have Docker. If not:
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
Log out and back in, then verify with docker ps. Simple as that.
Step 3: Deploy Ollama and Open WebUI
Create a directory and a docker-compose.yml:
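version: '3.8'
services:
  # Ollama: the model runtime and API server
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      # Downloaded models persist here across container restarts
      - ./ollama_data:/root/.ollama
    ports:
      # Publishes the Ollama API on every host interface
      - "11434:11434"
    restart: unless-stopped
  # Open WebUI: the chat frontend
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on:
      - ollama
    volumes:
      - ./open-webui_data:/app/backend/data
    ports:
      # Host port 3000 maps to the UI's port 8080 inside the container
      - "3000:8080"
    environment:
      # Reaches Ollama by service name over the Compose network
      - OLLAMA_BASE_URL=http://ollama:11434
    restart: unless-stopped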
Now run it:
docker compose up -d
That’s it. One compose file, one command, and you’re done. Open WebUI is now running on port 3000.
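To confirm both containers are actually answering, a quick probe helps. This is a rough sketch using only the standard library; `is_up` and `check_stack` are names I made up, and the ports are the ones published in the compose file above.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Return True if `url` answers with an HTTP 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def check_stack(host="localhost"):
    """Probe the two ports published by the compose file in this post."""
    return {
        "ollama": is_up(f"http://{host}:11434/"),
        "open-webui": is_up(f"http://{host}:3000/"),
    }


# On the server itself you would run:
#   print(check_stack())
```

If either entry comes back False, `docker compose logs` is the first place I'd look. Open WebUI can take a little while to answer on first boot.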
Step 4: Pull a Model
Head to http://your-server:3000, create an account, and you’ll see a clean chat interface. But it’s empty — you need to pull a model first.
I’d start with Llama 3.1 8B — it’s the sweet spot for most people. Fast, good at coding, decent reasoning:
docker exec -it ollama ollama pull llama3.1:8b
This downloads about 4.7GB. Grab a coffee. Or two.
Once it’s done, refresh Open WebUI and you’ll see the model in the dropdown. Start chatting.
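The chat window isn't the only way in. Ollama also exposes a REST API on port 11434, so scripts can use the same model. Below is a minimal sketch against the `/api/generate` endpoint; the helper names and the `localhost` URL are my assumptions (it presumes you run it on the server itself).

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: running on the server itself


def build_generate_request(prompt, model="llama3.1:8b", base_url=OLLAMA_URL):
    """Build (but do not send) a non-streaming POST to Ollama's /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.1:8b", timeout=300):
    """Send the prompt and return the model's full text reply."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]


# With the stack running and the model pulled:
#   print(generate("Rewrite this email to sound less grumpy: ..."))
```

Setting `"stream": False` makes Ollama return one JSON object instead of a stream of chunks, which keeps the example short. The long timeout is deliberate, since CPU-only replies can take a while.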
Here’s what I run depending on the task:
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| Llama 3.1 8B | 4.7GB | 8GB | General purpose, coding |
| Mistral 7B | 4.1GB | 8GB | Fast, good at reasoning |
| Qwen 2.5 7B | 4.3GB | 8GB | Excellent at writing |
| DeepSeek Coder V2 | 8.5GB | 16GB | Code generation |
| Llama 3.1 70B | 39GB | 64GB | Best quality, needs serious hardware |
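If you'd rather not eyeball the table, here is a small sketch that filters it by the RAM you have. The numbers are copied straight from the table above; the function name is hypothetical.

```python
# (name, download size in GB, RAM needed in GB), copied from the table above
MODELS = [
    ("Llama 3.1 8B", 4.7, 8),
    ("Mistral 7B", 4.1, 8),
    ("Qwen 2.5 7B", 4.3, 8),
    ("DeepSeek Coder V2", 8.5, 16),
    ("Llama 3.1 70B", 39, 64),
]


def models_that_fit(ram_gb):
    """Return the names of models whose RAM requirement fits in `ram_gb`."""
    return [name for name, _size_gb, ram_needed in MODELS if ram_needed <= ram_gb]


print(models_that_fit(16))
# ['Llama 3.1 8B', 'Mistral 7B', 'Qwen 2.5 7B', 'DeepSeek Coder V2']
```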
Why This Beats OpenAI (for Most Things)
I still use ChatGPT sometimes. But for daily work, my self-hosted setup wins for three reasons:
No censorship. I can ask my local model anything — analyze sensitive documents, discuss controversial code approaches, whatever. There’s no content policy shutting me down mid-sentence.
No data leaks. This is the big one. Every prompt I send to ChatGPT goes to some server in Virginia. With a local model, my prompts never leave my VPS. If you’re working with client data, proprietary code, or anything private, this alone justifies the setup.
No surprise bills. I pay the flat server fee and $0 per query, no matter how many I send. I’ve left agents running overnight on autopilot and woken up to zero extra charges. Try that with an API.
What I Wish I Knew Before Starting
A few things I learned the hard way:
Big models are slow on CPU. Like, painfully slow. A 70B model on CPU generates about 1 token per second. Llama 3.1 8B does 15-20 tokens/second on a decent CPU. Stick to 7B-13B unless you’ve got a GPU.
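You can measure this on your own box instead of trusting my numbers. A non-streaming reply from Ollama's API includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), so the speed is one division away. A quick sketch, with function names of my own choosing and a made-up sample reply:

```python
def tokens_per_second(response):
    """Generation speed from a non-streaming Ollama reply.

    Uses eval_count (tokens generated) and eval_duration (nanoseconds).
    """
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


def seconds_for_reply(tokens, tokens_per_sec):
    """How long a reply of `tokens` tokens takes at a given speed."""
    return tokens / tokens_per_sec


# Illustrative reply with made-up numbers, not a real measurement.
sample = {"eval_count": 100, "eval_duration": 5_000_000_000}
print(tokens_per_second(sample))   # 20.0
print(seconds_for_reply(500, 1))   # 500.0, so over 8 minutes at 1 token/second
```

That second line is why 70B on CPU hurts: a 500-token answer at 1 token per second is more than eight minutes of waiting, versus about 25 seconds at 20 tokens per second.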
Open WebUI has a built-in RAG system. You can upload PDFs, documents, even whole codebases, and the model will answer questions from your data. Game changer for documentation.
Set up a reverse proxy. Don’t expose port 3000 directly. Slap Nginx Proxy Manager or Traefik in front, add HTTPS with Let’s Encrypt, and consider setting up authentication. The same goes for Ollama’s port 11434, which has no login of its own. Or better yet, put the whole thing behind a VPN.
Storage adds up quick. Llama 3.1 8B is 4.7GB. If you install a few models, you’ll chew through 20-30GB fast. Plan your disk accordingly.
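To see how much disk your models are actually eating, you can ask Ollama directly. Its `/api/tags` endpoint lists installed models with a `size` in bytes. A minimal sketch follows; the helper names are mine, and the sample data is hand-written for illustration.

```python
import json
import urllib.request


def total_size_gb(tags):
    """Sum the size of every model in an /api/tags response (bytes to GB)."""
    return sum(m["size"] for m in tags.get("models", [])) / 1e9


def fetch_tags(base_url="http://localhost:11434", timeout=10):
    """Fetch the installed-model list from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Hand-written sample in the shape /api/tags returns; sizes are illustrative.
sample = {"models": [
    {"name": "llama3.1:8b", "size": 4_700_000_000},
    {"name": "mistral:7b", "size": 4_100_000_000},
]}
print(f"{total_size_gb(sample):.1f} GB")  # 8.8 GB

# Against the live server:
#   print(f"{total_size_gb(fetch_tags()):.1f} GB")
```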
What’s Next?
Once you’ve got the basics running, try:
- Enable web search in Open WebUI settings — your model can browse the internet for current info
- Try Ollama’s multimodal models — Llama 3.2 Vision can analyze images
- Set up model aliases so you can switch between cheap/fast and expensive/smart models on the fly
- Hook it up to n8n — I’ve got an automation that routes support emails through my local LLM for drafting replies
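For the alias idea, one way to do it (an assumption on my part, there are others) is Ollama's copy feature, which registers an existing model under a second name. The sketch below uses the `/api/copy` endpoint; the alias names `fast` and `smart` are just examples.

```python
import json
import urllib.request


def build_copy_request(source, alias, base_url="http://localhost:11434"):
    """Build (but do not send) a POST to /api/copy that registers `alias` for `source`."""
    body = json.dumps({"source": source, "destination": alias})
    return urllib.request.Request(
        f"{base_url}/api/copy",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_alias(source, alias, timeout=30):
    """Send the copy request; returns True when Ollama answers 200 OK."""
    with urllib.request.urlopen(build_copy_request(source, alias), timeout=timeout) as resp:
        return resp.status == 200


# With the stack running and the models pulled:
#   create_alias("llama3.1:8b", "fast")
#   create_alias("llama3.1:70b", "smart")
```

After that, anything that talks to Ollama can ask for `fast` or `smart`, and you can repoint the alias later without touching your scripts.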
Self-hosting an LLM isn’t some exotic thing reserved for people with racks of GPUs. It’s a Docker Compose file and a couple of commands. Everyone should run their own AI.
Go give it a shot. You’ll wonder why you didn’t do it sooner.