Self-Hosted Open WebUI: A Real Chat Interface for Your Local LLMs

Swift Host Team May 31, 2026 4 min read

You installed Ollama on a VPS, pulled a model, and typed prompts in the terminal. It works — and it feels nothing like the chat tools your team actually wants to use. That's the gap Open WebUI fills.

Open WebUI is a self-hosted web front end for local and remote LLMs. Point it at Ollama on the same machine, wire in an OpenAI-compatible API if you want, and people get a familiar chat UI: history, model switching, file uploads, user accounts. It's crossed 139k GitHub stars for a reason — this isn't a weekend hack anymore; it's what a lot of teams run in production behind a VPN.

What it actually does

At its core, Open WebUI is the interface layer. Ollama (or another backend) does the inference; Open WebUI handles everything humans touch.

Conversations are saved and searchable. You can swap models mid-thread: start with a fast 7B model for drafting, then switch to something heavier for a careful pass. Upload PDFs, spreadsheets, or markdown and ask questions against them; RAG is built in, with embeddings often handled through Ollama models like nomic-embed-text. Teams type # in the message box to pull documents from a shared library without re-uploading them every time.
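At its simplest, the retrieval step in a RAG pipeline ranks document chunks by how close their embeddings are to the embedding of the question. The toy sketch below shows only that ranking idea. The three-dimensional vectors are made-up placeholders; a real setup gets much longer vectors from an embedding model such as nomic-embed-text, and Open WebUI's own pipeline (chunking, storage, ranking options) is more involved than this.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float],
                 chunks: list[tuple[str, list[float]]],
                 k: int = 2) -> list[str]:
    """Return the k chunk texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy "embeddings"; real ones come from an embedding model.
chunks = [
    ("Vacation policy: 15 days per year.", [0.9, 0.1, 0.0]),
    ("VPN setup steps for new laptops.", [0.1, 0.9, 0.2]),
    ("Expense reports are due monthly.", [0.2, 0.1, 0.9]),
]
question = [0.85, 0.2, 0.05]  # pretend embedding of "How much vacation do I get?"

print(top_k_chunks(question, chunks, k=1))  # ['Vacation policy: 15 days per year.']
```

The winning chunks are then pasted into the prompt as context, which is why the model can answer from your documents without being retrained on them.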

There's multi-user auth with roles, custom system prompts, voice input, optional web search plugins, and hooks for image generation if you run ComfyUI or Automatic1111 nearby. You can also point it at external APIs — but the sweet spot is still models and documents that never leave your infrastructure.

Why self-host instead of ChatGPT at work?

Three reasons we hear from Canadian clients, in order of frequency:

Privacy. Legal memos, HR tickets, internal financials, customer support logs — none of that belongs in a shared US SaaS account if you can avoid it. Self-hosted Open WebUI + Ollama keeps prompts, uploads, and chat history on disks you control. Put the VPS in a Canadian data centre and you've got a straightforward story for PIPEDA-conscious teams.

Cost at volume. API pricing adds up when twenty people are chatting all day. Local models on your own hardware aren't free — you pay for RAM, GPU, and someone's time to maintain the stack — but there's no per-token meter running in the background.
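The "adds up" part is simple arithmetic, sketched below in Python. Every number in it (tokens per user per day, price per million tokens, the flat monthly server cost) is a hypothetical placeholder, not a quote from any provider or from us; plug in your own figures.

```python
def monthly_api_cost(users: int, tokens_per_user_per_day: int,
                     workdays: int, usd_per_million_tokens: float) -> float:
    """Estimated monthly API bill for a team paying per token."""
    total_tokens = users * tokens_per_user_per_day * workdays
    return total_tokens / 1_000_000 * usd_per_million_tokens


def break_even_tokens(monthly_server_cost: float,
                      usd_per_million_tokens: float) -> float:
    """Monthly token volume at which the API bill equals a flat server cost."""
    return monthly_server_cost / usd_per_million_tokens * 1_000_000


# Hypothetical inputs: 20 people, 50k tokens each per workday, 21 workdays,
# an assumed blended price of $5 per million tokens, and a $150/month server.
api = monthly_api_cost(users=20, tokens_per_user_per_day=50_000,
                       workdays=21, usd_per_million_tokens=5.0)
server = 150.0

print(f"API: ${api:.2f}/month vs server: ${server:.2f}/month")
print(f"Break-even: {break_even_tokens(server, 5.0):,.0f} tokens/month")
```

With these made-up inputs the API is still cheaper; the point is the break-even line. Below it, pay per token; above it, the flat-cost box wins, before you count the maintenance time mentioned above.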

Access to internal stuff. RAG over your wiki, contracts, or runbooks only works if the retrieval pipeline lives inside your network. Open WebUI on a private host, talking to Ollama, reading files from a volume you mount — that's a setup SaaS chat will never replicate without sending those documents upstream.

What running it takes

Docker is the path of least resistance. A typical layout: Ollama on the host or in a sibling container, Open WebUI in Docker with OLLAMA_BASE_URL pointed at it, and a persistent volume on /app/backend/data so you don't wipe users and chats on restart. That volume is non-negotiable — without it, you're rebuilding accounts after every container update.
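The layout above maps onto a single docker run invocation. The sketch below is written in Python so the command can be checked before anything runs. It assembles the argument list for Open WebUI's published image (ghcr.io/open-webui/open-webui:main), sets OLLAMA_BASE_URL to Ollama on the host via host.docker.internal, and mounts a named volume at /app/backend/data. The host port 3000, the container name, and the volume name open-webui are assumptions; confirm the flags against the Open WebUI docs for your version.

```python
import shlex


def open_webui_docker_args(ollama_url: str = "http://host.docker.internal:11434",
                           host_port: int = 3000,
                           volume: str = "open-webui",
                           image: str = "ghcr.io/open-webui/open-webui:main") -> list[str]:
    """Build a `docker run` argument list for Open WebUI (assumed defaults)."""
    if not volume:
        # Without a persistent volume, users and chats vanish on every update.
        raise ValueError("a persistent volume for /app/backend/data is required")
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:8080",  # the UI listens on 8080 inside the container
        # On Linux, this maps host.docker.internal to the host so the container reaches Ollama.
        "--add-host=host.docker.internal:host-gateway",
        "-e", f"OLLAMA_BASE_URL={ollama_url}",
        "-v", f"{volume}:/app/backend/data",  # users, chats, and settings persist here
        "--name", "open-webui",
        "--restart", "always",
        image,
    ]


args = open_webui_docker_args()
print(shlex.join(args))
# To launch it for real: subprocess.run(args, check=True)
```

If Ollama runs in a sibling container instead, point ollama_url at that container's name on a shared Docker network rather than at host.docker.internal.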

Sizing depends on your models more than on Open WebUI itself. The UI is lightweight; the RAM hunger comes from Ollama. A CPU-only box with 8 GB can run smaller models for a handful of users. Serious team use with 13B+ models wants 16–32 GB of RAM, or a GPU instance if you need fast local generation. On GPU hosts, Open WebUI's :cuda image also lets the UI use the GPU for its own local work, such as embeddings for document search.
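A rough way to reason about those RAM figures is to multiply a model's parameter count by the bytes each weight takes at a given quantization, then add headroom for the context cache, the OS, and Open WebUI itself. The sketch below uses assumed round numbers (about 0.5 bytes per parameter at 4-bit, 1 byte at 8-bit, 2 bytes at 16-bit, plus a flat 2 GB overhead). Real usage varies with the quantization format, context length, and how many models Ollama keeps loaded at once, so treat the output as a ballpark.

```python
# Approximate bytes per weight for common precisions (assumed round numbers).
BYTES_PER_PARAM = {"q4": 0.5, "q8": 1.0, "f16": 2.0}


def estimate_ram_gb(params_billion: float, precision: str = "q4",
                    overhead_gb: float = 2.0) -> float:
    """Ballpark RAM in GB: model weights plus a flat overhead allowance."""
    # One billion parameters at N bytes each is roughly N GB of weights.
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    return weights_gb + overhead_gb


for size in (7, 13, 34):
    print(f"{size}B @ q4: ~{estimate_ram_gb(size):.1f} GB")
```

Under these assumptions a 4-bit 7B model fits on an 8 GB box with some room left, while a 13B model already crowds it, which lines up with the 16 GB+ guidance above.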

Put HTTPS in front with a reverse proxy that terminates TLS. On first boot, create the admin account yourself (the first account to sign up becomes the admin), then restrict new signups and replace any default secrets. If the instance is on the public internet, keep auth enabled and consider IP allowlisting or a VPN; an open Open WebUI with Ollama behind it is an expensive invitation to anyone hunting for free compute.
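IP allowlisting usually lives in the reverse proxy or firewall, but the logic is simple enough to show directly. Here is a minimal sketch using Python's standard ipaddress module. The office and VPN ranges are hypothetical placeholders (a documentation block and a private subnet), not recommendations.

```python
import ipaddress

# Hypothetical allowlist: an office range and a VPN subnet.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),  # example office range (documentation block)
    ipaddress.ip_network("10.8.0.0/24"),     # example VPN subnet
]


def is_allowed(client_ip: str) -> bool:
    """True if client_ip falls inside any allowlisted network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: deny by default
    # An IPv6 address never matches an IPv4 network, so it is denied here.
    return any(addr in net for net in ALLOWED_NETWORKS)


print(is_allowed("10.8.0.42"))     # True: inside the VPN subnet
print(is_allowed("198.51.100.7"))  # False: outside both ranges
```

The deny-by-default choice matters more than the matching: anything the check cannot parse or place in a known range stays locked out.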

For production, plan on Postgres instead of the default SQLite if multiple users hammer it daily, and back up both the Open WebUI data volume and your Ollama model directory. Updates are frequent; read the release notes before pulling :main, and consider pinning a specific release tag in production rather than tracking :main blindly.
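For the backup side, here is a minimal sketch that archives a directory into a timestamped .tar.gz using only the standard library. It assumes the Open WebUI data is reachable at a host path (for example a bind mount; with a named Docker volume you would first locate or copy out its contents). It also assumes you run it with the container stopped so the SQLite file isn't copied mid-write. The example paths are hypothetical, and the same function works for an Ollama model directory.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_directory(source: Path, dest_dir: Path) -> Path:
    """Archive `source` into dest_dir/<name>-<UTC timestamp>.tar.gz and return the path."""
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory's own name
        tar.add(source, arcname=source.name)
    return archive


# Example with hypothetical paths; adjust to wherever your data lives on the host:
# backup_directory(Path("/srv/open-webui/data"), Path("/srv/backups"))
```

Timestamped archive names mean a nightly cron job never overwrites the previous backup; pruning old ones is a separate step.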

Who it's for (and who should skip it)

Good fit: teams experimenting with local LLMs who need a real UI, agencies offering private AI sandboxes for clients, ops and support leads who want document Q&A without uploading to OpenAI, anyone pairing Ollama with Canadian hosting for data residency.

Maybe skip it: if one person is happy in the terminal and will never share access — Ollama alone is enough. If you need frontier-model quality (GPT-4-class reasoning on hard problems), local 7B models will disappoint; hybrid setups (Open WebUI front end, cloud API backend) exist, but that's a different privacy tradeoff.

Hosting it in Canada

We deploy Open WebUI on Docker stacks for clients who want Ollama, persistent volumes, backups, and TLS handled — on Canadian VPS or dedicated hardware, documented and monitored. GPU instances cost more, but for a team doing daily document Q&A, the alternative is often API spend that never stops growing.

Tell us how many users and which models you're targeting and we'll size something honest. No generic tier chart — just RAM, disk, and whether you actually need a GPU or you're fine with smaller models on CPU.
