Self-Hosted Jan: ChatGPT Feel, Local Models, Your Hardware

Your team wants ChatGPT — the actual chat experience, not a curl command against Ollama. But legal said no to pasting client documents into a US SaaS account. IT said no to another $30/seat subscription. So someone installed Ollama, opened the terminal, and watched adoption die in a week.

Jan is the app-shaped answer. ~43k GitHub stars, Apache 2.0, and built to feel like the chat tools people already know — except the models run on your hardware when you want them to. Download Llama, Qwen, Gemma, or GPT-OSS builds from HuggingFace inside the app. No terminal required for the person asking questions.

What it actually does

Jan is a desktop application — Windows, macOS, Linux (deb and AppImage). Install it, pick a model, start chatting. That's the core loop. Under the hood it uses llama.cpp and Tauri, but most users never touch that layer.

Model hub built in. Browse and download quantized models from HuggingFace without hunting for GGUF files on forums. Jan handles the boring parts — which build fits your RAM, where files land on disk.

Custom assistants. System prompts and personas for recurring tasks: code review tone, support draft replies, internal wiki summarizer. Save them and swap without retyping instructions every session.

OpenAI-compatible local API. Flip on Settings → Local API Server and Jan listens on localhost:1337 with a /v1/chat/completions endpoint. Point n8n, your own scripts, or another app at it — same Bearer-token pattern as OpenAI. Keep the host on 127.0.0.1 for solo use; bind to 0.0.0.0 only when you mean to share it on a network you control.

MCP support. Model Context Protocol hooks for agent-style workflows — tools, search, integrations — without sending everything to a cloud orchestrator by default.

Hybrid when you choose. Jan can also connect to cloud APIs (OpenAI, Anthropic, Groq, Mistral, and others). The point isn't purity — it's that you decide which threads stay local and which hit a remote endpoint.

Why self-host local AI?

Prompts and documents stay on your disk. HR drafts, legal summaries, internal code — self-hosted inference means that text doesn't become someone else's training telemetry.

Canadian residency story. Run Jan on a GPU workstation in your office or on a Canadian VPS with enough RAM, and you can tell clients where inference happens. That's harder to argue when every chat goes through chatgpt.com.

Flat hardware cost. No per-token bill. You pay for the machine — 16 GB RAM for 7B-class models, 32 GB for 13B, GPU strongly recommended on Windows for usable speed — and run as many chats as the box allows.

Not the same as Open WebUI. We covered Open WebUI as a browser-based team chat front end for Ollama. Jan is desktop-first: better for individuals and small teams who want an app icon, not a self-managed web portal. Open WebUI wins for multi-user browser access; Jan wins for "install and talk" simplicity with a built-in model store.

What running it takes

Jan's primary install path is the desktop package from jan.ai or GitHub Releases — not a one-line Docker container for the main app.

Rough RAM guidance from Jan's own docs: 8 GB for 3B models, 16 GB for 7B, 32 GB for 13B. Windows users benefit from NVIDIA, AMD, or Intel Arc GPU acceleration. macOS 13.6+ with Apple Silicon is a common sweet spot for local inference.

On a Canadian VPS or dedicated GPU box: install the Linux build, download models to local storage, enable the API server if other services on the same host need completions. Put TLS and auth in front if you expose port 1337 beyond localhost — an open local LLM API is a gift to crypto miners.

Team/server deployments: Jan also maintains Jan Server — a Docker Compose stack with Kong gateway, OpenAI-compatible LLM API, MCP tools, Keycloak auth, and optional vLLM for GPU inference. That's the path when you need multi-user OAuth and observability, not just a desktop install on one laptop. Plan 8 GB RAM minimum for the server stack; GPU if you're serving local models at scale.

Back up the model directory and Jan's data folder before OS upgrades. Quantized models are large — budget disk, not just RAM.

Who it's for (and who should skip it)

Good fit: professionals who want a ChatGPT-like app without cloud dependency, developers testing apps against a local OpenAI-compatible API, small teams on Canadian GPU hardware, anyone who tried Ollama in the terminal and wanted a real UI.

Maybe skip it: if you need a shared browser portal for twenty users — Open WebUI or Jan Server fits better. If you need frontier-model quality on hard reasoning tasks, local 7B models will frustrate you unless you hybrid with cloud APIs. If nobody will patch the machine or manage GPU drivers, SaaS is less ops.

Hosting it in Canada

We provision Canadian GPU and high-RAM VPS instances for clients running local LLM stacks — Jan desktop, Jan Server, Ollama, or Open WebUI alongside them. TLS, firewall rules, and backup scope for model storage included in the conversation, not an afterthought.

Tell us how many users and which model sizes you're targeting — we'll size RAM and GPU honestly, whether Jan lives on a workstation or a server in Montreal.

Tags:
  • Jan
  • LLM
  • Local AI
  • Self-Hosted
  • Privacy

Need Help With Your Hosting?

Tell us about your application — we respond within 1 hour with honest recommendations.