AI on Your Infrastructure
Self-hosted vs cloud API, privacy, Canadian data paths.
Learning objectives
- Compare cloud LLM API vs self-hosted open models
- Map data residency for Canadian workloads
- List hardware basics for local inference
Two deployment paths
| Approach | Pros | Cons |
|---|---|---|
| Cloud API (OpenAI, Anthropic, etc.) | Strongest models, no GPU operations, fast to ship | Data leaves your network; typically US-based terms and processing; usage-based billing |
| Self-hosted (Ollama, vLLM, llama.cpp on your VPS) | Data stays in Canada if the VPS is hosted there; fixed cost; air-gap possible | Weaker models on small hardware; you patch and monitor |
| Canadian hosted API (regional providers) | Balance of quality + residency | Smaller model choice; verify subprocessors |
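The "weaker models on small hardware" trade-off in the self-hosted row mostly comes down to how much RAM the model weights need. A rough sketch of that arithmetic follows. It assumes weights dominate memory use, and `headroom_fraction` is a hypothetical safety margin rather than a measured figure. Real quantized model files usually run somewhat larger than this estimate.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight


def fits_in_ram(params_billions: float, bits_per_weight: int,
                ram_gb: float, headroom_fraction: float = 0.5) -> bool:
    """True if the weights fit in the RAM left after reserving headroom.

    headroom_fraction is a hypothetical margin for the OS, the KV cache,
    and the application itself; tune it for your own workload.
    """
    usable_gb = ram_gb * (1 - headroom_fraction)
    return weight_memory_gb(params_billions, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    cases = [("3B, 4-bit", 3, 4), ("7B, 4-bit", 7, 4), ("7B, 16-bit", 7, 16)]
    for name, params, bits in cases:
        print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB of weights, "
              f"fits in 16 GB: {fits_in_ram(params, bits, 16)}")
```

Under these assumptions a 4-bit 7B model needs about 3.5 GB for weights, which is why it can run on a 16 GB machine without a GPU, while the same model at 16-bit precision (about 14 GB) cannot.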
Self-host sketch on Workshop Co. Proxmox
# LXC or VM with GPU passthrough (optional)
# CPU-only: smaller models (7B quantized) on 16 GB RAM — slow but private
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2:3b
ollama run llama3.2:3b
# Your PHP/Node app calls http://127.0.0.1:11434/api/generate
# Never expose Ollama port to the public internet without auth
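The application-side call to the local endpoint might look like the following sketch. Python is used here for illustration, and the same request ports directly to PHP or Node. The function names `build_generate_request` and `generate` are hypothetical helpers, and the sketch assumes Ollama is already running on the same host with the `llama3.2:3b` model pulled.

```python
import json
import urllib.request

# Endpoint from the sketch above; only reachable from the same machine.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(prompt: str,
                           model: str = "llama3.2:3b") -> urllib.request.Request:
    """Build a POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2:3b",
             timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the reply arrives as one JSON object
    # whose "response" field holds the generated text.
    return body["response"]
```

Calling `generate("When is the next pottery class?")` would return the model's answer as a string. The generous timeout reflects how slow CPU-only inference can be.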
For a PIPEDA-sensitive FAQ (names or emails in chat logs), self-host on a Swift Host Canadian VPS or use a provider that contractually commits to processing in Canada. A default US API may be acceptable only for generic marketing copy.
Security checklist
- Keep API keys in environment variables and rotate them quarterly
- Rate-limit the public chat widget to prevent token-drain attacks
- Redact full card numbers before writing prompts to logs
- Restrict the inference port to localhost or a VPN with firewall rules
- Tell users they are talking to an AI assistant; never present it as human support
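Two items from the checklist, rate limiting and log redaction, can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `SlidingWindowLimiter` and `redact_card_numbers` are hypothetical names, the limiter state is lost on restart, and the card pattern is a digit-count heuristic that does no checksum validation.

```python
import re
import time
from collections import defaultdict, deque

# Heuristic: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def redact_card_numbers(text: str) -> str:
    """Replace anything that looks like a card number before logging."""
    return CARD_PATTERN.sub("[REDACTED CARD]", text)


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In a chat widget backend, `client_id` could be a session ID or IP address. A request for which `allow()` returns `False` would be rejected before any tokens are spent, and each prompt would pass through `redact_card_numbers()` before it is logged.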
Decision matrix
Scenario: the Workshop Co. FAQ bot runs on the public website only, with no customer names in prompts. Choose cloud API or self-host and justify your choice in three bullets.
Sample: cloud API OK
- No PII in prompts — only public class schedule JSON
- Low volume — API cost < $5/mo
- Accept vendor terms; add “do not submit personal info” disclaimer
If the chat collects email addresses for follow-up, switch to self-hosting or a Canadian-hosted API, and add a retention policy.
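The rule of thumb in this section can be written down as a small helper, which makes the decision easy to revisit as the chat widget gains features. This is only a sketch of the reasoning above: `Workload` and `recommend_deployment` are hypothetical names, and a real decision would also weigh cost, model quality, and contract terms.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    prompts_contain_pii: bool    # names, emails, etc. appear in prompts
    collects_contact_info: bool  # e.g. the chat asks for an email address


def recommend_deployment(workload: Workload) -> str:
    """Map the two privacy questions from this section to a deployment path."""
    if workload.prompts_contain_pii or workload.collects_contact_info:
        return "self-host or Canadian-hosted API, plus a retention policy"
    return "cloud API, with a 'do not submit personal info' disclaimer"


if __name__ == "__main__":
    public_faq = Workload(prompts_contain_pii=False, collects_contact_info=False)
    follow_up = Workload(prompts_contain_pii=False, collects_contact_info=True)
    print(recommend_deployment(public_faq))
    print(recommend_deployment(follow_up))
```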
Quick quiz
Why must Ollama not listen on 0.0.0.0:11434 on a public VPS?
Answer
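Ollama's API has no built-in authentication, so anyone who can reach the port could run prompts on your GPU/CPU. That invites abuse, drives up resource cost, and gives attackers a foothold for probing your network. Bind to localhost, or protect the port with an SSH tunnel or an authenticating proxy.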
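A deployment script could check the bind address before starting the service. The sketch below assumes the address comes from the `OLLAMA_HOST` environment variable in `host`, `host:port`, or `[ipv6]:port` form. `split_host` and `is_loopback_bind` are hypothetical helper names, and any value the parser cannot interpret is treated as unsafe.

```python
import ipaddress


def split_host(bind: str) -> str:
    """Extract the host part from 'host', 'host:port', or '[ipv6]:port'."""
    bind = bind.strip()
    if bind.startswith("["):
        return bind[1:bind.index("]")]
    if bind.count(":") == 1:
        return bind.split(":")[0]
    return bind


def is_loopback_bind(bind: str) -> bool:
    """True only if the address is reachable from the local machine alone."""
    host = split_host(bind)
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Unknown hostname or unparsable value: assume it may be public.
        return False


if __name__ == "__main__":
    for candidate in ["127.0.0.1:11434", "0.0.0.0:11434", "[::1]:11434"]:
        print(candidate, "->", "ok" if is_loopback_bind(candidate) else "EXPOSED")
```

A check like this catches configuration mistakes but does not replace a firewall rule, since another service or proxy could still forward traffic to the port.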