Chapter 6

AI on Your Infrastructure

Self-hosted vs cloud API, privacy, Canadian data paths.

Learning objectives

  • Compare cloud LLM API vs self-hosted open models
  • Map data residency for Canadian workloads
  • List hardware basics for local inference

Two deployment paths

Approach | Pros | Cons
Cloud API (OpenAI, Anthropic, etc.) | Best quality, no GPU ops, fast to ship | Data leaves your network; US terms; usage billing
Self-hosted (Ollama, vLLM, llama.cpp on your VPS) | Data stays in Canada; fixed cost; air-gap possible | Weaker models on small hardware; you patch and monitor
Canadian hosted API (regional providers) | Balance of quality + residency | Smaller model choice; verify subprocessors
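
The "small hardware" limit on self-hosting is mostly memory arithmetic. As a rough sketch (an approximation, not a sizing guide), weight storage is about parameters x bits per weight / 8. The helper name weights_mb is made up for illustration, and the result covers model weights only; the context cache, the inference runtime, and the OS all need RAM on top of it.

```sh
#!/bin/sh
# Approximate size of model weights in MB.
#   $1 = parameter count in millions (7000 for a 7B model)
#   $2 = bits per weight (16 for half precision, 4 for 4-bit quantization)
# Assumption: weights only; real memory use at runtime is higher.
weights_mb() {
    echo $(( $1 * $2 / 8 ))
}

weights_mb 7000 16   # 7B at 16 bits
weights_mb 7000 4    # 7B at 4 bits
```

By this estimate a 7B model needs about 14 GB for weights at 16 bits but about 3.5 GB at 4 bits, which is why quantized models are the usual choice on a CPU-only server with 16 GB of RAM.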

Self-host sketch on Workshop Co. Proxmox

# LXC or VM with GPU passthrough (optional)
# CPU-only: smaller models (7B quantized) on 16 GB RAM — slow but private

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2:3b
ollama run llama3.2:3b

# Your PHP/Node app calls http://127.0.0.1:11434/api/generate
# Never expose Ollama port to the public internet without auth
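
The localhost call named in the comments above can be tried from the shell before wiring it into the PHP/Node app. This is a minimal sketch, assuming Ollama is running on its default local address and the llama3.2:3b model has been pulled. The function names (json_escape, build_payload, ask_ollama) and the OLLAMA_URL variable are illustrative, and the escaping handles single-line text only; a real app should build the body with its own JSON library.

```sh
#!/bin/sh
# Assumption: Ollama listens on its default local address.
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

# Escape backslashes and double quotes so text can sit inside a JSON string.
# Simplification: single-line input only.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for /api/generate.
# "stream": false asks for one JSON reply instead of a stream of chunks.
#   $1 = model name, $2 = prompt text
build_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the prompt. The generated text is in the "response" field of the reply.
ask_ollama() {
    curl -fsS --max-time 120 "$OLLAMA_URL/api/generate" \
        -H 'Content-Type: application/json' \
        -d "$(build_payload "$1" "$2")"
}
```

For example, ask_ollama llama3.2:3b "What time is the Saturday class?" sends one request and prints the raw JSON reply. The app-side code would do the same POST and read the "response" field.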

Canadian angle

For a PIPEDA-sensitive FAQ (names or emails in chat logs), self-host on a Swift Host Canadian VPS or use a provider that contractually commits to processing in Canada. A default US-hosted API is reasonable only for content with no personal information, such as generic marketing copy.

Security checklist

  • API keys in env vars, rotated quarterly
  • Rate limit public chat widget (prevent token drain attacks)
  • Log prompts without storing full card numbers
  • Firewall inference port to localhost or VPN
  • Disclose “AI assistant” to users — no fake human support
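
The prompt-logging item in the checklist can be sketched as a filter that masks card-like digit runs before anything is written to disk. This is an illustration under stated assumptions: the names redact_cards and log_prompt, the LOG_FILE path, and the [REDACTED-CARD] marker are all hypothetical, and the pattern (13 to 19 digits, optionally separated by spaces or hyphens) is deliberately broad, so it may also mask other long numbers.

```sh
#!/bin/sh
# Hypothetical log location; adjust for your app.
LOG_FILE="${LOG_FILE:-./chat-prompts.log}"

# Mask runs of 13-19 digits, allowing single spaces or hyphens between digits.
# Reads text on stdin, writes the masked text to stdout.
redact_cards() {
    sed -E 's/[0-9]([ -]?[0-9]){12,18}/[REDACTED-CARD]/g'
}

# Append one timestamped, redacted prompt to the log.
#   $1 = raw prompt text from the chat widget
log_prompt() {
    printf '%s\t%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(printf '%s' "$1" | redact_cards)" >> "$LOG_FILE"
}
```

Redacting at write time means the full number never reaches the log file, so backups and log shipping inherit the protection without extra steps.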

Decision matrix

Workshop Co. FAQ — no customer names in prompts, public website only. Pick cloud API vs self-host and justify in 3 bullets.

Sample: cloud API OK
  • No PII in prompts — only public class schedule JSON
  • Low volume — API cost < $5/mo
  • Accept vendor terms; add “do not submit personal info” disclaimer

If the chat collects an email address for follow-up, switch to self-hosting or a Canadian API, and add a retention policy.
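
The rule above can also be enforced in code as a guard in front of the cloud call. The sketch below assumes an email-like string is the only personal-data signal being checked; the function names and the "cloud" / "private" labels are hypothetical, and the pattern is intentionally loose rather than a full PII detector.

```sh
#!/bin/sh
# Succeeds (exit 0) if the text contains something shaped like an email address.
contains_email() {
    printf '%s' "$1" | grep -Eq '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
}

# Print which backend a prompt may go to:
#   "private" = self-hosted or Canadian API, "cloud" = default cloud API.
choose_backend() {
    if contains_email "$1"; then
        echo "private"
    else
        echo "cloud"
    fi
}
```

For example, choose_backend "When is the next class?" prints cloud, while choose_backend "Email me at sam@example.com" prints private.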

Quick quiz

Why must Ollama not listen on 0.0.0.0:11434 on a public VPS?

Answer

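Ollama's API has no authentication of its own, so anyone who can reach the port could run prompts on your GPU/CPU. That means abuse, wasted compute and cost, and a path for prompt injection into your network. Bind to localhost, or protect the port with an SSH tunnel or an authenticating proxy.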
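
A quick way to verify the binding on the server is sketched below. It assumes a Linux VPS with the ss tool (from iproute2) and Ollama on its default port 11434; the function names are illustrative.

```sh
#!/bin/sh
# Succeeds (exit 0) only if an address:port pair is bound to loopback.
is_loopback_bind() {
    case "$1" in
        127.*:*|"[::1]":*|localhost:*) return 0 ;;
        *) return 1 ;;
    esac
}

# List every listening TCP socket on port 11434 and flag non-loopback binds.
check_ollama_bind() {
    ss -ltn | awk '$4 ~ /:11434$/ {print $4}' | while read -r addr; do
        if is_loopback_bind "$addr"; then
            echo "ok: $addr"
        else
            echo "EXPOSED: $addr"
        fi
    done
}
```

Run check_ollama_bind after any configuration change. Ollama takes its listen address from the OLLAMA_HOST environment variable and defaults to 127.0.0.1:11434, so an EXPOSED line usually means that variable was set to 0.0.0.0.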