# Deployment
This guide covers deploying the LLMG gateway to production using Docker Compose, Kubernetes, and environment-specific configurations.
## Docker Compose

A complete `docker-compose.yml` for production deployment:

```yaml
version: '3.8'

services:
  llmg:
    image: ghcr.io/modpotatodotdev/llmg:latest
    ports:
      - "8080:8080"
    environment:
      # Required: At least one provider API key
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GROQ_API_KEY=${GROQ_API_KEY}
      # Optional: Additional providers
      - MISTRAL_API_KEY=${MISTRAL_API_KEY}
      - COHERE_API_KEY=${COHERE_API_KEY}
      - DEEPSEEK_API_KEY=${DEEPSEEK_API_KEY}
      # Server configuration
      - LLMG_PORT=8080
      - LLMG_HOST=0.0.0.0
      - LLMG_LOG_LEVEL=info
      # Optional: Enable CORS for browser clients
      - LLMG_CORS=true
    volumes:
      # Optional: Mount custom config file
      - ./llmg.toml:/app/llmg.toml:ro
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 128M
```

Create a `.env` file in the same directory:

```sh
# .env - NEVER commit this file to version control
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
GROQ_API_KEY=gsk_your_groq_key_here
```

Deploy with:

```sh
docker-compose up -d
```

## Environment Variables
### Server Configuration

| Variable | Default | Description |
|---|---|---|
| `LLMG_PORT` | `8080` | Server port |
| `LLMG_HOST` | `0.0.0.0` | Bind address (`127.0.0.1` for local-only) |
| `LLMG_LOG_LEVEL` | `info` | Log level: `error`, `warn`, `info`, `debug`, `trace` |
| `LLMG_CORS` | `false` | Enable CORS headers |
### Provider API Keys

Set the API key for each provider you want to enable. The gateway auto-discovers providers based on which environment variables are present.

| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GROQ_API_KEY` | Groq |
| `MISTRAL_API_KEY` | Mistral |
| `COHERE_API_KEY` | Cohere |
| `DEEPSEEK_API_KEY` | DeepSeek |
| `OPENROUTER_API_KEY` | OpenRouter |
| `XAI_API_KEY` | xAI (Grok) |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI |
See the full provider list for all supported providers.
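Conceptually, the discovery step amounts to checking which of these variables are set. A minimal Python sketch of that idea (the provider identifiers below are illustrative, not the gateway's internal names):

```python
import os

# Env vars from the table above, mapped to illustrative provider names
PROVIDER_ENV_VARS = {
    "OPENAI_API_KEY": "openai",
    "ANTHROPIC_API_KEY": "anthropic",
    "GROQ_API_KEY": "groq",
    "MISTRAL_API_KEY": "mistral",
    "COHERE_API_KEY": "cohere",
    "DEEPSEEK_API_KEY": "deepseek",
    "OPENROUTER_API_KEY": "openrouter",
    "XAI_API_KEY": "xai",
    "AZURE_OPENAI_API_KEY": "azure-openai",
}

def discovered_providers(env=os.environ):
    """Return the providers that would be enabled for a given environment."""
    return sorted(name for var, name in PROVIDER_ENV_VARS.items() if env.get(var))
```

For example, with only `OPENAI_API_KEY` and `GROQ_API_KEY` set, this returns `["groq", "openai"]`; an empty string counts as unset.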
## Security Best Practices

- Store API keys in environment variables or a secrets manager
- Never commit `.env` files to version control
- Use different API keys for different environments (dev/staging/prod)
- Rotate keys regularly
- Restrict API key permissions at the provider level when possible
## Configuration File

For complex deployments, create an `llmg.toml` file:

```toml
[server]
port = 8080
host = "0.0.0.0"
timeout = 60
cors = true

[logging]
level = "info"
verbose = false

[providers.openai]
enabled = true
api_key = "${OPENAI_API_KEY}"
default_model = "gpt-4"

[providers.anthropic]
enabled = true
api_key = "${ANTHROPIC_API_KEY}"
default_model = "claude-sonnet-4-20250514"

[aliases]
gpt-4 = "openai/gpt-4"
claude = "anthropic/claude-3-opus-20240229"
```

Mount the config file in Docker Compose:

```yaml
volumes:
  - ./llmg.toml:/app/llmg.toml:ro
```

## Rate Limiting
Configure rate limiting to protect your gateway and providers:

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `LLMG_RATE_LIMIT_ENABLED` | `false` | Enable rate limiting |
| `LLMG_RATE_LIMIT_RPS` | `0` | Global requests per second (0 = unlimited) |
| `LLMG_RATE_LIMIT_BURST` | `100` | Global burst capacity |
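The requests-per-second and burst-capacity pair describes token-bucket behavior: the bucket refills at the configured rate and can absorb short bursts up to its capacity. A minimal Python sketch of those semantics (an illustration, not the gateway's actual implementation):

```python
import time

class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/sec, holds at most `burst`."""

    def __init__(self, rate, burst, now=None):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; otherwise the caller should return 429."""
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at burst capacity
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this model, a limit of 100 requests/second with a burst capacity of 200 lets up to 200 queued requests through immediately before throughput settles at 100/s.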
### Config File

```toml
[rate_limit]
enabled = true
requests_per_second = 100
burst_capacity = 200

# Per-provider limits
[rate_limit.providers.openai]
requests_per_second = 50
burst_capacity = 100

[rate_limit.providers.anthropic]
requests_per_second = 30
burst_capacity = 60
```

When rate limits are exceeded, the gateway returns HTTP 429 with a `Retry-After` header.
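Clients should treat 429 as retryable. A minimal Python sketch of a retry helper that honors `Retry-After` (the `send` callable and the one-second fallback are assumptions for illustration, not part of the gateway):

```python
import time

def request_with_retry(send, max_retries=3, sleep=time.sleep):
    """Call `send` and retry on HTTP 429, honoring the Retry-After header.

    `send` is any zero-argument callable returning (status, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, headers, body
        try:
            delay = float(headers.get("Retry-After", 1))
        except (TypeError, ValueError):
            delay = 1.0  # fall back when the header is missing or malformed
        sleep(delay)
```

Wrapping your HTTP call in `send` keeps the retry policy independent of the HTTP client library you use.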
## Kubernetes
### Basic Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llmg-gateway
  labels:
    app: llmg-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llmg-gateway
  template:
    metadata:
      labels:
        app: llmg-gateway
    spec:
      containers:
        - name: llmg
          image: ghcr.io/modpotatodotdev/llmg:latest
          ports:
            - containerPort: 8080
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llmg-secrets
                  key: openai-api-key
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llmg-secrets
                  key: anthropic-api-key
            - name: LLMG_LOG_LEVEL
              value: "info"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
```

### Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: llmg-gateway
spec:
  selector:
    app: llmg-gateway
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
```

### Secrets
Create secrets using kubectl:

```sh
kubectl create secret generic llmg-secrets \
  --from-literal=openai-api-key=sk-your-key \
  --from-literal=anthropic-api-key=sk-ant-your-key
```

Or use a secrets manifest:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: llmg-secrets
type: Opaque
stringData:
  openai-api-key: sk-your-key-here
  anthropic-api-key: sk-ant-your-key-here
```

### Horizontal Pod Autoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llmg-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llmg-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Ingress (NGINX)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llmg-gateway-ingress
  annotations:
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/rate-limit-window: "1m"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - llmg.yourdomain.com
      secretName: llmg-tls
  rules:
    - host: llmg.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llmg-gateway
                port:
                  number: 80
```

## Health Checks
The gateway provides a health endpoint at `/health`:

```sh
curl http://localhost:8080/health
```

Returns HTTP 200 with a JSON response:

```json
{
  "status": "healthy",
  "version": "0.1.9"
}
```

Use this for load balancer health checks and monitoring.
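For deployment scripts that want more than a bare HTTP 200 check, the response body can be validated as well. A small Python sketch (the `is_healthy` helper is hypothetical, not part of the gateway):

```python
import json

def is_healthy(body):
    """Return True only if the /health body parses and reports status "healthy"."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (ValueError, AttributeError):
        # Non-JSON bodies (e.g. a proxy error page) are treated as unhealthy
        return False
```

Checking the body guards against proxies or load balancers that answer 200 with an error page.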
## Monitoring

### Structured Logging

The gateway uses `tracing` for structured logging. Set the log level to control verbosity:

```sh
LLMG_LOG_LEVEL=info  # default; use debug or trace for more detail
```

Pipe logs to your observability stack (e.g. Datadog, Loki, CloudWatch) to monitor request patterns, error rates, and latency.
## Production Checklist

- API keys stored securely (secrets manager, not in repo)
- Rate limiting enabled
- Health checks configured
- Logging level set appropriately (`info` or `warn`)
- Resource limits defined (CPU/memory)
- TLS/HTTPS enabled
- Monitoring and alerting configured
- Backup strategy for configuration
- Disaster recovery plan documented
## Troubleshooting

### Gateway won’t start

Check logs:

```sh
docker logs <container-id>
```

Common issues:

- Missing API keys (at least one provider is required)
- Port already in use
- Invalid configuration file syntax
### Requests failing

Verify:

- Requests include the `Authorization: Bearer <token>` header (see Authentication)
- API keys are valid and have quota
- The provider is enabled in configuration
- Network connectivity to provider APIs
### High memory usage

- Reduce `burst_capacity` in rate limiting
- Lower max request size limits
- Monitor for memory leaks with profiling
### Connection timeouts

- Increase `timeout` in the server configuration
- Check provider API status
- Verify network policies (Kubernetes)