Production Deployment & Scaling

1. From Laptop Prototype to Production System

Moving an AI agent from a personal laptop to a production system that runs 24/7 is not just a matter of copy-pasting code onto a new server. It is a major shift that forces you to rethink reliability, security, scalability, and cost management from the ground up.

Once your OpenClaw agent gains more users and has to run dependably, you will face new challenges that never came up during the prototype phase:

Uptime requirements: The agent must run continuously, not start and stop with your laptop
Multi-user scaling: Managing multiple agents for many users
Resource management: Memory, CPU, and API costs that grow with the number of users
Data persistence: Backup, recovery, and workspace management
Security hardening: Preventing unauthorized access and data breaches

This article covers every aspect of deploying OpenClaw in production, from infrastructure selection to lessons learned from running 45 agents concurrently.

2. Choosing Your Infrastructure

Choosing infrastructure is the critical first step. For OpenClaw, you need a server with enough CPU and RAM to run several Node.js applications at the same time.

VPS Provider Comparison

Provider	Instance Type	vCPU	RAM	Storage	Price/Month	Network
Hetzner	CPX41	8 vCPU	16 GB	240 GB SSD	$29	20 TB
DigitalOcean	Premium 8GB	4 vCPU	8 GB	160 GB SSD	$48	5 TB
AWS Lightsail	$40 Plan	2 vCPU	8 GB	160 GB SSD	$40	4 TB
Hetzner	CCX33	8 vCPU (Dedicated)	32 GB	240 GB SSD	$49	20 TB
Hostinger	KVM 4	4 vCPU	16 GB	200 GB NVMe	$16	8 TB

Recommended Specifications

Based on our experience running OpenClaw in a multi-user setup, these are the recommended specs:

Minimum: 4 vCPU, 8GB RAM for 5-10 agents
Recommended: 8 vCPU, 16GB RAM for 15-30 agents
High-scale: 8+ vCPU, 32GB RAM for 30+ agents
Storage: 100GB+ SSD (workspace data, logs, backups)
Network: 5TB+ bandwidth (API calls, file downloads)

Why Hetzner? For OpenClaw deployments, Hetzner has offered the best performance-to-price ratio in our experience, especially the CPX and CCX series, which have enough CPU power for concurrent agent processing.

3. Docker Deployment Architecture

OpenClaw was designed from the start to run in Docker containers, which makes production deployment straightforward, but you need to understand the system architecture first:

Container Architecture

Gateway Container: The main OpenClaw process, which manages sessions and routing
Sandbox Containers: Isolated environments for running agent tools
Shared Volumes: workspace data, configurations, และ logs
Network Bridge: internal communication ระหว่าง containers

Production Docker Compose Setup

version: '3.8'

services:
  openclaw-gateway:
    image: openclaw/gateway:latest
    container_name: openclaw_gateway
    restart: unless-stopped
    ports:
      - "3000:3000"  # Gateway API
      - "8080:8080"  # Web UI (if enabled)
    volumes:
      - ./workspace:/workspace
      - ./config:/app/config
      - ./logs:/app/logs
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - NODE_ENV=production
      - OPENCLAW_CONFIG_PATH=/app/config/openclaw.json
      - OPENCLAW_LOG_LEVEL=info
      - OPENCLAW_MAX_AGENTS=30
      - OPENCLAW_SANDBOX_LIMIT=10
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    depends_on:
      - redis
      - postgres

  redis:
    image: redis:7-alpine
    container_name: openclaw_redis
    restart: unless-stopped
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --maxmemory 1gb

  postgres:
    image: postgres:15-alpine
    container_name: openclaw_postgres
    restart: unless-stopped
    environment:
      - POSTGRES_DB=openclaw
      - POSTGRES_USER=openclaw
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./backups:/backups

  nginx:
    image: nginx:alpine
    container_name: openclaw_nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      - openclaw-gateway

volumes:
  redis_data:
  postgres_data:

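Health Checks and Monitoring

Configuring health checks correctly matters because they let Docker mark a container as unhealthy. Keep in mind that a plain Docker restart policy only reacts when the container process exits; restarting a container that is still running but unhealthy requires an orchestrator or a separate watchdog.

# Health check endpoint in OpenClaw Gateway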
GET /health

Response:
{
  "status": "healthy",
  "uptime": 3600,
  "activeAgents": 15,
  "sandboxes": {
    "active": 8,
    "limit": 10
  },
  "memory": {
    "used": "2.1GB",
    "available": "16GB"
  }
}
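
The health payload can also feed a small external watchdog. The sketch below is one possible way to do it, assuming the response has exactly the shape shown above; the function names and the `sandbox_headroom` parameter are hypothetical and not part of OpenClaw.

```python
import json
import urllib.request


def evaluate_health(payload: dict, sandbox_headroom: int = 1) -> list:
    """Return a list of warnings for a /health payload; an empty list means OK."""
    warnings = []
    if payload.get("status") != "healthy":
        warnings.append(f"status is {payload.get('status')!r}")
    sandboxes = payload.get("sandboxes", {})
    active = sandboxes.get("active", 0)
    limit = sandboxes.get("limit", 0)
    # Warn when fewer than `sandbox_headroom` sandboxes remain free.
    if limit and limit - active < sandbox_headroom:
        warnings.append(f"sandboxes near limit: {active}/{limit}")
    return warnings


def fetch_health(url: str = "http://localhost:3000/health", timeout: float = 10.0) -> dict:
    """Fetch and decode the health endpoint (URL taken from the compose healthcheck)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `evaluate_health(fetch_health())` from cron and alerting on a non-empty result covers the case where the gateway is running but degraded.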

4. Installation & Setup on VPS

Here is the step-by-step process for installing OpenClaw on a production VPS:

Step 1: OS Setup (Ubuntu 22.04 LTS)

# Update system
sudo apt update && sudo apt upgrade -y

# Install essential packages
sudo apt install -y curl git ufw fail2ban htop

# Configure firewall
sudo ufw allow 22/tcp  # SSH
sudo ufw allow 80/tcp  # HTTP
sudo ufw allow 443/tcp # HTTPS
sudo ufw enable

# Create openclaw user (it joins the docker group in Step 2, once Docker has created that group)
sudo adduser openclaw
sudo usermod -aG sudo openclaw

Step 2: Docker Installation

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Install Docker Compose v2 as a plugin from the Docker apt repository set up above
sudo apt install -y docker-compose-plugin
docker compose version

# Add user to docker group
sudo usermod -aG docker openclaw

Step 3: Node.js and OpenClaw

# Install Node.js 18+
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs

# Install OpenClaw globally
sudo npm install -g openclaw

# Create workspace directory
sudo mkdir -p /opt/openclaw
sudo chown openclaw:openclaw /opt/openclaw
cd /opt/openclaw

# Initialize OpenClaw
openclaw init --production

Step 4: SystemD Service Configuration

# /etc/systemd/system/openclaw.service
[Unit]
Description=OpenClaw AI Agent Gateway
After=docker.service
Requires=docker.service

[Service]
Type=simple
User=openclaw
WorkingDirectory=/opt/openclaw
Environment=NODE_ENV=production
Environment=OPENCLAW_CONFIG_PATH=/opt/openclaw/config/openclaw.json
ExecStart=/usr/bin/npm start
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=openclaw

[Install]
WantedBy=multi-user.target

# Reload unit files, then enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable openclaw
sudo systemctl start openclaw
sudo systemctl status openclaw

# Check logs
sudo journalctl -u openclaw -f

Step 5: SSL and Domain Setup

# Install Certbot for Let's Encrypt
sudo apt install certbot python3-certbot-nginx

# Get SSL certificate
sudo certbot --nginx -d your-domain.com

# Configure automatic renewal
sudo crontab -e
# Add: 0 12 * * * /usr/bin/certbot renew --quiet

5. Multi-User & Multi-Channel Scaling

Running OpenClaw for many users requires careful planning around agent isolation and resource sharing.

Agent Isolation Patterns

Per-User Workspaces: Each user gets a separate workspace
Memory Isolation: Agent memory does not cross-contaminate between users
API Key Management: Users can supply their own API keys
Resource Limits: Cap CPU and memory per user

Multi-Channel Configuration

# openclaw.json - Multi-channel setup
{
  "channels": {
    "telegram": {
      "enabled": true,
      "botToken": "${TELEGRAM_BOT_TOKEN}",
      "allowedUsers": ["user1", "user2"],
      "workspace": "/workspace/telegram"
    },
    "discord": {
      "enabled": true,
      "botToken": "${DISCORD_BOT_TOKEN}",
      "guildId": "your-guild-id",
      "workspace": "/workspace/discord"
    },
    "signal": {
      "enabled": true,
      "phoneNumber": "+66xxxxxxxxx",
      "workspace": "/workspace/signal"
    }
  },
  "agents": {
    "maxConcurrent": 30,
    "memoryLimit": "2GB",
    "timeoutMinutes": 60,
    "sandboxLimit": 10
  }
}

Workspace Management

# Workspace structure for multi-user
/opt/openclaw/workspace/
├── telegram/
│   ├── user1/
│   │   ├── memory/
│   │   ├── files/
│   │   └── config/
│   └── user2/
├── discord/
│   ├── guild1/
│   └── guild2/
└── signal/
    └── groups/
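
Provisioning this layout by hand gets error-prone as users are added. Below is a minimal sketch of a provisioning helper, assuming the per-user `memory/`, `files/`, and `config/` subdirectories shown for `user1`; the function names and the user ID validation rule are hypothetical choices, not OpenClaw behavior.

```python
from pathlib import Path

ALLOWED_CHANNELS = {"telegram", "discord", "signal"}
SUBDIRS = ("memory", "files", "config")


def user_workspace(root: Path, channel: str, user_id: str) -> Path:
    """Build the workspace path for a user, rejecting unsafe identifiers."""
    if channel not in ALLOWED_CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    # Only letters, digits, '-' and '_' are accepted, so '..' and '/' can
    # never escape the channel directory.
    if not user_id or not all(c.isalnum() or c in "-_" for c in user_id):
        raise ValueError(f"unsafe user id: {user_id!r}")
    return root / channel / user_id


def ensure_workspace(root: Path, channel: str, user_id: str) -> Path:
    """Create the per-user directory tree if it does not exist yet."""
    workspace = user_workspace(root, channel, user_id)
    for sub in SUBDIRS:
        (workspace / sub).mkdir(parents=True, exist_ok=True)
    return workspace
```

For example, `ensure_workspace(Path("/opt/openclaw/workspace"), "telegram", "user1")` creates the three subdirectories and returns the user's workspace path. Validating the ID before building the path keeps one user's identifier from resolving into another user's directory.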

6. Monitoring & Cost Tracking

Monitoring OpenClaw in production is not only about watching uptime. You also need to track API costs, resource usage, and performance metrics.

Cost Tracking Implementation

#!/bin/bash
# Cost tracking script
# /opt/openclaw/scripts/cost-monitor.sh

DATE=$(date +%Y-%m-%d)
COST_LOG="/opt/openclaw/logs/costs-$DATE.log"

# Get session status for all active agents
openclaw gateway status --json | jq '.agents[] | {
  user: .user,
  model: .model,
  tokens: .totalTokens,
  cost: .estimatedCost
}' >> $COST_LOG

# Daily cost summary
python3 /opt/openclaw/scripts/cost-summary.py $COST_LOG
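
The script above calls cost-summary.py without showing it. One possible shape for it is sketched below. It assumes the log holds the concatenated JSON objects written by the jq filter (fields `user`, `model`, `tokens`, `cost`) and, as a further assumption, that each record is a cumulative snapshot, so only the last record per user and model is counted.

```python
import json
from collections import defaultdict


def parse_records(text: str) -> list:
    """Parse a stream of concatenated (possibly pretty-printed) JSON objects."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so do it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


def summarize(records: list) -> dict:
    """Total tokens and cost per model, using the latest snapshot per (user, model)."""
    latest = {}
    for rec in records:
        latest[(rec["user"], rec["model"])] = rec  # later snapshots overwrite earlier ones
    totals = defaultdict(lambda: {"tokens": 0, "cost": 0.0})
    for (_user, model), rec in latest.items():
        totals[model]["tokens"] += rec["tokens"]
        totals[model]["cost"] += rec["cost"]
    return dict(totals)
```

A command-line wrapper would read the log path from `sys.argv[1]` and print `summarize(parse_records(open(path).read()))`. If the gateway reports per-interval usage instead of cumulative totals, sum every record rather than keeping only the latest.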

Real Monthly Cost Breakdown (45 Agents)

Category	Service/Model	Usage	Cost	Percentage
Infrastructure	Hetzner CCX33	1 server	$49	12%
LLM API	Claude 3 Opus	15M tokens	$225	56%
LLM API	Claude 3 Sonnet	25M tokens	$75	19%
Other APIs	Search, Maps, etc.	Various	$35	9%
Storage/Backup	S3, Backups	500GB	$15	4%
Total Monthly			$399	100%

Monitoring Dashboard Setup

# Prometheus + Grafana monitoring
version: '3.8'

services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3001:3000"

  node-exporter:
    image: prom/node-exporter
    ports:
      - "9100:9100"

7. Backup Strategies

Backing up an OpenClaw system has to cover application data, user workspaces, agent memory, and configuration.

What to Backup

Workspace Data: user files, agent memory, conversation history
Configuration: openclaw.json, API keys, channel settings
Database: PostgreSQL data (if used)
Logs: Important for troubleshooting and cost tracking
SSL Certificates: Let's Encrypt certificates and private keys

Automated Backup Script

#!/bin/bash
# /opt/openclaw/scripts/backup.sh
set -euo pipefail

BACKUP_DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/opt/backups/openclaw_$BACKUP_DATE"
S3_BUCKET="your-backup-bucket"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Backup workspace
tar -czf "$BACKUP_DIR/workspace.tar.gz" /opt/openclaw/workspace/

# Backup configuration
cp -r /opt/openclaw/config "$BACKUP_DIR/"

# Backup database (if using PostgreSQL)
docker exec openclaw_postgres pg_dump -U openclaw openclaw > "$BACKUP_DIR/database.sql"

# Backup logs (last 7 days)
find /opt/openclaw/logs -name "*.log" -mtime -7 -exec cp {} "$BACKUP_DIR/" \;

# Upload to S3 under a per-run prefix so runs do not overwrite each other
aws s3 sync "$BACKUP_DIR" "s3://$S3_BUCKET/backups/openclaw_$BACKUP_DATE/"

# Keep only last 30 days locally (backups are directories, so -delete would fail on them)
find /opt/backups -maxdepth 1 -type d -name "openclaw_*" -mtime +30 -exec rm -rf {} +

# Send notification
echo "Backup completed: $BACKUP_DATE" | mail -s "OpenClaw Backup" [email protected]
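
A backup job that stops running produces no error at all, so it helps to check the result independently. The sketch below is one way to do that, assuming the `/opt/backups/openclaw_*` directory layout and the `workspace.tar.gz` name used by the script; the 26-hour threshold and the function names are hypothetical.

```python
import tarfile
import time
from pathlib import Path
from typing import Optional


def latest_backup(backup_root: Path) -> Optional[Path]:
    """Return the most recently modified openclaw_* backup directory, if any."""
    candidates = [p for p in backup_root.glob("openclaw_*") if p.is_dir()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def check_backup(backup_root: Path, max_age_hours: float = 26.0) -> list:
    """Return a list of problems with the newest backup; empty means it looks fine."""
    newest = latest_backup(backup_root)
    if newest is None:
        return ["no backup directories found"]
    problems = []
    age_hours = (time.time() - newest.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"newest backup is {age_hours:.1f} hours old")
    archive = newest / "workspace.tar.gz"
    if not archive.is_file():
        problems.append("workspace.tar.gz missing")
    else:
        try:
            # Listing the members forces a full read, which catches truncation.
            with tarfile.open(archive, "r:gz") as tar:
                if not tar.getnames():
                    problems.append("workspace.tar.gz is empty")
        except (tarfile.TarError, OSError, EOFError) as exc:
            problems.append(f"workspace.tar.gz unreadable: {exc}")
    return problems
```

Running `check_backup(Path("/opt/backups"))` from a separate cron job and alerting on any returned problem catches a stalled or broken backup. It does not replace a periodic full restore test.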

Git-based Backup for Code

# Setup git backup for workspace
cd /opt/openclaw/workspace
git init
git remote add origin [email protected]:yourorg/openclaw-workspace.git

#!/bin/bash
# Daily git backup
cd /opt/openclaw/workspace
git add -A
git commit -m "Daily backup $(date +%Y-%m-%d)"
git push origin main

Security Warning: Do not commit API keys or other sensitive data directly to the git repository. Use .gitignore and environment variables instead.

8. Upgrade Workflow

Upgrading OpenClaw in production has to be done carefully to avoid downtime and data loss.

Pre-Upgrade Checklist

✅ Backup current system completely
✅ Read the changelog and breaking changes
✅ Test the upgrade in a staging environment
✅ Schedule maintenance window
✅ Notify users about downtime

Rolling Upgrade Process

#!/bin/bash
# Upgrade script with a short maintenance window (the gateway restarts in step 6, so this is not strictly zero-downtime)

# 1. Backup current system
/opt/openclaw/scripts/backup.sh

# 2. Stop accepting new sessions
openclaw gateway maintenance --enable

# 3. Wait for active sessions to complete (max 30 minutes)
timeout 1800 openclaw gateway wait-idle

# 4. Update OpenClaw (sudo because the package was installed globally with sudo)
sudo npm update -g openclaw

# 5. Test configuration
openclaw config validate

# 6. Restart gateway
sudo systemctl restart openclaw

# 7. Verify health
sleep 30
openclaw gateway health || exit 1

# 8. Re-enable service
openclaw gateway maintenance --disable

echo "Upgrade completed successfully"

Rollback Strategy

#!/bin/bash
# Emergency rollback script

PREVIOUS_VERSION="2.1.5"  # Keep track of last working version

# 1. Stop current service
sudo systemctl stop openclaw

# 2. Reinstall previous version
sudo npm uninstall -g openclaw
sudo npm install -g "openclaw@$PREVIOUS_VERSION"

# 3. Restore backup if needed (-d lists the backup directories themselves, newest first)
LATEST_BACKUP=$(ls -td /opt/backups/openclaw_* | head -n1)
tar -xzf "$LATEST_BACKUP/workspace.tar.gz" -C /

# 4. Restart service
sudo systemctl start openclaw

9. Security Hardening for Production

OpenClaw in production is a high-value target because it holds sensitive data and has API access to many systems, so security hardening is essential.

Firewall Configuration

# UFW firewall rules
sudo ufw --force reset
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH on the custom port (move sshd to port 2222 first, see SSH Hardening, or you will lock yourself out)
sudo ufw allow 2222/tcp

# Allow HTTP/HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

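# OpenClaw Gateway (internal only)
# Note: ports published by Docker bypass UFW rules, so also restrict the published port binding in docker-compose
sudo ufw allow from 10.0.0.0/8 to any port 3000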

# Enable firewall
sudo ufw enable

SSH Hardening

# /etc/ssh/sshd_config
Port 2222
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2

# Allow only specific users
AllowUsers openclaw

# Restart SSH
sudo systemctl restart sshd

Fail2Ban Configuration

# /etc/fail2ban/jail.local
[DEFAULT]
bantime = 3600
findtime = 600
maxretry = 3

[sshd]
enabled = true
port = 2222

[nginx-http-auth]
enabled = true

[openclaw-api]
enabled = true
port = 3000
filter = openclaw-api
logpath = /opt/openclaw/logs/gateway.log
maxretry = 5

API Keys and Secrets Management

# Use environment variables for secrets
# /opt/openclaw/.env (chmod 600)
ANTHROPIC_API_KEY=sk-...
OPENAI_API_KEY=sk-...
TELEGRAM_BOT_TOKEN=bot:...
DB_PASSWORD=secure_password

# Load in systemd service ([Service] section); systemd does not expand ${VAR} from the shell environment
EnvironmentFile=/opt/openclaw/.env

# Rotate API keys monthly
#!/bin/bash
# /opt/openclaw/scripts/rotate-keys.sh
# Automated key rotation script

10. Performance Optimization

Optimizing OpenClaw performance means finding a balance between response time, cost, and resource utilization.

Context Window Management

# openclaw.json - Performance settings
{
  "performance": {
    "contextWindow": {
      "maxTokens": 8000,
      "compactionThreshold": 6000,
      "summaryRatio": 0.3
    },
    "caching": {
      "enabled": true,
      "ttl": 3600,
      "maxSize": "500MB"
    },
    "concurrency": {
      "maxAgents": 30,
      "maxSandboxes": 10,
      "queueTimeout": 30000
    }
  }
}
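
To make the compaction settings concrete, here is a rough sketch of how a threshold-based compaction decision could work. It assumes `summaryRatio` means the summary costs about that fraction of the tokens it replaces; that interpretation, the `keep_recent` parameter, and the function name are assumptions, not documented OpenClaw behavior.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    tokens: int


def plan_compaction(messages, threshold=6000, summary_ratio=0.3, keep_recent=2):
    """Return how many of the oldest messages to fold into a summary (0 = none)."""
    total = sum(m.tokens for m in messages)
    if total <= threshold:
        return 0
    # Never fold the most recent messages; they carry the live context.
    max_fold = max(len(messages) - keep_recent, 0)
    folded = 0
    for count in range(1, max_fold + 1):
        folded += messages[count - 1].tokens
        # Projected size once the folded messages are replaced by a summary.
        projected = total - folded + round(folded * summary_ratio)
        if projected <= threshold:
            return count
    return max_fold
```

With seven messages of 1,000 tokens each (7,000 total), folding one message still leaves 6,300 tokens, while folding two leaves 5,600, so the function returns 2. Folding the oldest messages first keeps recent turns verbatim, which is usually what the agent needs most.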

Model Selection Strategy

Use Case	Model	Cost per 1M input tokens	When to Use
Simple Q&A	Claude 3 Haiku	$0.25	Quick responses, basic tasks
General Tasks	Claude 3 Sonnet	$3	Most production workloads
Complex Reasoning	Claude 3 Opus	$15	Research, coding, analysis
Code Generation	GPT-4	$10	Specialized coding tasks
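
The prices in this table line up with the monthly cost breakdown earlier if they are read as dollars per million input tokens. A small sketch of that arithmetic follows; the model keys are hypothetical labels, and output tokens, which are billed at a higher rate, are ignored.

```python
# Prices from the table above, read as USD per 1,000,000 input tokens.
PRICE_PER_MILLION_INPUT_TOKENS = {
    "claude-3-haiku": 0.25,
    "claude-3-sonnet": 3.00,
    "claude-3-opus": 15.00,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Estimate input-token cost in USD for a model and token count."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS[model]
```

For example, `estimate_cost("claude-3-opus", 15_000_000)` gives 225.0 and `estimate_cost("claude-3-sonnet", 25_000_000)` gives 75.0, matching the $225 and $75 rows in the 45-agent breakdown. The same function shows the 5x gap between Opus and Sonnet for an identical token count.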

Caching Strategies

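// Redis caching for common queries (node-redis v4+)
const crypto = require('crypto');
const redis = require('redis');

const client = redis.createClient();
client.on('error', (err) => console.error('Redis error', err));
const ready = client.connect(); // v4 clients must connect before use

// Stable cache key component derived from the query text
function hashQuery(query) {
  return crypto.createHash('sha256').update(query).digest('hex');
}

// Cache expensive API calls; callLLM is your own LLM client wrapper
async function getCachedResponse(query, model) {
  await ready;
  const cacheKey = `llm:${model}:${hashQuery(query)}`;

  // Check cache first
  const cached = await client.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // Call LLM API
  const response = await callLLM(query, model);

  // Cache for 1 hour
  await client.setEx(cacheKey, 3600, JSON.stringify(response));

  return response;
}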

11. Lessons from Running 45 Agents

After running OpenClaw in production with 45 concurrent agents for 6 months, these are the key insights we took away:

What Actually Breaks in Production

Memory Leaks: Agent memory keeps growing if context is not compacted on schedule
API Rate Limits: Anthropic API rate limits that never mattered in development become a real constraint under production load
Disk Space: Log files and workspace data grow faster than expected (20GB/month)
Docker Socket Issues: Mounting the Docker socket causes occasional permission problems
Session Persistence: Agent sessions are lost when the gateway restarts unless they are persisted to a database

Cost Surprises

Model Selection Impact: Using Opus instead of Sonnet costs 5x more, but performance improves by only 20-30%
Context Window Costs: Long conversations that are never compacted send token usage soaring
Failed Requests: Failed API calls can still be billed, so retry logic has to be designed carefully
Background Tasks: Heartbeat and monitoring tasks consume more tokens than expected
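
Because naive retries multiply the cost of failed requests, retry logic should be bounded and should back off between attempts. The sketch below shows exponential backoff with jitter; `RetryableError` is a hypothetical marker exception that you would raise from your own client wrapper for rate-limit and transient errors.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (rate limits, timeouts)."""


def call_with_retry(fn, max_attempts=4, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying RetryableError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise  # Give up: a bounded retry count caps the worst-case cost.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads retries out so many agents do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

Usage is `call_with_retry(lambda: my_llm_call(prompt))`, where `my_llm_call` stands in for your own request function. Errors that are not retryable, such as invalid requests, propagate immediately and cost nothing extra.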

Maintenance Overhead

User Support: 20% of our time went to answering user questions about "the agent is not responding"
API Key Rotation: Has to happen monthly and requires coordinating across several services
Backup Monitoring: Backup scripts fail silently unless alerting is set up properly
Dependency Updates: OpenClaw ships breaking changes often, so releases need close tracking

Things Nobody Tells You

Cold Start Problem: A new user's first agent takes 30-60 seconds to load its workspace
Time Zone Confusion: Users across multiple time zones make task scheduling complicated
Model Availability: The Claude API has occasional outages or degraded performance
Resource Competition: Multiple agents running at once compete for CPU, which slows response times
Log Analysis: Debugging production issues is very hard without proper structured logging
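
As an illustration of the structured logging point, the sketch below emits one JSON object per log line with Python's standard logging module. The context field names (`user`, `agent_id`, `channel`) and the logger name are hypothetical; adapt them to whatever identifiers your deployment uses.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON line."""

    CONTEXT_KEYS = ("user", "agent_id", "channel")

    def format(self, record):
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy optional context passed through logging's `extra=` argument.
        for key in self.CONTEXT_KEYS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, ensure_ascii=False)


def build_logger(name="openclaw.ops", stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as `build_logger().info("agent timed out", extra={"user": "user1", "agent_id": "a-42"})` produces a line that tools like jq or a log aggregator can filter by user or agent, which is what makes "agent is not responding" reports traceable.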

Pro Tip: Start with 10-15 agents before scaling up. Even at that small scale you will run into many problems you did not anticipate.

12. Key Takeaways

🎯 Key Points to Remember

Infrastructure matters: Hetzner offers the best value for OpenClaw workloads
Start small, scale gradually: Begin with 10-15 agents, then add more step by step
Monitor costs religiously: LLM API costs climb faster than you expect, so track them daily
Backup everything, test restores: Workspace data is the most important asset, so do not lose it
Context window management: A key factor for both performance and cost control
Security from day one: OpenClaw has access to many systems, so harden it from the start
Model selection strategy: Use Sonnet by default and Opus only for complex tasks
User education is crucial: Users need to understand the limitations and best practices
Staging environment: Have a test environment in place before every production change
Community support: The OpenClaw community has plenty of insights, so follow the Discord and forums

13. Series Navigation

This is the final article (Post #7) in the series "OpenClaw for Organizations 2026", which covers every aspect of adopting OpenClaw in an organization, from getting started to a stable production deployment.

If you have not read the earlier articles, read them in order for a complete picture:

📚 OpenClaw for Organizations 2026

#1 OpenClaw 101
#2 Agent Teams
#3 Memory & Knowledge
#4 Security & Access
#5 Integrations
#6 Skills & Automation
#7 Production & Scale