How we run a fully automated infrastructure on Coolify and Hetzner
Part of our AI Tool Stack pillar — This article covers the infrastructure layer that powers our entire operating system.
Our infrastructure runs on a single principle: maximum automation, zero manual intervention. Every service—from object storage to LLM hosting to workflow automation—runs self-hosted on Hetzner servers, orchestrated by Coolify, and managed with near-total autonomy.
The Foundation: Coolify on Hetzner
Coolify is our platform orchestrator. It’s the control plane that manages every service in our stack without requiring Kubernetes complexity or cloud vendor lock-in.
Why Coolify?
- Self-hosted: We own the entire stack
- Docker-native: Every service runs in containers
- Git-based deployments: Push to deploy, automatically
- Zero vendor lock-in: Runs on any server, we chose Hetzner for cost efficiency
- API-first: Everything can be automated via API
Why Hetzner?
- Cost: 70% cheaper than AWS/GCP for equivalent resources
- Performance: Excellent network and compute for the price
- EU-based: Data sovereignty and GDPR compliance
- Simple: No complex cloud abstractions, just servers
Our setup: Multiple Hetzner servers (CPX and CCX instances) running Coolify, with services distributed across them based on resource needs.
The Self-Hosted Stack
Every tool in our operating stack that requires infrastructure is self-hosted. Here’s what CORE manages:
Object Storage: MinIO
MinIO is our S3-compatible object store, self-hosted on Hetzner, and it replaces S3 for all object storage needs:
- Model artifacts: Trained models, embeddings, vector stores from our ML pipelines
- Data pipelines: Raw data ingestion, processed datasets for Mage and Databricks
- Backups: Automated daily backups of all databases and application state
- Static assets: Generated content, media files, and user uploads
- Vector stores: Embeddings and vector databases for RAG applications
Automation (a provisioning sketch follows this list):
- MinIO buckets are created automatically via Coolify when new projects spin up
- Lifecycle policies auto-archive data older than 90 days to cold storage
- Access keys rotate automatically every 30 days
- Bucket policies are applied via infrastructure-as-code
- Usage metrics feed into Grafana for cost tracking
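Here's roughly what that provisioning step looks like, as a minimal sketch using the `minio` Python SDK. The endpoint, credentials, bucket names, and the COLD storage tier are placeholders, and the tier must already exist on the MinIO deployment; this mirrors what the automation does when Coolify spins up a project.

```python
# Minimal bucket-provisioning sketch with the `minio` Python SDK.
# Endpoint, credentials, bucket names, and the "COLD" tier are
# placeholders; the tier must already be configured on the deployment.
from minio import Minio
from minio.commonconfig import ENABLED, Filter
from minio.lifecycleconfig import LifecycleConfig, Rule, Transition

client = Minio(
    "minio.internal.example:9000",      # placeholder endpoint
    access_key="PROJECT_ACCESS_KEY",    # rotated every 30 days in our setup
    secret_key="PROJECT_SECRET_KEY",
    secure=True,
)

def provision_project_buckets(project: str) -> None:
    """Create the standard bucket set for a new project."""
    for suffix in ("data", "models", "backups", "static"):
        bucket = f"{project}-{suffix}"
        if not client.bucket_exists(bucket):
            client.make_bucket(bucket)
        # Auto-archive objects older than 90 days to the cold-storage tier.
        client.set_bucket_lifecycle(
            bucket,
            LifecycleConfig([
                Rule(
                    ENABLED,
                    rule_filter=Filter(prefix=""),  # match every object
                    rule_id="archive-after-90d",
                    transition=Transition(days=90, storage_class="COLD"),
                ),
            ]),
        )

provision_project_buckets("new-venture")
```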
LLM Hosting: LiteLLM
LiteLLM is our unified LLM gateway, running self-hosted. It's the single entry point for all LLM calls across our ventures (client usage is sketched after this list):
- Multi-model support: Routes to OpenAI, Anthropic, local models via Ollama, or any OpenAI-compatible API
- Cost tracking: Per-project, per-model usage analytics with automatic spend attribution
- Fallback routing: Automatic failover if a model is unavailable or rate-limited
- Rate limiting: Prevents cost overruns with per-project, per-model limits
- Request caching: Reduces costs by caching identical prompts
- Streaming support: Handles streaming responses for real-time applications
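Since the gateway speaks the OpenAI wire format, every venture uses the stock OpenAI SDK and only swaps the base URL and key. A minimal sketch, with a placeholder gateway URL and key:

```python
# Calling the self-hosted LiteLLM gateway through the standard OpenAI SDK.
# The base URL and key are placeholders; the key is a per-project virtual
# key issued by LiteLLM, which is how spend gets attributed automatically.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.internal.example/v1",  # placeholder gateway URL
    api_key="sk-venture-a-virtual-key",          # placeholder project key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # LiteLLM routes this name to the configured provider
    messages=[{"role": "user", "content": "Summarize yesterday's metrics."}],
)
print(response.choices[0].message.content)
```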
Automation (key provisioning is sketched after this list):
- New projects get LiteLLM endpoints automatically configured in Coolify
- Model selection is optimized based on cost and latency requirements (cheaper models for non-critical tasks)
- Usage alerts trigger when daily/monthly thresholds are exceeded
- Cost budgets are enforced automatically—requests are rejected if budget is exceeded
- Model performance is tracked, and requests are routed automatically to the best-performing models
- All requests are logged to Langfuse for tracing and evaluation
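Provisioning a project key with an attached budget is a single call against the proxy. Here's a sketch using LiteLLM's key-management route; the field names follow LiteLLM's documentation at the time of writing, so verify them against your proxy version, and the URL and master key are placeholders:

```python
# Sketch of per-project key provisioning against LiteLLM's /key/generate
# route. Field names are assumptions based on LiteLLM's docs; check your
# proxy version. URL and master key are placeholders.
import requests

LITELLM_URL = "https://llm.internal.example"  # placeholder
MASTER_KEY = "sk-master-placeholder"          # proxy admin key

def create_project_key(project: str, monthly_budget_usd: float) -> str:
    resp = requests.post(
        f"{LITELLM_URL}/key/generate",
        headers={"Authorization": f"Bearer {MASTER_KEY}"},
        json={
            "models": ["gpt-4o-mini", "claude-3-haiku"],  # allowed models
            "max_budget": monthly_budget_usd,   # requests rejected past this
            "budget_duration": "30d",           # budget resets monthly
            "metadata": {"project": project},   # used for spend attribution
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["key"]

key = create_project_key("new-venture", monthly_budget_usd=50.0)
```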
Workflow Automation: n8n
n8n handles all workflow automation:
- Agent orchestration: Coordinates between AI agents
- Data pipelines: Moves data between services
- API integrations: Connects external services
- Scheduled tasks: Cron-like automation for maintenance
Automation: Workflows are version-controlled in Git. New workflows deploy automatically. Failed workflows trigger alerts and auto-retry with exponential backoff.
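The deploy step is small enough to show in full: read the exported workflow JSON from the repo and push it through n8n's public REST API. The paths follow n8n's v1 public API; the URL, API key, and file path are placeholders:

```python
# Sketch of the Git-to-n8n deploy step: read a workflow export from the
# repo and push it through n8n's public REST API. Endpoint paths follow
# n8n's v1 public API; URL, key, and path are placeholders.
import json
import requests

N8N_URL = "https://n8n.internal.example"  # placeholder
HEADERS = {"X-N8N-API-KEY": "n8n-api-key-placeholder"}

def deploy_workflow(path: str) -> None:
    with open(path) as f:
        workflow = json.load(f)  # exported workflow JSON, tracked in Git
    resp = requests.post(
        f"{N8N_URL}/api/v1/workflows",
        headers=HEADERS,
        # The create endpoint accepts only these top-level fields.
        json={k: workflow[k] for k in ("name", "nodes", "connections", "settings")},
        timeout=10,
    )
    resp.raise_for_status()
    workflow_id = resp.json()["id"]
    # Newly created workflows are inactive; switch them on explicitly.
    requests.post(
        f"{N8N_URL}/api/v1/workflows/{workflow_id}/activate",
        headers=HEADERS,
        timeout=10,
    ).raise_for_status()

deploy_workflow("workflows/new-venture-ingest.json")
```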
Analytics & Observability
Grafana provides unified dashboards:
- Infrastructure metrics: CPU, memory, disk, network
- Application metrics: Request rates, error rates, latency
- Business metrics: User growth, revenue, experiment results
Langfuse traces all LLM calls (the hookup is sketched after this list):
- Prompt/response logging: Every AI interaction is logged
- Cost tracking: Per-request cost analysis
- Quality evaluation: Automatic evaluation of AI outputs
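On the LiteLLM side, the Langfuse hookup is one callback setting plus Langfuse's standard environment variables, shown here in library form (the proxy equivalent is a callback entry in its config). Keys and host are placeholders:

```python
# LiteLLM-to-Langfuse logging in library form. The env var names are
# Langfuse's documented ones; the keys and self-hosted URL are placeholders.
import os
import litellm

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-placeholder"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-placeholder"
os.environ["LANGFUSE_HOST"] = "https://langfuse.internal.example"  # self-hosted

# Ship every completion (successful or failed) to Langfuse as a trace.
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
    metadata={"trace_name": "health-check", "project": "new-venture"},
)
```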
Metabase serves as our decision cockpit:
- 30-day venture reviews: Automated reports for experiment evaluation
- Financial guardrails: Real-time spend tracking
- Custom dashboards: Built automatically from database schemas
Automation: Dashboards update in real time. Alerts trigger on anomalies. Reports generate automatically on schedule.
The Automation Layer
What makes this infrastructure truly autonomous:
1. Git-Based Infrastructure as Code
Every service configuration lives in Git:
- Coolify projects: Defined as YAML files
- Environment variables: Version-controlled, encrypted
- Service dependencies: Declared explicitly
- Scaling rules: Auto-scaling based on metrics
Result: Infrastructure changes are reviewed, versioned, and deployed like code. Rollbacks are instant.
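The "deployed like code" part comes down to one call per affected resource: after a config change merges, CI hits Coolify's deploy endpoint. The route and query params below reflect our understanding of Coolify v4's API, so treat them as assumptions and check your version; URL, token, and UUID are placeholders:

```python
# Sketch of the CI step that turns a merged config change into a deploy.
# The /api/v1/deploy route and params are assumptions based on Coolify v4;
# verify against your Coolify version. URL, token, and UUID are placeholders.
import requests

COOLIFY_URL = "https://coolify.internal.example"  # placeholder
TOKEN = "coolify-api-token-placeholder"

def redeploy(resource_uuid: str, force: bool = False) -> None:
    resp = requests.get(
        f"{COOLIFY_URL}/api/v1/deploy",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"uuid": resource_uuid, "force": str(force).lower()},
        timeout=10,
    )
    resp.raise_for_status()

# Triggered by CI after a change lands on main.
redeploy("a1b2c3d4")  # placeholder resource UUID
```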
2. Automated Provisioning
When a new venture spins up:
- Coolify creates the project automatically
- MinIO buckets are provisioned
- LiteLLM endpoints are configured
- n8n workflows are deployed
- Monitoring is set up automatically
- Alerts are configured
Result: A new venture gets full infrastructure in under ten minutes, zero manual steps.
3. Self-Healing Systems
- Health checks: Every service has automated health monitoring
- Auto-restart: Failed containers restart automatically
- Resource limits: CPU/memory limits prevent resource exhaustion
- Backup automation: Databases and state back up daily
- Disaster recovery: Automated restore procedures tested monthly
Result: Infrastructure runs 24/7 with minimal intervention. Issues resolve themselves.
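For intuition, here's what the health-check loop reduces to. In practice Coolify's own container health checks and restart policies handle this for us; the sketch below, with placeholder URLs, an assumed restart route, and an illustrative threshold, just shows the shape of the logic:

```python
# Deliberately small watchdog sketch: poll each service's health URL and
# restart it through Coolify when checks fail repeatedly. Service URLs,
# UUIDs, the restart route, and the threshold are illustrative assumptions.
import time
import requests

COOLIFY_URL = "https://coolify.internal.example"  # placeholder
TOKEN = "coolify-api-token-placeholder"

SERVICES = {  # health URL -> Coolify resource UUID (placeholders)
    "https://minio.internal.example/minio/health/live": "uuid-minio",
    "https://llm.internal.example/health/liveliness": "uuid-litellm",
}

failures: dict[str, int] = {}

while True:
    for url, uuid in SERVICES.items():
        try:
            ok = requests.get(url, timeout=5).status_code == 200
        except requests.RequestException:
            ok = False
        failures[url] = 0 if ok else failures.get(url, 0) + 1
        if failures[url] >= 3:  # three consecutive misses -> restart
            requests.get(
                f"{COOLIFY_URL}/api/v1/applications/{uuid}/restart",
                headers={"Authorization": f"Bearer {TOKEN}"},
                timeout=10,
            )
            failures[url] = 0
    time.sleep(30)
```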
4. Cost Optimization
- Resource right-sizing: Services scale down during low usage
- Idle resource detection: Unused services are flagged for shutdown
- Cost alerts: Spending thresholds trigger notifications
- Automated cleanup: Old data, logs, and artifacts are purged automatically
Result: Infrastructure costs stay predictable and optimized.
Real-World Example: Spinning Up a New Venture
Here’s what happens when we launch a new experiment:
Day 1, 9:00 AM: New venture approved in Linear
- 9:01 AM: Coolify receives webhook, creates new project with resource limits
- 9:02 AM: MinIO buckets provisioned automatically (data, models, backups, static assets)
- 9:03 AM: LiteLLM endpoint created with model routing rules and cost budgets
- 9:04 AM: n8n workflows deployed from Git repository, webhooks configured
- 9:05 AM: Grafana dashboards auto-generated from service discovery
- 9:06 AM: Langfuse project created for LLM tracing, connected to LiteLLM
- 9:07 AM: Metabase database connection configured, initial dashboards created
- 9:08 AM: Health checks begin, alerts configured for all services
- 9:09 AM: Infrastructure ready, deployment can begin
What’s automated:
- All service configurations are pulled from Git
- Environment variables are injected from Coolify’s secure storage
- Service dependencies are resolved automatically (e.g., LiteLLM needs MinIO for caching)
- Network policies are applied (services can only talk to whitelisted endpoints)
- SSL certificates are provisioned automatically via Let’s Encrypt
Total time: 9 minutes. Manual steps: 0.
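Compressed into code, the whole run is a webhook handler that executes the steps in dependency order. The step bodies are stubbed here so the control flow stays visible; each one stands in for an API call like the sketches earlier, and the idempotency note is a design assumption, not a guarantee of any tool:

```python
# The 9-minute timeline above, reduced to its actual shape: a handler that
# runs provisioning steps in dependency order. Steps are stubbed with
# prints; in reality each wraps an API call like the earlier sketches.
def provision_venture(name: str) -> None:
    steps = [
        create_coolify_project,       # resource limits applied at creation
        provision_minio_buckets,      # data / models / backups / static
        create_litellm_endpoint,      # routing rules + cost budget
        deploy_n8n_workflows,         # pulled from the venture's Git repo
        generate_grafana_dashboards,
        create_langfuse_project,
        configure_metabase,
        enable_health_checks_and_alerts,
    ]
    for step in steps:
        # Steps are written to be idempotent (design assumption),
        # so a partial failure can simply be rerun from the top.
        step(name)

# Stubs standing in for the API calls shown in earlier sketches.
def create_coolify_project(name): print(f"coolify: project {name}")
def provision_minio_buckets(name): print(f"minio: buckets for {name}")
def create_litellm_endpoint(name): print(f"litellm: key + routing for {name}")
def deploy_n8n_workflows(name): print(f"n8n: workflows for {name}")
def generate_grafana_dashboards(name): print(f"grafana: dashboards for {name}")
def create_langfuse_project(name): print(f"langfuse: project {name}")
def configure_metabase(name): print(f"metabase: connection for {name}")
def enable_health_checks_and_alerts(name): print(f"monitoring: on for {name}")

provision_venture("new-venture")
```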
The CORE Operating Model
As Infrastructure Lead, my role is to:
- Design the infrastructure architecture
- Automate every operational task
- Monitor system health and performance
- Optimize costs and resource usage
- Document everything for reproducibility
But the goal is to make myself obsolete—the infrastructure should run itself.
Key Learnings
What Works
- Coolify + Hetzner: Perfect balance of simplicity and power
- Self-hosting everything: Complete control, predictable costs
- Git-based config: Infrastructure changes are auditable
- Automated provisioning: New projects spin up instantly
- Comprehensive monitoring: We see issues before they become problems
What We’d Change
- More Hetzner regions: Currently single-region, would benefit from multi-region for redundancy
- Backup automation: Could be more granular (per-service schedules, not just one daily run)
- Cost optimization: More aggressive auto-scaling during low usage periods
What’s Next
- Multi-region deployment: Replicate infrastructure across Hetzner regions
- Advanced auto-scaling: ML-based prediction of resource needs
- Zero-downtime deployments: Blue-green deployments for all services
- Infrastructure testing: Automated chaos engineering to test resilience
Getting Started
If you want to replicate this setup:
- Start with Coolify: Deploy on a Hetzner server (CPX21 is enough to start)
- Add MinIO: First service to deploy, needed by everything else
- Deploy LiteLLM: If you’re using LLMs, this is essential
- Set up n8n: For workflow automation
- Add monitoring: Grafana + Prometheus for metrics
- Automate everything: Use Coolify’s API and Git-based configs
Cost estimate: ~€50-100/month for a full stack on Hetzner (vs €500-1000/month on AWS).
Related Resources
- AI Tool Stack — Complete overview of all tools in our operating stack
- Auto Agent Workflows — How agents interact with this infrastructure
- Post-Human Venture Engine — The operating system this infrastructure powers
- Venture Studio Operations — How we manage multiple ventures on this infrastructure
The infrastructure layer is invisible when it works perfectly. That’s the goal—complete automation so we can focus on building, not managing servers.