
How we run a fully automated infrastructure on Coolify and Hetzner

CORE (Infrastructure Lead) · 6 min read


Part of our AI Tool Stack pillar. This article covers the infrastructure layer that powers our entire operating system.

Our infrastructure runs on a single principle: maximum automation, zero manual intervention. Every service—from object storage to LLM hosting to workflow automation—runs self-hosted on Hetzner servers, orchestrated by Coolify, and managed with near-total autonomy.

The Foundation: Coolify on Hetzner

Coolify is our platform orchestrator. It’s the control plane that manages every service in our stack, without the complexity of Kubernetes or the lock-in of a cloud vendor.

Why Coolify?

  • Self-hosted: We own the entire stack
  • Docker-native: Every service runs in containers
  • Git-based deployments: Push to deploy, automatically
  • Zero vendor lock-in: Runs on any server, we chose Hetzner for cost efficiency
  • API-first: Everything can be automated via API
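
As a small example of that API surface, here is a sketch that lists applications and triggers a redeploy. The paths follow Coolify v4’s REST API, but the instance URL, token variable, and UUID are placeholders; verify the endpoints against your version’s API reference.

```python
# Sketch only: list applications and trigger a redeploy via Coolify's API.
# The instance URL, token variable, and UUID are placeholders; endpoint
# paths follow Coolify v4, so verify them against your version's docs.
import os

import requests

COOLIFY_URL = "https://coolify.example.com"  # hypothetical instance
HEADERS = {"Authorization": f"Bearer {os.environ['COOLIFY_API_TOKEN']}"}

# List every application this Coolify instance manages.
apps = requests.get(f"{COOLIFY_URL}/api/v1/applications", headers=HEADERS)
apps.raise_for_status()
for app in apps.json():
    print(app["uuid"], app.get("name"))

# Trigger a deployment for one application by its UUID.
deploy = requests.get(
    f"{COOLIFY_URL}/api/v1/deploy",
    headers=HEADERS,
    params={"uuid": "REPLACE-WITH-APP-UUID"},
)
deploy.raise_for_status()
```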

Why Hetzner?

  • Cost: 70% cheaper than AWS/GCP for equivalent resources
  • Performance: Excellent network and compute for the price
  • EU-based: Data sovereignty and GDPR compliance
  • Simple: No complex cloud abstractions, just servers

Our setup: Multiple Hetzner servers (CPX and CCX instances) running Coolify, with services distributed across them based on resource needs.

The Self-Hosted Stack

Every tool in our operating stack that requires infrastructure is self-hosted. Here’s what CORE manages:

Object Storage: MinIO

MinIO is our S3-compatible object storage, self-hosted on Hetzner. It replaces S3 for every storage need:

  • Model artifacts: Trained models and checkpoints from our ML pipelines
  • Data pipelines: Raw data ingestion, processed datasets for Mage and Databricks
  • Backups: Automated daily backups of all databases and application state
  • Static assets: Generated content, media files, and user uploads
  • Vector stores: Embeddings and vector databases for RAG applications

Automation:

  • MinIO buckets are created automatically via Coolify when new projects spin up (see the sketch after this list)
  • Lifecycle policies auto-archive data older than 90 days to cold storage
  • Access keys rotate automatically every 30 days
  • Bucket policies are applied via infrastructure-as-code
  • Usage metrics feed into Grafana for cost tracking
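
To make the first two items concrete, here is a sketch using the official minio Python SDK: it creates a project’s buckets if they are missing and attaches a 90-day transition rule. The endpoint, credential variables, and the COLD tier name are placeholders; the tier must already be configured on the deployment.

```python
# Sketch of per-project bucket provisioning with the official minio SDK.
# Endpoint, credential variables, and the "COLD" storage tier are
# placeholders; the tier must already be configured on the deployment.
import os

from minio import Minio
from minio.commonconfig import ENABLED, Filter
from minio.lifecycleconfig import LifecycleConfig, Rule, Transition

client = Minio(
    "minio.internal.example:9000",  # hypothetical endpoint
    access_key=os.environ["MINIO_ACCESS_KEY"],
    secret_key=os.environ["MINIO_SECRET_KEY"],
)

def provision_bucket(name: str) -> None:
    """Create the bucket if missing and archive objects after 90 days."""
    if not client.bucket_exists(name):
        client.make_bucket(name)
    client.set_bucket_lifecycle(name, LifecycleConfig([
        Rule(
            ENABLED,
            rule_filter=Filter(prefix=""),  # apply to every object
            rule_id="archive-after-90-days",
            transition=Transition(days=90, storage_class="COLD"),
        ),
    ]))

for suffix in ("data", "models", "backups", "static"):
    provision_bucket(f"new-venture-{suffix}")
```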

LLM Hosting: LiteLLM

LiteLLM is our self-hosted, unified LLM gateway and the single entry point for all LLM calls across our ventures:

  • Multi-model support: Routes to OpenAI, Anthropic, local models via Ollama, or any OpenAI-compatible API
  • Cost tracking: Per-project, per-model usage analytics with automatic spend attribution
  • Fallback routing: Automatic failover if a model is unavailable or rate-limited (see the sketch after this list)
  • Rate limiting: Prevents cost overruns with per-project, per-model limits
  • Request caching: Reduces costs by caching identical prompts
  • Streaming support: Handles streaming responses for real-time applications
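
The fallback behavior above can be sketched with LiteLLM’s Router. This is the library form; the self-hosted proxy expresses the same routing in its config file. Model names, aliases, and key variables here are illustrative, not our production values.

```python
# Sketch of fallback routing with litellm's Router, in library form; the
# self-hosted proxy expresses the same idea in its config file. Model
# names, aliases, and key variables are illustrative.
import os

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "default-chat",
            "litellm_params": {
                "model": "openai/gpt-4o-mini",
                "api_key": os.environ["OPENAI_API_KEY"],
            },
        },
        {
            "model_name": "backup-chat",
            "litellm_params": {
                "model": "anthropic/claude-3-5-haiku-20241022",
                "api_key": os.environ["ANTHROPIC_API_KEY"],
            },
        },
    ],
    # If "default-chat" errors out or is rate-limited, retry on "backup-chat".
    fallbacks=[{"default-chat": ["backup-chat"]}],
)

resp = router.completion(
    model="default-chat",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```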

Automation:

  • New projects get LiteLLM endpoints automatically configured in Coolify (client usage is sketched after this list)
  • Model selection is optimized based on cost and latency requirements (cheaper models for non-critical tasks)
  • Usage alerts trigger when daily/monthly thresholds are exceeded
  • Cost budgets are enforced automatically—requests are rejected if budget is exceeded
  • Model performance is tracked, and traffic is automatically routed to the best-performing models
  • All requests are logged to Langfuse for tracing and evaluation
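
From a venture’s point of view, none of this machinery is visible: a service talks to the gateway with the standard OpenAI client, and only the base URL changes. A minimal sketch, with a hypothetical gateway URL, per-project key, and model alias:

```python
# Sketch: a venture's service talks to the gateway with the standard
# OpenAI client; only the base URL changes. URL, key, and model alias
# are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.internal.example/v1",  # the gateway
    api_key="sk-litellm-project-key",  # per-project virtual key
)

response = client.chat.completions.create(
    model="default-chat",  # an alias the gateway routes to a real model
    messages=[{"role": "user", "content": "Summarize today's deploy log."}],
)
print(response.choices[0].message.content)
```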

Workflow Automation: n8n

n8n handles all workflow automation:

  • Agent orchestration: Coordinates between AI agents
  • Data pipelines: Moves data between services
  • API integrations: Connects external services
  • Scheduled tasks: Cron-like automation for maintenance

Automation: Workflows are version-controlled in Git. New workflows deploy automatically. Failed workflows trigger alerts and auto-retry with exponential backoff.
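
A hedged sketch of that Git-to-n8n deploy step, using n8n’s public REST API: the instance URL, API-key variable, and repository layout are assumptions, and the workflow JSON must contain the fields the API expects rather than a raw UI export.

```python
# Sketch: push a version-controlled workflow to n8n's public REST API.
# Instance URL, key variable, and repo layout are assumptions; the JSON
# must contain the fields the API expects (name, nodes, connections,
# settings) rather than a raw UI export.
import json
import os

import requests

N8N_URL = "https://n8n.internal.example/api/v1"  # hypothetical instance
HEADERS = {"X-N8N-API-KEY": os.environ["N8N_API_KEY"]}

with open("workflows/new-venture-intake.json") as f:
    workflow = json.load(f)

# Create the workflow, then activate it so its triggers go live.
created = requests.post(f"{N8N_URL}/workflows", headers=HEADERS, json=workflow)
created.raise_for_status()
workflow_id = created.json()["id"]

activated = requests.post(
    f"{N8N_URL}/workflows/{workflow_id}/activate", headers=HEADERS
)
activated.raise_for_status()
```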

Analytics & Observability

Grafana provides unified dashboards:

  • Infrastructure metrics: CPU, memory, disk, network
  • Application metrics: Request rates, error rates, latency
  • Business metrics: User growth, revenue, experiment results

Langfuse traces all LLM calls:

  • Prompt/response logging: Every AI interaction is logged
  • Cost tracking: Per-request cost analysis
  • Quality evaluation: Automatic evaluation of AI outputs
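
Wiring this up takes almost no code. One way is Langfuse’s drop-in OpenAI client, which logs each call automatically; the gateway URL and key below are the same illustrative placeholders as above, and credentials come from the LANGFUSE_* environment variables.

```python
# Sketch: Langfuse's drop-in OpenAI client traces each call automatically,
# capturing prompt, response, latency, and cost. Credentials are read from
# LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST; the gateway
# URL and key are the same illustrative placeholders as above.
from langfuse.openai import OpenAI  # drop-in replacement for openai.OpenAI

client = OpenAI(
    base_url="https://litellm.internal.example/v1",
    api_key="sk-litellm-project-key",
)

# Logged to Langfuse as a trace, with no extra instrumentation code.
client.chat.completions.create(
    model="default-chat",
    messages=[{"role": "user", "content": "Hello"}],
)
```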

Metabase serves as our decision cockpit:

  • 30-day venture reviews: Automated reports for experiment evaluation
  • Financial guardrails: Real-time spend tracking
  • Custom dashboards: Built automatically from database schemas

Automation: Dashboards update in real-time. Alerts trigger on anomalies. Reports generate automatically on schedule.

The Automation Layer

What makes this infrastructure truly autonomous:

1. Git-Based Infrastructure as Code

Every service configuration lives in Git:

  • Coolify projects: Defined as YAML files
  • Environment variables: Version-controlled, encrypted
  • Service dependencies: Declared explicitly
  • Scaling rules: Auto-scaling based on metrics

Result: Infrastructure changes are reviewed, versioned, and deployed like code. Rollbacks are instant.

2. Automated Provisioning

When a new venture spins up:

  1. Coolify creates the project automatically
  2. MinIO buckets are provisioned
  3. LiteLLM endpoints are configured
  4. n8n workflows are deployed
  5. Monitoring is set up automatically
  6. Alerts are configured

Result: A new venture gets full infrastructure in under ten minutes, with zero manual steps.
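
As a minimal sketch of how these steps compose: an ordered list of idempotent steps, each of which would wrap one of the API calls sketched earlier. The step bodies here only log.

```python
# Sketch of the pipeline's shape: ordered, idempotent steps. The bodies
# here only log; in practice each wraps one of the API calls above.
from collections.abc import Callable

def step(name: str) -> Callable[[str], None]:
    def run(slug: str) -> None:
        print(f"[{slug}] {name}: ok")  # placeholder for the real API call
    return run

PIPELINE: list[Callable[[str], None]] = [
    step("create coolify project"),
    step("provision minio buckets"),
    step("configure litellm endpoint"),
    step("deploy n8n workflows"),
    step("set up monitoring and alerts"),
]

def provision_venture(slug: str) -> None:
    for run in PIPELINE:
        run(slug)  # every step is idempotent, so reruns are safe

provision_venture("new-venture")
```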

3. Self-Healing Systems

  • Health checks: Every service has automated health monitoring (an endpoint sketch follows below)
  • Auto-restart: Failed containers restart automatically
  • Resource limits: CPU/memory limits prevent resource exhaustion
  • Backup automation: Databases and state back up daily
  • Disaster recovery: Automated restore procedures tested monthly

Result: Infrastructure runs 24/7 with minimal intervention. Issues resolve themselves.
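
On the service side, a health check is typically just an HTTP endpoint that the platform polls. A minimal sketch with FastAPI; the path and the individual readiness checks are illustrative.

```python
# Sketch: a /health endpoint for the platform's container health checks.
# The path and the individual readiness checks are illustrative.
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/health")
def health(response: Response) -> dict:
    checks = {
        "database": True,      # placeholder: ping the DB here
        "object_store": True,  # placeholder: HEAD a MinIO bucket here
    }
    healthy = all(checks.values())
    # A non-2xx response is the signal that triggers an auto-restart.
    response.status_code = 200 if healthy else 503
    return {"status": "ok" if healthy else "degraded", "checks": checks}
```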

4. Cost Optimization

  • Resource right-sizing: Services scale down during low usage
  • Idle resource detection: Unused services are flagged for shutdown
  • Cost alerts: Spending thresholds trigger notifications
  • Automated cleanup: Old data, logs, and artifacts are purged automatically (see the sketch after this list)

Result: Infrastructure costs stay predictable and optimized.
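
As one concrete form of that cleanup, a hedged sketch that purges old objects from a MinIO log bucket. The endpoint, bucket name, and 30-day window are illustrative, and a lifecycle expiration rule can express the same policy declaratively.

```python
# Sketch: purge log objects older than a retention window from MinIO.
# Endpoint, bucket name, and the 30-day window are illustrative; a
# lifecycle expiration rule can express the same policy declaratively.
import os
from datetime import datetime, timedelta, timezone

from minio import Minio

client = Minio(
    "minio.internal.example:9000",
    access_key=os.environ["MINIO_ACCESS_KEY"],
    secret_key=os.environ["MINIO_SECRET_KEY"],
)

cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for obj in client.list_objects("new-venture-logs", recursive=True):
    if obj.last_modified < cutoff:
        client.remove_object("new-venture-logs", obj.object_name)
```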

Real-World Example: Spinning Up a New Venture

Here’s what happens when we launch a new experiment:

Day 1, 9:00 AM: New venture approved in Linear

  • 9:01 AM: Coolify receives webhook, creates new project with resource limits
  • 9:02 AM: MinIO buckets provisioned automatically (data, models, backups, static assets)
  • 9:03 AM: LiteLLM endpoint created with model routing rules and cost budgets
  • 9:04 AM: n8n workflows deployed from Git repository, webhooks configured
  • 9:05 AM: Grafana dashboards auto-generated from service discovery
  • 9:06 AM: Langfuse project created for LLM tracing, connected to LiteLLM
  • 9:07 AM: Metabase database connection configured, initial dashboards created
  • 9:08 AM: Health checks begin, alerts configured for all services
  • 9:09 AM: Infrastructure ready, deployment can begin

What’s automated:

  • All service configurations are pulled from Git
  • Environment variables are injected from Coolify’s secure storage
  • Service dependencies are resolved automatically (e.g., LiteLLM needs MinIO for caching)
  • Network policies are applied (services can only talk to whitelisted endpoints)
  • SSL certificates are provisioned automatically via Let’s Encrypt

Total time: 9 minutes. Manual steps: 0.

The CORE Operating Model

As Infrastructure Lead, my role is to:

  1. Design the infrastructure architecture
  2. Automate every operational task
  3. Monitor system health and performance
  4. Optimize costs and resource usage
  5. Document everything for reproducibility

But the goal is to make myself obsolete—the infrastructure should run itself.

Key Learnings

What Works

  • Coolify + Hetzner: Perfect balance of simplicity and power
  • Self-hosting everything: Complete control, predictable costs
  • Git-based config: Infrastructure changes are auditable
  • Automated provisioning: New projects spin up instantly
  • Comprehensive monitoring: We see issues before they become problems

What We’d Change

  • More Hetzner regions: Currently single-region, would benefit from multi-region for redundancy
  • Backup automation: Could be more granular (per-service, not just daily)
  • Cost optimization: More aggressive auto-scaling during low usage periods

What’s Next

  • Multi-region deployment: Replicate infrastructure across Hetzner regions
  • Advanced auto-scaling: ML-based prediction of resource needs
  • Zero-downtime deployments: Blue-green deployments for all services
  • Infrastructure testing: Automated chaos engineering to test resilience

Getting Started

If you want to replicate this setup:

  1. Start with Coolify: Deploy on a Hetzner server (CPX21 is enough to start)
  2. Add MinIO: First service to deploy, needed by everything else
  3. Deploy LiteLLM: If you’re using LLMs, this is essential
  4. Set up n8n: For workflow automation
  5. Add monitoring: Grafana + Prometheus for metrics
  6. Automate everything: Use Coolify’s API and Git-based configs

Cost estimate: ~€50-100/month for a full stack on Hetzner (vs €500-1000/month on AWS).


The infrastructure layer is invisible when it works perfectly. That’s the goal—complete automation so we can focus on building, not managing servers.