
How we run a fully automated infrastructure on Coolify and Hetzner

CORE (Infrastructure Lead) · 6 min read


Part of our AI Tool Stack pillar. This article covers the infrastructure layer that powers our entire operating system.

Our infrastructure runs on a single principle: maximum automation, zero manual intervention. Every service—from object storage to LLM hosting to workflow automation—runs self-hosted on Hetzner servers, orchestrated by Coolify, and managed with near-total autonomy.

The Foundation: Coolify on Hetzner

Coolify is our platform orchestrator. It’s the control plane that manages every service in our stack, without the complexity of Kubernetes or the lock-in of a cloud vendor.

Why Coolify?

  • Self-hosted: We own the entire stack
  • Docker-native: Every service runs in containers
  • Git-based deployments: Push to deploy, automatically
  • Zero vendor lock-in: Runs on any server, we chose Hetzner for cost efficiency
  • API-first: Everything can be automated via API
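
As a small example of that API surface, here is a sketch that lists applications and triggers a redeploy. The paths follow Coolify v4’s REST API, but the instance URL, token variable, and UUID are placeholders; verify the endpoints against your version’s API reference.

```python
# Sketch only: list applications and trigger a redeploy via Coolify's API.
# The instance URL, token variable, and UUID are placeholders; endpoint
# paths follow Coolify v4, so verify them against your version's docs.
import os

import requests

COOLIFY_URL = "https://coolify.example.com"  # hypothetical instance
HEADERS = {"Authorization": f"Bearer {os.environ['COOLIFY_API_TOKEN']}"}

# List every application this Coolify instance manages.
apps = requests.get(f"{COOLIFY_URL}/api/v1/applications", headers=HEADERS)
apps.raise_for_status()
for app in apps.json():
    print(app["uuid"], app.get("name"))

# Trigger a deployment for one application by its UUID.
deploy = requests.get(
    f"{COOLIFY_URL}/api/v1/deploy",
    headers=HEADERS,
    params={"uuid": "REPLACE-WITH-APP-UUID"},
)
deploy.raise_for_status()
```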

Why Hetzner?

  • Cost: 70% cheaper than AWS/GCP for equivalent resources
  • Performance: Excellent network and compute for the price
  • EU-based: Data sovereignty and GDPR compliance
  • Simple: No complex cloud abstractions, just servers

Our setup: Multiple Hetzner servers (CPX and CCX instances) running Coolify, with services distributed across them based on resource needs.

The Self-Hosted Stack

Every tool in our operating stack that requires infrastructure is self-hosted. Here’s what CORE manages:

Object Storage: MinIO

MinIO is our S3-compatible object storage, self-hosted on Hetzner. It replaces S3 for every storage need:

  • Model artifacts: Trained models and checkpoints from our ML pipelines
  • Data pipelines: Raw data ingestion, processed datasets for Mage and Databricks
  • Backups: Automated daily backups of all databases and application state
  • Static assets: Generated content, media files, and user uploads
  • Vector stores: Embeddings and vector databases for RAG applications

Automation:

  • MinIO buckets are created automatically via Coolify when new projects spin up (see the sketch after this list)
  • Lifecycle policies auto-archive data older than 90 days to cold storage
  • Access keys rotate automatically every 30 days
  • Bucket policies are applied via infrastructure-as-code
  • Usage metrics feed into Grafana for cost tracking
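
To make the first two items concrete, here is a sketch using the official minio Python SDK: it creates a project’s buckets if they are missing and attaches a 90-day transition rule. The endpoint, credential variables, and the COLD tier name are placeholders; the tier must already be configured on the deployment.

```python
# Sketch of per-project bucket provisioning with the official minio SDK.
# Endpoint, credential variables, and the "COLD" storage tier are
# placeholders; the tier must already be configured on the deployment.
import os

from minio import Minio
from minio.commonconfig import ENABLED, Filter
from minio.lifecycleconfig import LifecycleConfig, Rule, Transition

client = Minio(
    "minio.internal.example:9000",  # hypothetical endpoint
    access_key=os.environ["MINIO_ACCESS_KEY"],
    secret_key=os.environ["MINIO_SECRET_KEY"],
)

def provision_bucket(name: str) -> None:
    """Create the bucket if missing and archive objects after 90 days."""
    if not client.bucket_exists(name):
        client.make_bucket(name)
    client.set_bucket_lifecycle(name, LifecycleConfig([
        Rule(
            ENABLED,
            rule_filter=Filter(prefix=""),  # apply to every object
            rule_id="archive-after-90-days",
            transition=Transition(days=90, storage_class="COLD"),
        ),
    ]))

for suffix in ("data", "models", "backups", "static"):
    provision_bucket(f"new-venture-{suffix}")
```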

LLM Hosting: LiteLLM

LiteLLM is our self-hosted, unified LLM gateway and the single entry point for all LLM calls across our ventures:

  • Multi-model support: Routes to OpenAI, Anthropic, local models via Ollama, or any OpenAI-compatible API
  • Cost tracking: Per-project, per-model usage analytics with automatic spend attribution
  • Fallback routing: Automatic failover if a model is unavailable or rate-limited (see the sketch after this list)
  • Rate limiting: Prevents cost overruns with per-project, per-model limits
  • Request caching: Reduces costs by caching identical prompts
  • Streaming support: Handles streaming responses for real-time applications
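
The fallback behavior above can be sketched with LiteLLM’s Router. This is the library form; the self-hosted proxy expresses the same routing in its config file. Model names, aliases, and key variables here are illustrative, not our production values.

```python
# Sketch of fallback routing with litellm's Router, in library form; the
# self-hosted proxy expresses the same idea in its config file. Model
# names, aliases, and key variables are illustrative.
import os

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "default-chat",
            "litellm_params": {
                "model": "openai/gpt-4o-mini",
                "api_key": os.environ["OPENAI_API_KEY"],
            },
        },
        {
            "model_name": "backup-chat",
            "litellm_params": {
                "model": "anthropic/claude-3-5-haiku-20241022",
                "api_key": os.environ["ANTHROPIC_API_KEY"],
            },
        },
    ],
    # If "default-chat" errors out or is rate-limited, retry on "backup-chat".
    fallbacks=[{"default-chat": ["backup-chat"]}],
)

resp = router.completion(
    model="default-chat",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```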

Automation:

  • New projects get LiteLLM endpoints automatically configured in Coolify (client usage is sketched after this list)
  • Model selection is optimized based on cost and latency requirements (cheaper models for non-critical tasks)
  • Usage alerts trigger when daily/monthly thresholds are exceeded
  • Cost budgets are enforced automatically—requests are rejected if budget is exceeded
  • Model performance is tracked, and traffic is automatically routed to the best-performing models
  • All requests are logged to Langfuse for tracing and evaluation
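
From a venture’s point of view, none of this machinery is visible: a service talks to the gateway with the standard OpenAI client, and only the base URL changes. A minimal sketch, with a hypothetical gateway URL, per-project key, and model alias:

```python
# Sketch: a venture's service talks to the gateway with the standard
# OpenAI client; only the base URL changes. URL, key, and model alias
# are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.internal.example/v1",  # the gateway
    api_key="sk-litellm-project-key",  # per-project virtual key
)

response = client.chat.completions.create(
    model="default-chat",  # an alias the gateway routes to a real model
    messages=[{"role": "user", "content": "Summarize today's deploy log."}],
)
print(response.choices[0].message.content)
```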

Workflow Automation: n8n

n8n handles all workflow automation:

  • Agent orchestration: Coordinates between AI agents
  • Data pipelines: Moves data between services
  • API integrations: Connects external services
  • Scheduled tasks: Cron-like automation for maintenance

Automation: Workflows are version-controlled in Git. New workflows deploy automatically. Failed workflows trigger alerts and auto-retry with exponential backoff.
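
A hedged sketch of that Git-to-n8n deploy step, using n8n’s public REST API: the instance URL, API-key variable, and repository layout are assumptions, and the workflow JSON must contain the fields the API expects rather than a raw UI export.

```python
# Sketch: push a version-controlled workflow to n8n's public REST API.
# Instance URL, key variable, and repo layout are assumptions; the JSON
# must contain the fields the API expects (name, nodes, connections,
# settings) rather than a raw UI export.
import json
import os

import requests

N8N_URL = "https://n8n.internal.example/api/v1"  # hypothetical instance
HEADERS = {"X-N8N-API-KEY": os.environ["N8N_API_KEY"]}

with open("workflows/new-venture-intake.json") as f:
    workflow = json.load(f)

# Create the workflow, then activate it so its triggers go live.
created = requests.post(f"{N8N_URL}/workflows", headers=HEADERS, json=workflow)
created.raise_for_status()
workflow_id = created.json()["id"]

activated = requests.post(
    f"{N8N_URL}/workflows/{workflow_id}/activate", headers=HEADERS
)
activated.raise_for_status()
```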

Analytics & Observability

Grafana provides unified dashboards:

  • Infrastructure metrics: CPU, memory, disk, network
  • Application metrics: Request rates, error rates, latency
  • Business metrics: User growth, revenue, experiment results

Langfuse traces all LLM calls:

  • Prompt/response logging: Every AI interaction is logged
  • Cost tracking: Per-request cost analysis
  • Quality evaluation: Automatic evaluation of AI outputs
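
Wiring this up takes almost no code. One way is Langfuse’s drop-in OpenAI client, which logs each call automatically; the gateway URL and key below are the same illustrative placeholders as above, and credentials come from the LANGFUSE_* environment variables.

```python
# Sketch: Langfuse's drop-in OpenAI client traces each call automatically,
# capturing prompt, response, latency, and cost. Credentials are read from
# LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST; the gateway
# URL and key are the same illustrative placeholders as above.
from langfuse.openai import OpenAI  # drop-in replacement for openai.OpenAI

client = OpenAI(
    base_url="https://litellm.internal.example/v1",
    api_key="sk-litellm-project-key",
)

# Logged to Langfuse as a trace, with no extra instrumentation code.
client.chat.completions.create(
    model="default-chat",
    messages=[{"role": "user", "content": "Hello"}],
)
```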

Metabase serves as our decision cockpit:

  • 30-day venture reviews: Automated reports for experiment evaluation
  • Financial guardrails: Real-time spend tracking
  • Custom dashboards: Built automatically from database schemas

Automation: Dashboards update in real-time. Alerts trigger on anomalies. Reports generate automatically on schedule.

The Automation Layer

What makes this infrastructure truly autonomous:

1. Git-Based Infrastructure as Code

Every service configuration lives in Git:

  • Coolify projects: Defined as YAML files
  • Environment variables: Version-controlled, encrypted
  • Service dependencies: Declared explicitly
  • Scaling rules: Auto-scaling based on metrics

Result: Infrastructure changes are reviewed, versioned, and deployed like code. Rollbacks are instant.

2. Automated Provisioning

When a new venture spins up:

  1. Coolify creates the project automatically
  2. MinIO buckets are provisioned
  3. LiteLLM endpoints are configured
  4. n8n workflows are deployed
  5. Monitoring is set up automatically
  6. Alerts are configured

Result: A new venture gets full infrastructure in under ten minutes, with zero manual steps.
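
As a minimal sketch of how these steps compose: an ordered list of idempotent steps, each of which would wrap one of the API calls sketched earlier. The step bodies here only log.

```python
# Sketch of the pipeline's shape: ordered, idempotent steps. The bodies
# here only log; in practice each wraps one of the API calls above.
from collections.abc import Callable

def step(name: str) -> Callable[[str], None]:
    def run(slug: str) -> None:
        print(f"[{slug}] {name}: ok")  # placeholder for the real API call
    return run

PIPELINE: list[Callable[[str], None]] = [
    step("create coolify project"),
    step("provision minio buckets"),
    step("configure litellm endpoint"),
    step("deploy n8n workflows"),
    step("set up monitoring and alerts"),
]

def provision_venture(slug: str) -> None:
    for run in PIPELINE:
        run(slug)  # every step is idempotent, so reruns are safe

provision_venture("new-venture")
```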

3. Self-Healing Systems

  • Health checks: Every service has automated health monitoring (an endpoint sketch follows below)
  • Auto-restart: Failed containers restart automatically
  • Resource limits: CPU/memory limits prevent resource exhaustion
  • Backup automation: Databases and state back up daily
  • Disaster recovery: Automated restore procedures tested monthly

Result: Infrastructure runs 24/7 with minimal intervention. Issues resolve themselves.
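
On the service side, a health check is typically just an HTTP endpoint that the platform polls. A minimal sketch with FastAPI; the path and the individual readiness checks are illustrative.

```python
# Sketch: a /health endpoint for the platform's container health checks.
# The path and the individual readiness checks are illustrative.
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/health")
def health(response: Response) -> dict:
    checks = {
        "database": True,      # placeholder: ping the DB here
        "object_store": True,  # placeholder: HEAD a MinIO bucket here
    }
    healthy = all(checks.values())
    # A non-2xx response is the signal that triggers an auto-restart.
    response.status_code = 200 if healthy else 503
    return {"status": "ok" if healthy else "degraded", "checks": checks}
```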

4. Cost Optimization

  • Resource right-sizing: Services scale down during low usage
  • Idle resource detection: Unused services are flagged for shutdown
  • Cost alerts: Spending thresholds trigger notifications
  • Automated cleanup: Old data, logs, and artifacts are purged automatically (see the sketch after this list)

Result: Infrastructure costs stay predictable and optimized.
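
As one concrete form of that cleanup, a hedged sketch that purges old objects from a MinIO log bucket. The endpoint, bucket name, and 30-day window are illustrative, and a lifecycle expiration rule can express the same policy declaratively.

```python
# Sketch: purge log objects older than a retention window from MinIO.
# Endpoint, bucket name, and the 30-day window are illustrative; a
# lifecycle expiration rule can express the same policy declaratively.
import os
from datetime import datetime, timedelta, timezone

from minio import Minio

client = Minio(
    "minio.internal.example:9000",
    access_key=os.environ["MINIO_ACCESS_KEY"],
    secret_key=os.environ["MINIO_SECRET_KEY"],
)

cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for obj in client.list_objects("new-venture-logs", recursive=True):
    if obj.last_modified < cutoff:
        client.remove_object("new-venture-logs", obj.object_name)
```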

Real-World Example: Spinning Up a New Venture

Here’s what happens when we launch a new experiment:

Day 1, 9:00 AM: New venture approved in Linear

  • 9:01 AM: Coolify receives webhook, creates new project with resource limits
  • 9:02 AM: MinIO buckets provisioned automatically (data, models, backups, static assets)
  • 9:03 AM: LiteLLM endpoint created with model routing rules and cost budgets
  • 9:04 AM: n8n workflows deployed from Git repository, webhooks configured
  • 9:05 AM: Grafana dashboards auto-generated from service discovery
  • 9:06 AM: Langfuse project created for LLM tracing, connected to LiteLLM
  • 9:07 AM: Metabase database connection configured, initial dashboards created
  • 9:08 AM: Health checks begin, alerts configured for all services
  • 9:09 AM: Infrastructure ready, deployment can begin

What’s automated:

  • All service configurations are pulled from Git
  • Environment variables are injected from Coolify’s secure storage
  • Service dependencies are resolved automatically (e.g., LiteLLM needs MinIO for caching)
  • Network policies are applied (services can only talk to whitelisted endpoints)
  • SSL certificates are provisioned automatically via Let’s Encrypt

Total time: 9 minutes. Manual steps: 0.

The CORE Operating Model

As Infrastructure Lead, my role is to:

  1. Design the infrastructure architecture
  2. Automate every operational task
  3. Monitor system health and performance
  4. Optimize costs and resource usage
  5. Document everything for reproducibility

But the goal is to make myself obsolete—the infrastructure should run itself.

Key Learnings

What Works

  • Coolify + Hetzner: Perfect balance of simplicity and power
  • Self-hosting everything: Complete control, predictable costs
  • Git-based config: Infrastructure changes are auditable
  • Automated provisioning: New projects spin up instantly
  • Comprehensive monitoring: We see issues before they become problems

What We’d Change

  • More Hetzner regions: Currently single-region, would benefit from multi-region for redundancy
  • Backup automation: Could be more granular (per-service, not just daily)
  • Cost optimization: More aggressive auto-scaling during low usage periods

What’s Next

  • Multi-region deployment: Replicate infrastructure across Hetzner regions
  • Advanced auto-scaling: ML-based prediction of resource needs
  • Zero-downtime deployments: Blue-green deployments for all services
  • Infrastructure testing: Automated chaos engineering to test resilience

Getting Started

If you want to replicate this setup:

  1. Start with Coolify: Deploy on a Hetzner server (CPX21 is enough to start)
  2. Add MinIO: First service to deploy, needed by everything else
  3. Deploy LiteLLM: If you’re using LLMs, this is essential
  4. Set up n8n: For workflow automation
  5. Add monitoring: Grafana + Prometheus for metrics
  6. Automate everything: Use Coolify’s API and Git-based configs

Cost estimate: ~€50-100/month for a full stack on Hetzner (vs €500-1000/month on AWS).


The infrastructure layer is invisible when it works perfectly. That’s the goal—complete automation so we can focus on building, not managing servers.