Zalion

Platform Engineer

Munich, Germany · Posted 3 months ago
Tech Stack
AWS, Terraform, GitHub, ECS, Fargate, Kubernetes, Grafana, Prometheus, OpenTelemetry
Language Requirements
English
Requirements
Mid Seniority
4+ years Experience
No Degree
Zalion is on a mission to eliminate repetitive procurement work through agentic AI. We’re building autonomous agents that operate deep within enterprise procurement, navigating messy data, legacy systems, and complex workflows to deliver real impact. **Join us early and help define how enterprise AI is done right.**

Tasks

You will:

* **Own our platform foundations end-to-end**, from AWS architecture and IaC to CI/CD, observability, and incident readiness.
* Build and evolve **secure, scalable AWS infrastructure** (networking, compute, storage, IAM) optimized for reliability and cost.
* Design and maintain **CI/CD pipelines on GitHub** that are fast, repeatable, and developer-friendly (clear feedback loops, safe deploys, strong defaults).
* Define and operate infrastructure using **Terraform**, with clean modules, sensible standards, and automated validation.
* Improve **developer experience** through golden paths: templates, self-service environments, paved roads for deployments, and internal tooling that removes friction.
* Drive **availability, scalability, and resilience**: deployment strategies, rollbacks, capacity planning, DR thinking, and performance tuning.
* Implement pragmatic **security-by-default**: least privilege IAM, secrets management, secure supply chain, and guardrails that enable speed without compromising safety.
* Establish and refine **observability and reliability practices** (SLOs/SLIs, monitoring, alerting, postmortems, runbooks) that scale with the team.
* Partner closely with product engineering to reduce operational load and keep delivery velocity high as Zalion grows.

Requirements

* Strong experience as a **Platform / DevOps / Site Reliability Engineer** in product teams shipping to production.
* Deep practical knowledge of **AWS**: networking, IAM, security controls, and designing for failure.
* Hands-on expertise with **Terraform**: modules, state strategy, DRY patterns, environment separation, and automated reviews.
* Solid CI/CD engineering experience with **GitHub**: pipeline design, artifact/versioning, deployment safety, and fast feedback loops.
* A strong mindset for **reliability and operability**: you think in failure modes, automation, and measurable outcomes (SLOs).
* Security awareness and discipline: you build **guardrails** that make the secure path the easy path.
* A **builder mindset**: you ship improvements, measure impact (lead time, deploy frequency, MTTR), and iterate.
* Comfort with **ambiguity and ownership**: you proactively identify platform bottlenecks and fix them without waiting for perfect specs.
* **4+ years** experience in relevant roles (startup/scale-up experience is a plus).

Benefits

* Build the platform behind agentic AI systems that run in real enterprise environments
* Massive autonomy, zero bureaucracy
* Immediate impact: your work accelerates every engineer and every release
* Modern stack, no legacy constraints
* Competitive salary + meaningful equity
* High-end equipment

**🛠️ Tech Stack You’ll Work With**

* **AWS** (core services; compute, networking, IAM, logging/monitoring, managed data services)
* **Terraform** (modules, workspaces, validation, state management)
* **GitHub** (Actions, CI/CD workflows, checks, release automation)
* Container orchestration (e.g., **ECS/Fargate** and/or Kubernetes depending on evolution)
* Observability tooling (metrics, logs, tracing; e.g., Grafana/Prometheus/OpenTelemetry and friends)
* Security tooling (SAST/DAST, dependency scanning, secrets scanning, policy as code)