Senior / Lead DevOps Engineer
YPO
Posted 11 days ago
DevOps | United States
Tags: DevOps Engineering, Cloud Infrastructure, Platform Engineering, Site Reliability Engineering, Kubernetes, Senior
Job Description
Position Purpose

YPO is seeking a Senior / Lead DevOps Engineer to design, build, and operate the cloud infrastructure and developer platform that will power its next generation of products. This is a hands-on technical leadership role spanning the full DevOps surface (cloud infrastructure, CI/CD pipelines, release engineering, observability, platform reliability, and developer experience), all in service of a rapidly scaling AI-first mobile platform.

You will be a close partner to the Director of Product, the Lead Security Engineer, and mobile engineering leadership, connecting platform reliability to product velocity and security posture in equal measure. You will bring strong technical depth, a product-minded approach to internal tooling, and the communication skills to champion engineering excellence across the organisation.

Key Responsibilities

Cloud Infrastructure Design and Operations
Own the architecture and day-to-day operation of YPO's cloud infrastructure across its full lifecycle.
- Architect, implement, and continuously evolve YPO's cloud infrastructure across AWS, Azure, and/or GCP, ensuring it is scalable, resilient, cost-efficient, and production-ready for a global AI-first platform.
- Design and manage multi-region, highly available environments that meet YPO's performance and uptime requirements for a 35,000+ member global community.
- Own cloud cost management and FinOps practices, implementing tagging strategies, reserved capacity planning, and anomaly detection to optimise infrastructure spend without sacrificing reliability.
- Lead the evaluation and adoption of new cloud services, platforms, and tooling, making well-reasoned build-vs-buy decisions based on engineering impact and long-term maintainability.
- Manage DNS, CDN, load balancing, and networking configurations across cloud environments, ensuring global performance and failover capabilities.

Infrastructure as Code and Automation
Codify everything. If it cannot be automated, it should be questioned.
- Lead YPO's Infrastructure as Code practice using Terraform as the primary tool, ensuring all infrastructure is version-controlled, reviewed, tested, and deployed through automation, never manually.
- Define and enforce IaC standards, module structures, and governance practices across the engineering organisation, ensuring infrastructure code is readable, reusable, and maintainable over time.
- Automate environment provisioning, teardown, and configuration management for development, staging, and production environments, enabling engineers to spin up and destroy environments on demand.
- Build and maintain automation pipelines for routine operational tasks, including certificate rotation, secret rotation, compliance remediation, and infrastructure drift detection.
- Write clean, well-tested automation scripts in Python, Bash, or equivalent, treating operational scripts with the same engineering rigour applied to product code.

CI/CD Pipeline Design and Release Engineering
Accelerate the path from commit to production without sacrificing quality or safety.
- Design, build, and maintain end-to-end CI/CD pipelines for YPO's mobile (iOS and Android), backend API, AI platform, and data engineering workloads, reducing time-to-deploy and increasing deployment frequency.
- Implement branching strategies, environment promotion workflows, and feature flagging patterns that allow teams to ship incrementally and safely to a global production audience.
- Integrate automated quality gates into every pipeline as non-negotiable steps: unit tests, integration tests, security scans (SAST/DAST/SCA), container scanning, and IaC linting.
- Lead the adoption of progressive delivery techniques, including blue-green deployments, canary releases, and traffic shifting, to minimise deployment risk and enable rapid rollback.
- Partner with the Lead Security Engineer to embed security and compliance checks into every pipeline stage, ensuring secure-by-default releases across all environments.
- Own release documentation, change management workflows, and deployment runbooks, ensuring all production changes are auditable, traceable, and recoverable.

Container Orchestration and Platform Engineering
Build the platform that the platform runs on.
- Design, operate, and continuously improve YPO's container orchestration infrastructure using Kubernetes (EKS, AKS, or GKE), ensuring reliable scheduling, resource efficiency, and operational simplicity.
- Manage container image governance, including base image standards, image scanning pipelines, registry management, and deprecation policies for outdated or vulnerable images.
- Implement and maintain a service mesh, ingress controllers, network policies, and inter-service security patterns appropriate for YPO's AI platform and mobile API surfaces.
- Evaluate and adopt platform engineering tools that improve developer self-service: internal developer platforms (IDPs), environment-as-a-service patterns, and golden path templates that let engineers provision what they need without DevOps as a bo
