Senior Cloud & AWS Support Engineer
DVT
Australia, Canada, France +7 more
Job Description
DVT is one of the top software development companies on the continent, consulting on cutting-edge applications for leading enterprises in South Africa and globally. We are committed to continuously developing our people, with a strong culture of learning, internal speaking, and sponsored technical events across the AWS ecosystem.

We are looking for a Senior Cloud & AWS Support Engineer to join our cloud team on a client-embedded engagement. This is a dual role: alongside building and maintaining cloud infrastructure and automation, you will provide day-to-day AWS operational support, owning incident, alert, and request triage, on-call response, and the operational health of the client's AWS estate. You will sit within the client's Operations ("keeping the lights on") team, working closely with their platform and engineering teams to keep production stable while improving it.

Requirements

DUTIES AND RESPONSIBILITIES

AWS Support & Operations
- Act as first responder for the client's AWS environment: triage, diagnose, and resolve incidents and service requests within agreed SLAs
- Own alert handling across CloudWatch, GuardDuty, Security Hub, and AWS Health, including the email/notification alerting pipeline, routing, and noise reduction
- Participate in the on-call rotation (including cross-timezone cover where client launches require it) and lead structured incident response and post-incident reviews
- Manage the operational ticket queue (incident, request, problem, and change), maintaining clear status, ownership, and communication with client stakeholders
- Build and maintain runbooks, playbooks, and knowledge-base articles to standardise response and enable faster, repeatable resolution
- Drive problem management: identify recurring issues, perform root-cause analysis, and convert findings into permanent fixes and automation
- Support patching, backup/restore verification, and routine operational maintenance under tagging- and SCP-based governance

Infrastructure & Automation
- Design, implement, and maintain scalable CI/CD pipelines (e.g. GitHub Actions, AWS CodePipeline, GitLab CI) for automated testing, deployment, and provisioning
- Manage infrastructure as code with Terraform (primary), CloudFormation, and CDK, including remote state, modular refactoring, and multi-environment deployments
- Develop ephemeral feature environments for isolated testing (Terraform workspaces / Terragrunt) and automate their provisioning, routing (Route 53, ALB/NLB), and teardown
- Apply scheduled shutdowns to control costs; use tagging, Kubernetes taints/tolerations, and tag-driven patching schedules
- Design and implement AWS networking: VPC architecture, security groups, NACLs, Transit Gateway, and hybrid connectivity

Deployment & Operations
- Collaborate with software teams to integrate and deploy backend services (Java, .NET, Python, Node.js) and containerised applications
- Implement blue/green or canary deployment strategies with safe, traceable rollouts and automated rollback
- Integrate unit and functional/API testing (pytest, Postman/Newman, LocalStack) into the CI pipeline
- Establish conventions for test execution, image tagging, versioning, and reusable CI/CD components
- Manage production deployments, change windows, and release coordination across environments

Monitoring, Security & Compliance
- Implement comprehensive monitoring, logging, and observability (CloudWatch, X-Ray, third-party APM)
- Monitor and optimise system performance, deployment efficiency, resource utilisation, and cloud costs
- Maintain security best practices: Well-Architected & OWASP, secrets management (Secrets Manager, Parameter Store), IAM policies, SAST/DAST scanning, and compliance controls
- Configure alerting and incident-response workflows and lead post-incident reviews
- Ensure compliance with relevant standards (ISO 27001, SOC 2, POPIA / GDPR) per client requirements

Leadership & Client Engagement
- Provide technical leadership and mentorship to junior engineers and client development teams
- Engage directly with client stakeholders to understand requirements, advise, and present solutions
- Participate in architectural reviews, retrospectives, and planning to continuously improve tooling and processes
- Conduct training and create documentation to embed DevOps and operational best practices
- Contribute to pre-sales: solution design, effort estimation, and technical proposals

Required Experience and Skills
- 7+ years in Cloud DevOps, SRE, or AWS operational support, with strong CI/CD and infrastructure automation experience
- Hands-on experience running AWS managed support / operations: incident management, alert triage, on-call, and SLA-bound resolution
- Deep AWS proficiency: EC2, ECS/EKS, Lambda, S3, DynamoDB, RDS, VPC, Route 53, CloudFront, IAM (Identity Center), CloudWatch, X-Ray
- Expert-level IaC skills (Terraform required; CloudFormation beneficial), including remote state, modules, and multi-environment setups
- Strong CI/CD platform experience (GitHub Actions, CodePipeline, GitLab CI, or Jenkins)
- Advanced scripting in Bash or Python for automation and tooling
- Contain
