Partly

Site Reliability Engineer, ANZ


7d ago

DevOps · Australia
Site-Reliability-Engineer, Site-Reliability-Operations-Engineer, Senior-Site-Reliability-Engineer, DevOps-Site-Reliability-Engineer, Site-Reliability-Engineer-II, Mid-level

Job Description

Note: Partly is headquartered in the UK, with a Product and Engineering base in Christchurch and an early presence in San Francisco. If you are not based in Christchurch, we will fly you to HQ for 2 weeks of onboarding, as well as 1 week per quarter for our “Season Openers” (we pay for your travel and accommodation). If you are relocating to Christchurch from elsewhere in NZ or from overseas, we can also assist with relocation costs.

🚀 Our story

Partly's mission is to connect the world's parts, and we're doing that by building the first global platform for replacement parts, starting with auto parts. Our big vision is to accelerate the world toward a sustainable future where anyone can fix anything.

Founded by ex-Rocket Lab engineers, we use cutting-edge technology to solve challenging and exciting problems that make a huge impact on a $1.9 trillion industry. We've more than tripled our team over the last 12 months and expect to double in size again over the coming 12 months. We're a global team spanning Europe and Australasia.

We provide scalable digital infrastructure to some of the world's largest businesses and most exciting startups. Partly's solutions are integrated across hundreds of companies globally, providing the backbone for cataloguing and managing parts online.

Our investors include Blackbird Ventures (Canva, CultureAmp, etc.), Square Peg, Octopus Ventures, Icehouse, Peter Beck (Rocket Lab), Akshay Kothari (Notion co-founder) and Dylan Field (Figma co-founder).

We're continuing to build a world-class team and to make Partly a place where people can do the best work of their lives.
We're proud of the culture we've built at Partly, and our values are lived throughout every experience.

🖍️ This role

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed systems, ensuring that both internally critical and externally visible services have the reliability, uptime, and performance appropriate to clients' needs while enabling a fast rate of improvement. SREs maintain constant awareness of system capacity and performance, ensuring our networks, platforms, and tools are scalable, secure, and reliable so engineers can focus on delivering impactful software. This senior role demands a high degree of autonomy, leadership, and strategic thinking, making it ideal for those excited by the challenge of designing and supporting the infrastructure that connects the world's parts.

💻 What you will do

Reliability Engineering: Ensure the stability, scalability, and security of our cloud infrastructure and of the Partly and third-party applications running in our Kubernetes-powered clusters. Leverage Infrastructure-as-Code and automation (Terraform for GCP, GitOps with ArgoCD, custom scripts in Python/Bash, etc.) to deploy and manage workloads and resources in a repeatable, automated way.

Cost Optimisation: Monitor and optimise costs across our cloud and on-prem infrastructure, ensuring we get maximum value from our investments. Recommend resource-allocation or architecture changes that improve cost-efficiency without sacrificing reliability or performance.

Cross-Functional Collaboration: Work closely with developers, data engineers, and leadership to plan infrastructure needs and improvements. Provide tooling, guidance, and training to the engineering team on SRE practices, and collaborate during software delivery to ensure smooth integration from code to production.

Software Engineering: Make sure our software meets high production-readiness standards. When you see a problem or an opportunity to improve, you drive the solution.

Troubleshooting: Participate in incident resolution and give developers a helping hand in debugging applications, networks, databases, and compute systems.

Want to learn more about the problems we're solving and the culture we're building at Partly? Hear directly from our team here: https://shorturl.at/DPDdl

🥷 Your skills

Software Engineering: You excel at developing and maintaining large, established software systems, not just simple scripts and utilities. You know what makes software maintainable, and you write robust code.

Computer science fundamentals: A firm grounding in data structures, concurrency, architecture, APIs, testing, and design patterns.

Systems engineering fundamentals: You likely know how to deploy and use a memory or stack-sampling profiler, locate excessive lock contention, identify network issues, etc.

SRE Expertise: Hands-on experience with modern SRE practices and tooling, for example containerization (Docker/Kubernetes), infrastructure-as-code (Terraform), and GitOps workflows (ArgoCD or equivalent). You have designed, built, and maintained scalable infrastructure and CI/CD systems.

Cloud & Systems Knowledge: Deep familiarity with at least one major cloud platform and the Linux operating system. You can tune servers, manage databases and storage, and wrangle Kubernetes clusters.

Ownership & Leadership: High deg