Roboflow

Infrastructure Engineer

5h ago

DevOps · USA · jobicy
Software Engineering · Full-Time · Midweight

Job Description

Roboflow - Infrastructure Engineer

Who We Are

Our mission is to make the world programmable. Sight is one of the key ways we understand the world, and soon this will be true for the software we use, too. At Roboflow, we’re building the tools, community, and resources needed to make the world programmable with artificial intelligence.

Roboflow simplifies building and using computer vision models. Today, over 1M developers, including those from half the Fortune 100, use Roboflow’s open source and hosted machine learning tools. That includes counting cells to accelerate cancer research, improving construction site safety, digitizing floor plans, preserving coral reef populations, guiding drone flight, and much more.

Our team is small relative to our impact, and we believe our users' success is our success (not the inverse). A team member summarized: “Roboflow is a company full of giant brains and tiny egos.” We find software has a multiplier effect on all roles (not only product and engineering), so Roboflow employs developers across the company in design, sales, customer support, marketing, and beyond.

We’re supported by great customers and investors, having raised over $63 million from Y Combinator, Google Ventures, Craft Ventures, Sam Altman, and Lachy Groom, amongst other leading software investors.

What We're Looking For

Primarily, you like to make great things with passionate colleagues. You are someone who likes to own outcomes, not only inputs. You’re motivated by having responsibility and accountability. You’re eager to ‘do the work,’ big and small. You’re curious and learning about new technologies, perhaps an early tinkerer with MLOps products. You show more than you tell. You’re motivated by the question, “How can I improve this?” and have a track record of doing so, even in ways adjacent to your role. Much of our current team is made up of former founders who thrive in the level of autonomy at Roboflow. Maybe you had a side hustle in high school or college.
Many Roboflowers have used our tools before joining. One of the best ways to stand out amongst other applicants is to write about something you have built with Roboflow or contribute to one of our open source projects. Likewise, we highly value users with meaningful contributions to successful open source devtool and security projects.

What You'll Do

The focus of this role is on securing, scaling, and maintaining the infrastructure that powers our product backend, including our cloud architecture, databases, file storage, search cluster, micro-services, and machine learning pipelines. You'll work alongside our existing infrastructure team and do cross-team work spanning product, operations, and customer-facing projects. You should be able to context switch across a wide range of infrastructure, security, and systems engineering work in a fast-paced startup environment.

Skillset

Some or all of the following would be helpful:

- Production experience with Kubernetes
- Infrastructure-as-code: Terraform, Kubernetes Helm charts, bash scripting, and Python-based automation in production environments
- Scale: operating infrastructure for large-scale applications, especially in the machine learning/AI space
- Site reliability: alerting, monitoring, and scaling services in AWS and GCP clouds
- Node.js and Python programming skills; ability to work with full-stack developers on designing, developing, and operating SaaS applications
- Experience with machine learning/big data at scale (GPU, Docker, and Kubernetes)
- Experience with CI/CD automation (for example GitHub Actions, Spacelift)
- Prior experience with machine learning libraries and stacks (PyTorch, TensorFlow, OpenCV, Supervision, etc.) is a plus
- Awareness of security best practices and tightening infrastructure for highly secure cloud operations; ideally experienced in a GDPR, ISO 27001, and/or SOC 2 certification for SaaS applications
- Implement engineering security and reliability best practices in Roboflow applications and infrastructure

Examples of tasks

- Run a high-availability machine learning inference service
- Work with customer security teams to securely integrate Roboflow with their systems
- Develop infrastructure-as-code solutions to scale Roboflow in a cost-effective manner
- Work with the engineering team to define SLAs and SLOs, and participate in addressing security and reliability incidents across the platform
- Dive into cost optimization opportunities across the Roboflow stack
- Be part of teams designing and deploying new product features, including hands-on coding in Python, JavaScript, and other related technologies
- Work with SOC 2, HIPAA, and GDPR requirements by improving security across systems and processes at Roboflow, making Roboflow audit-ready for the highest security standards in the industry
- Participate in on-call rotations

Within one week, you will…

- Learn all about computer vision, our product, company, customers, and vision.
- Ship something substantial to an e