Junior Site Reliability Engineer
Remote, located in the UTC-4 to UTC-6 region
Snowplow enables you to track any event data, ask any question of that data, and use any tool you want to answer it. We want to empower people and companies to do transformative things using data.
As a company, we have almost doubled in size over the last 18 months and we’re not looking to slow down. To support our growth, we are now looking for a Junior SRE (Site Reliability Engineer) to join our Tech Ops Team. You’ll be working closely with our Cloud leads on AWS & GCP to help develop and roll out improvements and new features for our current stack, and to help keep everything running smoothly. We would love to hear from you if the idea of programmatically controlling thousands of production environments excites you!
Our Private SaaS offering has grown significantly over the past year and we now orchestrate and monitor Snowplow event pipelines across more than 150 customer-owned AWS & GCP sub-accounts. Each account has its own individualised and optimised stack and all are capable of processing many billions of events per month.
Tech Ops has two areas of responsibility: the centralised services we provide to customers, and the customers’ pipeline infrastructure hosted in their own AWS or GCP accounts. Within both domains we are striving to increase service reliability, fulfil customer requests in a timely fashion, and automate recurring tasks. Task automation is essential as our customer base grows because, unlike at most software businesses, our infrastructure estate scales linearly with our customer numbers.
We are looking for a Junior SRE to help us grow to managing 1,000 and then 10,000 AWS, GCP & Azure accounts. They will have a keen interest in learning more about monitoring and automating infrastructure at scale, and will want to learn how to develop solutions for these scenarios. They will work closely with our Tech Ops Lead and experienced team of SREs on all aspects of our proprietary deployment, orchestration and monitoring stacks.
Automating the maintenance and deployment of thousands of individualised stacks is an enormously ambitious undertaking and a hugely exciting infrastructure automation challenge!
The environment you’ll be working in:
Our company values are Transparency, Honesty, Ownership, Inclusivity, Empowerment, Customer-centricity, Growth and Technical Excellence. These aren’t just words we plucked out of thin air; we came up with them together as a company and are continually looking for new ways to weave them into our day-to-day operations. From flexible hours and working locations to the way we give feedback, we’re passionate about building a company that supports both company and individual development.
What you’ll be doing:
- Maintaining and developing our growing Terraform infrastructure-as-code stacks, which we use to deploy infrastructure for all internal and client use cases
- Maintaining our internal infrastructure stacks, which include the HashiCorp suite as well as our Snowplow Insights UI and VPNs
- Participating in our on-call rotation to help us serve our client base 24/7
- Taking rotations of L3 Technical Support, where you will be responsible for triaging and dealing with infrastructure issues
- Supporting the team in handling high-severity internal or customer incidents, ensuring we meet all SLAs
What you bring to the team:
- Has worked with a cloud hosting provider before (AWS, GCP or Azure)
- Has worked with Terraform, CloudFormation or some other form of infrastructure-as-code tooling
- Any experience with the HashiCorp stack (Vault, Consul, Nomad) and an understanding of its role in infrastructure automation is a bonus
- Has worked with Docker and is familiar with container-based architectures
- Knowledgeable about the Linux operating system and how to manage servers in a production capacity
- Knowledgeable about cloud networking principles and how to troubleshoot issues in this space
- Comfortable scripting in one or more of: Bash, Python, Ruby or Perl
- Comfortable programming in one or more of: Java, Scala, Golang or Python
What you’ll get in return:
A competitive package based on experience, including:
- share options
- 25 days of holiday a year (plus bank holidays)
- MacBook or Dell XPS 13/15
- Two fantastic company Away Weeks in a different European city each year (the last one was in November 2019 in Bratislava)
- Work alongside a supportive and talented team with the opportunity to work on cutting-edge technology and challenging problems
- Progress in your discipline alongside a team of experienced SREs
- Grow and develop in a fast-moving, collaborative organisation