Lead Site Reliability Engineer

The SRE Leader will join Couchbase’s cloud leadership team to lead the newly formed Cloud Operations teams and help build the function within the Cloud organization. The SRE team is responsible for the availability and support of the service to customers by performing full-stack observability, level one alerting, reliability engineering, and incident response for Couchbase’s cloud organization. In this role, you will set the strategy and operational KPIs for the SRE organization and the applications supported by the cloud organization. Partnering with our engineering leadership and cloud leadership, you will work to build the service level indicators (SLIs), service level objectives (SLOs), service level agreements (SLAs), and error budgets for our services. As part of this role, you will lead customer escalations and build a close relationship with our engineering and product organizations.


  • Own the end-to-end availability (SLO/SLA), reliability, and performance of Couchbase’s Cloud offerings

  • Develop automation, processes and metrics to ensure maximum reliability and uptime for our customers

  • Establish an on-call cadence with the team and ensure adequate coverage

  • Foster a healthy and collaborative culture, in line with Couchbase’s core values

  • Participate in 24x7 Site Reliability rotations and escalation workflows

  • Serve as a change board approver and incident manager

  • Serve as project manager or scrum master for major initiatives and train the team to be the first line of support

  • Present quarterly operations reviews in addition to more routine reporting obligations

  • Represent Couchbase in customer meetings and serve as a customer advocate in influencing the product roadmap and improvements

  • Take ownership of many controls, processes, and risks required to maintain our compliance portfolio (SOC 2, PCI-DSS, GDPR, and HIPAA, among others)

  • Collaborate with the Cloud Engineering team to understand deployment practices and processes and work towards iteratively improving the release pipeline to ensure a highly resilient deployment strategy, ideally with zero downtime


  • At least 5 years of work experience in Site Reliability/Infrastructure Engineering for a team operating in public cloud

  • A passion for SRE/DevOps and running highly resilient/automated systems

  • Proficiency with Terraform, configuration management tools, version control systems (Git), and CI/CD platforms and toolchains such as CircleCI and GitHub

  • Deep working experience with cloud platforms like Amazon Web Services and with tools like Kubernetes, Prometheus, and Datadog

  • Experience developing or integrating Chaos Engineering toolchains or methodologies

  • Manage on-call rotations across continents using a follow-the-sun model, and handle incident response to ensure high availability

  • Regularly report on availability and incidents to senior management

  • Build a team culture that aims for high service availability, scalability, and observability

  • A bias towards data-driven decisions and ensuring that key metrics are agreed on, visible, and actionable

  • BS/BE/Master’s in Computer Science

About Couchbase

Couchbase's mission is to be the platform that accelerates application innovation. To make this possible, Couchbase created an enterprise-class, multi-cloud NoSQL database architected on top of an open source foundation. Couchbase is the only database that combines the best of NoSQL with the power and familiarity of SQL, all in a single, elegant platform spanning from any cloud to the edge.

Couchbase has become pervasive in our everyday lives; our customers include industry leaders Amadeus, AT&T, BD (Becton, Dickinson and Company), Carrefour, Comcast, Disney, DreamWorks Animation, eBay, Marriott, Neiman Marcus, Tesco, Tommy Hilfiger, United, Verizon, Wells Fargo, as well as hundreds of other household names.

Couchbase’s HQ is conveniently located in Santa Clara, CA with additional offices throughout the globe. We’re committed to a work environment where you can be happy and thrive, in and out of the office.

At Couchbase, you’ll get:

  • A fantastic culture

  • A focused, energetic team with aligned goals

  • True collaboration with everyone playing their positions

  • Great market opportunity and growth potential

  • Time off when you need it

  • Regular team lunches and fully stocked kitchens

  • Open, collaborative spaces

  • Competitive benefits and pre-tax commuter perks

Whether you’re a new grad or a proven expert, you’ll have the opportunity to learn new skills, grow your career, and work with the smartest, most passionate people in the industry.

Revolutionizing an industry requires a top-notch team. Become a part of ours today. Bring your big ideas and we'll take on the next great challenge together.

Check out some recent industry recognition:

  • Couchbase Named a Leader: Forrester Wave Big Data NoSQL Report

  • Wealthfront Career-Launching Companies List 2018

  • Forbes Next Billion Dollar Startups 2018

  • 2018 Deloitte Fast 500

  • 2018 DBTA Readers’ Choice Award for Best In-Memory Solutions

  • Big Data 100: 35 Coolest Data Management And Integration Vendors

  • No. 17 on Forbes’ list of Best Big Data Companies to Work For in 2017

Want to learn more? Check out our blog: https://blog.couchbase.com/

Couchbase is proud to be an equal opportunity workplace. Individuals seeking employment at Couchbase are considered without regard to age, ancestry, color, gender (including pregnancy, childbirth, or related medical conditions), gender identity or expression, genetic information, marital status, medical condition, mental or physical disability, national origin, protected family care or medical leave status, race, religion (including beliefs and practices or the absence thereof), sexual orientation, military or veteran status, or any other characteristic protected by federal, state, or local laws.
