Principal SRE (Site Reliability Engineer)



Role:


Cribl Inc is seeking an incredibly talented Principal Site Reliability Engineer to join our mission to unlock the value of all machine data. Cribl provides users a new level of observability, intelligence and control over their real-time data. You will join a team of technical engineers who are committed to shipping only high-quality software and enjoying all the goat gifs the internet has to offer. This role is remote and you will report into the engineering organization, where you will contribute to our efforts to envision, create and run the Cribl Cloud offering.


In this role you will engage with engineering teams across the Cribl products and implement modern interpretations of SRE, observability, Chaos Engineering and DevOps. Our software is deployed in some of the largest organizations in the world processing 100s of TB to PB of IT & Security data.


Responsibilities:



  • You will drive technology direction within the product domain and help set the bar for technical excellence and delivery across the engineering team(s).

  • You will work closely with other technical leaders and architects to understand and capture product dependencies and to align and prioritize project execution, challenging decisions when needed.

  • You will provide thought leadership for the vision, strategy and architecture for the SRE and SaaS engineering teams.

  • You will work with engineers to ensure that the designed solution responds to non-functional requirements such as availability, performance, security, and maintainability.

  • You will improve the reliability of our systems by working with engineers to ensure that the software delivery pipeline is as efficient as possible.

  • Mentor our engineers to achieve more than they thought possible. You enjoy making other teams successful and are fulfilled through the success of others.

  • You will write and update documentation, including runbooks/playbooks.

  • You will automate work including infrastructure needs, testing, failover solutions, failure mitigation, and much more.

  • You will debug complex problems across an entire stack and create solid solutions.




Minimum Qualifications:



  • 10+ years experience with software engineering, software development, and/or system operations

  • Able to lead complex initiatives/projects from inception to completion

  • Experience building and operating large-scale production systems

  • Knowledge of container technologies, Python, Go, Java/JS/TS & source control (Git, GitHub)

  • Experience working with container deployment and orchestration technologies with knowledge of fundamentals including service discovery, deployments, monitoring, scheduling, and load balancing.

  • Understanding of systems programming (network stack, file system, OS services) and networking (L2 vs. L3, network architecture, VLANs)

  • Experience identifying performance bottlenecks and anomalous system behavior, and resolving the root cause of service issues.

  • You have the skills to work across teams and functions to influence the design, operations and deployment of available software.


Bonus Points/Preferred Qualifications:



  • Experience with development and deployment in a hosted cloud environment, preferably AWS & GCP.

  • Experience with running containerized environments and understanding of multi-tenancy and security implications.

  • Experience with optimized and scalable software that operates on a large number of nodes.

  • Experience with monitoring and observability tools and applications, such as Datadog or Elasticsearch.

  • Experience automating infrastructure, testing, and deployments using tools like CloudFormation, Chef, or Terraform, and the ability to explain the Infrastructure as Code paradigm.

What we offer:



  • Competitive Salary

  • Stock Options

  • Medical, dental, and vision insurance

  • Flexible spending account (FSA)

  • 401(k) plan offered

  • Parental Leave

  • Professional Development and Career Growth

  • Generous Vacation and Holiday Policy, including 2 Floating Holidays to use for holidays you observe

  • Social Responsibility Employee Group that reflects our value-driven company culture


Diversity drives innovation, enables better decisions to support our customers, and inspires change for the better. We’re building a culture where differences are valued and welcomed. We work together to bring out the best in each other. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.


