Site Reliability Engineer
ShareStream Education is a leader in online video and media management solutions for academic institutions. Our team is passionate about building a great, continually evolving product and about providing a service that allows our customers to realize the vast potential of streaming media for education.
ShareStream Education is deeply committed to client success and to building strong relationships with the Company’s clients, whom we regard as our partners.
Join us and contribute to changing the way online education takes place through the use of streaming media!
The Site Reliability Engineer will work remotely. ShareStream Education will not accept resumes from recruiters for this position.
ShareStream is seeking a multitalented, dedicated Site Reliability Engineer who excels at automating engineering operations and building high-availability and fault-tolerant systems. The Site Reliability Engineer will:
- Enhance and operate the continuous integration and continuous delivery (CI/CD) pipeline for multiple applications
- Operate the Kubernetes platform and perform day-to-day monitoring and maintenance
- Automate upgrades, scaling, and other operational needs as required
- Deploy new releases across multiple SaaS customers
- Implement and operate a central logging solution as well as a central metrics solution
- Develop operational playbooks and dashboards to monitor production SaaS environments
- Contribute to managing AWS cost and resource usage
- Work with the Engineering team to implement new technologies, including Istio, CephFS, Elasticsearch, and InfluxDB
Qualifications:
- BS and/or MS degree in Computer Science or a related field
- Extensive experience building and operating distributed systems in Amazon Web Services (AWS)
- Expert-level Linux skills (CentOS and Ubuntu)
- Extensive experience with container-based software development and management using Docker and Kubernetes
- Extensive experience with Jenkins
- Extensive experience with Ansible, Chef, or Puppet
- Expert in at least one scripting language, preferably Bash or Python
- Intermediate-level software-development skills in Java or another object-oriented programming language are a strong plus
- Experience managing backups and participating in disaster-recovery planning and testing is a strong plus
- Experience working in a fast-moving startup environment is a strong plus