Site Reliability Engineer (Full-Remote in Japan*)
At PayPay, we are constantly improving our systems and processes to prepare for PayPay’s exponential growth. As SREs at PayPay, we strive to empower our developers with the right tools and to ensure high availability and top-notch performance, so that our users have a great experience with our services.
Considering PayPay’s growth, we are looking for an experienced SRE who can deliver insights into system bottlenecks, ensure system reliability, and keep CI/CD processes efficient and scalable for the growing number of services our company offers.
Specifically, we are looking for someone who brings informed and unique viewpoints, enjoys collaborating with a cross-functional team, and actively pushes boundaries to develop scalable solutions and positive user experiences.
- Build software and solutions that help teams optimize the SDLC
- Deploy and maintain CI/CD pipelines across multiple environments
- Iterate on best practices to increase the quality and velocity of deployments
- Analyze current technologies used in the company and develop steps to improve observability and visibility into potential bottlenecks
- Ensure system stability by pre-emptively verifying failure scenarios and implementing solutions to reduce MTTR
- Implement industry best practices for system hardening and configuration management
- Develop solutions to improve system performance with a focus on high availability, scalability and resilience
- Establish SLAs for service uptime, and integrate with telemetry and alerting platforms to enforce them
- Document the knowledge gained to ensure a seamless flow of information between teams
- Stay up to date on modern technologies and trends, and advocate for their inclusion in products where they add value
- Good understanding of DevOps concepts and implementation
- CI/CD implementation expertise
- Experience in Docker image management and optimization
- Knowledge of storage options such as SQL, NoSQL, and distributed storage like TiDB
- Experience operating Kubernetes and managing manifests
- Ability to program in one or more high-level languages such as Python or Java
- Proactive in finding problems, areas for improvement, and performance bottlenecks in distributed systems
- In-depth knowledge and hands-on experience with AWS and production workloads
- Strong coding skills in one or more high-level languages
- Excellent communication skills and collaborative attitude
- Keen on trying new technologies and taking up challenges
- Knowledge of microservices
- Knowledge of observability and how to gather data
- System design experience and capacity planning for large distributed systems
- Understanding of automation tools and their implementation
- Terraform/CloudFormation experience
- Experience managing monitoring tools such as CloudWatch and New Relic