Remote Site Reliability Engineer (SRE) - Cloud Infrastructure
About the Role
We are seeking a Remote Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for the reliability and performance of our cloud infrastructure, working with AWS, GCP, Docker, and Kubernetes to improve our platform's resilience and observability.
What You'll Do
- Design, implement, and manage cloud infrastructure using AWS, GCP, and Azure.
- Develop automation scripts using Python, Shell, and Terraform to streamline operations.
- Monitor system performance and reliability using tools like Prometheus and Grafana.
- Collaborate with development teams to improve application performance and reliability.
- Respond to incidents and outages, performing root cause analysis to prevent future occurrences.
- Enhance observability and monitoring across all services and applications.
- Participate in on-call rotations to ensure 24/7 availability of services.
- Contribute to the continuous improvement of our Site Reliability Engineering practices.
Requirements
- 3+ years of experience as a Site Reliability Engineer or in a similar role.
- Strong knowledge of cloud platforms (AWS, GCP, Azure).
- Experience with containers and container orchestration (Docker and Kubernetes).
- Proficiency in scripting languages like Python and Shell.
- Familiarity with monitoring tools such as Prometheus and Grafana.
- Understanding of microservices architecture and API management.
- Excellent problem-solving skills and a proactive approach to incident management.
- Strong communication skills and the ability to work collaboratively in a remote environment.
Nice to Have
- Experience with Java and Spring Boot.
- Knowledge of big data technologies and ETL processes.
- Familiarity with SQL databases.
- Previous experience in technical support or customer experience roles.
What We Offer
- Competitive salary ranging from $120,000 to $150,000 per year.
- Meal allowance and comprehensive health, dental, and life insurance plans.
- Flexible work model promoting a sense of belonging and work-life balance.
- Access to our Corporate University with various development paths.
- Extended maternity and paternity leave for new parents.
- Opportunities for professional growth and advancement.
- A supportive remote work environment with a focus on team collaboration.
Who Will Succeed Here
- Proficient in AWS and GCP for cloud infrastructure management, with hands-on experience in deploying and maintaining services using Docker and Kubernetes.
- Self-motivated and disciplined, able to thrive in a remote work environment by effectively managing time and prioritizing tasks without direct supervision.
- Analytical mindset with strong problem-solving skills, capable of utilizing monitoring tools like Prometheus and Grafana to proactively identify and resolve performance issues.