Site Reliability Engineer - Remote at Polygon Labs
About the Role
We're hiring a Site Reliability Engineer (SRE) to join Polygon Labs remotely. In this role, you will help maintain the reliability and performance of our production infrastructure. The position is ideal for people early in their SRE or infrastructure careers who are eager to learn about large-scale, distributed blockchain systems.
What You'll Do
- Monitor production systems, alerts, dashboards, and logs across Polygon networks, including Polygon PoS and the Agglayer.
- Assist with incident detection, triage, escalation, and resolution under the guidance of senior engineers.
- Support on-call and operational coverage through structured rotations, with training and mentorship.
- Maintain and improve runbooks and standard operating procedures.
- Assist with routine operational tasks such as service restarts, upgrades, and configuration changes.
- Help maintain and improve monitoring, logging, and alerting systems, including dashboards for network health, RPC performance, and node metrics.
- Learn to improve alert signal quality and reduce operational noise.
- Support cloud-based and containerized infrastructure, including nodes, RPC endpoints, and supporting services.
- Collaborate with protocol, product, and cross-functional teams to understand production issues and user impact.
- Participate in post-incident reviews and contribute to root-cause analysis documentation.
- Continuously build knowledge of blockchain fundamentals, distributed systems, and networking.
Requirements
- A foundational understanding of Linux systems, processes, and basic networking concepts.
- Familiarity with at least one scripting or programming language, such as Python, Bash, or Go.
- An interest in site reliability, monitoring, and operating production infrastructure.
- Clear written and verbal communication skills, with a willingness to ask questions and learn.
- The ability to remain calm, methodical, and responsive during incidents or operational events.
Nice to Have
- Exposure to cloud platforms such as AWS or GCP.
- Familiarity with containerization or orchestration technologies, including Docker or Kubernetes.
- Basic understanding of blockchain or Web3 concepts, such as nodes, RPC services, or validators.
- Experience with monitoring and observability tools such as Grafana, Prometheus, Datadog, or ELK-based stacks.
What We Offer
- Remote-first global workforce.
- Industry-leading medical, dental, and vision insurance.
- 401(k) with a 3% company match (for US employees).
- $1,500 home office setup allowance (lifetime maximum).
- $200 annual book allowance.
- $75 monthly internet or phone reimbursement.
- Flexible Time Off.
- Company-issued laptop.
- Egg freezing, mental health, and employee wellness benefits.
Join us as a Site Reliability Engineer and help shape the future of blockchain technology. Your work will directly affect the reliability and performance of critical public infrastructure used by developers and users worldwide.
This remote role puts you in the rapidly growing blockchain industry, with a strong focus on mentorship and learning, on a team that values collaboration and innovation.
Who Will Succeed Here
- Proficient in Linux system administration, with hands-on experience troubleshooting and tuning performance, which is essential for operating Polygon Labs' production infrastructure.
- A self-motivated learner who works well remotely, with experience in scripting languages such as Python and Bash for automating operational tasks and improving system reliability.
- Familiar with container orchestration tools such as Kubernetes and monitoring tools such as Grafana and Prometheus, and proactive about keeping systems healthy and performant.