Job Description

Summary

We're looking for a Senior Site Reliability Engineer to join our Infrastructure team. This engineer will help our developers work efficiently as they build a vibrant ecosystem for the Avalanche Blockchain. You'll support teams across several business units and engineering groups as they design, optimize, and implement greenfield technology for a variety of use cases. This role will be a key part of our release schedule and production monitoring.

WHAT YOU WILL DO

  • Develop and optimize highly reliable and scalable infrastructure focused on SRE principles.
  • Implement and maintain monitoring, logging, and tracing tools to gain insights into service behavior and health.
  • Uphold SLOs (Service Level Objectives), SLIs (Service Level Indicators), and error budgets for critical systems.
  • Enhance the reliability and resiliency of critical systems by identifying single points of failure and implementing best practices.
  • Collaborate with software developers to build reliability and performance into applications from inception.
  • Automate and streamline incident management processes to minimize service disruption and improve response times.
  • Participate in on-call rotations, ensuring quick restoration of services and fostering a blameless post-mortem culture.
  • Foster a continuous improvement mindset by analyzing and learning from incidents and implementing preventive measures.
  • Leverage cloud technologies and IaC tools to ensure scalability and repeatability.
  • Advocate for best practices in reliability, security, and maintainability within the team.

WHAT YOU WILL BRING

  • BS in Computer Science or a related field.
  • 5+ years of experience as an SRE, DevOps, or Cloud Engineer.
  • Strong grasp of SRE principles, including error budgets, SLOs, and SLIs.
  • Cloud networking and orchestration with AWS (EKS, ECS, VPC, S3, ELB).
  • Strong Kubernetes experience with Docker or rkt containerization.
  • Proficiency in Infrastructure as Code (IaC) using tools such as Terraform, Terragrunt, and Ansible.
  • Experience with monitoring and observability tools such as Prometheus, Grafana, or the ELK Stack.
  • Experience building and maintaining CI/CD pipelines with GitHub Actions (preferred), Jenkins, Travis CI, or CircleCI.
  • Experience with automation and configuration management using Ansible, Puppet or Chef.
  • Experience with Linux-based infrastructure (Ubuntu preferred).
  • Experience writing scripts and tooling in scripting and programming languages (Python and Go preferred).
  • Working knowledge of decentralized architecture design patterns and distributed systems.

Skills
  • AWS
  • Networking
  • Python
  • Team Collaboration
© 2024 cryptojobs.com. All rights reserved.