Job Description

Summary

In this position, you will lead a team of skilled Site Reliability Engineers responsible for the design, deployment, and maintenance of our production systems. You will play a crucial role in ensuring the reliability, scalability, and performance of our infrastructure, as well as driving continuous improvement initiatives. Your expertise in SRE practices and experience with the listed technologies will enable you to effectively guide the team towards achieving operational excellence. 

Responsibilities:

  1. Lead, mentor, and manage a team of Site Reliability Engineers, fostering a culture of collaboration, innovation, and operational excellence. Provide guidance and career development opportunities to team members.
  2. Develop, communicate, and execute the SRE team's strategic goals, objectives, and roadmap in alignment with the overall business objectives.
  3. Oversee the design, implementation, and maintenance of highly available and scalable production systems.
  4. Drive continuous improvement initiatives by identifying areas for enhancement and implementing best practices, automation, and process improvements.
  5. Collaborate with cross-functional teams and departments to ensure smooth integration of applications and systems.
  6. Define and enforce Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure system reliability and uptime.
  7. Monitor system performance, troubleshoot issues, and ensure timely incident response, root cause analysis, and problem resolution.
  8. Implement effective monitoring, logging, and alerting systems to proactively identify and mitigate potential issues.
  9. Stay up to date with industry trends, emerging technologies, and best practices related to SRE and DevOps, and apply them to improve operational efficiency.
  10. Identify potential risks to system reliability and implement strategies to mitigate them.
  11. Ensure that all systems and processes comply with relevant regulations, standards, and best practices.

Minimum Qualifications:

  1. Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
  2. Proven experience as a Site Reliability Engineer or in a similar role, with at least 3 years of hands-on experience managing production systems.
  3. Strong expertise in the listed technologies: Ansible, Concourse CI, Jenkins, GitHub Actions, EKS (Kubernetes), Linux administration, Terraform.
  4. Demonstrated experience in leading and managing a team of technical professionals for at least 2 years.
  5. Solid understanding of SRE principles, including reliability, scalability, availability, and performance.
  6. Proficiency in scripting and automation (e.g., Python, Bash, or similar).
  7. Experience with infrastructure-as-code (IaC) tools, configuration management, and CI/CD pipelines.
  8. Knowledge of cloud platforms (e.g., AWS, Azure, or Google Cloud) and containerization technologies (e.g., Docker).
  9. Excellent problem-solving skills and the ability to thrive in a fast-paced, dynamic environment.
  10. Strong communication and leadership skills, with the ability to collaborate effectively with both technical and non-technical stakeholders.

Preferred Qualifications:

  1. Relevant certifications, such as Certified Kubernetes Administrator (CKA) or AWS Certified DevOps Engineer.
  2. Experience with monitoring and observability tools (e.g., Datadog, New Relic, Prometheus, Grafana, ELK Stack).
  3. Familiarity with Agile methodologies and experience working in an Agile/Scrum environment.

Salary Range: The base salary range for this role is $172,000 to $215,000 in the State of New York, the State of California, and the State of Washington. This range does not include our discretionary bonus or equity package. When determining a candidate’s compensation, we consider a number of factors, including skill set, experience, job scope, and current market data.

Skills
  • AWS
  • Development
  • Leadership
  • Problem Solving
  • Python
  • Team Collaboration
© 2024 cryptojobs.com. All rights reserved.