Job Description
Summary
You will ensure the reliability of data projects, including the DWH, analytics services, and infrastructure platforms, working in a team of infrastructure and data engineers.
Looking ahead, the team has some exciting plans in the pipeline. One of the main goals is to expand the current offerings with DeFi integrations and insurance protections, providing users with more options and added security. Another important initiative is to allow DeFi protocols to build on top of the platform, which will open up new possibilities for the entire DeFi ecosystem. The platform team is dedicated to staying ahead of the curve in this rapidly evolving industry, and will continue to work tirelessly to ensure that P2P.org remains a leading player in the space.
Stack: K8s (GKE); GCP; Airflow, Superset, ClickHouse, Kafka, VictoriaMetrics; HC Vault; Loki; Istio; ArgoCD.
Who we are looking for 🔍
We are seeking a Senior SRE experienced with data tools who will be responsible for the reliability of our data platforms: improving service delivery pipelines, troubleshooting issues and investigating their root causes, and increasing observability.
You will 🚀
- Ensure reliability and scalability of complex data platforms
- Assist the data team with DevOps: train them to work with continuous delivery pipelines and monitoring systems, set up alerts, and provision secrets
- Migrate services to modern infrastructure platforms
- Advance our technology stack with innovative ideas and creative solutions
- Participate in capacity management of core systems and services, application analysis, and performance and security tuning. Provide operational support for systems and build automation that remediates root causes, with the goal of automating the response to all non-exceptional service conditions
- Maintain documentation, build tooling, and create alerts to identify and address infrastructure reliability issues
- Proactively identify system anomalies
You have 🤓
- At least 4 years of experience as an SRE
- Deep expertise in Kubernetes, with regular hands-on work over the past 2 years
- Comprehensive knowledge of ClickHouse (must) and Kafka (preferred)
- Experience with data tools (Airflow is a must; Superset and similar are a plus)
- Experience with scalable infrastructure monitoring solutions such as VictoriaMetrics (Prometheus), Loki, or ELK
- Fluency in GitOps automation tools (ArgoCD)
- Secret management is essential to succeeding in this role; Vault expertise is required
- Proficiency in at least one scripting language
- Fluent in English
Skills
- Communication Skills
- Development
- Software Engineering
- Team Training and Development