Job Description
Summary
We are seeking a Site Reliability Engineer (SRE) to design, optimize, and support highly available systems across our global trading infrastructure. As part of GSR’s SRE team, you will manage a multi-regional cloud environment while integrating and automating our physical server inventory using Infrastructure as Code (IaC).
You will work across all layers of infrastructure, including:
- Networking & Exchange Connectivity
- Linux Systems & Kubernetes Administration
- Microservice Orchestration & Observability
- Disaster Recovery & Security Optimization
Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers who drive automation, reduce friction, and enhance developer velocity through better tooling, CI/CD, and infrastructure design.
Who We’re Looking For
Core Skills
- Containers & Orchestration: Strong expertise in container security and Kubernetes (multi-cluster/global deployment is a plus).
- Distributed Systems & Messaging: Knowledge of clustered systems, distributed storage, Kafka, and Aeron, plus experience with multicast or HPC.
- Automation & IaC: Proficiency in Python, Golang, or Rust, with experience in IaC tools and immutable infrastructure.
- Continuous Delivery & Config Management: Familiarity with FluxCD, ArgoCD, and custom CD deployments. Strong grasp of CI/CD pipelines.
- Linux & Networking: Solid understanding of Linux internals, cgroups, routing, switching, firewalls, and DNS/service discovery.
- Databases: Experience with MySQL, MongoDB, and database administration (Flyway or Liquibase a plus).
Bonus Experience
- Data center operations
- Crypto, fintech, bare-metal provisioning, or trading experience
Skills
- Database Management
- Development
- Networking
- Python
- Rust
- Software Engineering
- SQL