Job Description
Summary
We are looking for a seasoned Site Reliability Engineer (SRE) to design and advance our central big data infrastructure and services, ensuring that our data, services, and infrastructure are reliable, fault-tolerant, efficiently scalable, and cost-effective.
Responsibilities
- Engage in and improve the entire service lifecycle, from inception and design through deployment, operation, and refinement.
- Develop and maintain tooling, and redesign capacity-planning infrastructure for greater scalability.
- Troubleshoot, diagnose, and fix software issues, and ensure data security.
- Define architecture improvements and drive changes that increase reliability.
Requirements
- Source-level understanding of open-source big data components such as HDFS, HBase, YARN, Spark, Flink, Airflow, Kyuubi, ZooKeeper, Kafka, etc.
- Hands-on experience with at least one infrastructure automation tool (e.g., Ansible, Terraform).
- In-depth understanding of Linux and computer networking.
- Proficiency in at least one programming language (Python, Go, Java, etc.).
- Experience managing and using public clouds (AWS, GCP, Azure, etc.) is preferred.
- Minimum of 5 years of hands-on experience with backend systems or the big data ecosystem.
Skills
- AWS
- Development
- Java
- Python
- Software Engineering