Job Description
Summary
We are a team that designs, develops, maintains, and improves software for various venture projects, i.e., projects that are adjacent to our core businesses and are bootstrapped quickly with a lean team. You will be actively involved in designing the components behind scalable applications, from frontend UI to backend infrastructure.
Responsibilities
- Contribute to system architecture design
- Improve and manage Terraform, Kubernetes (K8s), and CI/CD pipelines
- Develop tools for application and performance monitoring
- Develop and deploy cloud infrastructure using Infrastructure as Code (IaC)
- Design and implement disaster recovery and backup solutions
- Automate and streamline operations and processes
- Communicate, escalate, and follow up as appropriate to ensure that problems are solved
- Continuously improve availability, reliability, observability, and cost-effectiveness
Personal Attributes
- Desire to grow expertise, inform, and educate others
- Curiosity about finding root causes, backed by strong troubleshooting skills
- A fast learner who can pick up various technologies and has a “get things done” mentality
- Humble enough to embrace better ideas from others, eager to make things better, and open to challenges and possibilities
Requirements
- Experience coding in Python, Node.js, and shell scripts
- Experience with one or more public clouds: AWS, Azure, GCP
- Experience in designing resilient system architecture
- Experience in managing server infrastructure and Kubernetes clusters
- Experience in building CI/CD pipelines for large-scale enterprise applications
- Experience in MLOps is a plus
- Familiar with GitOps principles and CI/CD tools (GitHub Actions, Argo CD, Tekton)
- Familiar with containerisation and orchestration tools (Docker, Kubernetes, Helm)
- Familiar with IaC tools (Terraform, Ansible)
- Familiar with monitoring tools (CloudWatch, Grafana, Prometheus, ELK, and/or OpenTelemetry)
- Familiar with cloud/Linux networking, DNS, and common Linux tweaks and command-line tools
- Familiarity with Next.js/Nest.js, Flask/FastAPI, or Stable Diffusion/ComfyUI is a plus
Skills
- Communication Skills
- Development
- Problem Solving
- AWS
- Python