Job Description
Summary
Kraken is looking for an experienced Machine Learning Ops engineer to join our AI/ML Team in the centralized Data organization. In this role you will build infrastructure that supports the development of cutting-edge AI/ML technology to solve the most complex and exciting problems in the fast-growing, evolving crypto industry. We are looking for a strong communicator and team player who can break large, complex problems into smaller, more manageable ones. You will take the initiative to work with engineers across the team and organization, exploring different ways to resolve issues.
The opportunity
- Build ML and AI Ops infrastructure to enable the development and deployment of production models running at scale, including deployments across multiple cloud infrastructures
- Lead resource planning and optimization, especially with GPU instances
- Develop tools to load test various production AI/ML systems
- Work closely with SREs across the entire organization
- Support 24/7/365 uptime of services by participating in a partial on-call rotation along with other members of the team
- Stay up to date on machine learning and artificial intelligence trends and technologies while contributing to the growth of AI/ML in the crypto industry
Skills you should HODL
- Experience in deploying, maintaining, and monitoring production systems
- A minimum of 3-5 years of experience in DevOps, SRE, AI/ML engineering, or a similar discipline
- Familiarity with the software development lifecycle, DevOps (build, continuous integration, deployment tools) and best practices
- Programming skills in Python, Scala, Rust or other languages
- Good written and verbal communication and interpersonal skills
- Deep experience with Kubernetes and Docker
- Experience with AWS, specifically S3, Athena, EMR, SageMaker, and Lambda
- Experience with Terraform, MLflow, Flink, Kafka, MariaDB, and Nomad is a plus
- Knowledge of GenAI tools, such as LangChain, LlamaIndex, and open-source vector databases, is a plus
- Bachelor's degree in Computer Science, Machine Learning, or a related field
Skills
- AWS
- Communication Skills
- Development
- Generative AI
- Machine Learning
- Python
- Rust