Job Description

Summary

Kraken is looking for an experienced Machine Learning Ops engineer to join our AI/ML Team in the centralized Data organization. In this role you will build the infrastructure that supports cutting-edge AI/ML technology for solving the most complex and exciting problems in the fast-growing, evolving crypto industry. We are looking for an extremely strong communicator and team player who can break large, complex problems down into smaller, more manageable ones. You will take the initiative to work with engineers across the team and organization, exploring different ways to resolve issues.

The opportunity

  1. Build ML and AI Ops infrastructure to enable the development and deployment of production models running at scale, including deployments across multiple cloud infrastructures
  2. Lead resource planning and optimization, especially with GPU instances
  3. Develop tool(s) to load test various production AI/ML systems
  4. Work closely with SREs across the entire organization
  5. Support 24/7/365 uptime of services by participating in a partial on-call rotation along with other members of the team
  6. Stay up to date with machine learning and artificial intelligence trends and technologies while contributing to the growth of AI/ML in the crypto industry

Skills you should HODL

  1. Experience in deploying, maintaining, and monitoring production systems
  2. A minimum of 3-5 years of experience in DevOps, SRE, AI/ML engineering, or a similar discipline
  3. Familiarity with the software development lifecycle and with DevOps tooling (build, continuous integration, deployment) and best practices
  4. Programming skills in Python, Scala, Rust or other languages
  5. Good written and verbal communication skills and strong interpersonal skills
  6. Deep experience with Kubernetes and Docker
  7. Experience with AWS, specifically S3, Athena, EMR, SageMaker, and Lambda
  8. Experience with Terraform, MLflow, Flink, Kafka, MariaDB, and Nomad is a plus
  9. Knowledge of GenAI tools, such as LangChain, LlamaIndex, and open-source vector databases, is a plus
  10. Bachelor's degree in Computer Science, Machine Learning, or a related field

Skills
  • AWS
  • Communication Skills
  • Development
  • Generative AI
  • Machine Learning
  • Python
  • Rust