Job Description

Summary

We are seeking an experienced Site Reliability Engineer to join our team and operate AI tools such as ControlNet, keeping them efficient and stable and continuously optimizing system performance.

Responsibilities:

  1. System Maintenance & Monitoring: Oversee daily operations of AI tools, including server, database, and network maintenance. Monitor system performance and address issues promptly.
  2. Deployment & Release: Manage deployment and version releases of AI tools. Implement CI/CD processes for automated deployments.
  3. Troubleshooting: Address and resolve issues in AI tools, analyze logs and monitoring data to find root causes, and propose solutions.
  4. Performance Optimization: Enhance deployment architecture, improve efficiency and stability, and implement performance tuning strategies.
  5. Security Management: Ensure tool security, conduct regular assessments, fix vulnerabilities, and implement data protection and backup strategies.
  6. Collaboration & Documentation: Work closely with development teams, contribute to system design and optimization, and maintain operations documentation. 

Qualifications:

  1. Education: Bachelor's degree or higher in Computer Science, Software Engineering, or a related field.
  2. Experience: 3+ years in operations engineering or a related role; experience with AI tools is preferred.
  3. Skills: Proficiency in Linux, monitoring tools (e.g., Prometheus, Grafana, ELK), automation tools (e.g., Ansible, Puppet, Chef), scripting (e.g., Python, Shell), cloud platforms (e.g., AWS, Azure, GCP), and AI tools (e.g., ControlNet).
  4. Other: Strong communication, teamwork, analytical, and problem-solving skills. Ability to work under pressure, a strong sense of responsibility, and a proactive attitude toward continuous learning.

Plus Points

  1. Exceptional problem-solving abilities and strong communication skills.
  2. Experience with AI or machine learning technologies and their integration into backend systems.
  3. Contributions to open-source projects or a strong presence in the developer community.
  4. Prior experience in a fast-paced startup environment.

What We Offer

  1. Competitive salary and equity package, commensurate with experience and location.
  2. Flexible working hours and a fully remote work environment, with the ability to collaborate effectively across time zones.
  3. A dynamic and collaborative work environment that fosters innovation, growth, and professional development.
  4. The opportunity to work on cutting-edge technologies and help shape the future of AI, transforming industries and making a global impact.

Skills
  • Analytical Thinking
  • AWS
  • Communication Skills
  • Machine Learning
  • Problem Solving
  • Python
  • Software Engineering
  • Team Collaboration