Senior Site Reliability Engineer at Ripple | Singapore | Full-Time | cryptojobs.com

We are seeking a new Site Reliability Engineer (SRE) to join our team in Singapore.

WHAT YOU’LL DO:

Keep your assigned site or service up and running, and recover rapidly from failures,
Actively troubleshoot any issues that arise during testing and production, catching and solving issues before launch,
Automate work including infrastructure needs, testing, failover solutions, failure mitigation, and much more,
Monitor and troubleshoot highly scalable, distributed server clusters that perform various functions, from web servers to machine learning processing,
Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers with customer incidents,
Participate in and help establish best practices in Site Reliability Engineering,
Manage code deployments, fixes, updates, and related processes,
Work with a close-knit team and brainstorm on the best ways to tackle complex problems in infrastructure, security and monitoring,
Provide technical guidance and educate team members and coworkers on monitoring and logging. (Have an interesting idea or solution? Present it!),
Automate any software maintenance processes that previously required a manual procedure.

WHAT YOU’LL BRING:

5+ years’ experience with software engineering, software development, or system operations in highly available, high-traffic environments
Strong experience with Linux-based infrastructures, Linux/Unix administration, Azure and AWS
Experience with databases such as PostgreSQL
Experience administering Linux servers as well as Docker-based infrastructure (like Kubernetes, AKS, etc.) in a highly available environment
Experience with scripting languages such as TypeScript, Java, Bash
Experience with message broker/queue technologies like RabbitMQ, AMQP 1.0
Experience with modern monitoring, logging and observability tools in complex distributed systems, such as Application Insights, Grafana, New Relic, Splunk, Elastic Stack, Datadog, Prometheus, etc.
Practical experience with infrastructure-as-code (with tools like Terraform, Chef, Ansible, etc.).
Good understanding of cybersecurity fundamentals and best practices
Stellar problem-solving and troubleshooting skills with the ability to spot issues before they become problems
Excellent communication skills
Committed to processes, with excellent documentation skills and a strong ability to work well in a team!
