We are looking for Site Reliability Engineers who have a passion for developing reliable, distributed software systems that require high availability to support mission-critical business tasks. Your development background will help in designing large-scale, highly distributed, and fault-tolerant applications. It will also help in troubleshooting and resolving production issues that impact the customer experience and our revenue. Your systems background will help in ensuring uptime and reliability by monitoring deep system parameters and remediating issues at the systems level. SREs don't sit on the other side of the tossing fence; we're first-class engineering citizens and help lead our infrastructure focus.
Experience bringing software to production at high scale.
A knack for writing clean, readable, maintainable code.
Expertise in at least one of the following languages and a willingness to learn new ones: C/C++, Java, Golang, and Python.
Deep experience in designing, analyzing, and solving problems for large-scale distributed systems.
You have designed applications and systems that scale, are resilient to failure, and are observable.
An eye for automation and instrumentation.
Hands-on experience with at least one cloud platform: AWS, Azure, or GCP.
Expertise in system-level debugging.

Responsibilities
Work with a top-notch team using cutting-edge technology.
Automate as much as humanly possible and always configure as code.
Bring ideas to life (i.e. production) to help make the lives of engineers better.
Predict our future failures and work proactively to mitigate them.
Guide new SREs on aspects of system debugging.