In the role of Vice President - Site Reliability Engineering for Treasury Chief Investment Office (TCIO), you will work in a collaborative team of software professionals and be responsible for improving the health of the applications.
The Site Reliability Engineer will be part of a horizontal function responsible for ensuring that the practices, processes, and tools are in place to keep each application stable and functional. The SRE Lead will ensure the highest level of quality and success in supporting technical issues, DR testing, and hardware/software updates. The SRE is expected to implement DevOps practices, automate the release process, and develop scripts to automate manual processes.
You will be leading and working with other SRE members and development team members in the development and support of innovative technology solutions including user interfaces, middle-tier and server-side components, and will need to ensure adherence to architecture standards, risk management, and security policies.
As a Site Reliability Engineer (SRE), you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure, and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. As an SRE, you'll be focused on running better production applications and systems.
Responsibilities:
- Lead the TCIO SRE team, both functionally and as a people manager
- Design, code, test, and deliver software to automate manual operational work
- Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents
- Engage with the development team throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
- Identify application patterns and analytics in support of better service level objectives
- Design self-healing and resiliency patterns
- Design automated software and product upgrades, change management, and release management solutions
- Coach or manage teams as applicable
- Participate in 24x7 support coverage as needed
- Apply expertise in Incident, Problem, and Change Management processes and tools
- Collaborate across Application Development, Product, and Production Management to establish and maintain the Service Level Objective (SLO), Service Level Indicator (SLI), and Error Budget for key Production services
- Implement the telemetry and observability required to monitor and measure quality of service in real time against the established SLO
- Manage, track, and validate all changes to the Production and Disaster Recovery environments
- Manage priority incidents and leverage cross-functional teams to quickly eliminate impacts
- Escalate issues and risks effectively, when necessary, across the supporting framework
- Align IT service offerings with business strategies, goals, and objectives
- Troubleshoot key technical issues, or escalate and work with the appropriate technology teams to provide solutions
- Aggressively respond to service requests from client-facing support teams, Operations partners, etc.
- Manage applications and infrastructure to maximize stability and resiliency
- Leverage and improve monitoring and alerting capabilities to ensure application SLAs are met
- Maintain a strong focus on automation and processes
- Design, implement, improve, and utilize key monitoring tools
Qualifications:
- 15+ years of total experience managing similar functions, tools, and technologies
- Bachelor's degree or equivalent experience in a software engineering discipline
- Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex and mission-critical problems within a business or across the firm
- Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
- Excellent debugging and troubleshooting skills
- Expertise in performance monitoring and capacity management of large systems using various tools
- Deep expertise in the instrumentation, customization, and usage of modern monitoring tools such as Dynatrace, AppDynamics, Grafana, Splunk, Geneos, etc.
- Hands-on experience in one technology stack (Java/J2EE/C#.NET) with designing, coding, testing, and delivering software
- Expertise in at least one relational database (SQL Server, Oracle, DB2, etc.)
- Comfortable working in Agile mode and proficient in Continuous Integration and Continuous Delivery
- Solid analytical and problem-solving skills
- Attention to detail and time-management skills
JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses, and many of the world's most prominent corporate, institutional, and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years, and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing, and asset management.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as any mental health or physical disability needs.