
Site Reliability Specialist

Location Bengaluru, India
Posted 03-July-2022
Description
Company Profile

Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, wealth management and investment management services. With offices in more than 41 countries, the Firm's employees serve clients worldwide, including corporations, governments, institutions and individuals. For further information about Morgan Stanley, please visit www.morganstanley.com.


Enterprise Technology & Services

Enterprise Technology & Services (ETS) delivers shared technology services for Morgan Stanley, supporting all business applications and end users. ETS provides capabilities for all stages of Morgan Stanley's software development lifecycle, enabling productive coding, functional and integration testing, application releases, and ongoing monitoring and support for over 3,000 production applications. ETS also delivers all workplace technologies (desktop, mobile, voice, video, productivity, intranet/internet) in integrated configurations that boost the personal productivity of employees. Application and end user functions are delivered on a scalable, secure, and reliable infrastructure composed of seamlessly integrated datacenter, network, compute, cloud, storage, and database functions.


Job Profile

The Application Infrastructure (AI) SRE Ops & Support department is seeking a Site Reliability Engineer to drive the reliability engineering, operations and customer support services for Morgan Stanley's suite of IT Service Management (ITSM) products. AI SRE & Ops Support is a cornerstone of the Application Infrastructure organization in Morgan Stanley's Technology Division.

This position specializes in IT Service Management (ITSM) products, including ServiceNow and proprietary Change Management tooling, as well as the Morgan Stanley IPSoft Amelia Chatbot platform. The Morgan Stanley implementation of ServiceNow is a vendor-hosted solution involving a number of internally hosted technologies (SQL databases, ServiceNow API, Unix, AFS, SSL, SAML and other Web Infrastructure, etc.) that extend the functionality of base applications (Incident, Problem, Change, Service Requests, etc.). This team is also responsible for a suite of change management products and tools used to ensure changes are properly documented and detected, provide audit trails of changes, and ensure changes are properly authorized before being deployed. These change management tools are critical to protecting the firm and to ensuring it meets regulatory requirements. The team is also in charge of managing the firm's suite of IPSoft Amelia Chatbots, deployed to Docker using Ansible and other tooling, as well as overseeing the Chatbot platform's observability using the firm's strategic observability stack (Prometheus, Grafana, etc.).

The Site Reliability Engineering (SRE) team drives the reliability, recoverability and operational efficiency of this product portfolio. Reporting to the SRE Lead, key features of this role include implementing advanced observability, troubleshooting complex systems, task automation, and technical debt management.

Members of the SRE team align to an Agile squad and are expected to work closely with the ITSM user community on day-to-day usage of the products, as well as with our internal development and engineering squads and the offshore support team that provides first-line support.

Candidates will have the technical skills required to support these products on a Linux platform. Prior task automation experience in at least one programming language is expected and, through that, some experience as a user of typical software development lifecycle toolchains. Hands-on experience with at least one pillar of observability is required, ideally including experience in defining system monitoring, not just reacting to alerts. Cloud experience is not necessary, as training can be provided, but it would be an advantage.


Responsibilities include:
- Building and maintaining front-to-back knowledge of Application Infrastructure's IT Service Management products, and then specializing in one or two of them
- Maximizing the availability and performance of supported systems through optimized and automated plant management, ongoing problem management, and architecture reviews with dev-side peers
- Reducing the cost of support (hours of effort) through the elimination of operational issues, optimization and automation of tasks, development of operational tools and driving client self-service to minimize constraints
- Identifying and prioritizing technical debt that is impacting client developer productivity, reliability or the efficiency of the ops team
- Complex troubleshooting in a Linux environment
- Consulting with clients (the Firm's internal development community, IT service practitioners) to maximize their productivity, including troubleshooting the issues they have using the department's products
- Minimizing the escalation rate to the dev-side product delivery team members to ensure the department has the greatest possible flow of feature delivery
- Being operationally responsive, including sharing on-call rotation with the rest of the global team (with a time-off in lieu system)


Qualifications


Skill Set


Required Qualifications / Skills

- Strong Linux troubleshooting skills
- Task automation experience in any programming language
- Practical experience of at least one pillar of observability (metrics, logs or traces)
- Exhibit working knowledge in at least ONE of the following areas:
  o SQL
  o REST services (API)
  o Load balancing and networking
  o Performance troubleshooting and resolution
- Confident collaboration skills

Desired Skills

- Python development for task automation
- Experience with site reliability engineering practices, such as service level objectives (SLOs), error budgets, blameless postmortems and toil reduction
- Prior experience creating operational dashboards (Splunk, Grafana, etc.)

 