A financial services corporation headquartered in Pasadena, CA is looking for a Site Reliability Engineer to engineer automated data ingestion and service management. The main focus areas are infrastructure support, monitoring, configuration, automation, and testing. The group ensures site reliability for major product and feature development across the company's core applications. The engineer will apply everything-as-code methodologies to configuration, infrastructure, and orchestration. A successful candidate will design and implement large-scale improvements to existing systems, reviewing past incidents and drawing on prior systems knowledge to triage problems and tune resource usage. The engineer will also help define how services will run on the company's BaaS platform with Kubernetes, and will define and implement enhanced monitoring and logging solutions.
5+ years of experience working as a software engineer focused on product or feature development
Must be comfortable reading and writing code in Java
Experience with scripting languages such as PowerShell, Bash, and/or Groovy
Experience with IaC/CM tools (Terraform, Ansible, CloudFormation, Salt)
Hands-on experience in incident management, including working with support teams
Experience collaborating with Product Managers and Developers in feature or product development
Comfortable with one or more cloud service provider offerings (Azure, AWS, GCP)
Experience operating Kubernetes, Docker, Podman, or related technologies in a production environment