What are the responsibilities and job description for the Principal Site Reliability Engineer position at Fidelity Investments?
Job Description :
Position Description :
Delivers services at high scale and high availability, with resilience, by using automation and Infrastructure as Code. Builds reliability into the ecosystem by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing. Manages systems using infrastructure as code tools (IAM, ARM, Terraform, and Chef). Utilizes modern monitoring tools (Datadog, Prometheus, and Splunk). Automates with scripting languages (Python and Shell). Helps teams scale through production insights, operational automation, developer guidance, and real-time metrics.
Primary Responsibilities :
- Performs instrumentation, applying systems skills to build and operate monitoring, logging, and alerting services for distributed systems at scale.
- Maintains scalability and resiliency in complex environments.
- Implements advanced observability practices and techniques at scale.
- Triages and executes root cause analysis.
- Manages and interprets large datasets using query languages and visualization tools.
- Communicates with both technical and non-technical audiences.
- Presents new software, methods and practices to developers.
- Works with a variety of individuals and groups in a constructive and collaborative manner, and builds and maintains effective relationships.
- Applies Cloud Computing and DevOps concepts, including continuous integration and continuous delivery (CI/CD) pipelines, in system and infrastructure maintenance.
Education and Experience :
Bachelor's degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and five (5) years of experience as a Principal Site Reliability Engineer (or closely related occupation) designing, building, deploying, and maintaining infrastructure and applications in Cloud providers - Amazon Web Services (AWS) and Azure.
Or, alternatively, Master's degree (or foreign education equivalent) in Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and three (3) years of experience as a Principal Site Reliability Engineer (or closely related occupation) designing, building, deploying, and maintaining infrastructure and applications in Cloud providers - Amazon Web Services (AWS) and Azure.
Skills and Knowledge :
Candidate must also possess :
Certifications :
Category :
Information Technology
Fidelity's hybrid working model blends the best of both onsite and offsite work experiences. Working onsite is important for our business strategy and our culture. We also value the benefits that working offsite offers associates. Most hybrid roles require associates to work onsite every other week (all business days, M-F) in a Fidelity office.