What are the responsibilities and job description for the Senior Site Reliability Engineer position at the Federal Reserve Bank of Boston?
All applicants must be US Citizens or Green Card holders who have resided in the US for the past 3 years.
Federal Reserve Financial Services (FRFS) delivers a suite of payments services to financial institutions via FedLine® Solutions, FedNowSM, Fedwire®, National Settlement Service (NSS), FedCash®, FedACH® (Automated Clearing House), and Check Services. We are currently leading a strategic effort to transform FRFS into a national, enterprise-focused organization. Through our evolved structure, we will meet the needs of the marketplace for new products and services more quickly, seek to provide a more robust and unified customer experience across our financial service
Candidates must live near one of our Reserve Bank locations.
- As a Senior Engineer on the SRE / Production Operations team, you will operate the production environment for the program.
- You will help architect, implement, and use monitoring solutions and tooling for capacity planning, utilization reporting, and scaling.
- The team uses open source and proprietary software to support Engineering, DevOps, and DevSecOps tools, services, and solutions.
- Design and develop CI/CD and infrastructure-as-code (IaC) pipeline automation.
- Own resiliency, disaster recovery (DR), and business continuity planning (BCP), including testing.
- The SRE / Production Operations team is part of the Technical Operations (TechOps) department. It is responsible for designing, managing, and executing the operations that support the program's ongoing technical and delivery needs, as well as its transition to production support and operations.
- The team works with internal stakeholders and customers on planning, delivery, and service management.
- It owns ongoing ITIL processes and drives continuous improvement initiatives.
- You will work closely with engineers and architects to maintain seamless automation across the entire platform.
- Proactively identify suspected gaps in the system architecture and design experiments to expose them.
- The ideal candidate loves building and maintaining reliable, scalable systems and CI/CD tooling, and automating cloud-based, highly available, high-performing applications.
Key Skills:
- Strong communication and collaboration skills
- Extensive knowledge of AWS environments and services (EC2, EBS, EKS, RDS, Aurora, S3, Route 53, ELB, IAM, etc.)
- HashiCorp Terraform, Consul, and Vault; Ansible
- Automation experience, preferably using GitLab
- Experience with scripting languages, preferably Python, for process automation
- Experience working in Linux environments and with shell scripting
- Experience supporting infrastructure for large multi-service applications
- Experience with continuous deployment in microservices architectures
- Experience with Docker, containers, Amazon ECR, and Amazon EKS
- Observability: CloudWatch, OpenSearch, Dynatrace, Grafana, Prometheus
- Familiarity with fault injection tooling (e.g., AWS Fault Injection Simulator, Gremlin, Chaos Toolkit, Chaos Monkey)
- An automation mindset that brings consistency and dependability to routine tasks
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, Mathematics, Information Technology, or related field.
- Minimum 5 years of SRE experience in an enterprise cloud-based system.
- Proficient with Linux/Unix systems and scripting languages (Bash, Python, etc.).
- Proficient in AWS
- Proficient in Git and GitOps
- Experience with infrastructure-as-code tools such as Terraform and CloudFormation.
- Strong knowledge of containerization and orchestration (Docker, Kubernetes).
- Expertise in CI/CD tools like GitLab CI
- Experience with configuration management tools like Ansible
Additional Qualifications:
- Excellent troubleshooting skills with the ability to quickly identify and resolve system performance and reliability issues.
- Strong written and verbal communication skills, with the ability to explain technical concepts to non-technical stakeholders.
- Experience working with sensitive data.