What are the responsibilities and job description for the Senior Site Reliability Engineer position at Unisys?
Overall years of experience:
- 8 years of related experience in the candidate's specific area, including experience leading teams on projects of similar scope and complexity.
- Bachelor’s or Master’s degree in computer science or equivalent.
- Certifications: AWS Solutions Architect, Agile Certified Practitioner (ACP), or relevant cloud certifications.
Job Description:
- We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our team.
- The ideal candidate will have a strong background in AWS cloud platforms, DevOps practices, and modern software development frameworks.
- The Site Reliability Engineer (SRE) will play a critical role in designing, building, and maintaining highly scalable, fault-tolerant, and secure cloud infrastructure while ensuring operational excellence, high availability, and reliability.
- Candidates should have programming or development knowledge in Java or Python; a good understanding of either language, or both, is a strong plus.
Key Responsibilities:
1. Cloud Infrastructure & Automation:
- Design, implement, and manage cloud-based infrastructure on AWS.
- Utilize Infrastructure-as-Code (IaC) tools such as Terraform, CloudFormation, and Ansible to automate deployments and configurations.
- Create robust automation for anomaly detection, toil reduction, recovery processes, and self-healing mechanisms, and optimize cloud costs.
2. DevSecOps & CI/CD:
- Apply a deep understanding of DevSecOps principles to CI/CD pipelines using tools like GitLab, Jenkins, SonarQube, Nexus/Artifactory, and Docker.
- Implement security best practices, including IAM roles, RBAC, vulnerability remediation, and SAST/DAST/SCA tools.
3. Observability & Incident Management:
- Design and implement monitoring, logging, and distributed tracing solutions using tools like AWS CloudWatch, Splunk/SignalFx, Dynatrace, and OpenTelemetry.
- Lead root cause analysis, blameless postmortems, and proactive incident management to minimize MTTR and MTTD.
- Define and monitor SLOs, SLIs, and error budgets to ensure system reliability.
4. Microservices & API Management:
- Architect and manage microservices, serverless computing, and RESTful APIs.
- Ensure fault tolerance and resilience using design patterns like Circuit Breaker, Retry, Timeout, and Bulkhead.
5. Chaos Engineering & Resiliency:
- Conduct chaos engineering experiments using tools like AWS FIS and Chaos Toolkit.
- Perform resiliency assessments using Resilience Hub and implement self-healing solutions.
6. Database & Application Support:
- Manage and optimize database technologies such as PostgreSQL, MongoDB, DynamoDB, Oracle, and Redshift.
- Provide production support, including incident response, problem management, and runbook creation. Participate in on-call rotations.
7. Collaboration & Communication:
- Collaborate with cross-functional teams to implement shift-left testing practices (BDD, TDD, Unit, Regression).
- Create and maintain architecture diagrams, knowledge articles, and disaster recovery plans.
- Communicate effectively with stakeholders and demonstrate strong relationship management skills.
Required Skills & Qualifications:
- Expertise in cloud platforms (AWS) and container orchestration.
- Proficiency in programming/scripting languages such as Python, Java, Node.js, Bash, and PowerShell.
- Strong knowledge of database technologies (e.g., PostgreSQL, MongoDB, DynamoDB, Oracle, Redshift).
- Experience with DevOps tools (Jenkins, Docker, Nexus/Artifactory) and build tools (Maven, Gradle).
- Familiarity with AI/ML integrations, event-driven architectures, and distributed systems.
- Expertise in observability, logging, and monitoring tools (AWS CloudWatch, Splunk, Dynatrace, OpenTelemetry).
- Strong understanding of security practices, including IAM, RBAC, and vulnerability management.
- Experience with chaos engineering, resiliency assessments, and disaster recovery planning.
- Proficiency in performance testing tools (JMeter, LoadRunner) and capacity planning.
- Excellent verbal and written communication skills, with the ability to collaborate across teams.
Preferred Qualifications:
- Experience with AI/ML libraries (e.g., NLTK, Transformers, spaCy, SciPy), Amazon SageMaker, and GenAI tools.
- Familiarity with project management tools like Jira, Confluence, and ServiceNow.
- Knowledge of utilities like AWS CLI, Postman, and curl.