What are the responsibilities and job description for the Site Reliability Engineer position at Fusion HCR?
Fusion HCR is hiring! Direct hire, Site Reliability Engineer.
We are seeking a proactive and experienced Site Reliability Engineer (SRE) to enhance the reliability, scalability, and performance of our cloud-based systems, with a strong emphasis on Microsoft Azure. This role involves transitioning from reactive support to proactive system hardening, ensuring seamless integration of reliability best practices into our development and operational workflows.
Key Responsibilities:
- System Design and Implementation: Architect, implement, and maintain scalable and reliable cloud infrastructure solutions within Azure to support the company's applications and services.
- Monitoring and Incident Response: Develop and maintain comprehensive monitoring and alerting systems to detect and address issues proactively; respond promptly to incidents, conduct root cause analysis, and implement preventative measures.
- Automation and Infrastructure as Code (IaC): Automate deployment, configuration, and management processes using tools such as Terraform or Azure Resource Manager (ARM) templates to enhance efficiency and reduce manual intervention.
- Collaboration with Development Teams: Work closely with development teams to integrate reliability into the software development lifecycle, ensuring that design, testing, and deployment of new products and features are optimized for reliability.
- Performance Optimization: Conduct capacity planning, performance tuning, and optimization to meet operational demands and ensure high availability and performance of systems.
- Security and Compliance: Implement and enforce security best practices within the cloud environment, ensuring compliance with industry standards and regulatory requirements.
- Documentation and Training: Develop and maintain detailed documentation for system operations, reliability practices, and incident response procedures; provide training to team members as needed.
Qualifications:
- Education: Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field.
- Professional Experience: Minimum of 5 years in site reliability engineering or related roles, with substantial experience in Azure cloud services.
Technical Proficiency:
- In-depth knowledge of Azure services, including Azure Virtual Networks, Azure Active Directory, Azure Monitor, and Azure DevOps.
- Proficiency in scripting and programming languages such as Python, Bash, or PowerShell for automation purposes.
- Experience with containerization and orchestration tools like Docker and Kubernetes.
- Familiarity with CI/CD pipelines and related tools for seamless integration and deployment.
- Excellent analytical and troubleshooting abilities, with a proactive approach to identifying and resolving system issues.
- Strong verbal and written communication skills, with the ability to collaborate effectively with cross-functional teams and convey complex technical concepts to both technical and non-technical stakeholders.
- Relevant certifications such as Microsoft Certified: Azure Administrator Associate, Microsoft Certified: Azure Solutions Architect Expert, or similar are highly desirable.