What are the responsibilities and job description for the Principal Site Reliability Engineer (Cortex Cloud Security Posture Management) position at SIDRAM TECHNOLOGIES?
Atlanta, GA - Internal
As a Principal SRE with the Cortex Cloud Security Posture Management team, you will:
- Cloud Expertise - Use your expertise in monitoring cloud platforms, particularly Google Cloud Platform, to optimize our infrastructure with cloud-native technologies
- Incident Management - Leverage incident management processes to ensure efficient resolution of system issues and minimal impact on services
- Automation - Automate complex monitoring and alerting tasks by building tools for cloud operations, such as automated remediation of known issues and auto-scaling
- CI/CD - Develop and maintain application deployment tooling built on Terraform and Helm
- Continuously Improve - Stay up-to-date with cutting-edge technologies, evaluate their potential impact on our operations, and implement them when appropriate
- On-Call - Work with our DevOps team to provide follow-the-sun operational coverage for our production SaaS product
- Collaborate - Work with our Engineering team to influence the operability of the product and ensure the reliability and availability of our services
Your Experience
- Incident and Alerts Management - Clear understanding of incident and alerts management in Site Reliability Engineering
- DevOps/SRE Expertise - 5 years of experience as a DevOps/SRE engineer with a passion for technology and a strong motivation for high reliability at the service level
- Cloud Proficiency - High proficiency in either Google Cloud Platform or Amazon Web Services
- Kubernetes and Docker - High proficiency with Kubernetes and Docker for containerization and container orchestration
- Scripting and Automation - High proficiency in Python programming and Linux shell commands; experience with Terraform for infrastructure as code
- Security - Strong grasp of security concepts and best practices
- Observability - Experience with observability and incident response tools
- Communication Skills - Effective communication and interpersonal skills, with the ability to work and coordinate between multiple teams
- Troubleshooting - Ability to effectively troubleshoot and address emerging and complex problems
- Independence - Ability to operate independently, make decisions, take action, and take responsibility