What are the responsibilities and job description for the Digital Site Reliability Engineer position at Technogen, Inc.?
Job Description:
We are seeking a highly skilled and experienced Reliability Engineer to join our team. The ideal candidate must have a strong background in technology, with specific expertise in Kubernetes, GitLab, Dynatrace, GraphQL, Node.js, and React, along with a good understanding of CI/CD pipelines. The candidate must be comfortable with ambiguity, eager to learn new things, and persistent, in the spirit of “if at first I don’t succeed, try and try again.”
Responsibilities:
• Collaborate with cross-functional teams to develop and maintain release architectures and monitoring frameworks.
• Provide system design consulting and critical support to the development team prior to program launch.
• Identify and solve sophisticated performance and scaling issues, working with engineers to avoid bottlenecks and meet traffic demands.
• Mentor and guide team members, helping them grow in their roles.
• Identify and implement automation and monitoring tools to improve the efficiency and effectiveness of SRE processes.
• Take ownership of any critical incidents and work towards timely resolution and prevention of future occurrences.
Mandatory Requirements:
• Five (5) to seven (7) years of professional experience in technology or a related field.
• Two (2) years of experience with Kubernetes/EKS.
• Two (2) years of experience with CI/CD pipelines.
• Two (2) years of experience with a sophisticated observability platform including RUM and APM.
Good-to-Have Requirements:
• Familiarity with reading and understanding JavaScript (Node.js).
• Proficiency with Dynatrace APM and RUM (experience with other APM or RUM tools may be applicable). Dynatrace Associate Certification is a plus.
• Intermediate to advanced skills in Bash shell scripting, Python, and Docker.
• Intermediate skills in creating, troubleshooting, and configuring on-prem GitLab CI pipelines.
Preferred Qualifications:
• Ability to solve sophisticated performance and scaling issues, working with engineers to avoid bottlenecks and meet traffic demands during organic growth and marketing events.
• Strong problem-solving skills and the ability to work in a fast-paced environment.
• Ability to communicate effectively with stakeholders, including management, to provide updates, recommendations, and solutions for any SRE-related issues.
• Excellent communication and collaboration skills.
• Experience with Kubernetes/EKS and pod lifecycle management, including readiness and liveness checks.
• Experience with building and supporting CI/CD pipelines and production releases.
• Working knowledge of complex, CDN-cached website architectures.