What are the responsibilities and job description for the Site Reliability Engineer position at Careerbuilder-US?
Job description :
We are a cutting-edge biomedical startup preparing for our first product release. This is a unique opportunity to be on the ground floor of a rapidly growing biomedical company. We are a tight-knit, agile group with many capable engineering, medical, and business personnel on the team and board alike. We are looking to expand our team further by adding a strong software development arm to the company.
Current Project :
Client's Humero Tech C1 changes the way shoulder injuries are rehabilitated with our innovative strength-building and sensor-based technology. Our rotator cuff machine tracks patients' efforts as they work through strength-based exercises. At the end of each session, the user gets a set of in-depth metrics to help inform the next steps for recovery.
Client is at the very beginning of device rollout into the field, and thus Titin is searching for a talented Site Reliability Engineer to ensure customers have a smooth experience while working with our software and their data.
Additionally, a strong and positive personality is critical because this person will inevitably be communicating directly with our customers.
System Monitoring and Incident Management
Set up and maintain monitoring tools to track system performance, availability, and reliability.
Respond to incidents, troubleshoot issues, and ensure fast recovery to minimize downtime.
Implement alerting mechanisms to proactively identify potential issues before they impact end users.
Automation and Efficiency
Automate manual operations and repetitive tasks to improve system reliability and speed.
Write scripts and create tools to streamline deployment, monitoring, and scaling processes.
Work with continuous integration / continuous deployment (CI/CD) tools.
Infrastructure Management
Manage cloud infrastructure to ensure system reliability and scalability.
Monitor and maintain these systems to comply with HIPAA and SOC 2 requirements.
Performance Optimization
Analyze system performance and work on tuning to meet predefined service level objectives (SLOs).
Optimize resource usage, including compute, memory, and storage, to ensure cost-efficiency without sacrificing performance.
Disaster Recovery and High Availability
Develop, test, and implement disaster recovery plans.
Ensure high availability by using redundancy, failover mechanisms, and geographical distribution of systems.
Security and Compliance
Implement security best practices to safeguard data and systems.
Ensure compliance with industry regulations and internal security policies.
Cooperate with and respond to necessary compliance audits.
Collaboration and Communication
Work closely with development teams to integrate reliability into the software development lifecycle.
Participate in post-incident reviews to identify root causes and prevent future occurrences.
Provide technical support to teams and help to build a culture of reliability across the organization.
Documentation
Document incident response processes, infrastructure architecture, and SRE best practices.
Maintain clear, accessible records for troubleshooting, deployments, and maintenance tasks.
Generate work instructions to document tasks and enable smooth team expansion.
Continuous Improvement
Identify opportunities for process improvements and performance enhancements.
Keep up to date with the latest technology trends and industry practices, and adopt relevant innovations.
Application Question(s) :
Past Projects Portfolio
Education :
High school or equivalent (Required)
Undergraduate degree or equivalent experience (Preferred)
AWS Certifications (Preferred)
Required Experience :
Experience with AWS
Experience with Python
Experience with SQL / Databases
Knowledge of managing cloud-based infrastructure, networking, and storage.
Ability to write automation scripts for deployment, monitoring, and scaling.
Preferred Experience :
Experience with Linux / Unix systems
Experience with version control systems such as Git
Understanding of AWS IAM
Expertise in system administration tasks, such as patching, user management, and system performance tuning.
Familiarity with securing infrastructure, including access control, encryption, and vulnerability management.