What are the responsibilities and job description for the Site Reliability Engineering Technical Writer position at Sunrise Systems Inc?
Job Title : Site Reliability Engineering Technical Writer
Job ID : 25-06712
Location : Remote US, primarily EST working hours
Duration : Through at least the end of 2025
Job Description :
The role will bridge the gap between documenting critical reliability functions and practices and actually performing them.
- Create organized workflows and documented processes with collaboration tools (e.g., PagerDuty, MS Teams, JIRA, and Confluence).
- Offer comprehensive instructions for routine operations and troubleshooting. Create guides for deployments, scaling, monitoring setup, backups, and updates.
- Create and maintain troubleshooting guides to resolve known issues with specific services or systems, including common symptoms, logs to check, resolution steps, etc.
- Document scripts and tools used for repetitive tasks, including usage instructions.
- Ensure teams can effectively monitor systems and respond to alerts by providing instructions for using monitoring dashboards and interpreting data.
- Provide a clear framework for handling and resolving incidents efficiently. This includes the runbooks, playbooks, or Paging Docs that help first-line responders quickly assess and resolve issues. Work with SMEs to drive content updates.
- Define when and how to escalate incidents, including contact details for on-call personnel. Manage templates for internal and external communication during an incident.
- Encourage learning from incidents to prevent recurrence and improve processes, emphasizing a learning culture over placing blame. Document underlying causes and contributing factors, with follow-up tasks, owners, and deadlines.
- Test system resilience and ensure preparedness for failures. Document findings.
- Document the operational goals, reliability standards, and expectations for each service by working directly with stakeholders to determine availability requirements.
- Ensure systems adhere to organizational policies and regulatory requirements. Define and document requirements for access controls, data encryption, and compliance.
Required Experience :
- Proficiency in using collaboration tools such as PagerDuty, MS Teams, JIRA, and Confluence to create and manage workflows and processes.
- Proven experience in creating comprehensive documentation for deployments, scaling, monitoring setups, backups, updates, and routine operations.
- Experience in documenting scripts and tools used for repetitive tasks, with clear usage instructions.
- Knowledge of monitoring systems and the ability to create guides for using monitoring dashboards, interpreting data, and responding to alerts.
- Ability to define and document escalation procedures, including contact details for on-call personnel.
- Experience in documenting underlying causes and contributing factors of incidents, and encouraging a culture of learning from incidents to improve processes.
- Experience in testing system resilience, documenting findings, and ensuring preparedness for failures.
- Ability to work directly with stakeholders to define operational goals, reliability standards, and availability requirements for services.
- Knowledge of organizational policies and regulatory requirements, including documenting access controls, data encryption, and compliance standards.
Contact :
Raghu : raghu.m@sunrisesys.com | 732-272-0336 | URL : www.sunrisesys.com