What are the responsibilities and job description for the Observability Solutions Architect position at CloudRex?
Key Duties and Responsibilities
- Build and maintain strong relationships with teams across the organization, including IT infrastructure, operations, application development, and business partners, ensuring understanding and alignment on observability and event management practices.
- Clearly communicate observability and event management concepts and best practices to individual contributors and leadership across the enterprise, building alignment and understanding and conveying the importance of key concepts and governance.
- Assist teams with observability projects and initiatives spanning various technology stacks, helping identify requirements, technology considerations, and solutions to provide optimal observability.
- Partner with SaaS vendors on product features, defects, and contract renewals.
- Create and maintain documentation and governance related to observability and event management, ensuring understanding at all levels of the organization.
- Conduct enablement and troubleshooting sessions with teams and individuals, ensuring proper and ongoing engagement with monitoring and event management tools.
- Leverage Jira and ServiceNow to manage work and communicate project status to leadership and stakeholders.
- Mentor junior team members.
- Design, maintain, and implement observability and event management architecture documentation and diagrams, ensuring new and evolving technologies are within scope of current and future practices and solutions.
- Champion the design, implementation, and support of applications, systems, and IT products crucial to the business's objectives.
- Help teams implement observability tools and leverage the available telemetry data to troubleshoot and resolve incidents and problems.
- Implement event management concepts, such as event aggregation and correlation patterns, reducing incident noise while meaningfully combining event data.
- Leverage observability and event management to improve key incident management metrics, such as mean time to detect and mean time to restore service.
- Design, develop, and implement innovative solutions to improve observability and event management practices and processes.
- Influence and drive cultural and organizational change from traditional IT Ops to modernized Agile operational philosophies and concepts.
Qualifications
- 5 years of experience with public cloud platforms (AWS and Azure).
- 5 years of direct experience with observability and event management tools, including New Relic, BigPanda, PagerDuty, and ServiceNow.
- 5 years working in application development and/or IT operations in large, complex environments, including on-prem and cloud infrastructure.
- Proven track record leveraging core observability concepts, including application performance monitoring, end-user monitoring, and infrastructure monitoring with SaaS solutions.
- Experience with programming and scripting languages, such as Go, Python, SQL, JavaScript, and PowerShell.
- Experience with Agile methodologies preferred.
- Experience with automation tools, such as Terraform.
- Excellent written and verbal communication.
- SRE and/or DevOps experience preferred, including practices, processes, and tools.
- Bachelor’s or master’s degree in computer science or a related field preferred.
Job Type: Full-time
Pay: $117,500.00 - $197,500.00 per year
Benefits:
- 401(k)
- 401(k) matching
- Dental insurance
- Health insurance
Compensation Package:
- Yearly pay
Schedule:
- 8-hour shift
- Monday to Friday
Ability to Relocate:
- Cary, NC 27511: Relocate before starting work (Required)
Work Location: In person