Our partner is a world leader in transportation equipment, leasing, and asset management. They are looking for an outstanding, strategic Sr. Enterprise Reliability Engineering Manager. To get on board with this great company and work for a truly wonderful human, please apply today. We value diversity in the workplace and encourage women, minorities, and veterans to apply. Thank you!
Location : Portland, OR (Hybrid)
Type : Permanent, Full-Time
Position Summary
The Sr. Enterprise Reliability Engineering Manager is a critical leadership role reporting to the VP, CISO & Enterprise Reliability Engineering, responsible for the overall reliability, performance, and automation of our IT Infrastructure services and systems. This includes managing and mentoring a talented team of Cloud, Systems, Server, and Network Engineers, fostering a culture of collaboration, growth, and technical excellence. You will lead the charge in improving the maturity of our IT infrastructure by aligning to the principles of Site Reliability Engineering (SRE) and proactively addressing security gaps and single points of failure to ensure a secure and resilient foundation for our business.
Responsibilities
To perform this job successfully, an individual must be able to perform the following essential duties satisfactorily. Other duties may be assigned to address business needs and changing business practices.
Leadership and Strategy :
- Define and execute the company's SRE strategy, incorporating SLAs, SLIs, and SLOs to measure and improve service reliability.
- Champion a "Microsoft first and cloud smart" approach, prioritizing the use of Microsoft tools and services and leveraging cloud technologies for scalability, optimization, and efficiency.
- Set departmental goals for individual teams, implement measurement tools, and motivate team members to achieve targets, fostering a culture of continuous improvement and innovation.
- Collaborate with cross-functional teams to design and advise on technology solutions, ensuring alignment with security and reliability standards.
Lead and mentor a team of Cloud, Systems, Server, and Network Engineers :
- Provide guidance, coaching, and professional development opportunities.
- Foster a positive and collaborative working environment.
- Conduct performance evaluations and provide constructive feedback.
- Build a strong team culture that values innovation, collaboration, and continuous improvement.
Reliability Engineering and Automation :
- Ensure proper monitoring and optimization of IT environments for high availability and performance, meeting or exceeding defined SLOs.
- Develop and implement automation solutions to prevent issues before they occur, using tools like Terraform, Ansible, or Azure Resource Manager (ARM).
- Design and implement secure-by-default frameworks, services, and systems that enable access management services, automate vulnerability management functions, increase visibility into platform security, and enhance secrets management.
- Oversee incident response and post-incident analysis, driving continuous improvement in system reliability.
- Create and maintain IT-related policies and procedures, ensuring they are up to date and aligned with industry standards and regulations.
Improve IT infrastructure maturity by :
- Identifying and closing critical security gaps.
- Eliminating single points of failure.
- Implementing robust monitoring and alerting systems.
- Driving automation to reduce manual effort and human error.
Service Level Management :
- Define, monitor, and report on SLAs, SLIs, and SLOs for critical IT services.
- Work with stakeholders to establish realistic SLOs that balance reliability with business needs.
- Use SLOs to guide capacity planning and resource allocation.
- Conduct regular reviews of SLOs and SLIs to identify areas for improvement.
Security and Compliance :
- Provide information and participate in security assessments during internal and external audits, working closely with the VP to address any identified vulnerabilities.
- Adhere to physical, administrative, and technical safeguards related to core business functions, ensuring compliance with all relevant regulations and standards.
- Integrate security into all aspects of SRE practices, ensuring that systems are secure by default.
Infrastructure as Code (IaC) and Azure DevOps (ADO) :
- Champion the adoption of IaC principles and practices across the IT infrastructure, driving automation and consistency in infrastructure provisioning and management.
- Integrate IaC with Azure DevOps pipelines for continuous integration and continuous delivery (CI / CD) of infrastructure changes.
- Manage and maintain Azure DevOps pipelines for infrastructure deployment and configuration management.
Qualifications
The following generally describes requirements to successfully perform the assigned duties.
Minimum Qualifications
- Proven experience as an SRE Manager or in a similar leadership role, reporting to senior IT executives.
- Strong understanding of SRE principles, practices, and tools.
- Experience with automation tools like Terraform, Ansible, or Azure Resource Manager (ARM).
- Expertise in incident response, post-incident analysis, and capacity planning.
- Excellent leadership, communication, and negotiation skills, with the ability to build strong relationships with stakeholders at all levels.
- Strong people leadership skills with a proven track record of building and motivating high-performing teams.
- Strong project management skills, including budget management, resource allocation, and risk mitigation.
- Familiarity with ITIL best practices and security standards.