What are the responsibilities and job description for the Operations Command Center Engineer (OCCE2) position at Trident Consulting?
Operations Command Center Engineer (OCCE2)
Sacramento, California
Information Technology - IT Operations / Exempt / Hybrid
Berkshire Hathaway Homestate Companies, Workers Compensation Division, has an immediate opening for an Operations Command Center Engineer 2 (OCCE2). The OCCE2 will be responsible for handling incidents escalated by OCCE1, including deeper troubleshooting, incident management, and root cause analysis. This individual will provide technical expertise to ensure uptime and efficiency in the operations of IT systems, applications, and infrastructure, and will be involved in maintaining and updating monitoring tools, processes, and cloud-based solutions to enhance operational efficiency. The OCCE2 also contributes to key areas such as network management, system administration, and automation.
KEY RESPONSIBILITIES
- Manages escalated tickets from OCCE1 for advanced troubleshooting and problem resolution across network, system, and cloud platforms.
- Proactively monitors system health, performance, and uptime, ensuring continuous service availability using advanced monitoring and observability tools.
- Identifies recurring incidents and initiates root cause analysis for long-term resolution.
- Collaborates with cross-functional teams, including Applications, Infrastructure, Security, and Cloud teams, to resolve incidents.
- Configures, troubleshoots, and maintains network devices (e.g., routers, switches, firewalls) and ensures secure remote access (VPN, remote desktop solutions).
- Manages and maintains cloud infrastructure (AWS, Azure, GCP), including virtualization (VMware, Hyper-V) and automation (Terraform, Ansible).
- Develops and refines operational runbooks, playbooks, and response procedures, focusing on improving cloud governance and security.
- Participates in on-call rotations to support incident handling outside of normal business hours.
- Contributes to the continuous improvement of monitoring tools, cloud services, and incident management processes.
- Prepares and delivers post-incident reports, root cause analyses, and lessons learned to senior management.
- Ensures that SLAs related to response times, escalation, and ticket handling are met consistently.
- Coordinates shift handovers with detailed incident reporting and supporting documentation.
- Leads efforts on system administration (Windows, Linux, macOS), backup and disaster recovery procedures, and server management.
- Participates in project management efforts, capacity planning, and risk management for ongoing operations.
EDUCATION / EXPERIENCE
PREFERRED CERTIFICATIONS
SKILLS NEEDED
- Network & Infrastructure Management
- System Administration
- Cloud & Virtualization
- DevOps & CI/CD
- Security & Compliance
- Database Management
- Monitoring & Observability
- End-User Support & Troubleshooting
- Automation & Scripting
- Project Management & Documentation
- Data Analytics & Reporting