What are the responsibilities and job description for the Fault Isolation & Management Engineer position at NEXTGEN Innovation Labs?
JOB TITLE: Fault Isolation & Management Engineer
LOCATION: Denver, CO (onsite only)
DURATION: 12-MONTH CONTRACT
Job Description:
The Fault Isolation & Management (FIM) Engineer, Wireless NOC, will support our wireless 24/7 Network Operations Center (NOC). A FIM Engineer will work closely with Dell hardware, vendor, and Engineering teams, and with peers across the NOC, on wireless projects and new initiatives and to ensure platform-related outages are resolved within SLA. This position will escalate complex issues to our Engineering teams, determine root causes of failures, and develop corrective actions, which can range from configuration changes to new operational procedures. This pivotal role requires exceptional verbal and written communication skills, a deep understanding of complex system and solution architectures, and the ability to demonstrate engineering expertise by debugging intricate issues in enterprise solutions. You will lead technical innovations, mentor fellow engineers, and troubleshoot critical design challenges to uphold the highest standards of quality and reliability. As a strategic member of our team, you will have the opportunity to implement and optimize cutting-edge processes and procedures in ORAN technology. This position will also contribute to ongoing process improvement reviews, using Key Performance Indicator (KPI) metrics to eliminate errors, maximize hardware and software efficiency, and increase service uptime, leading to an excellent overall customer experience.
Responsibilities
- Drive continuous development and enhancements of server hardware platforms, ensuring optimal performance and scalability.
- Provide custom integration and implementation support for EMC products, including software, directly at customer sites.
- Leverage extensive experience with Dell PowerEdge hardware (XR11, R740, R750) and OpenManage Enterprise (OME) software to trace product data flow through the entire product lifecycle, conducting comparative analysis of large data sets, particularly within 5G RAN environments.
- Proficiency with Supermicro servers is an added advantage.
- Draft clear and effective technical solutions that demonstrate business value, ensuring alignment with operational procedures, and document proposals professionally in written reports or presentation slides.
- Provision systems and solutions in detail according to the operational runbook, ensuring compliance with customer-specific standards.
- Develop and implement cost-effective methods for testing and troubleshooting systems and equipment to ensure reliability and efficiency.
- Manage multiple assignments and activities simultaneously, ensuring alignment with operational objectives and project timelines.
- Work closely with other support teams and engineers to resolve complex issues and improve support processes.
- Provide technical support for Dell OpenManage Enterprise (OME), including troubleshooting and resolving issues.
- Maintain records of support cases, vendor interactions, agreements, and performance metrics.
- Build and maintain strong relationships with vendors to ensure timely and efficient service delivery.
- Identify potential hardware issues before they occur through routine checks and preventive maintenance practices.
- Own the handling of outages and service degradations: perform the initial RCA and coordinate with vendors and engineering teams to compose the final RCA.
- Own the market chat groups, providing initial troubleshooting and necessary support.
- Support site monitoring and health checks following maintenance activities (CRs).
- Oversee the incident management process and the team members involved in resolving each incident, and drive ticket management analysis and follow-up until closure.
- Assist with prioritizing/deprioritizing incidents according to their urgency and impact on the business.
- Collaborate with, and escalate issues to, the FIM, Advance Ops, Engineering, and vendor teams when critical or time-sensitive support and resolution are needed.
- Manage outages and emergencies, including the agreed assurance KPIs and SLAs.
- Work in close collaboration across multiple functions within the NOC (RF, Deployment & Integration, Tech Dev, Core, Cloud Infra, Network Engineering, and Market teams) to document troubleshooting steps and methods to improve processes.
- Assist in driving resolutions for customer complaints (CXO) within service level agreements (SLAs) and ensure effective operational performance and management.
- Own responsibility for keeping trouble tickets updated with all technical details, and for troubleshooting MOPs and templates.
- Ensure the open incident backlog stays at optimum levels.
- Maintain national-level availability above 99.50%.
- Manage internal, external and customer incident escalations and follow-ups as well as process adherence.
- Contribute to ongoing process improvement reviews, identifying areas for automation and efficiency improvements that increase service uptime and improve the overall customer experience.
- Maintain a detailed working knowledge of network technologies and understand how data moves between Cloud, PaaS solutions, and legacy TDM/IP environments.
Skills, Experience and Requirements
- Bachelor's or Master's degree, or equivalent.
- Minimum of 7-10 years of telecom/wireless experience working in a RAN domain.
- Extensive experience in design architecture and product development, specifically with enterprise server and storage products.
- Strong expertise in PCIe infrastructure and in-depth understanding of pivotal storage concepts such as SAS/SATA and NVMe.
- Solid working knowledge of key system-level concepts, including PCIe VDM and L2C.
- Proven solution-level knowledge of firmware, device drivers, and server storage management strategies.
- Proficient in operating and debugging using protocol analyzers with a strong understanding of embedded systems.
- Ability to identify product weaknesses and provide insightful design guidance and recommendations for improvements.
- Strong experience in VMware and Linux OS (Ubuntu, Debian, or RHEL) debugging.
- Hands-on experience with virtual environments (VMs) and their management.
- Hands-on experience with Kubernetes and its management.
- Extensive understanding of 5G ORAN split functions (RU, DU, CU, CP, UP…).
- Extensive understanding and experience in RAN/vRAN software architectures in a virtualized environment.
- Extensive understanding of RF radio operations over eCPRI interfaces, including CSR, DU, and timing configurations.
- Extensive understanding of software systems/services running virtualized ORAN functions.
- Understanding of latency considerations and issues from gNB to Data Center locations.
- Understanding and experience with virtualized networks, orchestration and automation across a large network architecture.
- Thorough understanding of vRAN/ORAN architecture as described in 3GPP standards.
- Able to work under pressure commensurate with the scale of business impact, and to build strong working relationships both internally and externally.
- Reliable, open and capable of working with minimum supervision.
- Flexible, analytical thinker.
- Enthusiastic and keen to learn new technologies and approaches.
- Self-motivated and achievement-oriented, with excellent people management skills and the ability to perform challenging tasks independently.
- Detail-oriented, with strong organizational skills.
- Able to work in a fluid atmosphere, handle multiple tasks, and set priorities.
Regards,
Piyush Kumar
US IT Recruiter
NEXTGEN INNOVATION
Email: piyushk@nextgeninnovation.com