Lead High Performance Computing Engineer

The George Washington University
Ashburn, VA Full Time
POSTED ON 3/4/2025
AVAILABLE BEFORE 6/2/2025

I. JOB OVERVIEW

Job Description Summary :

GW Information Technology (GW IT) provides empowering tools and caring support for all members of The George Washington University (GW) community. We are focused on driving digital transformation and innovation to enable the academic and operational excellence of our students, faculty, staff, and researchers. At GW IT, we are committed to cultivating a team culture that values diversity, inclusion, respect, and collaboration, and invests in each of our team members to grow in their technology and career skills.

Research Technology Services (RTS) is a team of Research Computing and Data (RCD) professionals within GW IT that provides and supports various cyberinfrastructure (CI) systems and services in support of GW’s research mission. The RTS service portfolio includes high-performance computing clusters, public cloud infrastructure, purpose-built computational, data management, and storage platforms, and application and workflow support across a wide array of research disciplines. All RTS members have a blend of “system-facing” and “researcher-facing” duties, with diverse responsibilities across Applications, System Administration, User Support, High-performance Computing, Research Data Management, Networking, Cloud Computing, and Cybersecurity.

As a Lead High Performance Computing (HPC) Engineer, you will be responsible for designing, implementing, and maintaining high-performance computing systems to meet the computational needs of the RTS. In collaboration with other HPC engineers, this senior position is accountable for the operations of multiple HPC systems and contributes to the strategic planning for next-generation services aligned with HPC. As part of an advanced team of engineers, this role works closely with the GW research community to define and deliver HPC and related advanced compute and storage infrastructure to support rapidly evolving research needs. The Lead HPC Engineer also engages directly on research projects to understand and consult on the best options available, and serves as the highest tier of support escalation for operational issues. This position develops and conducts advanced training and mentors other engineers on the team to enhance the interdisciplinary capabilities of the research technology support organization.

While the Lead HPC Engineer position focuses more on the HPC service than on other RTS services, the Lead HPC Engineer is encouraged to gain advanced knowledge and specialization in a broad range of the listed domains. As a leader in these areas within the RTS, this position should actively contribute to the support and adoption of new technologies within the Research Technology Services team.

Responsibilities Include :

  • Follow industry standards to plan and execute system-wide changes, meet with other IT stakeholders, and interact with other teams within GW IT to ensure the HPC environment is operating optimally.
  • Proactively monitor and gather statistics on the HPC infrastructure to identify problems and system issues, and work with RTS and GW IT engineers to resolve any bottlenecks in the systems.
  • Keep up with new research areas and new technologies to stay ahead of the needs of researchers.

Research Computing and Data (RCD)

  • Categorize and match research computing demands with appropriate platforms (e.g., cloud, HPC, and HTC) to aid researchers and stakeholders in planning to meet their research objectives.
  • Assist researchers in achieving compliance when storing and handling restricted or regulated data.
  • Create and update knowledge base articles, FAQs, and support documentation.

High-Performance Computing (HPC) and Big Data

  • Implement and maintain job schedulers, resource managers, and data transfer solutions for efficient HPC and big data operation.
  • Design, build, and manage scalable HPC clusters and storage solutions to support a wide range of research and computational workloads.
  • Lead the integration of HPC systems with big data platforms (e.g., Hadoop, Spark) as needed to process and analyze large datasets.
  • Collaborate with research scientists to optimize software implementations and workflows for HPC environments, enhancing performance and scalability.
  • Participate in outreach events and workshops to educate and update users about HPC developments, tools, and resources.
  • Support team members in ongoing HPC enhancements, maintenance and upgrades.
  • Work with others on complex R&D projects involving teams of scientific researchers, hackers, and developers.

Cloud Computing

  • Architect, deploy, and manage cloud-based HPC solutions (AWS, Azure, Google Cloud) for scalable, on-demand research computing resources.
  • Lead efforts to migrate traditional HPC workloads to cloud environments while maintaining performance and cost-effectiveness.
  • Ensure seamless integration between on-premises HPC systems and cloud infrastructure, enabling hybrid computing models.
  • Ensure monitoring of the cloud infrastructure and services, respond to alerts, and take appropriate actions to maintain system performance and availability.

Networking

  • Support the architecture and maintenance of high-speed networking solutions (InfiniBand, Ethernet) for low-latency, high-bandwidth data transfers in HPC environments.
  • Collaborate with networking teams to ensure secure, robust, and scalable data transfer between HPC clusters, storage systems, and research facilities.
  • Implement and monitor security best practices for data integrity, confidentiality, and regulatory compliance in HPC and cloud computing systems.
  • Troubleshoot network connectivity, routing, and switching issues.

AI & Machine Learning Integration

  • Maintain an understanding of ML / AI products and technologies.
  • Work closely with data scientists and researchers to integrate AI / ML workloads with HPC and cloud infrastructures.
  • Optimize AI model training and deployment using GPU-accelerated computing, distributed training frameworks, and HPC architectures.
  • Lead AI / ML-related projects that require high-performance computing resources for tasks such as model training, inference, and data analysis.

Research Computing & Collaboration

  • Collaborate with researchers, faculty, and technical teams to understand scientific workflows and compute requirements.
  • Provide technical leadership, mentorship, and training to junior engineers and researchers in HPC and cloud computing best practices.
  • Engage in strategic planning to enhance research computing capabilities, including capacity planning, infrastructure upgrades, and the adoption of emerging technologies.

Applications

  • Collaborate with users to ensure applications run efficiently, meeting both performance requirements and user expectations.
  • Provide training to users, as needed.
  • Install and deploy applications as needed by users.
  • Troubleshoot and resolve application-related issues.
  • Support planning and execution of application upgrades and deployments.

Data Management Plans

  • Develop and implement data management plans that comply with NIST / CMMC standards, ensuring proper data handling, storage, and retention.
  • Educate researchers and stakeholders on data management best practices and policies.
  • Audit current practices and deployments to ensure they are consistent with industry standards.

Storage Systems

  • Manage storage devices to store and retrieve large volumes of data for various research computing applications and platforms.
  • Optimize storage systems for performance, scalability, and disaster recovery.
  • Research and propose new storage systems and methodologies for use with HPC and other RTS systems.

Cybersecurity and Identity

  • Work with the security team to conduct regular security assessments and reviews of the cyberinfrastructure to identify vulnerabilities and risks. Recommend and implement security postures and protocols to mitigate potential threats and breaches.
  • Work with the GW Information Security team to support infosec-related activities.
  • Facilitate consistent identity and group management throughout research cyberinfrastructure with Enterprise Active Directory.
  • Performs other related duties as assigned. The omission of specific duties does not preclude the supervisor from assigning duties that are logically related to the position.

    While the position is designated at the GW Ashburn campus, RTS team members may have the option of choosing either Ashburn or Foggy Bottom as their primary location. Team members regularly are expected to travel between the campuses, regardless of their primary location.

    Minimum Qualifications :

    Qualified candidates will hold a Bachelor’s degree in an appropriate area of specialization plus 5 years of relevant professional experience, or a Master’s degree or higher in a relevant area of study plus 3 years of relevant professional experience. Degree must be conferred by the start date of the position. Degree requirements may be substituted with an equivalent combination of education, training, and experience.

    Additional Required Licenses / Certifications / Posting Specific Minimum Qualifications :

    Preferred Qualifications :

  • Experience in a large-scale production high performance computing environment.
  • Familiarity with a variety of HPC concepts and practices in the context of academic research, including a basic understanding of sponsored research compliance requirements.
  • Strong expertise in HPC technologies, including parallel computing architectures, job scheduling systems (e.g., Slurm), and interconnect technologies (e.g., InfiniBand, Ethernet).
  • Proficiency in programming languages commonly used in scientific computing.
  • Experience with HPC storage systems, file systems (e.g., Lustre, GPFS), and data management strategies.
  • Excellent leadership and communication skills, with the ability to effectively collaborate with stakeholders across the organization.
  • Knowledge of security best practices and experience implementing security controls in HPC environments.
  • Excellent oral and written communication skills; ability to prepare and present comprehensive presentations to IT and business executives.
  • Demonstrated experience working in an environment with rapidly changing job priorities.
  • Strong analytical and troubleshooting skills.
  • Ability to creatively improve workflows and processes.
  • Experience scripting in Perl, Python, or Bash.
  • Experience with Linux kernel modules, preferably for Lustre, NVIDIA GPUs, and Mellanox InfiniBand cards.
  • Familiarity with the Simple Linux Utility for Resource Management (Slurm) workload manager, or other job schedulers, including the setup and maintenance of a multi-factor fair-share priority scheme.
  • Familiarity with virtualization environments for front-end and maintenance image management.
  • Familiarity with ticket tracking systems and service level management.

    Hiring Range

    $92,790.58 - $150,696.60

    GW Staff Approach to Pay

    How is pay for new employees determined at GW?

    Healthcare Benefits

    GW offers a comprehensive benefit package that includes medical, dental, vision, life & disability insurance, time off & leave, retirement savings, tuition, well-being and various voluntary benefits. For program details and eligibility, please visit https://hr.gwu.edu/benefits-programs.

    II. JOB DETAILS

    Campus Location : Ashburn, Virginia

    College / School / Department :

    GW IT

    Family

    Information Technology

    Sub-Family

    High Performance Computing

    Stream

    Individual Contributor

    Level

    Level 3

    Full-Time / Part-Time :

    Full-Time

    Hours Per Week : 40

    Work Schedule :

    Monday - Friday, 8 am - 5 pm

    Will this job require the employee to work on site?

    Employee Onsite Status

    Hybrid

    Telework :

    Required Background Check :

    Criminal History Screening, Education / Degree / Certifications Verification, Social Security Number Trace, and Sex Offender Registry Search

    Special Instructions to Applicants :

    Employer will not sponsor for employment Visa status

    Internal Applicants Only?

    Posting Number : S013625

    Job Open Date : 02 / 24 / 2025

    Job Close Date :

    If temporary, grant funded, Sponsored Project funded or limited term appointment, position funded until :

    Background Screening

    Successful Completion of a Background Screening will be required as a condition of hire.

    EEO Statement :

    The university is an Equal Employment Opportunity / Affirmative Action employer that does not unlawfully discriminate in any of its programs or activities on the basis of race, color, religion, sex, national origin, age, disability, veteran status, sexual orientation, gender identity or expression, or on any other basis prohibited by applicable law.

    Salary : $92,791 - $150,697
