Dartmouth College is hiring a Research Cyberinfrastructure Engineer II, HPC and GPU Cluster (RCIEII), near Hanover, NH
Position Details

Position Information
Posting date: 06/10/2024
Closing date: Open until filled
Position Number: 1128918
Position Title: Research Cyberinfrastructure Engineer II, HPC and GPU Cluster (RCIEII)
Department this Position Reports to: Research Cyberinfrastructure
Hiring Range Minimum: $99,400
Hiring Range Maximum: $114,300
Union Type: Not a Union Position
SEIU Level: Not an SEIU Position
FLSA Status: Exempt
Employment Category: Regular Full Time w/end date
Scheduled Months per Year: 12
Scheduled Hours per Week: 40
Schedule: M-F, 8a-5p
Location of Position: Hanover, NH
Remote Work Eligibility: Hybrid
Is this a term position? Yes
If yes, length of term in months: 36
Is this a grant funded position? No

Position Purpose
The Research Cyberinfrastructure Engineer II (RCIEII) enhances research computing infrastructure, focusing on administration, High-Performance Computing (HPC), cloud, and advanced computing solutions. Responsibilities include building and maintaining a graphics processing unit (GPU) cluster primarily used for artificial intelligence (AI) and machine learning (ML) workloads. This role increases infrastructure security, availability, and scalability, leading automation and system optimization initiatives to advance research capabilities. The RCIEII provides advanced support, develops innovative solutions, and leads projects to enhance research success.

Description
Join Our Team as a Research Cyberinfrastructure Engineer II, HPC and GPU Cluster at Dartmouth!
Are you ready to enhance the future of research computing? Dartmouth is looking for a dynamic Research Cyberinfrastructure Engineer II (RCIEII) to innovate and lead in HPC and GPU cluster administration.

About The Role:
As an RCIEII, you will enhance research computing infrastructure, focusing on building and maintaining a GPU cluster for AI and ML workloads. You will ensure infrastructure security, availability, and scalability while leading automation and system optimization initiatives.

What You’ll Do:
Lead Projects: Manage and optimize HPC environments and cloud-based infrastructures, focusing on high availability and performance.
Innovate: Implement cutting-edge computing services and applications, integrating GPU technologies into HPC environments.
Collaborate: Build strategic partnerships with IT departments, technology providers, and research groups to foster collaboration.
Mentor and Train: Create knowledge-sharing platforms, coordinate hackathons and workshops, and promote continuous development.

Your Skills And Expertise:
Bachelor’s degree in Computer Science/IT or equivalent experience.
3 years in research computing, focusing on HPC system optimization and security.
Proficiency in scripting (Python, Bash) and automation tools (Ansible, Terraform).
Expertise in Linux, Windows server management, and container technologies (Docker, Kubernetes).
Skilled in cloud platforms (AWS, Azure, Google Cloud) and HPC software deployment.

Why Dartmouth?
Impactful Work: Contribute to groundbreaking research and innovative projects.
Collaborative Environment: Work with a diverse and interdisciplinary team of experts.
Professional Growth: Continuous learning and professional development opportunities.

Join Us:
Be a part of a team driving innovation in research computing. Apply now to lead the future of research cyberinfrastructure at Dartmouth!

Required Qualifications - Education and Yrs Exp
Bachelor's plus 3-5 years' experience or equivalent combination of education and experience

Required Qualifications - Skills, Knowledge And Abilities
Bachelor’s degree or equivalent experience in Computer Science/IT.
3 years in research computing, focusing on HPC system optimization and security.
Proficient in scripting (Python, Bash) and automation tools.
Proven project success in enhancing research computing environments.
Expertise in Linux and Windows server management.
Experienced in Docker and Kubernetes.
Familiar with Ansible, Terraform, Puppet for automation.
Strong analytical and problem-solving skills.
Skilled in cloud platforms (AWS, Azure, Google Cloud).
Effective communication and teamwork skills.
Leadership experience in mentoring and team development.
Preferred Qualifications
Advanced degree or certifications in relevant fields.
Expertise in AI/ML software and frameworks.
Experience with CUDA programming and/or C/C++.
Professional certifications (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect).
Experience in academic/research IT environments.
Hands-on data center operations experience.
Proficient in HPC software deployment and troubleshooting.
Skilled in cloud services for HPC workloads.
Experience in developing and maintaining infrastructure documentation.
Innovative in developing new services and applications.
Comprehensive understanding of security in computing environments.
Excellent troubleshooting skills using command-line tools and vendor support.

Department Contact for Recruitment Inquiries: Jonathan Kulp
Department Contact Phone Number: 603.646.6110
Department Contact for Cover Letter and Title: Elijah Gagne
Department Contact's Phone Number: 603.646.9650

Equal Opportunity Employer
Dartmouth College is an equal opportunity/affirmative action employer with a strong commitment to diversity and inclusion. We prohibit discrimination on the basis of race, color, religion, sex, age, national origin, sexual orientation, gender identity or expression, disability, veteran status, marital status, or any other legally protected status. Applications by members of all underrepresented groups are encouraged.

Background Check
Employment in this position is contingent upon consent to and successful completion of a pre-employment background check, which may include a criminal background check, reference checks, verification of work history, conduct review, and verification of any required academic credentials, licenses, and/or certifications, with results acceptable to Dartmouth College. A criminal conviction will not automatically disqualify an applicant from employment. Background check information will be used in a confidential, non-discriminatory manner consistent with state and federal law.

Is driving a vehicle (e.g. Dartmouth vehicle or off road vehicle, rental car, personal car) an essential function of this job? Not an essential function

Special Instructions to Applicants
This position is a 36-month term position. Dartmouth College has a Tobacco-Free Policy. Smoking and the use of tobacco-based products (including smokeless tobacco) are prohibited in all facilities, grounds, vehicles or other areas owned, operated or occupied by Dartmouth College with no exceptions. For details, please see our policy: https://policies.dartmouth.edu/policy/tobacco-free-policy

Additional Instructions
Quick Link: https://searchjobs.dartmouth.edu/postings/74282

Key Accountabilities
Description
Cyberinfrastructure Operations
Integrates GPU technologies into HPC environments, collaborating with researchers and HPC programmers.
Acts as a Subject Matter Expert (SME) in cloud services, HPC, automation, storage, and container technologies (e.g., Docker, Kubernetes), providing advanced support and consultancy.
Manages and optimizes HPC environments and cloud-based infrastructures, focusing on high availability, efficient load balancing, and performance across platforms such as AWS and GCP.