What are the responsibilities and job description for the HPC System Engineer position at Magicforce?
Job Details
We are seeking a highly skilled and experienced Senior HPC Specialist to design, implement, and maintain high-performance computing (HPC) systems and solutions. The successful candidate will play a critical role in optimizing computational performance, ensuring infrastructure reliability, and supporting demanding computational workloads.
Key Responsibilities:
HPC System Design and Implementation:
Design and deploy HPC clusters, including compute, storage, and networking components.
Evaluate and implement new HPC technologies to improve system performance and scalability.
System Administration and Maintenance:
Administer Linux-based HPC systems and their job schedulers (e.g., Slurm, PBS, or Grid Engine).
Monitor system health and resolve performance bottlenecks or failures.
Maximize uptime and keep HPC resources optimally configured.
Performance Optimization:
Fine-tune applications and workloads for optimal performance on HPC systems.
Analyze job performance and recommend improvements to users.
Storage and Data Management:
Manage large-scale parallel file systems (e.g., Lustre, GPFS, or BeeGFS).
Optimize data transfer and storage strategies for high-throughput workloads.
User Support and Collaboration:
Provide technical support and training to researchers and end users.
Collaborate with interdisciplinary teams to understand computational requirements.
Security and Compliance:
Ensure HPC systems adhere to security best practices and applicable compliance standards.