What are the responsibilities and job description for the High Performance Computing (HPC) Storage Specialist position at Tekgence Inc?
We are seeking a highly skilled and experienced Senior HPC Specialist to design, implement, and maintain high-performance computing systems and solutions.
The candidate will play a critical role in optimizing computational performance, ensuring the reliability of the infrastructure, and supporting advanced computational workloads.
HPC System Design and Implementation:
Design and deploy HPC clusters, including compute, storage, and networking components.
Evaluate and implement new HPC technologies to improve system performance and scalability.
System Administration and Maintenance:
Manage Linux-based HPC systems and their job schedulers (e.g., Slurm, PBS, or Grid Engine).
Monitor system health and resolve performance bottlenecks or failures.
Maximize uptime and maintain optimal configuration of HPC resources.
Performance Optimization:
Fine-tune applications and workloads for optimal performance on HPC systems.
Analyze job performance and provide recommendations to users for improvements.
Storage and Data Management:
Manage large-scale parallel file systems (e.g., Lustre, GPFS, or BeeGFS).
Optimize data transfer and storage strategies for high-throughput workloads.
User Support and Collaboration:
Provide technical support and training to researchers and end users.
Collaborate with interdisciplinary teams to understand computational requirements.
Security and Compliance:
Ensure HPC systems adhere to security best practices and compliance standards.
Implement data backup and disaster recovery solutions.