Job Description | Infrastructure Architect - Basic HPC security
- Implementing and maintaining data management infrastructure
- Providing support for SAN and NAS storage, backup/recovery environments and virtualization infrastructure by implementing, managing, and monitoring the hardware and software
- Playing a major role in the security, disaster recovery and services continuity of a highly available enterprise storage and backup infrastructure by following established procedures and compliance requirements
- Technical support (installation, configuration, maintenance, upgrade, retirement, troubleshooting)
- Configuration management using frameworks such as Ansible, Puppet, and Chef
- Administration of high-speed network storage systems, including Mellanox switches and NAS clusters
- Managing, configuring, and supporting cloud systems, including setting up, maintaining, and troubleshooting cloud compute engines and storage buckets
- Managing databases (e.g., SQL Server, PostgreSQL, MySQL, Oracle)
- Assisting staff to access and utilize computing resources
- Co-ordinating with Labs and DTMB staff on maintaining and managing the computational resources
Skills & Experiences
- 10 years of experience with the Linux CLI environment and coding languages such as R, Python, and Bash
- 10 years of experience with workload management systems such as SLURM
- 10 years of experience with setting up HPC systems, including identifying suitable hardware and software needs
- 10 years of experience with setting up and managing databases such as PostgreSQL
- 10 years of experience performing system administration, including installation, configuration, and support of software, packages, and libraries in various environments
- 10 years of experience with Network Appliance clustered servers and applicable software
- 10 years of experience with hands-on troubleshooting, issue resolution, discrepancy tracking, and report generation
- 10 years of experience with Linux configuration regarding storage, networking, load balancing, memory management, VMs, firewalls, and system monitoring
- 10 years of experience with computer security
- Knowledge of package management and container systems such as conda, Docker, and Singularity
- Knowledge of automation and workflow tools such as Ansible, Puppet, and Nextflow
- Experience with cloud computing (setting up compute engines, storage buckets)
- Strong knowledge of enterprise storage solutions
- Familiar with software frameworks used for searching, monitoring, and analyzing big data
- Ability to provide sound recommendations and guidance on storage and cost savings for Labs
- Knowledge and experience in HL7 messaging
- Ability to review and interpret web.config files for plugins
- Ability to review logs (e.g., IIS logs, Dynatrace logs) to confirm that there is no excess resource utilization and no peaks or spikes on the web/app server
- Knowledge of Cloudflare, Forcepoint, and the related rules (e.g., the C86 rule) and policies
- Ability to understand the application's existing junction configuration and review those settings in case of a break
- Help the team set up a failover environment for apps
- Help the team complete the Disaster Recovery (DR) Plan and DR testing
- Knowledge of CDC-hosted apps (preferred, not a requirement)
|