What are the responsibilities and job description for the Data Lake Architect [Locals to MI are highly preferred] position at Saransh Inc?
Role: Data Lake Architect
Location: Auburn Hills, MI (Onsite from Day 1) – Locals are highly preferred
Job Type: Contract
Job Requirements:
- Minimum of 10 years’ experience in advanced technologies, including a minimum of 5 years as a data lake admin/architect.
- Manage and maintain Data Lake cluster infrastructure on premises and in the cloud: installation, configuration, performance tuning, and monitoring of Hadoop clusters.
- Minimum 5 years’ work experience in Hadoop ecosystems (Hortonworks HDP or Cloudera’s CDP).
- Should demonstrate strong fundamentals in Unix/Linux, Windows OS, cloud platforms (AWS, GCP), Kubernetes, OpenShift, and Docker.
- Must have good exposure to Cloudera Manager, Cloudera Navigator, or similar cluster management tools.
- Collaborate with and assist developers in the successful implementation of their code; monitor and fine-tune their processes for optimal resource utilization on the cluster; ability to automate runtime processes.
- Must have good knowledge of HDFS, Ranger/Sentry, Hive, Impala, Spark, HBase, Kudu, Kafka, Kafka Connect, Schema Registry, NiFi, Sqoop, and other Hadoop-related services.
- Exposure to data science collaboration tools such as Data Science Workbench, CML, Anaconda, etc.
- Strong networking concepts: topology, proxies, F5, firewalls.
- Strong security concepts: Active Directory, Kerberos, LDAP, SAML, SSL, and data encryption at rest.
- Programming language concepts: Java, Perl, Python, PySpark, and Unix shell scripting.
- Possess experience in cluster management, including cluster upgrades, migration, and testing.
- Perform periodic updates to the cluster and keep the stack current.
- Ability to expand clusters by adding new nodes and rebalance cluster storage (see the automation sketch after this list).
- Manage application databases, application integration, and users, roles, and permissions within the cluster.
- Collaborate with the OpenShift, Unix, network, database, and security teams on cluster-related matters.
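
As a rough illustration of the cluster-expansion and runtime-automation responsibilities above, here is a minimal sketch, assuming shell access to a host with the `hdfs` client on PATH and a user already authenticated (e.g., via a keytab-based kinit). The 10% balancer threshold is an illustrative value, not one specified in the posting.

```python
"""Minimal sketch: rebalance HDFS storage after expanding the cluster.

Assumptions (not specified in the posting): the `hdfs` CLI is available,
the caller is already authenticated to the kerberized cluster, and new
DataNodes have already been added to the cluster's include/hosts file.
"""
import subprocess
import sys


def run(cmd):
    """Run a shell command, echoing it first and failing loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def expand_and_rebalance(threshold_pct=10):
    # Ask the NameNode to re-read its hosts file so newly added DataNodes join.
    run(["hdfs", "dfsadmin", "-refreshNodes"])
    # Confirm the new DataNodes show up as live before moving any blocks.
    run(["hdfs", "dfsadmin", "-report"])
    # Rebalance until every DataNode is within `threshold_pct` of the
    # cluster-average utilization; this can run for hours on large clusters.
    run(["hdfs", "balancer", "-threshold", str(threshold_pct)])


if __name__ == "__main__":
    try:
        expand_and_rebalance()
    except subprocess.CalledProcessError as exc:
        sys.exit(f"cluster rebalance step failed: {exc}")
```

On a cluster managed by Cloudera Manager, the same steps would normally be triggered through the CM console or API rather than ad hoc scripts; the sketch is only meant to show the kind of automation the role involves.
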
Technical Experience:
- Solid experience in Cloudera data lake environments, both on premises and in the cloud.
- Solid experience in administration and setup, including security topics related to a data lake.
- Strong experience architecting and designing solutions for new business needs.
- Thorough understanding of and hands-on experience with implementing robust logging and tracing for end-to-end system traceability.
- Familiarity with Cloudera’s BDR tool to perform and monitor backups of critical data, and the ability to restore data when needed.
- Willing and ready to get hands-on with code development alongside the dev team for development and troubleshooting, including quick proofs of concept to explore new solutions, products, etc. (see the sketch after this list).
- Experience in tuning and optimizing the Hadoop environment to keep clusters healthy and available for end users and applications, with maximum cluster uptime as defined in the SLA.
- Deep knowledge of and hands-on experience with Hadoop and its ecosystem components, e.g., HDFS, YARN, Hive, MapReduce, Pig, Sqoop, Oozie, Kafka, Spark, Presto, and other Hadoop components.
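
For the quick proof-of-concept work mentioned above, a minimal PySpark sketch might look like the following. It assumes a Spark-on-YARN gateway against a kerberized HDFS; every path, column name, and resource setting is a hypothetical placeholder rather than anything specified in the posting.

```python
"""Minimal PySpark proof-of-concept sketch (hypothetical paths and settings)."""
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("data-lake-poc")
    # Illustrative resource settings; real values depend on cluster sizing
    # and the YARN queue the job is allowed to use.
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "2")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

# Read raw events landed in HDFS (hypothetical path), aggregate per day,
# and write the result back as partitioned Parquet for downstream Hive/Impala.
events = spark.read.parquet("hdfs:///data/raw/events")
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "hdfs:///data/curated/daily_event_counts"
)

spark.stop()
```

Such a job would typically be submitted with spark-submit on YARN, with executor settings tuned to the queue limits implied by the SLA and uptime requirements above.
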