What are the responsibilities and job description for the Cloudera Big Data Administrator position at nTech Solutions?
Job Details
Terms of Employment
- Contract, 12 Months
- This position is predominantly remote. Candidates should be comfortable traveling to Northern Virginia roughly once per month. Travel can be expensed.
- Candidates must be based in Maryland, Washington, DC, Virginia, West Virginia, Pennsylvania, Delaware, New Jersey, New York, North Carolina, Florida, or Texas.
Overview & Responsibilities
This role requires expertise in building and managing Cloudera clusters, with a focus on administration rather than development. The ideal candidate will have experience with Cloudera CDP Public Cloud v7.2.17 or higher and a strong understanding of big data services and ecosystem tools.
Key responsibilities include:
Cluster Setup and Management
- Build and configure Cloudera clusters in the cloud, including services such as NiFi, SOLR, HBase, Kafka, and Knox.
- Set up High Availability for critical services such as Hue, Hive, HBase REST, SOLR, and Impala on the BDPaaS Platform.
- Monitor and optimize cluster performance using Cloudera Manager (a monitoring sketch follows this list).
- Perform incremental updates, upgrades, and expansions to the Cloudera environment so that it continues to meet performance and capacity requirements.
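The Cloudera Manager monitoring above is commonly scripted against the CM REST API. Below is a minimal sketch, not an excerpt from the role: the host, credentials, cluster name, and API version (v40) are all illustrative assumptions.

```python
# Minimal sketch: pull per-service health summaries from Cloudera Manager.
import requests

CM_URL = "https://cm.example.com:7183/api/v40"  # hypothetical CM host and API version
AUTH = ("admin", "changeme")                    # hypothetical credentials

def service_health(cluster: str) -> dict:
    """Return {service name: health summary} for every service in the cluster."""
    resp = requests.get(f"{CM_URL}/clusters/{cluster}/services", auth=AUTH)
    resp.raise_for_status()
    return {s["name"]: s["healthSummary"] for s in resp.json()["items"]}

if __name__ == "__main__":
    for name, health in service_health("my-cdp-cluster").items():  # hypothetical cluster
        print(f"{name}: {health}")
```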
Automation and Monitoring
- Develop and implement shell scripts for health checks and automated responses to service warnings or failures (see the sketch after this list).
- Design and implement big data pipelines and automated data flows using Python/R and NiFi.
- Automate the project lifecycle, including data ingestion and processing workflows.
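The posting asks for shell scripts for health checks; the same pattern is sketched here in Python so it can reuse the Cloudera Manager API call shown earlier. The restart-on-BAD response, endpoint, and credentials are illustrative assumptions, not details from the role.

```python
# Minimal sketch: restart any service whose CM health summary is BAD.
import requests

CM_URL = "https://cm.example.com:7183/api/v40"  # hypothetical CM host
AUTH = ("admin", "changeme")                    # hypothetical credentials
CLUSTER = "my-cdp-cluster"                      # hypothetical cluster name

def check_and_heal() -> None:
    services = requests.get(f"{CM_URL}/clusters/{CLUSTER}/services", auth=AUTH)
    services.raise_for_status()
    for svc in services.json()["items"]:
        if svc["healthSummary"] == "BAD":
            # Issue an asynchronous restart command for the unhealthy service.
            cmd = requests.post(
                f"{CM_URL}/clusters/{CLUSTER}/services/{svc['name']}/commands/restart",
                auth=AUTH,
            )
            cmd.raise_for_status()
            print(f"restarted {svc['name']} (command id {cmd.json()['id']})")

check_and_heal()  # typically run on a schedule, e.g. via cron
```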
Collaboration and Troubleshooting
- Work with teams such as Application Development, Security, and Platform Support to implement configuration changes for improved cluster performance.
- Troubleshoot and resolve Kerberos and TLS/SSL issues, as well as other workload-related challenges (a TLS certificate check is sketched after this list).
- Provide expertise for use cases like analytics/ML, data science, cluster migration, and disaster recovery.
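One routine piece of the TLS/SSL troubleshooting above is confirming that a service certificate has not expired. A minimal sketch; the Hue host and port are placeholders:

```python
# Minimal sketch: report how many days remain before a TLS certificate expires.
import socket
import ssl
from datetime import datetime, timezone

def cert_days_remaining(host: str, port: int) -> int:
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    # notAfter looks like "May 12 23:59:59 2026 GMT"
    expires = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
    return (expires.replace(tzinfo=timezone.utc) - datetime.now(timezone.utc)).days

print(cert_days_remaining("hue.example.com", 8889))  # hypothetical Hue endpoint
```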
Security and Governance
- Implement and manage comprehensive security policies across the Hadoop cluster using Ranger (see the policy sketch after this list).
- Support governance, data quality, and documentation efforts.
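Ranger policies can be managed through its public REST API as well as the UI. A minimal sketch that grants read-only Hive access; the Ranger host, credentials, service name, user, and database are all illustrative assumptions:

```python
# Minimal sketch: create a read-only Hive policy via Ranger's public v2 API.
import requests

RANGER = "https://ranger.example.com:6182"  # hypothetical Ranger admin host
AUTH = ("admin", "changeme")                # hypothetical credentials

policy = {
    "service": "cm_hive",                   # assumed Ranger service name
    "name": "analytics_readonly",
    "resources": {
        "database": {"values": ["analytics"]},
        "table": {"values": ["*"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [
        {"users": ["data_analyst"], "accesses": [{"type": "select", "isAllowed": True}]}
    ],
}

resp = requests.post(f"{RANGER}/service/public/v2/api/policy", json=policy, auth=AUTH)
resp.raise_for_status()
print("created policy id", resp.json()["id"])
```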
Database and Workflow Management
- Access databases and metastore tables, writing queries in Hive and Impala using Hue (a scripted equivalent is sketched after this list).
- Manage job workflows, monitor resource allocation with YARN, and handle data movement.
- Support the Big Data/Hadoop databases throughout their lifecycle, including query optimization, performance tuning, and resolving integrity issues.
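The Hive/Impala work above typically happens in Hue, but the same queries can be scripted. A minimal sketch using the impyla client; the coordinator host and table names are placeholders:

```python
# Minimal sketch: run metastore and sanity-check queries against Impala.
from impala.dbapi import connect

conn = connect(host="impala.example.com", port=21050)  # hypothetical coordinator
cur = conn.cursor()

cur.execute("SHOW TABLES IN analytics")               # inspect metastore tables
print(cur.fetchall())

cur.execute("SELECT COUNT(*) FROM analytics.events")  # spot-check a table
print(cur.fetchone())

cur.close()
conn.close()
```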
Required Skills & Experience
Cloudera CDP Public Cloud
- Administration and optimization of services such as Hive, Spark, NiFi, and CDSW.
AWS Services
- Proficient in managing AWS services (EC2, S3, EBS, EFS).
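As a brief illustration of the AWS side, the checks below use boto3; the region and bucket name are placeholders, not details from the role:

```python
# Minimal sketch: inventory EC2 instances and spot-check an S3 bucket.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # hypothetical region
for reservation in ec2.describe_instances()["Reservations"]:
    for inst in reservation["Instances"]:
        print(inst["InstanceId"], inst["InstanceType"], inst["State"]["Name"])

s3 = boto3.client("s3")
listing = s3.list_objects_v2(Bucket="my-cdp-datalake", MaxKeys=5)  # hypothetical bucket
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])
```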
Apache Kafka
- Strong skills in administration, troubleshooting, broker management, and integration with IBM MQ.
- Proficient in the Kafka Streams API, stream processing with KStreams and KTables, and topic/offset management (an offset-lag sketch follows this list).
- Experience with the Kafka ecosystem (brokers, Kafka Connect, ZooKeeper) in production environments.
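Kafka Streams itself is a Java API, but the topic/offset management above is routinely scripted. A minimal consumer-lag sketch with kafka-python; the broker address and group id are placeholders:

```python
# Minimal sketch: consumer-group lag = log end offset minus committed offset.
from kafka import KafkaAdminClient, KafkaConsumer

BOOTSTRAP = "broker1.example.com:9092"  # hypothetical broker

admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
committed = admin.list_consumer_group_offsets("my-group")  # hypothetical group id

consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP)
end_offsets = consumer.end_offsets(list(committed))

for tp, meta in committed.items():
    print(f"{tp.topic}[{tp.partition}] lag={end_offsets[tp] - meta.offset}")
```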
Apache NiFi
- Administration of flow management, the NiFi Registry server, and controller services, plus integrations with Kafka, HBase, and SOLR.
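Flow monitoring of this kind is often scripted against the NiFi REST API. A minimal sketch against an unsecured endpoint (a secured cluster would also need an access token); the host is a placeholder:

```python
# Minimal sketch: read overall flow status from the NiFi REST API.
import requests

NIFI = "https://nifi.example.com:8443/nifi-api"  # hypothetical NiFi host

status = requests.get(f"{NIFI}/flow/status").json()["controllerStatus"]
print("active threads:", status["activeThreadCount"])
print("queued:", status["queued"])  # e.g. "1,024 / 2.5 GB"
```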
HBase
- Administration, database management, and troubleshooting.
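A minimal sketch of routine HBase checks over the Thrift gateway using happybase; the gateway host and table name are placeholders:

```python
# Minimal sketch: list HBase tables and spot-check a few rows via Thrift.
import happybase

conn = happybase.Connection("hbase-thrift.example.com", port=9090)  # hypothetical
print(conn.tables())                    # list all tables

table = conn.table("events")            # hypothetical table
for key, data in table.scan(limit=5):   # spot-check a few rows
    print(key, data)

conn.close()
```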
SOLR
- Manage logging levels, shards, and collections; troubleshoot resource-intensive queries.
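Shard and collection state can be inspected through the Solr Collections API; a minimal sketch with a placeholder host and port:

```python
# Minimal sketch: list each SOLR collection and its shards via CLUSTERSTATUS.
import requests

SOLR = "https://solr.example.com:8985/solr"  # hypothetical SOLR host

resp = requests.get(f"{SOLR}/admin/collections",
                    params={"action": "CLUSTERSTATUS", "wt": "json"})
resp.raise_for_status()
for name, coll in resp.json()["cluster"]["collections"].items():
    print(name, "shards:", list(coll["shards"]))
```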