What are the responsibilities and job description for the Senior Java Spark Developer position at Enexus Global?
Job Details
Job Summary:
We are seeking a Senior Java Spark Developer with expertise in Java, Apache Spark, and the Cloudera Hadoop Ecosystem to design and develop large-scale data processing applications. The ideal candidate will have strong hands-on experience in Java-based Spark development, distributed computing, and performance optimization for handling big data workloads.
Key Responsibilities:
Java & Spark Development:
- Develop, test, and deploy Java-based Apache Spark applications for large-scale data processing (a minimal example follows this list).
- Optimize and fine-tune Spark jobs for performance, scalability, and reliability.
- Implement Java-based microservices and APIs for data integration.
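As a hedged illustration of this kind of work, the sketch below shows a minimal Java-based Spark batch job. The input/output paths and the column names (status, event_date, event_type) are assumptions for the example, not details taken from the role description.

```java
// Minimal sketch of a Java-based Spark batch job; paths and columns are illustrative.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class EventAggregationJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("EventAggregationJob")
                .getOrCreate();

        // Read a large dataset from HDFS (path is an example only).
        Dataset<Row> events = spark.read().parquet("hdfs:///data/events");

        // Filter and aggregate with the Dataset API.
        Dataset<Row> dailyCounts = events
                .filter(col("status").equalTo("COMPLETED"))
                .groupBy(col("event_date"), col("event_type"))
                .count();

        // Write results back to HDFS, partitioned for efficient downstream reads.
        dailyCounts.write()
                .mode("overwrite")
                .partitionBy("event_date")
                .parquet("hdfs:///output/daily_event_counts");

        spark.stop();
    }
}
```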
Big Data & Cloudera Ecosystem:
- Work with Cloudera Hadoop components such as HDFS, Hive, Impala, HBase, Kafka, and Sqoop (see the Spark SQL sketch after this list).
- Design and implement high-performance data storage and retrieval solutions.
- Troubleshoot and resolve performance bottlenecks in Spark and Cloudera platforms.
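The sketch below illustrates querying a Cloudera-managed Hive table through Spark SQL and saving the result as a new table for Hive/Impala consumers; the database, table, and column names are hypothetical.

```java
// Sketch of querying a Hive table on a CDH cluster via Spark SQL; names are assumptions.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveQueryExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("HiveQueryExample")
                .enableHiveSupport()   // picks up hive-site.xml from the cluster configuration
                .getOrCreate();

        // Run a Spark SQL query against an existing Hive table.
        Dataset<Row> recentOrders = spark.sql(
                "SELECT customer_id, order_total FROM sales.orders WHERE order_date >= '2024-01-01'");

        // Persist the result as a new Hive table for downstream Hive/Impala access.
        recentOrders.write().mode("overwrite").saveAsTable("sales.recent_orders");

        spark.stop();
    }
}
```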
Collaboration & Data Engineering:
- Collaborate with data scientists, business analysts, and developers to understand data requirements.
- Implement data integrity, accuracy, and security best practices across all data processing tasks.
- Work with Kafka, Flume, and NiFi for real-time and batch data ingestion, and with Oozie for workflow orchestration (see the streaming sketch after this list).
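A minimal Structured Streaming sketch for Kafka ingestion is shown below. The broker addresses, topic name, and HDFS landing/checkpoint paths are illustrative, and the job assumes the spark-sql-kafka connector is on the classpath.

```java
// Hedged sketch of real-time ingestion from Kafka with Spark Structured Streaming.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;
import java.util.concurrent.TimeoutException;

public class KafkaIngestionJob {
    public static void main(String[] args) throws StreamingQueryException, TimeoutException {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaIngestionJob")
                .getOrCreate();

        // Subscribe to a Kafka topic; each record arrives as key/value byte arrays.
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
                .option("subscribe", "events")
                .load();

        // Cast the payload to text and land it on HDFS in Parquet for batch consumers.
        StreamingQuery query = stream
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
                .writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/landing/events")
                .option("checkpointLocation", "hdfs:///checkpoints/events")
                .start();

        query.awaitTermination();
    }
}
```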
Software Development & Deployment:
- Implement version control (Git) and CI/CD pipelines (Jenkins, GitLab) for Spark applications.
- Deploy and maintain Spark applications in cloud or on-premises Cloudera environments.
Required Skills & Experience:
- Strong problem-solving skills, attention to detail, and ability to work in a fast-paced environment.
- 8 years of experience in application development, with a strong background in Java and Big Data processing.
- Strong hands-on experience in Java, Apache Spark, and Spark SQL for distributed data processing.
- Proficiency in Cloudera Hadoop (CDH) components such as HDFS, Hive, Impala, HBase, Kafka, and Sqoop.
- Experience building and optimizing ETL pipelines for large-scale data workloads.
- Hands-on experience with SQL and NoSQL data stores such as HBase, Hive, and PostgreSQL.
- Strong knowledge of data warehousing concepts, dimensional modeling, and data lakes.
- Proven ability to troubleshoot and optimize Spark applications for high performance (a tuning sketch follows this list).
- Familiarity with version control tools (Git, Bitbucket) and CI/CD pipelines (Jenkins, GitLab).
- Exposure to data ingestion and streaming technologies such as Kafka, Flume, and NiFi, and to workflow scheduling with Oozie.
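As an illustration of the kind of tuning this role involves, the sketch below shows a few common Spark optimization levers: shuffle-partition sizing, Kryo serialization, caching a reused dataset, and a broadcast join. The specific configuration values, dataset paths, and column names are examples only, not recommended production settings.

```java
// Illustrative Spark tuning sketch; config values and paths are assumptions.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.broadcast;

public class SparkTuningExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SparkTuningExample")
                // Right-size shuffle parallelism for the job's data volume.
                .config("spark.sql.shuffle.partitions", "400")
                // Use Kryo to reduce shuffle and cache footprint.
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .getOrCreate();

        Dataset<Row> transactions = spark.read().parquet("hdfs:///data/transactions");
        Dataset<Row> branches = spark.read().parquet("hdfs:///data/branches");

        // Cache a dataset that several downstream actions will reuse.
        transactions.cache();

        // Broadcast the small dimension table to avoid a shuffle-heavy join.
        Dataset<Row> enriched = transactions.join(broadcast(branches), "branch_id");

        enriched.write().mode("overwrite").parquet("hdfs:///output/enriched_transactions");

        spark.stop();
    }
}
```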