What are the responsibilities and job description for the Hadoop Developer position at AllianceIT Inc?
Job Title: Hadoop Developer
Location: Plano, TX / Newark, DE / Charlotte, NC / Pennington, NJ / Atlanta, GA
Job Type: Full-time, Onsite
Key Responsibilities:
- Design, develop, and maintain scalable and efficient data processing applications using the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, etc.).
- Work with large datasets to build data pipelines that can process and analyze massive amounts of data efficiently (a representative pipeline sketch follows this list).
- Integrate data from various sources (structured, semi-structured, and unstructured) into the Hadoop platform.
- Ensure the performance, scalability, and reliability of Hadoop-based data solutions.
- Collaborate with data engineers and analysts to understand business requirements and translate them into data solutions.
- Implement and maintain ETL processes, ensuring data quality, accuracy, and timeliness.
- Write and optimize complex queries using Hive, HBase, or similar Hadoop-related technologies.
- Troubleshoot and resolve performance issues in Hadoop applications and clusters.
- Implement big data solutions on cloud platforms such as AWS, Azure, and GCP.
- Stay up to date with the latest trends and best practices in big data technologies.
- Participate in code reviews, collaborate with team members, and contribute to maintaining high-quality code standards.
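To give a concrete sense of the pipeline and ETL work described above, here is a minimal PySpark batch ETL sketch. The paths, column names, and table name are hypothetical placeholders, not part of the posting; an actual pipeline would depend on the team's data sources and Hive layout.

```python
# Minimal batch ETL sketch: ingest raw JSON from HDFS, cleanse it, and
# publish a partitioned Hive table. Paths and names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-etl")   # hypothetical job name
    .enableHiveSupport()           # allows writing to Hive-managed tables
    .getOrCreate()
)

# Ingest semi-structured JSON landed on HDFS (hypothetical path).
raw = spark.read.json("hdfs:///data/raw/events/2024-01-01/")

# Basic cleansing: drop malformed rows and derive a partition column.
clean = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_date", F.to_date("event_ts"))
)

# Persist as a partitioned Hive table so downstream Hive/Spark SQL queries stay efficient.
(
    clean.write
         .mode("overwrite")
         .partitionBy("event_date")
         .saveAsTable("analytics.events_clean")  # hypothetical database.table
)
```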
Required Skills & Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Proven experience (3 years) working as a Hadoop Developer or in a similar role within the big data ecosystem.
- Strong experience with core Hadoop components such as HDFS, MapReduce, Hive, HBase, and Pig.
- Proficiency in programming languages such as Java, Scala, or Python.
- Experience with Apache Spark for real-time data processing and analysis (see the streaming sketch after this list).
- Familiarity with data warehousing solutions and SQL.
- Solid understanding of data modeling and ETL processes in big data environments.
- Experience with distributed computing, fault tolerance, and high-availability principles.
- Hands-on experience with cloud platforms (AWS, Azure, GCP) and their big data services (e.g., Amazon EMR, Google Dataproc).
- Strong problem-solving and troubleshooting skills.
- Good communication skills with the ability to collaborate with cross-functional teams.
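The real-time processing experience mentioned above typically looks like Spark Structured Streaming work. The sketch below assumes a Kafka source with a topic named "clicks" and hypothetical broker and checkpoint locations; it is illustrative only, not a prescribed stack for this role.

```python
# Minimal Spark Structured Streaming sketch: windowed counts over a Kafka topic.
# Broker address, topic, and checkpoint path are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-counts").getOrCreate()

# Read a stream of events from Kafka (hypothetical brokers and topic).
events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "clicks")
         .load()
)

# Count events per 1-minute window, keyed by the Kafka message key.
counts = (
    events.withColumn("key", F.col("key").cast("string"))
          .groupBy(F.window(F.col("timestamp"), "1 minute"), "key")
          .count()
)

# Write running aggregates to the console; a production job would target
# HBase, Hive, or object storage instead.
query = (
    counts.writeStream
          .outputMode("complete")
          .format("console")
          .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")  # hypothetical
          .start()
)
query.awaitTermination()
```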