What are the responsibilities and job description for the Senior Data Engineer position at Incedo Inc.?
We are seeking a skilled and experienced Senior Data Engineer to join our dynamic data engineering team. This role will focus on designing, developing, and maintaining large-scale data processing systems, using Hadoop technologies and ETL tools like Informatica to process and analyze complex data.
As a Senior Data Engineer, you will be responsible for building and optimizing our data pipelines, working with large datasets, and ensuring that the data infrastructure is robust, scalable, and reliable.
Key Responsibilities:
- Data Architecture and Design: Design and implement scalable data architectures and pipelines using Hadoop, Spark, and other big data technologies (see the PySpark sketch after this list).
- ETL Development: Build and manage ETL processes using Informatica, automating data workflows and ensuring smooth data transformation and loading.
- Big Data Technologies: Work with Hadoop ecosystem tools (Hive, HDFS, Pig, Sqoop, Spark) to process and manage vast amounts of structured and unstructured data.
- Data Integration: Integrate data from various sources into centralized data repositories (data lakes, data warehouses), ensuring data quality, integrity, and security.
- Performance Optimization: Optimize data pipelines and query performance for faster data processing and analysis.
- Collaboration: Work closely with data scientists, analysts, and other engineers to ensure data availability, accessibility, and quality for reporting and analytics.
- Troubleshooting: Identify and resolve data issues related to ingestion, transformation, and integration, ensuring minimal downtime.
- Documentation & Best Practices: Maintain clear documentation for data processes and ensure adherence to best practices and coding standards.
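To give a flavor of the day-to-day work, here is a minimal PySpark sketch of one batch pipeline step of the kind described above: reading raw events from Hive, cleaning and aggregating them, and writing a partitioned curated table. This is an illustrative sketch, not the team's actual pipeline; the database, table, and column names (raw_db.events, curated_db.daily_events, event_ts) are hypothetical placeholders.

```python
# Minimal PySpark sketch of an ETL step: read raw events from Hive,
# clean and aggregate them, and write a partitioned curated table.
# All table and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-etl")
    .enableHiveSupport()          # lets Spark read and write Hive tables
    .getOrCreate()
)

raw = spark.table("raw_db.events")

curated = (
    raw
    .filter(F.col("event_ts").isNotNull())             # drop malformed rows
    .withColumn("event_date", F.to_date("event_ts"))   # derive partition key
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

(
    curated.write
    .mode("overwrite")
    .partitionBy("event_date")    # partition for faster downstream queries
    .saveAsTable("curated_db.daily_events")
)
```

Partitioning the output by date is a common choice in this kind of pipeline because most downstream reports filter on a date range, which lets the engine skip irrelevant partitions entirely.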
Required Skills & Qualifications:
- Experience: 10+ years of experience in data engineering or a related field, including at least 5-6 years working with Hadoop and Informatica.
- Hadoop Ecosystem: Solid knowledge and hands-on experience with Hadoop ecosystem tools (Hive, HDFS, Pig, Spark, etc.).
- ETL Development: Proficiency in using Informatica or other ETL tools for building, deploying, and managing data integration processes.
- SQL & Data Modeling: Strong SQL skills, with experience in designing and optimizing data models for high-performance queries (see the Spark SQL example after this list).
- Data Warehousing: Familiarity with data warehousing concepts, design, and best practices.
- Cloud Experience: Experience with cloud platforms (AWS, Azure, Google Cloud) is a plus.
- Programming: Proficiency in at least one programming language, such as Python, Java, or Scala, for scripting and automation.
- Problem-Solving: Excellent problem-solving skills and ability to troubleshoot complex data issues.
- Communication: Strong verbal and written communication skills to collaborate effectively with both technical and non-technical stakeholders.
- Education: Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field (Master’s degree is a plus).
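As a concrete example of the SQL and query-optimization side of the role, the following sketch queries the hypothetical partitioned table from the earlier example through Spark SQL. The table name, column names, and date literal are all assumptions made for illustration.

```python
# Hypothetical example of the kind of SQL work the role involves:
# an analytical query against the partitioned table built above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("events-report")
    .enableHiveSupport()
    .getOrCreate()
)

# Filtering on the partition column (event_date) lets the engine
# prune partitions instead of scanning the whole table.
report = spark.sql("""
    SELECT event_type,
           SUM(event_count) AS total_events
    FROM curated_db.daily_events
    WHERE event_date >= DATE '2024-01-01'
    GROUP BY event_type
    ORDER BY total_events DESC
""")

report.show()
```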
Preferred Skills:
- Data Lakes: Experience in building or managing data lakes using Hadoop or cloud technologies.
- Streaming Data: Experience with real-time data streaming technologies such as Apache Kafka and Spark Streaming (see the streaming sketch after this list).
- Machine Learning: Basic knowledge of machine learning algorithms or working with data science teams.
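For the streaming requirement, a minimal Spark Structured Streaming sketch might look like the following. The broker address and topic name are hypothetical, and running it requires the spark-sql-kafka connector on the classpath.

```python
# Minimal Structured Streaming sketch: consume a Kafka topic and append
# parsed records to a console sink. Broker address and topic name are
# hypothetical; requires the spark-sql-kafka connector to run.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
)

# Kafka delivers keys and values as binary; cast the value to a string
# before any downstream parsing or aggregation.
parsed = stream.select(F.col("value").cast("string").alias("raw_event"))

query = (
    parsed.writeStream
    .format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```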