What are the responsibilities and job description for the Data Engineer with Data Hub position at Avanciers?
Role :: Data Engineer (Data Hub)
Location :: Austin, TX (Onsite)
Full-time
Client Note - We are looking for a candidate who has worked with the "DataHub" tool, which is used for data lineage. Resumes that mention "data hub" only once, in the generic data-warehouse sense, are not relevant.
Key skills: DataHub customization, Java experience.
Development on DataHub using Java; strong data cataloguing experience.
Job Description:
Directed projects involving data cataloging with the DataHub open-source framework, anomaly detection using machine learning models, and Spark-based frameworks.
- Ingested metadata for assets from the data lake and from upstream and downstream systems.
- Developed custom API solutions that push ETL pipeline metadata to DataHub. This enriched impact analysis by identifying the pipelines that read from or write to a given data asset.
- Provided a holistic picture of end-to-end lineage that helped with PII identification, governance, and impact analysis.
- Improved the performance of Spark-based applications while preserving functionality.
- Provided recommendations on the design and development of ETL pipelines using Spark. Developed and maintained a Spark-based custom client framework providing a config-as-code mechanism for data enrichment and transfer.
- Supported Spark version upgrades and executed AWS cost-optimization initiatives for platform-wide efficiency.
- Worked with ML engineers to create features from profiled batch data and identify anomalies in data patterns.
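The impact analysis described above amounts to a reachability query over a lineage graph. As a minimal illustration of the idea (not DataHub's actual API), the sketch below uses hypothetical asset names and a plain breadth-first search to find everything downstream of a changed asset:

```python
from collections import defaultdict, deque

def build_lineage(edges):
    """Build an adjacency map from (upstream, downstream) pairs."""
    graph = defaultdict(set)
    for upstream, downstream in edges:
        graph[upstream].add(downstream)
    return graph

def downstream_impact(graph, asset):
    """Return every asset/pipeline reachable downstream of `asset` (BFS)."""
    seen = set()
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical lineage edges: pipelines reading/writing datasets.
edges = [
    ("raw.orders", "etl.clean_orders"),
    ("etl.clean_orders", "curated.orders"),
    ("curated.orders", "report.daily_sales"),
    ("raw.customers", "etl.clean_customers"),
]
print(sorted(downstream_impact(build_lineage(edges), "raw.orders")))
```

In DataHub itself, this kind of query is answered by the lineage graph built from ingested and pushed metadata; the sketch only shows why a complete edge set makes impact analysis a simple traversal.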
The ideal candidate would be:
An experienced Data Engineer with a strong background in data lineage, data cataloging, and custom tool development using DataHub and Java. Expertise in utilizing the DataHub open-source framework for data cataloging, metadata ingestion, and end-to-end lineage visualization. Proficient in the development of custom APIs to integrate ETL pipelines with DataHub, enriching impact analysis and enabling seamless identification of data flow across systems.
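The config-as-code mechanism mentioned in the description can be pictured as a declarative spec that drives transformations. The real framework is Spark-based; the plain-Python sketch below, with hypothetical operation names (`rename`, `upper`), only illustrates the pattern of configuration driving the data flow:

```python
import json

# Hypothetical config-as-code spec: each step declares a transform
# to apply to incoming records, with no transform logic in user code.
CONFIG = json.loads("""
{
  "steps": [
    {"op": "rename", "from": "cust_id", "to": "customer_id"},
    {"op": "upper",  "field": "country"}
  ]
}
""")

def apply_steps(record, steps):
    """Apply the configured transforms to one record (a plain dict)."""
    out = dict(record)
    for step in steps:
        if step["op"] == "rename":
            out[step["to"]] = out.pop(step["from"])
        elif step["op"] == "upper":
            out[step["field"]] = out[step["field"]].upper()
    return out

row = {"cust_id": 42, "country": "us"}
print(apply_steps(row, CONFIG["steps"]))
# {'country': 'US', 'customer_id': 42}
```

In a Spark version of this pattern, each configured step would map to a DataFrame operation, so enrichment and transfer logic lives in reviewable config rather than bespoke code.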
Core Skills & Expertise:
- In-depth knowledge of DataHub for data lineage, metadata management, and anomaly detection.
- Java development expertise for creating custom API solutions and enhancing DataHub functionality.
- Hands-on experience with Spark for data processing, performance optimization, and framework development.
- Strong background in ETL pipeline development and optimization, particularly with Spark and custom config-as-code mechanisms.
- Proficient in working with AWS for platform optimization and cost reduction.
- Experience working alongside ML engineers to profile and analyze batch data, creating features and detecting anomalies in data patterns.
- Ability to visualize and maintain a holistic picture of end-to-end data lineage, facilitating PII identification, governance, and impact analysis.
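The anomaly-detection skill listed above can be as simple as flagging outliers in profiled batch metrics. As a hedged sketch (hypothetical row counts, a basic z-score rule rather than any specific ML model used on the project):

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=2.5):
    """Return indices of values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily row counts from batch profiling; the spike stands out.
row_counts = [100, 102, 98, 101, 99, 100, 500, 103, 97, 100]
print(zscore_anomalies(row_counts))  # [6]
```

In practice, ML engineers would replace the z-score rule with learned models over many such profiled features, but the input shape (per-batch summary statistics) is the same.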