What are the responsibilities and job description for the Data Engineer (AI/ML Infrastructure) - Hybrid Role - W2 Only position at Dartz IT Solutions?
Job Details
Job Title: Data Engineer (AI/ML Infrastructure)
Long-term contract
Hybrid role: 3 days per week onsite
Location: Irving, TX
Job Description:
We are seeking an experienced Data Engineer to join our team and help build and
maintain scalable data pipelines and infrastructure for AI/ML applications. You will play a
critical role in ensuring that our data architecture is robust, efficient, and optimized for
machine learning workflows. You will work closely with data scientists, analysts, and
other engineering teams to ensure the seamless flow of high-quality data from various
sources into our AI/ML systems.
Key Responsibilities:
Design, build, and maintain scalable data pipelines to support AI/ML applications
and business analytics.
Develop and optimize SQL queries for data extraction, transformation, and loading (ETL); a brief illustrative sketch follows this list.
Create and manage data models that enable efficient analysis and reporting.
Work with database and data warehouse systems such as BigQuery and
Snowflake to ensure efficient data storage, access, and retrieval.
Implement robust processes for data export and import, ensuring seamless
integration of data from diverse sources.
Design, implement, and automate data workflows that support AI/ML model training,
testing, and production deployment.
Collaborate with data scientists and ML engineers to structure and transform data for
model development and deployment.
Ensure the data pipelines are optimized for performance, scalability, and cost
efficiency.
Implement data quality checks to ensure the integrity and accuracy of data in AI/ML
models.
Troubleshoot and resolve issues related to data ingestion, storage, and processing.
Continuously monitor the performance of data pipelines and infrastructure, making
improvements as necessary.
Stay current with emerging technologies and tools in data engineering, AI, and
machine learning.
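To make the ETL and data-quality responsibilities above concrete, here is a minimal, illustrative sketch. It is not part of the role description: the file, column, and table names (events.csv, user_id, my-project.analytics.events) are assumptions, and it simply shows an extract-transform-load step with basic quality checks before loading into BigQuery.

```python
# Minimal ETL sketch (hypothetical dataset, table, and column names).
# Extract a CSV, apply a transformation and basic data-quality checks,
# then load the result into a BigQuery table.
import pandas as pd
from google.cloud import bigquery

SOURCE_CSV = "events.csv"                      # assumed local extract
TARGET_TABLE = "my-project.analytics.events"   # assumed BigQuery table


def run_pipeline() -> None:
    # Extract
    df = pd.read_csv(SOURCE_CSV, parse_dates=["event_time"])

    # Transform: normalize a key column and derive a date used downstream
    df["user_id"] = df["user_id"].astype("string").str.strip()
    df["event_date"] = df["event_time"].dt.date

    # Data-quality checks: fail fast on nulls or duplicate keys
    if df["user_id"].isna().any():
        raise ValueError("Null user_id values found; aborting load")
    if df.duplicated(subset=["user_id", "event_time"]).any():
        raise ValueError("Duplicate events found; aborting load")

    # Load: append the cleaned rows to the BigQuery target table
    client = bigquery.Client()
    job = client.load_table_from_dataframe(df, TARGET_TABLE)
    job.result()  # wait for the load job to complete


if __name__ == "__main__":
    run_pipeline()
```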
Required Skills & Qualifications:
Proven experience in SQL for querying, transforming, and managing large datasets.
Expertise in data modeling for efficient storage and retrieval of data in data
warehouses.
Strong experience with BigQuery, Snowflake, or similar cloud-based data
warehouse systems.
Familiarity with data integration and migration techniques for data export and
import across various platforms.
Experience with ETL processes and the design of scalable data pipelines.
Ability to design and implement automated data workflows to support machine learning and AI systems (a minimal orchestration sketch follows this list).
Knowledge of cloud platforms (Google Cloud, AWS, Azure) for data engineering and
storage.
Strong problem-solving and troubleshooting skills, with an ability to optimize data
processes.
Experience with data governance, security, and compliance in handling large-scale
data.
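As a rough illustration of the automated-workflow skill above, the following Apache Airflow sketch wires extract, validate, and train steps into a daily pipeline. The DAG name and task functions are hypothetical placeholders, not anything specified in this posting.

```python
# Minimal Apache Airflow sketch of an automated daily data workflow
# (hypothetical DAG and task names; not from the posting).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_features():
    # Placeholder: pull raw data from the warehouse (e.g., BigQuery/Snowflake)
    print("extracting feature data")


def validate_features():
    # Placeholder: run row-count and null checks before training
    print("validating feature data")


def trigger_training():
    # Placeholder: kick off a model-training job with the fresh features
    print("triggering model training")


with DAG(
    dag_id="ml_feature_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # use schedule= on newer Airflow releases
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    validate = PythonOperator(task_id="validate_features", python_callable=validate_features)
    train = PythonOperator(task_id="trigger_training", python_callable=trigger_training)

    # Dependencies: extract, then validate, then train
    extract >> validate >> train
```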
Preferred Qualifications:
Familiarity with AI/ML tools and frameworks (e.g., TensorFlow, PyTorch) and how
they integrate with data pipelines.
Experience with Apache Spark, Apache Kafka, or similar data processing frameworks (a short Spark sketch appears after this list).
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
for managing data workflows.
Knowledge of Python, Java, or other programming languages commonly used in
data engineering.
Experience with DevOps practices and CI/CD pipelines for data engineering
workflows.
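For the Spark item above, a minimal PySpark batch transformation might look like the following; the input/output paths and column names are assumptions for illustration only.

```python
# Minimal PySpark sketch (hypothetical paths and column names; not from the posting).
# Reads raw event data, aggregates per-user features, and writes them back out.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feature_aggregation").getOrCreate()

# Extract: raw events landed as Parquet (path is an assumption)
events = spark.read.parquet("s3://my-bucket/raw/events/")

# Transform: simple per-user aggregates that could feed a feature store
features = (
    events
    .where(F.col("event_type").isNotNull())
    .groupBy("user_id")
    .agg(
        F.count("*").alias("event_count"),
        F.max("event_time").alias("last_event_time"),
    )
)

# Load: write the aggregates for downstream training jobs
features.write.mode("overwrite").parquet("s3://my-bucket/features/user_events/")

spark.stop()
```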