
Data Engineer – AI & ML

Theron Solutions
San Francisco, CA | Full Time
POSTED ON 4/22/2025
AVAILABLE BEFORE 5/21/2025

Job Description

Location: San Francisco, CA


Responsibilities:


1. Design and Build Data Pipelines:

•Develop, construct, test, and maintain data pipelines to extract, transform, and load (ETL) data from various sources to data warehouses or data lakes.

•Ensure data pipelines are efficient, scalable, and maintainable, enabling seamless data flow for downstream analysis and modeling.

•Work with stakeholders to identify data requirements and implement effective data processing solutions.
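
As a rough illustration of this responsibility, here is a minimal batch ETL sketch in Python. The source file (orders.csv), the SQLite target standing in for a warehouse, and the column names are illustrative assumptions, not details from this posting.

```python
import csv
import sqlite3

def extract(path):
    """Read raw rows from a CSV source (one of many possible source types)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Drop rows missing a key and coerce types before loading."""
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue  # skip records without a primary key
        cleaned.append((row["order_id"], row["customer_id"], float(row["amount"])))
    return cleaned

def load(rows, db_path="warehouse.db"):
    """Write transformed rows into a target table (SQLite stands in for a warehouse)."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, customer_id TEXT, amount REAL)"
    )
    con.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))
```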

2. Data Integration:

•Integrate data from multiple sources such as internal databases, external APIs, third-party vendors, and flat files.

•Collaborate with business teams to understand data needs and ensure data is structured properly for reporting and analytics.

•Build and optimize data ingestion systems to handle both real-time and batch data processing.

3. Data Storage and Management:

•Design and manage data storage solutions (e.g., relational databases, NoSQL databases, data lakes, cloud storage) that support large-scale data processing.

•Implement best practices for data security, backup, and disaster recovery, ensuring that data is safe, recoverable, and complies with relevant regulations.

•Manage and optimize storage systems for scalability and cost efficiency.

4. Data Transformation:

•Develop data transformation logic to clean, enrich, and standardize raw data, ensuring it is suitable for analysis.

•Implement data transformation frameworks and tools, ensuring they work seamlessly across different data formats and sources.

•Ensure the accuracy and integrity of data as it is processed and stored.
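
A hedged sketch of the kind of transformation logic described above, assuming pandas is available; the column names ("Signup Date", "Country") are made up for illustration.

```python
import pandas as pd

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Clean and standardize a raw extract: normalize column names,
    parse dates, trim and upper-case country codes, drop duplicates."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    out["signup_date"] = pd.to_datetime(out["signup_date"], errors="coerce")
    out["country"] = out["country"].str.strip().str.upper()
    return out.drop_duplicates()

# Toy frame standing in for raw source data.
raw = pd.DataFrame({"Signup Date": ["2024-01-05", "not a date"], "Country": [" us ", "US"]})
print(standardize(raw))
```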

5. Automation and Optimization:

•Automate repetitive tasks such as data extraction, transformation, and loading to improve pipeline efficiency.

•Optimize data processing workflows for performance, reducing processing time and resource consumption.

•Troubleshoot and resolve performance bottlenecks in data pipelines.

6. Collaboration with Data Teams:

•Work closely with Data Scientists, Analysts, and business teams to understand data requirements and ensure the correct data is available and accessible.

•Assist Data Scientists with preparing datasets for model training and deployment.

•Provide technical expertise and support to ensure the integrity and consistency of data across all projects.

7. Data Quality Assurance:

•Implement data validation checks to ensure data accuracy, completeness, and consistency throughout the pipeline.

•Develop and enforce data quality standards to detect and resolve data issues before they affect analysis or reporting.

•Monitor and improve data quality by identifying areas for improvement and implementing solutions.
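
One way such validation checks might look in Python (pandas assumed; the order_id/amount schema is purely illustrative):

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list:
    """Run basic completeness and consistency checks and return any failures."""
    problems = []
    if df["order_id"].isna().any():
        problems.append("missing order_id values")
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")
    if (df["amount"] < 0).any():
        problems.append("negative amounts")
    return problems

# Toy frame with deliberate quality issues.
df = pd.DataFrame({"order_id": [1, 1, None], "amount": [10.0, -5.0, 3.0]})
print("failed checks:", validate(df) or "none")
```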

8. Monitoring and Maintenance:

•Set up monitoring and logging for data pipelines to detect and alert for issues such as failures, data mismatches, or delays.

•Perform regular maintenance of data pipelines and storage systems to ensure optimal performance.

•Update and improve data systems as required, keeping up with evolving technology and business needs.
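
A minimal sketch of step-level monitoring using Python's standard logging module; the step names and the alerting hook are assumptions, and a production setup would forward these logs to a real monitoring and alerting system.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def run_step(name, fn, *args):
    """Run one pipeline step, log its duration, and surface failures so an alert can fire."""
    start = time.monotonic()
    try:
        result = fn(*args)
        log.info("step %s finished in %.2fs", name, time.monotonic() - start)
        return result
    except Exception:
        # In production this is where an alert (email, PagerDuty, etc.) would be triggered.
        log.exception("step %s failed", name)
        raise

# Toy steps standing in for extract/transform/load.
rows = run_step("extract", lambda: [{"id": 1}, {"id": 2}])
run_step("load", lambda r: len(r), rows)
```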

9. Documentation and Reporting:

•Document data pipeline designs, ETL processes, data schemas, and transformation logic for transparency and future reference.

•Create reports on the performance and status of data pipelines, identifying areas of improvement or potential issues.

•Provide guidance to other teams regarding the usage and structure of data systems.

10. Stay Updated with Technology Trends:

•Continuously evaluate and adopt new tools, technologies, and best practices in data engineering and big data systems.

•Participate in industry conferences, webinars, and training to stay current with emerging trends in data engineering and cloud computing.


Requirements:

1. Educational Background:

•Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or a related field.


2. Technical Skills:

•Proficiency in programming languages such as Python, Java, or Scala for data processing.

•Strong knowledge of SQL and relational databases (e.g., MySQL, PostgreSQL, MS SQL Server).

•Experience with NoSQL databases (e.g., MongoDB, Cassandra, HBase).

•Familiarity with data warehousing solutions (e.g., Amazon Redshift, Google BigQuery, Snowflake).

•Hands-on experience with ETL frameworks and tools (e.g., Apache NiFi, Talend, Informatica, Airflow); a brief orchestration sketch follows this list.

•Knowledge of big data technologies (e.g., Hadoop, Apache Spark, Kafka).

•Experience with cloud platforms (AWS, Azure, Google Cloud) and related services for data storage and processing.

•Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes) for building scalable data systems.

•Knowledge of version control systems (e.g., Git) and collaboration tools (e.g., Jira, Confluence).

•Understanding of data modeling concepts (e.g., star schema, snowflake schema) and how they relate to data warehousing and analytics.

•Knowledge of data lakes, data warehousing architecture, and how to design efficient and scalable storage solutions.
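
For orchestration tools such as Airflow (mentioned among the ETL frameworks above), pipelines are typically defined as DAGs in Python. The sketch below is illustrative only: it assumes an Airflow 2.x installation, and the DAG id, schedule, and task callables are made-up placeholders rather than anything specified in this posting.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    pass  # placeholder: pull data from a source system

def transform():
    pass  # placeholder: clean and standardize the extracted data

def load():
    pass  # placeholder: write the result to the warehouse

with DAG(
    dag_id="daily_orders_etl",       # illustrative name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",               # parameter name varies by Airflow version
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```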


3. Soft Skills:

•Strong problem-solving skills with an ability to troubleshoot complex data issues.

•Excellent communication skills, with the ability to explain technical concepts to both technical and non-technical stakeholders.

•Strong attention to detail and a commitment to maintaining data accuracy and integrity.

•Ability to work effectively in a collaborative, team-based environment.


4. Experience:

•3 years of experience in data engineering, including hands-on work building and maintaining data pipelines and systems.

•Proven track record of implementing data engineering solutions at scale, preferably in large or complex environments.

•Experience working with data governance, compliance, and security protocols.


5. Preferred Qualifications:

•Experience with machine learning and preparing data for AI/ML model training.

•Familiarity with stream processing frameworks (e.g., Apache Kafka, Apache Flink).

•Certification in cloud platforms (e.g., AWS Certified Big Data – Specialty, Google Cloud Professional Data Engineer).

•Experience with DevOps practices and CI/CD pipelines for data systems.

•Experience with automation and orchestration tools (e.g., Apache Airflow, Luigi).

•Familiarity with data visualization and reporting tools (e.g., Tableau, Power BI) to support analytics teams.


6. Work Environment:

•Collaborative and fast-paced work environment.

•Opportunity to work with state-of-the-art technologies.

•Supportive and dynamic team culture.


EOE: Our client is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind. We are committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at our client are based on business needs, job requirements, and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. We will not tolerate discrimination or harassment based on any of these characteristics.
