Databricks ETL Developer / Data Engineer

Cleo Consulting
Ontario, CA Full Time
POSTED ON 1/31/2025
AVAILABLE BEFORE 3/31/2025

Job Details

Job Description
Assignment: RQ08659 - Software Developer - ETL - Senior
Job Title: Databricks ETL Developer / Data Engineer
Requisition (SS): RQ08659
Start Date: 2025-02-10
End Date: 2026-03-31
Client: Children, Youth & Social Services Cluster
Office Location: 315 Front Street, Toronto
Organization: Children, Youth & Social Services Cluster
Ministry: Ministry of Children, Community and Social Services
# Business Days: 282.00

Note: This role is part of a Hybrid Work Arrangement, and resource(s) will be required to work a minimum of 2-3 days per week at 5700 Yonge St.

Must Have:

  • 7 years using ETL tools such as Microsoft SSIS, stored procedures, T-SQL
  • 2 years Delta Lake, Databricks and Azure Databricks pipelines
  • Strong knowledge of Delta Lake for data management and optimization.
  • Familiarity with Databricks Workflows for scheduling and orchestrating tasks.
  • 2 years Python and PySpark
  • Solid understanding of the Medallion Architecture (Bronze, Silver, Gold) and experience implementing it in production environments.
  • Hands-on experience with CDC tools (e.g., GoldenGate) for managing real-time data.
  • SQL Server, Oracle

Description

General Responsibilities

  • This role is responsible for designing, developing, maintaining, and optimizing ETL (Extract, Transform, Load) processes in Databricks for data warehousing, data lakes, and analytics. The developer will work closely with data architects and business teams to ensure the efficient transformation and movement of data to meet business needs, including handling Change Data Capture (CDC) and streaming data.

Tools used are:

  • Azure Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data.
  • Azure Databricks/PySpark (good Python/PySpark knowledge required) to build transformations of raw data into the curated zone in the data lake.
  • Azure Databricks/PySpark/SQL (good SQL knowledge required) to develop and/or troubleshoot transformations of curated data into FHIR.

Data design

  • Understand the requirements. Recommend changes to models to support ETL design.
  • Define primary keys, indexing strategies, and relationships that enhance data integrity and performance across layers.
  • Define the initial schemas for each data layer
  • Assist with data modelling and updates of source-to-target mapping documentation
  • Document and implement schema validation rules to ensure incoming data conforms to expected formats and standards
  • Design data quality checks within the pipeline to catch inconsistencies, missing values, or errors early in the process.
  • Proactively communicate with business and IT experts on any changes required to conceptual, logical and physical models; communicate and review timelines, dependencies, and risks.
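
As an illustration of the schema validation and data quality bullets above, the following is a minimal sketch in plain Python rather than PySpark, so that it runs without a cluster. The column names (`case_id`, `client_name`, `opened_on`), the expected types, and the `validate_rows` helper are all hypothetical and are not taken from the Ministry's data model.

```python
from datetime import date

# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA = {"case_id": int, "client_name": str, "opened_on": date}


def validate_rows(rows):
    """Split rows into (valid, rejected), attaching reasons to each rejected row."""
    valid, rejected = [], []
    for row in rows:
        problems = []
        for column, expected_type in EXPECTED_SCHEMA.items():
            value = row.get(column)
            if value is None:
                # Missing-value check: absent keys and explicit nulls both fail.
                problems.append(f"missing {column}")
            elif not isinstance(value, expected_type):
                # Schema check: the value is present but has the wrong type.
                problems.append(f"{column} is not {expected_type.__name__}")
        if problems:
            rejected.append({"row": row, "problems": problems})
        else:
            valid.append(row)
    return valid, rejected


sample = [
    {"case_id": 1, "client_name": "A", "opened_on": date(2025, 2, 10)},
    {"case_id": "2", "client_name": None, "opened_on": date(2025, 2, 11)},
]
valid_rows, rejected_rows = validate_rows(sample)
```

Keeping rejected rows together with their reasons, instead of dropping them silently, is one way to surface inconsistencies early; a production pipeline would more likely express such rules in the platform's own validation features.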

Development of ETL strategy and solution for different sets of data modules

  • Understand the Tables and Relationships in the data model.
  • Create low level design documents and test cases for ETL development.
  • Implement error handling, logging, retry mechanisms, and handling of data anomalies.
  • Create the workflows and pipeline design.
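
The logging and retry items above could look roughly like the sketch below. It is a generic Python pattern, not a Databricks API; `run_with_retry`, `flaky_extract`, and the default values are hypothetical names and settings chosen for illustration.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl")


def run_with_retry(step, max_attempts=3, base_delay_seconds=0.0):
    """Run a pipeline step, logging each failure and retrying with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                # Out of attempts: re-raise so the caller sees the failure.
                raise
            # The delay doubles on each retry; the default of 0.0 keeps this demo instant.
            time.sleep(base_delay_seconds * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_extract():
    # Hypothetical step that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source unavailable")
    return ["row-1", "row-2"]


result = run_with_retry(flaky_extract)
```

Re-raising after the final attempt lets the scheduler mark the run as failed rather than hiding the error.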

Development and testing of data pipelines with Incremental and Full Load

  • Develop high quality ETL mappings/scripts/notebooks
  • Develop and maintain pipelines from the Oracle data source to Azure Delta Lake and FHIR
  • Perform unit testing
  • Ensure performance monitoring and improvement
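
To make the difference between a Full Load and an Incremental Load concrete, here is a minimal sketch in plain Python. The record layout (`id`, `status`), the `op` values, and both helper functions are hypothetical; on Delta Lake the incremental step would more typically be written as a MERGE against the target table.

```python
def full_load(source_rows):
    """Rebuild the target from scratch, keyed by primary key."""
    return {row["id"]: row for row in source_rows}


def incremental_load(target, changes):
    """Apply change records in order; each has an op of 'upsert' or 'delete'."""
    updated = dict(target)  # copy so the original target is left untouched
    for change in changes:
        if change["op"] == "delete":
            updated.pop(change["id"], None)
        else:
            updated[change["id"]] = change["row"]
    return updated


target = full_load([{"id": 1, "status": "open"}, {"id": 2, "status": "open"}])
changes = [
    {"op": "upsert", "id": 2, "row": {"id": 2, "status": "closed"}},
    {"op": "upsert", "id": 3, "row": {"id": 3, "status": "open"}},
    {"op": "delete", "id": 1},
]
target = incremental_load(target, changes)
```

Applying changes in their captured order matters: a later update or delete for the same key must win over an earlier one.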

Performance review, data consistency checks

  • Troubleshoot performance and ETL issues; log activity for each pipeline and transformation.
  • Review and optimize overall ETL performance.

End-to-end integrated testing for Full Load and Incremental Load

  • Plan for Go Live, Production Deployment.
  • Create production deployment steps.
  • Configure parameters and scripts for go live. Test and review the instructions.
  • Create release documents and help build and deploy code across servers.

Go Live Support and Review after Go Live.

  • Review existing ETL processes and tools, and provide recommendations on improving performance and reducing ETL timelines.
  • Review infrastructure and remediate issues for overall process improvement

Knowledge Transfer to Ministry staff, development of documentation on the work completed.

  • Document work and share the ETL end-to-end design, troubleshooting steps, configuration and scripts review.
  • Transfer documents, scripts and review of documents to Ministry.

Experience and Skill Set Requirements

Experience:

  • 7 years of experience working with SQL Server, T-SQL, Oracle, PL/SQL development or similar relational databases
  • 2 years of experience working with Azure Data Factory, Databricks and Python development
  • Experience building data ingestion and change data capture using Oracle GoldenGate
  • Experience in designing, developing, and implementing ETL pipelines using Databricks and related tools to ingest, transform, and store large-scale datasets
  • Experience in leveraging Databricks, Delta Lake, Delta Live Tables, and Spark to process structured and unstructured data.
  • Experience building databases and data warehouses, and working with delta and full loads
  • Experience with data modeling and tools such as SAP PowerDesigner, Visio, or similar
  • Experience working with SQL Server SSIS or other ETL tools, solid knowledge and experience with SQL scripting
  • Experience developing in an Agile environment
  • Understanding data warehouse architecture with a delta lake
  • Ability to analyze, design, develop, test and document ETL pipelines from detailed and high-level specifications, and assist in troubleshooting.
  • Ability to utilize SQL to perform DDL tasks and complex queries
  • Good knowledge of database performance optimization techniques
  • Ability to assist in the requirements analysis and subsequent developments
  • Ability to conduct unit testing and assist in test preparations to ensure data integrity
  • Work closely with Designers, Business Analysts and other Developers
  • Liaise with Project Managers, Quality Assurance Analysts and Business Intelligence Consultants
  • Design and implement technical enhancements of Data Warehouse as required.

Development, Database and ETL experience (60 points)

  • Experience in developing and managing ETL pipelines, jobs, and workflows in Databricks.
  • Deep understanding of Delta Lake for building data lakes and managing ACID transactions, schema evolution, and data versioning.
  • Experience automating ETL pipelines using Delta Live Tables, including handling Change Data Capture (CDC) for incremental data loads.
  • Proficient in structuring data pipelines with the Medallion Architecture to scale data pipelines and ensure data quality.
  • Hands-on experience developing streaming tables in Databricks using Structured Streaming and readStream to handle real-time data.
  • Expertise in integrating CDC tools like GoldenGate or Debezium for processing incremental updates and managing real-time data ingestion.
  • Experience using Unity Catalog to manage data governance, access control, and ensure compliance.
  • Skilled in managing clusters, jobs, autoscaling, monitoring, and performance optimization in Databricks environments.
  • Knowledge of using Databricks Auto Loader for efficient batch and real-time data ingestion.
  • Experience with data governance best practices, including implementing security policies, access control, and auditing with Unity Catalog.
  • Proficient in creating and managing Databricks Workflows to orchestrate job dependencies and schedule tasks.
  • Strong knowledge of Python, PySpark, and SQL for data manipulation and transformation.
  • Experience integrating Databricks with cloud storage solutions such as Azure Blob Storage, AWS S3, or Google Cloud Storage.
  • Familiarity with external orchestration tools like Azure Data Factory
  • Implementing logical and physical data models
  • Knowledge of FHIR is an asset

Design Documentation and Analysis Skills (20 points)

  • Demonstrated experience in creating design documentation such as:
      • Schema definitions
      • Error handling and logging
      • ETL Process Documentation
      • Job Scheduling and Dependency Management
      • Data Quality and Validation Checks
      • Performance Optimization and Scalability Plans
      • Troubleshooting Guides
      • Data Lineage
      • Security and Access Control Policies applied within ETL
  • Experience in Fit-Gap analysis, system use case reviews, requirements reviews, coding exercises and reviews.
  • Participate in defect fixing, testing support and development activities for ETL
  • Analyze and document solution complexity and interdependencies including providing support for data validation.
  • Strong analytical skills for troubleshooting, problem-solving, and ensuring data quality.

Certifications (10 points)

  • Certified in one or more of the following certifications:
      • Databricks Certified Data Engineer Associate
      • Databricks Certified Professional Data Engineer
      • Microsoft Certified: Azure Data Engineer Associate
      • AWS Certified Data Analytics - Specialty
      • Google Cloud Professional Data Engineer

Communication, Leadership Skills and Knowledge Transfer (10 points)

  • Ability to collaborate effectively with cross-functional teams and communicate complex technical concepts to non-technical stakeholders.
  • Strong problem-solving skills and experience working in an Agile or Scrum environment.
  • Ability to provide technical guidance and support to other team members on Databricks best practices.
  • Must have previous work experience in conducting Knowledge Transfer sessions, ensuring the resources will receive the required knowledge to support the system.
  • Must develop documentation and materials as part of a review and knowledge transfer to other members.