What are the responsibilities and job description for the Data Engineer with GCP position at Vital Force Solutions?
12-month contract
Job Summary
We are seeking a Senior Data Engineer to lead and execute the design, development, and maintenance of scalable data pipelines, data workflows, and machine learning feature engineering processes. The ideal candidate will have extensive experience with SQL, NoSQL, Kafka, GCP services, and data pipeline development, as well as a proven track record in optimizing data solutions for performance and scalability. This role requires a passion for driving innovation and continuous improvement while mentoring junior team members in best practices.
Key Responsibilities
Provide Technical Leadership
Offer guidance and leadership to the team, ensuring clarity and alignment across ongoing projects.
Facilitate collaboration across teams to solve complex data engineering challenges.
Promote best practices in data engineering to ensure consistency and quality across all initiatives.
Build And Maintain Data Pipelines
Design, build, and maintain efficient, scalable, and reliable data pipelines to support data ingestion, transformation, and integration across multiple data sources and destinations.
Utilize tools like Kafka, Databricks, and other related technologies to ensure smooth data flow across systems.
Leverage GCP services such as BigQuery, Cloud Storage, Vertex AI, AutoMLOps, and Dataflow for data processing and analytics.
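By way of illustration only (not part of the formal requirements), here is a minimal sketch of the kind of batch pipeline this responsibility covers, using the Apache Beam Python SDK that Dataflow executes. Every project, bucket, dataset, table, and field name below is a hypothetical placeholder.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical names throughout: example-bucket, example-project, analytics.events.
    with beam.Pipeline(options=PipelineOptions()) as pipeline:
        (
            pipeline
            | "ReadRawEvents" >> beam.io.ReadFromText("gs://example-bucket/raw/events-*.json")
            | "ParseJson" >> beam.Map(json.loads)
            | "KeepValidRecords" >> beam.Filter(lambda e: e.get("user_id") is not None)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                schema="user_id:STRING,event_type:STRING,ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )

The same pipeline can be submitted to Dataflow instead of running locally by passing --runner=DataflowRunner in the pipeline options.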
Drive Digital Innovation
Innovate and modernize data engineering approaches, focusing on extending core data assets (e.g., SQL-based, NoSQL-based, cloud-based, and real-time streaming data platforms).
Promote the use of cutting-edge technologies to improve the efficiency and performance of data workflows.
Implement Feature Engineering
Develop and manage feature engineering pipelines for machine learning workflows, utilizing tools like Vertex AI, BigQuery ML, and custom Python libraries.
Collaborate with data scientists to ensure that the data is transformed and prepared effectively for machine learning models.
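Again purely as an illustration, a minimal sketch of a feature-engineering step of the sort described above, materializing per-user features in BigQuery through its Python client; the project, dataset, table, and column names are assumptions for the example.

    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")  # hypothetical project

    # Derive simple per-user behavioural features for downstream model training.
    feature_sql = """
    CREATE OR REPLACE TABLE analytics.user_features AS
    SELECT
      user_id,
      COUNT(*) AS events_30d,
      COUNTIF(event_type = 'purchase') AS purchases_30d,
      TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(ts), DAY) AS days_since_last_event
    FROM analytics.events
    WHERE ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    GROUP BY user_id
    """
    client.query(feature_sql).result()  # blocks until the feature table is written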
Implement Automated Testing
Design and implement automated unit, integration, and performance testing frameworks to ensure high-quality, reliable, and scalable data solutions.
Ensure data workflows are tested for accuracy, reliability, and compliance with organizational standards.
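For illustration, a minimal sketch of the kind of automated unit test implied here, written with pytest against a hypothetical transformation function (both the function and its rules are invented for the example).

    import pytest

    def normalize_event(raw: dict) -> dict:
        """Hypothetical transformation under test: require a user_id and
        lower-case the event type."""
        if not raw.get("user_id"):
            raise ValueError("missing user_id")
        return {"user_id": raw["user_id"], "event_type": raw["event_type"].lower()}

    def test_event_type_is_lowercased():
        assert normalize_event({"user_id": "u1", "event_type": "CLICK"})["event_type"] == "click"

    def test_records_without_user_id_are_rejected():
        with pytest.raises(ValueError):
            normalize_event({"event_type": "CLICK"})

In a CI/CD setup, tests like these would typically run on every commit before a pipeline change is deployed.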
Optimize Data Workflows
Optimize data workflows for performance, cost efficiency, and scalability in large, complex data environments.
Continuously monitor and improve the performance of data pipelines to handle large datasets effectively.
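As one concrete (and hypothetical) example of this kind of optimization, a large BigQuery table can be re-materialized with partitioning and clustering so that queries scan only the slices they need; the names below are placeholders.

    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")  # hypothetical project

    # Partition by event date and cluster by common filter keys to cut scan cost.
    ddl = """
    CREATE OR REPLACE TABLE analytics.events_optimized
    PARTITION BY DATE(ts)
    CLUSTER BY user_id, event_type AS
    SELECT * FROM analytics.events
    """
    client.query(ddl).result()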
Mentor Team Members
Mentor junior team members on best practices in data engineering, guiding them on data principles, patterns, and processes.
Foster a collaborative environment that encourages skill development and knowledge sharing.
Draft And Review Documentation
Draft and review architectural diagrams, interface specifications, and other design documents to ensure clear communication and understanding of technical solutions.
Ensure documentation is thorough, up-to-date, and easily accessible for the team.
Cost/Benefit Analysis
Present opportunities for improvement, providing cost/benefit analysis to leadership to guide informed, scalable, and efficient data architecture decisions.
Experience
Required Qualifications
- 4 years of professional Data Development experience.
- 4 years of hands-on experience with SQL and NoSQL technologies (e.g., Cassandra, MongoDB).
- 3 years of experience building and maintaining data pipelines and workflows.
- 5 years of experience with Java development.
- 2 years of experience developing with Python for data-related tasks.
- 3 years of experience with Kafka and real-time streaming data solutions.
- 2 years of experience in feature engineering for machine learning pipelines.
- Experience with GCP services such as BigQuery, Vertex AI Platform, Cloud Storage, AutoMLOps, and Dataflow.
- Strong understanding of ETL processes, data warehousing, and data integration techniques.
- Familiarity with CI/CD pipelines and automated testing frameworks.
- Expertise in version control tools like Git and experience with GitHub Actions.
- Strong understanding of Agile principles, preferably Scrum, and ability to work in an Agile environment.
Preferred Qualifications
Streaming Technologies
- Knowledge of streaming technologies such as Spark Structured Streaming, Kafka, EventHub, or similar.
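For a flavour of the streaming side, a minimal sketch of a Kafka consumer using the kafka-python library; the topic name, broker address, and consumer group are placeholders, not details of this role's environment.

    import json

    from kafka import KafkaConsumer  # kafka-python package

    consumer = KafkaConsumer(
        "orders",                              # hypothetical topic
        bootstrap_servers=["localhost:9092"],  # hypothetical broker
        group_id="orders-etl",                 # hypothetical consumer group
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for message in consumer:
        # Each message.value is the parsed JSON payload of one event.
        print(message.value)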
Cloud & Data Technologies
- Familiarity with GitHub SaaS, Databricks, and PySpark for data processing and analysis.
- Experience with Spark development and knowledge of distributed computing.
- Experience integrating machine learning models with data pipelines, especially in cloud environments.
Certifications (Optional)
Google Professional Data Engineer (Preferred)
AWS Certified Big Data – Specialty (Preferred)
Databricks Certified Associate Developer (Preferred)
Education: Bachelor's Degree
Skills: automated testing frameworks, Kafka, data engineering, data pipeline development, NoSQL, Python, pipelines, data warehousing, CI/CD pipelines, GCP, ETL processes, GCP services, Java, Git, Databricks, data integration techniques, machine learning, SQL, Agile principles
Salary: $40 - $45