What are the responsibilities and job description for the Google Cloud Platform Data Engineer position at AQUA Information Systems, Inc.?
Job Details
Job Description:
The client is looking for a Data Engineer experienced in building ETL workflows and data pipelines with tools such as Hive, Spark, and Airflow.
The ideal candidate should be able to:
- Architect scalable, high-performance data solutions on Google Cloud Platform to support both batch and real-time data processing workflows.
- Design and implement data pipelines and architectures to ingest, store, transform, and analyze large volumes of data across various Google Cloud Platform services (e.g., BigQuery, Dataflow, Pub/Sub, Cloud Storage).
Skills
Proven experience as a Data Engineer, preferably in a big data environment.
Expertise in Hive, Spark, and Apache Hudi for big data processing and storage.
Hands-on experience with BigQuery and other Google Cloud Platform services such as Cloud Storage (GCS), Dataflow, and Pub/Sub.
Strong programming skills in Scala and Python, with experience in building data pipelines and ETL processes.
Proficiency with workflow orchestration tools like Apache Airflow.
Solid understanding of data warehousing concepts, data modelling, and schema design.
Knowledge of distributed systems and parallel processing.
Strong problem-solving skills and ability to work with large datasets in a fast-paced environment.
Responsibilities
Design, develop, and maintain robust and scalable ETL workflows and data pipelines using tools like Hive, Spark, and Airflow.
Implement and manage data storage and processing solutions using Apache Hudi and BigQuery.
Develop and optimize data pipelines for structured and unstructured data in Google Cloud Platform environments, leveraging GCS for data storage.
Write clean, maintainable, and efficient code in Scala and Python to process and transform data.
Ensure data quality, integrity, and consistency by implementing appropriate data validation and monitoring techniques.
Work with cross-functional teams to understand business requirements and deliver data solutions that drive insights and decision-making.
Troubleshoot and resolve performance and scalability issues in data processing jobs and pipelines.
Stay current with developments in big data technologies and tools, and incorporate them into the workflow where appropriate.