What are the responsibilities and job description for the Senior Data Engineer position at Cloudious LLC?
Job Details
Job Description
Ideal Candidate:
An undergraduate or Master's degree in Computer Science or equivalent engineering experience
6 years of professional software engineering and programming experience (Java, Python) with a focus on designing and developing complex data-intensive applications
3 years of architecture and design (patterns, reliability, scalability, quality) of complex systems
Advanced coding skills and practices (concurrency, distributed systems, functional principles, performance optimization)
Professional experience working in an agile environment
Strong analytical and problem-solving ability
Strong written and verbal communication skills
Experience in operating and maintaining production-grade software
Comfortable tackling very loosely defined problems and thriving on a team that has autonomy in its day-to-day decisions
Preferred Skills
In-depth knowledge of software and data engineering best practices
Experience in mentoring and leading junior engineers
Experience in serving as the technical lead for complex software development projects
Experience with large-scale distributed data technologies and tools
Strong experience with multiple database models (relational, document, in-memory, search, etc.)
Strong experience with data streaming architecture (Kafka, Spark, Airflow, SQL, NoSQL, CDC, etc.)
Strong knowledge of cloud data platforms and technologies such as GCS, BigQuery, Cloud Composer, Pub/Sub, Dataflow, Dataproc, Looker, and other cloud-native offerings
Strong knowledge of Infrastructure as Code (IaC) and associated tools (Terraform, Ansible, etc.)
Experience pulling data from a variety of data source types, including mainframe (EBCDIC), fixed-length and delimited files, and databases (SQL, NoSQL, time-series)
Strong coding skills for analytics and data engineering (Java, Python, and Scala)
Experience performing analysis with large datasets in a cloud-based environment, preferably with an understanding of Google's Cloud Platform (GCP)
Understands how to translate business requirements to technical architectures and designs
Comfortable communicating with various stakeholders (technical and non-technical)
Experience with Airflow and Spark:
Airflow: Proven experience using Apache Airflow to orchestrate and schedule workflows. Ability to design, implement, and manage complex data pipelines. Understanding of DAGs (including how to create them dynamically), task dependencies, and error handling within Airflow (a minimal DAG sketch follows this section).
Spark: Hands-on experience with Apache Spark for large-scale data processing and analytics. Proficiency in writing Spark jobs in Java (PySpark is also fine, as the team is moving in that direction), along with the ability to optimize performance and handle data transformations and aggregations at scale (see the PySpark sketch below).
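For illustration, a minimal sketch of the kind of Airflow DAG the bullet above describes, assuming Airflow 2.x; the DAG id, source names, and callables are hypothetical. It shows dynamic task creation, task dependencies via the >> operator, and retry-based error handling.

```python
# Minimal Airflow 2.x DAG sketch (hypothetical dag_id, sources, and callables).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(source: str):
    # Placeholder: pull data from one upstream source.
    print(f"extracting from {source}")


def load():
    # Placeholder: load the combined output into the warehouse.
    print("loading to warehouse")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    # Basic error handling: retry failed tasks twice with a short delay.
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Dynamically create one extract task per source and wire the dependency.
    for source in ["orders", "payments", "customers"]:
        PythonOperator(
            task_id=f"extract_{source}",
            python_callable=extract,
            op_kwargs={"source": source},
        ) >> load_task
```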
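Likewise, a small PySpark sketch of a transformation-and-aggregation job of the kind the Spark bullet refers to; the bucket paths and column names are hypothetical.

```python
# PySpark sketch (hypothetical paths and columns): filter, derive, aggregate, write.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transactions_rollup").getOrCreate()

# Read a partitioned Parquet dataset; columnar input keeps scans cheap.
transactions = spark.read.parquet("gs://example-bucket/transactions/")

daily_totals = (
    transactions
    .filter(F.col("status") == "SETTLED")            # transformation
    .withColumn("txn_date", F.to_date("settled_at"))
    .groupBy("txn_date", "merchant_id")              # aggregation at scale
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("txn_count"),
    )
)

# Repartitioning before the write is one common performance lever.
(
    daily_totals.repartition("txn_date")
    .write.mode("overwrite")
    .partitionBy("txn_date")
    .parquet("gs://example-bucket/daily_totals/")
)

spark.stop()
```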
Familiarity with GCP Services:
BigQuery: Experience with Google BigQuery for running SQL queries on large datasets, optimizing queries for performance, and managing data warehousing solutions more broadly (a short client sketch follows this list).
Composer: Knowledge of Google Cloud Composer for managing and orchestrating workflows.
Dataproc: Experience with Dataproc for managing and scaling Spark clusters, including configuring clusters, running jobs, and integrating with other GCP services.
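As a brief illustration of the BigQuery work described above, here is a sketch using the google-cloud-bigquery Python client; the project, dataset, table, and query are hypothetical, and the dry run is just one simple way to sanity-check scan cost before executing.

```python
# google-cloud-bigquery sketch (hypothetical project, dataset, and table names).
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # assumes default (ADC) credentials

query = """
    SELECT merchant_id, SUM(amount) AS total_amount
    FROM `example-project.analytics.transactions`
    WHERE txn_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY merchant_id
    ORDER BY total_amount DESC
    LIMIT 100
"""

# Dry-run first to see how many bytes the query would scan.
dry_run = client.query(query, job_config=bigquery.QueryJobConfig(dry_run=True))
print(f"Would process {dry_run.total_bytes_processed / 1e9:.2f} GB")

# Run the query and iterate over the result rows.
for row in client.query(query).result():
    print(row.merchant_id, row.total_amount)
```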
Proficiency in Python, Java, and SQL:
Python: Strong foundation in Python, with experience in writing clean, efficient code and using libraries such as Pandas and NumPy for data manipulation. Proficient in debugging, testing, and using Python for API interactions and external service integration (a short example follows this list).
Java: Proficiency in Java, especially for integrating with data processing frameworks. Experience with Java-based libraries and tools relevant to data engineering is a plus.
SQL: Experience in writing and optimizing complex SQL queries for data extraction, transformation, and analysis.
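A short Pandas/NumPy sketch of the kind of data manipulation mentioned above; the file name and columns are hypothetical.

```python
# Pandas/NumPy sketch (hypothetical file and columns): load, clean, derive, summarize.
import numpy as np
import pandas as pd

df = pd.read_csv("transactions.csv", parse_dates=["settled_at"])

# Basic cleaning: drop exact duplicates and fill missing amounts with 0.
df = df.drop_duplicates().fillna({"amount": 0})

# Derive a column with NumPy, then summarize per merchant.
df["amount_usd"] = np.round(df["amount"] * df["fx_rate"], 2)
summary = (
    df.groupby("merchant_id")["amount_usd"]
      .agg(total="sum", average="mean", txn_count="count")
      .sort_values("total", ascending=False)
)
print(summary.head())
```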
Knowledge of Terraform (Optional but Preferred):
Terraform: Familiarity with Terraform to automate the provisioning and management of cloud resources. Ability to write and maintain Terraform scripts to define and deploy GCP resources, ensuring infrastructure consistency and scalability.
What are the top 3 skills required for this role?
1. Python, Spark, SQL
2. GCP BigQuery
3. Airflow
Additional Information:
The client is looking for fintech experience
Experience with Mastercard products