What are the responsibilities and job description for the Data Engineer position at HStechnologies LLC?
Job Details
Job Title: Data Engineer
We are looking for an experienced data engineer skilled in cloud-based platforms, data pipeline orchestration, and infrastructure automation, with hands-on experience across all three of AWS, Azure, and Google Cloud Platform. Building and managing scalable ETL pipelines, provisioning infrastructure with Terraform, and building CI/CD workflows are all expected skills that will help scale our data engineering processes. In this role, you will have significant responsibility for designing and implementing optimal data infrastructure for our organization.
Key Responsibilities:
- Data Pipeline Design and Development: Create scalable ETL/ELT pipelines, transform data from varied sources, and prepare it in formats useful for analytics and machine learning. Use AWS, Azure, Google Cloud Platform, and their associated services to design efficient, cost-effective data solutions.
- Data Orchestration: Orchestrate workflows with Apache Airflow to ensure timely data flow between systems.
- CI/CD Implementation: Develop and maintain CI/CD pipelines to automate the testing, deployment, and monitoring of data pipelines and infrastructure changes.
- Infrastructure as Code (IaC): Use Terraform to automate the provisioning and configuration of cloud resources across AWS, Azure, and Google Cloud Platform.
- Data Quality and Performance: Validate data quality, integrity, and reliability, and optimize data pipeline performance.
- Collaboration: Work with data scientists, analysts, and other engineers to understand data needs and deliver reliable data solutions.
- Documentation and Best Practices: Document workflows, data lineage, and data engineering best practices to improve team efficiency and maintain high standards.
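To make the pipeline responsibility above concrete, here is a minimal sketch of the extract-transform-load pattern in Python, the posting's primary language. The sample data, field names, and JSON output format are illustrative assumptions, not part of the company's actual stack.

```python
import csv
import io
import json

# Hypothetical raw input standing in for a real source system.
RAW_CSV = """user_id,signup_date,country
1,2023-01-15,us
2,2023-02-01,de
3,,us
"""

def extract(source: str) -> list[dict]:
    """The 'E' step: read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))

def transform(rows: list[dict]) -> list[dict]:
    """The 'T' step: drop incomplete records and normalize country codes."""
    return [
        {
            "user_id": int(r["user_id"]),
            "signup_date": r["signup_date"],
            "country": r["country"].upper(),
        }
        for r in rows
        if r["signup_date"]  # data-quality check: skip rows missing a signup date
    ]

def load(rows: list[dict]) -> str:
    """The 'L' step: serialize cleaned rows, ready for a warehouse load."""
    return json.dumps(rows)

cleaned = transform(extract(RAW_CSV))
print(load(cleaned))
```

In practice each stage would be a task in an orchestrator such as Airflow, with the load step writing to a warehouse like Redshift or BigQuery rather than returning a JSON string.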
Qualifications:
Bachelor's degree in Computer Science, Information Technology, or a related field.
6 years of experience in data engineering, data infrastructure, or a similar role.
Technical Skills:
- Cloud Platforms: Significant experience with AWS, Azure, and Google Cloud Platform and their associated services, such as S3, Redshift, BigQuery, Data Factory, and Synapse.
- Programming Languages: Proficient in Python, with hands-on experience using it for ETL processes, data manipulation, and automation.
- Data Orchestration: Experience implementing, scheduling, and monitoring workflows with Apache Airflow.
- CI/CD Pipelines: Experienced with CI/CD tools such as Jenkins, GitLab CI, and AWS CodePipeline.
- Infrastructure as Code (IaC): Proficient in Terraform for automating cloud infrastructure.
- ETL/ELT Tools: Experience with ETL tools, including data transformation and loading for analytics.
- Data Storage and Warehousing: Familiarity with data warehouses such as Redshift, BigQuery, or Snowflake, along with data modeling best practices.
- Version Control: Proficient in Git for versioning and collaboration.
Preferred Skills:
- Experience with monitoring tools such as Prometheus, Grafana, or CloudWatch.
- Docker and Kubernetes for containerization and orchestration.
- Familiarity with Spark or other big data processing frameworks.
- Experience with SQL and NoSQL databases.
Soft Skills:
- Excellent problem-solving skills and attention to detail.
- Team-oriented mindset to work collaboratively with colleagues.
- Excellent communication skills for interacting with stakeholders.