What are the responsibilities and job description for the Data Engineer with AWS, PySpark, Databricks position at Solios Corp?
Job Details
Position: Data Engineer with AWS, PySpark, Databricks
Location: Rahway, NJ (Onsite from Day 1 - local candidates only)
Duration: Long-Term Contract
Interview Mode: Face-to-Face (F2F)
Job Description:
We are seeking a highly skilled and motivated Data Engineer with 10-15 years of experience and expertise in AWS, PySpark, and Databricks to join our dynamic team. As a Data Engineer, you will play a key role in the design, development, and management of our data infrastructure and pipelines. You will work closely with data scientists, analysts, and other engineering teams to ensure data is clean, reliable, and accessible for analysis.
Responsibilities:
Design, develop, and maintain scalable and high-performance data pipelines using AWS services (e.g., S3, Redshift, Lambda, Glue).
Utilize PySpark for large-scale data processing and ETL (Extract, Transform, Load) operations (see the illustrative sketch after this list).
Work within Databricks for data analysis, machine learning, and collaboration with data science teams.
Manage data storage, transformation, and orchestration across cloud platforms to ensure data is accurate, secure, and available for business needs.
Implement data validation and quality checks to ensure high integrity of data sets.
Optimize data workflows and performance across cloud-based data infrastructure.
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and ensure alignment with business goals.
Monitor and troubleshoot data pipeline performance, ensuring continuous data flow and minimizing downtime.
Stay up-to-date with the latest developments in cloud data technologies, big data, and data engineering practices.
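For candidates less familiar with the stack, the sketch below illustrates the kind of pipeline work described above: a minimal PySpark ETL job that reads raw data from S3, applies basic cleansing and data-quality checks, and writes curated output back out. The bucket paths, column names, and job name are hypothetical placeholders, not references to Solios Corp systems.

# Illustrative sketch only; paths and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw CSV files from an S3 landing area
raw = spark.read.option("header", "true").csv("s3://example-landing-bucket/orders/")

# Transform: deduplicate, parse timestamps, and apply a simple quality filter
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("order_id").isNotNull())
)

# Load: write partitioned Parquet to a curated S3 location for analysts
(cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-bucket/orders/"))

On Databricks, the same transformations would typically run as a notebook or scheduled job, often writing to Delta tables rather than plain Parquet.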
Requirements:
Proven experience as a Data Engineer with hands-on expertise in AWS, PySpark, and Databricks.
Strong proficiency in Python and its application in big data processing.
Experience with cloud-based data storage and processing technologies, including Amazon S3, Amazon Redshift, and AWS Glue.
Expertise in building and optimizing scalable ETL pipelines for large datasets.
Strong understanding of data warehousing, data lakes, and data pipeline architecture.
Familiarity with containerization technologies (e.g., Docker, Kubernetes) is a plus.
Knowledge of SQL and data modeling techniques.
Excellent problem-solving skills and the ability to debug complex data issues.
Strong communication skills and the ability to work in a collaborative team environment.
Preferred Qualifications:
Experience with Apache Spark and the Databricks environment.
Familiarity with machine learning workflows and data preparation for machine learning models.
Experience with CI/CD pipelines for data engineering.
Thanks & Regards
Mahesh