What are the responsibilities and job description for the Data Engineer AWS, Spark, PySpark, Datalake position at Projas Technologies, LLC?
Job Details
The ideal candidate will have a strong background in data engineering using Spark and PySpark, with Python, Java, or Scala, including Spark Streaming and Spark applications. You should be knowledgeable in tools and frameworks such as Docker, Spark, Scala, Jupyter Notebook, Kubernetes, feature management platforms, and SageMaker. Strong data engineering experience in an AWS environment is required.
- Strong background in Spark and PySpark.
- Proven experience with AWS S3-based data lakes and EMR.
- In-depth knowledge of Amazon Web Services (EC2, S3, EMR, RDS) or equivalent cloud computing approaches.
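For context, a minimal sketch of the kind of PySpark application this role describes, reading Parquet from an S3-based data lake as it might run on EMR; the bucket and prefix below are hypothetical placeholders, not this employer's actual environment:

```python
# A minimal sketch: reading Parquet from an S3-based data lake with PySpark.
# The bucket and prefix are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("datalake-read-example")
    .getOrCreate()
)

# On EMR, the built-in S3 connector resolves s3:// paths; running locally
# would additionally require hadoop-aws and configured credentials.
events = spark.read.parquet("s3://example-datalake/events/")
events.printSchema()
print(events.count())
```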
Responsibilities:
- Work with XML, JSON, YAML, and SQL; demonstrate strong Python and Linux skills.
- Utilize tools and frameworks including Docker, Spark, Scala, Jupyter Notebook, Kubernetes, Feature Management Platforms, and SageMaker.
- Apply advanced experience with scripting languages such as Python or Shell.
- Exhibit strong knowledge of software development methodologies and practices.
- Work in Agile development teams and demonstrate working knowledge of Agile (Scrum) development methodologies.
- Utilize experience with Amazon Web Services (EC2, S3, EMR, RDS) or equivalent cloud computing approaches.
- Showcase strong expertise in Data Warehousing and analytic architecture.
- Handle large data volumes efficiently.
- Develop stream-processing applications using Flink, Spark Streaming, Kinesis, PySpark, etc. (see the streaming sketch after this list).
- Conduct design and code reviews, defect remediation, and create technical design specifications.
- Develop automated unit tests (see the unit-test sketch after this list) and provide estimates and sequencing of individual activities for project plans.
- Analyze and synthesize various inputs to create software and services.
- Identify dependencies and integrate third-party solutions.
- Design and implement data pipelines on AWS S3-based data lakes (see the ETL sketch after this list).
- Collaborate effectively with peer engineers and architects to solve complex problems and deliver end-to-end quality.
- Communicate effectively with non-technical audiences including senior product and business management.
- Design and develop ETL jobs across multiple big data platforms and tools including S3, EMR, Scala, Python, and SQL.
- Migrate and modernize mature code bases onto new technologies.
- Create and consume SOAP-based or JSON/REST web services to communicate with external systems (see the REST sketch after this list).
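A minimal Structured Streaming sketch in PySpark, treating new JSON files landing in an S3 prefix as a stream; a true Kinesis source would additionally require a Kinesis connector. The paths, schema, and fields are illustrative assumptions:

```python
# A minimal Structured Streaming sketch: aggregate events as JSON files
# arrive under an S3 prefix. Paths and fields are hypothetical; a Kinesis
# source would need a separate Kinesis connector.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("streaming-example").getOrCreate()

schema = (StructType()
          .add("event_type", StringType())
          .add("event_ts", TimestampType()))

stream = (spark.readStream
               .schema(schema)  # streaming file sources require an explicit schema
               .json("s3://example-datalake/landing/events/"))

counts = stream.groupBy("event_type").count()

query = (counts.writeStream
               .outputMode("complete")
               .format("console")
               .start())
query.awaitTermination()
```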
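A pytest-style sketch of an automated unit test for a PySpark transformation, run against a local SparkSession; the transformation, table shape, and column names are hypothetical:

```python
# A unit-test sketch for a PySpark transformation using pytest and a local
# SparkSession. The function under test and its columns are illustrative.
import pytest
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

def keep_positive_amounts(df: DataFrame) -> DataFrame:
    """Example transformation under test: drop non-positive amounts."""
    return df.filter(F.col("amount") > 0)

@pytest.fixture(scope="session")
def spark():
    return (SparkSession.builder
            .master("local[2]")
            .appName("unit-tests")
            .getOrCreate())

def test_keep_positive_amounts(spark):
    df = spark.createDataFrame(
        [("a", 10.0), ("b", -5.0), ("c", 0.0)],
        ["order_id", "amount"],
    )
    result = keep_positive_amounts(df)
    assert [r.order_id for r in result.collect()] == ["a"]
```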
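A hedged ETL sketch for an S3-based data lake: read raw JSON, apply a simple cleanup, and write partitioned Parquet. Bucket names, keys, and columns are placeholders, not this employer's actual schema:

```python
# An ETL pipeline sketch: raw JSON in S3 -> cleaned, partitioned Parquet.
# All paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

raw = spark.read.json("s3://example-datalake/raw/orders/")

cleaned = (
    raw.dropDuplicates(["order_id"])                        # dedupe on a business key
       .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize the timestamp
       .withColumn("order_date", F.to_date("order_ts"))     # derive a partition column
       .filter(F.col("amount") > 0)                         # drop invalid rows
)

(cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-datalake/curated/orders/"))
```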
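A short REST sketch showing how a JSON/REST web service might be consumed with Python's `requests` library; the endpoint URL and response shape are assumed for illustration:

```python
# A sketch of consuming a JSON/REST web service. The endpoint and payload
# shape are hypothetical placeholders.
import requests

resp = requests.get(
    "https://api.example.com/v1/orders",
    params={"status": "open"},
    headers={"Accept": "application/json"},
    timeout=10,
)
resp.raise_for_status()          # fail loudly on HTTP errors
for order in resp.json():        # assumes the service returns a JSON array
    print(order.get("order_id"), order.get("amount"))
```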
Skills:
Java, Spring, Spring Boot, Scala, Spark Streaming, JVM, XML, JSON, YAML, SQL, Python, Linux, Docker, Spark, Jupyter Notebook, Kubernetes, Feature Management Platforms, SageMaker, Agile, Scrum, AWS, EC2, S3, EMR, RDS, Data Warehousing, Flink, Kinesis, PySpark, Unit Testing, Third-Party Solutions, ETL, SOAP, REST, Web Services