What are the responsibilities and job description for the Lead Assistant Manager - Data Engineer position at EXL Service?
Senior Data Engineer, Data feeds
About the Role:
Our mission is to empower those who strive to achieve better financial health, and the Data feeds team plays a crucial role in achieving it. We are seeking a Senior Data Engineer for our Data feeds team to provide batch data processing, real-time streaming, and pipeline orchestration capabilities. You'll be part of the Data Technology organization, which helps drive business decisions using data. You will have the opportunity to use your expertise in big data, design thinking, coding, and analysis to build data pipelines and data products that leverage our petabyte-scale data. Our business is data-driven, and you will build solutions that support the company in marketing, pricing, credit, funding, investing, and many other areas of a business that is transforming the banking industry. We're looking for talented Data Engineers who are passionate about building new data-driven solutions with the latest big data technology.
What you’ll Do:
Create and maintain optimal data pipeline architecture
Build data pipelines that transform raw, unstructured data into formats that data analysts can use for analysis (see the PySpark sketch after this list)
Assemble large, complex data sets that meet functional and non-functional business requirements
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and delivery of data from a wide variety of data sources using SQL and AWS Big Data technologies
Work with stakeholders, including the Executive, Product, Engineering, and Program teams, to assist with data-related technical issues and support their data infrastructure needs
Develop and maintain scalable data pipelines, and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using scalable, distributed data technologies
Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for the key stakeholders and business processes that depend on it
Write unit and integration tests, adopt test-driven development, contribute to the engineering wiki, and document your work
Perform root-cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
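To give a concrete feel for the pipeline work above, here is a minimal PySpark sketch of the kind of batch transform the role involves: raw JSON events are read from S3, cleaned and deduplicated, and written out as partitioned Parquet for analysts. The bucket paths, column names, and schema are hypothetical illustrations, not details from the actual role.

```python
# Minimal sketch of a batch transform: raw JSON events in S3 are cleaned,
# deduplicated, and written out as partitioned Parquet for analysts to query.
# All paths, column names, and derived fields below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feeds-batch-transform").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical source

cleaned = (
    raw
    .filter(F.col("event_id").isNotNull())            # drop malformed records
    .withColumn("event_date", F.to_date("event_ts"))  # derive a partition key
    .dropDuplicates(["event_id"])                     # keep re-runs idempotent
)

(cleaned
    .write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/events/"))  # hypothetical sink
```

In practice, transform steps like the filter-and-deduplicate chain above are factored into small, testable functions, in line with the unit testing and test-driven development expectations in this list.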
About you:
6 years of experience and a bachelor's degree in Computer Science, Informatics, Information Systems, or a related field; or equivalent work experience
In-depth working experience with distributed systems: Hadoop/MapReduce, Spark, Hive, Kafka, and Oozie/Airflow (a minimal Airflow sketch follows this list)
At least 5 years of solid production-quality coding experience implementing data pipelines in Java, Scala, and Python
Experience with AWS cloud services: EC2, EMR, RDS
Experience with Git, Jira, Jenkins, and shell scripting
Familiarity with Agile methodology, test-driven development, source control management, and test automation
Experience supporting and working with cross-functional teams in a dynamic environment
You're passionate about data and building efficient data pipelines
You have excellent listening skills and are empathetic to others
You believe in simple and elegant solutions and give paramount importance to quality
You have a track record of building fast, reliable, and high-quality data pipelines
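To make the orchestration experience above concrete, here is a minimal Airflow sketch of a daily extract-transform-load DAG, assuming Airflow 2.4 or later; the DAG id, schedule, and task callables are hypothetical placeholders rather than anything specific to this role.

```python
# Minimal Airflow sketch of a daily extract -> transform -> load pipeline.
# The DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():    # placeholder: pull the raw feed from a source system
    ...

def transform():  # placeholder: clean and reshape the extracted data
    ...

def load():       # placeholder: publish curated output for consumers
    ...

with DAG(
    dag_id="daily_feed_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Declare the dependency chain: extract runs before transform, then load.
    t_extract >> t_transform >> t_load
```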
Nice-to-have skills:
Experience building marketing data pipelines, including direct mail, is a big plus
Experience with Snowflake and Salesforce Marketing Cloud (see the loading sketch after this list)
Working knowledge of open-source ML frameworks and the end-to-end model development life cycle
Previous working experience running containers (Docker/LXC) in a production environment using a container orchestration service (Kubernetes, AWS ECS, AWS EKS)
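For the Snowflake item above, here is a minimal sketch of loading curated pipeline output into Snowflake from an external S3 stage using the official Python connector; the account, credentials, stage, and table names are all hypothetical, and real credentials would come from a secrets manager rather than being hard-coded.

```python
# Minimal sketch of loading curated Parquet output into Snowflake via the
# official snowflake-connector-python package. All identifiers are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",    # hypothetical
    user="example_user",          # hypothetical
    password="example_password",  # hypothetical; use a secrets manager in practice
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="CURATED",
)

try:
    cur = conn.cursor()
    # COPY INTO pulls Parquet files from an external stage pointing at S3
    # and maps Parquet columns onto table columns by name.
    cur.execute("""
        COPY INTO curated.events
        FROM @events_stage/curated/events/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
finally:
    conn.close()
```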