What are the responsibilities and job description for the Senior ML Data Engineer position at Futran Tech Solutions Pvt. Ltd.?
About Us :
LTIMindtree is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 700 clients, LTIMindtree brings extensive domain and technology expertise to help drive superior competitive differentiation, customer experiences, and business outcomes in a converging world. Powered by nearly 90,000 talented and entrepreneurial professionals across more than 30 countries, LTIMindtree - a Larsen & Toubro Group company - combines the industry-acclaimed strengths of erstwhile Larsen and Toubro Infotech and Mindtree in solving the most complex business challenges and delivering transformation at scale. For more information, please visit .
Job Title :
Senior ML Data Engineer
Work Location : Lyndhurst, NJ
Work Mode : Remote
Job Description :
Senior ML Data Engineer, Feature Engineering and ETL
We seek a talented Senior Data Engineer with ML feature engineering expertise to join our Consumer ML team. This role involves designing and implementing advanced feature engineering and ETL pipelines to enable robust machine learning applications. The ideal candidate has hands-on experience with Databricks, a deep understanding of the medallion architecture, and a proven track record of supporting the end-to-end ML lifecycle. Experience with MLflow and an aptitude for creating data-driven insights are highly desirable.
Key Responsibilities
Feature Engineering and Data Integration: Develop and maintain feature engineering pipelines using Databricks to support ML models effectively.
Data Pipeline Development: Integrate diverse data sources (e.g., clickstreams, user behaviour, demographic data) to create user behaviour features and profiles for complex ML tasks.
Medallion Architecture: Design and implement ETL/ELT pipelines aligned with the bronze, silver, and gold layers of the medallion architecture.
Model Support: Build data pipelines to support ML model training, calibration, and deployment, leveraging MLflow for experiment tracking and performance monitoring.
Query Optimization and Low-Latency Pipelines: Design low-latency, production-ready data pipelines to support real-time and batch model inference.
CI/CD Practices: Apply CI/CD principles for seamless pipeline deployment.
Data Governance: Ensure pipelines comply with security and regulatory standards, particularly for handling PII, and maintain metadata and master data across the data catalogue.
Collaboration: Work closely with ML scientists, ML engineers, and other stakeholders to align data transformation with business objectives.
Qualifications
7 years of experience in data engineering and at least 4 years focused on ML feature engineering, ETL pipeline development, and data preparation for ML.
Proven experience managing pipelines on Databricks using Apache Spark, with a strong understanding of the medallion architecture.
Familiarity with ML lifecycle management, with MLflow experience as a strong plus, and advanced skills in Apache Spark/PySpark for big data processing and analytics.
Proficient in Python for data manipulation and SQL for query optimization.
Experience building pipelines for real-time and batch model serving in production environments, and knowledge of CI/CD practices for ETL/ELT pipeline development.
Expertise in metadata and master data management within technical data catalogues.
Understanding of data security and compliance, especially with sensitive data like PII.