What are the responsibilities and job description for the Lead Data Engineer - Hybrid (1 day onsite per week), Must be local to San Francisco, CA (Posted by SAM) position at Global Force USA?
Job Details
Required Skills & Experience:
- Hands-on coding mindset with a deep understanding of the relevant technologies and skill set, and the ability to see the larger picture.
- Sound knowledge of architectural patterns, best practices, and non-functional requirements.
- 8-10 years of overall experience in high-volume data processing, data platforms, data lakes, big data, data warehouses, or equivalent.
- 5 years of experience with strong proficiency in Python and Spark (must-have).
- 3 years of hands-on experience in ETL workflows using Spark and Python.
- 4 years of experience with large-scale data loads, feature extraction, and data processing pipelines in different modes: batch, near real time, and real time.
- Solid understanding of data quality and data accuracy concepts and practices.
- 3 years of solid experience building and deploying ML models in a production setup. Ability to adapt quickly and take care of data preprocessing, feature engineering, and model engineering as needed.
Preferred: Experience with one or more Python deep learning libraries such as PyTorch, TensorFlow, Keras, or equivalent.
Preferred: Prior experience working with LLMs and transformers. Must be able to work through all phases of model development as needed.
Experience integrating with various data stores, including:
- SQL/NoSQL databases
- In-memory stores like Redis
- Data lakes (e.g., Delta Lake)
- Experience with Kafka streams, producers, and consumers.
Required: Experience with Databricks or a similar data lake / data platform.
Required: Java and Spring Boot experience in data processing, both near-real-time and batch.