What are the responsibilities and job description for the Lead Data Engineer (C2C, candidate must be local to San Francisco, CA; posted by Sid) position at Global Force USA?
Job Details
Role: Lead Data Engineer (C2C)
Duration: Long-term contract
Location: San Francisco, CA
Required Skills
- Python and PySpark
- Kafka and Kafka Streams
- MySQL and MySQL HeatWave
- Azure Delta Lake
- ETL processes
- Kafka integrations using Spring Boot (Java)
- Data streaming with Spark
Additional Skills
- Delta Lake on Azure
- Tableau or similar data visualization tools
- Unity Catalog in Databricks
- Airflow for pipeline management
Responsibilities
Development Tasks:
- Collect metrics based on user interactions.
- Visualize data for business teams.
- Develop and redesign data pipelines using Kafka Streams.
- Implement solutions using Spring Boot Java and Databricks Spark streaming.
Leadership Duties:
- Lead the measurement processes from requirements gathering to production delivery.
- Collaborate with other team leads, business partners, and product managers.
- Balance hands-on engineering (50%) with team leadership (50%).
Collaboration Structure:
- Onsite: Lead role (this resource)
- Nearshore: Senior developer
- Offshore: Data engineer role
Lead Data Engineer - Job Description
Required Skills & Experience:
- Hands-on coding mindset with a deep understanding of the technology stack and an ability to see the larger picture.
- Sound knowledge of architectural patterns, best practices, and non-functional requirements.
- 8-10 years of overall experience in high-volume data processing, data platforms, data lakes, big data, data warehousing, or equivalent.
- 5 years of experience with strong proficiency in Python and Spark (must-have).
- 3 years of hands-on experience in ETL workflows using Spark and Python.
- 4 years of experience with large-scale data loads, feature extraction, and data processing pipelines in different modes: near-real-time, batch, and real-time.
- Solid understanding of data quality and data accuracy concepts and practices.
- 3 years of solid experience building and deploying ML models in a production setting, with the ability to quickly handle data preprocessing, feature engineering, and model engineering as needed.
- Preferred: Experience with one or more Python deep learning libraries such as PyTorch, TensorFlow, Keras, or equivalent.
- Preferred: Prior experience working with LLMs and transformers; must be able to work through all phases of model development as needed.
- Experience integrating with various data stores, including:
o SQL/NoSQL databases
o In-memory stores like Redis
o Data lakes (e.g., Delta Lake)
- Experience with Kafka Streams, producers, and consumers.
- Required: Experience with Databricks or similar data lake / data platform.
- Required: Java and Spring Boot experience for data processing, both near-real-time and batch-based.
- Familiarity with notebook-based environments such as Jupyter Notebook.
- Adaptability: Must be open to learning new technologies and approaches.
- Initiative: Ability to take ownership of tasks, learn independently, and innovate.
- With the technology landscape changing rapidly, the ability and willingness to learn new technologies as needed and deliver results on the job.
Preferred Skills:
- Ability to pivot from conventional approaches and develop creative solutions.