What are the responsibilities and job description for the Senior Data Scientist with Vision-Language Models position at People Force Consulting Inc?
Job Details
We are seeking a Senior Data Scientist with expertise in Vision-Language Models (VLMs) and related technologies to lead the development of efficient, cost-effective multimodal AI solutions. The ideal candidate will have experience with advanced VLM frameworks such as VILA, Isaac, and VSS, and a proven track record of implementing production-grade VLMs for training and testing in real-world environments. A background in healthcare, particularly medical devices, is highly desirable. This role will focus on exploring and deploying state-of-the-art VLM methodologies on cloud platforms such as AWS or Azure.
Experience: - 10 Years
Location: - San Jose, CA or Waukesha, WI (100% onsite required)
Educational Qualifications: - Master's or Ph.D. in Computer Science, Data Science, Machine Learning, or a related field.
Responsibilities: -
VLM Development & Deployment:
- Design, train, and deploy efficient Vision-Language Models (e.g., VILA, Isaac Sim) for multimodal applications.
- Explore cost-effective methods such as knowledge distillation, modal-adaptive pruning, and LoRA fine-tuning to optimize training and inference (see the LoRA sketch after this list).
- Implement scalable pipelines for training/testing VLMs on cloud platforms (AWS SageMaker, Azure ML).
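For illustration only, below is a minimal sketch of LoRA fine-tuning using the Hugging Face PEFT library; the base model and hyperparameters are assumptions chosen for the example (a small text model stands in for a VLM's language tower), not requirements of this role.

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Small causal LM as a stand-in for a VLM language tower (assumption).
    base = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-350m", torch_dtype=torch.float16
    )

    lora_cfg = LoraConfig(
        r=16,                                 # low-rank dimension: adapter size vs. accuracy trade-off
        lora_alpha=32,                        # scaling applied to the low-rank update
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    )

    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()        # typically well under 1% of weights are trainable

Because only the small adapter matrices are trained, this kind of setup is one way to keep fine-tuning and serving costs down relative to full-parameter training.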
Multimodal AI Solutions:
- Develop solutions that integrate vision and language capabilities for applications like image-text matching, visual question answering (VQA), and document data extraction.
- Leverage interleaved image-text datasets and advanced techniques (e.g., cross-attention layers) to enhance model performance (a cross-attention sketch follows below).
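As a rough illustration of the cross-attention pattern mentioned above, the PyTorch sketch below lets text tokens attend to visual features, in the spirit of Flamingo-style VLMs; the dimensions and module structure are illustrative assumptions, not a prescribed architecture.

    import torch
    import torch.nn as nn

    class VisionTextCrossAttention(nn.Module):
        def __init__(self, d_model: int = 768, n_heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, text_tokens, image_tokens):
            # Queries come from text, keys/values from the visual encoder output,
            # so each text position can pull in relevant image information.
            attended, _ = self.attn(query=text_tokens, key=image_tokens, value=image_tokens)
            return self.norm(text_tokens + attended)  # residual connection

    text = torch.randn(2, 32, 768)    # (batch, text_len, d_model)
    image = torch.randn(2, 196, 768)  # (batch, image_patches, d_model)
    out = VisionTextCrossAttention()(text, image)
    print(out.shape)                  # torch.Size([2, 32, 768])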
Healthcare Domain Expertise:
- Apply VLMs to healthcare-specific use cases such as medical imaging analysis, position detection, and motion detection and measurement.
- Ensure compliance with healthcare standards while handling sensitive data.
Efficiency Optimization:
- Evaluate trade-offs between model size, performance, and cost using techniques like elastic visual encoders or lightweight architectures.
- Benchmark different VLMs (e.g., GPT-4V, Claude 3.5) for accuracy, speed, and cost-effectiveness on specific tasks (see the benchmarking sketch after this list).
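A minimal, model-agnostic benchmarking harness along these lines is sketched below; each model is wrapped as a callable so hosted APIs and local VLMs can be timed and scored uniformly. The function and field names are assumptions for the example, not a defined internal tool.

    import time

    def benchmark(models, dataset):
        """models: {name: predict(image, question) -> str}; dataset: [(image, question, answer)]."""
        results = {}
        for name, predict in models.items():
            correct, latencies = 0, []
            for image, question, answer in dataset:
                start = time.perf_counter()
                prediction = predict(image, question)
                latencies.append(time.perf_counter() - start)
                correct += int(prediction.strip().lower() == answer.lower())
            results[name] = {
                "accuracy": correct / len(dataset),
                "mean_latency_s": sum(latencies) / len(latencies),
            }
        return results

Per-model dollar cost can be added the same way (e.g., tokens consumed per call), which is what makes the speed/accuracy/cost trade-off explicit.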
Collaboration & Leadership:
- Collaborate with cross-functional teams including engineers and domain experts to define project requirements.
- Mentor junior team members and provide technical leadership on complex projects.
Experience:
- Minimum of 10 years of experience in machine learning or data science roles with a focus on vision-language models.
- Proven expertise in deploying production-grade multimodal AI solutions.
- Experience in healthcare or medical devices is highly preferred.
Technical Skills:
- Proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow).
- Hands-on experience with VLMs such as VILA, Isaac Sim, or VSS.
- Familiarity with cloud platforms like AWS SageMaker or Azure ML Studio for scalable AI deployment.
Domain Knowledge:
- Understanding of medical datasets (e.g., imaging data) and healthcare regulations.
Soft Skills:
- Strong problem-solving skills with the ability to optimize models for real-world constraints.
- Excellent communication skills to explain technical concepts to diverse stakeholders.
Good to have skills: -
- Vision-Language Models: VILA, Isaac Sim, EfficientVLM
- Cloud Platforms: AWS SageMaker, Azure ML
- Optimization Techniques: LoRA fine-tuning, modal-adaptive pruning
- Multimodal Techniques: Cross-attention layers, interleaved image-text datasets
- MLOps Tools: Docker, MLflow (see the tracking sketch below)
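As one concrete example of the MLOps tooling listed above, here is a minimal MLflow experiment-tracking sketch; the run name, parameters, and metric values are placeholders, not actual project settings.

    import mlflow

    # Logs to the local ./mlruns store by default; set MLFLOW_TRACKING_URI
    # to point at a shared tracking server for team-wide use.
    with mlflow.start_run(run_name="vlm-lora-finetune"):
        mlflow.log_params({"lora_rank": 16, "learning_rate": 2e-4})
        for epoch in range(3):
            # Placeholder metric; in practice this comes from the evaluation loop.
            mlflow.log_metric("val_vqa_accuracy", 0.70 + 0.02 * epoch, step=epoch)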
Salary: - $70 - $80