What are the responsibilities and job description for the Lead Data Scientist position at Natsoft?
Position: Technical Lead, Data Science and AI
Role Overview
We seek a dynamic, well-rounded Technical Lead to support our AI/ML initiatives and work directly with our pharmaceutical customers. You'll serve as the primary technical point of contact for clients while coordinating deliverables with our offshore teams. This role requires strong data engineering expertise, advanced SQL skills, and the ability to work independently, delivering some results without relying on other teams. You'll implement clinical NLP models using Hugging Face Transformers models (BERT, BioBERT, ClinicalBERT) and spaCy while also handling data cleansing, ETL processes, and data quality management. Additionally, expertise in agentic AI, generative AI applications, and LLMOps is critical for this position. The candidate should be a skilled integrator, comfortable with no-code AI tools like Replit and Lovable.dev. The ideal candidate can lead the offshore team while also rolling up their sleeves to deliver technical solutions directly when needed.
Key Responsibilities
Technical Implementation
- Implement NLP/NER models (BioBERT, ClinicalBERT, BERT, spaCy) for clinical entity extraction
- Design and develop ETL processes and data pipelines for clinical trial documents
- Build and deploy agentic AI systems that can perform autonomous tasks with clinical data
- Create and optimize RAG (Retrieval-Augmented Generation) systems for clinical applications
- Implement LLMOps practices for managing, deploying, and monitoring language models
- Develop generative AI interfaces for clinical data exploration and analysis
- Write complex SQL queries for data extraction, transformation, and analysis
- Perform data cleansing, normalization, and quality improvement activities
- Build and maintain data processing workflows with minimal dependency on other teams
- Implement data validation and quality assurance processes
- Independently deliver technical solutions when timelines require direct intervention
Team Coordination
- Coordinate technical tasks with offshore development teams
- Document technical specifications and requirements
- Provide technical guidance to junior team members
- Work with the Product team to translate business needs into technical tasks
- Facilitate communication between technical teams and clients
Client Engagement & Delivery
- Serve as the primary technical point of contact for pharmaceutical clients
- Lead technical discussions and presentations with clients
- Translate client requirements into clear technical specifications
- Coordinate deliverables and ensure timely completion of client projects
- Provide regular status updates to clients on technical implementation
- Demonstrate technical solutions and new features to clients
- Troubleshoot and resolve client-reported issues
- Manage client expectations for technical deliverables
Qualifications
Required Skills
- Bachelor's degree in Computer Science, Data Science, or a related field; Master's preferred
- 5 years of experience in NLP/ML with a focus on clinical or biomedical applications
- Strong SQL skills for complex data manipulation and analysis
- Experience with Hugging Face Transformers models (BERT, BioBERT, ClinicalBERT)
- Expert Python programming skills with a focus on data processing
- Experience with clinical data or biomedical terminology
- Excellent client communication and presentation skills
- Proven ability to coordinate technical deliverables and manage timelines
- Experience working as a technical liaison with clients
- Strong project coordination experience with offshore teams
Technical Knowledge
- Agentic AI frameworks and autonomous system design
- LLMOps practices including model serving, monitoring, and versioning
- Generative AI application development and RAG system implementation
- Vector databases and semantic search technologies
- Advanced database design, SQL optimization, and query performance tuning
- ETL architecture and implementation best practices
- Data cleansing techniques and methodologies
- Data pipeline development and maintenance
- NLP/NER model implementation and fine-tuning
- OCR and document processing techniques
- Clinical data structures and formats
Domain Knowledge
- Specialized knowledge of oncology, immunology, and neuroscience clinical trials, treatments, biomarkers, and therapeutic approaches
- Familiarity with oncology drug classifications, mechanisms of action, and treatment protocols
- Familiarity with pharmaceutical development pipelines