What are the responsibilities and job description for the Data Science Intern, Registries and Real-World Data position at Verily?
Who We Are
Verily is a subsidiary of Alphabet that is using a data-driven approach to change the way people manage their health and the way healthcare is delivered. Launched from Google X in 2015, our purpose is to bring the promise of precision health to everyone, every day. We are focused on generating and activating data from a variety of sources, including clinical, social, behavioral and the real world, to arrive at the best solutions for a person based on a comprehensive view of the evidence. Our unique expertise and capabilities in technology, data science and healthcare enable the entire healthcare ecosystem to drive better health outcomes.
Description
Verily’s internship is a paid 13-week program for rising seniors, either undergraduate or graduate students, who are interested in working at the intersection of technology, data science and healthcare. The program is designed for all students, and again this year we encourage students who have been historically underrepresented in this field to explore the program, which is a pathway towards full-time employment within Verily. These groups include, but are not limited to, Black/African-American, Latinx/Hispanic, and Native American students, students with disabilities, veterans, and non-binary people.
As a Data Science Intern on our Registries and RWD (real-world data) team, you will be supporting our core mission to drive innovation in evidence generation for research and care decisions. We are building new types of longitudinal datasets that have foundations of RWD sources, such as EHRs (electronic health records) and claims data, and are augmented with prospective data collection. During your internship, you will contribute to the development and deployment of models that enable scalable curation of RWD. This may include multi-source integrations and reconciliations, creating derived features from the source data (e.g., abstraction of clinical concepts from unstructured data), and facilitating data quality assessments. You will work with a diverse cross-functional team to build reusable and scalable tools and to deliver products that unlock information from structured and unstructured clinical data.
**Join us for a unique 13-week internship that will take place May 16 to August 9, 2025, or June 16 to September 12, 2025.**
Responsibilities
- Work closely with core DS team members to design and create longitudinal datasets integrating multiple data sources.
- Build and evaluate highly accurate machine learning models / AI tools using sparsely labeled healthcare datasets.
- Implement, build on and augment existing LLM/NLP tools to maximize the value of using unstructured medical data across a range of research and care applications.
- Explore difficult, non-routine analysis problems and identify potential solutions, handling data challenges from a real-world setting.
- Communicate technical methods and results clearly in well-structured reports and presentations to a range of technical and non-technical audiences.
Minimum Qualifications
- Currently enrolled in a university and working towards a graduate degree (master's or PhD, with plans to graduate by June 2026) in a quantitative discipline (e.g., data science, statistics, biomedical informatics, computer science, applied mathematics, or similar).
- Experience working with advanced machine learning and AI techniques (supervised and unsupervised methods, LLMs, NLP).
- Creative and methodical problem solving: understand needs, identify options, form hypotheses, generate robust results, make informed decisions, and learn faster through feedback.
- Strong proficiency in Python.
- Experience working with clinical data, including EHR data, claims data, or other real-world health data sets, including an understanding of the complexities of structured and unstructured clinical data.
- Familiarity with software engineering practices and experience developing production software.
- Ability to work cross-functionally on teams, with a tolerance for ambiguity.
Salary: $52 - $63