Research Intern - LLM Inference Acceleration and Optimization

Lensa
Mountain View, CA Intern
POSTED ON 2/20/2025
AVAILABLE BEFORE 3/20/2025
Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

If you are excited about investigating and implementing cutting-edge large language model (LLM) inference techniques and optimizations like quantized KV-caches, flash/paged/radix attention, speculative decoding, and advanced collective communication on graphics processing units (GPUs), come join the AIFX team at Microsoft Azure and contribute to a production-focused, planetary-scale LLM serving stack that is being built on top of excellent open-source efforts like vLLM, SGLang, and HuggingFace. The work includes investigating state-of-the-art approaches like "You only cache once (YOCO)" and leveraging them to save memory and compute when serving LLMs at scale. You will get a chance to explore, implement, optimize, and publish your research ideas in collaboration with teams at Microsoft working on real-world production workloads at an unprecedented scale.

Responsibilities

Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world’s best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.

Qualifications

Required Qualifications

  • Accepted or currently enrolled in a PhD program in Computer Science or related STEM field.
  • At least 6 months of experience with training and/or inference of recent LLMs like Llama and Phi.

Other Requirements

  • Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship.
  • In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and furthermore, that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance, so they will be ready to submit your letter.

Preferred Qualifications

  • Experience with large-scale collective communication on GPUs.
  • Experience with performance benchmarking of AI frameworks like PyTorch, vLLM, and/or SGLang.
  • Ability to convert research ideas into working code that runs and scales on real systems.
  • Strong interpersonal skills and a growth mindset.
  • Open to failing fast in pursuit of ambitious ideas.

The base pay range for this internship is USD $6,550 - $12,880 per month. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $8,480 - $13,920 per month.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-intern-pay

Microsoft accepts applications and processes offers for these roles on an ongoing basis.

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).

Salary: $6,550 - $12,880




