Research Engineer, Speech Foundation Models

Tykhe Inc
Palo Alto, CA Full Time
POSTED ON 2/24/2025
AVAILABLE BEFORE 3/22/2025

We are seeking a highly skilled and experienced Research Lead for Speech, Audio, and Conversational AI to join our innovative team. In this role, you will spearhead the research and development of cutting-edge technologies in speech processing, text-to-speech (TTS), audio analysis, and real-time conversational AI. You will push the boundaries of what's possible in automatic speech recognition (ASR), speaker identification, diarization, speech synthesis, voice cloning, dubbing, and audio generation.


Key Responsibilities:

  • Apply state-of-the-art research in audio/speech and large language models to develop advanced Audio Language Models and Speech Language Models.
  • Research, architect, and deploy new generative AI methods such as autoregressive models, causal models, and diffusion models.
  • Design and implement low-latency end-to-end models with multilingual speech/audio as both input and output.
  • Conduct experiments to evaluate and improve the performance of these models, focusing on accuracy, naturalness, efficiency, and real-time capabilities across multiple languages.
  • Stay at the forefront of advancements in speech processing, audio analysis, and large language models, integrating new techniques into our foundation models.
  • Collaborate with cross-functional teams to integrate these foundation models into Krutrim's AI stack and products.
  • Publish research findings in top-tier conferences and journals such as INTERSPEECH, ICASSP, ICLR, ICML, NeurIPS, and IEEE/ACM Transactions on Audio, Speech, and Language Processing.
  • Mentor and guide junior researchers and engineers, fostering a collaborative and innovative team environment.
  • Drive the adoption of best practices in model development, including rigorous testing, documentation, and ethical considerations in multilingual AI.


Qualifications:

  • Ph.D. in Computer Science, Electrical Engineering, or a related field with a focus on speech processing, audio analysis, and machine learning.
  • Experience training speech/audio models for representation (e.g., W2V-BERT, SONAR, AST), generation (e.g., HiFi-GAN, VQ-GAN, AudioLDM), Conformers, and multilingual multitask models (e.g., SeamlessM4T).
  • Expertise with Audio Language Models such as AudioPaLM, Moshi, and SeamlessM4T.
  • Proven track record of developing and applying novel neural network architectures such as Transformers, Mixture of Experts, Diffusion Models, and State Space Models (Mamba, Samba).
  • Extensive experience in developing and optimizing models for low-latency, real-time applications.
  • Strong background in multilingual speech recognition, voice cloning, dubbing, and synthesis, with an understanding of the challenges specific to different language families.
  • Proficiency in deep learning frameworks (e.g., TensorFlow, PyTorch) and experience deploying large-scale speech and audio models.
  • Demonstrated expertise in high-performance computing with proficiency in Python, C/C++, CUDA, and kernel-level programming for AI applications.
  • Experience with audio signal processing techniques and their application in end-to-end neural models.


