Alldus is Hiring a Multimodal ML Researcher Near Seattle, WA
Our client, an exciting venture-backed, AI-driven startup, is hiring a Multimodal ML Researcher to join their team in Seattle. The successful candidate will lead research on multi-modal input and output from LLMs, including voice and image encoders and decoders, with a focus on text-to-speech, speech-to-text, and speech-to-animation capabilities.
Responsibilities:
Conduct research on multi-modal input and output from LLMs, including voice and image encoders and decoders, with a focus on text-to-speech, speech-to-text, and speech-to-animation capabilities.
Train and improve voice and vision models using both public and proprietary data sources.
Review and optimize the company’s data flywheel to keep operations streamlined.
Develop methodologies to enhance model efficiency, accuracy, and overall quality.
Create tools for assessing and monitoring model performance and quality.
Skillset:
A highly skilled AI researcher with a proven track record of advancing AI products and systems.
Proficiency with Large Language Models or other generative AI models.
Experience in developing speech or vision models.
Strong proficiency in Python; experience with PyTorch preferred.
Demonstrated ability to take initiative and achieve results.