
Founding Sr. Data Engineer - RAG/LLMs

Crew Capital
New York, NY Full Time
POSTED ON 1/25/2025
AVAILABLE BEFORE 2/21/2025

This role is open within one of our portfolio companies, currently in stealth.


Spearhead technical direction. Drive product vision. Shape the future of market research.


Our Mission

Fundamentally transform the multi-billion dollar research industry using advanced AI.


Our Company

We are at the forefront of a breakthrough in the research industry. By seamlessly integrating advanced AI across traditional workflows, we’re reshaping the multi-billion dollar alt-data and research industry, providing unparalleled insights to due diligence personas. Our traction with the world’s leading corporations and investment teams has spurred us to expand our engineering and data science group.


The Job

We are looking for a Data Scientist who specializes in Natural Language Processing (NLP) models with a strong interest or background in building Retrieval-Augmented Generation (RAG) applications. You will play a critical role in evolving our AI-powered insights platform—driving innovation and harnessing cutting-edge NLP techniques to enhance how market and alt-data are created, consumed, and understood.

You will work closely with our founding team and diverse stakeholders, both technical and non-technical, to develop and integrate state-of-the-art language models, pipeline architectures, and retrieval systems. Your contributions will directly influence our product's direction and scalability.


Key Responsibilities

  • Design, implement, and optimize large language models (LLMs), both fine-tuned and pre-built, and NLP pipelines for use in RAG-based workflows.
  • Develop approaches to contextualize LLM responses with retrieved and proprietary datasets (text transcripts, financial documents, voice-to-text outputs).
  • Collaborate with data engineers to design efficient data ingestion, preprocessing, and transformation workflows.
  • Stay current with published research and the latest advancements in LLMs, tokenization strategies, and NLP frameworks.
  • Work with vector databases (e.g., Pinecone, Weaviate) to store, retrieve, and rank embeddings for real-time queries.
  • Partner with backend engineers to integrate AI-driven features into a scalable, secure infrastructure.
  • Continually devise new ways to integrate AI-enabled search across the platform (document search, entity search, etc.).


Technical Requirements

  • 5 years of experience in data science or machine learning roles, ideally some of it in a startup or fast-paced environment.
  • Extensive experience with LLMs and building custom NLP models, including familiarity with at least one major framework (Hugging Face, PyTorch, TensorFlow).
  • Prior experience with RAG (Retrieval-Augmented Generation), including vector database integration, embedding creation, and search optimization.
  • Proficiency in Python for data science (NumPy, Pandas, scikit-learn) and typical backend frameworks or services.
  • Hands-on experience with unstructured data ingestion (e.g., text, audio) and processing large volumes of text from financial transcripts, news, etc.
  • Familiarity with query optimization in data stores and vector databases (Milvus, Pinecone, Elasticsearch, OpenSearch).
  • Strong problem-solving skills; can work independently as well as on a cross-functional team.


Preferred Qualifications

  • Master’s or PhD in Computer Science, Machine Learning, or a related field.
  • Expertise in deep-learning-based generative models, LLMs, or natural language processing, or comparable research experience in fine-tuning and deploying deep-learning models.
  • Experience with MLOps tools and processes for continuous integration and deployment of ML models.
  • Familiarity with audio-to-text transcription models (e.g., Whisper) and advanced text generation models (GPT-4, Llama, etc.).
  • Hands-on experience with big data technologies (Spark, Kafka, etc.) and query optimization in Postgres or similar databases.
  • Previous experience working in a startup, especially in fintech, research, or an AI-driven environment.
  • Knowledge of distributed systems and client-server architecture for scalable ML model serving and inference.


What to Expect Working Here

  • You will be held accountable to an exceptionally high bar and have a significant impact from Day 1.
  • This may be the fastest work environment you will ever experience in terms of growth, decision-making, and time to impact.
  • You will be empowered to set your own boundaries and experiment with new ideas.
  • You will create processes & products that have never existed before in the market research space.


And…

  • You’ll be guided by highly accomplished professionals with deep experience in technology, financial services, and market research.
  • You’ll have the opportunity to be part of a small, early team shaping a multi-billion dollar alt-data and market research industry.
  • You will experience transparency, integrity, and humility from leadership.
  • You’ll have space to experiment & innovate with the latest AI tools.
  • You get to be part of a winning team where work is matched by fun and camaraderie.


What We Offer

  • Competitive compensation package with a base salary of $145K - $165K per year, a performance bonus, and a significant equity stake in the company.
  • Incentive pay (bonus) tied to both your performance and the company’s performance.
  • Flexible paid time off in addition to paid holidays throughout the year.
  • Hybrid work arrangement (3 days in, 2 days out) in NYC, blending collaboration with flexibility.


If you are excited to redefine market research through cutting-edge NLP and a RAG-powered research application, we’d love to talk. Join us in our mission to shape the future of how insights are created and consumed.

Salary: $145,000 - $165,000


