What are the responsibilities and job description for the Founding Sr. Data Engineer - RAG/LLMs position at Crew Capital?

This role is open within one of our portfolio companies, currently in stealth.

Spearhead technical direction. Drive product vision. Shape the future of market research.

Our Mission

Fundamentally transform the multi-billion dollar research industry using advanced AI.

Our Company

We are at the forefront of a breakthrough in the research industry. By seamlessly integrating advanced AI across traditional workflows, we’re reshaping the multi-billion dollar alt-data and research industry, providing unparalleled insights to due diligence personas. Our traction with the world’s leading corporations and investment teams has spurred us to expand our engineering and data science group.

The Job

We are looking for a Data Scientist who specializes in Natural Language Processing (NLP) models with a strong interest or background in building Retrieval-Augmented Generation (RAG) applications. You will play a critical role in evolving our AI-powered insights platform—driving innovation and harnessing cutting-edge NLP techniques to enhance how market and alt-data are created, consumed, and understood.

You will work closely with our founding team and diverse stakeholders, both technical and non-technical, to develop and integrate state-of-the-art language models, pipeline architectures, and retrieval systems. Your contributions will directly influence our product's direction and scalability.

Key Responsibilities

Design, implement, and optimize large language models (LLMs) (both fine-tuned and pre-built) and NLP pipelines for use in RAG-based workflows.
Develop approaches to contextualize LLM responses with retrieved and proprietary datasets (text transcripts, financial documents, voice-to-text outputs).
Collaborate with data engineers to design efficient data ingestion, preprocessing, and transformation workflows.
Stay current with published research and the latest advancements in LLMs, tokenization strategies, and NLP frameworks.
Work with vector databases (e.g., Pinecone, Weaviate) to store, retrieve, and rank embeddings for real-time queries.
Partner with backend engineers to integrate AI-driven features into a scalable, secure infrastructure.
Continually develop new ways to integrate AI-enabled search across the platform (document search, entity search, etc.).

Technical Requirements

5 years of experience in data science or machine learning roles, ideally some of it in a startup or fast-paced environment.
Extensive experience with LLMs and building custom NLP models, including familiarity with at least one major framework (Hugging Face, PyTorch, TensorFlow).
Prior experience with RAG (Retrieval-Augmented Generation), including vector database integration, embedding creation, and search optimization.
Proficiency in Python for data science (NumPy, Pandas, scikit-learn) and typical backend frameworks or services.
Hands-on experience with unstructured data ingestion (e.g., text, audio) and processing large volumes of text from financial transcripts, news, etc.
Familiarity with query optimization in data stores and vector databases (Milvus, Pinecone, Elasticsearch, OpenSearch).
Strong problem-solving skills; can work independently as well as on a cross-functional team.

Preferred Qualifications

Master’s or PhD in Computer Science, Machine Learning, or a related field.
Expertise in deep-learning-based generative models, LLMs, and natural language processing, or comparable research experience with fine-tuning and deploying deep-learning models.
Experience with MLOps tools and processes for continuous integration and deployment of ML models.
Familiarity with audio-to-text transcription models (e.g., Whisper) and advanced text generation models (GPT-4, Llama, etc.).
Hands-on experience with big data technologies (Spark, Kafka, etc.) and query optimization in Postgres or similar databases.
Previous experience working in a startup, especially in fintech, research, or an AI-driven environment.
Knowledge of distributed systems and client-server architecture for scalable ML model serving and inference.

What to Expect Working Here

You will be held accountable to an exceptionally high bar and have a significant impact from Day 1.
This may be the fastest work environment you will ever experience in terms of growth, decision-making, and time to impact.
You will be empowered to set your own boundaries and experiment with new ideas.
You will create processes & products that have never existed before in the market research space.

And…

You’ll be guided by highly accomplished professionals with deep experience in technology, financial services, and market research.
You’ll have the opportunity to be part of a small, early team shaping a multi-billion dollar alt-data and market research industry.
You will experience transparency, integrity, and humility from leadership.
You’ll have space to experiment & innovate with the latest AI tools.
You get to be part of a winning team where work is matched by fun and camaraderie.

What We Offer

Competitive compensation package with a base salary between $145K and $165K per year, a performance bonus, and a significant equity stake in the company.
Incentive pay (bonus) tied to both your performance and the company’s performance.
Flexible paid time off in addition to paid holidays throughout the year.
Hybrid work arrangement (3 days in, 2 days out) in NYC, blending collaboration with flexibility.

If you are excited to redefine market research through cutting-edge NLP and a RAG-powered research application, we’d love to talk. Join us in our mission to shape the future of how insights are created and consumed.

Salary: $145,000 - $165,000
