What are the responsibilities and job description for the Founding Sr. Data Engineer - RAG/LLMs position at Crew Capital?
This role is open within one of our portfolio companies, currently in stealth.
Spearhead technical direction. Drive product vision. Shape the future of market research.
Our Mission
Fundamentally transform the multi-billion dollar research industry using advanced AI.
Our Company
We are at the forefront of a breakthrough in the research industry. By seamlessly integrating advanced AI across traditional workflows, we’re reshaping the multi-billion dollar alt-data and research industry, providing unparalleled insights to due diligence personas. Our traction with the world’s leading corporations and investment teams has spurred us to expand our engineering and data science group.
The Job
We are looking for a Data Scientist who specializes in Natural Language Processing (NLP) models with a strong interest or background in building Retrieval-Augmented Generation (RAG) applications. You will play a critical role in evolving our AI-powered insights platform—driving innovation and harnessing cutting-edge NLP techniques to enhance how market and alt-data are created, consumed, and understood.
You will work closely with our founding team and diverse stakeholders, both technical and non-technical, to develop and integrate state-of-the-art language models, pipeline architectures, and retrieval systems. Your contributions will directly influence our product's direction and scalability.
Key Responsibilities
- Design, implement, and optimize large language models (LLMs), both fine-tuned and pre-built, and NLP pipelines for use in RAG-based workflows.
- Develop approaches to contextualize LLM responses with retrieved and proprietary datasets (text transcripts, financial documents, voice-to-text outputs).
- Collaborate with data engineers to design efficient data ingestion, preprocessing, and transformation workflows.
- Stay current with published research and the latest advancements in LLMs, tokenization strategies, and NLP frameworks.
- Work with vector databases (e.g., Pinecone, Weaviate) to store, retrieve, and rank embeddings for real-time queries.
- Partner with backend engineers to integrate AI-driven features into a scalable, secure infrastructure.
- Constantly innovate and ideate new ways to integrate AI-enabled search across the platform (document search, entity search, etc.).
Technical Requirements
- 5 years of experience in data science or machine learning roles, ideally some of it in a startup or fast-paced environment.
- Extensive experience with LLMs and building custom NLP solutions, including familiarity with at least one major framework (Hugging Face, PyTorch, TensorFlow).
- Prior experience with RAG (Retrieval-Augmented Generation), including vector database integration, embedding creation, and search optimization.
- Proficiency in Python for data science (NumPy, Pandas, scikit-learn) and typical backend frameworks or services.
- Hands-on experience with unstructured data ingestion (e.g., text, audio) and processing large volumes of text from financial transcripts, news, etc.
- Familiarity with query optimization in data stores and vector databases (Milvus, Pinecone, Elasticsearch, OpenSearch).
- Strong problem-solving skills; can work independently as well as on a cross-functional team.
Preferred Qualifications
- Master’s or PhD in Computer Science, Machine Learning, or a related field.
- Expertise in deep-learning-based generative models, LLMs, and natural language processing, or similar research experience with fine-tuning and deploying deep-learning models.
- Experience with MLOps tools and processes for continuous integration and deployment of ML models.
- Familiarity with audio-to-text transcription models (e.g., Whisper) and advanced text generation models (GPT-4, Llama, etc.).
- Hands-on experience with big data technologies (Spark, Kafka, etc.) and query optimization in Postgres or similar databases.
- Previous experience working in a startup, especially in fintech, research, or an AI-driven environment.
- Knowledge of distributed systems and client-server architecture for scalable ML model serving and inference.
What to Expect Working Here
- You will be held accountable to an exceptionally high bar and have a significant impact from Day 1.
- This may be the fastest work environment you will ever experience in terms of growth, decision-making, and time to impact.
- You will be empowered to set your own boundaries and experiment with new ideas.
- You will create processes & products that have never existed before in the market research space.
And…
- You’ll be guided by highly accomplished professionals with deep experience in technology, financial services, and market research.
- You’ll have the opportunity to be part of a small, early team shaping the multi-billion-dollar alt-data and market research industry.
- You will experience transparency, integrity, and humility from leadership.
- You’ll have space to experiment & innovate with the latest AI tools.
- You get to be part of a winning team where work is matched by fun and camaraderie.
What We Offer
- Competitive compensation package with a base salary between $145K and $165K per year, a performance bonus, and a significant equity stake in the company.
- Incentive pay (bonus) tied to both your performance and the company’s performance.
- Flexible paid time off in addition to paid holidays throughout the year.
- Hybrid work arrangement (3 days in, 2 days out) in NYC, blending collaboration with flexibility.
If you are excited to redefine market research through cutting-edge NLP and a RAG-powered research application, we’d love to talk. Join us in our mission to shape the future of how insights are created and consumed.
Salary: $145,000 - $165,000