What are the responsibilities and job description for the Data Scientist (Only W2) position at Acetech Group Corporation?
Master's or PhD preferred.
- 5-10 years of experience in AI and machine learning, including model building, with strong coding skills in Python
- 2 years of working experience applying recent LLMs, including ChatGPT, GPT-3.5, OPT, BLOOM, etc., using Retrieval-Augmented Generation (RAG)
- Experience working directly with large language models and Transformer-based architectures, including BERT, RoBERTa, T5, etc.
- Experience with conversational search / semantic search, reinforcement learning, prompt engineering, and hallucination mitigation
- Debugging in DevOps repos, building APIs, and managing the algorithm flow across multiple workstreams in one repo
- Senior-level experience deploying models in the cloud (AWS primary, Azure secondary)
Nice to have:
- Candidates local to Raleigh are strongly preferred (hybrid schedule 2x per week)
- FANG experience (Facebook, Amazon, Netflix, Google, or even Microsoft)
Secondary Skills - Nice to Haves
- Python
- Machine learning
- Cloud computing
Core Technical Skills
Python Proficiency:
Expert-level Python proficiency, with experience writing efficient, clean, and modular code.
Ability to debug and test new code thoroughly.
RAG Systems:
Experience with and deep understanding of Retrieval-Augmented Generation (RAG), including concepts like embedding-based search, document retrieval, and combining retrieved information with LLMs.
Hands-on experience with advanced RAG platform development and maintenance.
Familiarity with knowledge base creation, indexing, and retrieval pipelines.
Knowledge of AI Architectures:
Understanding of the end-to-end architecture of generative AI systems, including pre-processing, retrieval, ranking, and post-processing steps.
Prompt Engineering:
Expertise in crafting effective prompts for LLMs tailored to specific tasks.
Experience with techniques like zero-shot and few-shot prompting, prompt tuning, and chain-of-thought prompting.
Content Generation:
Understanding of generative AI applications in content creation, including best practices for producing accurate, coherent, and domain-specific outputs.
Ability to fine-tune components for custom use cases.
Debugging and Performance Tuning:
Skills in profiling and optimizing LLM responses for latency and accuracy.
Experience diagnosing issues in complex multi-component systems.
Monorepo and Collaboration Skills
Working in Monorepo Environments:
Experience managing and contributing to large, centralized codebases (monorepos).
Understanding of version control workflows suited for monorepos (e.g., Git-based branching strategies).
Collaboration Tools and Practices:
Proficiency with CI/CD pipelines and tools like Jenkins, GitHub Actions, or GitLab CI.
Ability to work collaboratively with cross-functional teams in Agile settings.
Proficiency with code review practices and tools.
AI and NLP Knowledge
NLP Expertise:
Solid understanding of transformers, embeddings, and attention mechanisms.
Familiarity with techniques for handling domain-specific language models.
Complementary Skills
Documentation and Communication:
Ability to write clear technical documentation for processes, workflows, and API usage.
Strong communication skills for conveying technical insights to stakeholders.
Preferred Experience
Previous experience working in legal tech or domain-specific generative AI use cases.
Hands-on experience with deploying AI models in production at scale.
Familiarity with multilingual generative AI and fine-tuning for specific languages, such as French.