Member of Technical Staff, Data Pipeline

Boson AI
Santa Clara, CA Full Time
POSTED ON 3/20/2025
AVAILABLE BEFORE 4/18/2025

Boson AI is an early-stage startup building large language tools for everyone to use. Our founders (Alex Smola and Mu Li) and a team of scientists and engineers in deep learning, optimization, NLP, AutoML, and statistics are working on high-quality generative AI models for language and beyond.

We are seeking machine learning engineers to join our team full-time in our Santa Clara office. In this role, you will help us build pipelines for data collection, data filtering, synthetic data generation, and data analysis, which in turn will help us build more lifelike AI models. You will work closely with other scientists and engineers to power our next generation of large multimodal models.

Please read the requirements below carefully to make sure you are a good fit for this role.

Responsibilities:

  • Design and develop data collection pipelines to gather and preprocess diverse datasets (beyond language) from various sources (beyond web crawls).
  • Design and develop data processing pipelines, including data labeling, data filtering, data cleaning, data visualization, data auditing, etc.
  • Implement machine learning models to improve the quality and diversity of data, e.g., quality classifiers, document layout models, and speech transcription models.

You may be a good fit if you have:

  • Strong proficiency in building large-scale data processing pipelines and familiarity with distributed workloads (e.g., multiprocessing, Ray, Docker, Kubernetes).
  • Proficiency in at least one programming language commonly used in machine learning, such as Python, and the ability to write clean, maintainable code.
  • Proficiency in at least one deep learning framework, such as PyTorch.
  • Proficiency in database management.
  • PhD or Master's degree in computer science or equivalent.
  • Excellent problem-solving skills and attention to detail, especially when handling data anomalies and biases to further improve data quality.

Strong candidates may also have:

  • Familiarity with at least one tool for data labeling (e.g., LabelStudio), data collection (e.g., VPNs, Selenium), or data processing (e.g., Hadoop, Datasketch).
  • Experience in building large-scale datasets.
  • Hands-on experience with cloud platforms such as AWS, Azure, or GCP.
  • Experience in machine learning, e.g., projects in language / vision / audio.
  • Active GitHub contributions are a big plus.
  • Multilingual ability, which contributes to the language diversity crucial for robust model training.
  • Experience with fairness, toxicity, data privacy regulations and compliance considerations.

Salary: $150,000 - $300,000 a year

Boson AI offers 401k with employer matching, Gold level healthcare, HSA, FSA and free meals (we have dried mangoes, too).
