What are the responsibilities and job description for the Data Scientist position at Broadcast Music, Inc. (BMI)?
Position Summary
Builds scalable solutions for entity resolution, record linkage, and semantic data matching in a high-volume, big-data environment.
Location
Remote (US)
Functions Of The Job
Essential Functions, which may be representative but are not all-inclusive of those commonly associated with this position:
- Designs and implements Natural Language Processing (NLP) pipelines for preprocessing and analyzing “noisy” or incomplete text data.
- Utilizes embeddings (e.g., word, sentence, multilingual) for semantic similarity and feature engineering in record linkage workflows.
- Updates NLP models for domain-specific tasks, such as abbreviation recognition and title normalization.
- Develops and trains machine learning models for match/no-match classification.
- Optimizes hyperparameters and enhances model performance.
- Deploys NLP and Machine Learning (ML) models into batch and streaming pipelines using Databricks.
- Manages model lifecycle, including versioning, deployment and monitoring.
- Implements monitoring solutions to detect model drift and continuously refines solutions based on real-world performance.
- Collaborates with Data Analysts to extract actionable insights from datasets, including text data.
- Collaborates with Data Engineers to integrate NLP and ML models into scalable Extract, Transform, Load (ETL) pipelines.
- Partners with stakeholders to align technical solutions with business needs.
- Explores cutting-edge NLP approaches, such as transformer-based models, for improving text matching.
- Evaluates new tools and frameworks, including vector databases, to enhance the AI/ML pipeline.
- Researches multilingual and cross-lingual NLP solutions for entity resolution.
- Regular attendance.
- Other duties as assigned.
- Supports our BMI Core Values and cultivates a culture of diversity and inclusion.
Education: Bachelor’s degree preferred
Experience: Minimum five years of experience with NLP techniques, including tokenization, embedding generation and text similarity measures. Minimum five years of experience working in a large data environment.
Skills And Abilities
The following may be representative but are not all-inclusive of those commonly associated with this position:
- Proficiency in using NLP libraries like Hugging Face Transformers, spaCy, or NLTK.
- Familiarity with transformer-based models (e.g., BERT, RoBERTa) for text representation and fine-tuning.
- Strong knowledge of supervised and unsupervised machine learning techniques for classification, clustering, and entity resolution.
- Experience with Horovod for distributed training and Hyperopt for hyperparameter optimization.
- Proficiency in PySpark MLlib and Python libraries like scikit-learn, TensorFlow, or PyTorch.
- Expertise in processing and analyzing large datasets using PySpark and Databricks.
- Experience integrating machine learning and NLP workflows into Databricks pipelines.
- Strong programming skills in Python, with a focus on machine learning and NLP.
- Proficiency in SQL for querying and transforming large datasets.
- Experience with MLflow or equivalent tools for model management and deployment.
- Ability to monitor and fine-tune deployed models based on feedback and real-world performance.
- Familiarity with NLP-specific metrics (e.g., BLEU, ROUGE) and ML metrics (e.g., precision, recall, F1 score).
- Strong problem-solving skills with a focus on minimizing false positives in data matching tasks.
- Experience with multilingual NLP for handling datasets in multiple languages.
- Knowledge of vector databases and similarity search libraries like Faiss for efficient similarity search and nearest neighbor tasks.
- Familiarity with Azure tools, including Azure Data Factory, Hyperscale SQL, and ADLS, is a plus.
- Knowledge of large-scale distributed computing frameworks like Hadoop.
The specific base salary offered to a successful applicant will be based on individual qualifications, skills, experience, and education. The pay range is subject to change at any time based on various internal and external factors. The position may also be eligible for one or more performance-based bonuses. In addition to cash compensation, BMI offers a competitive portfolio of benefits to its employees, as described below.
What We Give To You
- Health, dental, and vision insurance
- 401(k) with employer match
- Flexible spending accounts
- Paid vacation and paid sick/personal time
- 12 paid calendar holidays
- Paid volunteer time off
- Summer hours that offer more time for fun in the sun
- Company paid life insurance
- Up to 12 weeks paid parental leave
- Tuition assistance for qualified team members
- Commuter benefits (New York)
- Amazing and engaging culture
- Employee Resource Groups