What are the responsibilities and job description for the Senior Data Engineer (Remote) position at sailteam.io?
Senior Data Engineer
Team: ML

Job Description
Job Title: Senior Data Engineer
Location: Remote (Eastern Time Zone Hours)
Employment Type: Full-Time

About Us
We are a Computer Vision Product Company on a mission to dramatically increase the operational safety of critical rope applications by delivering real-time data to the right people before catastrophic failures occur.
Failures in critical rope applications are often due to inadequate visual inspection, which remains the standard practice in industries such as Construction, Maritime Mooring, Mining, and Oil & Gas/Drilling. When these ropes fail, lives are lost and company reputations suffer.
At Scope, we are leveraging the latest advancements in technology to solve this problem. Our current focus is Electric Utility Construction and Maintenance, where we equip operators with the ability to assess the break strength of their stringing lines without destructive testing. This eliminates reliance on "educated guesses" and allows companies to confidently ensure their lines are fit for service.

What You'll Do
- Architect and build scalable data pipelines and workflows using Dagster to move, transform, and make data available for machine learning and analytics (see the Dagster sketch after this list).
- Design and optimize storage solutions for large-scale industrial and vision data, ensuring efficient retrieval and accessibility for ML engineers.
- Develop robust data ingestion frameworks for consuming live production images, video, and metadata in an extensible and scalable manner.
- Collaborate with ML engineers to ensure data, both computer vision and ancillary metadata, is structured and processed optimally for experimentation and model training.
- Work with Kubernetes-based environments to orchestrate and deploy data processing jobs.
- Enhance CI/CD for data workflows, ensuring automated deployment and testing via GitLab CI/CD. We deploy on merge and you’ll make that better, faster, safer, and cheaper.
- Own and maintain AWS-based data infrastructure, leveraging Terraform for Infrastructure as Code.
- Implement data governance best practices, including data quality validation, lineage tracking, and metadata management.
- Optimize batch and real-time processing frameworks, incorporating best practices for performance, scalability, and reliability.
- Act as a technical leader in data engineering, defining best practices and guiding future scaling efforts.
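As a rough, minimal sketch of the kind of Dagster asset pipeline described above (the asset names, columns, and S3 path are hypothetical illustrations, not our actual codebase):

```python
# Minimal, hypothetical sketch of a Dagster asset pipeline: ingest inspection
# metadata, clean it, and hand it off for ML training.
import pandas as pd
from dagster import Definitions, asset


@asset
def raw_inspection_metadata() -> pd.DataFrame:
    # Metadata captured alongside production images/video (hypothetical location).
    return pd.read_parquet("s3://example-bucket/inspections/metadata.parquet")


@asset
def training_ready_metadata(raw_inspection_metadata: pd.DataFrame) -> pd.DataFrame:
    # Drop rows missing the fields ML engineers rely on for model training.
    return raw_inspection_metadata.dropna(subset=["rope_id", "frame_uri"])


# `dagster dev` loads these Definitions; the asset graph is inferred from the
# parameter names above, with no explicit scheduling glue.
defs = Definitions(assets=[raw_inspection_metadata, training_ready_metadata])
```

In Dagster, each asset declares its upstream dependencies by parameter name, which keeps ingestion, transformation, and publication steps composable as the pipeline grows.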
What We're Looking For

- 5 years of experience in data engineering, with a focus on scalable, production-grade data infrastructure.
- Strong Python skills, with an emphasis on type safety, functional programming patterns, and modern Python practices. The ideal candidate has also worked with Rust, Scala, Kotlin, F#, and/or a Lisp dialect.
- Experience with data processing frameworks such as Pandas (with Pandera), PyArrow, or Dask (see the validation sketch after this list).
- Deep expertise in data orchestration tools, preferably Dagster (experience with Prefect, Airflow, NiFi, or similar tools is acceptable).
- Experience with streaming and event-driven architectures such as Ray Core, Kafka, Kinesis, Pulsar, Storm, or Dempsy, or with real-time data processing frameworks like Flink or Spark Streaming.
- Hands-on experience with Kubernetes, particularly in data pipeline orchestration.
- Experience deploying infrastructure via Terraform (or similar IaC tools).
- Proficiency with cloud services, preferably AWS: S3, EKS, Lambda, Glue, and RDS (or their equivalents on other clouds).
- Strong database skills, including SQL, NoSQL, and columnar storage (e.g., Postgres, BigQuery, ClickHouse).
- Experience with strongly-typed ORMs (e.g., SQLAlchemy/SQLModel, Hibernate, Diesel) and data validation frameworks (e.g., Pydantic, Great Expectations).
- Comfortable with hybrid storage, combining databases and blob storage for large objects such as videos and computer vision datasets.
- CI/CD expertise, preferably with GitLab for managing automated data pipeline deployments.
- Familiarity with ML experiment tracking, metadata management, and data lineage tracking.
- Understanding of ML workflows and how data engineering enables efficient model training/deployment.
- Experience with embedding management, particularly for inference stores, using tools such as Chroma or pgvector.
- Experience with video processing pipelines and efficient storage/retrieval of large media files.
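For a flavor of the Pandas-plus-Pandera validation called out above, here is a minimal sketch; the schema and its columns are illustrative assumptions, not our real data model:

```python
# Minimal, hypothetical sketch of dataframe validation with Pandera: a typed
# schema that rejects malformed rows before they reach experimentation or training.
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series


class RopeFrameSchema(pa.DataFrameModel):
    # Illustrative columns only.
    rope_id: Series[str] = pa.Field(nullable=False)
    frame_uri: Series[str] = pa.Field(str_startswith="s3://")
    break_strength_kn: Series[float] = pa.Field(ge=0)


def validate_frames(df: pd.DataFrame) -> DataFrame[RopeFrameSchema]:
    # Raises pandera.errors.SchemaError on violations instead of letting
    # bad rows flow silently into training jobs.
    return RopeFrameSchema.validate(df)
```

A schema like this can sit at the boundary of a Dagster asset so that data-quality failures surface loudly at ingestion time rather than during model training.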
What We Offer

- A chance to own and shape the data infrastructure at a fast-growing computer vision AI company.
- A highly collaborative, fast-paced environment working with cutting-edge ML and data engineering.
- Competitive salary, annual incentive plan, and benefits.
- Opportunities for growth and leadership as we scale our data team.
Salary: $109,100 - $148,200