What are the responsibilities and job description for the Sr. Staff/Staff AI/ML Engineer position at GEICO Tech?
Staff/Sr. Staff AI/ML Engineer
Overview
GEICO is seeking an experienced Staff or Sr. Staff Machine Learning Engineer to join the AI organization. This person will take on a critical leadership role in designing, implementing, and deploying cutting-edge machine learning models that solve real-world business challenges.
Key Responsibilities
- Lead the Design & Implementation of ML Models: Lead the architecture and implementation of machine learning models, working closely with Product, Business Units, and Engineering teams.
- Build Scalable Infrastructure: Design and develop scalable infrastructure for model training, automated hyperparameter tuning, and deployment pipelines, ensuring that systems are reliable and performant at scale.
- Write Production-Grade Code for ML Services and APIs: Write high-quality, maintainable production-grade code that turns machine learning models into deployable services and APIs. Ensure that code is modular and reusable for future ML projects.
- Optimize Model Performance and Resolve Issues: Debug and troubleshoot model performance issues, track key metrics, and continuously enhance model reliability, speed, and efficiency in production environments.
- End-to-End Model Lifecycle Management: Own the complete lifecycle of ML models, including monitoring, retraining, and managing versions of models to ensure they continue to meet business needs over time.
- Leadership and Mentorship: Guide and mentor junior machine learning engineers, and promote best practices in software engineering, model development, and deployment. Lead technical decision-making processes and foster collaboration within the team.
- Collaboration Across Teams: Collaborate with cross-functional teams (e.g., data engineering, software development, and product management) to integrate machine learning models and ensure smooth deployment and operations in production systems.
- Stay Up to Date with Industry Trends: Continuously explore and integrate new machine learning techniques and system engineering tools, ensuring the team remains at the forefront of machine learning and systems architecture practices.
Basic Qualifications
- B.Sc. in Computer Science, Machine Learning, Engineering, or a related technical field.
- 6 years of hands-on experience applying machine learning techniques, including deep learning, reinforcement learning, and NLP in production environments.
- 6 years of experience utilizing open-source/cloud-agnostic components such as a data warehouse (e.g., Snowflake), streaming platform (e.g., Kafka), relational database (e.g., PostgreSQL), NoSQL database (e.g., MongoDB, Cassandra), distributed processing (e.g., Spark, Ray), and workflow management (e.g., Airflow, Temporal).
- 6 years of professional software development experience with at least two general-purpose programming languages such as Java, C++, Python, or C#.
- 6 years of experience with machine learning frameworks such as TensorFlow, PyTorch, Scikit-learn for model development.
- At least 4 years of experience with cloud platforms (AWS, Azure, GCP) and containerization technologies such as Docker, as well as orchestration tools like Kubernetes.
- Proven experience in deploying machine learning models in a production environment, ensuring scalability, reliability, and high availability.
Core Engineering Skills & Knowledge
- Extensive experience with object-oriented design (OOD), design patterns, and writing clean, maintainable code. Proficiency in version control (Git) and familiarity with Agile methodologies.
- Solid understanding of distributed systems and the challenges associated with scaling machine learning models in production, such as managing distributed data processing and microservices architectures.
- Expertise in implementing MLOps practices, including setting up continuous integration (CI), continuous delivery (CD), automated testing, and deployment pipelines for ML models.
- Strong understanding of system architecture, performance optimization, and designing fault-tolerant systems that handle large-scale data and high-volume requests.
- Experience designing and deploying machine learning models using cloud-based environments like AWS, Azure, or Google Cloud. Familiarity with cloud-native tools such as AWS SageMaker, GCP AI Platform, or Azure Machine Learning.
- Experience setting up monitoring and logging systems to track performance in production environments and ensuring efficient resource utilization.
Preferred Qualifications
- Experience with designing and building high-performance distributed systems that handle large-scale data ingestion and processing for machine learning workloads.
- Experience with real-time inference pipelines and low-latency model serving.
- Familiarity with serverless computing or managed services for ML model deployment.
- Advanced degree (M.Sc., Ph.D.) in a related field is a plus.
- Experience with GPU/TPU optimization for accelerated model training and inference.
Benefits
We offer competitive pay, a comprehensive benefits package, paid vacation, sick leave, parental leave, a 401(k) plan, tuition assistance, and more!
Equal Employment Opportunity
GEICO is an equal employment opportunity employer and provides a work environment in which each associate is able to be productive and work to the best of their ability. We do not condone or tolerate an atmosphere of intimidation or harassment. We expect and require the cooperation of all associates in maintaining an atmosphere free from discrimination and harassment with mutual respect by and for all associates and applicants.