What are the responsibilities and job description for the Senior Systems Reliability Engineer position at CHESS Solutions, LLC?
WHO WE ARE
Chess Solutions, LLC is a Virginia-based government contracting company that builds digital vetting tools. We work collaboratively with a number of clients to develop tools to facilitate client goals and mission objectives. In conjunction with our parent company, Presage Technologies, we integrate state-of-the-art vision-based physiological analysis tools and digital media forensic tools into robust software platforms. Our goal is to provide our clients with the most accurate digital vetting and analysis tools possible in an easy-to-use, modern software experience.
WHAT YOU’LL DO
As a Senior Systems Reliability Engineer specializing in AWS Cloud technologies, you will work closely with our cross-functional engineering team to design, build, and maintain the cloud infrastructure, automation, and backend services that power our products.
This role requires technical versatility: you will work across a range of architectures, including serverless, containerized, and VM-based environments. You’ll also play a key role in ensuring the scalability, security, and performance of our systems while improving developer workflows and automation. You will be responsible for building DevOps and MLOps infrastructure to support machine learning and software development, as well as providing general network and systems administration support to users.
You are a strong collaborator and communicator. You are able to plan and estimate your time, are self-directed in development, and communicate dependencies well in advance. You are able to build rapport and trust with customers, translate customer requirements into roadmap items, and develop consensus on prioritization across a wide set of customer constituencies. You are a bug hunter who assumes by default that system issues originate in your part of the stack, and you expect others to operate similarly. You understand the value of unit tests, CI/CD pipelines, and establishing quality assurance metrics and processes.
You write excellent documentation at all phases of a project. You plan well to communicate intent and design, welcoming others to provide feedback and input into your project planning. You understand there are multiple levels of documentation to produce, including for internal development, external integrators, system security plans and compliance, and end-users.
This is a hybrid role, with an expectation of working in-office or collaborating with customers and teammates in Northern Virginia (Herndon/Leesburg) or St. Paul, Minnesota.
KEY RESPONSIBILITIES
DevOps & Infrastructure
- CI/CD Automation: Design and maintain pipelines for mobile applications, backend services, and machine learning workflows to ensure fast, reliable deployments.
- Infrastructure as Code (IaC): Implement and manage AWS infrastructure using CloudFormation, Terraform, Helm, and Docker Compose.
- Optimization: Continuously monitor and optimize performance, cost, and reliability of applications.
- Observability: Deploy monitoring, logging, and alerting solutions to track system performance and detect anomalies.
- Developer Experience: Enhance development workflows by improving CI/CD and infrastructure automation.
- Troubleshooting: Investigate and resolve infrastructure and backend issues to ensure smooth deployments.
Cloud Architecture & Security
Machine Learning Operations (MLOps)
Backend Development
WHO YOU ARE
You are a self-driven, adaptable engineer who thrives in a fast-paced startup environment. You enjoy solving complex problems, collaborating with a team, and continuously learning new technologies.
To succeed in this role, you should have:
If you're driven by the challenge of tackling deepfake threats and passionate about pushing the boundaries of technology, we encourage you to apply for the Senior Systems Reliability Engineer role at Chess Solutions, LLC. Join our innovative and collaborative team, where you'll work on real-world problems, grow alongside experts in media forensics, and contribute to a mission with global impact.
INTERVIEW PROCESS
Timeline: The entire process can occur in less than two weeks for the right candidate.