Senior Staff Engineer, Cloud SRE/DevOps (SD/DC/Remote) (R3138)

The Rundown
Washington, DC | Remote | Full Time
Posted on 2/14/2025
Available before 5/11/2025

About Shield AI

Founded in 2015, Shield AI is a venture-backed defense technology company focused on protecting service members and civilians with intelligent systems. Its flagship autonomy software, Hivemind, powers aircraft, drones, and other platforms, enabling complex missions with high reliability in contested environments. With offices in San Diego, Dallas, Washington, D.C., and internationally, Shield AI’s products actively support U.S. and allied operations worldwide. For more information, visit www.shield.ai. Follow Shield AI on LinkedIn, Twitter, and Instagram.

As a Cloud SRE / DevOps Engineer on the Forge team, you will be responsible for optimizing Forge’s cloud deployments and owning the processes that enable customers to deploy their own Forge instances. You will manage Shield AI’s internal Hivemind instances, working closely with the software operations and engineering teams to ensure Forge can scale for simulation, testing, and bursts of use. You will also enable seamless upgrades, canary deployments, and system robustness. Additionally, you’ll serve as the primary point of contact for the customer engagement team, providing expert guidance on deploying Forge in customer environments.

What You'll Do:

  • Optimize cloud deployments of Forge to ensure scalability, reliability, and cost efficiency.
  • Design and document processes for external customers to deploy Forge instances using the SDK in on-premises or hybrid environments.
  • Manage and maintain internal Hivemind instances, ensuring their ability to handle large-scale simulation and testing workloads.
  • Collaborate with the software operations team to enhance Forge’s ability to scale dynamically, accommodate bursts of use, and support continuous upgrades with minimal disruption.
  • Develop tools and processes for canary deployments, ensuring smooth rollouts of new features and updates.
  • Serve as the primary technical consultant for the customer engagement team, providing expertise on deploying and managing Forge in external environments.
  • Create and maintain detailed, user-friendly documentation and tutorials for deployment processes, catering to both internal teams and external customers.
  • Monitor, troubleshoot, and resolve issues related to Forge deployments, ensuring high availability and performance.

Required Qualifications:

  • Typically requires a minimum of 10 years of related experience with a Bachelor’s degree; or 9 years and a Master’s degree; or 7 years with a PhD; or equivalent work experience.
  • 8 years of experience in DevOps, Site Reliability Engineering, or cloud infrastructure roles.
  • Expertise in cloud platforms such as AWS, Azure, or GCP, including deploying and managing scalable, distributed systems.
  • Strong experience with Kubernetes and containerization.
  • Experience creating Helm charts.
  • Solid understanding of infrastructure-as-code tools like Terraform, CloudFormation, or similar.
  • Proficiency in scripting and programming languages such as Python, Golang, or Bash.
  • Demonstrated experience optimizing CI/CD pipelines, implementing canary deployments, or working with tools like ArgoCD and FluxCD.
  • Familiarity with networking concepts and protocols, as well as system monitoring tools (e.g., Prometheus, Grafana).
  • Experience deploying and configuring databases such as Postgres.
  • Excellent technical writing skills, with a proven ability to create clear, comprehensive documentation and tutorials.
  • BS / MS in Computer Science, Engineering, or equivalent practical experience.
  • Ability to work cross-functionally and communicate effectively with engineering, operations, and customer-facing teams.

Preferred Qualifications:

  • Experience with secure software deployments in regulated industries such as aerospace, defense, or finance.
  • Systems software development experience using programming languages like C++, Rust, or Golang.
  • Experience building software development kits or productized tools for deploying cloud systems.
  • Knowledge of hybrid and on-premises deployment strategies and challenges.
  • Hands-on experience with database performance optimization and scaling strategies.
  • Familiarity with configuration management tools like Ansible, Chef, or Puppet.
  • Experience building robust monitoring and alerting systems for mission-critical applications.
  • Background in managing high-throughput simulation or testing environments.
  • Experience optimizing databases.

Shield AI is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please let us know.
