What are the responsibilities and job description for the Lead SRE Engineer position at AGM Tech Solutions, LLC?
Job Details
Title: Lead SRE Engineer
Location: Florham Park, NJ (Hybrid)
Responsibilities:
- Work with R&D teams to understand product development standards and recommend changes that increase the stability of products and applications.
- Build software to improve DevOps, ITOps, and support processes that follow the everything-as-code model, such as infrastructure as code and platform as a service.
- Perform safe, reliable deployments of all appropriate software artifacts into environments ranging from Development and Staging to Production.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
- Create and maintain a disaster recovery plan for the staging and production environments.
- Analyze system problems, including root cause determination, and manage any needed recovery process to ensure quick restoration of service without loss of data.
- Maintain broad knowledge of state-of-the-art technology, equipment, and systems.
- Understand RESTful services and, ideally, use APIs to further automation goals.
- Maintain network and system security, including an understanding of security protocols and certificate management.
Experience/Skills:
- Experience working under a Scrum methodology
- Ability to analyze and resolve problems in systems, networks, software, and APIs, including knowing where to find the relevant sources of information.
- Understanding of source/version control tools such as Git or Bitbucket.
- Experience with DevOps processes and tools such as Azure DevOps or Jenkins
- Experience with containerization technologies such as Docker or Kubernetes
- CI/CD implementation expertise
- Good knowledge of cloud-native applications (preferably AWS).
- Experience with IT automation in general, including tools like Ansible and scripting or programming in languages like Python, Groovy, PowerShell, or Bash.
- Windows and Linux OS knowledge preferred.
- Experience with monitoring and logging tools such as Splunk, Dynatrace, or similar
- Advanced English proficiency
- Understanding of the Microsoft suite of development tools, including Visual Studio, IIS, MS SQL Server, and .NET, is a plus
Generally speaking, the candidate should balance hands-on troubleshooting with an understanding of the potential problems in operating systems, networks, security, and databases. Knowing how to code and write scripts that automate tasks is a plus, and knowing how to call APIs repeatedly from those scripts is even better.
We are looking for someone who is curious and understands current technologies such as Windows, IIS, C#, Java, Tomcat, JBoss, and Linux, but is also interested in newer technologies such as containerization, GenAI, and prompt engineering.
The ability to troubleshoot, break down, and understand complex systems is a plus.