What are the responsibilities and job description for the Senior Site Reliability Engineer (SRE) in Multiple Locations (Hybrid) position at Connvertex Technologies Inc.?
Job Title: Senior Site Reliability Engineer (SRE)
- Applicants must be legally authorized to work in the United States without the need for current or future visa sponsorship.
- Candidates must be based in or willing to relocate to the following locations: Coppell, TX; Boston, MA; Jersey City, NJ; Tampa, FL.
About the Role:
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team within the financial security industry. This role is pivotal in ensuring the reliability, scalability, and performance of mission-critical applications and infrastructure that safeguard sensitive financial data. The ideal candidate will bring a strong focus on observability and work to reduce Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR).
Key Responsibilities:
- Incident Response: Lead and coordinate incident response efforts, quickly diagnosing and resolving critical issues to minimize downtime and financial risk. Drive improvements to reduce MTTD and MTTR.
- Monitoring & Alerting: Design and implement robust monitoring and alerting systems using tools like Prometheus, Grafana, and Datadog to identify issues before they impact operations.
- Automation: Automate routine operational tasks using Bash, Python, or PowerShell to improve efficiency and accuracy.
- Capacity Planning: Forecast resource needs, optimize system performance, and ensure scalability to meet regulatory and business demands.
- Security: Work closely with security teams to implement best practices for threat mitigation, incident response, and regulatory compliance.
- Collaboration: Partner with development, operations, and security teams to ensure seamless system integration and deployments.
- Problem Solving: Troubleshoot complex technical issues, delivering long-term, sustainable solutions.
- Observability: Develop and maintain observability pipelines (logging, tracing, metrics) to improve system visibility and accelerate incident response.
Required Skills and Experience:
- Experience: 10 years as a Site Reliability Engineer in the financial security industry (cybersecurity, fraud prevention, risk management).
- Technical Proficiency:
  - Programming skills in Java and Python.
  - Experience with cloud platforms (AWS, GCP, Azure) focusing on security and compliance.
  - Proficiency with containerization and orchestration tools (Docker, Kubernetes).
  - Strong scripting abilities in Bash, Python, or PowerShell.
  - Knowledge of networking concepts (TCP/IP, DNS, load balancing, firewalls).
  - Hands-on experience with CI/CD tools like Jenkins or GitLab CI/CD.
- Security Expertise: Strong understanding of security best practices, tools, and risk assessment methodologies.
- Observability: Demonstrated experience in designing observability solutions for logging, tracing, and metrics.
Preferred Skills:
- Infrastructure as Code (IaC): Experience with Terraform or CloudFormation.
- Mainframe Systems: Exposure to mainframe technologies is a plus.
- Communication Skills: Ability to communicate technical concepts effectively to non-technical audiences.
Salary: $54 - $72