What are the responsibilities and job description for the Scalability QA Engineer position at RedBalloon?
Company: Patmos
RedBalloon posts jobs on behalf of client companies
Job Summary
We are seeking an experienced Scalability QA Manager to lead the quality assurance efforts for our multi-region Kubernetes (K8s) and S3-based web application. In this critical role, you will oversee the testing processes to ensure our application performs reliably under varying loads, proactively identifying bottlenecks, configuration issues, and potential points of failure before they impact users. You’ll be directing QA consulting firms and collaborating with engineering, DevOps, and product teams to maintain uptime and optimize performance across our Cloudflare-enabled, cache-heavy infrastructure. If you’re passionate about scalability, thrive in complex distributed systems, and have a knack for spotting issues before they escalate, we’d love to hear from you.
Responsibilities
- Testing Oversight: Lead the design, development, and execution of scalability, performance, and reliability testing strategies for a multi-region web application built on Kubernetes and AWS S3.
- Proactive Bottleneck Detection: Identify and analyze potential performance bottlenecks, misconfigurations, and resource constraints in the application stack, including K8s clusters, S3 storage, Cloudflare CDN, and caching layers.
- Test Planning & Execution: Develop comprehensive test plans to simulate real-world traffic patterns, peak loads, and failure scenarios across multiple regions, ensuring the system remains resilient and responsive.
- Collaboration: Partner with DevOps and development teams to replicate production-like environments, validate fixes, and recommend configuration improvements (e.g., caching policies, K8s resource limits, Cloudflare settings).
- Observability, Monitoring & Reporting: Establish KPIs and monitoring frameworks to track system performance, latency, and uptime; provide detailed reports on findings and actionable recommendations to prevent downtime.
- Tooling & Automation: Implement and maintain automated testing tools and scripts to stress-test the application, focusing on scalability, failover, and recovery scenarios.
- Risk Mitigation: Anticipate and address configuration issues (e.g., improper K8s pod scaling, S3 throttling, or cache invalidation delays) that could lead to outages or degraded user experience.
- Documentation: Maintain clear documentation of test cases, results, and best practices to ensure knowledge sharing across teams.
Qualifications
- Experience: 5 years in QA or performance engineering, with at least 2 years focused on scalability testing for distributed systems.
- Technical Expertise:
  - Hands-on experience with Kubernetes (K8s) testing, including cluster scaling, pod management, and resource optimization.
  - Strong understanding of S3 and its performance characteristics (e.g., request rates, multipart uploads).
  - Familiarity with Cloudflare features (CDN, caching, rate limiting) and their impact on web application performance.
- Testing Tools: Proficiency with load testing tools (e.g., Locust, JMeter, Gatling) and monitoring solutions (e.g., Prometheus, Grafana).
- Automation: Experience scripting in Python, Bash, or similar languages to automate testing workflows.
- Problem-Solving: Proven ability to proactively identify and resolve bottlenecks in complex, multi-region architectures.
- Communication: Excellent verbal and written skills to collaborate with cross-functional teams and present findings to stakeholders.
- Education: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Salary: $100,000 - $140,000