What are the responsibilities and job description for the Site Reliability Engineer (Not DevOps) position at TrueSkilla?
Role: Site Reliability and Operations Engineer (SRE) (Not DevOps)
Work Location: Irving, TX (Hybrid, 3 days)
Type: W2 Only
Duration: 12 Months
We are looking for a highly skilled Site Reliability and Operations Engineer (SRE) with extensive experience in Kubernetes-based distributed caching and compute grid solutions. This role requires a strong foundation in software development, infrastructure automation, and reliability engineering. You will be responsible for designing, implementing, and maintaining high-performance distributed systems, ensuring reliability, scalability, and efficiency.
Development & Implementation:
• Design, develop, and optimize distributed caching and compute grid solutions on Kubernetes/OpenShift.
• Understanding of microservices and containerized workloads using Kubernetes, Docker, and Helm.
• Implement high-throughput compute grid solutions using IBM Spectrum Symphony, TIBCO GridServer, or similar technologies.
• Optimize application performance by leveraging parallel compute strategies, load balancing, and efficient data distribution.
Site Reliability Engineering (SRE):
• Ensure high availability, scalability, and reliability of distributed systems.
• Implement observability, logging, and monitoring using tools like Prometheus, Grafana, ELK, or OpenTelemetry.
• Automate infrastructure provisioning and deployments using Ansible and Helm charts.
• Understanding of CI/CD pipelines for seamless software deployment.
• Troubleshoot and resolve incidents affecting the platform, infrastructure, and distributed compute grids, ensuring minimal downtime.
Required Skills & Qualifications:
• Strong experience in Kubernetes (OpenShift and on-prem/cloud clusters).
• Understanding of programming languages like Java, Go, or Python.
• Experience with containerization technologies (Docker, Helm, etc.).
• Strong knowledge of CI/CD pipelines (Jenkins, ArgoCD, GitHub Actions).
• Hands-on experience with observability tools (Prometheus, Grafana, Loki, Jaeger).
• Understanding of networking, service meshes (Istio/Linkerd), and security best practices in Kubernetes.
• Experience with multi-cluster and hybrid cloud Kubernetes deployments.
Salary: $80 - $90