What are the responsibilities and job description for the Java Site Reliability Engineer (Cloud - Azure) position at Artmac?
Who We Are
Artmac Soft is a technology consulting and service-oriented IT company dedicated to providing innovative technology solutions and services to Customers.
Job Description
Job Title : Java Site Reliability Engineer (Cloud - Azure)
Job Type : W2
Experience : 8 - 20 Years
Location : Plano, Texas
Required Skills
Site Reliability Engineers for Cloud (Azure) proactively drive resiliency as new functionality is launched by reviewing user stories and code changes made by Scrum teams to eliminate points of failure; configuring alerting, monitoring, dashboarding, and impact analysis tools; optimizing on-call processes and procedures; documenting "tribal" knowledge; conducting post-incident reviews; and driving down MTTR. During outages, they help assess impact, drive mitigation, and uphold our SLAs.
Key Roles & Responsibilities
- Coordinate and guide the cloud migration of microservice-based architectures across various cloud environments.
- Build and implement cloud services for high availability, performance, monitoring, and incident response.
- Enable and provide infrastructure support for the DevOps team, including on-prem and cloud administration.
- Implement and enhance the automation framework for delivering microservices-based applications using Java, J2EE, Jenkins, Maven, Linux, and K8s, both on-prem and in the cloud.
- Work with application developers on a day-to-day basis to collect requirements for the next release.
- Implement monitoring and alerting: create dashboards for specific metrics, set thresholds, trigger alerts based on those thresholds, interpret the alerts, and automatically heal the system.
- Lead root cause analysis brainstorming sessions on incident resolutions and, working with DevOps teams, provide corrective and preventive measures to avoid or mitigate future incidents.
- Exercise a high degree of responsibility for the processes, systems, and tools created and managed.
- Work across teams to continuously analyze system performance in production, troubleshoot consumer- and engineering-reported issues, and proactively identify areas in need of optimization.
- Work with the team to gather requirements and to research, evaluate, design, plan, deploy, and support the ELK stack on Linux. Build highly resilient, high-performance, scalable, and flexible systems.
- Azure Cloud/Linux systems administration and scripting/automation experience.
- Experience in designing, analyzing, and troubleshooting large-scale distributed systems.
- Debug production issues across services and levels of the stack.
- Experience with one or more orchestration and deployment tools, such as Azure Resource Manager (ARM), Terraform, or Ansible.
- Familiarity with Git or other source control systems.
- Experience with TFS or Visual Studio Team Services (VSTS).
- Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
- Experience working with Microsoft Azure Public Cloud.
- PowerShell or Python experience, specifically for systems automation.
- Experience with RESTful and WebSocket APIs.
- Working knowledge of the TCP/IP stack, internet routing and load balancing.
- Experience with monitoring and alerting using technologies like Log Analytics, Dynatrace, Prometheus, Nagios, and Kafka.
- Experience designing, implementing, and deploying Docker, Kubernetes, and serverless (Functions or Lambdas) workloads.
- Previous experience working with geographically-distributed coworkers.
- Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
- Bachelor’s degree in Computer Science, Information Systems, Information Technology, or a related field.
- 3 to 5 years of experience as an Azure Cloud administrator or solution architect; Azure certification desirable.