What are the responsibilities and job description for the SRE / DevOps (JIRA and GCP infrastructure) position at Appko, Inc.?
Job Details
Job Description
Sr. DevOps/Site Reliability/Fullstack Engineer
Coverage time: 7 am to 4 pm PST (remote or Mountain View onsite)
We are looking for a highly motivated DevOps/Site Reliability Engineer to join our exceptional team. The right candidate is ready to design, automate, and support our GCP and Jira Server infrastructure and back-end systems, and to handle technical integration with our client tools and apps that support end-to-end testing of autonomous software. The ideal candidate has strong performance-tuning, technical design, and full-stack skills, plus experience supporting GCP and Jira production environments for a base of 1,000 end users. Jira is currently used to implement, enhance, and maintain end-to-end automated test plans through a hierarchy of Jira structures and workstreams (covering both manual and automated processes).
Responsibilities
In addition to designing, automating, and supporting our GCP and Jira Server infrastructure and back-end systems, you will perform technical root cause analysis. As an SRE, you will play a critical role in shaping our software stack and hardware infrastructure. Your knowledge of design, analytics, development, coding, testing, and application programming will help our development team meet customers' business and functional requirements. You will also be instrumental in designing, implementing, maintaining, and supporting a high-availability infrastructure environment for a client focused on autonomous software and software testing infrastructure. We are seeking someone who can make an immediate impact on application and infrastructure performance in production.
Performance Tuning:
Expertise in the Jira and testing-tools production environment to meet SLA requirements
Understands and takes ownership of Jira Server to meet SLAs
JVM (Java Virtual Machine)
JVM Configuration / Insufficient Memory
Garbage Collection logging
File Systems
Database performance
Networking and load balancing
CPU, RAM, OS, and architecture (physical or virtual)
Research and implement Atlassian and GCP best practices in the current infrastructure
Improve database server (MySQL) performance to meet SLA expectations
Identify and root-cause Jira database crashes in the production environment
Resolve dashboard and report loading issues
Reduce Jira latency
Daily Activities:
Take ownership of test and production infrastructure environments, ensuring performance meets customer and end-user SLA requirements
Ensure the design, reliability, availability, and serviceability of the GCP cloud and Jira infrastructure
Improve the product life cycle across inception, design, deployment, operation, and refinement
Own capacity planning, disaster recovery planning, system scaling, overload and failure management, and communication planning
Write automation code for provisioning and operating infrastructure at massive scale
Establish end-to-end monitoring and alerting on all critical components of the applications, including availability, latency and overall system health
Participate in the on-call rotation supporting the platform and/or the production application
Direct root-cause and corrective-action analysis of critical business and production issues
Develop a standard methodology for infrastructure orchestration and for troubleshooting application services in production
Represent DevOps/SRE in design reviews and work with engineering teams on operational readiness
Technical Qualifications:
BS in Computer Science, Engineering, or a related field, or equivalent professional experience
Experience with Unix/Linux operating system internals and administration
Good understanding of networking technologies such as SDN, NFV, and SD-WAN, and sound knowledge of Ethernet switching and routing technology
Good understanding of server and network virtualization, global infrastructure, distributed systems, load balancing, and security
Experience with at least one configuration management solution, and hands-on experience with server virtualization (e.g., VMware ESXi, KVM, Hyper-V)
Expertise in configuration management or infrastructure as code with a framework such as Ansible, Chef, Puppet, or Terraform
Experience with AWS and GCP cloud computing and their related services
Strong HTTP fundamentals, including HTTP headers and process and system API services; experience working with third-party RESTful APIs
Experience with Python, Go and/or C
Experience with CI/CD pipelines, GitHub, and Jenkins
Ability to debug and optimize code
Passion for automation and for instrumenting code for monitoring
Knowledge of best practices related to security, performance, and disaster recovery
Other Qualifications:
Ability to communicate effectively and succinctly
Strong, systematic problem-solving skills and the ability to work amid ambiguity
Excellent written and verbal communication skills; able to collaborate and rally support
Excellent interpersonal skills and the ability to work well in a team
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive; a positive attitude and the ability to quickly learn new technologies and effectively manage parallel projects
Ability to diagnose and troubleshoot complex distributed systems handling high volume transactions
Passionate about learning, understanding, and dissecting new technologies quickly and independently
Preferred Qualifications:
5 years of related experience
Experience with modern monitoring and metrics tools such as Prometheus
Experience with networking (e.g., TCP/IP, routing, network topologies and hardware, SDN, NFV)
Experience implementing monitoring tools such as Grafana, collectd, and Zabbix
Experience with etcd, NoSQL, and time-series databases
Proven experience working with customers and vendors
Proven leadership of small informal teams
Company Description