What are the responsibilities and job description for the Lead Systems Engineer Datadog Administrator position at 1 Point System?
Job Details
Role: Lead Systems Engineer Datadog Administrator
Location: Washington, DC (hybrid; local candidates only)
Duration: 12-month contract
Seeking a Lead Systems Engineer with Datadog administration experience to support systems monitoring initiatives for a leading health insurance company.
The Datadog Administrator will be responsible for administering the tools used to monitor systems and applications. Candidates should have expertise with at least one monitoring tool, such as Datadog.
Required Skills
Datadog administration experience on the Linux platform, including instrumenting Java-based applications running on the Tomcat application server.
Configuration experience in infrastructure monitoring, network monitoring, and centralized logging; or similar administration experience with the ELK Stack: Elasticsearch (search and analytics engine), Logstash (ingest pipeline), and Kibana (visualization and dashboard creation).
Strong Linux platform (Red Hat) background.
Automation experience with scripting (Python, shell, Ansible) preferred.
Understanding of SSL setup on Linux servers, including installing CA certificates.
Experience with network monitoring and knowledge of network components such as switches, routers, Palo Alto Networks devices, network utilization monitoring via SNMP, F5 load balancers, WebSEAL, Infoblox, Gigamon, and network mapping is a plus.
Working knowledge of other monitoring tools such as BigPanda and CloudBeat (synthetic monitoring) is desired. These tools are currently used to monitor the applications and business transactions that affect the business and its customers.
Responsibilities include writing scripts; installing, managing, and maintaining the monitoring tools as needed; integrating them with other tools; and collaborating with other groups on their tools.
Specific Required Skills
5-8 years of strong IT experience and good working knowledge of a variety of technology platforms in a distributed environment, including Microsoft systems (e.g., Windows Server 2012 and 2016, Active Directory, Exchange, SharePoint), Linux/Unix, VMware, SQL Server, database architectures, TCP/IP, VPNs, mainframe, and LAN/WAN technologies and architectures
A minimum of 3 years of hands-on experience installing, integrating, managing, and maintaining monitoring tools, such as Datadog administration and support; or similar log management experience with the ELK Stack: Elasticsearch (search and analytics engine), Logstash (ingest pipeline), and Kibana (visualization and dashboard creation)
Experience in writing Shell, Python, Selenium, and VuGen scripts
Experience with SSL certificates and encryption methods on Linux
Experience in developing and implementing systems monitoring and alerting strategies in diverse, large-scale environments
Experience developing and documenting processes, procedures, and policies for tool usage and integration
Author tool maintenance and training documentation, and support requests for training on tool usage
Knowledge and experience with configuring alerts, dashboards, and ad-hoc reports
Strong understanding of service level management (SLAs, SLRs, etc.)
Determine and document tool backup and recovery procedures
Experience with data management tools and databases (e.g., DB2 and SQL; familiarity desired)
Experience in systems and Java applications troubleshooting using monitoring tools like Datadog
Understanding of and experience with both waterfall and agile software development life cycles (SDLC)
Bachelor of Science in Computer Science or a related field (e.g., Engineering, Applied Science, Math), or equivalent experience.
Experience with SAFe agile methodologies
LICENSES/CERTIFICATIONS
ITIL v3 Foundation within 180 days (preferred)
SAFe Certification