What are the responsibilities and job description for the Data Engineer position at EpochGeo?
Mid- to Senior-Level Data Engineer
EpochGeo is looking for a mid- or senior-level Data Engineer to support one of the most exciting and rapidly growing automated analytics projects in the national security space.
Are you a Data Engineer who loves to work with data, build pipelines, and deliver data products that enable actionable intelligence? Are you a software engineer who enjoys working with data infrastructure, both cloud-based and on-premises? If so, please reach out to us at EpochGeo to chat.
Location-based big data can be overwhelming, inaccessible, and noisy. EpochGeo is a data services firm specializing in the full-spectrum data cycle: from scalable data storage through to impactful analytics. Our developers, analysts, and data scientists have a proven track record of applying innovative open-source technology, data science, and actionable analytics to inform customers’ data-driven decisions. As a Data Engineer on our team, you will enable our data professionals to meet our customers’ mission requirements.
Your background likely includes:
- 4–10 years of experience working with data, whether in data analytics, data science, or data engineering, or as a software engineer building data-intensive applications
- Education or military experience in Computer Science, Data Science, or a related field
- Experience building, maintaining, and monitoring extract, transform, and load (ETL) pipelines
- Experience with Apache Airflow, or familiarity with workflow orchestration through a similar tool
- Experience writing production-grade code with Python and Spark (we use PySpark)
- Experience with Amazon Web Services (AWS); our current stack uses EMR (Spark, Hive, and Trino), EC2, S3, Lambda, Athena, and RDS
- Experience in any of the following disciplines is a plus: Linux systems administration, platform/infrastructure engineering, DevOps
What you will be doing:
- Interacting with data scientists, analysts, and collectors to gather requirements and make multiple big datasets more readily accessible
- Building, maintaining, and monitoring data pipelines in AWS
- Contributing to custom Python libraries that enable analysts and data scientists to access data at scale
- Exploiting big data environments for insights that can be automated at scale
- Working in a dynamic environment with ever-evolving requirements
Bonus:
- Master’s degree in Computer Science or Data Science
- Demonstrable experience with Apache Spark via any API
- Familiarity with the “data lakehouse” concept and open table formats, e.g., Apache Hudi, Apache Iceberg, and Delta Lake
Additional:
- Must hold a TS/SCI clearance and be willing to submit to a CI polygraph
Benefits Include:
- 100% of health care premiums covered, plus FSA and HSA options
- 401(k): 6% match with immediate vesting
- 14 holidays: all 11 federal holidays, the day after Thanksgiving, December 24, and December 31
- PTO: 4 weeks annually
- 3-week sabbatical every 3 years with the company