What are the responsibilities and job description for the Senior Data Engineer position at Gotham Technology Group?
As a Senior Data Engineer, you will be responsible for developing and optimizing data environments that enable the smooth extraction, transformation, and loading (ETL) of large and complex datasets. You will work with protein data, implementing efficient pipelines for storing, manipulating, and cross-matching data to enable similarity searching and data-driven decision-making for model training. The ideal candidate will have extensive experience with data engineering in cloud environments, particularly with Databricks and Azure technologies.
Key Responsibilities:
- Design, develop, and maintain efficient ETL pipelines that move large-scale, customized datasets from various sources to their destinations, ensuring data integrity and accessibility.
- Work closely with bioinformatics and data science teams to create data structures optimized for model training, enabling quick access and cross-matching for similarity searches.
- Implement, organize, and optimize protein datasets within Databricks or Microsoft Fabric, ensuring compatibility with AI/ML workflows.
- Manage data lakes and data warehouses, ensuring data consistency, accuracy, and optimal performance in a cloud-based environment.
- Collaborate with stakeholders to understand data needs, defining data architecture and structures that meet both current and future requirements.
- Ensure secure handling and transfer of sensitive research data, complying with company policies and regulatory standards.
- Build automated data pipelines for cross-matching datasets and generating insights using similarity searches and other bioinformatics techniques.
- Maintain documentation and best practices for data handling, processing, and storage to ensure smooth operations and reproducibility.
Required Qualifications
Preferred Qualifications: