What are the responsibilities and job description for the Data Engineer position at Gotham Technology Group?
We are seeking a highly skilled and innovative Data Engineer to develop and optimize data environments that drive research advancements in protein data analysis. In this role, you will design and maintain robust data pipelines to handle large, complex datasets, enabling similarity searching and data-driven decision-making for model training. With a strong focus on cloud-based environments, particularly Databricks and Azure, you will collaborate closely with bioinformatics and data science teams to build scalable, efficient data solutions that fuel cutting-edge research.
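As a rough illustration of what "similarity searching" over protein data can mean in practice, the sketch below ranks proteins by cosine similarity of embedding vectors. The protein IDs, vectors, and the choice of cosine similarity are illustrative assumptions for this sketch only; the posting does not prescribe a specific method or library.

```python
# Illustrative sketch only: similarity search over hypothetical protein embeddings.
# IDs and vectors are made up; cosine similarity is one common choice, not a
# requirement stated in this posting.
import numpy as np

embeddings = {                      # hypothetical protein_id -> embedding vector
    "P001": np.array([0.12, 0.88, 0.33]),
    "P002": np.array([0.10, 0.91, 0.30]),
    "P003": np.array([0.95, 0.05, 0.42]),
}

def most_similar(query_id: str, k: int = 2) -> list[tuple[str, float]]:
    """Rank the other proteins by cosine similarity to the query protein."""
    q = embeddings[query_id]
    scores = []
    for pid, vec in embeddings.items():
        if pid == query_id:
            continue
        sim = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
        scores.append((pid, sim))
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

print(most_similar("P001"))         # nearest neighbours of P001 by cosine similarity
```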
Key Responsibilities:
- ETL Development: Design, develop, and maintain efficient ETL pipelines to process large-scale datasets, ensuring data integrity and accessibility (a minimal sketch follows this list).
- Data Architecture: Manage cloud-based data lakes and warehouses, optimizing data storage and retrieval performance for bioinformatics workflows.
- Collaboration: Partner with bioinformatics and data science teams to structure data environments that support model training and enable similarity searching.
- Data Security: Ensure secure handling and transfer of sensitive research data in compliance with company policies and regulatory requirements.
- Data Insights: Build automated data pipelines for cross-matching datasets, generating insights through advanced bioinformatics techniques.
- Documentation & Best Practices: Maintain detailed documentation and establish best practices for data handling, processing, and storage to ensure reproducibility.
Required Qualifications:
- Bachelor's degree in Computer Science, Data Engineering, Bioinformatics, or a related field.
- Proven experience in developing and managing customized data environments, including ETL processes, data transformation, and manipulation.
- Proficiency in SQL and Python for data pipeline development and analysis.
- Strong understanding of data modeling, data architecture, and database optimization techniques.
- Experience working in collaborative, fast-paced research environments.
- Excellent analytical and problem-solving skills, with strong attention to detail.
Preferred Qualifications:
- Familiarity with big data frameworks such as Apache Spark and Apache Kafka.
- Experience supporting machine learning workflows with efficient data pipelines.
- Knowledge of bioinformatics tools and libraries related to protein data analysis.
- Understanding of regulatory requirements for research data (e.g., HIPAA, GxP).
Salary: $120,000 - $150,000