What are the responsibilities and job description for the High-Performance Data Pipeline Designer position at Elicit?
What You'll Do
- Build and Optimize the Academic Research Paper Pipeline
  - Architect and implement robust, scalable solutions to handle our growing data needs while maintaining high performance and data quality.
  - Efficiently process, deduplicate, and index hundreds of millions of research papers.
- Enhance Elicit's Data Infrastructure
  - Optimize our Spark jobs and data pipelines to handle large amounts of data efficiently.
  - Implement data partitioning strategies in our distributed systems to improve performance.
- Maintain and Improve Data Quality
  - Implement robust data quality management processes to ensure the accuracy and reliability of our academic database.
  - Develop defenses against unexpected changes from publishers to maintain data integrity.