What are the responsibilities and job description for the GCP Data Engineer position at Anagha Technosoft?
Note: Please refrain from submitting fake experience profiles.
Note: W2 contract only; no C2C and no full-time.
Job Title: GCP Data Engineer
Location: Dallas, TX
Job Type: W2 Contract
Experience Level: Senior (9 years)
Job Overview:
We are seeking a talented and motivated Data Engineer with strong expertise in SQL, Python, PySpark, and Google Cloud Platform (GCP) to join our data engineering team. In this role, you will design, build, and maintain scalable data pipelines and architectures, and collaborate closely with data scientists, analysts, and other engineers to ensure data is available, reliable, and optimized for analytics and decision-making.
Key Responsibilities:
- Data Pipeline Development: Design, develop, and maintain scalable ETL/ELT pipelines using Python, SQL, and PySpark to process large datasets efficiently (an illustrative sketch of this kind of pipeline appears after this list).
- Cloud Data Infrastructure: Implement and optimize data storage and processing solutions using Google Cloud Platform (GCP) services such as BigQuery, Dataflow, Cloud Storage, Pub/Sub, and Cloud Composer (Airflow).
- Data Integration: Integrate data from various sources including APIs, databases, and cloud platforms to create a unified data environment.
- Database Management: Manage and optimize relational and NoSQL databases, ensuring high performance, scalability, and security.
- Data Modeling: Create and maintain robust data models, including relational and star-schema designs, to support business reporting and analytics.
- Data Quality: Monitor and ensure data quality and integrity across all data pipelines and storage solutions.
- Performance Tuning: Optimize SQL queries and PySpark jobs for performance, ensuring efficient processing of large datasets.
- Collaboration: Work closely with data analysts, data scientists, and business stakeholders to gather requirements, design data solutions, and ensure alignment with business goals.
- Automation & Monitoring: Develop automated workflows and implement monitoring solutions to ensure data availability and reliability.
- Documentation: Maintain up-to-date documentation for data pipelines, models, and processes to ensure clear communication and transparency.
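For context, the kind of batch ETL/ELT work described above might look roughly like the following PySpark sketch. All resource names (bucket, dataset, table, job name) are hypothetical placeholders, and it assumes the spark-bigquery connector is available on the cluster; this is an illustration of the toolchain, not part of any actual project codebase.

```python
# Illustrative only: a minimal batch ETL sketch in PySpark on GCP.
# All resource names (bucket, dataset, table) are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-etl")  # hypothetical job name
    .getOrCreate()
)

# Extract: read raw CSV files landed in Cloud Storage (path is a placeholder).
raw = spark.read.option("header", True).csv("gs://example-landing/orders/2024-01-01/")

# Transform: basic cleanup and a simple daily aggregate per customer.
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("amount").cast("double") > 0)
       .groupBy(F.to_date("order_ts").alias("order_date"), "customer_id")
       .agg(F.sum(F.col("amount").cast("double")).alias("daily_spend"))
)

# Load: write the aggregate to BigQuery via the spark-bigquery connector
# (assumed to be installed on the cluster, e.g. Dataproc).
(
    orders.write.format("bigquery")
    .option("table", "example_project.analytics.daily_spend")  # placeholder table
    .option("temporaryGcsBucket", "example-temp-bucket")       # placeholder bucket
    .mode("overwrite")
    .save()
)
```

In practice, a job like this would typically be scheduled and monitored through Cloud Composer (Airflow), which ties into the automation and monitoring responsibilities listed above.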