What are the responsibilities and job description for the Data Engineer/DevOps Engineer Trainee position at Adbakx?
Job Title: Data Engineer & DevOps Engineer (1-3 Years’ Experience)
Location: Pleasanton, CA
Job Type: Unpaid internship, with conversion to a full-time role based on performance
Job Description: We are looking for a Data Engineer with DevOps expertise who has 1-3 years of experience in building scalable data pipelines, managing cloud infrastructure, and automating deployments. The ideal candidate should have experience with Spark, ETL tools, cloud platforms (AWS/GCP/Azure), Databricks, SQL, Python, Snowflake, REST API development & testing, Docker, Kubernetes, CI/CD pipelines, and monitoring tools like Grafana.
Key Responsibilities:
Data Engineering:
Design, develop, and maintain scalable ETL pipelines for processing large datasets (see the illustrative sketch after this list).
Work with Apache Spark (PySpark, Databricks) for big data processing.
Develop data pipelines for batch and real-time processing.
Manage and optimize data storage using Snowflake or other cloud-based data warehouses.
Work with SQL to transform and query structured and semi-structured data.
Implement data governance, quality checks, and security best practices.
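For context, here is a minimal sketch of the kind of batch ETL pipeline this role covers. It is illustrative only; the S3 paths, column names, and aggregation are hypothetical examples, not Adbakx's actual code.

```python
# Minimal PySpark batch ETL sketch: read raw events, apply a basic quality
# filter, aggregate daily counts, and write to a curated storage path.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Hypothetical raw input location and schema
raw = spark.read.json("s3://example-bucket/raw/events/2024-01-01/")

daily_counts = (
    raw.filter(F.col("event_type").isNotNull())          # basic quality check
       .withColumn("event_date", F.to_date("event_ts"))  # normalize timestamps
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Hypothetical curated output; a real pipeline might land in Snowflake instead
daily_counts.write.mode("overwrite").parquet(
    "s3://example-bucket/curated/daily_event_counts/"
)
```

A production pipeline would add schema enforcement, data-quality checks, and a warehouse sink (e.g., Snowflake) rather than a plain Parquet path.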
Backend Development & API Development:
Develop and maintain RESTful APIs in Python (Django) for data access and processing (see the sketch after this list).
Perform API testing and follow security best practices.
Optimize API performance for scalability and high availability.
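As an illustration of the API work, here is a minimal Django REST Framework view. The PipelineRunList view, URL path, and response fields are hypothetical, and the snippet assumes a standard Django project with rest_framework installed.

```python
# Minimal sketch of a read-only REST endpoint exposing pipeline-run metadata.
from django.urls import path
from rest_framework.views import APIView
from rest_framework.response import Response

class PipelineRunList(APIView):
    """Return a static list of pipeline runs; a real view would query a model."""
    def get(self, request):
        runs = [
            {"id": 1, "status": "succeeded", "rows_processed": 120000},
            {"id": 2, "status": "running", "rows_processed": 45000},
        ]
        return Response(runs)

urlpatterns = [
    path("api/pipeline-runs/", PipelineRunList.as_view()),
]
```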
DevOps & Cloud Engineering:
Deploy and manage containerized applications using Docker & Kubernetes.
Work with cloud platforms (AWS, GCP, Azure) to deploy scalable infrastructure.
Implement CI/CD pipelines using GitHub Actions, Jenkins, or GitLab CI/CD.
Automate infrastructure provisioning using Terraform or CloudFormation.
Manage cloud networking, IAM roles, and security policies.
Work on serverless computing (AWS Lambda, Google Cloud Functions, Azure Functions); see the sketch after this list.
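For the serverless item, here is a minimal sketch of an AWS Lambda handler in Python. The event shape (S3/SQS-style "Records") and the response fields are illustrative assumptions.

```python
import json

def lambda_handler(event, context):
    """Count incoming records and return a simple JSON response.

    A real function would transform or route each record, e.g. into a
    data pipeline or a warehouse staging area.
    """
    records = event.get("Records", [])
    return {
        "statusCode": 200,
        "body": json.dumps({"processed": len(records)}),
    }
```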
Monitoring & Performance Optimization:
Set up monitoring and logging tools like Grafana, Prometheus, ELK Stack, or CloudWatch (see the sketch after this list).
Optimize data pipelines for cost efficiency and performance.
Implement error handling, alerting, and observability for infrastructure and pipelines.
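As an example of pipeline observability, here is a minimal sketch using the prometheus_client Python library. The metric names and the simulated batch loop are hypothetical; in practice Prometheus would scrape these metrics and Grafana would chart them.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical ETL metrics exposed for Prometheus scraping
ROWS_PROCESSED = Counter("etl_rows_processed_total", "Rows processed by the ETL job")
BATCH_DURATION = Histogram("etl_batch_duration_seconds", "Duration of each ETL batch")

if __name__ == "__main__":
    start_http_server(8000)  # serves /metrics on port 8000
    while True:
        with BATCH_DURATION.time():            # observe batch duration
            time.sleep(random.uniform(0.1, 0.5))  # stand-in for real batch work
            ROWS_PROCESSED.inc(random.randint(100, 1000))
```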
Required Skills & Experience:
1-3 years of experience in data engineering and DevOps.
Strong knowledge of Python for data processing and backend API development.
Experience with Apache Spark (PySpark, Databricks).
Hands-on experience with SQL and data warehousing (Snowflake, Redshift, BigQuery).
Expertise in ETL tools and data pipeline orchestration.
Proficiency in Docker & Kubernetes for containerized deployments.
Experience with CI/CD pipelines using GitHub Actions, Jenkins, or GitLab CI/CD.
Hands-on experience with cloud platforms (AWS, GCP, Azure).
Understanding of Terraform, CloudFormation, or other IaC tools.
Knowledge of monitoring tools (Grafana, Prometheus, ELK Stack, CloudWatch).
Strong problem-solving and debugging skills.
Preferred Qualifications:
Experience with Kafka or other data streaming technologies.
Exposure to Airflow, Prefect, or other workflow orchestration tools.
Knowledge of security best practices for cloud and data engineering.
Familiarity with log management and distributed tracing tools.