Job Description
Key Responsibilities:
- Design, implement, and optimize data pipelines for large-scale data processing.
- Develop and maintain ETL/ELT workflows using Spark, Hadoop, Hive, and Airflow.
- Collaborate with data scientists, analysts, and engineers to ensure data availability and quality.
- Write efficient and optimized SQL queries for data extraction, transformation, and analysis.
- Leverage PySpark and cloud tools (preferably Google Cloud Platform) to build reliable and scalable solutions.
- Monitor and troubleshoot data pipeline performance and reliability issues.
Required Skills:
- 4-6 years of experience in a Data Engineering role.
- Strong hands-on experience with PySpark and SQL.
- Good working knowledge of GCP or another major cloud platform (AWS, Azure).
- Experience with Hadoop, Hive, and distributed data systems.
- Proficiency in data orchestration tools such as Apache Airflow.
- Ability to work independently in a fast-paced, agile environment.
Good to Have:
- Experience with data modeling and data warehousing concepts.
- Exposure to DevOps and CI/CD practices for data pipelines.
- Familiarity with other programming/scripting languages (Python, shell scripting).
Educational Qualification:
- Bachelor's or Master's degree in Computer Science, Information Technology, Engineering, or a related field.