
Data Engineering

Stack Digital
Bangalore · 3–7 LPA · Posted 19 Jun 2025
FULL TIME
Data Modeling
ETL
Apache Hive
Apache Hadoop
PySpark

Job Description

  • Design, develop, and optimize large-scale data processing pipelines using PySpark.
  • Utilize Apache tools and frameworks (e.g., Hadoop, Hive, HDFS) for data ingestion, transformation, and management.
  • Ensure high performance and reliability of ETL jobs in production environments.
  • Collaborate with Data Scientists, Analysts, and stakeholders to deliver robust data solutions.
  • Implement data quality checks and maintain data lineage for transparency and auditability.
  • Handle ingestion, transformation, and integration of structured and unstructured data sources.
  • Where applicable, leverage Apache NiFi for automated, repeatable data flow management.
  • Write clean, efficient, and maintainable code in Python and Java.
  • Contribute to architecture, performance tuning, and scalability strategies.
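
The data quality responsibilities above can be sketched as a simple validate-and-quarantine stage. This is an illustrative example only, not part of the posting: it uses plain Python rather than PySpark so it stays self-contained, and the column names (`user_id`, `amount`) and rules are assumptions.

```python
# Minimal sketch of a data quality check stage in an ETL pipeline.
# Column names and validation rules are illustrative assumptions.

def validate_record(record: dict) -> list[str]:
    """Return a list of data quality violations for one record."""
    errors = []
    if not record.get("user_id"):
        errors.append("missing user_id")
    amount = record.get("amount")
    if amount is None or amount < 0:
        errors.append("invalid amount")
    return errors

def run_quality_checks(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into clean rows and quarantined rows with reasons attached."""
    clean, quarantined = [], []
    for rec in records:
        errors = validate_record(rec)
        if errors:
            # Keep the bad row with its error list for auditability / lineage.
            quarantined.append({**rec, "_errors": errors})
        else:
            clean.append(rec)
    return clean, quarantined

if __name__ == "__main__":
    rows = [
        {"user_id": "u1", "amount": 10.0},
        {"user_id": "", "amount": 5.0},
        {"user_id": "u2", "amount": -3.0},
    ]
    clean, bad = run_quality_checks(rows)
    print(len(clean), len(bad))  # 1 clean row, 2 quarantined
```

In a PySpark job, the same pattern would typically be expressed with `DataFrame.filter` or `withColumn` over partitioned data; attaching the failure reasons to quarantined rows is what makes the pipeline auditable.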

Required Skills:

  • 5–7 years of experience in data engineering.
  • Strong hands-on experience with PySpark and distributed data processing.
  • Deep knowledge of Apache ecosystem: Hadoop, Hive, Spark, HDFS.
  • Solid understanding of ETL principles, data warehousing, and data modeling.
  • Experience with large-scale datasets and performance tuning.
  • Familiarity with SQL and NoSQL databases.
  • Proficiency in Python and intermediate knowledge of Java.
  • Experience with Git and CI/CD pipelines.

Nice-to-Have Skills:

  • Hands-on experience with Apache NiFi.
  • Real-time streaming pipeline development experience.
  • Exposure to cloud platforms like AWS, Azure, or GCP.
  • Familiarity with Docker or Kubernetes.

Soft Skills:

  • Strong analytical and problem-solving capabilities.
  • Excellent communication and collaboration skills.
  • Self-motivated with the ability to work both independently and in teams.