STStack Digital
Data Engineering
Kolkata ₹3–7 LPA Posted 19 Jun 2025
FULL TIME
Data Modeling
ETL
Apache Hive
Apache Hadoop
PySpark
+1 more
Job Description
- Design, develop, and optimize large-scale data processing pipelines using PySpark.
- Utilize Apache tools and frameworks (e.g., Hadoop, Hive, HDFS) for data ingestion, transformation, and management.
- Ensure high performance and reliability of ETL jobs in production environments.
- Collaborate with Data Scientists, Analysts, and stakeholders to deliver robust data solutions.
- Implement data quality checks and maintain data lineage for transparency and auditability.
- Handle ingestion, transformation, and integration of structured and unstructured data sources.
- (If applicable) Leverage Apache NiFi for automated, repeatable data flow management.
- Write clean, efficient, and maintainable code in Python and Java.
- Contribute to architecture, performance tuning, and scalability strategies.
Required Skills:
- 5–7 years of experience in data engineering.
- Strong hands-on experience with PySpark and distributed data processing.
- Deep knowledge of Apache ecosystem: Hadoop, Hive, Spark, HDFS.
- Solid understanding of ETL principles, data warehousing, and data modeling.
- Experience with large-scale datasets and performance tuning.
- Familiarity with SQL and NoSQL databases.
- Proficiency in Python and intermediate knowledge of Java.
- Experience with Git and CI/CD pipelines.
Nice-to-Have Skills:
- Hands-on experience with Apache NiFi.
- Real-time streaming pipeline development experience.
- Exposure to cloud platforms like AWS, Azure, or GCP.
- Familiarity with Docker or Kubernetes.
Soft Skills:
- Strong analytical and problem-solving capabilities.
- Excellent communication and collaboration skills.
- Self-motivated with the ability to work both independently and in teams.