Epsilon Data Management
Senior Associate Technology L1
Bangalore · ₹2-5 LPA · Posted 3 Jun 2025
FULL TIME
ETL
PySpark
Databricks
Python
Job Description
Key Responsibilities
As a PySpark Data Engineer, you will:
- Design and develop scalable PySpark data pipelines to ensure efficient processing of large datasets, enabling faster insights and business decision-making.
- Leverage Databricks notebooks for collaborative data engineering and analytics, improving team productivity and reducing development cycle times.
- Write clean, modular, and reusable Python code to support data transformation and enrichment, ensuring maintainability and reducing technical debt.
- Implement data quality checks and validation logic within ETL workflows to ensure trusted data is delivered for downstream analytics and reporting.
- Optimize Spark jobs for performance and cost-efficiency by tuning partitions, caching strategies, and cluster configurations, resulting in reduced compute costs.
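To make the pipeline shape described above concrete, here is a minimal, framework-free sketch of a transform-plus-quality-check stage. Plain Python dicts stand in for Spark DataFrame rows so the example stays self-contained; the names `clean_record`, `is_valid`, and `run_pipeline` are illustrative, not part of any real codebase.

```python
def clean_record(record):
    """Transformation step: normalise and enrich a raw record."""
    return {
        "id": record["id"],
        "amount": round(float(record["amount"]), 2),
        "country": record.get("country", "unknown").lower(),
    }

def is_valid(record):
    """Data quality check: reject rows that would pollute downstream reports."""
    return record["id"] is not None and record["amount"] >= 0

def run_pipeline(records):
    """Compose transform + validation; return (good_rows, rejected_count)."""
    cleaned = [clean_record(r) for r in records]
    good = [r for r in cleaned if is_valid(r)]
    return good, len(cleaned) - len(good)

raw = [
    {"id": 1, "amount": "19.991", "country": "IN"},
    {"id": 2, "amount": "-5.00"},  # negative amount fails the quality check
]
good, rejected = run_pipeline(raw)
```

In PySpark the same structure would typically appear as chained DataFrame transformations with a filter for the validation step, keeping each stage modular and testable.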
Qualifications: Your Skills & Experience
- Solid understanding of Python programming fundamentals, especially in building modular, efficient, and testable code for data processing.
- Familiarity with libraries like pandas, NumPy, and SQLAlchemy (for lightweight transformations or metadata management).
- Proficient in writing and optimizing PySpark code for large-scale distributed data processing.
- Deep knowledge of Spark internals, including partitioning, shuffling, lazy evaluation, and performance tuning.
- Comfortable using Databricks notebooks, clusters, and Delta Lake.
Set Yourself Apart With
- Familiarity with cloud-native services like AWS S3, EMR, Glue, Lambda, or Azure Data Factory.
- Experience deploying or integrating pipelines within a cloud environment, which adds flexibility and scalability.
- Experience with tools like Great Expectations or custom-built validation logic to ensure data trustworthiness.
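The custom-built validation logic mentioned above can be as simple as named predicates over columns with a pass/fail report. The sketch below is written in that spirit; it is NOT the Great Expectations API, just a minimal hand-rolled stand-in with hypothetical names (`expect_values_not_null`, `expect_values_between`, `validate`).

```python
def expect_values_not_null(rows, column):
    """Expectation: every row has a non-null value in `column`."""
    return all(r.get(column) is not None for r in rows)

def expect_values_between(rows, column, low, high):
    """Expectation: every value in `column` falls within [low, high]."""
    return all(low <= r[column] <= high for r in rows)

def validate(rows, rules):
    """Run each (name, check) pair and collect a pass/fail report."""
    return {name: check(rows) for name, check in rules}

orders = [
    {"order_id": "A1", "quantity": 3},
    {"order_id": "A2", "quantity": 120},  # out of the expected range
]

report = validate(orders, [
    ("order_id_not_null", lambda rows: expect_values_not_null(rows, "order_id")),
    ("quantity_in_range", lambda rows: expect_values_between(rows, "quantity", 1, 100)),
])
```

A real implementation would attach such a report to the ETL run (failing the job or quarantining bad rows), which is exactly the data-trustworthiness role these tools play.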