Epsilon Data Management
Senior Associate Technology L1
Bangalore · ₹2-5 LPA · Posted 3 Jun 2025
FULL TIME
ETL
PySpark
Databricks
Python
Job Description
Key Responsibilities
As a PySpark Data Engineer, you will:
- Design and develop scalable PySpark data pipelines to ensure efficient processing of large datasets, enabling faster insights and business decision-making.
- Leverage Databricks notebooks for collaborative data engineering and analytics, improving team productivity and reducing development cycle times.
- Write clean, modular, and reusable Python code to support data transformation and enrichment, ensuring maintainability and reducing technical debt.
- Implement data quality checks and validation logic within ETL workflows to ensure trusted data is delivered for downstream analytics and reporting.
- Optimize Spark jobs for performance and cost-efficiency by tuning partitions, caching strategies, and cluster configurations, resulting in reduced compute costs.
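To make the pipeline shape described above concrete, here is a minimal, framework-free sketch of a transform-plus-quality-check stage. Plain Python dicts stand in for Spark DataFrame rows so the example stays self-contained; the names `clean_record`, `is_valid`, and `run_pipeline` are illustrative, not part of any real codebase.

```python
def clean_record(record):
    """Transformation step: normalise and enrich a raw record."""
    return {
        "id": record["id"],
        "amount": round(float(record["amount"]), 2),
        "country": record.get("country", "unknown").lower(),
    }

def is_valid(record):
    """Data quality check: reject rows that would pollute downstream reports."""
    return record["id"] is not None and record["amount"] >= 0

def run_pipeline(records):
    """Compose transform + validation; return (good_rows, rejected_count)."""
    cleaned = [clean_record(r) for r in records]
    good = [r for r in cleaned if is_valid(r)]
    return good, len(cleaned) - len(good)

raw = [
    {"id": 1, "amount": "19.991", "country": "IN"},
    {"id": 2, "amount": "-5.00"},  # negative amount fails the quality check
]
good, rejected = run_pipeline(raw)
```

In PySpark the same structure would typically appear as chained DataFrame transformations with a filter for the validation step, keeping each stage modular and testable.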
Qualifications: Your Skills & Experience
- Solid understanding of Python programming fundamentals, especially in building modular, efficient, and testable code for data processing.
- Familiarity with libraries like pandas, NumPy, and SQLAlchemy (for lightweight transformations or metadata management).
- Proficient in writing and optimizing PySpark code for large-scale distributed data processing.
- Deep knowledge of Spark internals, including partitioning, shuffling, lazy evaluation, and performance tuning.
- Comfortable using Databricks notebooks, clusters, and Delta Lake.
Set Yourself Apart With
- Familiarity with cloud-native services like AWS S3, EMR, Glue, Lambda, or Azure Data Factory.
- Experience deploying or integrating pipelines within a cloud environment, which adds flexibility and scalability.
- Experience with tools like Great Expectations or custom-built validation logic to ensure data trustworthiness.
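The custom-built validation logic mentioned above can be as simple as named predicates over columns with a pass/fail report. The sketch below is written in that spirit; it is NOT the Great Expectations API, just a minimal hand-rolled stand-in with hypothetical names (`expect_values_not_null`, `expect_values_between`, `validate`).

```python
def expect_values_not_null(rows, column):
    """Expectation: every row has a non-null value in `column`."""
    return all(r.get(column) is not None for r in rows)

def expect_values_between(rows, column, low, high):
    """Expectation: every value in `column` falls within [low, high]."""
    return all(low <= r[column] <= high for r in rows)

def validate(rows, rules):
    """Run each (name, check) pair and collect a pass/fail report."""
    return {name: check(rows) for name, check in rules}

orders = [
    {"order_id": "A1", "quantity": 3},
    {"order_id": "A2", "quantity": 120},  # out of the expected range
]

report = validate(orders, [
    ("order_id_not_null", lambda rows: expect_values_not_null(rows, "order_id")),
    ("quantity_in_range", lambda rows: expect_values_between(rows, "quantity", 1, 100)),
])
```

A real implementation would attach such a report to the ETL run (failing the job or quarantining bad rows), which is exactly the data-trustworthiness role these tools play.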