VIVimerse Infotech
Data Engineer
Pune ₹3-12 LPA Posted 17 Jun 2025
FULL TIME
Spark
ETL
PySpark
SQL
AWS
+1 more
Job Description
- Design, develop, and maintain efficient and scalable data solutions using PySpark
- Ensure data quality and integrity by implementing robust testing, validation, and cleansing processes
- Integrate data from various sources, including databases, APIs, and external datasets
- Optimize and tune PySpark jobs for performance and reliability
- Document data engineering processes, workflows, and best practices
- Strong understanding of databases, data modeling, and ETL tools and processes
- Strong programming skills in Python and proficiency with PySpark and SQL
- Experience with relational databases, Spark, AWS, and Python
- Excellent communication and collaboration skills
Key Responsibilities:
- Design and Development: Create, develop, and maintain robust solutions using PySpark to handle large-scale data processing.
- Data Quality Assurance: Implement thorough testing, validation, and cleansing processes to ensure data quality and integrity.
- Data Integration: Integrate data from diverse sources including databases, APIs, and external datasets to create unified data solutions.
- Performance Optimization: Optimize and tune PySpark jobs for maximum performance and reliability.
- Documentation: Document data engineering processes, workflows, and best practices to enhance team collaboration and knowledge sharing.
- Database Management: Utilize strong understanding of databases, data modeling, and ETL processes to support data architecture needs.
- Programming Expertise: Leverage programming skills in Python and proficiency with SQL and PySpark for effective data manipulation and analysis.
- Collaboration: Work closely with cross-functional teams to understand data requirements and deliver solutions that meet business needs.