VIVimerse Infotech
Data Engineer
Pune ₹3-12 LPA Posted 17 Jun 2025
FULL TIME
Spark
ETL
PySpark
SQL
AWS
+1 more
Job Description
- Design, develop, and maintain efficient and scalable data solutions using PySpark
- Ensure data quality and integrity by implementing robust testing, validation, and cleansing processes
- Integrate data from various sources, including databases, APIs, and external datasets
- Optimize and tune PySpark jobs for performance and reliability
- Document data engineering processes, workflows, and best practices
- Strong understanding of databases, data modeling, and ETL tools and processes
- Strong programming skills in Python and proficiency with PySpark and SQL
- Experience with relational databases, Spark, AWS, and Python
- Excellent communication and collaboration skills
Key Responsibilities:
- Design and Development: Create, develop, and maintain robust solutions using PySpark to handle large-scale data processing.
- Data Quality Assurance: Implement thorough testing, validation, and cleansing processes to ensure data quality and integrity.
- Data Integration: Integrate data from diverse sources including databases, APIs, and external datasets to create unified data solutions.
- Performance Optimization: Optimize and tune PySpark jobs for maximum performance and reliability.
- Documentation: Document data engineering processes, workflows, and best practices to enhance team collaboration and knowledge sharing.
- Database Management: Utilize strong understanding of databases, data modeling, and ETL processes to support data architecture needs.
- Programming Expertise: Leverage programming skills in Python and proficiency with SQL and PySpark for effective data manipulation and analysis.
- Collaboration: Work closely with cross-functional teams to understand data requirements and deliver solutions that meet business needs.