Fusion Plus Solutions
Databricks PySpark
Hyderabad ₹4-12 LPA Posted 17 Jul 2025
FULL TIME
GitHub
Databricks
Job Description
- Develop and optimize data processing jobs using PySpark to handle complex data transformations and aggregations efficiently.
- Design and implement robust data pipelines on the AWS platform, ensuring scalability and efficiency (Databricks exposure is an advantage).
- Leverage AWS services such as EC2 and S3 for comprehensive data processing and storage solutions.
- Expertly manage SQL database schema design, query optimization, and performance tuning to support data transformation and loading processes.
- Design and maintain scalable and performant data warehouses, employing best practices in data modeling and ETL processes.
- Utilize modern data platforms for collaborative data science, integrating seamlessly with various data sources and types.
- Ensure high data quality and accessibility by maintaining optimal performance of Databricks clusters and Spark jobs.
- Develop and implement security measures, backup procedures, and disaster recovery plans using AWS best practices.
- Manage source code and automate deployments using GitHub and CI/CD practices tailored for data operations in cloud environments.
- Provide expertise in troubleshooting and optimizing PySpark scripts, Databricks notebooks, SQL queries, and Airflow DAGs.
- Keep abreast of the latest developments in cloud data technologies and advocate for the adoption of new tools and practices that can benefit the team.
- Use Apache Airflow to orchestrate and automate data workflows, ensuring timely and reliable execution of data jobs across various data sources and systems.
- Collaborate closely with data scientists and business analysts to design data models and pipelines that support advanced analytics and machine learning projects.