
Python, PySpark

Fusion Plus Solutions
Hyderabad · 4-7 LPA · Posted 17 Jul 2025
FULL TIME
GitHub
EC2
S3
PySpark
SQL
+1 more

Job Description

  • Develop and optimize data processing jobs using PySpark to handle complex data transformations and aggregations efficiently.
  • Design and implement robust data pipelines on the AWS platform, ensuring scalability and efficiency (Databricks exposure is an advantage).
  • Leverage AWS services such as EC2 and S3 for comprehensive data processing and storage solutions.
  • Expertly manage SQL database schema design, query optimization, and performance tuning to support data transformation and loading processes.
  • Design and maintain scalable and performant data warehouses, employing best practices in data modeling and ETL processes.
  • Utilize modern data platforms for collaborative data science, integrating seamlessly with various data sources and types.
  • Ensure high data quality and accessibility by maintaining optimal performance of Databricks clusters and Spark jobs.
  • Develop and implement security measures, backup procedures, and disaster recovery plans using AWS best practices.
  • Manage source code and automate deployment using GitHub along with CI/CD practices tailored for data operations in cloud environments.
  • Provide expertise in troubleshooting and optimizing PySpark scripts, Databricks notebooks, SQL queries, and Airflow DAGs.
  • Keep abreast of the latest developments in cloud data technologies and advocate for the adoption of new tools and practices that can benefit the team.
  • Use Apache Airflow to orchestrate and automate data workflows, ensuring timely and reliable execution of data jobs across various data sources and systems.
  • Collaborate closely with data scientists and business analysts to design data models and pipelines that support advanced analytics and machine learning projects.
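To give candidates a concrete sense of the PySpark transformation and aggregation work described above, here is a minimal sketch; the dataset, column names (region, amount), and app name are hypothetical, not from the posting.

```python
# Minimal PySpark aggregation sketch; data and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-agg").getOrCreate()

# Hypothetical orders data; in practice this would be read from S3.
orders = spark.createDataFrame(
    [("north", 120.0), ("south", 80.0), ("north", 50.0)],
    ["region", "amount"],
)

# Group and aggregate per region; more complex transformations
# (joins, window functions, repartitioning) use the same DataFrame API.
totals = (
    orders.groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
    .orderBy("region")
)
totals.show()
```

Performance tuning in this role would typically extend sketches like this with partitioning, caching, and broadcast-join hints on the same API.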
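The SQL schema design and query-tuning duties above can be illustrated with a small, self-contained sketch using Python's standard-library sqlite3; the events table, columns, and index name are hypothetical stand-ins for a production warehouse.

```python
# Illustrative schema + index tuning sketch using sqlite3 (stdlib);
# table and index names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO events (user_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# An index on the filter/join column is a typical first tuning step.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

# EXPLAIN QUERY PLAN confirms the query now searches via the index
# instead of scanning the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM events WHERE user_id = 1"
).fetchall()
total = conn.execute(
    "SELECT SUM(amount) FROM events WHERE user_id = 1"
).fetchone()[0]
```

The same workflow (inspect the plan, add or adjust indexes, re-check) carries over to warehouse engines, though their plan syntax differs.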
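The Airflow orchestration responsibility above can be sketched as a minimal DAG definition; the DAG id, schedule, and task callables are hypothetical, and a real pipeline would replace the placeholder functions with extract/load logic.

```python
# Minimal Airflow DAG sketch; dag_id, schedule, and callables are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull data from a source system (e.g. S3).
    pass

def load():
    # Placeholder: write transformed data to the warehouse.
    pass

with DAG(
    dag_id="daily_sales_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Declare ordering: extract must succeed before load runs.
    extract_task >> load_task
```

This is a declarative pipeline definition rather than a script to run directly; the Airflow scheduler parses it and executes the tasks on the stated schedule.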