Job Description
Key Responsibilities
- Design, develop, and optimize scalable data pipelines using PySpark.
- Build and manage data solutions on Cloudera Data Platform (CDE, CDW, Ozone, Airflow).
- Implement and manage data governance and security using Apache Ranger.
- Work with Hive Metastore and distributed data system architectures.
- Develop and manage data workflows in AWS environments (EMR, S3, MWAA).
- Implement metadata management and governance using Atlan.
- Design efficient data models using Iceberg and Parquet, applying effective partitioning and bucketing strategies.
- Manage AWS data services, including the Glue Data Catalog and Lake Formation (security, tagging, data sharing).
- Ensure high performance, reliability, and scalability of data solutions.
- Collaborate with cross-functional teams to support analytics and reporting requirements.