Job description

As a Data Engineer, you are required to:

Design, build, and maintain data pipelines that efficiently process and transport data from various sources to storage systems or processing environments while ensuring data integrity, consistency, and accuracy across the entire data pipeline.
Integrate data from different systems, often involving data cleaning, transformation (ETL), and validation.
Design the structure of databases and data storage systems, including schemas, tables, and relationships between datasets, to enable efficient querying.
Work closely with data scientists, analysts, and other stakeholders to understand their data needs and ensure that the data is structured in a way that makes it accessible and usable.
Stay up to date with the latest trends and technologies in the data engineering space, such as new data storage solutions, processing frameworks, and cloud technologies. Evaluate and implement new tools to improve data engineering processes.
Qualification: Bachelor's or Master's in Computer Science & Engineering, or equivalent. A professional degree in Data Science or Engineering is desirable.

Experience level: 3 to 5 years of hands-on experience in Data Engineering

Desired Knowledge & Experience

Spark: Spark 3.x, RDD/DataFrames/SQL, Batch/Structured Streaming
Spark internals: Catalyst/Tungsten/Photon
Databricks: Workflows, SQL Warehouses/Endpoints, DLT, Pipelines, Unity Catalog, Auto Loader
IDE: IntelliJ/PyCharm, Git, Azure DevOps, GitHub Copilot
Test: pytest, Great Expectations
CI/CD: YAML Azure Pipelines, Continuous Delivery, Acceptance Testing
Big Data Design: Lakehouse/Medallion Architecture, Parquet/Delta, Partitioning, Distribution, Data Skew, Compaction
Languages: Python/Functional Programming (FP)
SQL: T-SQL/Spark SQL/HiveQL
Storage: Data Lake and Big Data Storage Design
Additionally, it is helpful to know the basics of:
Data Pipelines: ADF/Synapse Pipelines/Oozie/Airflow
Languages: Scala, Java
NoSQL: Cosmos DB, MongoDB, Cassandra
Cubes: SSAS (ROLAP, HOLAP, MOLAP), AAS, Tabular Model
SQL Server: T-SQL, Stored Procedures
Hadoop: HDInsight/MapReduce/HDFS/YARN/Oozie/Hive/HBase/Ambari/Ranger/Atlas/Kafka
Data Catalog: Azure Purview, Apache Atlas, Informatica

Required Soft skills & Other Capabilities

Data Engineer: Python, PySpark, SQL, Spark Architecture, Azure Databricks