Databricks Data Engineer

RARR Technologies
Remote · Full Time · 5-10 LPA · Posted 14 May 2025
Performance Tuning
Data Modeling
Debugging
Query Optimization
Data Processing

Job Description

Key Responsibilities

  • Develop & Optimize ETL Pipelines: Build robust, scalable data pipelines using Azure Data Factory (ADF), Databricks, and Python, handling data ingestion, transformation, and loading efficiently.
  • Data Modeling & Semantic Layer Modeling: Design logical, physical, and semantic models for both structured and unstructured data.
  • SAP IS-Auto Integration: Extract, transform, and load data from SAP IS-Auto into Azure-based data platforms.
  • Database Management: Develop and optimize SQL queries, stored procedures, and indexing strategies to improve performance.
  • Big Data Processing: Use Azure Databricks, Apache Spark, and Delta Lake for distributed, large-scale data processing.
  • Data Quality & Governance: Implement data validation, lineage tracking, and security controls to ensure high data quality and compliance.
  • Cross-Functional Collaboration: Work with business analysts, data scientists, and DevOps teams to ensure data availability and usability.
  • Testing and Debugging: Write unit tests and debug issues to ensure robust, error-free data implementations; conduct performance optimization and security audits.
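As an illustration of the data-validation and unit-testing responsibilities above, here is a minimal sketch in plain Python. The record shape and the `validate_record` helper are hypothetical examples, not part of this role's actual codebase:

```python
from dataclasses import dataclass

@dataclass
class VehicleRecord:
    # Hypothetical record shape for an SAP IS-Auto-style vehicle feed
    vin: str
    dealer_id: str
    price: float

def validate_record(rec: VehicleRecord) -> list[str]:
    """Return a list of validation errors; an empty list means the record is clean."""
    errors = []
    if len(rec.vin) != 17:          # standard VIN length
        errors.append("vin must be 17 characters")
    if not rec.dealer_id:
        errors.append("dealer_id is required")
    if rec.price <= 0:
        errors.append("price must be positive")
    return errors

# A unit test in the spirit of the "Testing and Debugging" bullet:
def test_validate_record():
    good = VehicleRecord(vin="1HGCM82633A004352", dealer_id="D001", price=25000.0)
    bad = VehicleRecord(vin="SHORT", dealer_id="", price=-1.0)
    assert validate_record(good) == []
    assert len(validate_record(bad)) == 3
```

In a real pipeline, checks like these would run on each batch before loading, with failing records routed to a quarantine table for debugging.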

Required Skills and Qualifications

  • Azure Cloud Expertise: Strong experience with Azure Data Factory (ADF), Databricks, and Azure Synapse.
  • Programming Skills: Proficiency in Python for data processing, automation, and scripting.
  • SQL & Database Proficiency: Advanced knowledge of SQL, T-SQL, or PL/SQL for efficient data manipulation.
  • SAP IS-Auto Integration: Proven experience integrating SAP IS-Auto into data pipelines.
  • Data Modeling Expertise: Hands-on experience with dimensional modeling, semantic layer modeling, and ER modeling.
  • Big Data Frameworks: Strong knowledge of Apache Spark, Delta Lake, and distributed computing principles.
  • Performance Optimization: Expertise in query optimization, indexing, and performance tuning strategies.
  • Data Governance & Security: Understanding of RBAC, encryption methods, and data privacy standards.

Preferred Qualifications

  • Experience with CI/CD pipeline setup using Azure DevOps.
  • Familiarity with Kafka or Event Hub for real-time data streaming.
  • Knowledge of Power BI or Tableau for data visualization.