Data Engineer
Job Description
Job responsibilities:
o Data Architecture & ETL Development: Design & develop robust data architectures and scalable ETL solutions that deliver performant data pipelines & feature engineering. Write well-structured, efficient, and maintainable code.
o Data Analysis & Feature Engineering: Develop data quality checks, perform data analysis, and identify gaps for feature engineering. Develop or reuse data models, perform preprocessing and data mapping, and implement data pipelines.
o Technical Leadership: Provide technical guidance and mentorship to junior developers. Develop and maintain ETL routines using orchestration tools such as Airflow.
o Collaboration: Work closely with architects, platform owners, and cross-functional teams such as Business Analysts, Product Managers, and Data Scientists to deliver high-quality data pipelines. Collaborate with data scientists to understand data sources, then design & implement scalable, efficient data pipelines to integrate them.
o Documentation: Create and maintain data models, data mappings, and technical documentation, including design documents and code documentation.
Qualifications & Skills:
o Bachelor's or Master's degree in Computer Science, Data Science, or a related field, with 5+ years of proven experience as a Data Engineer in a cloud-based environment
o Proficiency in developing batch & streaming ETL pipelines using Python, SQL, and PySpark; experience with open-source table and file formats (e.g., Iceberg/Hudi/Delta, Parquet, Avro, ORC) and Spark optimisation
o Proven track record of leveraging or developing base frameworks and reusable programs and of defining standards, e.g. Kedro, Great Expectations, etc.
o Experience analysing, cleaning & transforming large volumes of raw data from various systems using Spark to provide ready-to-use data to data scientists and business analysts.
o Experience with cloud platforms, specifically AWS: S3, Lambda, EC2, EMR, RDS, AWS Glue, etc.
o Understanding of distributed systems such as Hadoop and of streaming technologies such as Kafka, Flink, and Spark Streaming
o Experience designing & developing data architectures following data lake paradigms
o Familiarity with Snowflake or other data warehousing platforms
o Experience with data mapping, data lineage, and data modelling, and an understanding of data governance and data security practices
o Proficiency in database design, SQL, and NoSQL databases such as Redis and DynamoDB.
o Good understanding of Agile methodologies and experience working with Scrum/SAFe practices
o Experience on advanced analytics projects & an understanding of AI/ML techniques
o Knowledge of software development best practices, including version control (Git) and continuous integration (CI/CD) processes.
o Strong problem-solving and debugging skills.
o Effective communication skills and the ability to work collaboratively with cross-functional teams.