Job Description
- Design and architect enterprise-scale data platforms, integrating diverse data sources and tools.
- Develop real-time and batch data pipelines to support analytics and machine learning.
- Define and enforce data governance strategies to ensure security, integrity, and compliance.
- Optimize data pipelines for high performance, scalability, and cost efficiency in cloud environments.
- Implement solutions for real-time streaming data (Kafka, AWS Kinesis, Apache Flink).
- Adopt DevOps/DataOps best practices for deployment and monitoring.
Required Skills:
- Strong experience in designing scalable, distributed data systems.
- Programming skills in Python, Scala, or Java.
- Expertise in Apache Spark, Hadoop, Flink, Kafka, and cloud platforms (AWS, Azure, GCP).
- Proficiency in data modeling, governance, and warehousing (Snowflake, Redshift, BigQuery).
- Familiarity with security/compliance standards such as GDPR and HIPAA.
- Hands-on experience with CI/CD pipelines, infrastructure-as-code tools (Terraform, CloudFormation), and orchestration platforms (Airflow, Kubernetes).
- Experience monitoring and optimizing data infrastructure with tools such as Prometheus and Grafana.
Nice to Have:
- Experience with graph databases, real-time analytics, and IoT solutions.
- Integration of machine learning pipelines.
- Contributions to open-source data engineering communities.