Phygital Insights
Spark (PySpark) Developer
Bangalore ₹3-15 LPA Posted 20 Aug 2025
FULL TIME
HDFS
Hive
Spark
Cassandra
S3
Job Description
- The developer must have sound knowledge of Apache Spark and Python programming.
- Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading it into target data destinations.
- Experience deploying and operationalizing code is an added advantage.
- Knowledge of and skills in DevOps, version control, and containerization; deployment knowledge is preferred.
- Create Spark jobs for data transformation and aggregation.
- Produce unit tests for Spark transformations and helper methods.
- Write Scaladoc-style documentation with all code
- Design data processing pipelines to perform batch and real-time/stream analytics on structured and unstructured data.
- Spark query tuning and performance optimization.
- Good understanding of different file formats (ORC, Parquet, Avro) and compression techniques to optimize queries and processing.
- SQL database integration (Microsoft, Oracle, Postgres, and/or MySQL)
- Experience working with HDFS, S3, Cassandra, and/or DynamoDB
- Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)
- Experience building scalable, high-performance cloud data lake solutions
- Hands-on expertise in cloud services such as AWS and/or Microsoft Azure.
Required Skills
- Hive
- Spark
- SQL