2023-09-12

Run spark-submit for Apache Spark (PySpark) using Docker

3 mins read Pre-Requisites; docker-compose file: Below is a docker-compose file to set up a Spark cluster with 1 master and 2 worker […]
2023-01-11

Machine Learning for Big Data using PySpark with real-world projects

10 mins read Introduction: I have prepared a GitHub repository that provides a set of self-study tutorials on Machine Learning for big data […]
2022-02-17

A guide on PySpark Window Functions with Partition By

11 mins read When analyzing data within groups, PySpark window functions can be more useful than groupBy for examining relationships. First, a […]