Run spark-submit for Apache Spark (PySpark) using Docker

3 mins read Pre-Requisites docker-compose file Below is a docker-compose file to set up a Spark cluster with 1 master and 2 worker […]

Machine Learning for Big Data using PySpark with real-world projects

10 mins read Introduction: I have prepared a GitHub repository that provides a set of self-study tutorials on Machine Learning for big data […]

A guide on PySpark Window Functions with Partition By

11 mins read When analyzing data within groups, PySpark window functions can be more useful than groupBy for examining relationships. First, a […]