Useful magic commands in Jupyter Notebook/Lab

30 mins read Jupyter Notebook/Lab is now the go-to tool that data scientists and developers worldwide use for data analysis. It provides […]

Different approaches for finding feature importance using Random Forests

16 mins read In many (business) cases it is as important to have an interpretable model as an accurate one. Oftentimes, […]


18 mins read GROUP BY A table in a database has columns of information in it. Each column in a table represents an […]

Common loss functions for training deep neural networks with Keras examples

30 mins read Deep neural networks are trained using the stochastic gradient descent optimization algorithm. As part of the optimization algorithm, the error for […]

Handling skewness in features by applying transformation in Python

13 mins read In this tutorial, you will learn how to deal with your data when it does not follow a normal distribution. One […]

A tutorial on Apache Cassandra data modeling – RowKeys, Columns, Keyspaces, Tables, and Keys

24 mins read In this post, I will discuss the basic concepts of data modeling in Apache Cassandra. It is important to understand […]

Understanding Cassandra Partition Key, Composite Key, and Clustering Key

13 mins read 1. Overview Data distribution and data modeling in the Cassandra NoSQL database are different from those in a traditional relational […]

Out of Bag (OOB) score in Random Forests with example

12 mins read Introduction This post describes the intuition behind the Out of Bag (OOB) score in Random Forests, how it is calculated, […]

Understanding the Random Forest algorithm and its hyperparameters

17 mins read In this post, we will see how the Random Forest algorithm works internally. To truly appreciate it, it might be […]

Machine Learning From Scratch Series: K-means Clustering

22 mins read Introduction Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of […]

Connect to Cassandra Cluster with Dbeaver Community edition

2 mins read DataStax offers the JDBC driver from Magnitude (formerly Simba) to users at no cost, so you should be able to […]

Difference between discriminative and generative machine learning models

8 mins read Introduction In today’s world, machine learning has become one of the most popular and exciting fields of study that gives machines the ability […]

Feature selection for categorical data with Python code

17 mins read Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target […]

Basic feature engineering tasks for numeric and categorical data with Python code

34 mins read Machine learning pipelines Any intelligent system basically consists of an end-to-end pipeline starting from ingesting raw data and leveraging data […]

Understanding Expectation-Maximization (EM) algorithm

18 mins read The EM algorithm is often used in machine learning as an algorithm for data clustering. Sometimes, one of the clustering problems […]

A guide to different Cross-Validation methods in Machine Learning

19 mins read In machine learning (ML), generalization usually refers to the ability of an algorithm to be effective across various inputs. It […]

Understanding the Dummy Variable Trap with example

4 mins read Linear regression is a method we can use to quantify the relationship between one or more predictor variables and a response variable. […]

Interpreting ACF and PACF Plots for AR and MA models

12 mins read Autocorrelation analysis is an important step in the Exploratory Data Analysis of time series forecasting. The autocorrelation analysis helps detect patterns […]

Identifying order of Auto Regression and Moving Average processes using ACF and PACF Plots

5 mins read Selecting candidate Auto Regressive Moving Average (ARMA) models for time series analysis and forecasting, understanding Autocorrelation function (ACF), and Partial autocorrelation function (PACF) plots of the […]

Understanding Alternating Least Squares algorithm for implicit collaborative filtering recommendations

23 mins read Overview We’re going to write a simple implementation of an implicit (more on that below) recommendation algorithm. We want to […]