A tutorial on Apache Cassandra data modeling – RowKeys, Columns, Keyspaces, Tables, and Keys

24 mins read In this post, I will discuss the basic concepts of data modeling in Apache Cassandra. It is important to understand […]

Understanding Cassandra Partition Key, Composite Key, and Clustering Key

13 mins read Data distribution and data modeling in the Cassandra NoSQL database are different from those in a traditional relational […]

Out of Bag (OOB) score in Random Forests with example

12 mins read This post describes the intuition behind the Out of Bag (OOB) score in Random Forests, how it is calculated, […]

Understanding the Random Forest algorithm and its hyperparameters

17 mins read In this post, we will see how the Random Forest algorithm works internally. To truly appreciate it, it might be […]

Connect to a Cassandra Cluster with DBeaver Community Edition

2 mins read DataStax offers the JDBC driver from Magnitude (formerly Simba) to users at no cost, so you should be able to […]

Difference between discriminative and generative machine learning models

8 mins read In today’s world, machine learning has become one of the most popular and exciting fields of study that gives machines the ability […]

Feature Selection for categorical data with Python code

17 mins read Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target […]

Basic feature engineering tasks for numeric and categorical data with Python code

34 mins read Any intelligent system basically consists of an end-to-end pipeline starting from ingesting raw data and leveraging data […]

Understanding Expectation-Maximization (EM) algorithm

18 mins read The EM algorithm is often used in machine learning for data clustering. Sometimes, one of the clustering problems […]

A guide to different Cross-Validation methods in Machine Learning

19 mins read In machine learning (ML), generalization usually refers to the ability of an algorithm to be effective across various inputs. It […]

Understanding the Dummy Variable Trap with example

4 mins read Linear regression is a method we can use to quantify the relationship between one or more predictor variables and a response variable. […]

Understanding Alternating Least Squares algorithm for implicit collaborative filtering recommendations

23 mins read We’re going to write a simple implementation of an implicit (more on that below) recommendation algorithm. We want to […]

Understanding AdaBoost algorithm and its mathematics

14 mins read If you’re going through this tutorial, you’ve probably heard of XGBoost, LightGBM, or something of that sort before. These are […]

Theory of Generalization: growth function, dichotomies, and break points

15 mins read The size of our data set N plays a major role in the reliability of the generalization from E_in […]

Mathematical view of Bias-Variance trade-off

6 mins read The bias-variance trade-off is an important concept in statistics and machine learning. It is used to get better performance out […]