2022-05-26

Handling cyclical features, such as hours in a day, for machine learning pipelines with Python example

11 mins read What’s the difference between 23 and 1? If we’re talking about time, it’s 2. Hours of the day, days of […]
2022-05-26

Common mistakes to avoid as a Machine Learning Engineer

5 mins read In machine learning, there are many ways to build a product or solution and each way assumes something different. Many […]
2022-05-25

What are skip connections in deep learning?

17 mins read Nowadays, there is a seemingly endless number of applications for Deep Learning. However, in order to understand […]
2022-05-24

Performing A/B test in Python example – A case study from Udacity Data Scientist Nano Degree

11 mins read This is a simple walkthrough of an A/B test case study developed and used by Udacity. It is part of […]
2022-05-19

Understanding the basics of Bayesian Inference with Python Code

10 mins read Why did someone have to invent Bayesian inference? In one sentence: to update probabilities as we gather more data. The […]
2022-05-16

SQL Window Functions explained with example

32 mins read All database users know about regular aggregate functions which operate on an entire table and are used with a GROUP […]
2022-05-08

Encoding categorical features using the category_encoders package

11 mins read There are loads of different ways to convert categorical variables into numeric features so they can be used within machine […]
2022-05-04

Understand different feature scaling techniques with Python code

19 mins read In many machine learning algorithms, to put all features on the same footing, we need to scale them so that […]
2022-04-27

Understanding and interpreting Residuals Plot for linear regression

27 mins read Interpreting Residual Plots to Improve Your Regression: When you run a regression, calculating and plotting residuals help you understand and improve your […]
2022-04-24

Implementing Transformers step-by-step in PyTorch from scratch

14 mins read Doing away with clunky for-loops, the transformer instead finds a way to allow whole sentences to simultaneously enter the network […]
2022-04-10

Understanding ROC and Precision-Recall curves

25 mins read It can be more flexible to predict probabilities of an observation belonging to each class in a classification problem rather […]
2022-04-09

Delving into GPT-2 and GPT-3 Language Models

32 mins read This year, we saw a dazzling application of machine learning. The OpenAI GPT-2 exhibited an impressive ability to write coherent and passionate […]
2022-03-28

A tutorial on data science project experimentation with Jupyter, Papermill, and MLflow

7 mins read Your company (e.g., an e-commerce platform across several countries) is starting a new project on fraud detection. You begin by […]
2022-03-27

Interpreting coefficients of Dummy Variables in a Linear Regression Model

5 mins read Linear regression is a method we can use to quantify the relationship between one or more predictor variables and a response variable. […]
2022-03-26

Styling Pandas dataframes using Styler

7 mins read What is styling and why care? The basic idea behind styling is that a user will want to modify the way […]
2022-03-24

When to avoid using Random Forest Regression?

8 mins read In this article, we’ll look at a major problem with using Random Forest for regression: extrapolation. Random Forest Regression […]
2022-03-23

A comprehensive tutorial on Transformers Architecture

43 mins read We’ve been hearing a lot about Transformers and with good reason. They have taken the world of NLP by storm […]
2022-03-22

Categorical data type in Pandas

8 mins read You may have categorical data in your dataset. Categorical data is a type of data with two or more categories. If […]
2022-03-22

NumPy Broadcasting tutorial

13 mins read In operations between NumPy arrays (ndarray), the shapes are automatically converted to a common shape by broadcasting. This article describes the following […]
2022-03-22

PySpark equivalent methods for Pandas dataframes

8 mins read Pandas is the go-to library for every data scientist. It is essential for every person who wishes to manipulate data […]