2022-04-14

What are Anchors, Aliases, and Extensions in Docker Compose YAML Files?

8 mins read Docker Compose files are a great way to define multiple containers and services that work together as a stack. But, […]
2022-04-10

Understanding ROC and Precision-Recall curves

25 mins read It can be more flexible to predict probabilities of an observation belonging to each class in a classification problem rather […]
2022-04-09

Finding an optimized portfolio of machine learning models using Sklearn, LazyPredict, and Precise Packages

10 mins read In this post, I will provide an example of the use of the precise Python package (and PyPortfolioOpt) to create a diversified portfolio of […]
2022-04-09

Delving into GPT-2 and GPT-3 Language Models

32 mins read This year, we saw a dazzling application of machine learning. OpenAI's GPT-2 exhibited an impressive ability to write coherent and passionate […]
2022-04-08

Deploy Standalone (Single Node) MinIO server using Docker Compose on Linux

4 mins read MinIO is a high-performance object storage solution with native support for Kubernetes deployments that provides an Amazon Web Services S3-compatible […]
2022-03-30

Understanding Discrete Fourier Transformation with mathematics and Python codes

16 mins read The Fourier Transformation is applied in engineering to determine the dominant frequencies in a vibration signal. When the dominant […]
2022-03-28

Bulk Boto3 (bulkboto3): Python package for fast and parallel transferring a bulk of files to S3 based on boto3!

5 mins read “How to transfer a bulk of […]
2022-03-28

A tutorial on data science project experimentation with Jupyter, Papermill, and MLflow

7 mins read Your company (e.g., an e-commerce platform across several countries) is starting a new project on fraud detection. You begin by […]
2022-03-27

Interpreting coefficients of Dummy Variables in a Linear Regression Model

5 mins read Linear regression is a method we can use to quantify the relationship between one or more predictor variables and a response variable. […]
2022-03-26

Styling Pandas dataframes using Styler

7 mins read What is styling and why care? The basic idea behind styling is that a user will want to modify the way […]
2022-03-26

What are Data drift and Concept drift in machine learning pipelines in production?

18 mins read No model lasts forever. Even if the data quality is fine, the model itself can start degrading. What does this […]
2022-03-25

Different Python package import patterns using __init__.py file

10 mins read I have had a few conversations lately about Python packaging, particularly around structuring the import statements to access the various modules of […]
2022-03-24

Feature Importance calculation using Random Forest

5 mins read The feature importance (variable importance) describes which features are relevant. It can help with a better understanding of the solved […]
2022-03-24

When to avoid using Random Forest Regression?

8 mins read In this article, we’ll look at a major problem with using Random Forest for regression: extrapolation. Random Forest Regression […]
2022-03-24

Mel Spectrogram Explained with Python Code

6 mins read A signal is a variation in a certain quantity over time. For audio, the quantity that varies is air pressure. How […]
2022-03-23

A comprehensive tutorial on Transformers Architecture

43 mins read We’ve been hearing a lot about Transformers and with good reason. They have taken the world of NLP by storm […]
2022-03-22

Categorical data type in Pandas

8 mins read You may have categorical data in your dataset. Categorical data is a type of data with two or more categories. If […]
2022-03-22

NumPy Broadcasting tutorial

13 mins read In operations between NumPy arrays (ndarray), the arrays' shapes are automatically made compatible by broadcasting. This article describes the following […]
2022-03-22

PySpark equivalent methods for Pandas dataframes

8 mins read Pandas is the go-to library for every data scientist. It is essential for every person who wishes to manipulate data […]
2022-03-17

Methods for sampling from complex distributions

8 mins read This writeup includes descriptions from a recent paper on algorithmic sampling, to describe in simpler terms the motivation and approach for […]