Audio source separation (vocal remover) system based on Deep Learning

12 mins read Table of Contents: Introduction Source Separation Problem Source Separation Use Cases Deep Model Architecture Architecture Training Output Signal Reconstruction Sample […]

Bulk Boto3 (bulkboto3): Python package for fast, parallel transfer of bulk files to S3 based on boto3!

5 mins read Table of Contents: Introduction About bulkboto3 Getting Started Prerequisites Installation Usage Contributing Conclusion Introduction “How to transfer a bulk of […]

Walk-forward optimization for algorithmic trading strategies on cloud architecture

11 mins read Table of Contents: Introduction Terminology Walk-forward Optimization Design of walk-forwards The Architecture Configuring cloud machines using Ansible Docker Swarm Optimization […]

Coursera Deep Learning Specialization Notes

2 mins read A couple of years ago, I completed the Deep Learning Specialization taught by AI pioneer Andrew Ng. I found this series […]

How to return pandas dataframes from Scikit-Learn transformations: New API simplifies data preprocessing

3 mins read Scikit-learn is a widely used Python library for machine learning. In fact, it is usually one of the first libraries we […]

Setting up Apache Airflow using Docker Compose

11 mins read Although I am pretty late to the party (Airflow became an Apache Top-Level Project in 2019), I still had trouble finding […]

Set up collaborative MLflow with PostgreSQL as Tracking Server and MinIO as Artifact Store using Docker containers

14 mins read In this post, I will show how to configure MLflow in a way that allows multiple data scientists using different […]

Understanding Gradient Boosting Regression through numerical examples and Python code

13 mins read Gradient boosting is a machine learning algorithm based on the ensemble technique called ‘boosting’. Like other boosting models, Gradient […]

Make Python faster using Numba

15 mins read As you may know, Python is an interpreted language. This means that Python code is not directly compiled to machine […]

A simple tutorial on Importance Sampling and Monte Carlo with Python code

16 mins read Introduction In this post, I’m going to explain importance sampling. Importance sampling is an approximation method instead of a […]

What is Reservoir Sampling in Stream Processing?

4 mins read Reservoir sampling is a fascinating algorithm that is especially useful when you have to deal with streaming data, which is […]

A comprehensive tutorial on MLflow for MLOps: From experimentation to production

39 mins read After reading this post, you will be able to: Understand how you and your data science teams can improve your […]

Understanding DenseNet architecture with PyTorch code

20 mins read DenseNet Architecture Introduction In a standard Convolutional Neural Network, we have an input image that is then passed through the network […]

Partial Dependence Plots with Python code

17 mins read What Are Partial Dependence Plots Some people complain that machine learning models are black boxes. These people will argue we cannot see how […]

Understanding Deep U-Nets for Semantic Segmentation: A salt identification case study with Keras

19 mins read Introduction Deep Learning has enabled the field of Computer Vision to advance rapidly in the last few years. In this […]

Understanding Transposed Convolution with Python example

25 mins read Transposed convolution is a revolutionary concept for applications like image segmentation, super-resolution, etc., but sometimes it becomes a little trickier […]

Understanding the basics of audio data with Python code

36 mins read Overview A huge amount of audio data is being generated every day in almost every organization. Audio data yields substantial […]

The default Random Forest feature importance is not reliable: Understanding Permutation Feature Importance

47 mins read The scikit-learn Random Forest feature importance and R’s default Random Forest feature importance strategies are biased. To get reliable results […]

Set up Apache Spark on a multi-node cluster

12 mins read This article covers the basic steps to install and configure Apache Spark 3.1.1 on a multi-node cluster, which includes installing Spark […]

Stratified K-fold Cross Validation for imbalanced classification tasks

10 mins read Model evaluation involves using the available dataset to fit a model and estimate its performance when making predictions on unseen […]

How to select classification threshold for imbalanced datasets

21 mins read Classification predictive modeling typically involves predicting a class label. Nevertheless, many machine learning algorithms are capable of predicting a probability […]

Predicting Customer Churn with Machine Learning: From EDA to Classification

27 mins read Table of Contents Introduction Objective Libraries Parameters and Variables Functions A Quick Look at our Data Creating a Test Set […]