Kernel Density Estimation


FEBRUARY 18, 2020

Kernel density estimation with Python


Before starting, let's get some background on estimators. They fall into two classes:

1. Parametric

2. Non-Parametric

Parametric estimators make assumptions about the population from which a sample of data is drawn. Often the assumption is that the population is normally distributed, i.e. bell-shaped. This assumption supports a theory that lets us draw inferences about the population from a sample taken from it.

The other family of estimators is non-parametric. These estimators make no distributional assumptions and impose no fixed structure; they depend on all the data points to reach an estimate. Kernel density estimators belong to this class.

So why kernel density estimation? Let us see why histograms are not sufficient.

Histograms are not smooth, and their shape depends on both the width of the bins and the endpoints of the bins. Kernel density estimators alleviate these problems.

Let's see how a histogram is affected by the choice of bins.

# Import libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats.distributions import norm

# Draw 1000 samples from a normal distribution
mu, sigma = 0, 0.1  # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)

# Plot the same sample with different numbers of bins
# (the bin counts are illustrative choices)
fig, axes = plt.subplots(2, 2, figsize=(10, 8), dpi=80, facecolor='w', edgecolor='k')
for ax, bins in zip(axes.flat, [5, 20, 50, 200]):
    ax.hist(s, bins=bins, density=True)
    ax.set_title(f"bins = {bins}")
plt.show()

Comparison of bins on Histogram

The visualization above shows how the choice of bins changes the apparent shape of the distribution, even though the underlying sample is the same.
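The dependence on bin endpoints can also be checked with a few numbers. The sketch below is only an illustration: the eight sample values, the bin width of 0.5, and the shift of 0.25 are assumed for the example and are not taken from the figure. It bins the same data twice with the same width but shifted edges.

```python
import numpy as np

# Illustrative sample (assumed values, chosen so the effect is easy to see)
data = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9, 1.1, 1.4])

# Histogram A: bins of width 0.5 starting at 0.0
edges_a = np.array([0.0, 0.5, 1.0, 1.5])
counts_a, _ = np.histogram(data, bins=edges_a)

# Histogram B: same width, but every edge shifted left by 0.25
edges_b = np.array([-0.25, 0.25, 0.75, 1.25, 1.75])
counts_b, _ = np.histogram(data, bins=edges_b)

print("edges starting at  0.00:", counts_a.tolist())  # [3, 3, 2]
print("edges starting at -0.25:", counts_b.tolist())  # [2, 3, 2, 1]
```

Nothing about the data changed between the two calls, yet the first histogram looks flat at the start while the second has a peak in its second bin.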

So how do we overcome this?

To remove the dependence on the endpoints of the bins, kernel estimators center a kernel function at each data point and add up the contributions to get the density estimate. Evaluating the estimate at a point x is then just like high school, where we get the value of a function at a given x:

y = f(x)

Kernel function

Kernel Density Estimate
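For reference, the standard textbook form of the estimate can be written as below. The notation is an assumption, since no symbols are defined in the text: x_1, ..., x_n are the sample points, K is the kernel function, and h > 0 is the bandwidth.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right)
```

Each data point contributes one kernel-shaped bump of width h, and the factor 1/(nh) makes the estimate integrate to 1 whenever K itself integrates to 1.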

A kernel function typically has the following properties:

Everywhere non-negative: K(x)≥0 ∀ x∈X

Symmetric : K(x) = K(-x) ∀ x∈X

Decreasing : K′(x) ≤ 0 ∀ x > 0

Different Kernel Functions


Different Kernels
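As a rough check, the properties listed above can be tested numerically for a few commonly used kernels. The four kernels below (Gaussian, Epanechnikov, triangular, uniform) are a choice made for this sketch and may not match the ones in the figure; the function names and the grid are also assumptions.

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def triangular(u):
    return np.where(np.abs(u) <= 1.0, 1.0 - np.abs(u), 0.0)

def uniform(u):
    return np.where(np.abs(u) <= 1.0, 0.5, 0.0)

kernels = {
    "gaussian": gaussian,
    "epanechnikov": epanechnikov,
    "triangular": triangular,
    "uniform": uniform,
}

u = np.linspace(-5.0, 5.0, 10001)  # evaluation grid
du = u[1] - u[0]

results = {}
for name, K in kernels.items():
    values = K(u)
    non_negative = bool(np.all(values >= 0.0))            # K(x) >= 0
    symmetric = bool(np.allclose(values, K(-u)))          # K(x) = K(-x)
    right_half = values[u > 0]
    non_increasing = bool(np.all(np.diff(right_half) <= 1e-12))  # K'(x) <= 0 for x > 0
    area = float(np.sum(values) * du)                     # Riemann sum, should be close to 1
    results[name] = (non_negative, symmetric, non_increasing, area)
    print(f"{name:13s} non-negative={non_negative} symmetric={symmetric} "
          f"non-increasing={non_increasing} area={area:.3f}")
```

The uniform kernel is flat on its support, so it only satisfies the "decreasing" property in the weak sense (it never increases for x > 0).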

The quality of a kernel estimate depends less on the shape of K than on the value of its bandwidth h. It is important to choose an appropriate bandwidth, because a value that is too small or too large gives an estimate that is not useful.

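# Mixture of two normals: 400 points (mean -1, std 1) and 100 points (mean 1, std 0.3)
x = np.concatenate([norm(-1, 1.).rvs(400), norm(1, 0.3).rvs(100)])
# Plot the KDE for several bandwidths
# (seaborn >= 0.11 takes bw_method in place of the deprecated bw argument)
sns.kdeplot(x=x, bw_method=2, color="yellow", label="bw: 2")
sns.kdeplot(x=x, bw_method=1, color="red", label="bw: 1")
sns.kdeplot(x=x, bw_method=.5, color="blue", label="bw: 0.5")
sns.kdeplot(x=x, bw_method=.3, color="green", label="bw: 0.3")
sns.kdeplot(x=x, bw_method=.1, color="purple", label="bw: 0.1")
sns.kdeplot(x=x, bw_method=.05, color="grey", label="bw: 0.05")
plt.legend()
plt.show()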

Different bandwidths

The smoothing bandwidth h plays a key role in the quality of the KDE. The example above applies different values of h to the same dataset. When h is too small (the gray curve), the density curve shows many wiggly structures; this is undersmoothing. On the other hand, when h is too large (the yellow curve), the two bumps are smoothed out. This situation is called oversmoothing: some important structures are obscured by the huge amount of smoothing.
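The same effect can be seen without any plotting by counting the bumps of the estimate. The sketch below is a from-scratch Gaussian KDE under stated assumptions: the helper names `gaussian_kde_1d` and `count_modes`, the random seed, and the grid are made up for this example, and h here is the kernel width in data units, which need not coincide with how seaborn interprets its bandwidth argument.

```python
import numpy as np

def gaussian_kde_1d(grid, data, h):
    """Evaluate a Gaussian-kernel density estimate on `grid` with bandwidth h."""
    u = (grid[:, None] - data[None, :]) / h           # scaled distances, shape (grid, data)
    kernel_values = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel_values.sum(axis=1) / (data.size * h)

def count_modes(density):
    """Count strict local maxima of a curve sampled on a grid."""
    interior = density[1:-1]
    return int(np.sum((interior > density[:-2]) & (interior > density[2:])))

# Same kind of two-bump mixture as in the example above (seed is an assumption)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-1.0, 1.0, 400), rng.normal(1.0, 0.3, 100)])

grid = np.linspace(-6.0, 6.0, 1201)
dx = grid[1] - grid[0]

modes = {}
areas = {}
for h in [2.0, 0.5, 0.05]:
    density = gaussian_kde_1d(grid, data, h)
    modes[h] = count_modes(density)
    areas[h] = float(np.sum(density) * dx)            # should stay close to 1
    print(f"h={h:<5} local maxima={modes[h]:<4} area on grid={areas[h]:.3f}")
```

A large h merges everything into a single bump, while a very small h produces a separate spike around almost every isolated data point.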

Bandwidth selection methods, univariate case

Subjective choice

The natural way to choose h is to plot several curves and pick the estimate that best matches one's prior (subjective) ideas. However, this method is not practical for high-dimensional data.

Maximum likelihood cross-validation
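One common way to carry this out is leave-one-out likelihood cross-validation: for each candidate h, every point is scored under a KDE built from all the other points, and the h with the highest average log-likelihood wins. The sketch below assumes a Gaussian kernel; the function name `loo_log_likelihood`, the seed, and the candidate grid are illustrative choices, not something prescribed by the method.

```python
import numpy as np

def loo_log_likelihood(data, h):
    """Mean leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    n = data.size
    u = (data[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(k, 0.0)                  # leave each point out of its own estimate
    loo_density = k.sum(axis=1) / ((n - 1) * h)
    with np.errstate(divide="ignore"):        # log(0) = -inf means h is far too small
        return float(np.mean(np.log(loo_density)))

# Two-bump mixture used as test data (seed is an assumption)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-1.0, 1.0, 400), rng.normal(1.0, 0.3, 100)])

# Candidate bandwidths on a log scale, from 0.01 to about 3.16
candidates = np.logspace(-2, 0.5, 40)
scores = np.array([loo_log_likelihood(data, h) for h in candidates])

best_h = float(candidates[np.argmax(scores)])
print(f"bandwidth with the highest leave-one-out log-likelihood: {best_h:.3f}")
```

Leaving each point out matters: without it, the likelihood keeps growing as h shrinks, because every point sits exactly on top of its own kernel.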

Reference to a standard distribution
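A widely used version of this idea assumes the data are roughly normal and derives h from the sample spread. The sketch below shows two such rules of thumb for a Gaussian kernel: the normal reference rule, h = 1.06 σ n^(-1/5), and Silverman's more robust variant, h = 0.9 min(σ, IQR/1.34) n^(-1/5). The function names and the sample data are assumptions made for the example.

```python
import numpy as np

def normal_reference_bandwidth(data):
    """Normal reference rule: h = 1.06 * sigma * n^(-1/5)."""
    n = data.size
    sigma = np.std(data, ddof=1)              # sample standard deviation
    return float(1.06 * sigma * n ** (-1 / 5))

def silverman_bandwidth(data):
    """Silverman's rule of thumb: h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = data.size
    sigma = np.std(data, ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    spread = min(sigma, (q75 - q25) / 1.34)   # the IQR guards against outliers
    return float(0.9 * spread * n ** (-1 / 5))

# Two-bump mixture used as test data (seed is an assumption)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-1.0, 1.0, 400), rng.normal(1.0, 0.3, 100)])

h_normal = normal_reference_bandwidth(data)
h_silverman = silverman_bandwidth(data)
print(f"normal reference rule: h = {h_normal:.3f}")
print(f"Silverman's rule:      h = {h_silverman:.3f}")
```

These rules are cheap to compute, but because they assume a single bell shape they tend to oversmooth data with more than one bump, like the mixture used here.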


The idea of kernel density estimators is to give you a smooth picture of how the data are distributed, without assuming a particular distribution in advance.


Ronak Chhatbar

AI Developer at
