## Kernel Density Estimation

Before starting, let's get some background on estimators. They fall into two classes:

*1. Parametric*

*2. Non-Parametric*

Parametric estimators make assumptions about the population from which a sample of data is drawn. Often the assumption is that the population is normally distributed, i.e. bell-shaped. This assumption allows the development of a theory for drawing inferences about the population based on a sample taken from it.
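For example, a parametric fit of a normal distribution only has to estimate two numbers, the mean and the standard deviation. A minimal sketch with scipy (the sample below is just illustrative data):

```
import numpy as np
from scipy.stats import norm

# Illustrative sample; in practice this is your observed data
sample = np.random.normal(loc=0.0, scale=0.1, size=1000)

# Maximum-likelihood estimates of the two parameters
mu_hat, sigma_hat = norm.fit(sample)
print(mu_hat, sigma_hat)
```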

The other family of estimators is non-parametric: these estimators make no distributional assumptions, impose no fixed structure, and rely on all of the data points to reach an estimate. Kernel density estimators belong to this class.

*So why kernel density estimation? Let us see why histograms are just not sufficient.*

Histograms are not smooth, and they depend on both the width of the bins and the endpoints of the bins. This is the problem kernel density estimators alleviate.

##### Let's see how histograms are affected by bins

```
# Importing libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats.distributions import norm

# Draw 1000 samples from a normal distribution
mu, sigma = 0, 0.1  # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)

# Plot the same data with different bin counts
plt.figure(figsize=(10, 8), dpi=80, facecolor='w', edgecolor='k')
plt.hist(s, bins=10, label="10")
plt.hist(s, bins=50, label="50", color="green")
plt.hist(s, bins=300, label="300", color="orange")
plt.hist(s, bins=500, label="500", color="grey")
plt.legend()
plt.show()
```

Comparison of bin counts on a histogram

In the visualization above we see how the choice of bins changes the way the same normal sample looks.

*So how do we overcome this?*

To remove the dependence on the endpoints of the bins, kernel estimators center a kernel function at each data point: we place a kernel on every observation and sum them to get the density estimate. Just like in high school, we evaluate a function at a given point x,

y = f(x)

##### Kernel function

The kernel density estimate at a point x is built from kernels centred at the observations x₁, …, xₙ:

**f̂ₕ(x) = (1/nh) Σᵢ₌₁ⁿ K((x − xᵢ)/h)**

where h is the bandwidth and K is the kernel function.
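To make the formula concrete, here is a minimal from-scratch sketch with a Gaussian kernel (the helper names `gaussian_kernel` and `kde` are just illustrative):

```
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(u):
    # Standard Gaussian kernel K(u)
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x_grid, data, h):
    # f_hat(x) = (1/(n*h)) * sum_i K((x - x_i) / h)
    n = len(data)
    u = (x_grid[:, None] - data[None, :]) / h  # shape (grid, n)
    return gaussian_kernel(u).sum(axis=1) / (n * h)

data = np.random.normal(0, 0.1, 1000)
x_grid = np.linspace(-0.5, 0.5, 200)
plt.plot(x_grid, kde(x_grid, data, h=0.02))
plt.show()
```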

##### A kernel function typically has the following properties

Non-negative everywhere: **K(x) ≥ 0 ∀ x∈X**

Symmetric: **K(x) = K(−x) ∀ x∈X**

Decreasing: **K′(x) ≤ 0 ∀ x > 0**

Different Kernel Functions

```
# Note: kernel= needs seaborn < 0.11 (the old statsmodels backend);
# newer seaborn versions support only the Gaussian kernel
x = np.random.normal(0, 1, 1000)
sns.kdeplot(x, bw=.4, color="yellow", label="gaussian", kernel="gau")
sns.kdeplot(x, bw=.4, color="black", label="biw", kernel="biw")
sns.kdeplot(x, bw=.4, color="red", label="cos", kernel="cos")
sns.kdeplot(x, bw=.4, color="green", label="epa", kernel="epa")
sns.kdeplot(x, bw=.4, color="blue", label="tri", kernel="tri")
sns.kdeplot(x, bw=.4, color="purple", label="triw", kernel="triw")
plt.legend()
```

Different Kernels
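If your seaborn version no longer accepts `kernel=`, the kernel shapes themselves are easy to plot directly. A minimal numpy sketch, using the standard textbook definitions of these kernels:

```
import numpy as np
import matplotlib.pyplot as plt

u = np.linspace(-1.5, 1.5, 300)
kernels = {
    # Each kernel integrates to 1 and is zero outside |u| <= 1
    # (except the Gaussian, which has infinite support)
    "gaussian": np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi),
    "epanechnikov": np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0),
    "triangular": np.where(np.abs(u) <= 1, 1 - np.abs(u), 0),
    "cosine": np.where(np.abs(u) <= 1, (np.pi / 4) * np.cos(np.pi * u / 2), 0),
}
for name, k in kernels.items():
    plt.plot(u, k, label=name)
plt.legend()
plt.show()
```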

The quality of a kernel estimate depends less on the shape of K than on the value of its bandwidth h. It is important to choose an appropriate bandwidth, as a value that is too small or too large is not useful.

```
x = np.concatenate([norm(-1, 1.).rvs(400), norm(1, 0.3).rvs(100)])
# bw is the seaborn < 0.11 argument; newer versions use bw_method / bw_adjust
sns.kdeplot(x, bw=2, color="yellow", label="bw: 2")
sns.kdeplot(x, bw=1, color="red", label="bw: 1")
sns.kdeplot(x, bw=.5, color="blue", label="bw: 0.5")
sns.kdeplot(x, bw=.3, color="green", label="bw: 0.3")
sns.kdeplot(x, bw=.1, color="grey", label="bw: 0.1")
sns.kdeplot(x, bw=.05, color="black", label="bw: 0.05")
plt.legend();
```

Different bandwidths

The smoothing bandwidth h plays a key role in the quality of a KDE. In the example above, when h is too small (the grey and black curves) the density curve shows many wiggly structures; this is undersmoothing. On the other hand, when h is too large (the yellow curve), the two bumps are smoothed away. This is oversmoothing: important structure is obscured by the large amount of smoothing.

##### Bandwidth selection methods, univariate case

*Subjective choice*

The natural way to choose h is to plot several curves and pick the estimate that best matches one's prior (subjective) ideas. However, this method is not practical for high-dimensional data.

*Maximum likelihood cross-validation*

Choose the h that maximizes the cross-validated (e.g. leave-one-out) log-likelihood: each point is scored against a density estimated from the remaining points, which penalizes both under- and oversmoothing.
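A common way to do this in practice is a grid search over h that scores held-out log-likelihood. A minimal sketch with scikit-learn (the bandwidth grid is an arbitrary illustrative choice):

```
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV

x = np.concatenate([np.random.normal(-1, 1., 400),
                    np.random.normal(1, 0.3, 100)])

# KernelDensity.score returns the total log-likelihood, so
# GridSearchCV picks the bandwidth with the best CV likelihood
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.linspace(0.05, 1.0, 30)},
                    cv=5)
grid.fit(x[:, None])  # sklearn expects a 2-D array
print(grid.best_params_)
```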

*Reference to a standard distribution*

Assume the unknown density belongs to a standard family (usually the normal) and plug that assumption into the formula for the optimal bandwidth. For a Gaussian kernel this gives the classic rule of thumb h = 1.06 σ̂ n^(−1/5).
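A quick numpy sketch of this normal-reference rule (scipy's `gaussian_kde` ships a closely related variant as `'silverman'`):

```
import numpy as np
from scipy.stats import gaussian_kde

x = np.random.normal(0, 1, 500)
n = len(x)

# Normal reference rule of thumb: h = 1.06 * sigma * n^(-1/5)
h = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
print("rule-of-thumb h:", h)

# scipy's built-in (robust) variant of the same idea
kde = gaussian_kde(x, bw_method="silverman")
print("scipy silverman factor:", kde.factor)
```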

*Conclusion*

The idea of kernel density estimation is to give you a smooth, assumption-free picture of the distribution underlying your data; the choice of kernel matters little, while a well-chosen bandwidth matters a great deal.
