# Bayesian Networks – Probabilistic Neural Network (PNN)

## Introduction

Bayesian networks, or Bayesian statistics more broadly, form an integral part of many statistical learning approaches. Bayesian inference uses new evidence to update the prior probability of an event: conditional probabilities refine the prior probabilities, which yields posterior probabilities. In simple terms, suppose you want to estimate the probability that your friends will agree to play a game of badminton given certain weather conditions. Bayesian inference is likewise an integral part of Bayesian networks, where it serves as a tool for modeling uncertain beliefs. In this article, we explore one application of Bayesian networks, the Probabilistic Neural Network (PNN), and study its implementation in depth through a practical example.
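As a minimal sketch of that badminton update, assume (hypothetically) a prior probability of 0.6 that your friends agree to play, and that sunny weather is observed with probability 0.8 on days they play and 0.3 on days they do not. Bayes' rule combines these into the posterior probability of play given sunny weather. All numbers are invented for illustration.

```python
# Bayes' rule: P(play | sunny) = P(sunny | play) * P(play) / P(sunny)
# All probabilities below are hypothetical, chosen only for illustration.
p_play = 0.6                 # prior: friends agree to play
p_sunny_given_play = 0.8     # likelihood of sunny weather when they play
p_sunny_given_no_play = 0.3  # likelihood of sunny weather when they do not

# Evidence P(sunny) via the law of total probability
p_sunny = p_sunny_given_play * p_play + p_sunny_given_no_play * (1 - p_play)

# Posterior: the prior updated by the new evidence (sunny weather)
p_play_given_sunny = p_sunny_given_play * p_play / p_sunny
print(f"P(play | sunny) = {p_play_given_sunny:.3f}")
```

The posterior (0.8) is higher than the prior (0.6) because sunny weather is more likely on days when the friends play.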

**Learning Objectives**

- Understand PNN and its related concepts
- Learn the concepts of the Parzen window, or KDE (kernel density estimation)
- Use kernel functions as a non-parametric method to estimate a given data distribution, through an example
- Implement a PNN in Python for classification tasks

This article was published as a part of the Data Science Blogathon.

## What Is a Bayesian Network?

A Bayesian network operates on Bayes' theorem and provides a simple way of applying the theorem to complex problems. Unlike methodologies in which probabilities are determined purely from historical data, this approach treats probability as a degree of belief in an outcome.

The probability distributions of the random variables (nodes) and the connections between them (edges) are both specified subjectively. Although this does not make the model strictly Bayesian by definition, the model can be considered to capture a "belief" about a complex domain.

In contrast to the frequentist approach, where probabilities depend solely on past occurrences of the event, Bayesian probability is the study of subjective probabilities, or degrees of belief in an outcome.

A Bayesian network captures the joint probabilities of the events the model represents.
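As a minimal sketch of what capturing the joint probabilities means, consider a hypothetical three-node network in which Weather and Wind are both parents of Play. The network stores only local conditional probability tables, and the joint distribution is their product, P(Weather, Wind, Play) = P(Weather) P(Wind) P(Play | Weather, Wind). The network structure and every number below are invented for illustration.

```python
from itertools import product

# Hypothetical conditional probability tables for a tiny Bayesian network:
# Weather -> Play <- Wind
p_weather = {"sunny": 0.7, "rainy": 0.3}
p_wind = {"calm": 0.6, "windy": 0.4}
# P(Play = yes | Weather, Wind); P(Play = no | ...) is the complement
p_play_yes = {
    ("sunny", "calm"): 0.9,
    ("sunny", "windy"): 0.5,
    ("rainy", "calm"): 0.2,
    ("rainy", "windy"): 0.05,
}

def joint(weather, wind, play):
    """Joint probability from the factorisation P(W) * P(Wi) * P(Play | W, Wi)."""
    p_yes = p_play_yes[(weather, wind)]
    p_play = p_yes if play == "yes" else 1.0 - p_yes
    return p_weather[weather] * p_wind[wind] * p_play

# The factorised joint distribution sums to 1 over all 2 x 2 x 2 outcomes
total = sum(joint(w, wi, p) for w, wi, p in product(p_weather, p_wind, ["yes", "no"]))

# Any marginal can be read off the joint, e.g. P(Play = yes)
p_play_marginal = sum(joint(w, wi, "yes") for w, wi in product(p_weather, p_wind))
print(f"sum of joint = {total:.3f}, P(play = yes) = {p_play_marginal:.3f}")
```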

## What Is a Probabilistic Neural Network (PNN)?

A Probabilistic Neural Network (PNN) is a type of feed-forward ANN in which computationally intensive backpropagation is not used. It is a classifier that can estimate the probability density function (PDF) of a given set of data. PNNs are an alternative to traditional backpropagation neural networks in classification and pattern recognition applications, and they train much faster because no iterative weight updates are needed. When used to solve classification problems, these networks use probability theory to reduce the number of incorrect classifications.

Source: Paper by Specht (1990)

A PNN builds an ANN from probability-theory methods such as Bayesian classification and other PDF estimators. The application of kernel functions to discriminant analysis and pattern recognition gave rise to the widespread use of PNNs.

## Concepts of Probabilistic Neural Networks (PNN)

A widely accepted criterion for decision rules or strategies used to classify patterns is that they minimize the "expected risk." Such strategies are called "Bayes strategies" and can be applied to problems with any number of categories/classes.
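Specht (1990) states the two-class Bayes decision rule as: assign x to class A if h_A l_A f_A(x) > h_B l_B f_B(x), where h is the prior probability of a class, l is the loss incurred when a sample of that class is misclassified, and f is the class PDF. The sketch below applies the same rule to any number of classes by picking the largest product; the density values are made-up numbers, not estimates from data.

```python
# Bayes decision rule in the form used by Specht (1990) for PNNs:
# assign x to the class k that maximises h_k * l_k * f_k(x), where
#   h_k    = prior probability of class k
#   l_k    = loss incurred when a sample of class k is misclassified
#   f_k(x) = class-conditional probability density at x
# The numbers below are hypothetical.

def bayes_decision(priors, losses, densities):
    """Return the index of the class with the largest h_k * l_k * f_k(x)."""
    scores = [h * l * f for h, l, f in zip(priors, losses, densities)]
    return max(range(len(scores)), key=scores.__getitem__)

densities = [0.30, 0.20]  # f_A(x), f_B(x) at some test point x
priors = [0.5, 0.5]

# With equal losses, the class with the larger prior-weighted density wins
print(bayes_decision(priors, [1.0, 1.0], densities))  # → 0
# If misclassifying class B is three times as costly, the decision flips
print(bayes_decision(priors, [1.0, 3.0], densities))  # → 1
```

Raising the loss for one class shifts the decision boundary toward the other class, which is how a Bayes strategy trades off different kinds of errors.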

In the PNN method, the parent probability density function (PDF) of each class is approximated by a Parzen window, a non-parametric function. Bayes' rule is then applied to assign new input data to the class with the highest posterior probability. In other words, the PDF of each class is used to estimate the class probability of the new input, which reduces the probability of misclassification.

Replacing the sigmoid activation function commonly used in neural networks with an exponential function yields a probabilistic neural network (PNN) that can compute nonlinear decision boundaries approaching the Bayes optimum.
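In Specht's formulation the input and weight vectors are normalized to unit length, the pattern unit computes the dot product Z = x · w, and the exponential activation is exp((Z - 1) / σ²). Under that unit-length assumption this is exactly a Gaussian kernel of the Euclidean distance, because ‖x - w‖² = 2 - 2Z. A quick numerical check with random vectors (the seed, dimension, and σ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5

# Random input x and stored pattern (weight vector) w, both normalised to unit length
x = rng.normal(size=4)
w = rng.normal(size=4)
x /= np.linalg.norm(x)
w /= np.linalg.norm(w)

# Exponential activation applied to the dot product Z = x . w (replaces the sigmoid)
z = x @ w
exp_activation = np.exp((z - 1.0) / sigma**2)

# Gaussian kernel of the Euclidean distance between x and w
gaussian_kernel = np.exp(-np.sum((x - w) ** 2) / (2.0 * sigma**2))

# For unit vectors ||x - w||^2 = 2 - 2Z, so the two values agree
print(exp_activation, gaussian_kernel)
```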

#### Parzen Window

The Parzen–Rosenblatt window method, also known as the Parzen-window method, is a popular non-parametric approach for estimating the value of a probability density function p(x) at a specific point x from n sample points x_1, ..., x_n. It does not require any prior knowledge of, or assumptions about, the underlying distribution. This process is also known as kernel density estimation.
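In its classic form the Parzen window is a hypercube of side h centered on x: the estimate counts the k samples that fall inside the window and divides by n times the window volume h^d, giving p(x) ≈ k / (n h^d). A minimal NumPy sketch, assuming a small hypothetical 2-D sample:

```python
import numpy as np

def parzen_hypercube(x, samples, h):
    """Parzen-window density estimate at x using a hypercube window of side h.

    A sample x_i is counted when every coordinate of (x - x_i) / h lies in [-1/2, 1/2].
    """
    samples = np.atleast_2d(samples)
    n, d = samples.shape
    inside = np.all(np.abs((x - samples) / h) <= 0.5, axis=1)
    k = np.count_nonzero(inside)
    return k / (n * h**d)

# Hypothetical 2-D sample (5 points)
samples = np.array([[0.0, 0.0], [0.2, 0.1], [0.4, 0.4], [1.5, 1.5], [2.0, 2.0]])
density = parzen_hypercube(np.array([0.1, 0.1]), samples, h=1.0)
print(density)  # 3 of the 5 points fall in the unit square around x → 0.6
```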

A prominent application of the Parzen-window technique is estimating the class-conditional densities ("likelihoods") p(x|w_i) in classification from the training dataset, where x is a multi-dimensional sample that belongs to a particular class w_i.
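Applied to classification, one Parzen estimate is built per class from that class's training samples, and Bayes' rule turns the class-conditional densities into posteriors, P(w_i | x) ∝ p(x | w_i) P(w_i). A minimal 1-D sketch, assuming a Gaussian window, hypothetical data, and priors taken from the class frequencies:

```python
import numpy as np

def gaussian_parzen(x, samples, h):
    """1-D Parzen estimate of p(x) with a Gaussian window of width h."""
    u = (x - np.asarray(samples)) / h
    return np.mean(np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h))

# Hypothetical 1-D training data for two classes
train = {
    "w1": [1.0, 1.2, 0.8, 1.1],
    "w2": [3.0, 2.7, 3.3],
}
n_total = sum(len(v) for v in train.values())

x, h = 1.5, 0.5
# Class-conditional likelihoods p(x | w_i) and priors P(w_i) from class frequencies
likelihoods = {c: gaussian_parzen(x, s, h) for c, s in train.items()}
priors = {c: len(s) / n_total for c, s in train.items()}

# Posteriors P(w_i | x) via Bayes' rule, normalised over the classes
evidence = sum(likelihoods[c] * priors[c] for c in train)
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in train}
print(posteriors)
```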

For a detailed description of Parzen windows, refer to this link.

#### Understanding Kernel Density Estimation

Kernel density estimation (KDE) is similar to a histogram, except that instead of counting observations in bins we sum a Gaussian bell centered on every data point. More generally, a KDE is a sum of parametric distributions (kernels), one produced by each observation given some parameters such as the bandwidth. The curve gives the estimated probability density of the data taking a particular value on the x-axis of the KDE plot, and the total area under the KDE curve is 1. Let us understand this using an example.
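A minimal NumPy sketch of that sum of Gaussian bells, using a handful of hypothetical values (not the actual Iris column) and a hand-picked bandwidth; a numerical integral over a wide grid checks that the area under the estimate is about 1:

```python
import numpy as np

def kde_gaussian(grid, values, bandwidth):
    """Gaussian KDE: average of normal bells of width `bandwidth`, one per observation."""
    u = (grid[:, None] - values[None, :]) / bandwidth
    bells = np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * bandwidth)
    return bells.mean(axis=1)

# Hypothetical measurements in cm (illustrative only, not the Iris data)
values = np.array([4.9, 5.0, 5.4, 5.8, 6.1, 6.3, 6.7, 7.2])
grid = np.linspace(2.0, 10.0, 2001)
density = kde_gaussian(grid, values, bandwidth=0.4)

# Approximate the area under the curve with a simple Riemann sum
area = density.sum() * (grid[1] - grid[0])
print(f"area under the KDE = {area:.4f}")
```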

Now we will look at the distribution of the "sepal length" feature of the Iris dataset and its corresponding KDE.

Now, using the kernel functions (uniform, triangular, Gaussian, Laplacian, and Epanechnikov, as defined in Step 3 of the code below), we will build kernel density estimates of sepal length for different values of the smoothing parameter (bandwidth).

As we can see, the triangular, Gaussian, and Epanechnikov kernels give better approximations at bandwidth values of 0.8 and 1.0. As the bandwidth increases, the curve becomes smoother and flatter; as it decreases, the curve becomes more jagged and sharp-edged. Thus, the bandwidth in a PNN plays a role similar to the k value in KNN.

#### KNN and Parzen Windows

Parzen windows can be considered a generalization of the k-nearest neighbors (KNN) technique. Rather than choosing the k nearest neighbors of a test point and labeling it with the weighted majority vote of those neighbors, one can include all observations in the voting scheme and assign their weights using the kernel function.

In Parzen-window estimation, the length of the interval is fixed, but the number of samples that fall inside it varies from one test point to another. For the k-nearest-neighbor density estimate, the opposite is true: the number of samples k is fixed, and the interval grows or shrinks until it contains k samples.
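The contrast can be sketched in one dimension. Both estimates use p(x) ≈ k / (n V), with V the interval length: the Parzen version fixes the window length h and counts the samples k inside it, while the kNN version fixes k and sets V = 2 r_k, where r_k is the distance to the k-th nearest sample. The sample values are hypothetical.

```python
import numpy as np

def parzen_1d(x, samples, h):
    """Fixed window length h; the count k of samples inside varies with x."""
    k = np.count_nonzero(np.abs(samples - x) <= h / 2)
    return k / (len(samples) * h)

def knn_1d(x, samples, k):
    """Fixed k; the interval length V = 2 * r_k (r_k = distance to k-th neighbour) varies with x."""
    r_k = np.sort(np.abs(samples - x))[k - 1]
    return k / (len(samples) * 2 * r_k)

# Hypothetical 1-D sample
samples = np.array([1.0, 1.5, 2.0, 2.2, 2.4, 3.0, 5.0, 8.0])

for x in (2.0, 6.0):
    print(f"x={x}: Parzen (h=1) = {parzen_1d(x, samples, 1.0):.3f}, "
          f"kNN (k=3) = {knn_1d(x, samples, 3):.3f}")
```

At x = 6 the fixed Parzen window happens to be empty and returns 0, while the kNN interval simply widens until it captures three samples.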

## Architecture of PNN

The image below describes the architecture of a PNN, which consists of four layers:

- Input Layer
- Pattern Layer
- Summation Layer
- Output Layer

Let us now go through each layer one by one.

#### Input Layer

In this layer, each feature variable (predictor) of the input sample is represented by one neuron. For example, if a sample has four predictors, the input layer has four neurons. If a predictor is a categorical variable with N categories, we convert it into N-1 dummy variables and use N-1 neurons. We also normalize the data using suitable scalers. The input neurons then send their values to each neuron in the next layer, the hidden pattern layer.
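A minimal sketch of this input preparation with pandas, assuming a hypothetical table with two numeric predictors and one categorical predictor with three categories. `pd.get_dummies(..., drop_first=True)` produces the N-1 dummy columns, and the numeric columns are z-score standardized by hand using the population standard deviation, which is what scikit-learn's `StandardScaler` uses.

```python
import pandas as pd

# Hypothetical raw inputs: two numeric predictors and one categorical predictor (N = 3 categories)
raw = pd.DataFrame({
    "length": [5.1, 6.2, 5.9, 4.8],
    "width": [3.5, 2.9, 3.0, 3.1],
    "colour": ["red", "green", "blue", "green"],
})

# N - 1 dummy columns for the categorical predictor ("blue", first in sorted order, is the baseline)
encoded = pd.get_dummies(raw, columns=["colour"], drop_first=True, dtype=float)

# Z-score standardisation of the numeric predictors (population std, ddof=0)
numeric = ["length", "width"]
encoded[numeric] = (encoded[numeric] - encoded[numeric].mean()) / encoded[numeric].std(ddof=0)

# One input neuron per remaining column: 2 numeric + (3 - 1) dummies = 4 neurons
print(encoded.columns.tolist())
print(encoded.shape[1], "input neurons")
```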

#### Pattern Layer

This layer has one neuron for each observation in the training data set. Each of these hidden neurons first computes the Euclidean distance between the test observation and the training observation it stores, and then applies a radial basis kernel function to that distance. For the Gaussian kernel, the multivariate estimate can be expressed as

$$
\phi_i(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}} \exp\left(-\frac{\lVert \mathbf{x}-\mathbf{x}_i \rVert^{2}}{2\sigma^{2}}\right)
$$

where:

- $\lVert \mathbf{x}-\mathbf{x}_i \rVert$ is the Euclidean distance between the test input and the pattern stored in neuron $i$ of the pattern layer
- $\sigma$ is the smoothing parameter
- $d$ is the dimension of each feature vector
- $\mathbf{x}$ is the test input vector
- $\mathbf{x}_i$ is the vector of the $i$-th pattern neuron

#### Summation Layer

This layer consists of one neuron for each class (category) of the target variable. If we have three classes, this layer has three neurons. Each pattern-layer neuron is connected to the summation neuron of its own class. Neurons in this layer sum and average the values of the pattern-layer neurons connected to them. $v_i$ is the output of each neuron here, i.e., the average kernel value over the training samples of class $i$.

#### Output Layer

The output layer predicts the target class by comparing the weighted votes accumulated in the summation layer for each target class.

## Algorithm of PNN

The following are the high-level steps of the PNN algorithm:

1. Standardize the input features and feed them to the input layer.

2. In the pattern layer, each training observation forms one neuron, and a kernel with a specific smoothing parameter (bandwidth) is used as its activation function. For each input observation, we compute the kernel value K(x, y) from each pattern neuron, i.e., from each training observation.

3. In the summation layer, sum the K(x, y) values of the patterns that belong to the same class and take their average. The number of outputs of this layer therefore equals the number of classes in the target variable.

4. The output layer compares the outputs of the summation layer and assigns the input observation the class label with the highest average K(x, y) value.
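The four steps can also be written in vectorized form. The sketch below is an illustrative NumPy variant, not the step-by-step implementation that follows: it assumes a Gaussian kernel, inputs that are already standardized, and a tiny hypothetical two-class dataset, and it computes all pattern-layer activations for a batch of test points at once. The Gaussian normalizing constant is omitted because it is identical for every class and does not change the argmax.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma):
    """Vectorised PNN with a Gaussian kernel (illustrative sketch).

    Pattern layer:   one Gaussian activation per (test point, training sample) pair.
    Summation layer: average activation over the training samples of each class.
    Output layer:    class with the largest average activation.
    """
    # Squared Euclidean distances, shape (n_test, n_train)
    sq_dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    activations = np.exp(-sq_dist / (2 * sigma**2))

    classes = np.unique(y_train)
    # Average activation per class, shape (n_test, n_classes)
    class_scores = np.stack(
        [activations[:, y_train == c].mean(axis=1) for c in classes], axis=1
    )
    return classes[np.argmax(class_scores, axis=1)]

# Hypothetical, already standardised 2-D training data with two classes
X_demo = np.array([[-1.0, -1.0], [-1.2, -0.8], [-0.9, -1.1],
                   [1.0, 1.0], [1.1, 0.9], [0.8, 1.2]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[-0.7, -0.9], [0.9, 1.1], [0.2, 0.3]])

print(pnn_predict(X_demo, y_demo, X_query, sigma=0.5))  # → [0 1 1]
```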

## Code Example on the Iris Dataset

The following is a Python code example that implements a PNN on the Iris dataset and predicts labels for the test set. We will go through each step presented in the algorithm, so open your notebook and start coding!

#### Step 1 – Load the dataset and import libraries

Import the necessary libraries.

```
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
```

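Load the Iris dataset from **sklearn.datasets**.

```
iris = load_iris()
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
data.head(5)
```

#### Step 2 – Build the Input Layer

We split the dataset into training and test sets and standardize the features. The scaler is fitted on the training set only, so no test-set statistics leak into the model.

```
# Split into train and test sets, then standardise the features
X = data.drop(columns="target")
Y = data[['target']]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=12)
scaler = StandardScaler()
# Keep DataFrames (original column names and indices) for the later steps
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)
```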

#### Step 3 – Construct Kernel Functions

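```
# Kernel functions of the scaled distance u = x / b, where b is the bandwidth
# (smoothing parameter). Each is divided by b so that it integrates to 1 over x.
uniform = lambda x, b: 0.5 / b if np.abs(x / b) <= 1 else 0.0
triangle = lambda x, b: (1 - np.abs(x / b)) / b if np.abs(x / b) <= 1 else 0.0
gaussian = lambda x, b: np.exp(-0.5 * (x / b) ** 2) / (np.sqrt(2 * np.pi) * b)
laplacian = lambda x, b: np.exp(-np.abs(x / b)) / (2 * b)
epanechnikov = lambda x, b: 0.75 * (1 - (x / b) ** 2) / b if np.abs(x / b) <= 1 else 0.0
```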

#### Step 4 – Build the Pattern Layer

```
def pattern_layer(inp, kernel, sigma):
    # One pattern neuron per training observation (uses the global X_train from Step 2)
    k_values = []
    for p in X_train.values:
        edis = np.linalg.norm(p - inp)  # Euclidean distance to the test observation
        k = kernel(edis, sigma)         # kernel activation for this distance and bandwidth
        k_values.append(k)
    return k_values
```

#### Step 5 – Build the Summation Layer

```
def summation_layer(k_values, Y_train, class_counts):
    # Sum the kernel values of each class, then divide by the class size to average
    k_values = np.array(k_values)
    summed = np.zeros(len(class_counts))
    for i, label in enumerate(class_counts.index):
        # class_counts comes from Y_train.value_counts(), so each index entry is a 1-tuple
        mask = (Y_train['target'] == label[0]).values
        summed[i] = np.sum(k_values[mask])
    avg_sum = list(summed / class_counts.values)
    return avg_sum
```

#### Step 6 – Build the Output Layer

```
def output_layer(avg_sum, class_counts):
    # Pick the class whose average kernel value is the largest
    maxv = max(avg_sum)
    label = class_counts.index[avg_sum.index(maxv)][0]
    return label
```

#### Step 7 – Bring All Layers Together in the PNN Model

```
# Bring all layers together in the PNN function
def pnn(X_train, Y_train, X_test, kernel, sigma):
    # Class labels and their frequencies in the training set
    class_counts = Y_train.value_counts()
    labels = []
    # Pass each test observation through the network
    for s in X_test.values:
        k_values = pattern_layer(s, kernel, sigma)
        avg_sum = summation_layer(k_values, Y_train, class_counts)
        label = output_layer(avg_sum, class_counts)
        labels.append(label)
    print('Labels generated for bandwidth:', sigma)
    return labels
```

#### Step 8 – Generating Predictions

```
# Candidate kernels and bandwidths
kernels = ['Gaussian', 'Triangular', 'Epanechnikov']
sigmas = [0.05, 0.5, 0.8, 1, 1.2]
results = pd.DataFrame(columns=['Kernel', 'Smoothing Param', 'Accuracy', 'F1-Score'])
for k in kernels:
    if k == 'Gaussian':
        k_func = gaussian
    elif k == 'Triangular':
        k_func = triangle
    else:
        k_func = epanechnikov
    for b in sigmas:
        pred = pnn(X_train, Y_train, X_test, k_func, b)
        accuracy = accuracy_score(Y_test['target'].values, pred)
        f1 = f1_score(Y_test['target'].values, pred, average="weighted")
        results.loc[len(results.index)] = [k, b, accuracy, f1]
# Rows added with .loc start as object dtype; make the numeric columns numeric for plotting
results = results.astype({'Smoothing Param': float, 'Accuracy': float, 'F1-Score': float})
```

#### Step 9 – Comparing scores of different kernels at different smoothing parameter values

```
plt.rcParams['figure.figsize'] = [10, 5]
plt.subplot(121)
sns.lineplot(data=results, x='Smoothing Param', y='Accuracy', hue='Kernel')
plt.title('Accuracy for Different Kernels', loc="right")
plt.subplot(122)
sns.lineplot(data=results, x='Smoothing Param', y='F1-Score', hue='Kernel')
plt.title('F1-Score for Different Kernels', loc="left")
plt.show()
```

## Conclusion

Thus, we saw that a PNN achieves high accuracy and F1 score when the kernel and bandwidth are chosen well. The best-performing kernels were the Gaussian, triangular, and Epanechnikov kernels. The following are the key takeaways:

1. A PNN lets us build fast, less complex networks with only a few layers.

2. Various kernel functions can be employed, and the best kernel can be selected based on performance metrics.

3. A PNN is less time-consuming to train because it involves no backpropagation; the training observations are simply stored as pattern neurons. The trade-off is that every prediction is compared against all stored observations.

4. A PNN can capture complex decision boundaries thanks to the nonlinearity introduced by the kernels, which act as activation functions.

Thus, PNNs have wide scope and applications in various domains.

**The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.**