Deduce the Number of Layers and Neurons for ANN

There is an optimal number of hidden layers and neurons for an artificial neural network (ANN). This tutorial discusses a simple approach for determining the optimal number of layers and neurons for ANNs.
Sep 2018  · 9 min read

Beginners in artificial neural networks (ANNs) are likely to ask some questions, such as: How many hidden layers should be used? How many hidden neurons should each hidden layer have? What is the purpose of hidden layers and neurons? Does increasing the number of hidden layers or neurons always give better results? I am pleased to say that we can answer such questions. To be clear, answering them may be too complex if the problem being solved is complicated. By the end of this article, you should at least understand how these questions are answered and be able to test yourself on simple examples.

Introduction

An ANN is inspired by the biological neural network. For simplicity, in computer science, it is represented as a set of layers. These layers fall into three classes: input, hidden, and output.

Knowing the number of input and output layers and the number of their neurons is the easiest part. Every network has a single input layer and a single output layer. The number of neurons in the input layer equals the number of input variables in the data being processed. The number of neurons in the output layer equals the number of outputs associated with each input. The challenge is knowing the number of hidden layers and their neurons.
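As a minimal sketch of the rule above, the sizes of the input and output layers can be read directly from the shape of the data. The toy dataset and the helper name input_output_sizes are assumptions made for illustration; they are not part of the original tutorial.

```python
# Hypothetical toy dataset: each row is one sample with two input variables.
samples = [
    [0.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [1.0, 1.0],
]
# One class label per sample, so a single output value per input.
labels = [[0], [1], [1], [0]]


def input_output_sizes(samples, labels):
    """Return (input neurons, output neurons) implied by the data shape."""
    n_inputs = len(samples[0])   # one input neuron per input variable
    n_outputs = len(labels[0])   # one output neuron per output value
    return n_inputs, n_outputs


n_inputs, n_outputs = input_output_sizes(samples, labels)
print(n_inputs, n_outputs)  # 2 1
```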

Here are some guidelines for determining the number of hidden layers and the number of neurons in each hidden layer in a classification problem:

  • Based on the data, draw an expected decision boundary to separate the classes.
  • Express the decision boundary as a set of lines. Note that the combination of these lines must yield the decision boundary.
  • The number of selected lines represents the number of hidden neurons in the first hidden layer.
  • To connect the lines created by the previous layer, a new hidden layer is added. Note that a new hidden layer is added each time you need to create connections among the lines in the previous hidden layer.
  • The number of hidden neurons in each new hidden layer equals the number of connections to be made.
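The guidelines above can be sketched as a small helper that turns the counts you read off your drawing into a list of hidden layer sizes. The function name hidden_layout and its parameters are hypothetical, and the sketch assumes that the final connection is left to the output layer rather than to an extra hidden layer.

```python
def hidden_layout(n_lines, connections_per_layer=()):
    """Translate the guidelines into a list of hidden layer sizes.

    n_lines: number of lines used to approximate the decision boundary;
        this becomes the size of the first hidden layer.
    connections_per_layer: number of connections made by each additional
        hidden layer (empty when no extra hidden layer is needed).
    """
    if n_lines < 1:
        raise ValueError("at least one line is needed")
    return [n_lines, *connections_per_layer]


# A boundary drawn with two lines and no extra connecting layer.
print(hidden_layout(2))       # [2]
# A boundary drawn with four lines, joined pairwise by a second hidden layer.
print(hidden_layout(4, [2]))  # [4, 2]
```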

To make things clearer, let’s apply the previous guidelines for a number of examples.

Example One

Let’s start with a simple example of a classification problem with two classes as shown in the following figure. Each sample has two inputs and one output that represents the class label. It is very similar to the XOR problem.

classification problem with two classes diagram

The first question to answer is whether hidden layers are required. The rule for deciding this is as follows:

In artificial neural networks, hidden layers are required if and only if the data must be separated non-linearly.

Looking at the figure, it seems that the classes must be separated non-linearly. A single line will not work. As a result, we must use hidden layers in order to get the best decision boundary. We could still omit the hidden layers in such a case, but classification accuracy would suffer. So, it is better to use hidden layers.

Knowing that we need hidden layers, we must answer two important questions:
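To see why a single line is not enough, here is a minimal sketch that trains one perceptron (no hidden layer) with the classic perceptron learning rule. It uses the XOR truth table as an assumed stand-in for the data in the figure, and the AND truth table as a linearly separable contrast. The function names and training settings are illustrative choices.

```python
def train_perceptron(data, epochs=100, lr=1):
    """Train a single two-input perceptron with the perceptron learning rule."""
    w1 = w2 = b = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            delta = target - pred
            if delta != 0:
                # Move the line toward the misclassified sample.
                w1 += lr * delta * x1
                w2 += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:  # every sample is classified correctly
            break
    return w1, w2, b


def accuracy(data, params):
    """Fraction of samples the perceptron classifies correctly."""
    w1, w2, b = params
    hits = sum(
        (1 if w1 * x1 + w2 * x2 + b > 0 else 0) == target
        for (x1, x2), target in data
    )
    return hits / len(data)


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND_DATA, train_perceptron(AND_DATA))
xor_acc = accuracy(XOR_DATA, train_perceptron(XOR_DATA))
print("AND accuracy:", and_acc)  # 1.0, one line separates the classes
print("XOR accuracy:", xor_acc)  # always below 1.0, no single line works
```

No choice of weights lets a single line classify all four XOR samples, so the accuracy stays below 1.0 however long training runs.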

  1. What is the required number of hidden layers?

  2. What is the number of hidden neurons in each hidden layer?

Following the previous procedure, the first step is to draw the decision boundary that splits the two classes. There is more than one possible decision boundary that splits the data correctly, as shown in the figure below. The one we will use for further discussion is in the right part of the figure (b).

decision boundary splitting two classes diagram

Following the guidelines, the next step is to express the decision boundary as a set of lines.

The idea of representing the decision boundary using a set of lines comes from the fact that any ANN is built using the single layer perceptron as a building block. The single layer perceptron is a linear classifier which separates the classes using a line created according to the following equation:

y = w_1 x_1 + w_2 x_2 + ⋯ + w_i x_i + b

Where x_i is the ith input, w_i is its weight, b is the bias, and y is the output. Because each added hidden neuron increases the number of weights, it is recommended to use the fewest hidden neurons that accomplish the task. Using more hidden neurons than required adds complexity.
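The equation and the remark about weights can be made concrete with a short sketch. The helper names perceptron_output and count_parameters, the sample values, and the assumption of a fully connected network with one bias per neuron are all illustrative.

```python
def perceptron_output(inputs, weights, bias):
    """Compute the weighted sum of the inputs plus the bias for one neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the neurons per layer, input layer first and
    output layer last.
    """
    return sum(
        n_in * n_out + n_out  # weights plus one bias per neuron
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


y = perceptron_output([1.0, 2.0], [0.5, -1.0], 0.25)
print(y)  # -1.25, from 0.5*1.0 + (-1.0)*2.0 + 0.25

# One extra hidden neuron adds four parameters in this small network.
print(count_parameters([2, 2, 1]))  # 9
print(count_parameters([2, 3, 1]))  # 13
```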

Returning to our example, saying that the ANN is built using multiple perceptron networks is equivalent to saying that the network is built using multiple lines.

In this example, the decision boundary is replaced by a set of lines. The lines start from the points at which the boundary curve changes direction. At each such point, two lines are placed, each in a different direction.

Because there is just one point at which the boundary curve changes direction, marked by a gray circle in the next figure, just two lines are required. In other words, there are two single layer perceptron networks, each producing a line.

two single layer perceptron networks diagram

Knowing that there are just two lines required to represent the decision boundary tells us that the first hidden layer will have two hidden neurons.

Up to this point, we have a single hidden layer with two hidden neurons. Each hidden neuron could be regarded as a linear classifier that is represented as a line as in the previous figure. There will be two outputs, one from each classifier (i.e. hidden neuron). But we are to build a single classifier with one output representing the class label, not two classifiers. As a result, the outputs of the two hidden neurons are to be merged into a single output. In other words, the two lines are to be connected by another neuron. The result is shown in the next figure.

Fortunately, we are not required to add another hidden layer with a single neuron to do that job. The output layer neuron will do the task. This neuron merges the two lines generated previously so that the network has only one output.

merged layer perceptron networks diagram

Now that the number of hidden layers and their neurons is known, the network architecture is complete, as shown in the next figure.

completed network architecture diagram
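The resulting architecture (two inputs, two hidden neurons, one output neuron) can be sketched with hand-set weights. The two lines below are assumed for illustration, since the exact lines in the figure are not given numerically: they meet at the origin and form a V-shaped boundary. A step activation is also an assumption. Each hidden neuron encodes one line, and the output neuron merges them.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """One perceptron: weighted sum plus bias, followed by the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Assumed lines meeting at the origin (the point where the boundary turns):
#   hidden neuron 1 fires above the line x2 = x1
#   hidden neuron 2 fires above the line x2 = -x1
HIDDEN = [([-1.0, 1.0], 0.0), ([1.0, 1.0], 0.0)]
# The output neuron fires only when both hidden neurons fire,
# which joins the two lines into one V-shaped boundary.
OUTPUT = ([1.0, 1.0], -1.5)


def predict(x1, x2):
    hidden = [neuron([x1, x2], w, b) for w, b in HIDDEN]
    return neuron(hidden, *OUTPUT)


print(predict(0.0, 1.0))   # 1, inside the V
print(predict(2.0, 1.0))   # 0, right of the V
print(predict(-2.0, 1.0))  # 0, left of the V
print(predict(0.0, -1.0))  # 0, below the V
```

In practice the weights would be learned from data; they are fixed by hand here only to show that two hidden neurons and one output neuron are enough to represent a boundary with a single change of direction.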

Example Two

Another classification example is shown in the next figure. It is similar to the previous example in that there are two classes and each sample has two inputs and one output. The difference is in the decision boundary. The boundary in this example is more complex than in the previous one.

two inputs one output classification figure

According to the guidelines, the first step is to draw the decision boundary. The decision boundary to be used in our discussion is shown in the left-most part of the next figure (a).

The next step is to split the decision boundary into a set of lines, where each line will be modeled as a perceptron in the ANN. Before drawing lines, the points at which the boundary changes direction should be marked, as shown in the right-most part of the next figure (b).

decision boundary figure

The question is: how many lines are required? Each of the top and bottom points will have two lines associated with it, for a total of 4 lines. The in-between point will share its two lines with the other points. The lines to be created are shown in the figure below.

Because the number of neurons in the first hidden layer equals the number of lines, the first hidden layer will have 4 neurons. In other words, there are 4 classifiers, each created by a single layer perceptron. At this point, the network generates 4 outputs, one from each classifier. The next step is to connect these classifiers so that the network generates just a single output. In other words, the lines are to be connected by other hidden layers to form a single curve.

four connected classifiers figure

It is up to the model designer to choose the layout of the network. One feasible network architecture is to build a second hidden layer with two hidden neurons. The first hidden neuron will connect the first two lines and the second hidden neuron will connect the last two lines. The result of the second hidden layer is shown in the figure below.

second layer with two hidden neurons diagram

Up to this point, there are two separate curves, and thus two outputs from the network. The next step is to connect these curves in order to have just a single output from the entire network. In this case, the output layer neuron can make the final connection, so no new hidden layer is needed. The final result is shown in the following figure.

connected curves figure

With the design finished, the complete network architecture is shown in the following figure.

completed network architecture figure
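As a rough sketch of this architecture (2 inputs, 4 neurons in the first hidden layer, 2 in the second, 1 output), the code below builds the layers and traces how many values each one produces. The weights are random placeholders rather than a trained or hand-derived solution, and the helper names, the step activation, and the sample input are assumptions for illustration.

```python
import random


def step(z):
    """Step activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0


def make_layer(n_in, n_out, rng):
    """Return n_out neurons, each a (weights, bias) pair of placeholder values."""
    return [
        ([rng.uniform(-1, 1) for _ in range(n_in)], rng.uniform(-1, 1))
        for _ in range(n_out)
    ]


def forward(layers, inputs):
    """Run the inputs through every layer and record each layer's outputs."""
    activations = []
    current = inputs
    for layer in layers:
        current = [
            step(sum(w * x for w, x in zip(weights, current)) + bias)
            for weights, bias in layer
        ]
        activations.append(current)
    return activations


rng = random.Random(0)
sizes = [2, 4, 2, 1]  # input, first hidden, second hidden, output
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

activations = forward(layers, [0.3, 0.7])
# Four line classifiers, then two curves, then one final output.
print([len(a) for a in activations])  # [4, 2, 1]

# Total trainable parameters: every weight plus one bias per neuron.
n_params = sum(len(weights) + 1 for layer in layers for weights, _ in layer)
print(n_params)  # 25
```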
