What kind of network is this?

Normally, implementing a neural network involves arranging neurons in fully connected layers. The "classic" network treats the inputs to a layer as a 1 x n matrix and the layer's weights as an m x n matrix, where each layer has n inputs and m outputs. This is partly because training is easier to handle when the weights can be treated as matrices, and partly because having nice matrices at each layer lets you apply second-order optimizations and heuristics.
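To make the matrix view concrete, here is a minimal sketch of one "classic" fully connected layer in plain Python. The function name dense_forward and the sigmoid activation are assumptions made for illustration; the text above does not say which activation function is used.

```python
import math


def dense_forward(x, weights):
    """Forward pass of one fully connected layer.

    x       -- list of n input values (the 1 x n row)
    weights -- list of m rows, each holding n weights (the m x n matrix)
    Returns a list of m outputs.
    """
    outputs = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("each weight row needs one weight per input")
        # Weighted sum of every input: this is what "fully connected" means.
        total = sum(w * xi for w, xi in zip(row, x))
        # Sigmoid activation (an assumption, chosen only for the sketch).
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


# Two inputs feeding three outputs, so the weight matrix is 3 x 2.
layer_out = dense_forward([1.0, 2.0], [[0.1, 0.2], [0.0, 0.0], [-0.3, 0.4]])
```

Every output depends on every input, so a layer with n inputs and m outputs always carries m x n weights.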

This library makes connections between neurons, not between layers. Using the concept of layers as a point of reference, this library allows you to skip a layer when you wire up the neurons. In fact, there are no real layers. It uses a message-passing paradigm to move data from one neuron to the next until it reaches an output neuron.

If a neuron is connected to three input neurons, it accumulates messages from those three neurons. Once all the data arrives, it calculates an output and forwards that data to any neurons registered as its output neurons. During training, it uses a plain vanilla back-propagation algorithm. Once a neuron has received the error propagation from its output neurons, it calculates its local weight updates and passes the error information to its input neurons. Currently the library won't work with recurrent networks.
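The message-passing scheme described above can be sketched roughly as follows. This is not the library's actual code: the Neuron class, its method names, the sigmoid activation, and the learning rate are all hypothetical, chosen only to illustrate the idea of accumulating messages before firing.

```python
import math


class Neuron:
    """Hypothetical message-passing neuron (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.sources = []      # upstream neurons feeding this one
        self.targets = []      # downstream neurons registered as outputs
        self.weights = {}      # source neuron -> connection weight
        self.inbox = {}        # forward messages received so far
        self.last_inputs = {}  # inputs used for the most recent output
        self.errors = []       # backward messages received so far
        self.value = 0.0

    def connect_from(self, source, weight):
        self.sources.append(source)
        self.weights[source] = weight
        source.targets.append(self)

    def emit(self, value):
        """Store the output and forward it to every registered target."""
        self.value = value
        for target in self.targets:
            target.receive(self, value)

    def receive(self, source, value):
        """Accumulate forward messages; fire once all sources have reported."""
        self.inbox[source] = value
        if len(self.inbox) == len(self.sources):
            self.last_inputs, self.inbox = self.inbox, {}
            total = sum(self.weights[s] * v for s, v in self.last_inputs.items())
            self.emit(1.0 / (1.0 + math.exp(-total)))  # sigmoid (assumed)

    def receive_error(self, error, rate=0.5):
        """Accumulate error messages; update weights once all have arrived."""
        self.errors.append(error)
        if len(self.errors) < max(len(self.targets), 1):
            return
        delta = sum(self.errors) * self.value * (1.0 - self.value)
        self.errors = []
        for s in self.sources:
            # Pass the error upstream using the weight from before the update.
            s.receive_error(delta * self.weights[s], rate)
            self.weights[s] += rate * delta * self.last_inputs[s]


# Tiny network: a and b are inputs, h is hidden, o is the output.
a, b, h, o = (Neuron(n) for n in "abho")
h.connect_from(a, 0.5)
h.connect_from(b, -0.5)
o.connect_from(h, 0.8)
o.connect_from(a, 0.3)  # this connection skips the "hidden layer"


def train_step(target):
    """One forward pass followed by one backward pass; returns the output."""
    a.emit(1.0)
    b.emit(0.0)
    output = o.value
    o.receive_error(target - output)
    return output
```

Calling train_step(1.0) repeatedly should move the output of o toward 1.0. Because each neuron waits until every expected message has arrived, the sketch only works for networks without cycles, which matches the restriction on recurrent networks noted above.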

What was the motivation?

I originally wrote the first iteration of this library as part of a graduate school project that tried facial recognition using neural networks. The network had alternating layers that performed feature detection and averaging. They were not fully connected layers. I could have written something special-purpose, that is, hard-wired for the type of network I was using. Instead I wrote a more general library to explore "odd-ball" networks.

For example, during back-propagation, error information is attributed to hidden layer weights using both the relative weights and the error information from the higher layers. A phenomenon known as delta attenuation causes the error information to become "dispersed" or "diluted" as it travels down through the layers. By connecting "lower" level neurons directly to "higher" level neurons (closer to the output), the error information they receive is subject to less attenuation. Conversely, connecting inputs (for example) directly to higher level neurons increases the contribution of those inputs to the network outputs.
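A rough way to see the attenuation is to follow one error value back along a single path. The sketch below assumes sigmoid neurons, whose slope y * (1 - y) is never larger than 0.25; the function names, the weight of 1.0, and the activation of 0.5 are illustrative assumptions, not values from the library.

```python
def sigmoid_slope(y):
    """Derivative of the sigmoid, written in terms of its output y."""
    return y * (1.0 - y)


def error_after_hops(error, hops, weight=1.0, activation=0.5):
    """Scale an error value as it is passed back through `hops` neurons.

    Each hop multiplies the error by the connection weight and by the
    slope of the neuron's activation, as plain back-propagation does
    along a single path.
    """
    for _ in range(hops):
        error *= weight * sigmoid_slope(activation)
    return error


through_three_layers = error_after_hops(1.0, hops=3)  # layered path
through_one_hop = error_after_hops(1.0, hops=1)       # direct connection
```

Even in this best case for the sigmoid slope, the error shrinks by a factor of four at every hop, so a neuron wired one hop from the output sees a much larger error signal than one reached through three layers.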

Another possibility I wanted to explore was the role of structure in network performance. For example, a sparsely connected network has fewer weights. Fewer weights means that training should happen faster. It may also mean that the network is less likely to be over-trained, since with fewer weights it is less likely to simply memorize the training data.
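As a back-of-the-envelope illustration of the weight savings, the sketch below counts the weights in a fully connected network and in a sparsely wired one. The layer sizes and the fan-in of four inputs per hidden neuron are made-up numbers, not figures from the original project.

```python
def dense_weight_count(layer_sizes):
    """Weights in a fully connected network: every neuron in one layer
    connects to every neuron in the next."""
    return sum(n * m for n, m in zip(layer_sizes, layer_sizes[1:]))


def sparse_weight_count(fan_ins):
    """Weights in a sparsely wired network, given one
    (neuron_count, inputs_per_neuron) pair per group of neurons."""
    return sum(count * fan_in for count, fan_in in fan_ins)


# Hypothetical sizes: 64 inputs, 16 hidden neurons, 4 outputs.
dense = dense_weight_count([64, 16, 4])
# Same neurons, but each hidden neuron sees only 4 of the 64 inputs,
# while each output still sees all 16 hidden neurons.
sparse = sparse_weight_count([(16, 4), (4, 16)])
```

With these made-up sizes the fully connected version has 1088 weights and the sparse version has 128, so each training pass has far fewer weights to update.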

Last edited Jun 21, 2009 at 6:12 AM by phoehne, version 1

