Editing In Progress!
I tested two kinds of prediction on a double XOR task with 2 flag bits (6 input units, 2 outputs). The dataset covers the patterns (00 00 00) through (11 11 11) and is 75% random: only the patterns beginning with '00' have nonrandom targets. Prediction was implemented by adding an extra output layer whose targets were either the current activations of the hidden layer or the activations of the input layer (the pattern itself).
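As a rough illustration of the dataset described above, here is a minimal sketch in Python. The rule used for the nonrandom targets is an assumption (one XOR per non-flag bit pair), and the function name `make_dataset` and the seed are hypothetical; the original setup may differ.

```python
import itertools
import random

def make_dataset(seed=0):
    """Return (inputs, targets, is_structured) for all 64 six-bit patterns."""
    rng = random.Random(seed)
    data = []
    for bits in itertools.product((0, 1), repeat=6):
        flag, pair_a, pair_b = bits[0:2], bits[2:4], bits[4:6]
        if flag == (0, 0):
            # ASSUMPTION: nonrandom targets are the XOR of each remaining bit pair.
            targets = (pair_a[0] ^ pair_a[1], pair_b[0] ^ pair_b[1])
            structured = True
        else:
            # Every other flag value gets random binary targets.
            targets = (rng.randint(0, 1), rng.randint(0, 1))
            structured = False
        data.append((bits, targets, structured))
    return data

dataset = make_dataset()
n_structured = sum(1 for _, _, structured in dataset if structured)
print(len(dataset), n_structured)  # 64 16
```

Sixteen of the 64 patterns start with '00', which is where the 75% random figure comes from.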
AA = Auto-Association / Input Prediction; HA = Hidden Anticipation / Hidden Layer Prediction.
Prediction Error and Output Error refer to the error on the specific pattern [00 00 00].
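To make the two kinds of prediction target and the two error measures concrete, here is a small sketch of a single forward pass on pattern [00 00 00] with untrained weights. Sigmoid units, the weight range, the squared-error measure, the (0, 0) task target, and all function names are assumptions made for illustration, not the original implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def random_layer(n_out, n_in, rng):
    """Small random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

def forward(inputs, layer):
    """Sigmoid activations of one layer for the given inputs."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def aux_targets(mode, pattern, hidden):
    """Targets for the extra output layer: inputs (AA) or hidden activations (HA)."""
    if mode == "AA":
        return list(pattern)
    if mode == "HA":
        return list(hidden)
    raise ValueError("mode must be 'AA' or 'HA'")

def squared_error(targets, outputs):
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

rng = random.Random(0)
n_in, n_hidden, n_out = 6, 4, 2
pattern = [0, 0, 0, 0, 0, 0]   # the pattern [00 00 00]
task_target = [0, 0]           # ASSUMPTION: XOR targets for this pattern

hidden_layer = random_layer(n_hidden, n_in, rng)
output_layer = random_layer(n_out, n_hidden, rng)
hidden = forward(pattern, hidden_layer)
output_error = squared_error(task_target, forward(hidden, output_layer))

prediction_error = {}
for mode in ("AA", "HA"):
    targets = aux_targets(mode, pattern, hidden)
    # The extra output layer reads the hidden layer; its size matches its targets.
    aux_layer = random_layer(len(targets), n_hidden, rng)
    prediction_error[mode] = squared_error(targets, forward(hidden, aux_layer))

print(output_error, prediction_error)
```

Under these assumptions the extra layer has 6 units for AA and as many units as the hidden layer for HA, so its size changes with the hidden layer size only in the HA case.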
Each set of figures below describes the situation for a different hidden layer size:
Below: Using 4 hidden units, prediction appears to hinder learning of the non-random patterns. Auto-Association may not learn at all: its TSS (total sum of squared error) remains unchanged after 1500 epochs. Hidden Anticipation speeds up learning at one point, but then begins to hinder further progress.
[Figures: four plots (images missing)]
Below: Here HA shows a clear advantage, while AA appears to hinder learning.
[Figures: four plots (images missing)]
Below: Both HA and AA show clear advantages in learning the non-random patterns, and Auto-Association begins to outperform Hidden Anticipation.
[Figures: four plots (images missing)]
Below: Both HA and AA show clear advantages in learning non-random patterns. The network with Auto-Association is able to learn the non-random portion of the dataset quickly.
[Figures: four plots (images missing)]
















