Neural Networks Tutorial
Hydroinformatics 98
Lars Kai Hansen
Department of Mathematical Modeling
Building 321
Technical University of Denmark
DK-2800 Lyngby, DENMARK
email: lkhansen@imm.dtu.dk
http://eivind.imm.dtu.dk
Neural networks are increasingly
popular tools for modeling complex dynamics, noisy time series and pattern recognition problems
which arise, e.g., in hydroinformatics.
Neural networks are often considered as so-called
black box models. They are indeed very well suited for
modeling systems in which the underlying rules
are hard to reveal. Neural nets learn
statistical relations from observations rather than relying on algorithmic
"solutions".
Most standard
neural network architectures possess the property of being universal
learners, i.e., by choosing the architecture carefully it is possible to
learn any task (static relation).
The network provides a relation from a set of input to a set of
output variables, hence captures aspects of the conditional distribution
of the output variables - conditioned on the inputs.
The network is trained to perform the desired task by minimizing
a performance measure or cost function with respect to the
network parameters on the set of training data consisting of
input-output examples. Typical cost functions are mean square errors
(for regression or function approximation) and the entropic error
measure (for pattern recognition nets). Cost functions can be derived
by applying maximum likelihood methods or using the so-called Bayesian framework.
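The two cost functions named above can be sketched in a few lines. This is a minimal illustration, not code from the tutorial: the function names are hypothetical, and the entropic error is written for the two-class case, where the network output is read as a class probability. Under a maximum likelihood view, the mean square error corresponds to assuming Gaussian noise on the targets, and the entropic error to assuming binary (Bernoulli) labels.

```python
import math

def mean_square_error(targets, outputs):
    """Average squared difference; the usual cost for regression."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)

def entropic_error(labels, probabilities, eps=1e-12):
    """Average negative log-likelihood of binary labels (0 or 1).

    `probabilities` are network outputs interpreted as P(label = 1 | input).
    They are clipped to [eps, 1 - eps] so that log(0) is never evaluated.
    """
    total = 0.0
    for t, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(labels)

# Illustrative values only (not data from the tutorial).
print(mean_square_error([1.0, 2.0], [1.0, 4.0]))
print(entropic_error([1, 0], [0.9, 0.2]))
```

Training then amounts to adjusting the network parameters so that the chosen cost decreases on the training set.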
Application of Neural Networks
Basic considerations
Three basic issues should be addressed
before applying neural networks in the real world:
What are the input variables of interest?
What should be predicted?
How is success measured?
Checklist: Topics for choice of Neural Network Paradigm
We here list the most important questions
arising when choosing among the different
neural network methodologies (paradigms).
See
Chris Bishop's recent textbook
for a general introduction to these topics.
Computational universality.
A wide class of feed-forward
networks has been shown to consist of universal function approximators, i.e., they can model any reasonable function.
In plain words: you can be sure that it is possible to solve
the task with a network in the class.
Efficient learning schemes. It is comforting to know
that the task in question can be solved, but it is equally
important to be sure that the learning
algorithm makes efficient use of the floating point
calculations it invokes, and identifies a relevant network solution, if not optimal then useful.
Architecture Optimization.
The ``hidden agenda'' in statistical modeling is that
we would like to
perform well on ALL conceivable data, not just the data
in the training set, i.e., we would like to perform
well on future test inputs. Merely fitting the training set is of
no interest. Most learning problems and statistical
modeling tasks are concerned with the compromise between misfit and
overfitting, also known as the bias-variance dilemma. If the model
complexity is too low, the learning machine will have a large
misfit, e.g., a large training error. If the model complexity is too high,
the model will memorize the training data and hence have a very low
training error, but the test error will be high. The test error
is defined as the expected error on a hitherto unseen test example.
The two most widely used techniques for implementing this
compromise are regularization and pruning.
Both schemes have been shown to improve test performance
significantly.
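As an illustration of regularization, the sketch below trains a small polynomial model by gradient descent, with and without a weight decay penalty added to the mean square error. It is a toy under stated assumptions and not the author's procedure: the polynomial model stands in for a network, and the data generator, the decay constant and all function names are hypothetical. Pruning is not shown.

```python
import math
import random

def features(x, degree):
    """Polynomial features 1, x, x^2, ..., x^degree."""
    return [x ** k for k in range(degree + 1)]

def predict(w, x):
    return sum(wk * fk for wk, fk in zip(w, features(x, len(w) - 1)))

def mean_square_error(w, data):
    return sum((predict(w, x) - t) ** 2 for x, t in data) / len(data)

def train(data, degree, decay, rate=0.05, steps=2000):
    """Gradient descent on: mean square error + decay * sum of squared weights."""
    w = [0.0] * (degree + 1)
    n = len(data)
    for _ in range(steps):
        grad = [2.0 * decay * wk for wk in w]      # gradient of the penalty
        for x, t in data:
            err = predict(w, x) - t
            for k, fk in enumerate(features(x, degree)):
                grad[k] += 2.0 * err * fk / n      # gradient of the misfit
        w = [wk - rate * gk for wk, gk in zip(w, grad)]
    return w

# Hypothetical toy problem: noisy samples of sin(3x) on [-1, 1].
rng = random.Random(0)

def sample(n):
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        points.append((x, math.sin(3.0 * x) + rng.gauss(0.0, 0.2)))
    return points

train_set, test_set = sample(15), sample(200)
w_plain = train(train_set, degree=5, decay=0.0)
w_decay = train(train_set, degree=5, decay=0.1)
for name, w in (("no decay", w_plain), ("weight decay", w_decay)):
    print(name, mean_square_error(w, train_set), mean_square_error(w, test_set))
```

The penalty shrinks the weights and so limits the effective model complexity. Whether it lowers the test error depends on the data and on the decay constant, which in practice is chosen using an estimate of the test error.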
Statistical Evaluation. The test error, being defined as an
average over all possible test examples, cannot be measured directly. It is however possible to
estimate test performance, either using statistical theory or test
sets. Test sets are data from the database that are held out during
training. By repeating the training procedure
with different training/test set splits of the database, a
so-called cross-validation scheme may be implemented.
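A cross-validation scheme of this kind can be sketched as follows. The sketch assumes a K-fold split, which is one common choice rather than one prescribed by the text. The function names are hypothetical, and the "model" in the usage example is just the mean of the training targets, standing in for a trained network.

```python
import random

def cross_validation_error(data, fit, loss, folds=5, seed=0):
    """Estimate the test error by K-fold cross-validation.

    data : list of examples
    fit  : function mapping a training list to a model
    loss : function mapping (model, example) to an error value
    """
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    fold_errors = []
    for f in range(folds):
        test_idx = indices[f::folds]               # every folds-th example is held out
        held_out = set(test_idx)
        train = [data[i] for i in indices if i not in held_out]
        test = [data[i] for i in test_idx]
        model = fit(train)                         # the held-out data are never seen here
        fold_errors.append(sum(loss(model, ex) for ex in test) / len(test))
    return sum(fold_errors) / folds

# Hypothetical stand-in for a network: predict the mean training target.
def fit_mean(train):
    return sum(t for _, t in train) / len(train)

def squared_loss(model, example):
    _, t = example
    return (model - t) ** 2

data = [(float(x), float(x)) for x in range(10)]
print(cross_validation_error(data, fit_mean, squared_loss))
```

Each example is held out exactly once, so every error that enters the estimate is measured on data that the corresponding model did not see during training.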
Active learning. For a number of the standard neural net models it
has proven possible to use the trained neural net to guide the expert in
providing more examples. This can, e.g., be implemented by computing
the kinds of input for which getting the ``teacher'' output
will be most informative and hence, after subsequent training, increase
the retrained network's test performance.
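One simple way to score how informative an input would be is to measure how much a set of trained networks disagree on it. This committee-based heuristic is swapped in here for illustration; the text does not say which criterion is meant. The function name is hypothetical, and the "networks" in the example are plain functions standing in for trained models.

```python
def most_informative_input(candidates, committee):
    """Return the candidate input on which the committee members disagree most.

    Disagreement is measured as the variance of the members' outputs.
    """
    def disagreement(x):
        outputs = [member(x) for member in committee]
        mean = sum(outputs) / len(outputs)
        return sum((y - mean) ** 2 for y in outputs) / len(outputs)

    return max(candidates, key=disagreement)

# Hypothetical committee: three functions that agree near 0 and diverge away from it.
committee = [lambda x: x, lambda x: x * x, lambda x: 0.5 * x]
query = most_informative_input([0.0, 1.0, 2.0], committee)
print(query)
```

The selected input would then be passed to the expert, and the returned output added to the training set before retraining.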
Ensembles
When the specifics of the problem have been resolved, and a given family
of network algorithms has been chosen, one faces an optimization problem
with many feasible or near-feasible solutions. Since neural networks
involve non-linear adaptation, the training process often provides a
wide variety of solutions,
e.g., due to random initializations or random sequencing of examples.
Rather than dismissing solutions it is recommended to form collective
decisions, i.e., to form a consensus among the ensemble of network
solutions. A related, though different, approach has been advocated in
work on "Mixture of experts". These algorithms involve a gating
mechanism that decides
which network (or group of networks) to rely on for a specific input.
One may think of this tool as a way of using networks that are
specialised in certain regions of input space.
Of course one may also consider forming a consensus among models derived
from different network families. Little experience has been reported
with such heterogeneous ensembles.
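Forming a consensus can be as simple as a plurality vote for classification or an average for regression. The sketch below assumes each trained network is available as a callable; the function names and the toy "networks" are hypothetical, and the gating mechanism of a mixture of experts is not shown.

```python
from collections import Counter

def ensemble_vote(networks, x):
    """Plurality vote: return the class label predicted by most networks."""
    votes = Counter(net(x) for net in networks)
    return votes.most_common(1)[0][0]

def ensemble_average(networks, x):
    """Average the real-valued outputs of the networks."""
    outputs = [net(x) for net in networks]
    return sum(outputs) / len(outputs)

# Hypothetical stand-ins for networks trained from different initializations.
classifiers = [lambda x: x > 0.0, lambda x: x > 1.0, lambda x: x > -1.0]
regressors = [lambda x: x, lambda x: 2.0 * x, lambda x: 3.0 * x]

print(ensemble_vote(classifiers, 0.5))
print(ensemble_average(regressors, 1.0))
```

With an even number of classifiers a tie is possible; in this sketch a tie goes to the label that was voted for first.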
The DTU group's neural network WWW repository
Our on-line postscript papers
Selected references
H. Akaike: "Fitting Autoregressive
Models for Prediction", Annals of the Institute of Statistical Mathematics, vol. 21, 243--247, 1969.
Key paper on generalization error estimation
S. Geman, E. Bienenstock and R. Doursat, "Neural Networks and the
Bias/Variance Dilemma". Neural Computation, vol. 4, pp. 1--58, 1992.
Best description of the Bias-Variance dilemma I know.
L.K. Hansen and J. Larsen: "Linear Unlearning for Cross-Validation". Advances in Computational Mathematics.
Introduces unlearning for neural networks as a technique
for approximate cross-validation.
L.K. Hansen and P. Salamon: "Neural Network Ensembles". IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no.
10, pp. 993--1001, Oct. 1990. One of the first papers on neural net
ensembles.
J. Hertz, A. Krogh and R.G. Palmer: "Introduction to the Theory of
Neural Computation". Redwood City, California: Addison-Wesley
Publishing Company, 1991. Classic neural network textbook, still one of
the best texts to learn neural computing theory from.
J. Larsen & L.K. Hansen: "Empirical Generalization Assessment
of Neural Network Models". In F. Girosi, J. Makhoul, E. Manolakos &
E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks
for
Signal Processing V, Piscataway, New Jersey: IEEE, pp. 30--39, 1995.
Proposes many measures of generalization performance.
Y. Le Cun, J.S. Denker and S.A. Solla: "Optimal Brain Damage". In
D.S. Touretzky (ed.) Advances in Neural Information Processing Systems
2, Proceedings of the 1989 Conference, San Mateo, California: Morgan
Kaufmann Publishers, 1990, pp. 598--605. Seminal pruning paper.
J. Moody: "The Effective Number of Parameters: An Analysis of
Generalization and Regularization
in Nonlinear Learning Systems". In J.E. Moody, S.J. Hanson, R.P.
Lippmann (eds.) Advances in Neural Information
Processing Systems 4, Proceedings of the 1991 Conference,
San Mateo, California: Morgan Kaufmann Publishers, 1992, pp. 847--854.
Seminal paper on application of generalization error estimates.
N. Murata, S. Yoshizawa and S. Amari: "Network
Information Criterion --- Determining the Number of Hidden Units for an
Artificial Neural Network Model".
IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 865--872, Nov.
1994. Rather detailed review of generalization error estimates for
general neural models
C. Svarer, L.K. Hansen, and J. Larsen: "On Design and Evaluation of Tapped Delay Line Networks".
In Proceedings of the 1993 IEEE International Conference on Neural Networks, San Francisco, vol. 1, pp. 46--51, 1993.
First paper to give a detailed recipe for pruning and
evaluation of networks for time series prediction.
A.S. Weigend, B.A. Huberman and D.E. Rumelhart:
"Predicting the Future: A Connectionist Approach". International
Journal of Neural Systems, vol. 1, no. 3, pp. 193--209, 1990. Widely
recognised contribution on time series prediction by neural networks
A.S. Weigend and N. Gershenfeld: "Time Series Analysis: Predicting
the Future and Understanding the Past". Lecture Notes Santa Fe
Institute, Addison Wesley (1994).
Great book on modeling and forecasting in time series.
H. White: "Learning in Artificial Neural Networks: A Statistical
Perspective". Neural Computation, vol. 1, pp. 425--464, 1989. Great
paper with a statistical analysis
of generalization