
Neural Networks Tutorial
Hydroinformatics 98

Lars Kai Hansen
Department of Mathematical Modeling
Building 321
Technical University of Denmark
DK-2800 Lyngby, DENMARK
email: lkhansen@imm.dtu.dk
http://eivind.imm.dtu.dk

Neural networks are increasingly popular tools for modeling complex dynamics, noisy time series and pattern recognition problems, such as those arising in hydroinformatics. Neural networks are often regarded as so-called black-box models. They are indeed well suited to modeling systems whose underlying rules are hard to reveal. Neural nets learn statistical relations from observations rather than relying on algorithmic "solutions".

Most standard neural network architectures possess the property of being universal learners. That is, by choosing the architecture carefully it is possible to learn any task that can be expressed as a static input-output relation. The network maps a set of input variables to a set of output variables. It therefore captures aspects of the conditional distribution of the outputs given the inputs.
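The sketch below makes this input-output relation concrete. It is a minimal feed-forward network with one hidden layer of tanh units and a single linear output, written in plain Python. Several details are illustrative assumptions rather than an architecture prescribed by this tutorial: the function name `forward`, the parameter names `W1`, `b1`, `w2`, `b2`, the tanh nonlinearity and the weight values.

```python
import math


def forward(x, W1, b1, w2, b2):
    """One-hidden-layer feed-forward net: input vector x -> scalar output.

    W1: one weight vector per hidden unit, b1: hidden-unit biases,
    w2: output weights (one per hidden unit), b2: output bias.
    """
    # Hidden layer: tanh of a weighted sum of the inputs plus a bias.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # Linear output unit: weighted sum of the hidden activations.
    return sum(v * h for v, h in zip(w2, hidden)) + b2


# Hypothetical weights: two inputs, two hidden units, one output.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
w2 = [2.0, -1.0]
b2 = 0.1

print(forward([0.0, 0.0], W1, b1, w2, b2))  # 0.1
```

In practice the weights are not set by hand. They are obtained by training, i.e., by minimizing a cost function over the examples.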
The network is trained to perform the desired task by minimizing a performance measure, or cost function, with respect to the network parameters. The minimization is carried out on a training set of input-output examples. Typical cost functions are the mean square error (for regression or function approximation) and the entropic error measure, i.e., the cross-entropy (for pattern recognition nets). Cost functions can be derived by applying maximum likelihood methods or within the so-called Bayesian framework.
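The two cost functions mentioned above can be sketched as follows. The sketch assumes one scalar output per example. For the entropic measure it also assumes a two-class problem in which the network output is read as the probability of class 1. The function names and the clipping constant `eps` are assumptions. Under these assumptions, minimizing the mean square error is equivalent to maximum likelihood with Gaussian output noise of fixed variance. Minimizing the cross-entropy is maximum likelihood for a Bernoulli-distributed class label.

```python
import math


def mean_square_error(targets, outputs):
    """Regression cost: average squared deviation between targets and outputs."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)


def cross_entropy(labels, probs, eps=1e-12):
    """Entropic cost for two-class pattern recognition.

    labels are 0/1 class labels; probs are the network's estimates of
    P(class = 1 | input). eps keeps the logarithms finite.
    """
    total = 0.0
    for t, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(labels)


print(mean_square_error([1.0, 2.0], [1.5, 2.0]))  # 0.125
print(cross_entropy([1, 0], [0.9, 0.2]))
```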

Application of Neural Networks

Basic considerations

Three basic issues should be addressed before applying neural networks in the real world:

What are the input variables of interest?

What should be predicted?

How is success measured?

Checklist: Topics for Choosing a Neural Network Paradigm

Here we list the most important questions that arise when choosing among the different neural network methodologies (paradigms). See Chris Bishop's recent textbook for a general introduction to these topics.

Computational universality. A wide class of feed-forward networks has been shown to consist of universal function approximators. That is, given enough hidden units, they can approximate any continuous function on a closed, bounded input domain to arbitrary accuracy. In plain words: you can be sure that some network in the class is able to solve the task.
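A rough, constructive way to see why more hidden units buy more accuracy is to treat each steep sigmoid hidden unit as a smoothed step. A weighted sum of such steps then forms a staircase that tracks the target function. The sketch below makes several illustrative assumptions: the target sin(x) on [0, π], equally spaced steps, and a steepness of 200. It is not how trained networks actually represent a task, and the helper names are hypothetical. It only shows the maximum error shrinking as hidden units are added.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def staircase_net(f, a, b, n_hidden, steepness=200.0):
    """Hand-build a one-hidden-layer net approximating f on [a, b].

    Hidden unit k switches on midway between grid points k-1 and k, and its
    output weight is the jump f(x_k) - f(x_{k-1}); the output bias is f(a).
    """
    grid = [a + (b - a) * k / n_hidden for k in range(n_hidden + 1)]
    centers = [(grid[k - 1] + grid[k]) / 2 for k in range(1, n_hidden + 1)]
    weights = [f(grid[k]) - f(grid[k - 1]) for k in range(1, n_hidden + 1)]
    bias = f(a)

    def net(x):
        return bias + sum(w * sigmoid(steepness * (x - c))
                          for w, c in zip(weights, centers))
    return net


def max_error(f, net, a, b, n_points=1000):
    """Largest absolute deviation between f and net on a fine grid."""
    xs = [a + (b - a) * i / n_points for i in range(n_points + 1)]
    return max(abs(f(x) - net(x)) for x in xs)


for n in (5, 20, 80):
    net = staircase_net(math.sin, 0.0, math.pi, n)
    print(n, "hidden units: max error", round(max_error(math.sin, net, 0.0, math.pi), 3))
```

The weights here are constructed by hand for illustration. A trained network obtains its weights from data instead.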

Efficient learning schemes. It is comforting to know that the task in question can be solved. It is equally important that the learning algorithm makes efficient use of the floating-point computations it invokes, and that it identifies a relevant network solution: if not optimal, then at least useful.
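A standard ingredient of efficient training for feed-forward nets is backpropagation. It obtains the full gradient of the cost with one forward and one backward pass, at a cost that is only a small constant multiple of evaluating the network. The sketch below trains a one-hidden-layer tanh network by plain batch gradient descent on the mean square error. Several choices are illustrative assumptions: the toy data (samples of sin(2x)), the learning rate, the number of hidden units, the number of epochs, and the name `train`. Plain gradient descent is simply the easiest optimizer to show; second-order methods are a common, often faster alternative.

```python
import math
import random


def train(xs, ys, n_hidden=5, lr=0.05, epochs=2000, seed=0):
    """Fit y ~ sum_j v_j * tanh(w_j * x + b_j) + c by batch gradient descent.

    Gradients of E = mean((out - y)^2) are computed by backpropagation.
    Returns the cost recorded at the start of every epoch.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    v = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    c = 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        gw, gb, gv, gc = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        cost = 0.0
        for x, y in zip(xs, ys):
            # Forward pass.
            h = [math.tanh(w[j] * x + b[j]) for j in range(n_hidden)]
            err = sum(v[j] * h[j] for j in range(n_hidden)) + c - y
            cost += err * err / n
            # Backward pass: dE/d(out) = 2 * err / n.
            d_out = 2.0 * err / n
            gc += d_out
            for j in range(n_hidden):
                gv[j] += d_out * h[j]
                # Chain rule through tanh: d tanh(a)/da = 1 - tanh(a)^2.
                d_a = d_out * v[j] * (1.0 - h[j] ** 2)
                gw[j] += d_a * x
                gb[j] += d_a
        history.append(cost)
        # Gradient descent step on every parameter.
        for j in range(n_hidden):
            w[j] -= lr * gw[j]
            b[j] -= lr * gb[j]
            v[j] -= lr * gv[j]
        c -= lr * gc
    return history


xs = [i / 10.0 for i in range(-10, 11)]
ys = [math.sin(2.0 * x) for x in xs]
history = train(xs, ys)
print("initial cost", round(history[0], 4), "final cost", round(history[-1], 4))
```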

Architecture optimization. The "hidden agenda" in statistical modeling is that we would like to perform well on ALL conceivable data, not just the data in the training set. In other words, we would like to perform well on future test inputs; merely fitting the training set is of no interest. Most learning problems and statistical modeling tasks are concerned with the compromise between misfit and overfitting, also known as the bias-variance dilemma. If the model complexity is too low, the learning machine will have a large misfit, i.e., a large training error. If the model complexity is too high, the model will memorize the training data. It will then have a very low training error, but the test error will be high. The test error is defined as the expected error on a hitherto unseen test example. The two most widely used techniques for implementing this compromise are regularization and pruning. Both schemes have been shown to improve test performance significantly.
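Both techniques can be illustrated on a model that is linear in its parameters, where everything is available in closed form. The sketch below rests on several assumptions:

- The model is a degree-5 polynomial fitted to a noisy straight line (hypothetical data).
- Weight decay with strength `lam` serves as the regularizer.
- Weights are ranked for deletion with the Optimal Brain Damage saliency 0.5 * H_kk * w_k^2 of Le Cun et al.

For a real network the Hessian H would have to be approximated, and OBD additionally neglects its off-diagonal terms. The helper names `solve`, `fit_weight_decay` and `obd_saliencies` are hypothetical.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_weight_decay(X, y, lam):
    """Minimise E(w) = 1/(2N) * sum_i (y_i - x_i . w)^2 + lam/2 * |w|^2.

    Returns the weights and the Hessian H = X^T X / N + lam * I, which for this
    quadratic cost is also the matrix of the normal equations H w = X^T y / N.
    """
    n, p = len(X), len(X[0])
    H = [[sum(X[i][a] * X[i][c] for i in range(n)) / n + (lam if a == c else 0.0)
          for c in range(p)] for a in range(p)]
    g = [sum(X[i][a] * y[i] for i in range(n)) / n for a in range(p)]
    return solve(H, g), H


def obd_saliencies(w, H):
    """Optimal Brain Damage: estimated cost increase 0.5 * H_kk * w_k^2 if w_k is deleted."""
    return [0.5 * H[k][k] * w[k] ** 2 for k in range(len(w))]


# Hypothetical data: a noisy straight line fitted with a degree-5 polynomial.
rng = random.Random(1)
xs = [-1.0 + 2.0 * i / 19 for i in range(20)]
y = [1.0 + 2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
X = [[x ** d for d in range(6)] for x in xs]  # features 1, x, ..., x^5

w, H = fit_weight_decay(X, y, lam=0.01)
s = obd_saliencies(w, H)
weakest = min(range(len(s)), key=s.__getitem__)
print("pruning the weight on x^%d" % weakest)

# Delete the least salient feature and retrain the smaller model.
X_pruned = [[row[k] for k in range(len(row)) if k != weakest] for row in X]
w_pruned, _ = fit_weight_decay(X_pruned, y, lam=0.01)
```

Increasing `lam` shrinks the weights, trading higher bias for lower variance. Pruning instead removes the least salient weight outright, after which the smaller model is retrained. In practice both `lam` and the number of weights to prune would be chosen by estimated test error.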

Statistical evaluation. The test error is defined as an average quantity, an expectation over all possible test examples, so it cannot be measured directly. It is, however, possible to estimate test performance, using either statistical theory or test sets. Test sets are data from the database that are held out during training. By repeating the training procedure with different splits of the database into training and test sets, a so-called cross-validation scheme may be implemented.
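A minimal sketch of K-fold cross-validation follows. The examples are shuffled and dealt into K disjoint folds. Each fold is held out once as the test set while the model is trained on the rest, and the K held-out errors are averaged into an estimate of the test error. Several details are assumptions: K = 5, the hypothetical data, and the function names. A least-squares straight line stands in for a trained network, and any `fit`/`predict` pair with the same shape could be plugged in.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle example indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Estimate the test mean square error by k-fold cross-validation."""
    fold_errors = []
    for test in k_fold_indices(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq = [(ys[i] - predict(model, xs[i])) ** 2 for i in test]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / len(fold_errors), fold_errors


def fit_line(xs, ys):
    """Least-squares straight line y = a + b x (stand-in for a trained network)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def predict_line(model, x):
    a, b = model
    return a + b * x


# Hypothetical noisy data from a straight line.
rng = random.Random(2)
xs = [i / 10.0 for i in range(30)]
ys = [0.5 + 1.5 * x + rng.gauss(0.0, 0.2) for x in xs]
estimate, per_fold = cross_validate(xs, ys, fit_line, predict_line, k=5)
print("cross-validation estimate of test MSE:", round(estimate, 4))
```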

Active learning. For a number of the standard neural network models, it has been shown possible to use the trained network to guide the expert in providing more examples. One way to implement this is to compute the inputs for which obtaining the "teacher" output would be most informative. Labeling those inputs should give the largest gain in the test performance of the network after retraining.
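One common way to pick such inputs is known as query by committee. Several networks are trained, and the teacher is asked about the candidate input on which they disagree most, measured here by the variance of their predictions. This is a swapped-in illustration, not necessarily the scheme the tutorial has in mind. The toy committee of three fixed functions stands in for independently trained networks, and `select_query` is a hypothetical name.

```python
import statistics


def select_query(committee, candidates):
    """Return the candidate input on which the committee disagrees most.

    committee: trained models, each a callable x -> prediction
    candidates: pool of unlabeled inputs the expert could be asked about
    """
    def disagreement(x):
        return statistics.pvariance([model(x) for model in committee])
    return max(candidates, key=disagreement)


# Hypothetical committee: members agree near x = 0 and diverge for larger x.
committee = [lambda x: 2.0 * x, lambda x: 2.0 * x + 0.1 * x * x, lambda x: 1.8 * x]
candidates = [0.0, 0.5, 1.0, 3.0]
print(select_query(committee, candidates))  # 3.0
```

After the teacher supplies the output for the selected input, the new example is added to the training set and the committee is retrained.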

Ensembles

Once the specifics of the problem have been resolved and a given family of network algorithms has been chosen, one faces an optimization problem with many feasible or near-feasible solutions. Since neural networks involve non-linear adaptation, the training process often yields a wide variety of solutions, e.g., due to random initializations or random sequencing of the examples. Rather than discarding all but one of these solutions, it is recommended to form collective decisions, e.g., by forming consensus among the ensemble of network solutions.

A related but different approach has been advocated in work on "mixtures of experts". These algorithms involve a gating mechanism that decides which network (or group of networks) to rely on for a specific input. One may think of this as a way of using networks that are specialised in certain regions of input space. Of course, one may also form consensus among models derived from different network families, although little experience has been reported with such heterogeneous ensembles.
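The sketch below shows simple consensus rules: averaging the outputs for regression and a plurality vote for classification. It also shows a softmax-gated combination in the spirit of a mixture of experts. The member models and gate functions are hypothetical stand-ins for trained networks, and the function names and the "wet"/"dry" labels are illustrative. In a real mixture of experts the gate is itself trained jointly with the experts, which is not shown here.

```python
import math
from collections import Counter


def average_prediction(models, x):
    """Regression consensus: the mean of the ensemble members' outputs."""
    return sum(m(x) for m in models) / len(models)


def plurality_vote(classifiers, x):
    """Classification consensus: the label chosen by most members.

    Ties go to the label that was voted for first.
    """
    votes = Counter(c(x) for c in classifiers)
    return votes.most_common(1)[0][0]


def mixture_of_experts(experts, gate_scores, x):
    """Gated combination: softmax weights over the gate scores mix the experts."""
    scores = [g(x) for g in gate_scores]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w / total * e(x) for w, e in zip(weights, experts))


# Hypothetical ensemble of threshold classifiers.
classifiers = [lambda x: "wet" if x > 0.5 else "dry",
               lambda x: "wet" if x > 0.3 else "dry",
               lambda x: "wet" if x > 0.7 else "dry"]
print(plurality_vote(classifiers, 0.6))  # wet

# Two hypothetical experts, each trusted on one side of x = 0.
experts = [lambda x: -1.0, lambda x: 1.0]
gates = [lambda x: -10.0 * x, lambda x: 10.0 * x]
print(mixture_of_experts(experts, gates, 0.0))  # 0.0
```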

The DTU group's neural network WWW repository
Our on-line PostScript papers

Selected references

H. Akaike: "Fitting Autoregressive Models for Prediction". Annals of the Institute of Statistical Mathematics, vol. 21, pp. 243--247, 1969. Key paper on generalization error estimation.

S. Geman, E. Bienenstock and R. Doursat: "Neural Networks and the Bias/Variance Dilemma". Neural Computation, vol. 4, pp. 1--58, 1992. Best description of the bias-variance dilemma I know.

L.K. Hansen and J. Larsen: "Linear Unlearning for Cross-Validation". Advances in Computational Mathematics. Introduces unlearning for neural networks as a technique for approximate cross-validation.

L.K. Hansen and P. Salamon: "Neural Network Ensembles". IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993--1001, Oct. 1990. One of the first papers on neural net ensembles.

J. Hertz, A. Krogh and R.G. Palmer: "Introduction to the Theory of Neural Computation". Redwood City, California: Addison-Wesley Publishing Company, 1991. Classic neural network textbook, still one of the best texts to learn neural computing theory from.

J. Larsen & L.K. Hansen: "Empirical Generalization Assessment of Neural Network Models". In F. Girosi, J. Makhoul, E. Manolakos & E. Wilson (eds.), Proceedings of the IEEE Workshop on Neural Networks for Signal Processing V, Piscataway, New Jersey: IEEE, pp. 30--39, 1995. Proposes many measures of generalization performance.

Y. Le Cun, J.S. Denker and S.A. Solla: "Optimal Brain Damage". In D.S. Touretzky (ed.) Advances in Neural Information Processing Systems 2, Proceedings of the 1989 Conference, San Mateo, California: Morgan Kaufmann Publishers, 1990, pp. 598--605. Seminal pruning paper.

J. Moody: "The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems". In J.E. Moody, S.J. Hanson, R.P. Lippmann (eds.) Advances in Neural Information Processing Systems 4, Proceedings of the 1991 Conference, San Mateo, California: Morgan Kaufmann Publishers, 1992, pp. 847--854. Seminal paper on application of generalization error estimates.

N. Murata, S. Yoshizawa and S. Amari: "Network Information Criterion: Determining the Number of Hidden Units for an Artificial Neural Network Model". IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 865--872, Nov. 1994. Rather detailed review of generalization error estimates for general neural models.

C. Svarer, L.K. Hansen and J. Larsen: "On Design and Evaluation of Tapped Delay Line Networks". In Proceedings of the 1993 IEEE International Conference on Neural Networks, San Francisco, vol. 1, pp. 46--51, 1993. First paper to give a detailed recipe for pruning and evaluation of networks for time series prediction.

A.S. Weigend, B.A. Huberman and D.E. Rumelhart: "Predicting the Future: A Connectionist Approach". International Journal of Neural Systems, vol. 1, no. 3, pp. 193--209, 1990. Widely recognised contribution on time series prediction by neural networks.

A.S. Weigend and N. Gershenfeld: "Time Series Prediction: Forecasting the Future and Understanding the Past". Lecture Notes Santa Fe Institute, Addison-Wesley, 1994. Great book on modeling and forecasting of time series.

H. White: "Learning in Artificial Neural Networks: A Statistical Perspective". Neural Computation, vol. 1, pp. 425--464, 1989. Great paper with a statistical analysis of generalization.
