An investigation of genetic algorithms and genetic programming
Abstract
There are many regression techniques that try to fit known models to sets of data. These assume the functional form of the model and use analytic or statistical techniques to find the values of any unknown model parameters. If the target model is thought to be sufficiently complex, these analytic and statistical techniques may fail to provide the desired results, and alternative methods have to be used. This is even more important if the form of the underlying model is itself unknown. Genetic algorithms and genetic programming are two techniques that may help in the search for suitable models. Unfortunately, both techniques themselves have parameters that need to be specified, and there are no clear guidelines to aid this choice. A number of other implementation issues also remain open questions, and in this thesis we examine several ways of implementing genetic algorithms and genetic programs in order to evaluate the alternatives. Simple target models are used throughout most of this work so that the effects of changes to each method's parameters can be monitored. We look at how population size, crossover probability and mutation rate affect the speed with which the genetic algorithm converges to an acceptable model. One of the most difficult aspects of genetic programming is the meaning of the offspring produced by crossover or mutation. Some systems remove from the population any offspring that have no meaning; others ensure that no such offspring can arise. In this work we look at what might happen if we always impose a meaning on all possible offspring. In the genetic programming part of this work we look at two representations of our models: the first uses a fixed-length representation, whilst the second uses a tree to represent each member of the population. We also look at a number of fitness functions. The most common such functions are based upon the errors between the model and the data.
We also use fitness functions based on the correlation coefficient between the model and the data. We found that a strategy that starts with the correlation coefficient as the fitness and then switches to a fitness that combines both the correlation coefficient and the error worked better.
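The abstract does not describe the implementation, so the following is only a minimal Python sketch under stated assumptions. A real-coded genetic algorithm fits the coefficients of an assumed model form y = a*x^2 + b*x + c to synthetic data from a hypothetical target y = 2x^2 - 3x + 1. It exposes the three parameters studied (population size `pop_size`, crossover probability `p_crossover`, per-gene mutation rate `p_mutation`) and applies the two-stage fitness strategy: correlation coefficient first, then a combination of correlation and error. Tournament selection, one-point crossover, Gaussian mutation, elitism, the fixed switch point `switch_gen` and the equal 0.5/0.5 weighting in `combined_fitness` are all illustrative assumptions, not details taken from the thesis.

```python
import math
import random

# Hypothetical target model and data: y = 2x^2 - 3x + 1 sampled on [-2, 2].
XS = [i / 10 for i in range(-20, 21)]
YS = [2 * x * x - 3 * x + 1 for x in XS]


def predict(params, xs):
    """Evaluate the assumed model form y = a*x^2 + b*x + c."""
    a, b, c = params
    return [a * x * x + b * x + c for x in xs]


def pearson(u, v):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)


def rmse(u, v):
    """Root-mean-square error between predictions and data."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)) / len(u))


def correlation_fitness(params):
    # Stage 1: reward only the shape of the model (correlation with the data).
    return pearson(predict(params, XS), YS)


def combined_fitness(params):
    # Stage 2: hypothetical equal weighting of correlation and a bounded error
    # term; both parts increase as the model improves, maximum value 1.0.
    pred = predict(params, XS)
    return 0.5 * pearson(pred, YS) + 0.5 / (1.0 + rmse(pred, YS))


def tournament(scored, rng):
    """Tournament selection of size 2 over (fitness, individual) pairs."""
    a, b = rng.choice(scored), rng.choice(scored)
    return (a if a[0] >= b[0] else b)[1]


def evolve(pop_size=60, generations=150, switch_gen=75,
           p_crossover=0.8, p_mutation=0.1, sigma=0.3, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(pop_size)]
    stage2_history = []  # best combined fitness per generation after the switch
    for gen in range(generations):
        fitness = correlation_fitness if gen < switch_gen else combined_fitness
        scored = sorted(((fitness(ind), ind) for ind in pop),
                        key=lambda t: t[0], reverse=True)
        if gen >= switch_gen:
            stage2_history.append(scored[0][0])
        children = [list(scored[0][1])]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored, rng), tournament(scored, rng)
            if rng.random() < p_crossover:
                cut = rng.randint(1, 2)  # one-point crossover on 3 genes
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Gaussian mutation applied to each gene with probability p_mutation.
            child = [g + rng.gauss(0, sigma) if rng.random() < p_mutation else g
                     for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=combined_fitness)
    return best, stage2_history
```

One plausible rationale for the switch: the Pearson correlation coefficient is unchanged when the predictions are shifted or multiplied by a positive constant, so the first stage rewards models with the right shape regardless of offset and scale, and the error term in the second stage then pins those down. Calling, for example, `best, history = evolve(pop_size=100, p_mutation=0.05)` and inspecting `rmse(predict(best, XS), YS)` is one way to compare parameter settings.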