To: Optim who wrote (239) 10/16/1998 1:49:00 AM From: Bill Scoggin
For what it's worth, one book I have describes the simulated annealing process like this: if you picture the possible values that the network's error term can take as a 3D surface, such as a valley surrounded by mountains, then the most desirable weight set is the one that puts the error term at the very lowest point of the valley's basin. As the net trains, the error term slowly works its way down the mountains toward lower levels. Ultimately, we want to find the absolute lowest error term possible across all the training data - the global minimum of the error term (the sum of the squared errors between the training data and the calculated outputs - squared to get rid of the negative signs - or something to that effect). The simulated annealing process imitates heating up a piece of metal and slowly cooling it off so that the molecular structure settles into its lowest energy state - in this case, the error term settles to its lowest possible value, which in theory gives the most accurate net. The reason this is such a strong training tool is that it will supposedly find the global minimum and not just a local minimum.

One of the problems with back propagation training is that the net can settle into a local minimum (picture the bottom of a small ridge or valley ON the side of one of those 3D mountains, instead of the global minimum down in the main basin). If the net's error term settles into one of these areas, then learning will cease, i.e. the error term is stuck in a low spot that it can't get out of, but it's not THE lowest spot. The book also points out that for some applications the network will often be plenty accurate even though it is not at the global minimum. Another theory suggested that if enough hidden layer neurons exist, then the probability of settling into a local minimum is almost zero - at the cost of longer training times, of course. Basically, it said that the best combination of neurons, weights, etc. could be found by trial and error.

One of my books is about 4 years old, so there could be better training techniques out there now than when it was written. I know that the Quick Prop routine is not covered in this book, and it is very popular in other software packages (this routine treats the error term as a parabola, and by knowing where you are on a parabola, you can calculate how far and in which direction to move to reach its bottom). Similar idea, but Quick Prop trains more rapidly.

Anyway, I think I have correctly paraphrased the ideas. The book that most of this came out of is "Neural Network and Fuzzy Logic Applications using C/C++" by Stephen T. Welstead - I ordered it from the local bookstore for about $40. It would be nice if we had access to the error term of the network in Neurostock instead of just the confidence level, etc. - whatever that is exactly.

Bill
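P.S. A couple of rough sketches in C++ in case anyone wants to see the ideas spelled out - these are my own toy examples, not code from the book and certainly not how Neurostock does it internally. First, the error term as I understand it: just the sum of the squared differences between what the net puts out and what the training data says it should put out.

#include <cstddef>
#include <vector>

// Sum-of-squared-errors over one pass through the training data: the
// quantity that training is trying to drive down to its global minimum.
double sumSquaredError(const std::vector<double>& targets,
                       const std::vector<double>& outputs)
{
    double sse = 0.0;
    for (std::size_t i = 0; i < targets.size(); ++i) {
        double diff = targets[i] - outputs[i];
        sse += diff * diff;   // squaring gets rid of the negative signs
    }
    return sse;
}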
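Next, the annealing loop itself. This one just minimizes a made-up one-dimensional "error surface" that has a shallow dip (a local minimum) and a deeper basin (the global minimum), to show how the temperature lets the search climb back out of the shallow dip while things are still hot and then settle down as they cool. In a real net the single variable w would be the whole weight set and errorSurface() would be the sum-of-squared-errors above - this is only a sketch of the idea, not any particular package's implementation.

#include <cmath>
#include <cstdio>
#include <random>

// Made-up 1-D "error surface": a shallow local minimum near w = +1 and a
// deeper, global minimum near w = -1.
double errorSurface(double w)
{
    return (w * w - 1.0) * (w * w - 1.0) + 0.3 * w;
}

int main()
{
    std::mt19937 rng(12345);
    std::normal_distribution<double> tweak(0.0, 0.5);       // random weight change
    std::uniform_real_distribution<double> coin(0.0, 1.0);  // for uphill acceptance

    double w = 1.0;                 // start in the shallow (local) dip
    double err = errorSurface(w);
    double temperature = 2.0;       // start "hot"

    for (int iter = 0; iter < 5000; ++iter) {
        double wNew = w + tweak(rng);
        double errNew = errorSurface(wNew);

        // Always accept a downhill move; accept an uphill move with a
        // probability that shrinks as the temperature drops.  The uphill
        // moves are what let the search escape a local minimum early on.
        if (errNew < err || coin(rng) < std::exp((err - errNew) / temperature)) {
            w = wNew;
            err = errNew;
        }
        temperature *= 0.999;       // slow "cooling" schedule
    }

    std::printf("settled at w = %.3f, error = %.3f\n", w, err);
    return 0;
}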
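Finally, the parabola trick behind Quick Prop as I understand it: fit a parabola through the current and previous slope of the error with respect to a weight, then jump straight toward where that parabola bottoms out. This is just the core step in isolation - real implementations add limits on the step size and other safeguards.

// One Quick Prop-style update for a single weight.  slope and prevSlope
// are the derivatives of the error with respect to this weight on the
// current and previous pass; prevStep is the last change made to the
// weight.  Treating error-vs-weight as a parabola through those two
// slopes, the step that lands at the parabola's bottom works out to
// prevStep * slope / (prevSlope - slope).
double quickPropStep(double slope, double prevSlope, double prevStep)
{
    double denom = prevSlope - slope;
    if (denom == 0.0)              // degenerate case: the two slopes match,
        return -0.1 * slope;       // so fall back to a small plain gradient step
    return prevStep * slope / denom;
}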