Hi Cat Lady,
Comments on your post:
"I think I've seen several nets that achieved their best verification fit after just a few of hours training, but the training data fit kept improving for many more hours, while the verification fit got worse. To me, it seems that it is possible to overtrain a net despite what the documentation says on that point."
Overfitting would behave exactly the way that you describe...the net gets better on the training set, but degrades on the verification set. I've seen this happen on totally unrelated NN problems. (See below for some further comments on overtraining, but only if you are interested.)
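To make the symptom concrete, here is a toy sketch in Python (nothing to do with Neurostock's internals, which we don't know; the error numbers below are invented) of how one would spot the point where the verification fit bottoms out even though the training fit keeps improving:

def detect_overfit(train_errors, verify_errors):
    # Epoch with the lowest verification error -- the "best verification fit".
    best = min(range(len(verify_errors)), key=lambda i: verify_errors[i])
    # Overfitting signature: training error still falling past that point,
    # while verification error is rising.
    overfit = (train_errors[-1] < train_errors[best]
               and verify_errors[-1] > verify_errors[best])
    return best, overfit

train  = [0.90, 0.60, 0.40, 0.30, 0.22, 0.15]
verify = [0.95, 0.70, 0.55, 0.50, 0.58, 0.70]
print(detect_overfit(train, verify))   # -> (3, True): best fit at epoch 3, then overfitting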
"I'm using about 3 years of training data and 4 to 6 months of verification data with my nets, sometimes more, sometimes less. If it looks good at that setting, I'll retrain it with a shorter verification set."
This seems quite a reasonable approach: you are using the maximum amount of data, yet still testing the network carefully for evidence of generalization. Another way, slightly more painful, might be to have two nets for each issue, one trained with a verification set and the other without. One would trade the issue as long as the two nets agreed on "today's" data, and at least be careful when they disagreed, since disagreement could be an indication that the pattern was changing.
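If anyone wants to try the two-net idea, the decision logic itself is trivial -- something like the following (purely hypothetical names and signal coding; +1 = buy, -1 = sell, 0 = no signal):

def trade_decision(net_with_verify, net_without_verify):
    # Trade only when the two nets agree on today's data; if they
    # disagree, the pattern may be changing, so stand aside.
    if net_with_verify == net_without_verify:
        return "trade" if net_with_verify != 0 else "hold"
    return "stand aside -- possible pattern change"

print(trade_decision(+1, +1))   # both say buy: trade
print(trade_decision(+1, -1))   # they disagree: be careful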
"How long have you trained your successful nets? How do you decide when you've trained them long enough?"
Frankly, I don't have a good answer. Generally, I think I train quite a bit longer than Len and Jay do, and as I mentioned earlier, I don't normally see the extremely high confidence levels that they do. Basically, I train everything for 12 hours before I even look at it. I have trained certain nets for as long as 150 hours, but I'm completely unconvinced that it either helped or was necessary. (However, I have a friend who uses Neurostock, trades only the Nasdaq 100 options, and trained his net for almost a month (!!!) before he began to get usable results. I actually don't understand HOW this could have been a good thing, but he's profitable, even now, so I can't argue.)
It's also a bit puzzling, since I have seen the confidence level actually decrease with training. Andrew told me that it was simply a matter of the net understanding its limitations better, and adjusting the confidence level to reflect that.
Bottom line: I guess I normally train a net for around 20 hours, watching for some sort of "saturation" of the performance. Unfortunately, since I frequently end up with nets that have low confidences, I often try considerable additional training--up to 50-75 hours. If they are still not effective, I usually re-build them.
Sorry for the lengthy, non-committal reply. It would be interesting to see whether, among us, we have any data on how the confidence level depends on training time, and how it evolves, especially as the number of neurons and inputs is varied for the same issue.
Dave
Additional comments on overtraining...only if you are interested...it's a little technical and probably isn't worth a "hill of beans", but if there are any REAL NN expert lurkers out there, perhaps they will comment?
I've actually been very concerned about this overtraining/curve-fitting question, and I suspect that CL has demonstrated that overtraining is an issue under some circumstances.
On the other hand, I've not queried Andrew carefully about this, and he does not reveal the structure of the nets, so one cannot judge exactly how many variables (weights) are being fitted. However, it pretty clearly can be quite a large number. For example, there are two outputs and numerous inputs--40 or more(!)--so if, in addition, you are using 40 or so neurons, it is pretty easy to generate hundreds of weights, maybe even thousands. (Note: the number of inputs is NOT the number of "relateds" that you put in. Depending on the number of influence days that you use, there can be substantially more inputs. Of course, we don't know the net architecture, so there is considerable speculation on how the inputs match up with the number of neurons and outputs.)
If you look at one of the *.neu files with a text editor, you will see all your input info at the beginning, and then a very long list of numbers, which I presume are actually weights. The problem is, if you are using only 3 years of data, or about 750 days, and you have hundreds of weights, I would think it would be nearly impossible to avoid "curve-fitting" or "memorization" IF you train to "completion"! (A rule of thumb I've used for OTHER NN problems is that one should have at least 3X as many data values as weights to be fitted.)
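To put numbers on it, here is the arithmetic for a guessed architecture (one hidden layer, fully connected, with biases -- remember, the real structure is not revealed, so this is pure assumption):

def count_weights(n_inputs, n_hidden, n_outputs):
    # input->hidden weights plus hidden biases, then
    # hidden->output weights plus output biases
    return (n_inputs * n_hidden + n_hidden) + (n_hidden * n_outputs + n_outputs)

w = count_weights(n_inputs=40, n_hidden=40, n_outputs=2)
data = 750                      # ~3 years of daily data
print(w)                        # 1722 weights for this guess
print(3 * w)                    # the 3X rule would want 5166 data values
print(data >= 3 * w)            # False -- nowhere near enough data

Even this modest guess leaves us nearly an order of magnitude short of the rule of thumb.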
One possible way to reconcile all the observations we seem to have made is that Andrew is relying on a technique related to "early stopping". In this technique, the whole idea is to have an enormous number of weights compared to the amount of data. This is in contrast to conventional back-prop, in which it is almost ALWAYS best to have far more data than weights. In the ES case, it might actually be BAD either to train too long or to use too few related inputs, since fewer inputs would generate fewer weights and would essentially lead to overtraining. BUT, since having more inputs makes it less likely that you will train too long, everything is OK if you use the full number of inputs allotted for your version, and the full number of neurons as well.
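For the lurking experts, here is the skeleton of early stopping as I understand it (again, an assumption -- we have no idea whether this is what Andrew actually does; train_one_epoch and verify_error stand in for whatever the package really computes). Training halts once the verification error has not improved for some number of epochs, and the weights from the best verification epoch are kept:

import copy

def train_with_early_stopping(net, train_one_epoch, verify_error, patience=10):
    best_err, best_net, since_best = float("inf"), None, 0
    while since_best < patience:
        train_one_epoch(net)          # one pass over the training data
        err = verify_error(net)       # error on the held-out verification set
        if err < best_err:
            best_err, since_best = err, 0
            best_net = copy.deepcopy(net)   # snapshot the best net so far
        else:
            since_best += 1
    return best_net, best_err         # the net at its best verification fit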
Anybody out there have any comments? Andrew, if you are there, can you offer any insight without revealing too much proprietary info? |