"few investors run simulations of thousands of trades on different markets at different times to see how well their strategy performs"
Part of the problem is that, beyond a certain point, accumulating theoretical trades becomes a redundant exercise, not least because many of the observations amount to multiple versions of the same observation. To give a simplistic example from the stock world: if I develop a system that happens to buy in and around October '99 and sell a few months later, I can accumulate hundreds if not thousands of excellent breakout trades that, for all intents and purposes, are the same trade. A similar problem, or set of problems, applies (obviously in more complex and complicated ways) to testing on much larger data sets, even those based on so-called "synthetic data."
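One rough way to check how many genuinely distinct observations a backtest contains is to collapse trades whose entry dates sit close together into a single cluster. This is only an illustrative sketch: the function name `cluster_trades` and the 30-day gap are my own assumptions, not a standard method, and a real check would also have to deal with overlapping holding periods and correlated instruments.

```python
from datetime import date, timedelta


def cluster_trades(entry_dates, window_days=30):
    """Gap-based clustering of trade entry dates (hypothetical helper).

    Dates are sorted; a new cluster starts whenever the gap since the
    previous entry exceeds window_days. Each cluster is treated as one
    'real' observation, however many trades it contains.
    """
    clusters = []
    for d in sorted(entry_dates):
        if clusters and (d - clusters[-1][-1]) <= timedelta(days=window_days):
            clusters[-1].append(d)  # close to the last entry: same episode
        else:
            clusters.append([d])  # far from the last entry: new episode
    return clusters


# Sixty consecutive daily entries from October 1, 1999, plus two isolated ones.
entries = [date(1999, 10, 1) + timedelta(days=i) for i in range(60)]
entries += [date(2001, 3, 1), date(2002, 6, 15)]
clusters = cluster_trades(entries)
print(len(entries), "trades,", len(clusters), "clusters")
```

Here 62 theoretical trades collapse into three clusters, which is closer to the number of independent observations the test actually contains.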
The approach advocated in most of the system-trading literature is to accumulate as much data as you can, even to the extent of creating synthetic data, and to seek a parameter set whose positive results are robust across the entire data set. It's not surprising that the resulting systems are only barely profitable, or, put differently, that they tend to regress to a mean of minimal profitability. It's as though, in designing a bridge, you had to settle on a single design that would work adequately on every river in the world. It would be hugely overlong for some rivers and hugely overbuilt for some traffic patterns, and on the other hand it might fall a bit short crossing some "outlying" rivers, but on average it would tend to solve the worldwide river-crossing problem.
The alternative approach is fraught with all the dangers of curve-fitting and data-mining, but, at least to me, it is much more intellectually satisfying and interesting. (I haven't, however, been at it long enough to make any great claims for its profitability.) If I run tests producing thousands of theoretical trades on, say, thirty tradables, and I achieve consistently outstanding results on some of the group but generally fair to poor results on others, then I'll see whether I can find any objectively measurable differences between the strong performers and the weak ones. Lately, for instance, I've been testing, using, and even discretionarily tweaking a system designed to enter positions on pullbacks in the day's developing trend and to close them at the end of the day, at the early conclusion or reversal of the trend, or at a maximum acceptable loss point. The testing results were, for the most part, entirely unsurprising: really terrific on stocks with "hops" (high volatility, high daily ranges, high measures of trendiness and momentum in various time frames), and little better than random, or even worse, on stocks that can't jump at all.
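To make "hops" measurable, one could compute a few simple per-stock statistics and compare their averages between the strong and weak performers. This is a sketch under assumptions: the three metrics (average daily range as a percent of the close, standard deviation of daily returns, and a Kaufman-style efficiency ratio as a trendiness proxy), the `Bar` structure, and all function names are illustrative choices, not necessarily the measures behind the results above.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Bar:
    """One daily bar (hypothetical minimal structure)."""
    high: float
    low: float
    close: float


def avg_range_pct(bars):
    """Mean daily range (high - low) as a percentage of the close."""
    return 100 * mean((b.high - b.low) / b.close for b in bars)


def return_vol_pct(closes):
    """Population standard deviation of simple daily returns, in percent."""
    rets = [(b - a) / a for a, b in zip(closes, closes[1:])]
    return 100 * pstdev(rets)


def efficiency_ratio(closes):
    """Net move divided by total path length: 1.0 is a straight-line trend,
    values near 0 mean choppy, directionless movement."""
    path = sum(abs(b - a) for a, b in zip(closes, closes[1:]))
    return abs(closes[-1] - closes[0]) / path if path else 0.0


def profile(bars):
    """Bundle the three hypothetical 'hops' measures for one stock."""
    closes = [b.close for b in bars]
    return {
        "range_pct": avg_range_pct(bars),
        "vol_pct": return_vol_pct(closes),
        "efficiency": efficiency_ratio(closes),
    }


def compare_groups(profiles, strong, weak):
    """Average each metric over the strong and weak symbol groups.
    Returns {metric: (strong_mean, weak_mean)}."""
    out = {}
    for metric in next(iter(profiles.values())):
        out[metric] = (
            mean(profiles[s][metric] for s in strong),
            mean(profiles[s][metric] for s in weak),
        )
    return out
```

In a toy comparison between a steadily rising series and a back-and-forth one, the trending series wins on range and efficiency, but the choppy one can still show the higher return volatility, which is one reason to look at several measures rather than a single one.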
One alternative would be to throw out the approach (even though it's based on what I thought were sound, simple premises and entry/exit rules) because it achieves consistently weak results theoretically day-trading MSFT and consistently mediocre results theoretically day-trading MER or QQQ, even though it cleans up on EBAY, ain't too shabby on QCOM, and probably would have made me feel pretty darn wealthy if I'd been using it on QLGC and RIMM over the last couple of years. Another alternative would be to seek a parameter set that achieves consistently mediocre (but at least profitable) results across all testing subjects, even MSFT and MER. The third alternative is to seek to define what made QLGC and EBAY so tradable, and to concentrate on stocks displaying such objectively measurable tradability until and unless it disappears (until and unless EBAY "turns into" MSFT for trading purposes). Though I believe almost any system-trading approach eventually involves some element of filtering or selection of this kind, most system-trading approaches intentionally avoid using it fully. To whatever extent this preference is based on historical accident and on conditions that either no longer obtain or do not apply to individual traders (such as the early orientation of most trading systems toward trading commodity markets over long time frames), it's questionable to me, and I would submit that it's at least possible that over-systemization and over-testing can be almost as dangerous to effective trading as curve-fitting, if not more so.
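The third alternative amounts to a periodically re-run screen: keep trading a stock only while it still shows the measurable tradability, and drop it once that disappears. A minimal sketch, assuming average daily range over a trailing window as the sole tradability measure; the function names, the 60-bar window, and the 3% threshold are hypothetical placeholders, not tested values.

```python
from statistics import mean


def trailing_range_pct(highs, lows, closes, window):
    """Average (high - low) / close over the last `window` bars, in percent."""
    h, l, c = highs[-window:], lows[-window:], closes[-window:]
    return 100 * mean((hi - lo) / cl for hi, lo, cl in zip(h, l, c))


def tradable_universe(data, window=60, min_range_pct=3.0):
    """Screen symbols by recent tradability (hypothetical criterion).

    data maps symbol -> (highs, lows, closes), equal-length lists.
    A symbol is kept only if its trailing average range meets the threshold;
    re-running the screen periodically drops stocks that stop 'hopping'.
    """
    keep = []
    for sym, (highs, lows, closes) in data.items():
        if len(closes) < window:
            continue  # not enough history to judge yet
        if trailing_range_pct(highs, lows, closes, window) >= min_range_pct:
            keep.append(sym)
    return sorted(keep)
```

A stock that hopped earlier in its history but has been quiet over the trailing window falls out of the screen, which is the "EBAY turns into MSFT" case.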