The Myth of the January Barometer Explained
The so-called January Barometer has been much touted recently in light of January’s positive performance by the DJIA and S&P 500. The prevailing theory has it that if the stock market is up (or down) in January, the rest of the year will be up (or down) also. Yale Hirsch, editor and publisher of the Stock Trader’s Almanac, claims that this “barometer” has worked 44 out of the past 49 calendar years, or with 90% accuracy. The problem is that his work is inaccurate - the January Barometer is a “misleading” indicator.
While Hirsch claims only five failures of the January Barometer since 1950, there are in fact 13 failures when properly computed. The eight failures overlooked by Hirsch involve two types of calculation errors. His first error was to include January in the calendar-year performance figures it purports to predict, thus double counting it. In statistical terms, Hirsch erred by including the independent variable in the dependent, or forecast, variable. It is only meaningful to analyze January’s predictive power in relation to the next eleven months, not the twelve months including January. This error yielded six of the eight failures Hirsch missed: one false positive (1987), in which a 13.2% gain in January was followed by a 6.4% loss over the next 11 months, and five false negatives (1953, 1978, 1981, 1984 and 1998), in which negative January performances were erased by greater gains over the ensuing 11 months.
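Correcting the double count is mechanical: compound January separately from the eleven months it purports to predict. The sketch below uses hypothetical monthly returns - loosely patterned on a 1987-style year, not actual index data - to show how a large January can drag the full-year figure positive even when the rest of the year is down.

```python
# Hypothetical monthly total returns for one year (illustrative only,
# not the S&P 500's actual record).
monthly_returns = [0.132, -0.02, 0.01, -0.03, 0.00, -0.01,
                   -0.02, 0.01, -0.03, 0.02, -0.01, 0.00]

january = monthly_returns[0]

# Wrong (Hirsch): compound all twelve months, double counting January.
full_year = 1.0
for r in monthly_returns:
    full_year *= 1.0 + r
full_year -= 1.0

# Right: compound only the eleven months January purports to predict.
rest_of_year = 1.0
for r in monthly_returns[1:]:
    rest_of_year *= 1.0 + r
rest_of_year -= 1.0

# A big January drags the full-year figure positive even though the
# rest of the year is down -- a false positive like 1987.
print(january > 0, full_year > 0, rest_of_year > 0)   # True True False
```

The first comparison counts as a Hirsch "success"; the corrected one is a directional failure.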
Hirsch’s second calculation error was to ignore dividends. While the impact of dividends on total return is small for the month of January, their inclusion in the performance of the next 11 months can change the direction of the rest of the year altogether. Stated another way, dividends are 11 times more significant to the predicted outcome than they are to January’s input. By using total return in the calculations, two additional failures are uncovered: one false positive (1966) and one false negative (1956); and three failures are found to contain both calculation errors: 1953, 1981 and 1984.
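The dividend correction is equally simple. In this illustrative sketch - the figures are hypothetical, not taken from any index - a small price decline over the eleven months is reversed once dividends are included, flipping the direction of the rest of the year.

```python
# Hypothetical eleven-month figures (illustrative, not index data).
price_return_11mo = -0.015      # price-only return, Feb-Dec
dividend_yield_11mo = 0.030     # dividends received over the same span

# Total return compounds the price change with the dividend yield.
total_return_11mo = (1 + price_return_11mo) * (1 + dividend_yield_11mo) - 1

print(price_return_11mo < 0)   # True -- price alone says "down"
print(total_return_11mo > 0)   # True -- total return says "up"
```

Ignoring the dividend here would record a directional "success" for a down January that total return reveals to be a failure.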
When Hirsch’s calculation errors are corrected, the Barometer rule of thumb worked just 36 out of the past 49 years, or with just 73% accuracy. This is less accurate than the default “forecast” that the stock market will simply be up every calendar year, a forecast that was 80% accurate during the upwardly-biased stock market period since 1950 to which Hirsch limits his study. As a predictive tool, the January Barometer is largely meaningless.

A second area of dispute with regard to the January Barometer concerns the computational technique used to derive it. At work beneath Hirsch’s claim is Bayes’ conditional-probability formula, a shortcut technique for computing “if-then” probabilities. It is a blunt tool at best, since it ignores the magnitude of percentage changes. The danger of looking only at the direction (up or down, gain or loss) of market performance, as Hirsch does, is that an apparent pattern may emerge that cannot be confirmed when both direction and magnitude are considered.
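The directional tally behind these accuracy figures is easy to reproduce. In the minimal sketch below, the year-by-year sign pairs are placeholders chosen only so the totals match the corrected 36-of-49 count; the split between false positives and false negatives is illustrative, not the historical record.

```python
# Each pair is (sign of January, sign of the next eleven months).
# Placeholder data: 27 correct ups, 9 correct downs, 4 false positives,
# 9 false negatives -- chosen to total the corrected 36-of-49 record.
pairs = ([(+1, +1)] * 27 + [(+1, -1)] * 4 +
         [(-1, +1)] * 9 + [(-1, -1)] * 9)

# The Barometer "works" when January's direction matches the rest of
# the year's direction. Note that magnitudes never enter the tally.
barometer_hits = sum(1 for jan, rest in pairs if jan == rest)
barometer_accuracy = barometer_hits / len(pairs)

print(barometer_hits, len(pairs))      # 36 49
print(round(barometer_accuracy, 2))    # 0.73
```

The counting itself is trivial; the point is what it discards - every pair contributes equally whether January moved 0.1% or 13%.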
The January Barometer, if true, would amount to a claim of a linear relationship of directional equivalency; that is, any up (or down) in January implies any up (or down) for the rest of the year. For the directional equivalency to be meaningful - that is, for it to be a reliably repeating pattern worthy of being considered a value-added forecasting tool - a proportional relationship should also be reflected in the magnitudes of the data. Regression analysis on the data should then show a high coefficient of determination (R-squared).
Regression analysis is a much more powerful tool than Bayesian analysis when two-dimensional data is available, as it is in this case (direction and magnitude). Bayesian analysis only produces percentage accuracy as a single probability, while regression analysis yields much more valuable information: a single-point forecast, along with its standard deviation and standard error estimates, creating a whole range of probabilities and associated confidence intervals.
A linear regression of the data since 1950 reveals that as a predictor, January has an R-squared, or coefficient of determination, of only 10.8% (see accompanying chart). For comparison, random-walk holds that an average month has an R-squared of 8.3%, since each is one-twelfth of a year. However, because January historically contributes 1.5 times the average monthly total return of the other months, the random-walk notion is that January should have an R-squared 1.5 times higher, or 12.0%. Since its R-squared is only 10.8%, this means that during the past 49 years, the market has discounted January’s incremental value as a predictor - its edge over the average month - by about one-third. Thus, a linear regression shows that the associative notion of “strength begets strength” momentum - that is, out-sized performance gains in January leading to further market gains for the rest of the year - does not work in this case. While January may be the month with the best chance of being a good predictor, it actually has very little incremental value over any other month as a forecasting tool for the following 11 months.

Using regression analysis we can uncover more of the source of the problem of the Barometer’s apparent pattern. Regression analysis considers not only the direction of the independent and dependent variables, but also the proportionality (direct or inverse) of the size of their gains or losses. This analysis technique prevents small-performance Januarys - whether up or down - from creating large directional implications for the rest of the year, a result embodied in Hirsch’s false claims of accuracy for the Barometer. Remember that Hirsch’s calculation errors, discussed above, have a significant effect on the small-performance January inputs in his analysis, and correction of those errors significantly reduces his claims for the Barometer’s directional accuracy.
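For readers who wish to replicate the test, a plain ordinary-least-squares fit is all that is required: regress the next eleven months’ total return on January’s, and read off R-squared. The data below are synthetic stand-ins, since the 1950-98 series is not reproduced here.

```python
# Ordinary least squares in one variable, returning the intercept
# (alpha), slope (beta), and R-squared (share of variance explained).
def ols_r_squared(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return alpha, beta, 1.0 - ss_res / ss_tot

# Synthetic stand-in data: January returns and the following
# eleven-month returns (NOT the actual 1950-98 record).
jan = [0.04, -0.02, 0.07, 0.01, -0.05, 0.03, 0.06, -0.01]
rest = [0.12, 0.09, 0.15, -0.03, 0.02, 0.18, 0.05, 0.11]

alpha, beta, r2 = ols_r_squared(jan, rest)
print(round(r2, 3))   # a low R-squared means January explains little
```

With the real series, the article reports this R-squared comes out at 10.8% - barely above the 8.3% a random month would show.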
For example, correctly calculated, the 11 smallest January performances since 1950, or a large 23% of the database selected by Hirsch, have a Bayesian accuracy of 82% (9 out of 11). However, regression analysis gives these cases very little weight because of their small magnitudes. Further, this means that the remaining 77% of his limited database has an even worse Bayesian probability accuracy of 71% (27 out of 38) - that is, a far larger portion of the database has a probability accuracy well below the 80% accuracy of the default “forecast” during the period since 1950 (see page 2). Apart from Hirsch’s calculation errors, we now know how the Barometer’s misleading pattern emerges: because the magnitudes of the gains and losses are ignored in his analysis, the small-performance Januarys actually generate the bulk of his asserted pattern. We hasten to point out that if Hirsch’s data selection and calculations were not so contrived in the first place, there would be no significant difference between Bayesian and regression analyses.
Another way to see the incremental power of regression analysis in examining Hirsch’s claims is to compare the forecasting errors of regression analysis with the directional failures of Bayesian analysis. The 13 largest forecast errors using linear regression have only four calendar years in common with the 13 directional failures using Bayesian analysis. For example, the first and third largest regression errors were 1974 and 1954, respectively, which Bayesian analysis crudely claims as directional successes. The substantial difference between the forecast errors and the directional failures argues for using the more powerful regression analysis technique. Even the Barometer’s regression formula itself (see chart) reveals January’s relative unimportance in predicting rest-of-year performance: the larger term in the equation, the intercept (alpha), is a 10.2% gain each calendar year, while the second term is only a fraction (9/10) of January’s performance, a much smaller number.
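Plugging numbers into the reported equation makes January’s unimportance concrete: with the 10.2% intercept and the roughly 9/10 slope cited above, even a sharply negative January leaves the point forecast for the rest of the year positive.

```python
# Point forecast from the regression equation reported in the text:
# rest-of-year return = alpha + beta * (January return),
# with alpha = 10.2% and beta = 9/10.
alpha = 0.102
beta = 0.9

def rest_of_year_forecast(january_return):
    return alpha + beta * january_return

# The intercept dominates: a 5% January loss barely dents it.
print(round(rest_of_year_forecast(-0.05), 4))   # 0.057 -> still "up"
print(round(rest_of_year_forecast(0.04), 4))    # 0.138
```

In other words, the fitted equation's forecast is mostly the intercept; January's contribution is a second-order adjustment.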
As mentioned above, the short and biased database Hirsch has selected to support his claim is another area of concern. He compounds the inaccuracy of his calculations by his use of the upwardly-biased stock market period since 1950. This more bullish market period allows Hirsch’s double counting of January - a month which already has 1.5 times the average monthly total return of the other months - to minimize the Barometer’s largest incidence of directional failures: false negatives, i.e., down Januarys and up rest-of-years. Utilizing an expanded database back to 1926, or even better, back to the 1890s when the Dow Jones averages were first constructed, we expect an even lower level of reliability would be discovered for the January Barometer, whether using regression or Bayesian analysis techniques. We believe those 25 to 50 years of higher market cyclicality would further expose the Barometer’s ineffectiveness in predicting cyclical stock market performance with linear extrapolation.
One final consideration is the problematic choice of which index to use in applying the January Barometer. While weighted indexes, such as the S&P 500 Hirsch uses, were up for January 1999, an equally-weighted index such as the Value Line Geometric Composite was down (-1.4%). What sense does a “barometer” make when widely different rest-of-year trends are forecasted by simply using different stock market indexes? The best the Barometer can do by making linear extrapolations over extremely long periods is to suggest that such divergences will continue. However, not only are there much better tools for spread analysis, but we know that such divergences usually do not persist for long periods of time. Efficient markets ensure that such divergences are much more temporary than price trends themselves. Thus the Barometer’s problem of different market indexes yielding conflicting rest-of-year directional forecasts underscores a fundamental flaw: its linear extrapolation over relatively long periods (at the ratio of 11:1). Aside from the other errors and problems with the Barometer we have critiqued, this flaw is no doubt the basis of the suspicions of those old pros who never believed that the Barometer passed the “smell test” in the first place.
In short, when calculated and analyzed properly, January’s performance - whatever the index used - says very little about the performance of the stock market for the rest of the year - or for any other period, which also can be demonstrated. This conclusion is consistent with the fact that the capital markets exhibit cyclical movement and that linear extrapolations don’t work in a cyclical world - especially when the extrapolation period is 11 times longer.
No doubt the popularity of the January Barometer is due in part to its appearance in the New Year time frame. It also borrows from the appeal of the “strength-begets-strength” type of momentum investing. And like all promoters of myths, Hirsch attempts to give the Barometer an aura of believability by referencing some rational, fundamental cause. He claims, “the passage of the Twentieth Amendment to the Constitution fathered the January Barometer.” He points out that Congress convenes in January, setting the political tone and inspiring the stock market direction for the rest of the year. He likes this explanation so well that he claims the Barometer has a perfect directional forecasting record in odd years - when new Congresses convene that really set the political tone and really inspire the stock market direction for the year. Again Hirsch is wrong, since the Barometer, when correctly calculated, did not work in the odd years of 1953, 1981, and 1987, as noted above. More importantly, of course, the January Barometer cannot rise above its statistical and analytical limitations and failings discussed above.
What our analysis shows is that, in addition to his gross calculation errors, Hirsch has the sense of the relevance of January performance exactly wrong. In his widely published material, Hirsch claims that big-January magnitudes are the important predictor data, though he completely ignores the magnitudes of gains and losses in his shallow analysis of the accuracy of his claims for the Barometer. Further, we have shown that the stock market disagrees with him, discounting big Januarys by at least one-third on average over the past 49 years. Finally, Hirsch apparently fails to appreciate that in fact the directional changes of small Januarys are primarily responsible for creating the anomalous pattern that drives his false conclusions.
The January Barometer is another example of numerology trying to masquerade as good research, reminding us that “things not worth doing, are not worth doing well.” The real danger of such a “barometer” is that it is a concept in search of a time frame for which its predictive powers can be claimed. While the January Barometer may be a more meaningful predictor than the Super Bowl indicator, even that may not be true this year - after all, the Denver Broncos’ win “predicts” a decline in the market!
February 12, 1999 Robert E. Bronson, III, Principal Bronson Capital Markets Research www.bob@bronsons.com
Anne V. Yates, President, Investment Forecasting & Management