There are no flies on those who regress.
Keywords: Sequence prediction, Bayes, Solomonoff prior, Kolmogorov complexity, Occam’s razor, prediction bounds, model classes, philosophical issues, symmetry principle, confirmation theory, reparametrization invariance, old-evidence/updating problem, (non)computable environments.
Introduction
Examples and goal. Given the weather in the past, what is the probability of rain tomorrow? What is the correct answer in an IQ test asking to continue the sequence 1,4,9,16,? Given historic stock-charts, can one predict the quotes of tomorrow? Assuming the sun has risen every day for the past 5000 years, how likely is doomsday (that the sun does not rise) tomorrow? These are instances of the important problem of inductive inference, also known as time-series forecasting or sequence prediction. Finding prediction rules for every particular (new) problem is possible but cumbersome and prone to disagreement or contradiction. What we are interested in is a formal general theory of prediction.
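For the sunrise question, Laplace’s classical rule of succession gives one concrete (and much-debated) answer: after s successes in n trials, the predictive probability of another success is (s+1)/(n+2) under a uniform prior on the Bernoulli parameter. The sketch below is my illustration of that rule applied to the doomsday example; it is not taken from the text itself.

```python
# Laplace's rule of succession: after s successes in n trials, the
# predictive probability of another success is (s + 1) / (n + 2),
# assuming a uniform prior over the unknown Bernoulli parameter.

def laplace_rule(successes: int, trials: int) -> float:
    """Predictive probability of success on the next trial."""
    return (successes + 1) / (trials + 2)

n = 5000 * 365  # the sun rose every day for 5000 years (all successes)
p_rise = laplace_rule(n, n)
p_doom = 1 - p_rise  # probability the sun does not rise tomorrow, = 1/(n+2)
print(f"P(doomsday tomorrow) = {p_doom:.2e}")
```

With no data at all (n=0) the rule gives the neutral probability 1/2; with 5000 years of sunrises, the doomsday probability shrinks to roughly one in 1.8 million.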
Bayesian sequence prediction.
The Bayesian framework is the most consistent and successful framework developed thus far [Ear93]. A Bayesian considers a set of environments=hypotheses=models M which includes the true data-generating probability distribution µ. From one’s prior belief wν in environment ν∈M and the observed data sequence x=x1...xn, Bayes’ rule yields one’s posterior confidence in ν. In a predictive setting, one directly determines the predictive probability of the next symbol xn+1 without the intermediate step of identifying a (true or good or causal or useful) model. Note that classification and regression can be regarded as special sequence prediction problems, where the sequence x1y1...xnynxn+1 of (x,y)-pairs is given and the class label or function value yn+1 shall be predicted.
Universal sequence prediction.
The Bayesian framework leaves open how to choose the model class M and prior w. General guidelines are that M should be small but large enough to contain the true environment µ, and w should reflect one’s prior (subjective) belief in ν, or should be non-informative or neutral or objective if no prior knowledge is available. But these are informal and ambiguous considerations outside the formal Bayesian framework. Solomonoff’s [Sol64] rigorous, essentially unique, formal, and universal solution to this problem is to consider a single large universal class MU suitable for all induction problems.
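To make the posterior and predictive steps concrete, here is a minimal sketch of Bayesian sequence prediction over a small finite class of Bernoulli environments. The class, priors, and observed sequence are illustrative choices of mine; the framework in the text allows arbitrary, non-i.i.d. classes.

```python
# Bayesian sequence prediction over a finite model class M.
# Each environment nu is Bernoulli(theta); w holds the beliefs w_nu.

thetas = [0.2, 0.5, 0.8]   # model class M: three Bernoulli environments
w = [1/3, 1/3, 1/3]        # uniform prior beliefs w_nu

def update(w, thetas, x):
    """Bayes' rule: posterior w_nu(.|x) after observing bit x."""
    post = [wi * (t if x == 1 else 1 - t) for wi, t in zip(w, thetas)]
    z = sum(post)  # normalizer = mixture probability of x
    return [p / z for p in post]

def predictive(w, thetas):
    """Predictive probability of x_{n+1}=1: the mixture sum_nu w_nu * nu(1)."""
    return sum(wi * t for wi, t in zip(w, thetas))

for x in [1, 1, 0, 1, 1]:  # observed data sequence x = x_1...x_n
    w = update(w, thetas, x)
print(f"P(x_6 = 1) = {predictive(w, thetas):.3f}")  # prints P(x_6 = 1) = 0.711
```

Note that the predictive probability is computed directly from the mixture, without first committing to a single "best" model: after five observations the posterior concentrates on theta=0.8, but the other environments still contribute.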
The corresponding universal prior wU is biased towards simple environments in such a way that it dominates (i.e. is superior to) all other priors. This leads to an a priori probability M(x), which is equivalent to the probability that a universal Turing machine with random input tape outputs x.
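In the notation commonly used for Solomonoff induction (not spelled out in the excerpt above), the universal prior and the resulting a priori probability are usually written as

```latex
w^U_\nu \;=\; 2^{-K(\nu)}, \qquad
M(x) \;=\; \sum_{\nu\in\mathcal{M}_U} w^U_\nu\,\nu(x)
      \;=\; \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)},
```

where K(ν) is the Kolmogorov complexity (length of the shortest program computing) ν, U is a universal monotone Turing machine, the sum ranges over programs p whose output starts with x, and ℓ(p) is the length of p. Dominance means M(x) ≥ wν·ν(x) for every ν in the class, which is what makes M a universal predictor.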
History and motivation.
Many interesting, important, and deep results have been proven for Solomonoff’s universal distribution M [ZL70, Sol78, LV97, Hut04]. The motivation and goal of this paper is to provide a broad discussion of how and in which sense universal sequence prediction solves all kinds of (philosophical) problems of Bayesian sequence prediction, and to present some recent results. Many arguments and ideas could be further developed. I hope that this exposition stimulates such further, more detailed investigation.
Contents.
In Section 2 we review the excellent predictive performance of Bayesian sequence prediction for generic (non-i.i.d.) countable and continuous model classes.
Section 3 critically reviews the classical principles (indifference, symmetry, minimax) for obtaining objective priors, and introduces the universal prior, inspired by Occam’s razor and quantified in terms of Kolmogorov complexity. In Section 4 (for i.i.d. M) and Section 5 (for universal MU) we show various desirable properties of the universal prior and class (non-zero p(oste)rior, confirmation of universal hypotheses, reparametrization and regrouping invariance, no old-evidence and updating problem), in contrast to (most) classical continuous prior densities. Finally, we show that the universal mixture performs better than classical continuous mixtures, even in uncomputable environments. Section 6 contains critique and summary.