I use a Eudora plugin called Spam Be Gone! (unfortunately, no longer available) that does a credible job of filtering spam. You "train" it yourself, rating messages on a scale of 1 to 5 from least important to most important. It tags messages with the importance level, and you can use this to have Eudora filter unimportant messages into a separate folder.
What you are really doing is classifying each message according to its "spammishness". This program uses only a single classifier, but you could run multiple classifiers on the same text, training each one for a different property.
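To make the multiple-classifier idea concrete, here is a minimal sketch. It assumes a simple two-class naive Bayes model per property; I don't know what Spam Be Gone! actually uses internally, and the names `PropertyClassifier`, `train_all`, and `score_all` are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class PropertyClassifier:
    """Two-class naive Bayes (an assumed model): has the property or not."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, has_property):
        """Record one human-labelled example."""
        self.doc_counts[has_property] += 1
        self.word_counts[has_property].update(tokenize(text))

    def _log_prob(self, tokens, label, vocab_size):
        """Log of prior times word likelihoods, with add-one smoothing."""
        total_docs = self.doc_counts[True] + self.doc_counts[False]
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        logp = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for tok in tokens:
            logp += math.log((counts[tok] + 1) / (total_words + vocab_size))
        return logp

    def score(self, text):
        """Estimated probability (0..1) that the text has the property."""
        tokens = tokenize(text)
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        vocab_size = max(1, len(vocab))
        diff = (self._log_prob(tokens, False, vocab_size)
                - self._log_prob(tokens, True, vocab_size))
        diff = max(-700.0, min(700.0, diff))  # keep math.exp in range
        return 1.0 / (1.0 + math.exp(diff))


def train_all(classifiers, text, labels):
    """Train each named classifier on the same text with its own label."""
    for name, has_property in labels.items():
        classifiers[name].train(text, has_property)


def score_all(classifiers, text):
    """Run every classifier over the same text."""
    return {name: clf.score(text) for name, clf in classifiers.items()}


# One classifier per property, all trained on the same messages.
classifiers = {"spam": PropertyClassifier(), "hype": PropertyClassifier()}
train_all(classifiers, "buy cheap pills now", {"spam": True, "hype": False})
train_all(classifiers, "this stock is going to the moon", {"spam": False, "hype": True})
print(score_all(classifiers, "cheap pills"))
```

Each classifier sees the same words but keeps its own counts, so a message can score high on "hype" and low on "spam" at the same time. Two training messages are far too few for anything real; the point is only the shape of the thing.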
The way to distinguish true posts from not-so-true ones is through training as well. You need a human with a good "BS filter" to train it. :)
I would not be at all surprised to find that some progressive MMs are doing just this with chat room and BBS discussions. FWIW, NASDR (Nasdaq Regulation) is using similar techniques to filter message-board messages that may point to potential violations. (The major message boards have given them explicit permission to run their spider against their messages.)
A project I've had on the back-burner for some time is to create a web site that will provide indicators using these techniques, drawing from messages on the various public bulletin board systems. It would rate "bullish sentiment", "bearish sentiment", "hype", "bashing", "spam" (yes, spam - spam increases on boards when a stock has made a big move), "credible posts", "dubious posts", etc. Each of these characteristics would be trained by a panel of experts.
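As a rough sketch of how per-message scores might roll up into a board-level indicator: the code below assumes each message has already been scored between 0 and 1 for each characteristic, and simply averages them. The function name `board_indicators` and the averaging rule are my own placeholders; a real site might well weight by recency or by poster.

```python
from collections import defaultdict


def board_indicators(message_scores):
    """Average each characteristic's score across all messages.

    message_scores: list of dicts mapping characteristic name -> score (0..1).
    Returns a dict mapping characteristic name -> mean score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for scores in message_scores:
        for name, value in scores.items():
            totals[name] += value
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Hypothetical scores for three messages about one stock.
messages = [
    {"bullish sentiment": 0.9, "hype": 0.8, "spam": 0.1},
    {"bullish sentiment": 0.7, "hype": 0.6, "spam": 0.2},
    {"bullish sentiment": 0.2, "hype": 0.1, "spam": 0.9},
]
print(board_indicators(messages))
```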