Pastimes : The Death of Silicon Investor
To: Jeffrey S. Mitchell who wrote (440)
From: (Bob) Zumbrunnen
Date: 1/22/2002 12:22:55 PM
 
Interesting. I'm currently writing a spell-checker and it looks like I'm doing things a bit differently. I started with a list of about 134k words I got off the internet somewhere and converted them to lower-case. When I check the spelling, I check the lower-case version of the word submitted.
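As a rough illustration of the lower-case approach described above, here is a minimal Python sketch. The function names (load_words, is_known) and the three-word sample list are hypothetical stand-ins, not the actual code or the 134k-word list.

```python
def load_words(lines):
    """Build the dictionary as a set of lower-cased words, skipping blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_known(word, dictionary):
    """Check the lower-case version of the submitted word against the dictionary."""
    return word.lower() in dictionary


# Tiny stand-in for the real word list (assumption for illustration only).
WORDS = load_words(["Apple", "banana", "Cherry"])

print(is_known("APPLE", WORDS))    # case does not matter
print(is_known("bananna", WORDS))  # a typo is not in the set
```

Because both the stored words and the submitted word are folded to lower case, "Apple", "APPLE" and "apple" all match the same entry.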

To tweak the speed and completeness, I'm planning to have it run through all existing messages and track the number of hits for each word, and track the number of misses in a separate table. Then any word that was never hit (in 250k messages) will get removed from the dictionary and any valid words that were frequent "misses" will get added.
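The tuning pass described above might look something like the following Python sketch. It assumes in-memory counters in place of the separate hit and miss tables, and the names tally, prune and review_list, the tokenizing regex, and the sample data are all hypothetical.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, with an optional internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tally(messages, dictionary):
    """Count dictionary hits and misses across all messages."""
    hits, misses = Counter(), Counter()
    for message in messages:
        for word in WORD_RE.findall(message.lower()):
            if word in dictionary:
                hits[word] += 1
            else:
                misses[word] += 1
    return hits, misses


def prune(dictionary, hits):
    """Drop every word that was never hit."""
    return {word for word in dictionary if hits[word] > 0}


def review_list(misses, limit=20):
    """Most frequent misses first, for a person to review before adding."""
    return misses.most_common(limit)


dictionary = {"the", "stock", "rose", "zebra"}
messages = ["The stock rose.", "The stok rose again, stok!"]

hits, misses = tally(messages, dictionary)
print(sorted(prune(dictionary, hits)))  # "zebra" is never used, so it is dropped
print(review_list(misses, limit=1))     # the most frequent miss
```

Note that review_list only ranks the misses; deciding which of them are valid words is still left to a person.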

Once in production, I'll have a maintenance screen that'll show me "misses" that need to be added.

I'd have it do the additions automatically by setting a number of times a word can be "missed" before it's automatically added to the word list, but if there's anything I know, I know the people I'm dealing with, and I can just see people repeatedly using a typo just to get it added to the dictionary. <g> Heck, I know it's the first thing I'd do.

What I've currently got written isn't going to work for production (takes about 2 seconds to spell-check a typically long-winded message of mine on my home equipment), so once I have it done, John will use it as a model to write something a lot faster that never makes calls to the SQL backend.
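One way a checker could avoid per-word calls to the backend is to load the word list into memory once and check each message against that copy. The Python sketch below shows that general idea only; it is not the design John will use, and the SpellChecker class, its misspelled method, and the sample words are hypothetical.

```python
import re

# Assumed tokenizer: runs of letters, with an optional internal apostrophe.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


class SpellChecker:
    def __init__(self, words):
        # Load the word list once; every later lookup is an in-memory set test.
        self._words = frozenset(word.lower() for word in words)

    def misspelled(self, message):
        """Return each unknown word once, in order of first appearance."""
        seen = set()
        unknown = []
        for token in WORD_RE.findall(message):
            key = token.lower()
            if key not in self._words and key not in seen:
                seen.add(key)
                unknown.append(token)
        return unknown


checker = SpellChecker(["this", "is", "a", "message"])
print(checker.misspelled("This is a mesage. A MESAGE!"))
```

With this layout the cost of checking a message depends on its length, not on any round trips to a database.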