Strategies & Market Trends : Zeev's Turnips - No Politics


To: Win-Lose-Draw who wrote (92032)
Date: 7/6/2002 9:05:19 PM
From: (Bob) Zumbrunnen
 
I've seriously considered doing just that, and was even talking to someone else about that today. I've done similar things in Perl many times, and still have a script I use occasionally to fetch all the links to existing boards here, since they get dumped from the search engine after a few months.

I've got the horsepower and the bandwidth. There are 4 main issues, though:

1. I wrote a test script to see if I could do it. I can. But SI is so huge and responds so slowly that it would take 110 days of running around the clock to get it all. I think I can tweak more speed out of it by fetching posts in 10-at-a-time mode, but that would still take several weeks.

2. SI goes into no-response mode often enough that it makes it a bear to make sure nothing was missed.

3. If I'm sucking the contents out of the SI bucket through a 100Mb straw, it's likely to get noticed.

4. INSP is a litigious company and I don't know if archiving their content would be illegal, but even if it weren't, I can't go toe-to-toe with them in court to try to prove it.
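
For what it's worth, issues 1 through 3 could be handled roughly like the sketch below. It is in Python rather than Perl, and it is not the test script mentioned above. The names `crawl`, `fetch`, and `batch_starts` are made up for illustration, and `fetch` stands in for whatever actually pulls a block of 10 posts. The idea is to retry a block a few times when the site stops responding, write down any block that never came back so it can be re-run later, and pause between requests so the load stays modest.

```python
import time


def crawl(batch_starts, fetch, max_retries=3, delay=1.0, sleep=time.sleep):
    """Fetch each block of posts, retrying on failure.

    batch_starts: iterable of first post numbers, one per block (hypothetical).
    fetch: callable taking a start number and returning that block's content;
           assumed to raise OSError when the site does not respond.
    Returns (results, missed), where missed lists blocks that never succeeded.
    """
    results = {}
    missed = []
    for start in batch_starts:
        for attempt in range(max_retries):
            try:
                results[start] = fetch(start)
                break
            except OSError:
                # Back off longer after each failed attempt.
                sleep(delay * 2 ** attempt)
        else:
            # Every attempt failed: record the block so nothing is silently lost.
            missed.append(start)
        # Throttle between blocks to keep the request rate low.
        sleep(delay)
    return results, missed
```

Anything that lands in `missed` can be fed back in as `batch_starts` on a later pass, which takes some of the pain out of checking that nothing was skipped.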