Pastimes : Linux OS.: Technical questions


To: Mitch Blevins who wrote (295)2/15/2002 11:19:43 AM
From: E. Charters
 
Well, I guess you either edit, complicate your logic a lot, or just plain give up :) I guess you could search for classes with closures, like s/<head>[^[</head>]]*[</head>]//g and the like, if it would let you and if it would go past line ends. Or try Perl 6 and its new, simplified inclusive regex searches (see Damian Conway's lectures); Perl 6 improves general class selection patterns for regex. Past line ends? Uhh ... Class selection and 'tween-thingie inclusion/exclusion need improvement in sed/awk/perl/C. Regex bogs down as different closures get tried on each line in recursive loops.
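
For what it's worth, here is a minimal sketch of the "eat everything between <head> and </head>, past line ends" idea, written in Python rather than sed or Perl. A character class cannot express "anything except the string </head>", so the sketch swaps in a non-greedy match with dot-matches-newline instead. The names HEAD_RE and drop_head are made up for illustration.

```python
import re

# Non-greedy match from <head ...> to the first </head>.
# re.DOTALL lets "." cross line ends; re.IGNORECASE also catches <HEAD>.
HEAD_RE = re.compile(r"<head\b[^>]*>.*?</head\s*>", re.DOTALL | re.IGNORECASE)

def drop_head(html: str) -> str:
    """Remove every <head>...</head> block, including ones spanning lines."""
    return HEAD_RE.sub("", html)

page = "<html><head>\n<title>t</title>\n</head><body>hello</body></html>"
print(drop_head(page))
```

A regex like this is fine for one well-behaved container; it makes no attempt to cope with comments or malformed markup.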

I wrote an HTML stripper in BASIC that did not need recursion; it chopped lines to a selectable number of columns and did not split words. It used rules and about seventy lines of code. A tad complex to write, but it worked. The advantage is that the stripper had the intelligence to look ahead and drop code between containers that was not display text. I guess you could use Perl and eat everything between certain containers and not others. Simpler that way.
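
A rough sketch of that kind of stripper, in Python instead of BASIC or Perl and not a reconstruction of the original program: it drops whatever sits inside containers that hold no display text, then wraps the rest to a chosen column without splitting words. The SKIP and BREAKS sets and the names Stripper and strip_html are assumptions made for the example.

```python
from html.parser import HTMLParser
import textwrap

SKIP = {"head", "script", "style"}        # assumed non-display containers
BREAKS = {"p", "br", "div", "li", "tr"}   # assumed tags that separate words

class Stripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a skipped container
        self.parts = []   # collected display text

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1
        elif tag in BREAKS and not self.depth:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)

def strip_html(html, width=72):
    """Return display text only, wrapped to `width` columns on word boundaries."""
    parser = Stripper()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())  # collapse whitespace runs
    return textwrap.fill(text, width=width, break_long_words=False)
```

Calling strip_html(page_text, width=60) would give lines of at most 60 columns, except where a single word is longer than that, since words are never split.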

Gimme ten thousand and I will write it for you.

EC<:-}



To: Mitch Blevins who wrote (295)2/17/2002 11:17:53 PM
From: Thomas A Watson
 
mitch, lynx -dump
I also wrote lynxer, which dumps news pages and creates long lines that post directly into messages at SI.

[ 1691 ] > which lynxer
/usr/local/bin/lynxer
[ 1692 ] > cat /usr/local/bin/lynxer
lynx -dump "$1" | grep -v http | grep -v gif | grep -v = | fmt -w 1000
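
For anyone without fmt handy, the filtering half of that pipeline can be approximated in Python. This is only a sketch that works on text already dumped by lynx: it drops the same lines the three grep -v stages drop and then joins each paragraph into one long line. It does not reproduce fmt's handling of indentation, and the names DROP and lynxer_filter are invented for the example.

```python
import textwrap

DROP = ("http", "gif", "=")  # same substrings the grep -v stages reject

def lynxer_filter(dump: str, width: int = 1000) -> str:
    """Drop link/image/query lines, then refill each paragraph to `width`."""
    kept = [ln for ln in dump.splitlines() if not any(s in ln for s in DROP)]
    out, para = [], []
    for ln in kept + [""]:            # trailing "" flushes the last paragraph
        if ln.strip():
            para.append(ln.strip())
        elif para:
            out.append(textwrap.fill(" ".join(para), width=width))
            para = []
    return "\n\n".join(out)
```

Feeding it the output of lynx -dump for a news page should give paragraphs that paste as single long lines.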

my latest cool pics.
pbase.com
my latest toy.
pbase.com
tom watson tosiwmee