Pastimes : Linux OS.: Technical questions

To: Mitch Blevins who wrote (295)  2/15/2002 11:19:43 AM
From: E. Charters  Read Replies (2) of 484
 
Well, I guess you either edit, complicate your logic a lot, or just plain give up :) I guess you could search for classes with closures, like s/<head>[^[</head>]]*[</head>]//g and the like, if it would let you and if it would go past line ends, or try Perl 6 and its new, simplified inclusive regex searches (see Damian Conway's lectures). Perl 6 improves general class selection patterns for regex. Past line ends? Uhh ... Class selection and 'tween-thingie inclusion/exclusion need improvement in sed/awk/perl/C. Regex bogs down as different closures get tried on each line in recursive loops.
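For what it is worth, here is a rough sketch of the "eat everything between <head> and </head>, past line ends" idea. It is in Python rather than sed or Perl, and it uses a non-greedy match with the DOTALL flag instead of the character-class trick above, so treat it as one possible approach and not the only one. The names HEAD_RE and drop_head are made up for the illustration.

```python
import re

# Non-greedy ".*?" stops at the first closing tag; DOTALL lets "." match
# newlines, so the match can run past line ends. HEAD_RE is a hypothetical name.
HEAD_RE = re.compile(r"<head\b[^>]*>.*?</head\s*>", re.DOTALL | re.IGNORECASE)


def drop_head(html: str) -> str:
    """Remove every <head>...</head> block, including ones spanning lines."""
    return HEAD_RE.sub("", html)


sample = "<html><head>\n<title>t</title>\n</head><body>Hi</body></html>"
print(drop_head(sample))
```

This only works if the whole file is in one string; a line-at-a-time tool never sees both the opening and the closing tag together.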

I wrote an HTML stripper in BASIC that did not need recursion, chopped lines to a selectable number of columns, and did not split words. It used rules and about seventy lines of code. A tad complex to write, but it worked. The advantage is that the stripper had the intelligence to look ahead and drop code between containers that was not display text. I guess you could use Perl and eat everything between certain containers and not others. Simpler that way.
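A minimal sketch of that kind of stripper, in Python, might look like the following. This is not the BASIC program described above: it leans on the standard library's html.parser and textwrap instead of hand-written rules, and the SKIP set, the Stripper class and strip_html are all names assumed for the illustration. It uses a counter instead of recursion, drops text inside the skipped containers, and wraps to a selectable width without splitting words.

```python
from html.parser import HTMLParser
import textwrap

# Assumed list of containers whose contents are not display text.
SKIP = {"head", "script", "style"}


class Stripper(HTMLParser):
    """Collect display text, ignoring anything inside SKIP containers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0      # how many SKIP containers we are currently inside
        self.chunks = []    # pieces of display text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def strip_html(html: str, width: int = 72) -> str:
    """Return the display text of html, wrapped to width columns."""
    parser = Stripper()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace, then wrap on word boundaries only.
    text = " ".join(" ".join(parser.chunks).split())
    return textwrap.fill(text, width=width, break_long_words=False)


page = ("<html><head><title>T</title><style>p{}</style></head>"
        "<body><p>Hello <b>big</b> world</p><script>var x=1;</script></body></html>")
print(strip_html(page, width=11))
```

One known rough edge: joining the chunks with a space means a word that is split by an inline tag (say, one bold letter) comes out with a gap in it. Fixing that needs a rule about which tags are inline, which is where the rules start piling up.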

Gimme ten thousand and I will write it for you.

EC<:-}