FWIW, A Little Something From Steve Kirsch's Homepage...
What follows is a very informal overview (I gather it was written by Steve) of Infoseek's Java Search Engine Project.
************
Infoseek's Java Search Engine Project

Why?
Our current search engine (written in C) is designed for extreme speed and relevance. The Java search engine is for everything else, where we are willing to sacrifice the "extreme speed" requirement. The Java search engine is designed to be:
The most powerful search engine ever written
Complete set of search operators and features
Extensible by users
Portable
Compatible (e.g., can be imbedded in a database)

Features
small footprint (1 MB in size)
huge limits: 4 Billion docs
speed: we can code critical routines in C if needed since underlying data structures are purposely designed for efficiency (e.g., word alignment rather than byte alignment)
modularity: clean abstractions so we can change pieces of the system without affecting other parts of the system
capability: all the std features of the most powerful search engine
extensible: lots of hooks; public source; clean interfaces to both the normal "user API" as well as the underlying "programmer API"
customizability: lots of switches (e.g., turn off stemming, turn off storing word locations, turn on special recognizers)
suitable for large and small databases
Java and TCP interfaces
distributed searching/STARTS support (so it can be used by our upcoming distributed search clients)
SQL/ODBC/JDBC interfaces (so you can issue a query using ODBC/SQL API interfaces)
correctness of results (full evaluation of all relevant documents)
source code provided so you can write your own extensions/customizations, e.g., hook it into your RDBMS, embed in your application, write a doc parser, write a number recognizer, write a company recognizer, etc.

Java Search Engine Status
It works now!
Real-time indexing/searching
Extensive documentation

Applications

Demanding environments
Demanding customers like Reuters have large feature lists that no existing search engine can meet. Using Java, Reuters (and other companies like Dialog and Lexis-Nexis that have "for pay" information full text search services) can customize and enhance to meet exacting requirements
Registry of third party plug-ins for sale for other demanding users

Relational Database full text search plug-in
Use Oracle 8.1 indexing APIs/cartridges to provide full text search in an Oracle DBMS
Index any field(s) of the RDBMS
Makes use of Oracle's new index organized tables to store the full text index within the Oracle database instead of the filesystem (allowing you to make use of the replication features of the database)
Makes use of the imbedded Java to run right inside the database itself (new Oracle feature)
Provides a higher performance, more accurate, more feature rich, and more customizable alternative to the Oracle Context solution
Add on to Java DBMS's (such as Cloudscape JBMS)

Netscape/Hot Java/web browsers
Bundle into browser so instead of bookmarking you just click a button to "remember this page"

Imbed in E-mail programs
Eudora, Netscape mail, etc.

Desktop
Use as the engine for "desktop search" product
Imbed in Infoseek Express so it automatically indexes any page you like

Sample marketing strategies
Source supplied
Free usage if < 500 documents
Encourage Java programmers to download it to index their code/documentation and incorporate into their application
Set up a way for people to sell their Java Search Engine code extensions such as special recognizers, stemmers, XML parsers, etc. in Infoseek's store
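
To give a feel for what one of those extensions (the "number recognizer" mentioned under Features) might look like, here is a rough sketch. The overview does not show the actual programmer API, so every name below (Recognizer, NumberRecognizer, TermPipeline, the "NUM:" prefix) is made up for illustration and should not be read as Infoseek's interface. The idea is simply that a recognizer gets a chance to turn a raw token into a canonical index term before the default handling applies.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Small demo entry point for the sketch.
class RecognizerSketch {
    public static void main(String[] args) {
        TermPipeline pipeline = new TermPipeline();
        pipeline.register(new NumberRecognizer());
        // prints [revenue, rose, to, NUM:1000, from, NUM:950]
        System.out.println(pipeline.terms("Revenue rose to 1,000 from 950.0"));
    }
}

// Hypothetical extension hook; the real programmer API is not shown in the overview.
interface Recognizer {
    // Returns a canonical index term, or null if this recognizer does not apply.
    String recognize(String token);
}

// Hypothetical "number recognizer": "1,000", "1000" and "1000.0" become one term.
class NumberRecognizer implements Recognizer {
    // Plain digits, or digits grouped in thousands with commas, plus optional decimals.
    private static final Pattern NUMBER =
            Pattern.compile("-?(\\d{1,3}(,\\d{3})+|\\d+)(\\.\\d+)?");

    @Override
    public String recognize(String token) {
        if (!NUMBER.matcher(token).matches()) {
            return null;
        }
        BigDecimal value = new BigDecimal(token.replace(",", ""));
        return "NUM:" + value.stripTrailingZeros().toPlainString();
    }
}

// Hypothetical pipeline: the first recognizer that matches wins,
// otherwise the token is simply lowercased.
class TermPipeline {
    private final List<Recognizer> recognizers = new ArrayList<>();

    void register(Recognizer recognizer) {
        recognizers.add(recognizer);
    }

    List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(normalize(token));
            }
        }
        return out;
    }

    private String normalize(String token) {
        for (Recognizer recognizer : recognizers) {
            String term = recognizer.recognize(token);
            if (term != null) {
                return term;
            }
        }
        return token.toLowerCase(Locale.ROOT);
    }
}
```

With something along these lines, a query for 1000 would match a document that says "1,000", and a company recognizer or stemmer could presumably plug into the same kind of hook. How the real engine orders or combines recognizers is not something the overview says.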