
About Nutch

Nutch is an Open Source search engine written entirely in Java. It can index Web sites, intranets and file systems. It supports multiple file formats (HTML, PDF, MS Office, OpenOffice and several others) and can analyze text in more than fifteen languages.

The Apache Hadoop project makes it possible to index a virtually unlimited number of documents by deploying the work across a cluster of servers. Hadoop has been tested on a cluster of 2,000 servers, and preparations are now underway for a test on a cluster of 10,000 servers.
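Hadoop spreads this kind of work over a cluster using the MapReduce model: a map step processes each document independently and emits (term, document) pairs, and a reduce step groups those pairs by term into posting lists. The following is a minimal single-process sketch of that idea, assuming a toy whitespace tokenizer; `map_phase`, `reduce_phase` and `build_index` are hypothetical names for illustration, not Hadoop's or Nutch's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def map_phase(doc_id: str, text: str) -> Iterable[Tuple[str, str]]:
    """Emit one (term, doc_id) pair per distinct lowercase term (toy tokenizer)."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group pairs by term into sorted posting lists (shuffle + reduce)."""
    postings: Dict[str, Set[str]] = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    # On a real cluster each map call could run on a different server;
    # in this sketch they simply run one after another in one process.
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    return reduce_phase(pairs)


docs = {"d1": "Nutch runs on Hadoop", "d2": "Hadoop scales to many servers"}
index = build_index(docs)
print(index["hadoop"])  # ['d1', 'd2']
```

Because each map call only looks at its own document, adding servers lets more documents be processed in parallel; only the grouping step needs to bring pairs for the same term together.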

The Hadoop and Nutch projects are funded by Yahoo! and have very active communities that develop them at a rapid pace.


The Nutch search algorithms are open, so it is easy to understand why a given result is considered relevant and to adjust the implementation if necessary. Nutch also offers several plugins that provide, among other things, faceted search, query suggestions ("did you mean") and more.
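One common way to implement a "did you mean" suggestion is to compare a query term against a vocabulary of known terms and propose the closest one. The sketch below does this with Levenshtein edit distance over a small hypothetical vocabulary; it illustrates the general technique and is not the algorithm used by any particular Nutch plugin. The `max_distance` threshold of 2 is an arbitrary assumption.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def did_you_mean(term: str, vocabulary: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest known term within max_distance, or None if the term is known or nothing is close."""
    vocab = set(vocabulary)
    if term in vocab:
        return None
    # Tie-break alphabetically so the result is deterministic.
    best = min(vocab, key=lambda w: (levenshtein(term, w), w), default=None)
    if best is None or levenshtein(term, best) > max_distance:
        return None
    return best


vocabulary = ["hadoop", "nutch", "search", "index"]
print(did_you_mean("serch", vocabulary))  # search
print(did_you_mean("nutch", vocabulary))  # None
```

A production system would typically build the vocabulary from the index itself and weight candidates by how often they occur, but the distance comparison shown here is the core idea.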

Nutch's modular architecture makes it possible to create and adapt plugins that extend its behavior. Nutch ranks a search result based not only on its content but also on the links pointing to it.
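To illustrate how links can influence ranking alongside content, here is a minimal sketch that computes a simplified PageRank-style link score by power iteration and multiplies it with a query-dependent content score. Both the scoring formula and the product rule used to combine the two scores are assumptions chosen for illustration; Nutch's own scoring plugins work differently in detail.

```python
from typing import Dict, List


def link_scores(links: Dict[str, List[str]], damping: float = 0.85,
                iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of this page's score.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Page with no outlinks: spread its score evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


def final_score(content: Dict[str, float], links: Dict[str, List[str]]) -> Dict[str, float]:
    """Combine a content score with a link score (hypothetical product rule)."""
    ranks = link_scores(links)
    return {page: score * ranks.get(page, 0.0) for page, score in content.items()}


links = {"a": ["c"], "b": ["c"], "c": ["a"]}
content = {"a": 0.5, "c": 0.5}  # equal textual relevance for the query
scores = final_score(content, links)
print(max(scores, key=scores.get))  # c
```

Pages "a" and "c" match the query equally well on content, but "c" receives links from two pages, so the link score breaks the tie in its favor.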

Visit the Nutch Web site.
Visit the Hadoop Web site.

Quebec
418.353.3390
  3181 ch. Sainte-Foy,
Suite 220
Sainte-Foy, Quebec
Canada
G1X 1R3
Montreal
514.655.5185
  8925 St-Laurent,
Suite 118
Montreal, Quebec
Canada
H2N 1M5
Ottawa
613.316.7188
  601 Gilmour,
Suite 2
Ottawa, Ontario
Canada
K1R 5L7
International
514.655.5185
 
  info@doculibre.com