www.doculibre.com
About Nutch
Nutch is an open source search engine written entirely in Java. It can index Web sites, intranets and file systems. It supports many file formats (HTML, PDF, MS Office, OpenOffice and several others) and can analyze text in more than fifteen languages.
Nutch runs on the Apache Hadoop project, which makes it possible to index a virtually unlimited number of documents by distributing the work across a cluster of servers. Hadoop has been tested on a cluster of 2,000 servers, and preparations are now underway for a test on a cluster of 10,000 servers.
The Hadoop and Nutch projects are funded by Yahoo! and have very active communities that develop them at a rapid pace.
Because Nutch's search algorithms are open, it is easy to understand why a given result is considered relevant and to adjust the implementation if necessary. Nutch also offers several plugins that provide, among other things, faceted search and alternative query suggestions ("did you mean").
Nutch's modular architecture makes it possible to create and adapt plugins that extend its behavior. Nutch ranks a search result based not only on its content but also on the links pointing to it.
Visit the Nutch Web site.
Visit the Hadoop Web site.

Quebec
418.353.3390
3181 ch. Sainte-Foy, Suite 220
Sainte-Foy, Quebec Canada G1X 1R3

Montreal
514.655.5185
8925 St-Laurent, Suite 118
Montreal, Quebec Canada H2N 1M5

Ottawa
613.316.7188
601 Gilmour, Suite 2
Ottawa, Ontario Canada K1R 5L7

International
514.655.5185

info@doculibre.com