
Top 5 Open Source Search Engines

Published on 12 August 15
Most people know what a search engine is, what it does, and even how it uses keywords. But to delve deeper into the mechanics of familiar search engines such as Google and Yahoo, it helps to understand information retrieval through an open source search engine.

Unlike the consumer-facing search engines most people know, an open source search engine is best thought of as a component: the software behind the search box that does the work in an everyday search. For example, if you are on Google and want to find out more about the history of football (or soccer, if you are from the United States), you simply type in the keywords and a large array of results appears in less than a second. Within those few milliseconds, retrieval software of the kind an open source search engine provides looks up the keywords in indexes that were built in advance and spread across thousands of servers, then ranks the matching pages. The result is the Search Engine Results Page (SERP) we see today.
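To make the idea of an index concrete, here is a minimal sketch in Python of an inverted index, the basic structure most full-text search engines are built around. It is a toy, not how any particular engine is implemented: the sample documents and the function names `build_index` and `search` are made up for illustration, and words are split on whitespace only.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample collection, for illustration only.
docs = {
    1: "history of football in England",
    2: "football rules and tactics",
    3: "history of the printing press",
}
index = build_index(docs)
print(sorted(search(index, "history football")))
```

Because the index is built ahead of time, answering a query only means looking up a few words and intersecting their document sets, which is why results can come back so quickly.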

For a start-up trying to make its website prominent on the internet, an open source search engine can also be an alternative to the conventional search engines we all use. Google and Yahoo may not be practical for these businesses because of their costly fees and because they focus on well-established websites. Hence, many small enterprises choose open source search engines: they are free, the software is actively maintained, and you can customize the source code to suit your needs.

Now that you have a better idea of how an open source search engine works and why it is useful, let's dive into my recommended list of open source search engines and delve further into the technicalities.
1. Lucene
Lucene is one of the more established open source search engines: a text search engine library written entirely in Java. It can be used in any application that requires full-text search.

Lucene can be used across platforms, has a configurable storage engine (Codecs) and has many powerful query types such as proximity queries and phrase queries.
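Phrase and proximity queries both depend on the index remembering where each word occurs, not just which documents contain it. The Python sketch below illustrates that idea with a toy positional index. It is not Lucene's API or its exact matching rules: the function names are hypothetical, and this simplified version only matches words in the order given.

```python
from collections import defaultdict


def build_positions(text):
    """Map each lowercase word to the list of positions where it occurs."""
    positions = defaultdict(list)
    for pos, word in enumerate(text.lower().split()):
        positions[word].append(pos)
    return positions


def within(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` words in between."""
    return any(
        0 < b - a <= slop + 1
        for a in positions.get(first, [])
        for b in positions.get(second, [])
    )


def phrase_match(positions, first, second):
    """A two-word phrase is the special case with no words in between."""
    return within(positions, first, second, slop=0)


# Hypothetical sample text, for illustration only.
positions = build_positions("open source search engine library")
print(phrase_match(positions, "search", "engine"))
print(within(positions, "open", "search", slop=1))
```

A phrase query is simply the strictest proximity query, so one helper can serve both.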

The project is available as a free download, and Twitter uses Lucene for real-time search.

Programming Language: Originally Java, but ported to other languages such as Delphi, Perl, C#, C++, Python, Ruby, and PHP
License: Apache License 2.0 (Apache Software Foundation)
Ranking of search results: Versatile (follows popular choices)
Indexing style: multiple-index searching with merged results
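"Multiple-index searching with merged results" means a query is run against several indexes and the separate hit lists are combined into one ranking. As a rough sketch of the merging step only (not Lucene's own code), the Python below assumes each index returns `(score, doc_id)` pairs already sorted by descending score and that the scores are comparable across indexes. The function name `merge_results` is made up for illustration.

```python
import heapq
from itertools import islice


def merge_results(result_lists, limit):
    """Merge per-index hit lists, each already sorted by descending score."""
    merged = heapq.merge(*result_lists, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical hits from two separate indexes, for illustration only.
index_a_hits = [(0.9, "a1"), (0.4, "a2")]
index_b_hits = [(0.7, "b1"), (0.5, "b2"), (0.1, "b3")]
print(merge_results([index_a_hits, index_b_hits], limit=3))
```

Since every input list is already sorted, the merge never has to re-sort everything; it only compares the current best hit from each index.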


2. Sphinx
Sphinx is an open source full-text search server designed with relevance (search quality) and integration simplicity in mind.
Sphinx offers flexible text processing: its indexing features include full support for SBCS and UTF-8 encodings; stopword removal and optional hit position removal (hitless indexing); morphology and synonym processing through word forms dictionaries and stemmers; exceptions and blended characters; and many more.
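Two of these indexing features, stopword removal and word forms dictionaries, are easy to picture with a small example. The Python sketch below is only an illustration of the general technique, not Sphinx's implementation or configuration format. The stopword list, the word forms mapping and the function name `normalize` are all hypothetical.

```python
# Hypothetical stopword list and word forms dictionary, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and"}
WORDFORMS = {"engines": "engine", "searching": "search", "searches": "search"}


def normalize(text):
    """Lowercase the text, drop stopwords, and map word forms to a base form."""
    tokens = text.lower().split()
    kept = [token for token in tokens if token not in STOPWORDS]
    return [WORDFORMS.get(token, token) for token in kept]


print(normalize("Searching the engines of Sphinx"))
```

Applying the same normalization to both the documents and the queries is what lets a search for "search engine" find a page that only says "searching engines".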

Sphinx is easy to integrate with applications through three different APIs: a native library for many programming languages (SphinxAPI), a pluggable storage engine for MySQL (SphinxSE), and a query interface that uses the MySQL client library and SQL syntax (SphinxQL).
Websites such as Craigslist, Living Social, MetaCafe and Groupon have adopted Sphinx for their search.

Programming Language: C++
License: GPLv2 and commercial
Ranking of search results: Versatile
Indexing style: SQL database indexing and Non-SQL storage indexing.



3. Xapian
Xapian, described as an open source probabilistic information retrieval library, is a full-text search engine library for programmers.

It supports a rich set of structured Boolean search operators, and results are ranked using probabilistic weights. Boolean filters can also be applied to restrict a probabilistic search.
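The combination of probabilistic weights and Boolean filters can be sketched in a few lines of Python. This is an illustration of the general technique rather than Xapian's API or its actual weighting scheme: the weight used here is a simple smoothed inverse document frequency, and the data layout, tag names and function names are assumptions made for the example.

```python
import math


def idf(term, docs):
    """Rarer terms get a larger weight (a smoothed inverse document frequency)."""
    containing = sum(1 for doc in docs.values() if term in doc["words"])
    return math.log((len(docs) - containing + 0.5) / (containing + 0.5) + 1.0)


def ranked_search(docs, query_terms, required_tag=None):
    """Rank docs by summed term weights; optionally keep only docs with a tag."""
    scores = {}
    for doc_id, doc in docs.items():
        if required_tag is not None and required_tag not in doc["tags"]:
            continue  # Boolean filter: drops the doc without changing any weight
        score = sum(idf(term, docs) for term in query_terms if term in doc["words"])
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample collection, for illustration only.
docs = {
    "a": {"words": {"open", "source", "search"}, "tags": {"lang:en"}},
    "b": {"words": {"search", "probabilistic"}, "tags": {"lang:en"}},
    "c": {"words": {"search", "probabilistic", "retrieval"}, "tags": {"lang:de"}},
}
print(ranked_search(docs, ["search", "retrieval"]))
print(ranked_search(docs, ["search", "retrieval"], required_tag="lang:en"))
```

The filter decides which documents are eligible, while the weights decide the order of the ones that remain. A word that appears in every document, such as "search" here, contributes very little to the ranking.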

Xapian also supports synonyms, both explicitly in a query and as an automatic form of query expansion.
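Query expansion with synonyms can be pictured as turning each query word into a small group of alternatives. The Python sketch below shows the idea under simple assumptions. It does not use Xapian's API; the synonym table and the function names `expand` and `matches` are hypothetical.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"football": {"soccer"}, "car": {"automobile"}}


def expand(query_terms, synonyms=SYNONYMS):
    """Turn each query term into a group: the term plus any known synonyms."""
    return [{term} | synonyms.get(term, set()) for term in query_terms]


def matches(doc_words, groups):
    """A document matches if it contains at least one word from every group."""
    return all(group & doc_words for group in groups)


groups = expand(["football", "history"])
print(matches({"soccer", "history", "world"}, groups))
```

With expansion in place, a page that only mentions "soccer" can still be found by someone who typed "football".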

If you're looking for a packaged search engine built upon Xapian, you can install Omega on your site. A great aspect of Xapian is its versatility, which allows you to extend Omega to meet your needs as they grow.
Currently, Xapian is used as the search engine for the Library of the University of Cologne and Die Zeit (a popular German newspaper).

Programming Language: C++
License: GNU General Public License
Ranking of search results: Flexible (important words become more probable than unimportant words)
Indexing Style: Filing system


4. Indri
Indri is an open source search engine that offers state-of-the-art text search and a rich structured query language for text collections of up to 50 million documents (single machine) or 500 million documents (distributed search). Indri is multi-platform and runs on Linux, Solaris, Windows and Mac OS X.

Indri supports UTF-8 encoded text and can parse PDF, HTML, XML and TREC documents. It also recognizes text annotations.

Indri is the search engine component of the Lemur Toolkit, an open source (BSD license) software framework for building language modeling and information retrieval software. The toolkit was developed through a partnership between two institutions: the Center for Intelligent Information Retrieval at the University of Massachusetts Amherst and the Language Technologies Institute at Carnegie Mellon University.

Programming Language: C++ (with an API usable from Java, PHP, or C++)
License: BSD style license
Ranking of search results: Versatile (Explicit term weighting and Robust query language)
Indexing style: Flexible indexing with tokenization
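Explicit term weighting means the person writing the query can say how much each term should count. The Python sketch below shows one simple way this can affect ranking: a weighted average of smoothed log-probabilities of the query terms in a document. It is an assumption-laden illustration, not Indri's query language or its retrieval model. The function name, the `vocab_size` parameter and the additive smoothing are all choices made for the example.

```python
import math


def weighted_score(doc_words, weighted_terms, vocab_size, mu=1.0):
    """Weighted average of smoothed log-probabilities of the query terms."""
    total_weight = sum(weight for weight, _ in weighted_terms)
    score = 0.0
    for weight, term in weighted_terms:
        count = doc_words.count(term)
        # Additive smoothing so a term missing from the document avoids log(0).
        prob = (count + mu) / (len(doc_words) + mu * vocab_size)
        score += (weight / total_weight) * math.log(prob)
    return score


# Hypothetical sample documents, for illustration only.
doc1 = "football history football".split()
doc2 = "history of history".split()
query = [(2.0, "football"), (1.0, "history")]
print(weighted_score(doc1, query, vocab_size=3) > weighted_score(doc2, query, vocab_size=3))
```

Changing the weights changes the ranking: with "football" weighted more heavily the first document scores higher, while a query that favors "history" would prefer the second.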


5. Zettair
Written and designed by the Search Engine Group at RMIT University, Zettair is a compact and fast text search engine that allows you to index and search HTML (or TREC) collections. It is also designed for simplicity as well as speed and flexibility, and one of its fundamental features is the ability to handle large amounts of text.

Zettair's other features include Boolean, ranked and phrase querying, a modular C API and an easy-to-use command-line interface. It also runs on many platforms, including Solaris and Linux.

Programming language: C
License: BSD-style license
Ranking of search results: simple and straightforward
Indexing style: Single Executable (when an index doesn't exist, Zettair will create one for you based on the parameters you provide)


If you feel that a better, noteworthy open source search engine is missing from this list, please don't hesitate to leave a comment below!

References:
http://sphinxsearch.com/about/sphinx/































This blog is listed under Open Source, Development & Implementations and Mobility Community
