Apache Solr Overview for Beginners

Published on 24 October 17
Solr is an open-source search platform used to build search applications. It is built on top of Apache Lucene, a full-text search engine library. Solr is enterprise-ready, fast and highly scalable, and applications built with it can be sophisticated while still delivering high performance.

Yonik Seeley created Solr in 2004 to add search capabilities to the CNET Networks company website. In January 2006, it became an open-source project under the Apache Software Foundation. Solr 6.0, released in 2016, added support for executing parallel SQL queries.
Why Solr?
It isn’t really feasible to run blazing-fast search queries on very large SQL databases, for two reasons. First, SQL databases favor avoiding redundancy (normalization) over read performance, so a search would typically need JOINs in its SELECT. Second, the data in documents is essentially unstructured plain text, so the SELECT would also need LIKE. Both JOINs and LIKE are performance killers, which makes this approach a no-go for real-life search engines.
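To see why LIKE hurts, the sketch below uses SQLite (through Python's built-in sqlite3 module) as a stand-in for a large SQL database; the docs table and its rows are hypothetical. It asks SQLite for its query plan: because the pattern starts with a wildcard, the B-tree index on body cannot narrow the search, so every row (or index entry) has to be examined. Other database engines differ in detail, so treat this as an illustration rather than a benchmark.

```python
import sqlite3

# Hypothetical toy table: one row per document, raw text in a single column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
# A B-tree index on the text column; a leading wildcard keeps SQLite
# from using it to narrow the search.
conn.execute("CREATE INDEX idx_body ON docs (body)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("Solr is built on Lucene",),
        ("SQL databases use B-trees",),
        ("Lucene uses an inverted index",),
    ],
)

query = "SELECT id FROM docs WHERE body LIKE ? ORDER BY id"
pattern = "%lucene%"  # SQLite's LIKE is case-insensitive for ASCII by default

# Each EXPLAIN QUERY PLAN row ends with a human-readable description of one step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (pattern,))]
matches = [row[0] for row in conn.execute(query, (pattern,))]

print(plan)     # the access step reads SCAN (examine everything), not SEARCH (index lookup)
print(matches)  # ids of documents whose body contains "lucene"
```

On three rows the scan is instant, but its cost grows with the total amount of text stored, which is exactly what a search engine needs to avoid.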

Therefore, most search engines look at data in a way that is very different from SQL: the inverted index. This data structure is a glorified dictionary where:

keys are individual terms
values are lists of the documents that contain each term
Nothing fancy, but this view of the data makes for very fast searches in very high-volume databases. Note that the term ‘document’ is used loosely here: it refers to a field-structured view of the original document (see below).
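A minimal sketch of the dictionary described above, in plain Python: keys are terms, values are sorted lists of document ids (often called posting lists). The lowercase-and-split tokenizer is a deliberate simplification of the text analysis Solr and Lucene actually perform, and the sample documents are made up. A multi-term query becomes an intersection of the terms' lists, so its cost depends on the size of those lists rather than on the total amount of text.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive analysis: lowercase the text and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted ids of the documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """AND semantics: return ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Solr is built on Lucene",
    2: "SQL databases use B-trees",
    3: "Lucene uses an inverted index",
}
index = build_inverted_index(docs)
print(index["lucene"])                       # [1, 3]
print(search_all(index, "Lucene inverted"))  # [3]
```

Building the index costs one pass over the text, paid once at indexing time; each query afterwards is a handful of dictionary lookups and set intersections.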

Index structure
Though Solr belongs to the NoSQL database family, it is not schemaless. Schema configuration takes place in a dedicated schema.xml file, where individual fields must be defined, each with its type. Different document types may differ in structure and have few, if any, fields in common. In that case, each document type can be given its own index with its own schema.
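The idea that fields must be declared up front can be modeled in a few lines. This is a toy, not Solr code: BOOK_SCHEMA and validate are hypothetical names, and Python types stand in for Solr field types. It mirrors the way Solr rejects a document containing a field its schema does not define (unless dynamic fields or schemaless mode are configured).

```python
from datetime import date

# Hypothetical schema for a "book" document type: field name -> expected Python type.
BOOK_SCHEMA = {"id": str, "title": str, "content": str, "published": date}


def validate(doc: dict, schema: dict) -> None:
    """Raise if the document uses an undeclared field or a value of the wrong type."""
    for name, value in doc.items():
        if name not in schema:
            raise ValueError(f"unknown field '{name}'")
        if not isinstance(value, schema[name]):
            raise TypeError(f"field '{name}' expects {schema[name].__name__}")


# Passes: every field is declared and has the declared type.
validate({"id": "b1", "title": "Example Book", "published": date(2017, 1, 1)}, BOOK_SCHEMA)

# Fails: "isbn" is not part of the schema.
try:
    validate({"id": "b2", "isbn": "000"}, BOOK_SCHEMA)
except ValueError as err:
    print(err)  # unknown field 'isbn'
```

Giving each document type its own schema dictionary corresponds to giving each type its own index.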

Predefined types such as strings, integers and dates are available out of the box. Fields can be declared searchable (indexed) and/or stored (returned in query results). For example, a book document could include not only the book's content but also its author(s), publisher(s), publication date, etc.
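Solr's schema.xml declares each field with a field element whose attributes include name, type, indexed, stored and multiValued. The sketch below uses Python's xml.etree.ElementTree to generate such declarations for a hypothetical book document. The field names are illustrative, and the type names string, text_general and pdate are assumed from Solr's default configset; type names vary between Solr versions, so check them against your own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical book fields: (name, Solr field type, indexed, stored, multiValued).
# The type names are assumptions taken from Solr's default configset.
BOOK_FIELDS = [
    ("id", "string", True, True, False),
    ("content", "text_general", True, False, False),  # searchable, but not returned in results
    ("author", "string", True, True, True),           # a book may have several authors
    ("publisher", "string", True, True, True),
    ("published", "pdate", True, True, False),
]


def field_element(name: str, field_type: str, indexed: bool, stored: bool, multi: bool) -> ET.Element:
    """Build one <field/> declaration; XML attribute values must be strings."""
    return ET.Element(
        "field",
        {
            "name": name,
            "type": field_type,
            "indexed": str(indexed).lower(),
            "stored": str(stored).lower(),
            "multiValued": str(multi).lower(),
        },
    )


fields = [field_element(*spec) for spec in BOOK_FIELDS]
for element in fields:
    print(ET.tostring(element, encoding="unicode"))
```

Setting stored="false" on a large text field such as content keeps it searchable while saving space, at the cost of not being able to return its text in results.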