Search Service Frameworks Evaluation

Search engines use indexing to store information about web pages, enabling them to quickly return relevant, high-quality results.
Indexing is the process by which search engines organize information before a search to enable super-fast responses to queries.

Scanning every individual page for keywords and topics at query time would be far too slow a way to find relevant information. Instead, search engines (including Google) use an inverted index, also known as a reverse index, which maps each term to the pages that contain it.
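
To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration only: the sample documents and the function names build_inverted_index and search are made up for this example, and real engines add tokenization, stemming, ranking and compression on top of this.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample corpus, used only for illustration.
docs = {
    1: "fast search engine",
    2: "search library in Java",
    3: "cloud search service",
}
index = build_inverted_index(docs)
print(search(index, "search engine"))
```

A lookup only touches the entries for the query terms, so the cost of a query does not grow with the number of pages that are irrelevant to it.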

XX: Should we add an intro about “How they work” and “What do we use them for in general”? I mean, I didn’t get the point of using an index. How should we add it, what should it be, etc.?

The following libraries and engines provide search services. We will discuss each one individually, then compare them on several factors:

1 – Lucene

Apache Lucene is a free and open-source search engine software library, originally written completely in Java.

It is supported by the Apache Software Foundation and is released under the Apache Software License.

Lucene has been ported to other programming languages including Object Pascal, Perl, C#, C++, Python, Ruby and PHP.

2 – Solr

Solr (pronounced “solar”) is an open-source enterprise-search platform, written in Java, from the Apache Lucene project. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages.
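
As a rough sketch of what using that HTTP API looks like, the snippet below builds a query URL for Solr's select handler. The host, the collection name "articles" and the helper name solr_select_url are assumptions made for illustration; 8983 is Solr's default port, and the q, rows and wt parameters are standard query parameters.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, query, rows=10):
    """Build a URL for a Solr select query that asks for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{params}"


# "articles" is a hypothetical collection name.
url = solr_select_url("http://localhost:8983", "articles", "title:lucene")
print(url)
```

Any HTTP client can then issue a GET request to this URL, which is why Solr is usable from most programming languages.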

3 – Elasticsearch

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java. Following an open-core business model, parts of the software are licensed under various open-source licenses (mostly the Apache License), while other parts fall under the proprietary (source-available) Elastic License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine followed by Apache Solr, also based on Lucene.
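
For illustration, a full-text query against Elasticsearch is a JSON document sent over HTTP to an index's _search endpoint. The sketch below only builds such a body with a match query. The field name "title" and the helper name build_match_query are assumptions for this example, and sending the request is left to whichever HTTP client or official client you use.

```python
import json


def build_match_query(field, text, size=10):
    """Build the body of a simple full-text match query."""
    return {"size": size, "query": {"match": {field: text}}}


# "title" is a hypothetical field name.
body = json.dumps(build_match_query("title", "search engine"))
print(body)
```

The same body would be sent in a POST request to a path of the form /<index>/_search on the cluster.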

4 – Sphinx

Sphinx is a full-text search engine that can be used either as a stand-alone server or as a storage engine (“SphinxSE”) for the MySQL family of databases. When run as a stand-alone server, Sphinx operates similarly to a DBMS and can communicate with MySQL, MariaDB and PostgreSQL through their native protocols, or with any ODBC-compliant DBMS via ODBC. MariaDB, a fork of MySQL, is distributed with SphinxSE.

If Sphinx is run as a stand-alone server, an application can connect to it through SphinxAPI. Official implementations of the API are available for PHP, Java, Perl, Ruby and Python. Unofficial implementations for other languages, as well as various third-party plugins and modules, are also available. Other data sources can be indexed via a pipe in a custom XML format.

5 – Amazon CloudSearch

Amazon CloudSearch is a scalable cloud-based search service that forms part of Amazon Web Services (AWS). CloudSearch is typically used to integrate customized search capabilities into other applications. According to Amazon, developers can set a search application up and deploy it fully in less than an hour.

6 – Amazon Elasticsearch Service (Amazon ES)

With Amazon Elasticsearch Service, you pay only for what you use. There is no minimum fee or usage requirement. You are charged only for Amazon Elasticsearch Service instance hours, Amazon EBS storage (if you choose this option), and data transfer.

This table compares these frameworks:


Feature | Lucene | Solr | Elasticsearch | Sphinx | Amazon CloudSearch | Amazon Elasticsearch
Autocomplete | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
Auto-suggestion | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
Recommendation | ✔️ in five of the six columns (the blank column is not identifiable)
Support Arabic | ✔️ in five of the six columns (the blank column is not identifiable)
Memory size per million documents | 125% – 150% of docs size in five of the six columns (the blank column is not identifiable)
Disk size | size of docs * 2 in four of the six columns (the blank columns are not identifiable)
Cost | Free | Free | Standard: $16/month | Free | Pay only for what you use | Pay only for what you use
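
The memory and disk figures in the table are rules of thumb (memory of 125% to 150% of the document size, disk of about twice the document size). As a small worked example, the sketch below applies them to a given corpus size. The function name estimate_resources and the 4 GB input are made up for illustration, and actual requirements depend on the engine and its configuration.

```python
def estimate_resources(docs_size_gb):
    """Apply the table's rules of thumb to a corpus size given in GB."""
    return {
        "memory_gb_min": docs_size_gb * 1.25,  # 125% of docs size
        "memory_gb_max": docs_size_gb * 1.5,   # 150% of docs size
        "disk_gb": docs_size_gb * 2,           # size of docs * 2
    }


# Hypothetical corpus of 4 GB of documents.
estimate = estimate_resources(4)
print(estimate)
```

For a 4 GB corpus this gives 5 to 6 GB of memory and 8 GB of disk.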

XX: So Lucene, Solr, Sphinx, and Elasticsearch are totally free? Or is only the software free?

XX: Let’s add a comparison of the pricing, and of the free tier if available, please.

The response time depends on the hardware you use.

Why is Elasticsearch paid?

The Elasticsearch code itself is open source. What you pay for is managed hosting from elastic.co, which is priced according to several variables. You can find the pricing here.
If you prefer the open-source version, you can stand up your own servers and manage your own deployment; the code is available at no cost and can be found here.

AWS ELASTICSEARCH VS AWS CLOUDSEARCH

There is a useful comparison here.

Useful comparison: Amazon CloudSearch vs ElasticSearch vs Apache Solr Comparison in detail.

Conclusion

The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power.

Based on my research, I think the best candidates are Amazon CloudSearch, Amazon Elasticsearch, and possibly the paid version of Elasticsearch.

Did you know that we use all of this, along with other AI technologies, in our app? See what you have just read applied in action: try our Almeta News app. You can download it from Google Play: https://play.google.com/store/apps/details?id=io.almeta.almetanewsapp&hl=ar_AR
