What is Elasticsearch?
Elasticsearch is an open source, distributed, readily scalable, enterprise-grade search engine. It is built on Apache Lucene, an open source library for high-performance, full-featured text search. Because Lucene is only a library, integrating it directly with existing applications takes a lot of coding; Elasticsearch wraps it in a ready-to-use server. Elasticsearch is developed in Java and was originally released as open source under the terms of the Apache License. It can index and search documents in diverse formats, and it was designed for distributed environments, providing flexibility and scalability. Elasticsearch is one of the most popular enterprise search engines, alongside Apache Solr, which is also based on Lucene.
What are the features of Elasticsearch?
- Elasticsearch is open source
- It is very fast.
- Good defaults for complex Lucene classes
- Reindex API
- Real-time Application Monitoring
- It uses denormalization to improve the search performance.
- It can be used as a replacement for document stores like MongoDB and RavenDB.
- Elasticsearch is schema-free and document-oriented. For many business applications, these are important technical innovations compared to legacy enterprise search engines.
- Elasticsearch-hadoop uses the Elasticsearch REST interface for communication, allowing for flexible deployments by minimizing the number of ports that need to be open within a network.
- Elasticsearch works with a wide range of data connectors that are readily available or custom-built, enabling you to search across multiple repositories efficiently.
What is the query language of Elasticsearch?
Elasticsearch uses Query DSL, a JSON-based query language built on top of the query capabilities of Apache Lucene.
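As a rough illustration, a Query DSL request is just a JSON document. The sketch below builds one as a Python dictionary and serialises it. The field names title and status and the search text are hypothetical, chosen only for this example; only the overall shape of the query is the point.

```python
import json

# Hypothetical field names ("title", "status"); only the structure matters here.
query_body = {
    "query": {
        "bool": {
            # "must" clauses are analyzed full-text conditions.
            "must": [{"match": {"title": "distributed search"}}],
            # "filter" clauses restrict results by exact value.
            "filter": [{"term": {"status": "published"}}],
        }
    }
}

# Serialise to the JSON text that would be sent as the body of a search request.
payload = json.dumps(query_body, indent=2)
print(payload)
```

The same dictionary could be passed to any HTTP client as the request body; no client library is assumed here.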
What are the Basic Concepts of Elasticsearch?
The basic concepts of Elasticsearch include nodes, clusters, near-real-time search, indexes, shards, mapping types, documents, the RESTful API, and more.
Node: A node is a single server that holds some data and participates in the cluster’s indexing and querying. A node is simply one running Elasticsearch instance, comparable to a running instance of MySQL. A node is configured to join a specific cluster by that cluster’s name, and a single cluster can have as many nodes as we want. Several MySQL instances can run on one machine on different ports, whereas generally one Elasticsearch instance runs per machine. Because Elasticsearch uses distributed computing, running nodes on separate machines helps, as more hardware resources become available.
Cluster: A cluster is a collection of one or more nodes. It provides collective indexing and search capabilities across all of its nodes for the entire data set. In relational database terms, a node corresponds to a DB instance. There can be N nodes with the same cluster name.
Near-Real-Time (NRT): Elasticsearch is an NRT search platform. There is a slight latency (normally about one second) between the time you index a document and the time it becomes searchable.
Index: An index is a collection of documents that have similar characteristics. For example, we can have one index for customer data and another for product information. An index is identified by a unique name, which is used to refer to it when performing indexing, search, update, and delete operations. In a single cluster, we can define as many indexes as we want.
Shard: A shard is a subset of the documents of an index. Indexes are horizontally subdivided into shards, so an index can be divided into many shards. Each shard contains all the properties of a document but holds fewer JSON objects than the whole index. This horizontal separation makes each shard an independent unit that can be stored on any node. A primary shard is an original horizontal partition of an index, and primary shards are then copied into replica shards.
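A simplified sketch of how a document could be assigned to a shard: conceptually, a hash of the routing value (the document ID by default) is taken modulo the number of primary shards. The function name pick_shard and the document IDs are hypothetical, and zlib.crc32 is only a stand-in hash for illustration, not the hash function Elasticsearch itself uses.

```python
import zlib

def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number in [0, num_primary_shards)."""
    # crc32 is a stand-in hash for this sketch, not Elasticsearch's own.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# Hypothetical document IDs spread over three primary shards.
doc_ids = ["user-1", "user-2", "user-3", "user-4", "user-5"]
placement = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
print(placement)
```

Because the result depends on the number of primary shards, the same ID would land on a different shard if that number changed, which is one reason the primary shard count is normally fixed when an index is created.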
Mapping Type: A mapping type is comparable to a database table in an RDBMS. It is a collection of documents in the same index that share a set of common fields. For example, if an index contains the data of a social networking application, there can be one type for user profile data, another for messaging data, and another for comments data. Mapping types were deprecated in Elasticsearch 6.x and removed in later releases, so this concept applies to older versions.
Document: A document is a collection of fields defined in JSON format. Every document belongs to a type and resides inside an index. Every document is associated with a unique identifier, called the UID.
RESTful API: Elasticsearch is driven by a RESTful API. Almost every action can be performed through this API by sending JSON over HTTP.
What is a Replica in Elasticsearch?
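A replica is a copy of a primary shard. By default, each primary shard in Elasticsearch has one replica, which gives two copies of the data in total; the number of replicas can be configured per index. Replicas serve the purpose of high availability and fault tolerance.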
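As a small sketch, the index settings number_of_shards and number_of_replicas control how many primary shards an index has and how many replicas each primary gets. The values below and the helper total_shard_copies are illustrative assumptions, used only to show how the total number of shard copies follows from the two settings.

```python
import json

def total_shard_copies(primaries: int, replicas_per_primary: int) -> int:
    """Each primary shard plus its replicas counts as one copy each."""
    return primaries * (1 + replicas_per_primary)

# Example settings body for creating an index (values chosen for illustration).
index_settings = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

settings = index_settings["settings"]
copies = total_shard_copies(settings["number_of_shards"], settings["number_of_replicas"])
print(json.dumps(index_settings))
print(copies)  # 3 primaries + 3 replicas = 6
```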
What are the core field types in Elasticsearch?
- Boolean
- String (split into text and keyword in newer versions)
- Numeric
- Date
What are the predefined fields in Elasticsearch?
Predefined fields provide metadata about the document. We don’t need to populate these fields ourselves. For example, _timestamp records when a document was indexed. Predefined fields always begin with _ (underscore).
What is an inverted index in Elasticsearch?
The inverted index is the heart of a search engine. The primary goal of a search engine is to provide speedy searches that find the documents in which our search terms occur, even among millions of documents. An inverted index is a hash-map-like data structure that leads from a word to the documents or web pages that contain it. The index at the back of a book works the same way: given a word, we can find the pages on which that word appears.
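The idea can be sketched in a few lines of Python. This is a toy version that lowercases text and splits on whitespace, which is far simpler than what Lucene does; the function name and the sample documents are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Set

def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analysis: lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)

# Made-up sample documents, keyed by document ID.
docs = {
    1: "fast search engine",
    2: "distributed search",
    3: "fast distributed storage",
}
index = build_inverted_index(docs)
print(sorted(index["search"]))  # [1, 2]
print(sorted(index["fast"]))    # [1, 3]
```

Looking up a word is a single dictionary access, regardless of how many documents were indexed, which is why this structure makes searches quick.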
What are the basic operations you can perform on a document?
The following operations can be performed on documents:
- Indexing a document using Elasticsearch.
- Fetching documents using Elasticsearch.
- Updating documents using Elasticsearch.
- Deleting documents using Elasticsearch.
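The four operations map onto HTTP requests. The sketch below only builds the method, path, and body for each one; it does not contact a server. The paths follow the typeless _doc and _update endpoints of Elasticsearch 7 and later (older versions included a type name in the path). The helper function names, the index name customers, and the sample document are hypothetical.

```python
def index_request(index, doc_id, document):
    """Create or replace a whole document."""
    return ("PUT", f"/{index}/_doc/{doc_id}", document)

def get_request(index, doc_id):
    """Fetch a document by ID."""
    return ("GET", f"/{index}/_doc/{doc_id}", None)

def update_request(index, doc_id, partial_document):
    """Partially update a document; changed fields go under the doc key."""
    return ("POST", f"/{index}/_update/{doc_id}", {"doc": partial_document})

def delete_request(index, doc_id):
    """Delete a document by ID."""
    return ("DELETE", f"/{index}/_doc/{doc_id}", None)

# Hypothetical index name and document.
requests_to_send = [
    index_request("customers", 1, {"name": "Ann", "city": "Oslo"}),
    get_request("customers", 1),
    update_request("customers", 1, {"city": "Bergen"}),
    delete_request("customers", 1),
]
for method, path, body in requests_to_send:
    print(method, path, body)
```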
How is a document uniquely identified in Elasticsearch?
A document is uniquely identified by the combination of its index name, type, and document ID.
What is horizontal scaling in Elasticsearch?
Adding more nodes to the same cluster is called horizontal scaling, because requests are distributed across the nodes.
What is vertical scaling in Elasticsearch?
Vertical scaling means adding more resources, such as RAM or processors, to a node. This usually improves performance, up to the limits of a single machine.
Can you explain Analyzers in Elasticsearch?
Elasticsearch ships with a wide range of built-in analyzers, which can be used in any index without further configuration:
Standard Analyzer: It divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.
Simple Analyzer: It divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.
Whitespace Analyzer: It divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.
Stop Analyzer: It is like the simple analyzer, but also supports removal of stop words.
Keyword Analyzer: It is a “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.
Pattern Analyzer: It uses a regular expression to split the text into terms. It supports lower-casing and stop words.
Language Analyzers: Elasticsearch provides many language-specific analyzers, such as english or french.
Fingerprint Analyzer: It is a specialist analyzer which creates a fingerprint which can be used for duplicate detection.
Custom analyzers: If you do not find an analyzer suitable for your needs, you can create a custom analyzer which combines the appropriate character filters, tokenizer, and token filters.
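To make the differences concrete, here is a toy Python imitation of four of the analyzers described above. These functions only approximate the described behaviour and are not the Elasticsearch implementations; the function names, the stop-word list, and the sample sentence are assumptions made for the example.

```python
import re

def whitespace_analyzer(text):
    """Split on whitespace only; terms keep their case."""
    return text.split()

def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase."""
    return [term.lower() for term in re.findall(r"[^\W\d_]+", text)]

# Small illustrative stop-word list, not the real default list.
STOP_WORDS = {"the", "a", "an", "and", "is"}

def stop_analyzer(text):
    """Like the simple analyzer, but also drops stop words."""
    return [term for term in simple_analyzer(text) if term not in STOP_WORDS]

def keyword_analyzer(text):
    """Return the whole input unchanged as a single term."""
    return [text]

sample = "The QUICK brown-fox is 2 fast"
print(whitespace_analyzer(sample))  # ['The', 'QUICK', 'brown-fox', 'is', '2', 'fast']
print(simple_analyzer(sample))      # ['the', 'quick', 'brown', 'fox', 'is', 'fast']
print(stop_analyzer(sample))        # ['quick', 'brown', 'fox', 'fast']
print(keyword_analyzer(sample))     # ['The QUICK brown-fox is 2 fast']
```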
What is the from component in a search request?
The from component sets the offset of the first document to return, counting from 0. It is used for pagination. For example, if a search matches 40 documents and we want the results starting from the 21st document, from is set to 20.
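A minimal sketch of pagination with from and size follows. The helper search_page and the use of a match_all query are assumptions for illustration; the list of 40 integers stands in for 40 matching documents so the effect of the offset can be seen without a server.

```python
def search_page(page_number, page_size):
    """Build a search body for a 1-based page number."""
    return {
        "from": (page_number - 1) * page_size,  # offset of the first hit
        "size": page_size,                      # number of hits to return
        "query": {"match_all": {}},
    }

# Stand-in for 40 matching documents.
all_hits = list(range(40))

body = search_page(page_number=2, page_size=20)
page = all_hits[body["from"]: body["from"] + body["size"]]
print(body["from"])       # 20
print(page[0], page[-1])  # 20 39
```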
What are some built-in token filters in Elasticsearch?
- Standard
- Length
- Lowercase
- Stop
- Reverse
- ASCII Folding
- Unique
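The effect of chaining token filters can be imitated in plain Python. The functions below are rough approximations written for this example, not the Elasticsearch implementations; in particular, the ASCII folding here only strips accents via Unicode decomposition, which covers fewer characters than the real filter.

```python
import unicodedata

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def ascii_folding_filter(tokens):
    """Approximation: decompose characters and drop the non-ASCII parts."""
    return [
        unicodedata.normalize("NFKD", t).encode("ascii", "ignore").decode("ascii")
        for t in tokens
    ]

def length_filter(tokens, min_len=2, max_len=20):
    """Keep only tokens whose length is within the given bounds."""
    return [t for t in tokens if min_len <= len(t) <= max_len]

def unique_filter(tokens):
    """Drop repeated tokens, keeping the first occurrence."""
    seen, result = set(), []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

def reverse_filter(tokens):
    return [t[::-1] for t in tokens]

# "Caf\u00e9" is "Cafe" with an accented final letter.
tokens = ["Caf\u00e9", "a", "Search", "search", "caf\u00e9"]
for token_filter in (lowercase_filter, ascii_folding_filter, length_filter, unique_filter):
    tokens = token_filter(tokens)
print(tokens)                  # ['cafe', 'search']
print(reverse_filter(tokens))  # ['efac', 'hcraes']
```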
What is character filtering?
Character filtering transforms particular characters into other characters before the text is tokenized. For example, it can convert & to and.
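A toy version of such a character filter, loosely modelled on the idea of a mapping-based filter, could look like this. The function name mapping_char_filter and the sample text are hypothetical.

```python
def mapping_char_filter(text, mappings):
    """Replace each source string with its target before tokenization."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text

filtered = mapping_char_filter("Tom & Jerry", {"&": "and"})
print(filtered)  # Tom and Jerry
```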
What is the difference between match query and term query in Elasticsearch?
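A match query analyzes the input text and builds basic queries from the resulting terms. For example, if we search for name:ABC, the input is analyzed to abc, so a document whose name was indexed as abc is returned in the results. A term query performs an exact match against the indexed terms without analyzing the input, so a term query for ABC does not return the document indexed as abc.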
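The difference can be imitated with a toy index in Python. This sketch assumes a very simple analyzer (lowercase plus whitespace split) and made-up documents; the function names are hypothetical and the code only mimics the behaviour described above.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()

# Terms indexed for a hypothetical "name" field, keyed by document ID.
indexed_terms = {
    1: set(analyze("ABC Corp")),
    2: set(analyze("xyz ltd")),
}

def match_query(text):
    """Analyze the input first, then look for any resulting term."""
    terms = analyze(text)
    return sorted(
        doc_id for doc_id, doc_terms in indexed_terms.items()
        if any(term in doc_terms for term in terms)
    )

def term_query(term):
    """Compare the input to the indexed terms exactly as given."""
    return sorted(
        doc_id for doc_id, doc_terms in indexed_terms.items()
        if term in doc_terms
    )

print(match_query("ABC"))  # [1]
print(term_query("ABC"))   # []
print(term_query("abc"))   # [1]
```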
What is the use of the attributes enabled, index, and store?
The enabled attribute applies to various Elasticsearch-specific (system-created) fields such as _index and _size. User-supplied fields do not have an “enabled” attribute.
Store means the data is stored by Lucene, which will return it if asked. Stored fields are not necessarily searchable. By default, individual fields are not stored, but the full source is. In most cases the defaults make sense, so simply do not set the store attribute.
The index attribute controls searching: only indexed fields can be searched. The two attributes are separate because indexed fields are transformed during analysis, so the original data cannot be retrieved from the index if it is required.
What is a filter in Elasticsearch?
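The term is used in two ways. During analysis, after text is processed by the tokenizer, the resulting tokens are processed by token filters before indexing. In the Query DSL, a filter restricts the set of matching documents without contributing to the relevance score. The following query filters are available in Elasticsearch 1.x: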
- And filter
- Bool filter
- Exists filter
- Geo bounding box filter
- Geo distance filter
- Geo distance range filter
- Geo polygon filter
- Geoshape filter
- Geohash cell filter
- Has child filter
- Has parent filter
- Ids filter
- Indices filter
- Limit filter
- Match all filter
- Missing filter
- Nested filter
- Not filter
- Or filter
- Prefix filter
- Query filter
- Range filter
- Regexp filter
- Script filter
- Term filter
What are the pros and cons of Elasticsearch?
Pros:
- Lucene is an open source search engine library. Elasticsearch is built on top of Lucene, a full-featured information retrieval library, so it provides some of the most powerful full-text search capabilities of any open source product.
- Elasticsearch implements a lot of features, such as customized splitting of text into words, customized stemming, faceted search, etc.
- It is API driven; actions can be performed using a simple RESTful API, so an application doesn’t need to be written in Java to work with Elasticsearch. It has a powerful JSON-based DSL, and it allows you to send data over HTTP in JSON to index, search, and manage your Elasticsearch cluster.
- Scalability is simple. Since it is schema-less, it accepts all types of data.
- Elasticsearch can execute complex queries extremely fast, and complex bespoke search functionality can be set up efficiently.
- Elasticsearch records any changes in transaction logs on multiple nodes in the cluster to minimize the chance of data loss.
- The simplicity of managing Elasticsearch is a big plus. We’re able to integrate routine processes such as building indices straight into our automated deployment process quickly and easily.
- Creating full backups is easy using the gateway concept, which is present in Elasticsearch.
Cons:
- Older versions of Elasticsearch do not have any built-in authentication or authorization system.
- Elasticsearch is not an ACID compliant system.
- Elasticsearch queries are normally written in the JSON-based Query DSL rather than in SQL.
- Elasticsearch is not a relational database, so if your data would benefit from things like foreign-key constraints, Elasticsearch is not a good choice as your primary data store.
- The distributed nature of Elasticsearch can have negative effects on data consistency.