Elasticsearch

cheat sheet and summary


https://www.elastic.co/guide/index.html

https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html

 

https://www.edureka.co/blog/elasticsearch-tutorial/

https://www.elastic.co/training/free#quick-starts

 

https://compose.com/articles/elasticsearch-query-time-strategies-and-techniques-for-relevance-part-i/

https://spoon-elastic.com/all-elastic-search-post/advanced-usage/boolean-query-with-elasticsearch-influence-elasticsearch-scoring-part-1/

https://www.runtastic.com/blog/en/increasing-search-engine-relevance-elasticsearch/

 

 

Inside:

Uses lucene

Index : database

An index is a collection of documents

Type : table (mapping types were deprecated in 7.x and removed in 8.0)

Document : same as json Document

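Shard : a Lucene index; each Elasticsearch index is split into one or more shards

Fuzzy query : matches terms within a small edit distance (number of character differences) of the search term

Analyzer : converts text into tokens or terms

Explain : shows how the score of a result was computed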

ELK

Logstash

Kibana, Machine Learning

https://logz.io/learn/complete-guide-elk-stack/

 

Visualization, monitoring, Beats : Heartbeat (uptime monitoring) and other Beats, graphs

Uses Zen Discovery (instead of ZooKeeper)

Shard = lucene index = group of segments

 

 

 

Term vectors: get detailed info about the terms of a document: tf, idf, ...

 

Task status, cancel

 

Relationships can be modeled with parent/child (join field) and nested fields
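
A minimal sketch of both relationship styles, written as Python dicts that mirror the JSON request bodies. The field names (comments, author, qa_relation, question, answer) are hypothetical and only illustrate the shape of the mappings.

```python
import json

# Nested: each comment is indexed as its own hidden document, so "author"
# and "text" of the same comment stay paired at query time.
nested_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "keyword"},
                    "text": {"type": "text"},
                },
            },
        }
    }
}

# Parent/child: a join field declares that "question" is the parent of "answer".
join_mapping = {
    "mappings": {
        "properties": {
            "qa_relation": {
                "type": "join",
                "relations": {"question": "answer"},
            }
        }
    }
}

# A nested query names the path of the nested field it searches in.
nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"comments.author": "alice"}},
                        {"match": {"comments.text": "shard"}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(nested_query, indent=2))
```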

 

Similarity

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html#index-modules-similarity

 

APM

Application Performance Monitoring

 

Mapping

Schema for an index. More dynamic than a SQL schema, as it can have virtual fields (e.g. runtime fields)

Data type

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
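
A small mapping sketch as a Python dict mirroring the JSON body for index creation. It assumes that "virtual fields" above means runtime fields; the field names (title, price, price_with_tax) and the tax factor are made up for illustration.

```python
import json

mapping = {
    "mappings": {
        # Runtime field: computed at query time, not stored in the index.
        "runtime": {
            "price_with_tax": {
                "type": "double",
                "script": {"source": "emit(doc['price'].value * 1.2)"},
            }
        },
        "properties": {
            # text is analyzed for full-text search; the keyword sub-field
            # keeps the exact value for sorting and aggregations.
            "title": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            },
            "price": {"type": "double"},
            "created_at": {"type": "date"},
            "in_stock": {"type": "boolean"},
        },
    }
}

print(json.dumps(mapping, indent=2))
```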

 

Query DSL

Search collapse

Highlight: shows which parts of a document matched the query

Async search

Sort result: https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html

Query context, Filter context
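
A sketch of the two contexts in one bool query, as a Python dict mirroring the JSON body. Clauses under must run in query context and contribute to the score; clauses under filter run in filter context, only include or exclude documents, and can be cached. Field names and values are hypothetical.

```python
import json

bool_query = {
    "query": {
        "bool": {
            # Query context: "how well does this document match?" (scored)
            "must": [
                {"match": {"title": "search engine"}},
            ],
            # Filter context: "does this document match?" (yes/no, not scored)
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"publish_date": {"gte": "2020-01-01"}}},
            ],
        }
    }
}

print(json.dumps(bool_query, indent=2))
```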

 

Match

Term

Phrase

Wildcard

Fuzzy

 

intervals query

A full text query that allows fine-grained control of the ordering and proximity of matching terms.

match query

The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.

match_bool_prefix query

Creates a bool query that matches each term as a term query, except for the last term, which is matched as a prefix query.

match_phrase query

Like the match query but used for matching exact phrases or word proximity matches.

match_phrase_prefix query

Like the match_phrase query, but does a wildcard search on the final word.

multi_match query

The multi-field version of the match query.

combined_fields query

Matches over multiple fields as if they had been indexed into one combined field.

query_string query

Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.

simple_query_string query

A simpler, more robust version of the query_string syntax suitable for exposing directly to users.
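
A few of the full text queries above, sketched as Python dicts mirroring the JSON bodies. The fields (title, body) and the search text are placeholders, and the parameter values are only examples.

```python
import json

# match: analyzed full-text query; operator and fuzziness are optional.
match_query = {
    "query": {
        "match": {
            "title": {"query": "quick brown fox", "operator": "and", "fuzziness": "AUTO"}
        }
    }
}

# match_phrase: terms must appear in order; slop allows some positional gaps.
match_phrase_query = {
    "query": {"match_phrase": {"title": {"query": "quick brown fox", "slop": 1}}}
}

# match_phrase_prefix: like match_phrase, last word treated as a prefix.
match_phrase_prefix_query = {
    "query": {"match_phrase_prefix": {"title": {"query": "quick brown f"}}}
}

# multi_match: the same text against several fields; ^2 boosts the title field.
multi_match_query = {
    "query": {"multi_match": {"query": "quick brown fox", "fields": ["title^2", "body"]}}
}

# simple_query_string: forgiving syntax, suitable for user-entered input.
simple_query_string_query = {
    "query": {
        "simple_query_string": {
            "query": '"quick brown" +fox -lazy',
            "fields": ["title", "body"],
            "default_operator": "and",
        }
    }
}

for body in (match_query, match_phrase_query, multi_match_query):
    print(json.dumps(body))
```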

 

Exists

Fuzzy

IDs

Prefix

Range

Regexp

Term

Terms

Terms set

Type Query

Wildcard
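
The term-level queries listed above, sketched as Python dicts mirroring the JSON clauses. They match exact (non-analyzed) values, so they fit keyword, numeric and date fields. All field names and values are hypothetical.

```python
import json

term_level_clauses = {
    "exists": {"exists": {"field": "author"}},
    # "elastcsearch" is one deleted character away from "elasticsearch".
    "fuzzy": {"fuzzy": {"name": {"value": "elastcsearch", "fuzziness": 1}}},
    "ids": {"ids": {"values": ["1", "4"]}},
    "prefix": {"prefix": {"sku": {"value": "ab-"}}},
    "range": {"range": {"price": {"gte": 10, "lt": 100}}},
    "regexp": {"regexp": {"sku": {"value": "ab-[0-9]+"}}},
    "term": {"term": {"status": {"value": "published"}}},
    "terms": {"terms": {"tags": ["search", "lucene"]}},
    "wildcard": {"wildcard": {"sku": {"value": "ab-*-2021"}}},
}

# Any clause can be wrapped in a search body, typically in filter context.
search_body = {
    "query": {
        "bool": {
            "filter": [term_level_clauses["term"], term_level_clauses["range"]]
        }
    }
}

print(json.dumps(search_body, indent=2))
```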

 

Boosting query: positive, negative and negative_boost

script_score

Scripting languages: Painless, Expression, Mustache, Java

https://www.elastic.co/guide/en/elasticsearch/painless/current/index.html

weight

random_score

field_value_factor

decay functions: gauss, linear, exp
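
A sketch of the three decay shapes, following the formulas given in the function_score reference documentation, plus example request bodies. Here distance stands for |field value - origin| as a plain number; the real implementation works on dates, numbers and geo points. The field names (name, featured, popularity, publish_date, category) are hypothetical.

```python
import math


def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay: score equals `decay` at distance == offset + scale."""
    d = max(0.0, abs(distance) - offset)
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def linear_decay(distance, scale, decay=0.5, offset=0.0):
    """Linear decay: falls in a straight line and bottoms out at 0."""
    d = max(0.0, abs(distance) - offset)
    s = scale / (1.0 - decay)
    return max((s - d) / s, 0.0)


def exp_decay(distance, scale, decay=0.5, offset=0.0):
    """Exponential decay: drops quickly at first, then flattens."""
    d = max(0.0, abs(distance) - offset)
    lam = math.log(decay) / scale
    return math.exp(lam * d)


# function_score combining weight, field_value_factor and a decay function.
function_score_query = {
    "query": {
        "function_score": {
            "query": {"match": {"name": "hotel"}},
            "functions": [
                {"filter": {"term": {"featured": True}}, "weight": 2},
                {"field_value_factor": {"field": "popularity", "modifier": "log1p", "missing": 1}},
                {"gauss": {"publish_date": {"origin": "now", "scale": "10d", "offset": "1d", "decay": 0.5}}},
            ],
            "score_mode": "sum",
            "boost_mode": "multiply",
        }
    }
}

# Boosting query: documents matching "negative" are kept but scored lower.
boosting_query = {
    "query": {
        "boosting": {
            "positive": {"match": {"name": "hotel"}},
            "negative": {"term": {"category": "hostel"}},
            "negative_boost": 0.5,
        }
    }
}

for fn in (gauss_decay, linear_decay, exp_decay):
    print(fn.__name__, round(fn(distance=10, scale=10), 3))
```

With offset 0, all three functions return the decay value when the distance equals the scale; they differ in how fast the score falls before and after that point.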

 

distance_feature query

A query that computes scores based on the dynamically computed distances between the origin and documents' date, date_nanos, and geo_point fields. It is able to efficiently skip non-competitive hits.

more_like_this query

This query finds documents which are similar to the specified text, document, or collection of documents.

percolate query

This query finds stored queries (queries indexed as documents) that match the specified document.

 

rank_feature query

A query that computes scores based on the values of numeric features and is able to efficiently skip non-competitive hits.

 

script query

This query allows a script to act as a filter. Also see the function_score query.

script_score query

A query that lets you modify the score of a sub-query with a script.

wrapper query

A query that accepts other queries as a JSON or YAML string.

pinned query

A query that promotes selected documents over others matching a given query.

 

 

https://www.elastic.co/guide/en/elasticsearch/guide/master/geopoints.html

 

Aggregation

Bucket aggregations

Bucket aggregations don’t calculate metrics over fields like the metrics aggregations do, but instead, they create buckets of documents.

Each bucket is associated with a key and a document criterion. When the aggregation is executed, every bucket's criterion is evaluated on every document. Each time a criterion matches, the document is considered to “fall in” the relevant bucket.

Metrics aggregations

Metrics aggregations track and compute metrics over a set of documents.

Pipeline aggregations

Pipeline aggregations aggregate the output of other aggregations and their associated metrics.

 

Matrix: Matrix aggregations operate on multiple fields and produce a matrix result from the values extracted from the requested document fields. They do not support scripting.

 

 

aggs : keyword indicating that an aggregation is being used.

name_of_aggregation : the name the user gives to the aggregation.

type_of_aggregation : the type of aggregation being used.

field : the keyword that introduces the target field.

document_field_name : the name of the document field (column) being targeted.
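
The structure described above can be sketched as a small Python helper that builds the JSON body as a dict. build_agg is a hypothetical helper, not part of any client library; the field names are placeholders.

```python
import json


def build_agg(name_of_aggregation, type_of_aggregation, document_field_name):
    """Return the generic aggs skeleton for a single-field aggregation."""
    return {
        "aggs": {
            name_of_aggregation: {
                type_of_aggregation: {"field": document_field_name}
            }
        }
    }


# Metrics aggregation: average of a numeric field.
avg_price = build_agg("avg_price", "avg", "price")

# Bucket aggregation with a metrics sub-aggregation: one bucket per category,
# and the average price computed inside each bucket.
per_category = build_agg("by_category", "terms", "category")
per_category["aggs"]["by_category"].update(build_agg("avg_price", "avg", "price"))

# "size": 0 skips the hits and returns only the aggregation results.
search_body = {"size": 0, **per_category}

print(json.dumps(search_body, indent=2))
```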

 

Analysis: the process of converting text into tokens or terms.

https://www.elastic.co/blog/found-text-analysis-part-1

Analyzers

Standard, Simple, Whitespace, Stop, Keyword, Pattern, Language, Snowball, Custom

Persian

https://github.com/mlkmhd/persian-analyzer-elasticsearch

https://github.com/hlavki/jlemmagen

https://github.com/NarimanN2/ParsiAnalyzer

https://www.elastic.co/guide/en/elasticsearch/plugins/7.14/analysis-icu-analyzer.html

 

Tokenizer

Responsible for generating tokens from a text. The text is broken down into tokens at whitespace, punctuation or other boundaries.

Standard, Edge NGram, Keyword, Letter, Lowercase, NGram, Whitespace, Pattern, UAX Email URL, Path Hierarchy, Classic, Thai

Shingle: word-level n-grams (adjacent tokens concatenated)

 

Token Filters

Token filters can modify, delete or add tokens in the stream produced by the tokenizer.

Avoid applying synonyms at index time, as it causes problems, e.g. mapping atm to automated teller machine

Stemming: reduces words to their root form

 

Character Filters

Before the tokenizer runs, the text is processed by the character filters. Character filters look for special characters, HTML tags or specified patterns, and then either delete them or replace them with appropriate text.

 

HTML strip

Mapping

Pattern replace
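
A sketch of a custom analyzer that chains the three stages (character filters, then a tokenizer, then token filters), written as a Python dict mirroring the index settings JSON. The names my_custom_analyzer and ampersand_to_and are made up; html_strip, mapping, standard, lowercase, stop and porter_stem are built-in components.

```python
import json

index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Mapping char filter: rewrites characters before tokenizing.
                "ampersand_to_and": {"type": "mapping", "mappings": ["& => and"]}
            },
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip", "ampersand_to_and"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "porter_stem"],
                }
            },
        }
    }
}

# Body for the _analyze API of that index, to inspect the tokens produced.
analyze_body = {
    "analyzer": "my_custom_analyzer",
    "text": "<b>Cats & Dogs</b> are running",
}

print(json.dumps(index_settings, indent=2))
print(json.dumps(analyze_body))
```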

 

Normalizers

are similar to analyzers except that they may only emit a single token. As a consequence, they do not have a tokenizer and only accept a subset of the available char filters and token filters.

Only the filters that work on a per-character basis are allowed.
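
A short sketch of a normalizer applied to a keyword field, as a Python dict mirroring the index creation JSON. The names lowercase_ascii and username are hypothetical; lowercase and asciifolding are built-in per-character filters.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                # No tokenizer here: the whole value stays a single token.
                "lowercase_ascii": {
                    "type": "custom",
                    "char_filter": [],
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "username": {"type": "keyword", "normalizer": "lowercase_ascii"}
        }
    },
}

print(json.dumps(index_body, indent=2))
```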

 

Ingest

Sometimes we need to transform a document before we index it. For instance, we want to remove a field from the document or rename a field and then index it. This is handled by an ingest node, which runs the document through an ingest pipeline.
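
A sketch of an ingest pipeline body with the remove and rename processors, as a Python dict mirroring the JSON. The field names are hypothetical. The apply_locally function is only a toy simulation to show the effect on a document; it is not how Elasticsearch executes pipelines.

```python
import json

pipeline = {
    "description": "Drop a debug field and rename hostname (example only)",
    "processors": [
        {"remove": {"field": "debug_info", "ignore_missing": True}},
        {"rename": {"field": "hostname", "target_field": "host_name", "ignore_missing": True}},
    ],
}


def apply_locally(doc, pipeline_body):
    """Toy simulation of the remove and rename processors on a flat dict."""
    result = dict(doc)
    for processor in pipeline_body["processors"]:
        if "remove" in processor:
            result.pop(processor["remove"]["field"], None)
        elif "rename" in processor:
            cfg = processor["rename"]
            if cfg["field"] in result:
                result[cfg["target_field"]] = result.pop(cfg["field"])
    return result


doc = {"hostname": "web-1", "debug_info": "x", "message": "started"}
print(json.dumps(pipeline, indent=2))
print(apply_locally(doc, pipeline))
```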

 

ILM: index lifecycle management

Rollover: Creates a new write index when the current one reaches a certain size, number of docs, or age.

Shrink: Reduces the number of primary shards in an index.

Force merge: Triggers a force merge to reduce the number of segments in an index’s shards.

Freeze: Freezes an index and makes it read-only.

Delete: Permanently remove an index, including all of its data and metadata.

 

 

Lifecycle

Hot: The index is actively being updated and queried.

Warm: The index is no longer being updated but is still being queried.

Cold: The index is no longer being updated and is queried infrequently. The information still needs to be searchable, but it’s okay if those queries are slower.

Frozen: The index is no longer being updated and is queried rarely. The information still needs to be searchable, but it’s okay if those queries are extremely slow.

Delete: The index is no longer needed and can safely be removed.
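
A sketch of an ILM policy body that ties the actions to the phases, as a Python dict mirroring the JSON. The thresholds and ages are arbitrary example values, not recommendations.

```python
import json

ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over to a new write index on size, doc count or age.
                    "rollover": {"max_size": "50gb", "max_docs": 100000000, "max_age": "30d"}
                }
            },
            "warm": {
                "min_age": "7d",
                "actions": {
                    "shrink": {"number_of_shards": 1},
                    "forcemerge": {"max_num_segments": 1},
                },
            },
            "delete": {
                "min_age": "90d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```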

 

Data stream:

Append-only time series: good for logs

https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-a-data-stream.html
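
A sketch of the index template that enables a data stream, plus a sample document, as Python dicts mirroring the JSON. The pattern logs-myapp-* and the policy name my-logs-policy are hypothetical. Documents in a data stream need an @timestamp field.

```python
import json

index_template = {
    "index_patterns": ["logs-myapp-*"],
    # The presence of this object makes matching names data streams.
    "data_stream": {},
    "priority": 500,
    "template": {
        "settings": {"index.lifecycle.name": "my-logs-policy"},
        "mappings": {"properties": {"@timestamp": {"type": "date"}}},
    },
}

log_document = {
    "@timestamp": "2021-08-01T12:00:00Z",
    "message": "service started",
}

print(json.dumps(index_template, indent=2))
```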

 

Ranking

geo shape (bounding box) + decay functions + rank features + term with boost

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-rank-eval.html

 

Profiling

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html

 

Search

API reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html

 

 

Guide:

Using edge n-grams for search-as-you-type is easy to set up, flexible, and fast. However, sometimes it is not fast enough. Latency matters, especially when you are trying to provide instant feedback. Sometimes the fastest way of searching is not to search at all.

The completion suggester in Elasticsearch takes a completely different approach. You feed it a list of all possible completions, and it builds them into a finite state transducer, an optimized data structure that resembles a big graph. To search for suggestions, Elasticsearch starts at the beginning of the graph and moves character by character along the matching path. Once it has run out of user input, it looks at all possible endings of the current path to produce a list of suggestions.

This data structure lives in memory and makes prefix lookups extremely fast, much faster than any term-based query could be. It is an excellent match for autocompletion of names and brands, whose words are usually organized in a common order: “Johnny Rotten” rather than “Rotten Johnny.”

When word order is less predictable, edge n-grams can be a better solution than the completion suggester. This particular cat may be skinned in myriad ways.
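
A rough sketch of the two approaches described above. The edge_ngrams function only illustrates what an edge n-gram tokenizer emits for one token; the dicts mirror the JSON for a completion field, a document and a suggest request. The names suggest and name-suggest, the weight and the fuzziness value are assumptions for illustration.

```python
import json


def edge_ngrams(token, min_gram=1, max_gram=5):
    """Prefixes of `token` with lengths from min_gram to max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# Index-time edge n-grams: typing "rot" matches because "rot" was indexed.
print(edge_ngrams("rotten", 1, 3))

# Completion suggester: a dedicated field type backed by an in-memory FST.
completion_mapping = {
    "mappings": {"properties": {"suggest": {"type": "completion"}}}
}

# A document supplies its possible completions and an optional weight.
document = {"suggest": {"input": ["Johnny Rotten"], "weight": 10}}

# Suggest request: prefix lookup, with optional fuzziness for typos.
suggest_request = {
    "suggest": {
        "name-suggest": {
            "prefix": "joh",
            "completion": {"field": "suggest", "fuzzy": {"fuzziness": 1}},
        }
    }
}

print(json.dumps(suggest_request, indent=2))
```

The completion input "Johnny Rotten" is only found by a prefix of the whole string, such as "joh", which is why it suits inputs with a predictable word order; edge n-grams on each word would also match "rot".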

 

https://www.codevate.com/blog/implementing-search-as-you-type-autocomplete-with-elasticsearch-and-symfony

Add fuzziness

Add custom weight for top results

 

https://medium.com/@mourjo_sen/a-detailed-comparison-between-autocompletion-strategies-in-elasticsearch-66cb9e9c62c4

https://blog.mimacom.com/autocomplete-elasticsearch-part1/

https://blog.mimacom.com/autocomplete-elasticsearch-part2/

https://blog.mimacom.com/autocomplete-elasticsearch-part3/

https://blog.mimacom.com/autocomplete-elasticsearch-part4/

https://www.elastic.co/blog/you-complete-me

https://www.elastic.co/blog/found-uses-of-elasticsearch

 

search analyzer

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html

 

Index prefix

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-prefixes.html

 

Search as you type:

https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_time_search_as_you_type.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html

 

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenfilter.html

 

 

Suggestion(completion): https://www.elastic.co/guide/en/elasticsearch/reference/master/search-suggesters.html

 

 

 

 

Query:

match_bool_prefix: term query for each word + prefix query for the last word

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-bool-prefix-query.html#query-dsl-match-bool-prefix-query
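
A sketch of how match_bool_prefix expands into a bool query. The helper match_bool_prefix_as_bool is hypothetical, and lowercasing plus whitespace splitting is only a stand-in for the field's real analyzer.

```python
import json


def match_bool_prefix_as_bool(field, text):
    """Build the bool query that a match_bool_prefix query is equivalent to."""
    terms = text.lower().split()  # stand-in for analysis (assumption)
    if not terms:
        return {"match_none": {}}
    *head, last = terms
    should = [{"term": {field: term}} for term in head]
    should.append({"prefix": {field: last}})
    return {"bool": {"should": should}}


# The query as it would be sent...
match_bool_prefix_query = {"query": {"match_bool_prefix": {"message": "quick brown f"}}}

# ...and the roughly equivalent expanded form.
expanded = {"query": match_bool_prefix_as_bool("message", "quick brown f")}
print(json.dumps(expanded, indent=2))
```

Unlike match_phrase_prefix, the terms can appear in any order and position, because each one is an independent should clause.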