Elasticsearch
cheat sheet and summary
https://www.elastic.co/guide/index.html
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
https://www.edureka.co/blog/elasticsearch-tutorial/
https://www.elastic.co/training/free#quick-starts
https://www.runtastic.com/blog/en/increasing-search-engine-relevance-elasticsearch/
Inside:
Uses Lucene
Index: analogous to a database
An index is a collection of documents
Type: analogous to a table (mapping types are removed in 7.x+)
Document: a JSON document (analogous to a row)
Shard:
Fuzzy query: matches terms within a given edit distance (number of single-character differences)
Analyzer:
Explain: shows how the score for a result was computed
ELK
Logstash
Kibana, Machine Learning
https://logz.io/learn/complete-guide-elk-stack/
Visualization, monitoring, Beats: Heartbeat (uptime monitoring) and other Beats, graphs
Uses Zen Discovery (instead of ZooKeeper)
Shard = Lucene index = group of segments
Highlight (tells why and where the match happened)
Searching: simple, Query DSL, filtered, phrase (exact combination)
ES_JAVA_OPTS="-Xms10g -Xmx10g"
Flush: The flush API flushes one or more indices. It releases memory by pushing data from the in-memory buffer to the index storage and clearing the internal transaction log.
Refresh: The refresh API explicitly refreshes one or more indices, making all operations performed since the last refresh available for search.
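Both are plain POST endpoints; a minimal sketch (the index name `my-index` is a placeholder):

```json
POST /my-index/_flush

POST /my-index/_refresh
```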
Term vectors: get detailed per-term information about a document (term frequency, document frequency, positions, ...)
Task status, cancel
Relationship can be done with Parent/child and Nested
Similarity
APM
Application Performance Monitoring
Mapping
The schema for an index. More dynamic than SQL, since fields can be added on the fly and runtime (virtual) fields are supported.
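A minimal explicit mapping, as a sketch (index and field names are hypothetical):

```json
PUT /products
{
  "mappings": {
    "properties": {
      "name":    { "type": "text" },
      "sku":     { "type": "keyword" },
      "price":   { "type": "double" },
      "created": { "type": "date" }
    }
  }
}
```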
Data type
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
Common types
binary Binary value encoded as a Base64 string.
boolean true and false values.
Keywords The keyword family, including keyword, constant_keyword, and wildcard.
Numbers Numeric types, such as long and double, used to express amounts.
Dates: Date types, including date and date_nanos.
alias Defines an alias for an existing field.
Objects and relational types
Structured data types
Range Range types, such as long_range, double_range, date_range, and ip_range.
ip IPv4 and IPv6 addresses.
version Software versions. Supports Semantic Versioning precedence rules.
murmur3 Computes and stores hashes of values.
Aggregate data types
aggregate_metric_double Pre-aggregated metric values.
histogram Pre-aggregated numerical values in the form of a histogram.
Text search types
text fields The text family, including text and match_only_text. Analyzed, unstructured text.
annotated-text Text containing special markup. Used for identifying named entities.
completion Used for auto-complete suggestions.
search_as_you_type text-like type for as-you-type completion.
token_count A count of tokens in a text.
Document ranking types
dense_vector Records dense vectors of float values.
sparse_vector Records sparse vectors of float values.
rank_feature Records a numeric feature to boost hits at query time.
rank_features Records numeric features to boost hits at query time.
Spatial data types
Other types
percolator: Indexes queries written in Query DSL.
Query DSL
Search collapse
Highlight: shows which parts of the field matched
Async search
Sort result: https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html
Query context, Filter context
Leaf Query Clauses
Match
Term
Phrase
Wildcard
Fuzzy
Full text queries: match, match_phrase (exact phrase), multi_match (search across multiple fields)
intervals: A full text query that allows fine-grained control of the ordering and proximity of matching terms.
match: The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
match_bool_prefix: Creates a bool query that matches each term as a term query, except for the last term, which is matched as a prefix query.
match_phrase: Like the match query but used for matching exact phrases or word proximity matches.
match_phrase_prefix: Like the match_phrase query, but does a wildcard search on the final word.
multi_match: The multi-field version of the match query.
combined_fields: Matches over multiple fields as if they had been indexed into one combined field.
query_string: Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.
simple_query_string: A simpler, more robust version of the query_string syntax suitable for exposing directly to users.
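A sketch of a typical full-text search combining multi_match with fuzziness (index and field names are hypothetical; `^2` boosts the title field):

```json
GET /articles/_search
{
  "query": {
    "multi_match": {
      "query": "quick brown fox",
      "fields": ["title^2", "body"],
      "fuzziness": "AUTO"
    }
  }
}
```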
Term-level queries: term, terms, range, exists, prefix, wildcard, regexp, fuzzy (edit distance), terms_set
Boosting query: positive, negative, and negative_boost
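A boosting-query sketch: documents matching the negative clause stay in the results but have their score multiplied by negative_boost (index and field names are hypothetical):

```json
GET /products/_search
{
  "query": {
    "boosting": {
      "positive": { "match": { "description": "apple" } },
      "negative": { "match": { "description": "pie" } },
      "negative_boost": 0.5
    }
  }
}
```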
Constant score Queries
Disjunction max Queries
Function score Queries
Scripting: Painless, Expression, Mustache, Java
https://www.elastic.co/guide/en/elasticsearch/painless/current/index.html
decay functions: gauss, linear, exp
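A function_score sketch with a gauss decay on a geo_point field, so scores fall off with distance from the origin (index, field, and coordinates are hypothetical):

```json
GET /places/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "coffee" } },
      "functions": [
        {
          "gauss": {
            "location": {
              "origin": "52.52,13.40",
              "scale": "2km",
              "decay": 0.5
            }
          }
        }
      ]
    }
  }
}
```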
distance_feature: A query that computes scores based on the dynamically computed distances between the origin and documents' date, date_nanos, and geo_point fields. It is able to efficiently skip non-competitive hits.
more_like_this: Finds documents which are similar to the specified text, document, or collection of documents.
percolate: Finds queries that are stored as documents and that match the specified document.
rank_feature: A query that computes scores based on the values of numeric features and is able to efficiently skip non-competitive hits.
script: Allows a script to act as a filter. Also see the function_score query.
script_score: A query that allows you to modify the score of a sub-query with a script.
wrapper: A query that accepts other queries as a json or yaml string.
pinned: A query that promotes selected documents over others matching a given query.
nested query: Used for documents containing fields of the nested type. With this query, you can query each nested object as an independent document.
has_child & has_parent queries: Used to query the parent-child relationship between two document types within a single index. The has_child query returns the matching parent documents, while the has_parent query returns the matching child documents.
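A nested-query sketch, where both conditions must hold within the same nested object (index and field names are hypothetical):

```json
GET /orders/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            { "match": { "items.name": "keyboard" } },
            { "range": { "items.qty": { "gte": 2 } } }
          ]
        }
      }
    }
  }
}
```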
https://www.elastic.co/guide/en/elasticsearch/guide/master/geopoints.html
geo_point: Fields that support lat/lon pairs.
geo_shape: Fields that support points, lines, circles, polygons, multi-polygons, etc.
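A geo_distance filter sketch on a geo_point field; it runs in filter context, so it doesn't affect scoring (index and field names are hypothetical):

```json
GET /shops/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": { "lat": 52.52, "lon": 13.40 }
        }
      }
    }
  }
}
```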
Aggregation
Bucket aggregations don't calculate metrics over fields like metrics aggregations do; instead, they create buckets of documents.
Each bucket is associated with a key and a document criterion. When the aggregation is executed, every bucket's criterion is evaluated against every document; each time a criterion matches, the document is considered to "fall into" the relevant bucket.
Metrics aggregations keep track of and compute metrics over a set of documents.
Pipeline aggregations aggregate the output of other aggregations and their associated metrics.
Matrix aggregations operate on multiple fields and produce a matrix result from the values extracted from the requested document fields. Matrix aggregations do not support scripting.
Cardinality: count of distinct values of a particular field
extended_stats: all the statistics about a specific numerical field in aggregated documents
Filter aggregation
Terms aggregation
Nested aggregation
Date histogram aggregation—used with date values.
Scripted aggregation—used with scripts.
Top hits aggregation—used with top matching documents.
Range aggregation—used with a set of range values.
aggs: keyword showing that you are using an aggregation.
name_of_aggregation: the user-defined name of the aggregation.
type_of_aggregation: the type of aggregation being used.
field: the field keyword.
document_field_name: the name of the document field being targeted.
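Putting those keywords together, a sketch of a terms bucket aggregation with a nested avg metric (index and field names are hypothetical; size 0 suppresses the hits):

```json
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
```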
Analysis: the process of converting text into tokens or terms.
https://www.elastic.co/blog/found-text-analysis-part-1
Analyzers
Standard, Simple, Whitespace, Stop, Keyword, Pattern, Language, Snowball, Custom
Persian
https://github.com/mlkmhd/persian-analyzer-elasticsearch
https://github.com/hlavki/jlemmagen
https://github.com/NarimanN2/ParsiAnalyzer
https://www.elastic.co/guide/en/elasticsearch/plugins/7.14/analysis-icu-analyzer.html
Tokenizer
Responsible for generating tokens from text: using whitespace or other punctuation, the text is broken down into tokens.
Standard, Edge NGram, Keyword, Letter, Lowercase, NGram, Whitespace, Pattern, UAX Email URL, Path Hierarchy, Classic, Thai
Shingle: word-level n-grams (groups of adjacent tokens)
Token Filters
Token filters can further modify, delete, or add tokens to the token stream.
Prefer applying synonyms at search time rather than at index time; index-time synonyms cause problems (e.g. expanding "atm" to "automated teller machine" gets baked into the index and requires a reindex to change).
Stemming: reduces words to their root form.
Character Filters
Character filters process the text before it reaches the tokenizer. They search for special characters, HTML tags, or specified patterns, and either delete them or replace them with appropriate words.
HTML strip
Mapping
Pattern replace
Normalizers
are similar to analyzers except that they may only emit a single token. As a consequence, they do not have a tokenizer and only accept a subset of the available char filters and token filters.
Only the filters that work on a per-character basis are allowed.
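A sketch of a custom analyzer wiring the three stages together (char filter, then tokenizer, then token filters); index, analyzer, and filter names are hypothetical:

```json
PUT /autocomplete-demo
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_edge"]
        }
      },
      "filter": {
        "autocomplete_edge": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      }
    }
  }
}
```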
Ingest
Sometimes we need to transform a document before indexing it, for instance removing or renaming a field. This is handled by ingest nodes via ingest pipelines.
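An ingest-pipeline sketch with remove and rename processors (pipeline and field names are hypothetical):

```json
PUT /_ingest/pipeline/cleanup
{
  "processors": [
    { "remove": { "field": "tmp_field" } },
    { "rename": { "field": "user_name", "target_field": "user.name" } }
  ]
}
```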
ILM: index lifecycle management
Rollover: Creates a new write index when the current one reaches a certain size, number of docs, or age.
Shrink: Reduces the number of primary shards in an index.
Force merge: Triggers a force merge to reduce the number of segments in an index’s shards.
Freeze: Freezes an index and makes it read-only.
Delete: Permanently remove an index, including all of its data and metadata.
Lifecycle
Hot: The index is actively being updated and queried.
Warm: The index is no longer being updated but is still being queried.
Cold: The index is no longer being updated and is queried infrequently. The information still needs to be searchable, but it’s okay if those queries are slower.
Frozen: The index is no longer being updated and is queried rarely. The information still needs to be searchable, but it’s okay if those queries are extremely slow.
Delete: The index is no longer needed and can safely be removed.
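An ILM policy sketch combining these phases and actions (policy name, thresholds, and ages are hypothetical):

```json
PUT /_ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "30d" }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```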
Data stream:
Append only time series: good for logs
https://www.elastic.co/guide/en/elasticsearch/reference/current/set-up-a-data-stream.html
Ranking
geo_shape (box) + function decay + rank features + term with boost
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-rank-eval.html
Profiling
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html
Search
API reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html
Guide:
Using edge n-grams for search-as-you-type is easy to set up, flexible, and fast. However, sometimes it is not fast enough. Latency matters, especially when you are trying to provide instant feedback. Sometimes the fastest way of searching is not to search at all.
The completion suggester in Elasticsearch takes a completely different approach. You feed it a list of all possible completions, and it builds them into a finite state transducer, an optimized data structure that resembles a big graph. To search for suggestions, Elasticsearch starts at the beginning of the graph and moves character by character along the matching path. Once it has run out of user input, it looks at all possible endings of the current path to produce a list of suggestions.
This data structure lives in memory and makes prefix lookups extremely fast, much faster than any term-based query could be. It is an excellent match for autocompletion of names and brands, whose words are usually organized in a common order: “Johnny Rotten” rather than “Rotten Johnny.”
When word order is less predictable, edge n-grams can be a better solution than the completion suggester. This particular cat may be skinned in myriad ways.
Add fuzziness
Add custom weight for top results
https://blog.mimacom.com/autocomplete-elasticsearch-part1/
https://blog.mimacom.com/autocomplete-elasticsearch-part2/
https://blog.mimacom.com/autocomplete-elasticsearch-part3/
https://blog.mimacom.com/autocomplete-elasticsearch-part4/
https://www.elastic.co/blog/you-complete-me
https://www.elastic.co/blog/found-uses-of-elasticsearch
search analyzer
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html
Index prefix
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-prefixes.html
Search as you type:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_time_search_as_you_type.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenfilter.html
Suggestion(completion): https://www.elastic.co/guide/en/elasticsearch/reference/master/search-suggesters.html
Term suggester: suggests "similar" terms based on edit distance. Suggestions come from data in the index, and there are a lot of knobs and turns to tune it.
Phrase suggester: very similar to the term suggester, but takes a whole phrase into account.
Completion suggester: search-as-you-type functionality.
While the first two provide did-you-mean / spellchecking functionality based on the actual terms in the index, the completion suggester shows some 5 or 10 relevant suggestions while the user is typing. For this one you need to manually index a field of the completion type, on which ES can later do a fast lookup.
The completion suggester provides auto-complete/search-as-you-type functionality. This is a navigational feature to guide users to relevant results as they are typing, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters.
Context suggester: a continuation of the completion suggester, adding context about where the user is coming from (geo), or letting the engine boost one company over another (e.g. because they paid for it). Here you also need to manually index the additional context data.
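A completion-suggester sketch: map a field as type completion, then query it with a prefix (index, field, and suggestion names are hypothetical):

```json
PUT /songs
{
  "mappings": {
    "properties": {
      "suggest": { "type": "completion" }
    }
  }
}

POST /songs/_search
{
  "suggest": {
    "song-suggest": {
      "prefix": "joh",
      "completion": { "field": "suggest" }
    }
  }
}
```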
Query:
match_bool_prefix: term queries for each word + a prefix query for the last word
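As a sketch (index and field names are hypothetical), this matches "quick" and "brown" as terms and "f" as a prefix:

```json
GET /articles/_search
{
  "query": {
    "match_bool_prefix": {
      "title": "quick brown f"
    }
  }
}
```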