December 28th, 2019

Basic Elasticsearch Queries

Elasticsearch is a search and analytics engine. It's also a NoSQL database that holds JSON documents. These documents are stored in an inverted index and are queried with JSON syntax. In my previous article I explored analyzers and the process of storing documents in an inverted index. This article focuses on querying documents with JSON.

I wrote twenty queries based on what I learned reading "Learning Elastic Stack 6.0" by Pranav Shukla and Sharath Kumar M N. All twenty queries search an index called race, which contains races I have run along with upcoming races I plan to run. The documents in this index are listed below:

{ "name": "NYRR Night at the Races #1", "location": "New York, NY", "facility": "The Armory", "date": "2019-12-19", "exercise": "run", "category": "Indoor Track", "miles": 1, "registered": true, "result": { "url": "https://results.armorytrack.com/meets/3985/athletes/2887619", "position": 28, "time": "4:54", "pace": "4:54" } },
{ "name": "Ocean Breeze Miles Mania", "location": "New York, NY", "facility": "Ocean Breeze Athletic Complex", "date": "2020-01-02", "exercise": "run", "category": "Indoor Track", "miles": 2, "registered": true, "result": {} },
{ "name": "NYRR Night at the Races #2", "location": "New York, NY", "facility": "The Armory", "date": "2020-01-09", "exercise": "run", "category": "Indoor Track", "miles": 1.86, "meters": 3000, "registered": false, "result": {} },
{ "name": "Boston Buildup", "location": "Ridgefield, CT", "facility": "Scotland Elementary School", "date": "2020-01-19", "exercise": "run", "category": "Road", "miles": 9.32, "kilometers": 15, "registered": false, "result": {} },
{ "name": "NYRR Night at the Races #3", "location": "New York, NY", "facility": "The Armory", "date": "2020-01-23", "exercise": "run", "category": "Indoor Track", "miles": 3.11, "meters": 5000, "registered": false, "result": {} },
{ "name": "Freezer 5K", "location": "Yorktown Heights, NY", "facility": "Yorktown Heights", "date": "2020-02-09", "exercise": "run", "category": "Road/Trail", "miles": 3.11, "kilometers": 5, "registered": false, "result": {} },
{ "name": "Ocean Breeze Miles Mania", "location": "New York, NY", "facility": "Ocean Breeze Athletic Complex", "date": "2020-02-13", "exercise": "run", "category": "Indoor Track", "miles": 1, "registered": false, "result": {} },
{ "name": "Ocean Breeze Miles Mania", "location": "New York, NY", "facility": "Ocean Breeze Athletic Complex", "date": "2020-02-27", "exercise": "run", "category": "Indoor Track", "miles": 2, "registered": false, "result": {} },
{ "name": "NYRR Night at the Races #4", "location": "New York, NY", "facility": "The Armory", "date": "2020-03-05", "exercise": "run", "category": "Indoor Track", "miles": 1, "registered": true, "result": {} }

These documents use the following index settings and field mapping:

{ "settings": { "index": { "number_of_shards": 5, "number_of_replicas": 2 } }, "mappings": { "properties": { "name": { "type": "text" }, "location": { "type": "text" }, "facility": { "type": "text" }, "date": { "type": "date", "format": "yyyy-MM-dd" }, "exercise": { "type": "keyword", "ignore_above": 256 }, "category": { "type": "text", "fields": { "raw": { "type": "keyword" } } }, "miles": { "type": "double" }, "meters": { "type": "double" }, "kilometers": { "type": "double" }, "registered": { "type": "boolean" }, "result": { "type": "nested", "properties": { "url": { "type": "keyword", "ignore_above": 256 }, "position": { "type": "integer" }, "time": { "type": "keyword", "ignore_above": 256 }, "pace": { "type": "keyword", "ignore_above": 256 } } } } } }

Let's begin working through some Elasticsearch queries!

The most basic query you can perform against an Elasticsearch index is one that matches all documents.

{ "query": { "match_all": {} } }

Result (Abbreviated for space):

{ "hits" : [ { "_score" : 1.0, "_source" : { "name" : "NYRR Night at the Races #1", ... } }, { "_score" : 1.0, "_source" : { "name" : "NYRR Night at the Races #2", ... } }, ... ] }

Range queries return only the documents whose field value falls within the given bounds. In the following example, the length of the race must be between 1 and 2 miles, inclusive.

{ "query": { "range": { "miles": { "gte": 1, "lte": 2 } } } }

Result:

{ "hits" : [ { "_score" : 1.0, "_source" : { "name" : "NYRR Night at the Races #2", "miles" : 1.86, ... } }, { "_score" : 1.0, "_source" : { "name" : "NYRR Night at the Races #4", "miles" : 1, ... } }, { "_score" : 1.0, "_source" : { "name" : "Ocean Breeze Miles Mania", "miles" : 1, ... } }, { "_score" : 1.0, "_source" : { "name" : "Ocean Breeze Miles Mania", "miles" : 2, ... } }, { "_score" : 1.0, "_source" : { "name" : "Ocean Breeze Miles Mania", "miles" : 2, ... } }, { "_score" : 1.0, "_source" : { "name" : "NYRR Night at the Races #1", "miles" : 1, ... } } ] }
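The inclusive bounds can be mimicked in plain Python to see why six documents come back. This is only a sketch: RACES is a hand-copied list of (name, miles) pairs from the index, and in_range is a hypothetical helper, not part of any Elasticsearch client.

```python
# (name, miles) pairs copied from the documents in the race index.
RACES = [
    ("NYRR Night at the Races #1", 1),
    ("Ocean Breeze Miles Mania", 2),
    ("NYRR Night at the Races #2", 1.86),
    ("Boston Buildup", 9.32),
    ("NYRR Night at the Races #3", 3.11),
    ("Freezer 5K", 3.11),
    ("Ocean Breeze Miles Mania", 1),
    ("Ocean Breeze Miles Mania", 2),
    ("NYRR Night at the Races #4", 1),
]


def in_range(value, gte=None, lte=None):
    """Inclusive bounds, mirroring the gte/lte keys of a range query."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


matches = [name for name, miles in RACES if in_range(miles, gte=1, lte=2)]
print(len(matches))  # 6
```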

Range queries also work with date types. Date fields support a special keyword, now, in range queries. now represents the current date and time, and time can be added to or subtracted from it [1].

{ "query": { "range": { "date": { "gte": "now-15d", "lte": "now+15d" } } } }
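To make the date math concrete, here is a rough Python equivalent of the "now-15d" to "now+15d" window. It is a sketch under simplifying assumptions: it works on whole days rather than timestamps, the reference date is passed in explicitly, and races_within is a made-up helper name.

```python
from datetime import date, timedelta

# Race dates copied from the documents in the race index.
RACE_DATES = [
    "2019-12-19", "2020-01-02", "2020-01-09", "2020-01-19", "2020-01-23",
    "2020-02-09", "2020-02-13", "2020-02-27", "2020-03-05",
]


def races_within(today, days=15):
    """Return the dates that fall inside [today - days, today + days]."""
    lower = today - timedelta(days=days)  # plays the role of "now-15d"
    upper = today + timedelta(days=days)  # plays the role of "now+15d"
    return [d for d in RACE_DATES if lower <= date.fromisoformat(d) <= upper]


# Using this article's publication date as "now".
print(races_within(date(2019, 12, 28)))
# ['2019-12-19', '2020-01-02', '2020-01-09']
```

Because the window moves with the current date, the real query returns different documents depending on when it is run.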

An exists query returns the documents that contain a value for a given field.

{ "query": { "exists": { "field": "meters" } } }
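In Python terms, the check is close to asking whether a dictionary holds a non-null value under a key. The sketch below uses a trimmed subset of the race documents, and field_exists is a hypothetical helper used only for illustration.

```python
# A trimmed subset of the race documents (only the fields needed here).
RACES = [
    {"name": "NYRR Night at the Races #2", "miles": 1.86, "meters": 3000},
    {"name": "Boston Buildup", "miles": 9.32, "kilometers": 15},
    {"name": "NYRR Night at the Races #3", "miles": 3.11, "meters": 5000},
    {"name": "Freezer 5K", "miles": 3.11, "kilometers": 5},
    {"name": "Ocean Breeze Miles Mania", "miles": 2},
]


def field_exists(doc, field):
    """A missing key or a null value both count as 'does not exist'."""
    return doc.get(field) is not None


with_meters = [doc["name"] for doc in RACES if field_exists(doc, "meters")]
print(with_meters)
# ['NYRR Night at the Races #2', 'NYRR Night at the Races #3']
```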

A term query checks if a string exactly matches the value of a keyword field in a document. The following query checks if the category field has a value of "Road".

{ "query": { "term": { "category.raw": "Road" } } }

Result:

{ "hits" : [{ "_score" : 0.2876821, "_source" : { "name" : "Boston Buildup", "category" : "Road", ... } }] }

You may have noticed that the prior query result gave the resulting document a score. This is helpful for text searches, since it determines which results are most relevant. However, for term searches there isn't much benefit, because every document contains the exact same term. The following query is the same as the last one, except it gives the documents returned a constant score.

{ "query": { "constant_score": { "filter": { "term": { "category.raw": "Road" } } } } }

Result:

{ "hits" : [{ "_score" : 1.0, "_source" : { "name" : "Boston Buildup", "category" : "Road", ... } }] }

Match queries perform full text searches. The following example performs a text search on the category field, returning any document containing the word "Road" in that field.

{ "query": { "match": { "category": "Road" } } }

Result:

{ "hits" : [ { "_score" : 0.6931472, "_source" : { "name" : "Freezer 5K", "category" : "Road/Trail", ... } }, { "_score" : 0.2876821, "_source" : { "name" : "Boston Buildup", "category" : "Road", ... } } ] }
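The reason "Road/Trail" matches here but not in the earlier term query comes down to analysis: text fields are split into lowercase tokens, while keyword fields keep the whole string. The Python sketch below imitates that difference. The analyze function is a crude stand-in for the standard analyzer (an assumption for illustration, not Elasticsearch's actual tokenizer), and term_query and match_query are hypothetical helpers.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


# Race name -> category, copied from three of the race documents.
CATEGORIES = {
    "Boston Buildup": "Road",
    "Freezer 5K": "Road/Trail",
    "NYRR Night at the Races #1": "Indoor Track",
}


def term_query(keyword_value, query):
    """Keyword fields are compared as whole, unmodified strings."""
    return keyword_value == query


def match_query(text_value, query):
    """Text fields match when any analyzed query term appears among the field's tokens."""
    return bool(set(analyze(query)) & set(analyze(text_value)))


term_hits = [name for name, cat in CATEGORIES.items() if term_query(cat, "Road")]
match_hits = [name for name, cat in CATEGORIES.items() if match_query(cat, "Road")]
print(term_hits)   # ['Boston Buildup']
print(match_hits)  # ['Boston Buildup', 'Freezer 5K']
```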

It's also possible to perform a full text search on a keyword field. However, this is simply equivalent to a term query.

{ "query": { "match": { "category.raw": "Road" } } }

Result:

{ "hits" : [{ "_score" : 0.2876821, "_source" : { "name" : "Boston Buildup", "category" : "Road", ... } }] }

When a full text search is performed with multiple words in the query, resulting documents must contain at least one of the words. You can imagine the query containing OR operators between each word.

{ "query": { "match": { "category": "Road Trail" } } }

Result:

{ "hits" : [ { "_score" : 1.3862944, "_source" : { "name" : "Freezer 5K", "category" : "Road/Trail", ... } }, { "_score" : 0.2876821, "_source" : { "name" : "Boston Buildup", "category" : "Road", ... } } ] }

Match queries also support explicit or and and operators. The following query uses an and operator:

{ "query": { "match": { "category": { "query": "Road Trail", "operator": "and" } } } }

The following query uses an or operator. It's equivalent to the earlier "Road Trail" match query (query #9), which implicitly OR'd the words in the query.

{ "query": { "match": { "category": { "query": "Road Trail", "operator": "or" } } } }
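The difference between the two operators can be sketched with set operations on tokens. As before, analyze is only an approximation of the standard analyzer and match is a hypothetical helper, so treat this as an illustration of the semantics rather than of how Elasticsearch evaluates the query.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def match(field_value, query, operator="or"):
    """'and' needs every query term in the field; 'or' needs at least one."""
    query_terms = set(analyze(query))
    field_terms = set(analyze(field_value))
    if operator == "and":
        return query_terms <= field_terms
    return bool(query_terms & field_terms)


for category in ["Road", "Road/Trail", "Indoor Track"]:
    print(category,
          match(category, "Road Trail", operator="and"),
          match(category, "Road Trail", operator="or"))
# Road False True
# Road/Trail True True
# Indoor Track False False
```

With the and operator only Freezer 5K ("Road/Trail") qualifies, while the or operator also lets Boston Buildup ("Road") through.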

Text searches provide a minimum_should_match parameter that specifies the minimum number of query terms that must match terms in a field. The following query requires documents to have a name field containing at least two of the words in ["NYRR", "#1", "#2", "#3"].

{ "query": { "match": { "name": { "query": "NYRR #1 #2 #3", "minimum_should_match": 2 } } } }
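Counting shared terms in Python shows which race names clear the threshold of two. This sketch assumes, as a simplification, that analysis lowercases the text and drops punctuation such as "#", so "#1" is compared as "1". Both analyze and matching_terms are illustrative helpers.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


# Distinct race names from the index.
NAMES = [
    "NYRR Night at the Races #1",
    "NYRR Night at the Races #2",
    "NYRR Night at the Races #3",
    "NYRR Night at the Races #4",
    "Ocean Breeze Miles Mania",
    "Boston Buildup",
    "Freezer 5K",
]


def matching_terms(field_value, query):
    """Number of distinct query terms that also appear in the field."""
    return len(set(analyze(query)) & set(analyze(field_value)))


hits = [name for name in NAMES if matching_terms(name, "NYRR #1 #2 #3") >= 2]
print(hits)
# ['NYRR Night at the Races #1', 'NYRR Night at the Races #2', 'NYRR Night at the Races #3']
```

"NYRR Night at the Races #4" shares only the term "nyrr" with the query, so it falls short of the minimum.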

Text searches also provide a fuzziness parameter, which allows terms in queries to be near matches to terms in documents. For example, the query term "NYCRR" can match the document term "NYRR" if the fuzziness is one or more, because the two differ by a single character edit. It's important to note that fuzzy matches are an expensive operation in Elasticsearch [2].

{ "query": { "match": { "name": { "query": "NYCRR", "fuzziness": 1 } } } }
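Fuzziness is measured in character edits. The following Python sketch computes the classic Levenshtein distance (insertions, deletions, and substitutions) to show why a fuzziness of 1 is enough here. It is an illustration only: it does not give transpositions any special treatment, and edit_distance is not a function from any Elasticsearch library.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(edit_distance("nycrr", "nyrr"))  # 1
```

Removing the "c" from "nycrr" yields "nyrr", which is one edit.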

Match phrase queries require that an entire phrase exists in a document, with its terms adjacent and in order. The following query matches all documents with a location field that contains the phrase "York NY" (punctuation is dropped during analysis). It matches all races in New York, NY.

{ "query": { "match_phrase": { "location": { "query": "York NY" } } } }

When using match phrase queries, leniency towards missing words is possible with the slop parameter. For example, the following query matches documents even if their name field contains up to two terms that are missing from the phrase query. Here the missing terms are "at" and "the".

{ "query": { "match_phrase": { "name": { "query": "NYRR Night Races #4", "slop": 2 } } } }

Result:

{ "hits" : [{ "_score" : 0.6606808, "_source" : { "name" : "NYRR Night at the Races #4", ... } }] }
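One way to picture slop is to compare where each query term sits in the phrase with where it sits in the document. The Python sketch below accepts a match when the positional shifts of all terms differ by at most slop. This is a simplified approximation written for this article's examples (it ignores edge cases such as repeated terms), and analyze and phrase_match are hypothetical helpers.

```python
import re
from itertools import product


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def phrase_match(field_value, phrase, slop=0):
    """True when the phrase terms appear with positional shifts that differ by <= slop."""
    doc_terms = analyze(field_value)
    positions = []
    for term in analyze(phrase):
        found = [i for i, token in enumerate(doc_terms) if token == term]
        if not found:
            return False  # every phrase term must be present
        positions.append(found)
    if not positions:
        return False
    for combo in product(*positions):
        # How far each term moved between the phrase and the document.
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


name = "NYRR Night at the Races #4"
print(phrase_match(name, "NYRR Night Races #4", slop=2))  # True
print(phrase_match(name, "NYRR Night Races #4", slop=1))  # False
print(phrase_match("New York, NY", "York NY"))            # True
```

In the first call, "races" and "4" each sit two positions later in the document than in the phrase, which is exactly what a slop of 2 tolerates.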

Match queries can be performed on more than one field with multi_match. The following query searches for the words "Freezer" or "Ocean" in the name and facility fields. The ^3 suffix boosts the name field, so a match there counts three times as much towards the document's score.

{ "query": { "multi_match": { "query": "Freezer Ocean", "fields": ["name^3", "facility"] } } }

Result:

{ "hits" : [ { "_score" : 2.4077747, "_source" : { "name" : "Freezer 5K", "facility" : "Yorktown Heights", ... } }, { "_score" : 1.8299087, "_source" : { "name" : "Ocean Breeze Miles Mania", "facility" : "Ocean Breeze Athletic Complex", ... } }, { "_score" : 0.5469647, "_source" : { "name" : "Ocean Breeze Miles Mania", "facility" : "Ocean Breeze Athletic Complex", ... } }, { "_score" : 0.5469647, "_source" : { "name" : "Ocean Breeze Miles Mania", "facility" : "Ocean Breeze Athletic Complex", ... } } ] }
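The "name^3" syntax can be illustrated with a toy scorer in Python. Everything about the scoring here is an assumption made for illustration: real scores come from Elasticsearch's relevance model, whereas this sketch just counts matching terms per field, multiplies by the boost, and keeps the best field (mirroring the default best_fields behavior).

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def field_score(value, query):
    """Toy relevance: number of distinct query terms found in the field."""
    return len(set(analyze(query)) & set(analyze(value)))


def multi_match_score(doc, query, fields):
    """Best boosted field score; 'name^3' means field 'name' with boost 3."""
    best = 0.0
    for spec in fields:
        name, _, boost = spec.partition("^")
        weight = float(boost) if boost else 1.0
        best = max(best, weight * field_score(doc[name], query))
    return best


freezer = {"name": "Freezer 5K", "facility": "Yorktown Heights"}
print(multi_match_score(freezer, "Freezer Ocean", ["name^3", "facility"]))  # 3.0
print(multi_match_score(freezer, "Freezer Ocean", ["name", "facility"]))    # 1.0
```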

Bool queries allow for complex document querying. They are Elasticsearch's equivalent of the SQL WHERE clause with its AND and OR operators [3]. The following query uses a must clause, which contains a list of queries that a document must match to be returned.

{ "query": { "bool": { "must": [ { "term": { "category.raw": { "value": "Indoor Track" } } }, { "term": { "miles": { "value": 1 } } }, { "match": { "facility": "Armory" } }, { "range": { "date": { "gte": "2020-03-01", "lte": "2020-03-31", "format": "yyyy-MM-dd" } } } ] } } }

Result:

{ "hits" : [{ "_score" : 2.2670627, "_source" : { "name" : "NYRR Night at the Races #4", "facility" : "The Armory", "date" : "2020-03-05", "category" : "Indoor Track", "miles" : 1, ... } }] }
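A must clause behaves like a chain of AND conditions. The Python sketch below expresses the four subqueries as predicates over a subset of the race documents and keeps only the documents that satisfy all of them. The predicates are simplified stand-ins (for example, the facility check is a plain lowercase word test), and bool_must is a hypothetical helper.

```python
from datetime import date

# A subset of the race documents, trimmed to the fields the query touches.
RACES = [
    {"name": "NYRR Night at the Races #1", "facility": "The Armory",
     "date": "2019-12-19", "category": "Indoor Track", "miles": 1},
    {"name": "NYRR Night at the Races #2", "facility": "The Armory",
     "date": "2020-01-09", "category": "Indoor Track", "miles": 1.86},
    {"name": "Ocean Breeze Miles Mania", "facility": "Ocean Breeze Athletic Complex",
     "date": "2020-02-13", "category": "Indoor Track", "miles": 1},
    {"name": "NYRR Night at the Races #4", "facility": "The Armory",
     "date": "2020-03-05", "category": "Indoor Track", "miles": 1},
]

# One predicate per subquery in the must list.
MUST = [
    lambda d: d["category"] == "Indoor Track",             # term on category.raw
    lambda d: d["miles"] == 1,                             # term on miles
    lambda d: "armory" in d["facility"].lower().split(),   # match on facility
    lambda d: date(2020, 3, 1) <= date.fromisoformat(d["date"]) <= date(2020, 3, 31),
]


def bool_must(doc, clauses):
    """A document is returned only if every must clause accepts it."""
    return all(clause(doc) for clause in clauses)


hits = [doc["name"] for doc in RACES if bool_must(doc, MUST)]
print(hits)  # ['NYRR Night at the Races #4']
```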

When using bool queries, the filter clause is equivalent to the must clause except that it doesn't score the returned documents. All documents are given a score of zero.

{ "query": { "bool": { "filter": [ { "term": { "category.raw": { "value": "Indoor Track" } } }, { "term": { "miles": { "value": 1 } } }, { "match": { "facility": "Armory" } } ] } } }

Result:

{ "hits" : [{ "_score" : 0.0, "_source" : { "name" : "NYRR Night at the Races #4", "facility" : "The Armory", "date" : "2020-03-05", "category" : "Indoor Track", "miles" : 1, ... } }] }

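While must behaves like a SQL AND, should behaves like a SQL OR. Note that when a bool query also contains a must or filter clause, as in the following query, the should clauses are optional by default: they raise the score of documents that match them but do not exclude documents that don't.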

{ "query": { "bool": { "must": [ { "term": { "miles": { "value": 1 } } } ], "should": [ { "match": { "facility": "Armory" } }, { "range": { "date": { "gte": "2020-01-01", "lte": "2020-12-31", "format": "yyyy-MM-dd" } } } ] } } }
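The interplay of must and should in this query can be sketched in Python. When a must clause is present, should clauses are optional by default, so they influence ranking rather than membership. The score below is a toy (a count of matching should clauses, not Elasticsearch's real relevance score), and bool_score is a hypothetical helper.

```python
from datetime import date

# A subset of the race documents, trimmed to the fields the query touches.
RACES = [
    {"name": "NYRR Night at the Races #1", "facility": "The Armory",
     "date": "2019-12-19", "miles": 1},
    {"name": "NYRR Night at the Races #2", "facility": "The Armory",
     "date": "2020-01-09", "miles": 1.86},
    {"name": "Ocean Breeze Miles Mania", "facility": "Ocean Breeze Athletic Complex",
     "date": "2020-02-13", "miles": 1},
    {"name": "NYRR Night at the Races #4", "facility": "The Armory",
     "date": "2020-03-05", "miles": 1},
]

MUST = [lambda d: d["miles"] == 1]
SHOULD = [
    lambda d: "armory" in d["facility"].lower().split(),
    lambda d: date(2020, 1, 1) <= date.fromisoformat(d["date"]) <= date(2020, 12, 31),
]


def bool_score(doc, must, should):
    """None if a must clause fails; otherwise a toy score counting should matches."""
    if not all(clause(doc) for clause in must):
        return None
    return sum(1 for clause in should if clause(doc))


hits = [(doc["name"], bool_score(doc, MUST, SHOULD)) for doc in RACES]
hits = [(name, score) for name, score in hits if score is not None]
ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
print(ranked[0])    # ('NYRR Night at the Races #4', 2)
print(len(ranked))  # 3
```

All three one-mile races are returned, and the one that satisfies both should clauses ranks first.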

Bool queries can also contain subqueries that should not match a document in order for it to be returned. This is accomplished with the must_not clause.

{ "query": { "bool": { "must": [ { "match": { "facility": "Armory" } } ], "must_not": [ { "term": { "miles": { "value": 1 } } } ] } } }
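In Python terms, must_not is a negated predicate applied alongside the must predicates. The sketch below runs the same logic over a subset of the race documents, with bool_query as a hypothetical helper and a plain lowercase word test standing in for the match on facility.

```python
# A subset of the race documents, trimmed to the fields the query touches.
RACES = [
    {"name": "NYRR Night at the Races #1", "facility": "The Armory", "miles": 1},
    {"name": "NYRR Night at the Races #2", "facility": "The Armory", "miles": 1.86},
    {"name": "NYRR Night at the Races #3", "facility": "The Armory", "miles": 3.11},
    {"name": "Boston Buildup", "facility": "Scotland Elementary School", "miles": 9.32},
    {"name": "NYRR Night at the Races #4", "facility": "The Armory", "miles": 1},
]

MUST = [lambda d: "armory" in d["facility"].lower().split()]
MUST_NOT = [lambda d: d["miles"] == 1]


def bool_query(doc, must, must_not):
    """Keep a document if every must clause matches and no must_not clause does."""
    return all(c(doc) for c in must) and not any(c(doc) for c in must_not)


hits = [doc["name"] for doc in RACES if bool_query(doc, MUST, MUST_NOT)]
print(hits)
# ['NYRR Night at the Races #2', 'NYRR Night at the Races #3']
```

Only the Armory races that are not exactly one mile long remain.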

This article discussed some of the building blocks for querying Elasticsearch. In my next Elasticsearch article, I'll wrap up the Elasticsearch portion of the ELK stack by discussing data aggregations. All the code from this article is available on GitHub.

[1] Pranav Shukla & Sharath Kumar M N, Learning Elastic Stack 6.0 (Birmingham: Packt, 2017), 75

[2] Shukla, 84

[3] Shukla, 90