Aggregations in Wasabi AiR
    • 20 May 2024

    Aggregations calculate statistical summaries over the entire set of search results. They are applied after the full-text query and any filters. One common use is to group the unique values of a specific field into buckets. For example, you can group all items by their file extension (ext field) and get a summary similar to the following. This example indicates that, after applying any filters, there are 202 items with mp4 as their extension, 5 with csv, and so on.

    {
    	"aggregations": {
    		"terms": {
    			"file.extension": {
    				"buckets": [
    					{
    						"key": "mp4",
    						"count": 202
    					},
    					{
    						"key": "csv",
    						"count": 5
    					},
    					{
    						"key": "mov",
    						"count": 2
    					},
    					{
    						"key": "pptx",
    						"count": 6
    					},
    					{
    						"key": "xlsx",
    						"count": 7
    					},
    					{
    						"key": "xml",
    						"count": 22
    					}
    				],
    				"othersCount": 36
    			}
    		}
    	}
    }

    Requesting Aggregations

    To request an aggregation, add the aggregations field to the search request object.

    {
    	"aggregations": {
    		"terms": [{terms}],
    		"metrics": [{metrics}],
    		"histograms": [{histograms}]
    	}
    }
    • terms - (array of objects) Aggregates the frequency of specified terms.
    • metrics - (array of objects) Requests statistical information about a field.
    • histograms - (array of objects) Generates histogram data for a given field.
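
    As a minimal sketch, the request body can be assembled in Python before it is sent to the Search API. The helper name build_search_body is hypothetical and not part of the API; only the query and aggregations fields, and the sample field names, come from this article.

```python
import json


def build_search_body(query, **aggregation_groups):
    """Build a search request body with an optional aggregations field.

    Each keyword argument is one aggregation group (for example terms or
    metrics) holding a list of aggregation request objects.
    """
    body = {"query": query}
    # Drop groups that were not supplied or are empty.
    groups = {kind: list(items) for kind, items in aggregation_groups.items() if items}
    if groups:
        body["aggregations"] = groups
    return body


body = build_search_body(
    "{query}",  # placeholder, as in the article's examples
    terms=[{"name": "countries", "field": "geocoding.country_code.raw"}],
    metrics=[{"name": "max(file.size)", "field": "file.size", "type": "max"}],
)
print(json.dumps(body, indent=2))
```

    The sketch only builds the JSON payload; how the request is authenticated and sent depends on your deployment and is not covered here.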

    Aggregation Results

    Aggregation results are provided in the aggregations object of the response. See the following sections for specific examples.

    Aggregation Names

    Each aggregation request includes a name field that you define. The Search API returns each result keyed by this name, so you can find a specific aggregation in the response.

    Metrics Aggregations (metrics)

    Metrics aggregations request statistical information about a field.

    {
    	"name": "{name}",
    	"field": "{field}",
    	"type": "{type}"
    }
    • name - (string) The user-defined name of the aggregation.
    • field - (string) The field on which to apply the aggregation.
    • type - (string) The type of the metric aggregation to perform.

    Metric Aggregation Types

    The type field of a metric aggregation specifies which metric to calculate. Valid types are:

    • min - The minimum value of the field across the result set.
    • max - The maximum value of the field across the result set.
    • avg - The arithmetic mean of the field's values across the result set.
    • sum - The sum of the field's values across the result set.
    • cardinality - The number of distinct values of the field in the result set.
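
    The list of valid types lends itself to a small client-side check. The following Python sketch builds one metrics aggregation object and rejects unknown types before a request is made; the function name metric_aggregation and the default naming scheme are assumptions for illustration, not part of the Search API.

```python
VALID_METRIC_TYPES = {"min", "max", "avg", "sum", "cardinality"}


def metric_aggregation(metric_type, field, name=None):
    """Return one metrics aggregation request object, validating its type."""
    if metric_type not in VALID_METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    # Default to a readable name such as "min(file.size)" (a convention, not a rule).
    return {
        "name": name or f"{metric_type}({field})",
        "field": field,
        "type": metric_type,
    }


metrics = [metric_aggregation(t, "file.size") for t in ("min", "max", "avg")]
```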

    Metric Aggregation Example

    To get the minimum, maximum, and average file sizes of items from a query, make the following request:

    POST /api/data/search
    {
    	"query": "{query}",
    	"aggregations": {
    		"metrics": [
    			{"type": "min", "field": "file.size", "name": "min(file.size)"},
    			{"type": "max", "field": "file.size", "name": "max(file.size)"},
    			{"type": "avg", "field": "file.size", "name": "avg(file.size)"}
    		]
    	}
    }

    The results are in the aggregations object of the response, keyed by the name:

    {
    	//... other search items
    	"aggregations": {
    		"metrics": {
    			"min(file.size)": 0,
    			"max(file.size)": 100,
    			"avg(file.size)": 50
    		}
    	}
    }
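
    Because the metric results are keyed by name, reading them back is a dictionary lookup. The sketch below assumes the response has already been parsed from JSON into a Python dict; get_metric is a hypothetical helper, and the sample values mirror the example response.

```python
def get_metric(response, name, default=None):
    """Look up one metric result by aggregation name; missing keys give default."""
    return response.get("aggregations", {}).get("metrics", {}).get(name, default)


# Sample response shaped like the example above (values are illustrative).
response = {
    "aggregations": {
        "metrics": {
            "min(file.size)": 0,
            "max(file.size)": 100,
            "avg(file.size)": 50,
        }
    }
}

size_range = get_metric(response, "max(file.size)") - get_metric(response, "min(file.size)")
print(size_range)  # 100
```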

    Terms Aggregations (terms)

    Terms aggregations enable you to get the frequency of different terms within a specific field.

    {
    	"name": "{name}",
    	"field": "{field}",
    	"size": {size}
    }
    • name - (string) The user-defined name of the aggregation.
    • field - (string) The field on which to apply the aggregation.
    • size - (integer) The maximum number of buckets to return (default: 10).

    The results are provided as a set of buckets, one for each term, each with its count.

    Only the most frequent terms are included in the results; the Search API decides which terms are relevant enough to return. A special othersCount property gives an approximate count of the remaining items that are not included in any bucket.

    Terms Aggregation Example

    To get a breakdown of items by country, request the terms from the geocoding.country_code.raw field:

    POST /api/data/search
    {
    	"query": "{query}",
    	"aggregations": {
    		"terms": [
    			{"name": "countries", "field": "geocoding.country_code.raw"}
    		]
    	}
    }

    The results might be:

    {
    	//... other search items
    	"aggregations": {
    		"terms": {
    			"countries": {
    				"buckets": [
    					{"key":"UK", "count":200},
    					{"key":"USA", "count":100},
    					{"key":"Germany", "count":50}
    				],
    				"othersCount": 10
    			}
    		}
    	}
    }
    • buckets - (array of objects) The top terms (key) along with their frequency (count).
    • key - (string) The term value.
    • count - (int) The number of items that contain that term.
    • othersCount - (int) The approximate number of items that were not included in any bucket.
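
    A terms result can be turned into proportions by dividing each bucket count by the total, including othersCount. This is a sketch only: term_shares and the "(others)" label are hypothetical, and because othersCount is approximate, the shares are estimates.

```python
def term_shares(terms_result):
    """Map each bucket key to its share of all counted items.

    The total includes othersCount, which is approximate, so the shares
    are estimates rather than exact proportions.
    """
    buckets = terms_result.get("buckets", [])
    others = terms_result.get("othersCount", 0)
    total = sum(b["count"] for b in buckets) + others
    if total == 0:
        return {}
    shares = {b["key"]: b["count"] / total for b in buckets}
    if others:
        shares["(others)"] = others / total  # label chosen for this sketch
    return shares


# The "countries" result from the example above.
countries = {
    "buckets": [
        {"key": "UK", "count": 200},
        {"key": "USA", "count": 100},
        {"key": "Germany", "count": 50},
    ],
    "othersCount": 10,
}
shares = term_shares(countries)
```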

    Histogram Aggregations (histograms) - BETA

    Histogram aggregations count how many items fall into each fixed-size interval of a field's values.

    {
    	"name": "{name}",
    	"field": "{field}",
    	"interval": {interval},
    	"min_count": {min_count}
    }
    • name - (string) The user-defined name of the aggregation.
    • field - (string) The field on which to apply the aggregation.
    • interval - (int) The fixed width of each bucket, in the units of the field's values.
    • min_count - (int) The minimum number of items that must fall within an interval for its bucket to be returned.

    Histogram Aggregation Example

    To get the frequency of audio files within specific peak level intervals, make the following request:

    POST /api/data/search
    {
    	"query": "{query}",
    	"aggregations": {
    		"histograms": [
    			{
    				"name": "peak",
    				"field": "audiopeak.true_peak_dbfs",
    				"interval": 1,
    				"min_count": 1
    			}
    		]
    	}
    }

    The results group the items into buckets of the requested interval:

    {
    	//... other search items
    	"aggregations": {
    		"histograms": {
    			"peak": {
    				"buckets": [
    					{"key":-3, "count":2},
    					{"key":-2, "count":4},
    					{"key":-1, "count":7},
    					{"key":0, "count":1},
    					{"key":1, "count":2},
    					{"key":2, "count":4},
    					{"key":3, "count":9}
    				]
    			}
    		}
    	}
    }
    
    • peak - (string) The value of the name field passed in with the request; the histogram result is keyed by it.
    • buckets - (array of objects) The buckets that describe the results of the histogram.
    • key - (int) The lower bound of the bucket's range. An item is counted in the bucket when its value falls between key and key + interval.
    • count - (int) The number of items that fit into the interval.
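
    Since each bucket carries only its lower bound, the upper bound has to be derived from the interval used in the request. The following Python sketch does that for the peak result above; bucket_ranges is a hypothetical helper, and the article does not state whether the bounds are inclusive or exclusive.

```python
def bucket_ranges(histogram_result, interval):
    """Return (lower, upper, count) tuples, one per returned bucket.

    upper is key + interval. Whether each bound is inclusive is not stated
    in the article, so treat the edges of each range as unspecified.
    """
    return [
        (b["key"], b["key"] + interval, b["count"])
        for b in histogram_result.get("buckets", [])
    ]


# The "peak" result from the example above, requested with interval 1.
peak = {
    "buckets": [
        {"key": -3, "count": 2},
        {"key": -2, "count": 4},
        {"key": -1, "count": 7},
        {"key": 0, "count": 1},
        {"key": 1, "count": 2},
        {"key": 2, "count": 4},
        {"key": 3, "count": 9},
    ]
}
ranges = bucket_ranges(peak, interval=1)
total_items = sum(count for _, _, count in ranges)
```

    Buckets with fewer than min_count items are not returned, so gaps between consecutive keys are possible and the total covers only the returned buckets.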