Search Keyword: distinct
The [distinct(NAME)] search keyword will match exactly one of every value in an Array (sequence/list/collection). This differs from [unique(NAME)] in that distinct will return the first of any duplicate values whereas unique will yield only values which are not duplicated. This cannot be inverted.
[distinct(NAME)] accepts one optional parameter, NAME. This parameter specifies the exact -- case-sensitive -- name of the required child key when searching Hash (map/dict) or Array-of-Hashes (list/sequence of dicts/maps) data.
To illustrate, the example commands below will use this sample data as distinct-examples.yaml:
---
list1:
- aaa
- bbb
- ccc
list2:
- ccc
- ddd
- ddd
- eee
At a glance, we can see there are some duplicate values in the sample data. The list1 Array (sequence/list) -- on its own -- contains no duplicate values. However, list2 has a duplicate, 'ddd', and combining list1 with list2 would also duplicate the value, 'ccc'. Here are the distinct values in list1, list2, and the combined lists:
$ yaml-get --query='list1[distinct()]' distinct-examples.yaml
aaa
bbb
ccc
$ yaml-get --query='list2[distinct()]' distinct-examples.yaml
ccc
ddd
eee
$ yaml-get --query='((list1)+(list2))[distinct()]' distinct-examples.yaml
aaa
bbb
ccc
ddd
eee
In these results, we can see that only duplicates were dropped. In fact, only the first occurrence of each value is retained. This is great when the operation we mean to perform requires just one of every value in the data, discarding duplicates.
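The "first occurrence wins" behavior seen in these results can be sketched in a few lines of Python. This is only an illustration of the semantics, not yamlpath's actual implementation; the `distinct` function below is a hypothetical helper, and it assumes the values are simple hashable scalars such as strings.

```python
def distinct(values):
    """Keep only the first occurrence of each value, preserving order.

    Hypothetical helper for illustration; assumes hashable values.
    """
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Same data as distinct-examples.yaml
list1 = ["aaa", "bbb", "ccc"]
list2 = ["ccc", "ddd", "ddd", "eee"]

print(distinct(list1))          # ['aaa', 'bbb', 'ccc']
print(distinct(list2))          # ['ccc', 'ddd', 'eee']
print(distinct(list1 + list2))  # ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
```

The order of the output follows the order of first appearance in the input, which matches the command results shown above.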
Let's take a look at the difference between "distinct" and "unique" data. If we run the same queries as above but using [unique(NAME)], we'd get:
$ yaml-get --query='list1[unique()]' distinct-examples.yaml
aaa
bbb
ccc
$ yaml-get --query='list2[unique()]' distinct-examples.yaml
ccc
eee
$ yaml-get --query='((list1)+(list2))[unique()]' distinct-examples.yaml
aaa
bbb
eee
Using [unique(NAME)] instead of [distinct(NAME)] excludes more of the source data. While [distinct(NAME)] retains the first of any duplicated values, [unique(NAME)] entirely discards every value which has any duplicates. For example, while list1 alone yields the value, 'ccc', it is dropped when we combine list1 with list2 because list2 also contains the value, 'ccc'.
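As a rough model of why 'ccc' disappears from the combined result, the "discard anything that has a duplicate" rule can be sketched in Python by counting occurrences first. Again, this is an assumption-laden illustration rather than yamlpath's own code; `unique` is a hypothetical helper and the values are assumed to be hashable scalars.

```python
from collections import Counter


def unique(values):
    """Keep only values that occur exactly once, preserving order.

    Hypothetical helper for illustration; assumes hashable values.
    """
    counts = Counter(values)
    return [value for value in values if counts[value] == 1]


# Same data as distinct-examples.yaml
list1 = ["aaa", "bbb", "ccc"]
list2 = ["ccc", "ddd", "ddd", "eee"]

print(unique(list1))          # ['aaa', 'bbb', 'ccc']
print(unique(list2))          # ['ccc', 'eee']
print(unique(list1 + list2))  # ['aaa', 'bbb', 'eee']
```

In the combined list, 'ccc' is counted twice (once from each source list), so it is dropped along with 'ddd'.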