
Search Keyword: distinct

William W. Kimball, Jr., MBA, MSIS edited this page Jan 18, 2023 · 2 revisions
  1. Introduction
    1. Syntax
    2. Sample Data
  2. Get Only Distinct Values
  3. Contrasted With [unique(NAME)]

Introduction

The [distinct(NAME)] search keyword matches exactly one of every value in an Array (sequence/list/collection). It differs from [unique(NAME)] in that distinct returns the first occurrence of any duplicated value, whereas unique yields only values that are not duplicated at all. This keyword cannot be inverted.

Syntax

[distinct(NAME)] accepts one optional parameter, NAME. This parameter specifies the exact -- case-sensitive -- name of the required child key when searching Hash (map/dict) or Array-of-Hashes (list/sequence of dicts/maps) data.

Sample Data

To illustrate, the example commands below will use this sample data as distinct-examples.yaml:

---
list1:
  - aaa
  - bbb
  - ccc

list2:
  - ccc
  - ddd
  - ddd
  - eee

Get Only Distinct Values

At a glance, we can see there are some duplicate values in the sample data. The list1 Array (sequence/list) -- on its own -- contains no duplicate values. However, list2 contains a duplicate, 'ddd', and combining list1 with list2 also duplicates the value 'ccc'. Here are the distinct values in list1, list2, and the combined lists:

$ yaml-get --query='list1[distinct()]' distinct-examples.yaml
aaa
bbb
ccc

$ yaml-get --query='list2[distinct()]' distinct-examples.yaml
ccc
ddd
eee

$ yaml-get --query='((list1)+(list2))[distinct()]' distinct-examples.yaml
aaa
bbb
ccc
ddd
eee

In these results, we can see that only the duplicates were dropped: the first occurrence of each value is retained and every later repeat is discarded. This is useful when the operation we mean to perform requires just one of every value in the data.
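
The "keep the first occurrence" behavior seen in the results above can be modeled in a few lines of plain Python. This is only a minimal sketch of the semantics, not how yamlpath implements the keyword; the helper name `distinct_values` is hypothetical and the two lists simply mirror the sample data.

```python
def distinct_values(values):
    """Return each value once, keeping its first occurrence and original order.

    Hypothetical helper for illustration; not part of yamlpath.
    """
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Mirrors list1 and list2 from distinct-examples.yaml
list1 = ["aaa", "bbb", "ccc"]
list2 = ["ccc", "ddd", "ddd", "eee"]

print(distinct_values(list2))          # ['ccc', 'ddd', 'eee']
print(distinct_values(list1 + list2))  # ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
```

The printed lists line up with the yaml-get output for list2 and for the combined lists.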

Contrasted With [unique(NAME)]

Let's take a look at the difference between "distinct" and "unique" data. If we run the same queries as above but using [unique(NAME)], we'd get:

$ yaml-get --query='list1[unique()]' distinct-examples.yaml
aaa
bbb
ccc

$ yaml-get --query='list2[unique()]' distinct-examples.yaml
ccc
eee

$ yaml-get --query='((list1)+(list2))[unique()]' distinct-examples.yaml
aaa
bbb
eee

Using [unique(NAME)] instead of [distinct(NAME)] excludes more of the source data. While [distinct(NAME)] retains the first of any duplicated values, [unique(NAME)] entirely discards every value which has any duplicates. For example, list1 alone yields the value 'ccc', but that value is dropped when we combine list1 with list2 because list2 also contains 'ccc'.
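
For comparison, the "discard anything that repeats" behavior can be sketched the same way in plain Python. Again, this is an illustration of the semantics under the assumption that values are compared for simple equality, not yamlpath's own code; `unique_values` is a hypothetical name.

```python
from collections import Counter


def unique_values(values):
    """Return only the values that occur exactly once, in original order.

    Hypothetical helper for illustration; not part of yamlpath.
    """
    counts = Counter(values)
    return [value for value in values if counts[value] == 1]


# Mirrors list1 and list2 from distinct-examples.yaml
list1 = ["aaa", "bbb", "ccc"]
list2 = ["ccc", "ddd", "ddd", "eee"]

print(unique_values(list2))          # ['ccc', 'eee']
print(unique_values(list1 + list2))  # ['aaa', 'bbb', 'eee']
```

Note how 'ccc' survives for list2 alone but disappears from the combined result, matching the yaml-get output above.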
