yaml-merge Anchor Options
The yaml-merge command-line tool enables users to control how it handles conflicting Anchors during a merge operation. Non-conflicting Anchors are automatically merged, and the definition-use order between each Anchor and its Aliases is preserved during the merge. An Anchor conflict occurs only when the same Anchor name is used in both the LHS and RHS documents and the Anchored value differs between them.
Any attempt to merge with Anchor conflicts when using default options will result in an error. This is because the yaml-merge tool will refuse any operation which would destroy data unless the user explicitly instructs it to do so, and how. Not all Anchor conflict resolution options are destructive; the rename option preserves all Anchors and their Aliases in both documents by uniquely renaming conflicting Anchors and Aliases in the RHS document.
But what are Anchors? In short:
An Anchor is a name assigned to a Scalar or Hash (AKA: Map, Dictionary) YAML data element. This name is prefixed with an & sign. Beyond that point within the same YAML document, the data which was "anchored" can be reused any number of times by referencing the Anchor via its Alias form, whereby the & is replaced with an * sign.
In other words, YAML defines two kinds of Anchors. Scalar Anchors define reusable Scalar values (which are String, Integer, Float, Boolean, etc. data types). Hash Anchors (AKA Map Anchors or Dictionary Anchors) define reusable Hashes. To date, there is no definition in YAML for an Array Anchor (AKA List Anchor AKA Sequence Anchor).
Each of these Anchor types will be explored in terms of how they are merged based on user option selection. When there is no conflict, RHS Anchors are merged into the LHS document no matter what Anchor merge option is selected. In order to resolve Anchor conflicts, the yaml-merge tool's available Anchor merge options include:
- stop (the default) causes merging to abort upon detection of an Anchor conflict.
- left causes LHS Anchors to overwrite conflicting RHS Anchors.
- right causes RHS Anchors to overwrite conflicting LHS Anchors.
- rename causes RHS Anchors to be automatically and uniquely renamed before they are merged into the LHS document. The RHS document is updated to use the new Anchor name everywhere its Aliases appear before the merge.
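To make the four options concrete, here is a minimal Python sketch that models them on plain name-to-value dictionaries. It is an illustration only, not how yaml-merge is implemented: the function name merge_anchors, the sample Anchor names, and the numeric-suffix renaming scheme are all assumptions made for this example.

```python
def merge_anchors(lhs, rhs, mode="stop"):
    """Model the four conflict options on plain name -> value dicts.

    Returns (merged, renames); renames maps each renamed RHS Anchor to
    its new name and is only non-empty when mode is "rename".
    """
    if mode not in ("stop", "left", "right", "rename"):
        raise ValueError(f"unknown mode: {mode}")
    merged = dict(lhs)
    renames = {}
    for name, value in rhs.items():
        if name not in merged or merged[name] == value:
            merged[name] = value  # no conflict: always merged
        elif mode == "stop":
            raise ValueError(f"Anchor conflict: {name}")
        elif mode == "left":
            pass  # keep the LHS value
        elif mode == "right":
            merged[name] = value  # the RHS value wins
        else:  # rename: find an unused name (assumed suffix scheme)
            suffix = 1
            while f"{name}_{suffix}" in merged or f"{name}_{suffix}" in rhs:
                suffix += 1
            renames[name] = f"{name}_{suffix}"
            merged[renames[name]] = value
    return merged, renames


# Hypothetical Anchor tables; only "greeting" is in conflict.
lhs = {"greeting": "hello", "port": 8080}
rhs = {"greeting": "goodbye", "port": 8080, "timeout": 30}

print(merge_anchors(lhs, rhs, "left"))
# ({'greeting': 'hello', 'port': 8080, 'timeout': 30}, {})
print(merge_anchors(lhs, rhs, "rename"))
# ({'greeting': 'hello', 'port': 8080, 'greeting_1': 'goodbye', 'timeout': 30}, {'greeting': 'greeting_1'})
try:
    merge_anchors(lhs, rhs)  # the default mode is "stop"
except ValueError as err:
    print(err)  # Anchor conflict: greeting
```

Note that the non-conflicting Anchors (port and timeout) are merged identically under every mode; the option only matters for the one name whose value differs.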
Scalar Anchors create a single value which can be reused in multiple places within the same YAML document. This enables YAML file authors to institute "One Version of the Truth", enabling a change in exactly one place to affect an identical change everywhere that same change is needed. By convention, they are usually gathered up into a special Array named aliases: near the top of each YAML file. Such a file might look like:
File: LHS1.yaml
---
aliases:
  - &scalar_anchor_string This is a reusable String value
  - &scalar_anchor_integer 5280
a_hash:
  which_reuses:
    those_anchors:
      string_alias: *scalar_anchor_string
      integer_alias: *scalar_anchor_integer
    in_several_places:
      string_alias: *scalar_anchor_string
      integer_alias: *scalar_anchor_integer
Were you to query the YAML file at /a_hash/which_reuses/those_anchors/string_alias (or a_hash.which_reuses.those_anchors.string_alias), you'd get back This is a reusable String value:
yaml-get --query=/a_hash/which_reuses/those_anchors/string_alias LHS1.yaml
This is a reusable String value
We can impose an Anchor conflict by attempting to merge with:
File: RHS1.yaml
---
aliases:
  - &scalar_anchor_string A DIFFERENT STRING VALUE
  - &scalar_anchor_integer 5280
another_hash:
  another_alias_string: *scalar_anchor_string
  another_alias_integer: *scalar_anchor_integer
Notice that RHS1.yaml gives &scalar_anchor_string a different value but leaves &scalar_anchor_integer unchanged. In this case, only &scalar_anchor_string is in conflict. A query against /another_hash/another_alias_string (or another_hash.another_alias_string) in RHS1.yaml would yield the expected value:
yaml-get --query=/another_hash/another_alias_string RHS1.yaml
A DIFFERENT STRING VALUE
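The conflict rule (same Anchor name, different Anchored value) can be checked with a few lines of Python. This is only a sketch of the rule as described here: the Anchor tables are transcribed by hand from LHS1.yaml and RHS1.yaml, and find_anchor_conflicts is a hypothetical helper, not part of yaml-merge.

```python
def find_anchor_conflicts(lhs_anchors, rhs_anchors):
    """Return the Anchor names defined in both documents with different values."""
    shared_names = lhs_anchors.keys() & rhs_anchors.keys()
    return sorted(
        name for name in shared_names
        if lhs_anchors[name] != rhs_anchors[name]
    )


# Anchor tables transcribed by hand from LHS1.yaml and RHS1.yaml.
lhs1 = {
    "scalar_anchor_string": "This is a reusable String value",
    "scalar_anchor_integer": 5280,
}
rhs1 = {
    "scalar_anchor_string": "A DIFFERENT STRING VALUE",
    "scalar_anchor_integer": 5280,
}

print(find_anchor_conflicts(lhs1, rhs1))  # ['scalar_anchor_string']
```

The integer Anchor appears in both files but carries the same value, so it does not count as a conflict.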
The next sub-sections are examples of the various ways to deal with such an Anchor conflict.
You can instruct yaml-merge to override the value of conflicting Anchors in the RHS document. By setting --anchors=left or -a left, yaml-merge would produce this output:
---
aliases:
  - &scalar_anchor_string This is a reusable String value
  - &scalar_anchor_integer 5280
  - *scalar_anchor_string
  - *scalar_anchor_integer
a_hash:
  which_reuses:
    those_anchors:
      string_alias: *scalar_anchor_string
      integer_alias: *scalar_anchor_integer
    in_several_places:
      string_alias: *scalar_anchor_string
      integer_alias: *scalar_anchor_integer
another_hash:
  another_alias_string: *scalar_anchor_string
  another_alias_integer: *scalar_anchor_integer
Notice that the RHS Anchors were turned into Aliases of the same-named LHS Anchors. Because the default option for Arrays is to preserve all elements from both documents during the merge, the aliases: array receives the redefined Anchors-as-Aliases from the RHS document. This is normal and you can override this behavior.
If you were to repeat the previous two queries against the merged document, and inspect the resulting aliases: Array, you'd get:
yaml-merge --anchors=left LHS1.yaml RHS1.yaml | yaml-get --query=/a_hash/which_reuses/those_anchors/string_alias -
This is a reusable String value
yaml-merge --anchors=left LHS1.yaml RHS1.yaml | yaml-get --query=/another_hash/another_alias_string -
This is a reusable String value
yaml-merge --anchors=left LHS1.yaml RHS1.yaml | yaml-get --query=/aliases -
["This is a reusable String value", 5280, "This is a reusable String value", 5280]
You can instruct yaml-merge to override the value of conflicting Anchors in the LHS document. By setting --anchors=right or -a right, yaml-merge would produce this output:
---
aliases:
  - &scalar_anchor_string A DIFFERENT STRING VALUE
  - &scalar_anchor_integer 5280
  - *scalar_anchor_string
  - *scalar_anchor_integer
a_hash:
  which_reuses:
    those_anchors:
      string_alias: *scalar_anchor_string
      integer_alias: *scalar_anchor_integer
    in_several_places:
      string_alias: *scalar_anchor_string
      integer_alias: *scalar_anchor_integer
another_hash:
  another_alias_string: *scalar_anchor_string
  another_alias_integer: *scalar_anchor_integer
Notice that the conflicting RHS Anchor (&scalar_anchor_string) has replaced the original LHS value. As with the left behavior, and because the default option for Arrays is to preserve all elements from both documents during the merge, the aliases: array again shows the redefined Anchors-as-Aliases from the LHS and RHS documents. This is normal and you can override this behavior.
If you were to repeat the previous two queries against the merged document, and inspect the resulting aliases: Array, you'd get:
yaml-merge --anchors=right LHS1.yaml RHS1.yaml | yaml-get --query=/a_hash/which_reuses/those_anchors/string_alias -
A DIFFERENT STRING VALUE
yaml-merge --anchors=right LHS1.yaml RHS1.yaml | yaml-get --query=/another_hash/another_alias_string -
A DIFFERENT STRING VALUE
yaml-merge --anchors=right LHS1.yaml RHS1.yaml | yaml-get --query=/aliases -
["A DIFFERENT STRING VALUE", 5280, "A DIFFERENT STRING VALUE", 5280]
You may not always wish to override the values of Scalar Anchors during a merge. In some cases, such collisions may be accidental and you really need both documents' Anchors to be preserved during the merge. For this and similar cases, yaml-merge supports a rename option for handling Anchor collisions. By setting --anchors=rename or -a rename, yaml-merge would produce this output:
---
aliases:
  - &scalar_anchor_string This is a reusable String value
  - &scalar_anchor_integer 5280
  - &scalar_anchor_string_1 A DIFFERENT STRING VALUE
  - *scalar_anchor_integer
a_hash:
  which_reuses:
    those_anchors:
      string_alias: *scalar_anchor_string
      integer_alias: *scalar_anchor_integer
    in_several_places:
      string_alias: *scalar_anchor_string
      integer_alias: *scalar_anchor_integer
another_hash:
  another_alias_string: *scalar_anchor_string_1
  another_alias_integer: *scalar_anchor_integer
Notice that the conflicting RHS Anchor (&scalar_anchor_string) was renamed to &scalar_anchor_string_1. Throughout the rest of the merged document, all uses of the previously-conflicting Anchor name have been renamed to match this change, fully preserving both the LHS and RHS documents in the merged result. Note also that the non-conflicting Anchor (&scalar_anchor_integer) caused no disruption and seamlessly integrated with the LHS original. Because the default option for Arrays is to preserve all elements from both documents during the merge, the aliases: array again shows the redefined Anchor-as-Alias for this non-conflicting element. This is normal and you can override this behavior.
If you were to repeat the previous two queries against the merged document, and inspect the resulting aliases: Array, you'd get:
yaml-merge --anchors=rename LHS1.yaml RHS1.yaml | yaml-get --query=/a_hash/which_reuses/those_anchors/string_alias -
This is a reusable String value
yaml-merge --anchors=rename LHS1.yaml RHS1.yaml | yaml-get --query=/another_hash/another_alias_string -
A DIFFERENT STRING VALUE
yaml-merge --anchors=rename LHS1.yaml RHS1.yaml | yaml-get --query=/aliases -
["This is a reusable String value", 5280, "A DIFFERENT STRING VALUE", 5280]
In most cases, this is probably the preferred outcome.
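The rename option depends on every Alias in the RHS document following its Anchor to the new name before the merge. The Python sketch below illustrates that rewriting step on an in-memory stand-in for the body of RHS1.yaml. The Alias class and rename_aliases function are hypothetical names invented for this example; they do not describe yaml-merge's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    """Stand-in for a YAML Alias such as *scalar_anchor_string."""
    name: str


def rename_aliases(node, renames):
    """Return a copy of node with every Alias renamed per the renames map."""
    if isinstance(node, Alias):
        return Alias(renames.get(node.name, node.name))
    if isinstance(node, dict):
        return {key: rename_aliases(value, renames) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_aliases(item, renames) for item in node]
    return node  # plain Scalars pass through unchanged


# The non-Anchor part of RHS1.yaml, with Aliases modeled explicitly.
rhs_body = {
    "another_hash": {
        "another_alias_string": Alias("scalar_anchor_string"),
        "another_alias_integer": Alias("scalar_anchor_integer"),
    }
}

renamed = rename_aliases(rhs_body, {"scalar_anchor_string": "scalar_anchor_string_1"})
print(renamed["another_hash"]["another_alias_string"].name)   # scalar_anchor_string_1
print(renamed["another_hash"]["another_alias_integer"].name)  # scalar_anchor_integer
```

Only the Alias whose Anchor was in conflict changes; the Alias for the non-conflicting integer Anchor keeps its name, which matches the merged output shown above.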
Hash Anchors are quite a bit more complex than Scalar Anchors. In turn, they are extremely useful for reducing duplication of Hash structure and simultaneously reusing more than one key-value pair. Rather than create a single data element (a Scalar value) which can be reused throughout the remainder of the YAML document, a Hash Anchor enables repeatedly reusing an entire Hash. Any number of the key-value pairs within the Anchored Hash can be discretely overridden at each point of reuse, allowing "defaults" within the Anchored Hash to be set to concrete values. Further, multiple Anchored Hashes can be combined to produce novel combinations of reusable Hash fragments.
A simple example might be a short set of publications, like:
File: LHS2.yaml
book_defaults: &defaults
  author: UNKNOWN
  publisher: UNKNOWN
  publication_year: UNKNOWN
publications:
  books:
    'A Novel':
      <<: *defaults
      author: John Doe
      publisher: Books-R-Us
    'A Tech Manual':
      <<: *defaults
      publication_year: 1998
      contributing_authors:
        - Jane Doe
        - Billy Bob Joe Frank
Note that a Hash Anchor name comes after the key of the Hash being Anchored, and the name of the Anchor does not need to match the name of the key defining the reusable Hash; this is the reverse of Scalar Anchors. Notice also the YAML Merge Operator, <<:, later in the document. When present in YAML files, the YAML Merge Operator is an instruction to embed the full contents of the Anchored Hash named by the Alias which follows it. Any keys following this instruction override the same-named keys from the Anchored Hash; new keys extend the Hash for that instance. This is all separate from the capabilities provided by the yaml-merge tool.
To explore this concept, consider these queries and their results:
yaml-get --query='/&defaults' LHS2.yaml
{"author": "UNKNOWN", "publisher": "UNKNOWN", "publication_year": "UNKNOWN"}
yaml-get --query="/publications/books/'A Novel'" LHS2.yaml
{"author": "John Doe", "publisher": "Books-R-Us", "publication_year": "UNKNOWN"}
yaml-get --query="/publications/books/'A Tech Manual'" LHS2.yaml
{"publication_year": 1998, "contributing_authors": ["Jane Doe", "Billy Bob Joe Frank"], "author": "UNKNOWN", "publisher": "UNKNOWN"}
You can see that each book record always has all of the fields provided by the &defaults Anchored Hash, even when the data for the book is not specified. Further, novel fields in a book record extend the definition of the record beyond the fields provided by the default.
Note that it is convention to place the Hash Merge Operator as the first child of a Hash being merged but it could be at any child position within the Hash. The specification (draft) for this operator specifically stipulates that a Hash merge will never overwrite any existing keys already defined within the target Hash, whether they come before or after the operator. This holds true when merging multiple Anchored Hashes into the same Hash; only the first instance of any given key from the Anchored Hash is used and any locally-defined keys always override same-named keys from the Anchored Hash.
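The precedence rule described above (locally-defined keys always win; among several Anchored Hashes, the first to define a key wins) can be modeled in a few lines of Python. This is a sketch of the rule as stated here, using plain dictionaries transcribed from LHS2.yaml; apply_merge_key is a hypothetical helper, not a function of any YAML library.

```python
def apply_merge_key(local_keys, *anchored_hashes):
    """Expand a '<<' merge: local keys win, then the earliest Anchored Hash."""
    result = {}
    for source in (local_keys, *anchored_hashes):
        for key, value in source.items():
            result.setdefault(key, value)  # never overwrite an existing key
    return result


# Values transcribed by hand from LHS2.yaml.
defaults = {"author": "UNKNOWN", "publisher": "UNKNOWN", "publication_year": "UNKNOWN"}

a_novel = apply_merge_key({"author": "John Doe", "publisher": "Books-R-Us"}, defaults)
print(a_novel)
# {'author': 'John Doe', 'publisher': 'Books-R-Us', 'publication_year': 'UNKNOWN'}

# With several Anchored Hashes, only the first instance of each key is used.
print(apply_merge_key({}, {"a": 1}, {"a": 2, "b": 3}))
# {'a': 1, 'b': 3}
```

The first result matches the 'A Novel' record returned by the query shown earlier: the two locally-defined keys keep their concrete values and only publication_year falls back to the default.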
Let's take a look at how the yaml-merge tool interacts with these, particularly when a conflict is detected.