Comprehending YAML

chantastic - Feb 17 '21 - Dev Community

YAML is a little clever for my taste but I'm starting to get it.

Before yesterday's post, I had no idea how it related to JSON. In that post I personalized a few examples from the YAML docs and manually converted them.

And after sleeping on my exercise, I have a better picture of what's going on.

YAML infers complex types

As much as possible, YAML infers complex data structures from the data they contain.

- Evermore
- Folklore
- Lover

This is a sequence and the containing Array structure is implied.
Here's how it looks in JSON:

["Evermore", "Folklore", "Lover"]

Compare this to a YAML mapping, where the root structure is implied to be an object.

Evermore: 2020
Folklore: 2020
Lover: 2019

Again, here's what that looks like in JSON.

{
  "Evermore": 2020,
  "Folklore": 2020,
  "Lover": 2019
}

So what happens if we try to mix sequences and mappings at the root level?

- Evermore
Folklore: 2020
Lover: 2019

It breaks:

YAMLException: end of the stream or a document separator is expected at line 2, column 9: Folklore: 2020

This tracks because something can't be both an object AND an array.
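One way to picture the rule is to look only at the un-indented lines and ask what each one claims the root is. The Python sketch below is a toy check, not a YAML parser: `root_kind` is a hypothetical helper that only understands root-level `- item` and `key: value` lines, and its error message is my own, not the parser's.

```python
def root_kind(text):
    """Guess the root structure from un-indented lines (toy check, not a YAML parser)."""
    kinds = set()
    for line in text.splitlines():
        # Skip blank lines and indented (nested) lines; only root-level lines matter here.
        if not line.strip() or line[0] in " \t":
            continue
        if line == "-" or line.startswith("- "):
            kinds.add("sequence")  # a dash followed by a space starts an array item
        elif ": " in line or line.endswith(":"):
            kinds.add("mapping")   # a colon separates a key from its value
        else:
            kinds.add("scalar")
    if len(kinds) > 1:
        raise ValueError("root cannot be both: " + " and ".join(sorted(kinds)))
    return kinds.pop() if kinds else "empty"


print(root_kind("- Evermore\n- Folklore\n- Lover"))  # sequence
print(root_kind("Evermore: 2020\nFolklore: 2020"))   # mapping
try:
    root_kind("- Evermore\nFolklore: 2020\nLover: 2019")
except ValueError as err:
    print(err)  # root cannot be both: mapping and sequence
```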

Understanding inference

Identifying when you're describing an Array (sequence) and when you're describing an Object (mapping) is critically important. And it's not always clear.

Can you guess what the JSON equivalent for this YAML is?

- Evermore: 2020
- Folklore: 2020
- Lover: 2019

The dashes - indicate that the root structure is an array (sequence). But each array item comprises a discrete object (mapping) with a single key-value pair. This is inferred from the colon : between the key and the value in each array item.

So the JSON output for the YAML above is this:

[
  {
    "Evermore": 2020
  },
  {
    "Folklore": 2020
  },
  {
    "Lover": 2019
  }
]

An array of objects (or "sequence of mappings").
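To make the difference from the plain mapping concrete, here is a small sketch in Python using only the standard `json` module. It loads the JSON above, checks the types, and then merges the single-pair objects back into one mapping. The variable names are mine, purely for illustration.

```python
import json

# The JSON from above: a sequence (list) of single-pair mappings (dicts).
albums = json.loads('[{"Evermore": 2020}, {"Folklore": 2020}, {"Lover": 2019}]')

print(type(albums).__name__)     # list
print(type(albums[0]).__name__)  # dict

# Merging the items gives the plain mapping from the earlier example.
merged = {title: year for item in albums for title, year in item.items()}
print(merged)  # {'Evermore': 2020, 'Folklore': 2020, 'Lover': 2019}
```

Same data, two shapes: the dashes are what turn one three-key object into three one-key objects.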

Now that I understand it, I read the dash (-) like a list item in Markdown.

YAML infers simple types

Let's take an array (sequence) of objects (mappings).

- Evermore: 2020
- Folklore: 2020
- Lover: 2019

The keys are strings and the values are numbers.

Evermore becomes "Evermore".

2020 stays 2020.

Taylor Swift becomes "Taylor Swift".
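The idea can be sketched in a few lines of Python. This is a rough approximation, assuming only booleans, null, integers, and floats get special treatment; real YAML parsers follow more detailed rules, and `infer_scalar` is a hypothetical name, not part of any library.

```python
def infer_scalar(text):
    """Roughly mimic how a YAML parser types an unquoted scalar (simplified)."""
    text = text.strip()
    if text in ("true", "false"):
        return text == "true"
    if text in ("null", "~", ""):
        return None
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text  # anything that is not recognized stays a string


print(repr(infer_scalar("Evermore")))      # 'Evermore'
print(repr(infer_scalar("2020")))          # 2020
print(repr(infer_scalar("Taylor Swift")))  # 'Taylor Swift'
```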

Takeaways

After looking at so much JavaScript and JSON, this is a little unsettling. But in simple examples like this, there's a simplicity to the representation.

YAML shorthand can be confusing

Below we have an array (sequence) of objects (mappings).

- name: Taylor Swift
- name: The National

JSON looks like this.

[
  {
    "name": "Taylor Swift"
  },
  {
    "name": "The National"
  }
]

What does it look like to add more properties to these objects?

YAML allows us to use a JSON-like inline object syntax (a flow mapping).

- { name: Taylor Swift, album_count: 9 }
- { name: The National, album_count: 8 }

But it's not super YAML-y. So there's an alternative that uses newlines.

- name: Taylor Swift
  album_count: 9
- name: The National
  album_count: 8

This is challenging for me to interpret because the dash (-) is separating discrete objects (mappings) in the array (sequence). So it feels like the dash (-) is a directive for the object (mapping). But it's not. It's communicating that the containing structure is an array.

Pressing into the confusion, consider this array (sequence) containing a string, number, object, and array.

- Taylor Swift
- 1989
- album_count: 9
  nationality: American
- - Big Machine
  - Republic

This is that same file in JSON.

[
  "Taylor Swift",
  1989,
  {
    "album_count": 9,
    "nationality": "American"
  },
  ["Big Machine", "Republic"]
]

Takeaways

The presence of a dash (-) or colon (:) describes the surrounding structure.

In the case of arrays (sequences) of objects (mappings) with multiple key-value pairs, this terseness can be unclear. At least until you've trained yourself to see the invisible structures that (-) and (:) represent.

- name: Taylor Swift
  album_count: 9

For me, learning to interpret the above lines as one object in an array has made the biggest difference in my ability to quickly parse YAML.
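That reading habit can be written down as a toy reader in Python. This is only a sketch under narrow assumptions: the input is a sequence of flat mappings, every continuation line is indented by two spaces, and values are either whole numbers or strings. `parse_items` is a hypothetical helper, not a real YAML parser.

```python
def parse_items(text):
    """Toy reader for a sequence of flat mappings; not a real YAML parser."""
    items = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("- "):
            items.append({})  # the dash says: a new array item starts here
        elif not items:
            raise ValueError("expected a line starting with '- '")
        # Either way, the rest of the line is one key-value pair of the current object.
        key, _, value = line[2:].partition(": ")
        items[-1][key.strip()] = int(value) if value.isdigit() else value
    return items


doc = """\
- name: Taylor Swift
  album_count: 9
- name: The National
  album_count: 8
"""
print(parse_items(doc))
# [{'name': 'Taylor Swift', 'album_count': 9}, {'name': 'The National', 'album_count': 8}]
```

The dash only ever opens a new item. The key-value pair on the same line and the indented pairs below it all land in the same object.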

YAML patterns

Identifying patterns is helpful.

Here are a few complex handoffs that I had trouble with.

Array of arrays : sequence of sequences

- - Evermore
  - Folklore
  - Lover
- - I Am Easy to Find
  - Sleep Well Beast
  - Trouble Will Find Me

This YAML file is an array (sequence) containing two arrays (sequences) each with three strings (scalars).

Array of objects : sequence of mappings

- name: Taylor Swift
  album_count: 9
- name: The National
  album_count: 8

This YAML file is an array (sequence) referencing two objects (mappings) each with two key-value pairs.

Object of arrays : mapping of sequences

Taylor Swift:
  - Evermore
  - Folklore
  - Lover
The National:
  - I Am Easy to Find
  - Sleep Well Beast
  - Trouble Will Find Me

This YAML file is an object (mapping) with two key-value pairs, each key referencing an array (sequence) of strings (scalars).

Object of objects : mapping of mappings

Taylor Swift:
  album_count: 9
  label: Republic
The National:
  album_count: 8
  label: 4AD

This YAML file is an object (mapping) with two key-value pairs, each key referencing another object (mapping) with two key-value pairs whose values are a mix of strings and numbers (scalars).
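As a quick self-check, the four patterns can be written as the Python values a parser would hand back, with a small function naming each shape. The data is transcribed by hand from the examples above, and `kind` and `describe` are hypothetical helpers for illustration only.

```python
def kind(data):
    """Return the YAML name for a parsed Python value."""
    if isinstance(data, list):
        return "sequence"
    if isinstance(data, dict):
        return "mapping"
    return "scalar"


def describe(data):
    """Describe one level of nesting, e.g. 'sequence of mappings'."""
    if isinstance(data, list):
        children = data
    elif isinstance(data, dict):
        children = list(data.values())
    else:
        return "scalar"
    inner = sorted({kind(child) + "s" for child in children})
    return kind(data) + " of " + " and ".join(inner) if inner else kind(data)


taylor = ["Evermore", "Folklore", "Lover"]
national = ["I Am Easy to Find", "Sleep Well Beast", "Trouble Will Find Me"]
patterns = [
    [taylor, national],
    [{"name": "Taylor Swift", "album_count": 9}, {"name": "The National", "album_count": 8}],
    {"Taylor Swift": taylor, "The National": national},
    {"Taylor Swift": {"album_count": 9, "label": "Republic"},
     "The National": {"album_count": 8, "label": "4AD"}},
]
for pattern in patterns:
    print(describe(pattern))
# sequence of sequences
# sequence of mappings
# mapping of sequences
# mapping of mappings
```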

All mixed up

Parsing isolated patterns is a good start, but the big game is reading entire YAML files.

Look at this GitHub Actions workflow.

name: Netlify Rebuild
on:
  schedule:
    - cron: "0 21 * * MON-FRI"
jobs:
  build:
    name: Netlify Rebuild
    runs-on: ubuntu-latest
    steps:
      - name: Curl request
        run: curl -X POST -d {} https://api.netlify.com/build_hooks/601321b7879709a8b8874175

Here's what we can evaluate.

  • The root structure is an object with three properties
  • name references a string
  • on references an object with one property
    • schedule references an array with one object containing one property
      • cron references a string
  • jobs references an object with one property
    • build references an object with three properties
      • name references a string
      • runs-on references a string
      • steps references an array with one object containing two properties
        • name references a string
        • run references a string
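The same walk-through can be checked against the structure written out by hand as nested Python dicts and lists. This is a sketch, not parser output: it assumes a parser that keeps `on` as a plain string key. YAML 1.1 parsers such as PyYAML read a bare `on` as the boolean true, so the key would differ there.

```python
# Hand-written equivalent of the workflow above (assumes "on" stays a string key).
workflow = {
    "name": "Netlify Rebuild",
    "on": {"schedule": [{"cron": "0 21 * * MON-FRI"}]},
    "jobs": {
        "build": {
            "name": "Netlify Rebuild",
            "runs-on": "ubuntu-latest",
            "steps": [
                {
                    "name": "Curl request",
                    "run": "curl -X POST -d {} https://api.netlify.com/build_hooks/601321b7879709a8b8874175",
                }
            ],
        }
    },
}

print(len(workflow))                                  # 3
print(workflow["on"]["schedule"][0]["cron"])          # 0 21 * * MON-FRI
print(len(workflow["jobs"]["build"]))                 # 3
print(workflow["jobs"]["build"]["steps"][0]["name"])  # Curl request
```

Each square-bracket lookup corresponds to one level of indentation in the YAML, and each `[0]` corresponds to a dash.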

Conclusion

I think I understand YAML enough to move on with my life. I hope you feel the same way.

Learning how to identify the implied structures has made all the difference.
