Tag Your Unions Before You Wreck Your Unions

K - Dec 19 '17 - - Dev Community

Cover image by Paul Gorbould on Flickr.

Tagged union, discriminated union, disjoint union, variant, variant record, or sum type. Different names, similar concept. But what is it all about, and how do tagged unions differ from regular ones?

Untagged Unions

If you are coming from a statically typed language like C, you probably already know about unions. They are a basic way to store data of different types in the same memory space. They are sometimes also called untagged unions.

An example in C could look like this:

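#include <stdio.h>
#include <string.h>

union MyUnion {
   int number;
   char text[20];
};

int main(void) {
   union MyUnion x;

   // Store an int in the union
   x.number = 2;
   printf("x.number: %d\n", x.number);

   // Writing text reuses the same memory, so x.number is no longer valid
   strcpy(x.text, "Hello, world!");
   printf("x.text: %s\n", x.text);

   return 0;
}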

The size of x in memory is at least the size of the largest member of MyUnion. It looks a bit like a struct, but if you write a value to one field, it overwrites the memory of the other fields. The basic idea behind this is to save space. It also makes languages like C a tiny bit more dynamic, because one variable can now store different types.

As you probably can imagine, this can also be used to save different types of structs into one memory space.

The problem with unions is, the type-checker doesn't care what you are doing.

If you declare an int x, the type-checker will throw an error if you try to put a string inside of it.

If you declare a union MyUnion x, the type-checker won't keep track of what you are storing, since that depends on what happens at runtime. You have to check in your own program logic whether it's okay to access x.number or x.text.

How is this related to JavaScript?

Well, in JavaScript, you can't type your variables, which allows you to store anything in them.

let x = 2;
console.log("Number:", x);

x = "Hello, world!";
console.log("Text", x);

This can be rather convenient, because if your data structure changes, you can still put it inside the same variables without caring about the types.

Problems arise when your data structures get a bit more complex.

let x = {
  httpMethod: "GET",
  path: "/users",
  queryParams: { id: 10 }
};
console.log("ID:", x.queryParams.id);

x = {
  httpMethod: "POST",
  path: "/users",
  body: { name: "Jane" }
};
console.log("Name:", x.body.name);

As you can see, a GET request comes with a queryParams field and a POST request comes with a body field. The path is the same, but some parts differ.

You can use the httpMethod field to check what it is, but you have to do it yourself. If you get this wrong, you could end up accessing x.body.id in a GET request and everything blows up, because x.body is undefined.

If you've used JavaScript for a while, you've probably noticed that basically all data is an untagged union. Most of the time you just store one type of data in a variable, but more often than not you end up pushing around objects that are kinda the same but differ in some fields, like the request example above.

Tagged Unions

So what's the idea about tagged unions?

They let you define the differences of your unions with the help of a static type system.

What does this mean?

As I explained with the request example, you often have a bunch of different data types that arrive in one variable, like a function argument. They are either basically the same and vary in a few fields, or they are entirely different. If you want to be sure you don't access data that isn't there and prevent the infamous "is undefined" errors, you have to check inside the program code at runtime.

Such a check could look like this:

function handle(request) {
  // httpMethod is the field name used in the request objects above
  if (request.httpMethod === "GET") console.log(request.queryParams.id);
}

You could also check the queryParams object directly, but nobody forces you to do so. This is completely in your hands and could fail one day in production.
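To make the "check the field directly" variant concrete, here is a minimal sketch in TypeScript. The LooseRequest type and the summarize function are hypothetical names made up for this illustration. Both fields are optional, so the check is still something you have to remember to write yourself.

```typescript
// Hypothetical loose request shape, mirroring the JavaScript objects above.
// Both queryParams and body are optional, so either may be missing.
type LooseRequest = {
  httpMethod: string;
  path: string;
  queryParams?: { id: number };
  body?: { name: string };
};

// Hypothetical helper: inspects the fields themselves instead of httpMethod.
function summarize(request: LooseRequest): string {
  if (request.queryParams !== undefined) {
    return "ID: " + request.queryParams.id;
  }
  if (request.body !== undefined) {
    return "Name: " + request.body.name;
  }
  // Nothing ties httpMethod to the fields, so this case is possible too.
  return "unknown request";
}
```

Note that nothing stops a GET request from arriving without queryParams here, which is exactly the gap tagged unions are meant to close.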

Languages with tagged unions in their type-system allow you to make this check at compile time. Reason is such a language.

An example of a request type could look like this:

type body = {name: string};
type queryParams = {id: string};
type httpMethod = GET(queryParams) | POST(body);

type request = {
  path: string,
  httpMethod: httpMethod
};

Now the data is encapsulated inside a tagged union (called a variant in Reason), which is the httpMethod type.

If the content of httpMethod is GET, you don't even get access to a body, which could have (and often has) an entirely different structure from queryParams.

An example of its usage could look like this:

let handleRequest = (req: request) =>
  switch (req.httpMethod) {
  | GET(query) => Js.log("GET " ++ req.path ++ " ID:" ++ query.id)
  | POST(body) => Js.log("POST " ++ req.path ++ " Name:" ++ body.name)
  };

What does this do? It types the req argument as request. Since req.httpMethod is a variant (= tagged union), we can use switch to do things for the different types in that variant.

Many languages that have tagged unions even force you to do things for every possibility. This seems strange at first, but it can help later. If someone changes that tagged union, which can be defined somewhere else in the code, the type-checker will tell you that you need to do something for the new type in that union. This could be forgotten if done manually.
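Here is a rough sketch of that "handle every possibility" check in TypeScript, assuming a tag field as the discriminant. The type names, the tag field, and the assertNever and labelFor helpers are made up for this example. They are not part of any library.

```typescript
type QueryParams = { id: string };
type PostBody = { name: string };

// Each member of the union carries a literal "tag" that identifies it.
type HttpMethod =
  | { tag: "GET"; queryParams: QueryParams }
  | { tag: "POST"; body: PostBody };

// Hypothetical helper: it only accepts the type never, so calling it with
// a value that can still exist is rejected by the type-checker.
function assertNever(value: never): never {
  throw new Error("Unhandled variant: " + JSON.stringify(value));
}

function labelFor(method: HttpMethod): string {
  switch (method.tag) {
    case "GET":
      return "GET ID:" + method.queryParams.id;
    case "POST":
      return "POST Name:" + method.body.name;
    default:
      // All members are handled above, so method has type never here.
      // If someone adds a new member to HttpMethod without a case for it,
      // this line stops compiling.
      return assertNever(method);
  }
}
```

Unlike Reason, TypeScript does not insist on this by default in every situation, so the never trick is one common way to opt in.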

Conclusion

Tagged unions are a nice way to store different data-types inside of one variable without losing track of their structure. This allows code to be written more like in a dynamically typed language while giving it more safety in the long run.

Reason is such a language. It tries to make concepts like tagged unions, which it calls variants, accessible to JavaScript developers by delivering them with a familiar syntax.

TypeScript has tagged unions too, if you aren't into that whole FP thingy.
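For comparison, the request type from the Reason example could be sketched in TypeScript roughly like this. The tag field and the ApiRequest, QueryParams, and PostBody names are my own choices for this illustration. TypeScript lets you pick any literal-typed field as the tag.

```typescript
type QueryParams = { id: string };
type PostBody = { name: string };

// The literal "tag" field tells the type-checker which member it is looking at.
type HttpMethod =
  | { tag: "GET"; queryParams: QueryParams }
  | { tag: "POST"; body: PostBody };

type ApiRequest = {
  path: string;
  httpMethod: HttpMethod;
};

function handleRequest(req: ApiRequest): string {
  const method = req.httpMethod;
  if (method.tag === "GET") {
    // Narrowed to the GET member: method.body does not exist on this type,
    // so trying to read it would be a compile-time error.
    return "GET " + req.path + " ID:" + method.queryParams.id;
  }
  // Only the POST member is left at this point.
  return "POST " + req.path + " Name:" + method.body.name;
}
```

Calling handleRequest with { path: "/users", httpMethod: { tag: "GET", queryParams: { id: "10" } } } returns "GET /users ID:10".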
