If you have been coding in the web development industry, you are most likely pretty familiar with JSON. It is the all-encompassing de facto standard that is never challenged. It is used everywhere, and you have become accustomed to it. All your REST calls transfer data via JSON. You know the format's limitations, and you accept them.
Or do you have to?
(Note: all links to packages and code are in the Links section at the end of the article.)
Brief history
My background is heavily in the Java and JavaScript/TypeScript world, so I have learned how to deal with their own quirks. And many years ago, I began a hobby web project (TypeScript/Node) that had a problem which JSON could not solve well.
I wanted to break free from the RESTful mindset to a more relaxed, message-based transport between browser and server. And for that, I really wanted to utilize the JavaScript type system to differentiate messages from each other. You know, one would have classes like AddDocument, GetUsers, GiveMeAllYourMoney, etc. And instead of having many HTTP endpoints, I would have just one, and the messages would flow from browser to server and back in a more ad hoc manner.
But I did not really have an elegant solution for my needs, because JSON destroys all type information during serialization. Of course, I could use some dedicated property to transfer types, but that would require custom processing, and I just felt that it was not the route I wanted to go. I just wanted a protocol that would take my object as is and serialize it in such a way that it would be exactly the same when deserialized. It would retain all type information, and that's it. So I needed an alternative.
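To make the loss of type information concrete, here is a minimal TypeScript sketch. The AddDocument class and its title field are made up for illustration; only JSON.stringify() and JSON.parse() are real APIs here.

```typescript
// Hypothetical message class, named after the example in the text.
class AddDocument {
  title: string;

  constructor(title: string) {
    this.title = title;
  }
}

const original = new AddDocument("Quarterly report");

// Round-trip through JSON: the data survives, the class does not.
const roundTripped = JSON.parse(JSON.stringify(original));

const originalIsTyped = original instanceof AddDocument;   // true
const copyIsTyped = roundTripped instanceof AddDocument;   // false, it is a plain object now
const copyTitle: string = roundTripped.title;              // "Quarterly report"

console.log(originalIsTyped, copyIsTyped, copyTitle);
```

The receiver gets a plain object with the right data, but nothing tells it that the object used to be an AddDocument, so it cannot dispatch on the message type without some extra convention.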
Thus, I began the challenge of creating a better JSON for myself. This journey has taken at least five years so far, and with the current evolution of the protocol, I finally feel it is mature enough to be introduced to others who might be interested.
The shortcomings of using JSON
As I mentioned before, the original spark for the project was the capability to retain types during the serialization process. But I quickly realized that I could then also embed even more information that JSON could not.
JSON has a bad habit of not guaranteeing anything. You can put whatever you like, and you normally have to just trust that your name property actually contains a string and not an array of booleans. Or you check everything at runtime to make sure. Of course, there are libraries to check the model. But again, I asked myself, why can't the actual protocol implementation already do it so that I could trust that what was deserialized was the thing I wanted?
Also, the native types of JSON are quite limited. For instance, JSON has an array for collections. But JavaScript already has sets and maps. Could I add them too? And what about dates? There have been numerous times when I have struggled with date formats. Maybe you have a date with a timezone, or maybe you don't. Was it even intended to have one? You never know, because JSON does not really tell you anything.
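The following short TypeScript sketch shows what happens to these types with plain JSON. The payload object and its property names are only examples.

```typescript
// Example payload using types that JSON has no native representation for.
const payload = {
  tags: new Set(["a", "b"]),
  scores: new Map([["alice", 1]]),
  created: new Date("2024-09-16T12:13:00Z"),
};

// Set and Map have no enumerable own properties, so they serialize as {}.
// Date is converted to an ISO 8601 string by its toJSON() method.
const json = JSON.stringify(payload);
// json === '{"tags":{},"scores":{},"created":"2024-09-16T12:13:00.000Z"}'

// After parsing, the date is just a string; nothing marks it as a date.
const parsed = JSON.parse(json);
const createdType = typeof parsed.created; // "string"

console.log(json, createdType);
```

The contents of the Set and the Map are silently lost, and the Date comes back as a string that the receiver has to recognize and convert by itself.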
So in essence, I wanted to rectify these kinds of shortcomings in some way.
Introducing the Cbot (Character Based Object Transport) protocol
The Cbot protocol is designed as a character-based, machine-readable format only. It has a predictable and straightforward syntax, and it could be seen as a kind of small assembly language. Each command is separated by a newline, and each line begins with an opcode that explains how objects are to be constructed.
The protocol natively supports a much larger set of types compared to JSON, which includes:
- Integer types: 32-bit and 64-bit integers, and big integer
- Float types: 32-bit and 64-bit floats, and big decimal
- Boolean
- String
- Collection types: List/Array, Set, and Map
- Date types: Zoned datetime, Local datetime, Local date, and Local time
- Binary types: 8-bit array
The protocol can be used in schemaless or schemaful mode, or you can mix them together at will. The schemaless mode is a drop-in replacement for JSON.stringify() and JSON.parse(), creating JSON-like behavior.
In the schemaful mode, you create a self-validating layer that works at the protocol level. This enables the transport of types (classes) and enforces and validates that data structures adhere to a defined schema. This ensures that the received message is exactly what it was intended to be. It also supports polymorphism, which enables you to use superclasses or interfaces as property types.
Regarding the schemaful mode, in existing implementations, you do not use an actual predefined schema at all. Instead, in TypeScript, you create a meta-model for your classes in the code itself. And, for instance, in Java, the model can be inferred via reflection automatically. Even though these classes do not share a "primary" schema, they can be validated against each other to check correctness.
I personally find the validation approach to be more flexible because it allows the classes to have variety in their implementations rather than being fixed to some specific format by a third party.
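To give a rough idea of what validating against an in-code meta-model can mean, here is a small TypeScript sketch. It is not the @sisujs/meta-cbot API: the names FieldType, ClassModel, personModel, and validate are all hypothetical, and the real library does this as part of deserialization rather than as a separate step.

```typescript
// Hypothetical meta-model, for illustration only (not the Cbot API).
type FieldType = "string" | "int32" | "boolean";

interface ClassModel {
  name: string;
  fields: Record<string, FieldType>;
}

// One runtime check per supported field type.
const checks: Record<FieldType, (v: unknown) => boolean> = {
  string: (v) => typeof v === "string",
  // (v | 0) === v holds only for numbers that fit a signed 32-bit integer.
  int32: (v) => typeof v === "number" && (v | 0) === v,
  boolean: (v) => typeof v === "boolean",
};

// Returns a list of problems; an empty list means the value matches the model.
function validate(model: ClassModel, value: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(model.fields)) {
    const type = model.fields[field];
    if (!(field in value)) {
      errors.push(`${model.name}.${field}: missing`);
    } else if (!checks[type](value[field])) {
      errors.push(`${model.name}.${field}: expected ${type}`);
    }
  }
  for (const field of Object.keys(value)) {
    if (!(field in model.fields)) {
      errors.push(`${model.name}.${field}: not in model`);
    }
  }
  return errors;
}

const personModel: ClassModel = {
  name: "Person",
  fields: { name: "string", age: "int32" },
};

const good = validate(personModel, { name: "John Smith", age: 41 });
const bad = validate(personModel, { name: [true, false], age: 41 });

console.log(good); // []
console.log(bad);  // ["Person.name: expected string"]
```

The point of the sketch is only the division of labor: once the model lives next to the classes, the transport layer can reject a name that is an array of booleans before application code ever sees it.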
Why not a binary protocol or just use other alternatives?
One could argue that there are plenty of alternatives to JSON, like Protocol Buffers, MessagePack, or Avro. But the nature of these alternatives is that they are all binary protocols. And even when I searched for information on where they are used, web development was totally absent from the scene.
Even today, I am simply unable to find any specific articles or use cases that include an actual web browser as a client. So it is anyone's guess why JSON still dominates as a transport protocol.
Also, I personally didn't feel that any of the existing protocols would satisfy my requirements. I just wanted something that works out of the box (at least almost) and would stay in the background. The purpose of a transport protocol is not to be visible or something you have to constantly worry about. It would do its job efficiently, and that's it.
So, in the end, I felt that a more efficient character-based solution would suit web development better. Strings are easy to handle and manipulate in JavaScript, and the character nature creates some opportunities to display the content of the message in a human-understandable format.
What does it look like?
This article is not meant to be a tutorial for Cbot, but because you are most likely a developer/engineer, you would probably like to have at least some understanding of what is happening. So I formulated an example for this purpose. In the example, I am using Cbot as a simple JSON replacement. Using more advanced features requires the use of a metamodel, which is discussed in the actual tutorial.
So here is the example object:
{
  name: "John Smith",
  age: 41,
  address: {
    street: "Second Avenue",
    postalCode: "1356-A",
    city: "Yorkistan"
  },
  isNiceGuy: true,
  hobbies: [
    "Playing cards",
    "Shopping",
    "Asking odd questions"
  ],
  favouritePoem: {
    title: "Digital Dreams",
    created: new Date("2024-09-16T12:13:00"),
    content: "In the code, we drift and weave,\n"
      + "A dance of data we perceive.\n"
      + "With each keypress, a world unfolds,\n"
      + "Infinite stories, yet untold."
  }
}
When this object is converted to a Cbot message, it looks like this:
112345ac0
E
A name
B JKJohn Smith
A !age
B !Id41
A "address
B "E
A #street
B #JKSecond Avenue
A $postalCode
B $JK1356-A
A %city
B %JKYorkistan
F
A &isNiceGuy
B &Iet
A 'hobbies
B 'C
JKPlaying cards
JKShopping
JKAsking odd questions
D
A (favouritePoem
B (E
A )title
B )JKDigital Dreams
A *created
B *Ih2024-09-16T12:13:00.000+03:00
A +content
B +JL
OIn the code, we drift and weave,
OA dance of data we perceive.
OWith each keypress, a world unfolds,
NInfinite stories, yet untold.
M
F
F
Because this format is meant to be read programmatically, it does not really make sense to read it as is. However, it can be visualized in a disassembly format, which explains the content much better:
MCSM 12345ac0
OBJB (plain)
DEFN 0 name
ASGV 0 (name) STRN SSTR John Smith
DEFN 1 age
ASGV 1 (age) NATV FLOAT64 41
DEFN 2 address
ASGV 2 (address) OBJB (plain)
DEFN 3 street
ASGV 3 (street) STRN SSTR Second Avenue
DEFN 4 postalCode
ASGV 4 (postalCode) STRN SSTR 1356-A
DEFN 5 city
ASGV 5 (city) STRN SSTR Yorkistan
OBJE
DEFN 6 isNiceGuy
ASGV 6 (isNiceGuy) NATV BOOLEAN TRUE
DEFN 7 hobbies
ASGV 7 (hobbies) ARRB
STRN SSTR Playing cards
STRN SSTR Shopping
STRN SSTR Asking odd questions
ARRE
DEFN 8 favouritePoem
ASGV 8 (favouritePoem) OBJB (plain)
DEFN 9 title
ASGV 9 (title) STRN SSTR Digital Dreams
DEFN 10 created
ASGV 10 (created) NATV ZONED_DATETIME 2024-09-16T12:13:00.000+03:00
DEFN 11 content
ASGV 11 (content) STRN STBG
STNL In the code, we drift and weave,
STNL A dance of data we perceive.
STNL With each keypress, a world unfolds,
STPA Infinite stories, yet untold.
STEN
OBJE
OBJE
In the disassembly, one can see a number of commands, some explanations, and data. Here is a brief summary of the opcodes:
- MCSM is a Model Checksum, which is used as a sanity check that the message is understood by both parties.
- OBJB / OBJE denote the beginning and the end of an object.
- The DEFN / ASGV pair means that an index is first assigned to a property name, and then ASGV uses that index to assign a value to that property of the object. Therefore, if the same property name is encountered again within the message, the name does not have to be repeated.
- STRN SSTR denotes a simple ordinary string.
- NATV FLOAT64 denotes a native value for a 64-bit float.
- NATV BOOLEAN TRUE, well, you guessed it already.
- The ARRB / ARRE pair denotes the beginning and the end of an array.
- NATV ZONED_DATETIME denotes a zoned datetime, which is the default for a JavaScript Date.
- STRN, STBG, STNL, STPA, and STEN are a set of instructions that define a string builder. Because strings may contain newlines and they can be indefinitely long, a string builder pattern is used to split the string into more manageable pieces.
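The DEFN / ASGV indexing and the string builder can be sketched in a few lines of TypeScript. This is only my reading of the disassembly listing above, not the library's encoder: it emits the human-readable mnemonics rather than the actual wire format, the names renderString and NameTable are hypothetical, and the handling of a trailing newline is an assumption. Splitting very long strings into pieces is not covered.

```typescript
// Renders a string value as disassembly-style lines (mnemonics only).
function renderString(value: string): string[] {
  if (value.indexOf("\n") === -1) {
    return [`STRN SSTR ${value}`];            // simple single-line string
  }
  const pieces = value.split("\n");
  const last = pieces.pop() as string;        // the part after the final newline
  const lines = ["STRN STBG"];                // begin string builder
  for (const piece of pieces) {
    lines.push(`STNL ${piece}`);              // piece followed by a newline
  }
  if (last.length > 0) {
    lines.push(`STPA ${last}`);               // final piece (assumption: skipped when empty)
  }
  lines.push("STEN");                         // end string builder
  return lines;
}

// Assigns an index to each property name the first time it is seen.
class NameTable {
  private indexes = new Map<string, number>();

  assign(name: string, valueLines: string[]): string[] {
    const lines: string[] = [];
    let index = this.indexes.get(name);
    if (index === undefined) {
      index = this.indexes.size;
      this.indexes.set(name, index);
      lines.push(`DEFN ${index} ${name}`);    // only emitted once per name
    }
    lines.push(`ASGV ${index} (${name}) ${valueLines[0]}`);
    lines.push(...valueLines.slice(1));
    return lines;
  }
}

const table = new NameTable();
const first = table.assign("title", renderString("Digital Dreams"));
const second = table.assign("content", renderString("line one\nline two"));
const third = table.assign("title", renderString("Again"));

console.log([...first, ...second, ...third].join("\n"));
```

The third call produces only an ASGV line, because the index for title was already defined by the first call. That is the mechanism that lets repeated property names, for example in a list of similar objects, cost a single index instead of the full name.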
Is this a TypeScript-only thing?
No, it is not.
Due to my background and the use case, the implementation naturally started from the JavaScript side. But because Cbot is language-agnostic, it can be extended to other languages as well.
Is there a specification somewhere?
Kind of. I found that creating a proper specification was actually really hard to do. I did try to use some sort of EBNF format to create one, but my first problem was that there is no single specification for such a format (the irony), just a bunch of interpretations of it. Also, even if I had used one of the versions, I wouldn't have had any means to actually validate the specification's correctness.
So instead, I decided to create a TypeScript file that contains the validation logic as types and classes. And I used that spec file to validate my tests. Thus, it became the validating specification. That spec file is then the master specification that other implementations must use as the source of truth.
What is the status of the project right now?
As of the time this article is written, the project is at version 0.8. In my roadmap, this is now the finalized version, which includes all the features that have proper use cases. So it is basically feature-complete.
Also, there exist full client implementations for Java and JavaScript that are also considered feature-complete.
Getting to version 1.0 basically requires testing and feedback from those who are interested. It is the only way to know what is still missing or does not work as expected. Also, it would be nice to find people who are able to create implementations for other platforms like Python or Rust.
So, do give it a try and tell me about your impressions.
Links
- Contact: sisujs@sisujs.fi
- Git: https://gitlab.com/sisujs/sisujs/
Repositories
- NPM: https://www.npmjs.com/package/@sisujs/meta-cbot
- MVN: https://mvnrepository.com/artifact/fi.sisujs/cbot or https://central.sonatype.com/artifact/fi.sisujs/cbot
Documentation
- TypeScript Tutorial: https://gitlab.com/sisujs/sisujs/-/blob/main/docs/cbot/tutorial_ts.md
- Java Tutorial: https://gitlab.com/sisujs/sisujs/-/blob/main/docs/cbot/tutorial_java.md
- Typedoc: https://gitlab.com/sisujs/sisujs/-/blob/main/js/meta-cbot/typedoc/README.md
- Javadoc: https://www.javadoc.io/doc/fi.sisujs/cbot/latest/index.html