Recently, I was thinking about remote play: sending inputs to a remote computer, and receiving the screen back. Unfortunately, a 1280x640 screen is 3.2mb, and at 60fps, that's 11gb per minute of data.
What's My Data
By the end of this you'll understand how knowledge about your data and its context drives compression. In my case, I want to send over a 1280x640 screen from a video game so I can view it on my local computer. I know that the screen is made up of pixels, and each pixel is 4 bytes: red, green, blue, and alpha. This makes one full screen 3,276,800 bytes, a lot to send so quickly. By using what we know about the data, we should be able to get this value down significantly, hopefully enough to be viable over the wire.
Bytes per Frame: 3,276,800 | 3mb
Bytes per Second: 196,608,000 | 196mb
Bytes per Minute: 11,796,480,000 | 11gb
% Saving Since Last Step: 0%
% Saving Total: 0%
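The numbers above are just multiplication; here's a rough sketch of the arithmetic in Python (variable names are only illustrative) so you can see where each figure comes from.

```python
WIDTH, HEIGHT = 1280, 640
BYTES_PER_PIXEL = 4  # red, green, blue, alpha
FPS = 60

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL  # 3,276,800
bytes_per_second = bytes_per_frame * FPS            # 196,608,000
bytes_per_minute = bytes_per_second * 60            # 11,796,480,000
```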
Alpha?
Firstly, why do I need the alpha? I know that in all cases alpha will be the full 255 value, because I'm never working with transparency. This means that each pixel is just red, green, and blue, which is 3 bytes. This is a 25% saving.
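As a minimal sketch of what this looks like, assuming the frame comes in as a flat RGBA byte buffer (the exact capture format will vary), dropping alpha is just skipping every fourth byte:

```python
def strip_alpha(rgba: bytes) -> bytes:
    """Drop every 4th byte (alpha) from a flat RGBA buffer, keeping RGB."""
    rgb = bytearray()
    for i in range(0, len(rgba), 4):
        rgb += rgba[i:i + 3]  # keep R, G, B; skip A
    return bytes(rgb)

# e.g. an opaque red pixel followed by an opaque green pixel:
# strip_alpha(bytes([255, 0, 0, 255, 0, 255, 0, 255])) == bytes([255, 0, 0, 0, 255, 0])
```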
Bytes per Frame: 2,457,600 | 2mb
Bytes per Second: 147,456,000 | 147mb
Bytes per Minute: 8,847,360,000 | 8gb
% Saving Since Last Step: 25%
% Saving Total: 25%
Lower FPS
For the most part, 30fps is more than enough visually, as long as the logic of the game still feels like it updates at 60fps. If anything, lowering the visual frame rate will ensure that we're getting smooth frames, even if there are fewer of them. This halves the number of frames we're sending. There are cases where you don't want to do this, but in my case, where I personally can barely tell the difference, I don't mind cutting the fps to save on data.
Bytes per Frame: 2,457,600 | 2mb
Bytes per Second: 73,728,000 | 73mb
Bytes per Minute: 4,423,680,000 | 4gb
% Saving Since Last Step: 50%
% Saving Total: 62.5%
Lower Resolution
Luckily for me, I have bad eyes. This means that I can watch YouTube at 480p and not tell any difference. It also means that I can lower the resolution of my frames to 640 by 320, using a quarter of the data. Here I'm not only using information about the data itself and how it will be used, but also information about the people using it. The more information like this you have, the better.
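One simple way to do the downscale (a sketch that averages each 2x2 block of the RGB frame; in practice the capture or rendering side could do this for you more efficiently):

```python
def halve_resolution(rgb: bytes, width: int, height: int) -> bytes:
    """Average each 2x2 block of pixels into one, halving both dimensions."""
    out = bytearray()
    row_stride = width * 3  # bytes per row of RGB pixels
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            for c in range(3):  # average R, G, B separately
                top = y * row_stride + x * 3 + c
                bottom = top + row_stride
                total = rgb[top] + rgb[top + 3] + rgb[bottom] + rgb[bottom + 3]
                out.append(total // 4)
    return bytes(out)
```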
Bytes per Frame: 614,400 | 614kb
Bytes per Second: 18,432,000 | 18mb
Bytes per Minute: 1,105,920,000 | 1gb
% Saving Since Last Step: 75%
% Saving Total: 90.625%
Fewer Colors
Here are 2 colors.
One of them is 255,255,255.
The other is 255,255,239.
You can probably tell the difference, even I can, but would you be able to tell the difference at 30fps when it's taking up only one pixel of space?
Because of this, we can squeeze the red, green, and blue values into 2 bytes. We do this by using only 5 bits per color, then bit-shifting them all into place, which leaves one spare bit, because 2 bytes give us 16 bits and we only need 15. With this extra bit we can either leave it alone, or give it to red so that we have double the red accuracy, but it doesn't really matter. So now we're only using 2 bytes per pixel as opposed to 3, which is a third saving.
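Here's a sketch of that packing, assuming the spare bit goes to red (so 6 bits of red, 5 of green, 5 of blue):

```python
def pack_rgb16(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into 16 bits: 6 bits red, 5 bits green, 5 bits blue."""
    return ((r >> 2) << 10) | ((g >> 3) << 5) | (b >> 3)

def unpack_rgb16(pixel: int) -> tuple[int, int, int]:
    """Expand back to 8 bits per channel; the low bits are lost for good."""
    r = ((pixel >> 10) & 0x3F) << 2
    g = ((pixel >> 5) & 0x1F) << 3
    b = (pixel & 0x1F) << 3
    return r, g, b
```

Round-tripping 255,255,255 through this gives back 252,248,248, which is exactly the kind of difference I'm betting nobody notices at 30fps.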
Bytes per Frame: 409,600 | 409kb
Bytes per Second: 12,288,000 | 12mb
Bytes per Minute: 737,280,000 | 737mb
% Saving Since Last Step: ~33.3%
% Saving Total: 93.75%
We've made it below a gigabyte! But we can go further.
Huffman Encoding
It was surprisingly easy to implement Huffman encoding on my data. If you don't know what Huffman encoding is, there are plenty of explanations of it online, but I'll give a very simple one here.
First, we count how many occurrences of each color are in the frame. What we want is the most frequent colors to take up the least amount of data.
We now have an array of colors and their counts, sorted by count. We're going to make a tree out of this.
To start building out this tree, we take the 2 least frequent colors and combine them. We do this by creating a node that points to both of these colors, and whose count is the sum of those colors' counts.
We can then replace those 2 colors with the new node we've created; notice that this means there's one fewer item in the array. Remember, we still have to keep the array sorted by count, which means the new node may end up further up in the array.
We repeat this process of creating nodes until we only have one node left in the array, which is the head node. We've just made a binary tree, but how is this useful?
Well, let's say that moving left down the tree is a 0, and moving right is a 1. Any leaf on the tree can then be expressed as the path you travel to reach it, which is just a string of 1s and 0s. The way we've built the tree means that the more frequent colors take fewer moves to reach, resulting in fewer 1s and 0s.
We can then use this encoding to compress our data, by creating a table that maps each leaf's color to its string of bits.
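Here's a rough sketch of the tree-building and table-building described above. I'm using a heap for the repeated "grab the two smallest" step instead of re-sorting an array, but the idea is the same (names and structure here are illustrative, not exactly my implementation):

```python
import heapq
from collections import Counter

def build_huffman_table(pixels: list[int]) -> dict[int, str]:
    """Map each color to a bit string; frequent colors get shorter strings."""
    counts = Counter(pixels)
    # Heap entries are (count, tie_breaker, node); a node is either a color
    # or a (left, right) pair of nodes.
    heap = [(count, i, color) for i, (color, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count_a, _, a = heapq.heappop(heap)  # the two least frequent nodes
        count_b, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (count_a + count_b, tie, (a, b)))
        tie += 1
    _, _, root = heap[0]

    table: dict[int, str] = {}
    def walk(node, code: str) -> None:
        if isinstance(node, tuple):  # internal node: left is 0, right is 1
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:                        # leaf: an actual color
            table[node] = code or "0"
    walk(root, "")
    return table
```

Encoding a frame is then just concatenating the bit strings for each pixel, and the decoder needs either the table or the tree to reverse it.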
From my testing, I was able to compress my data by 50% most of the time.
However, it is good to keep in mind that you're not always going to get a 50% gain, and you're also going to have to send over the encoding. I could've also done my Huffman encoding implementation terribly.
Bytes per Frame: 204,800 | 204kb
Bytes per Second: 6,144,000 | 6mb
Bytes per Minute: 368,640,000 | 368mb
% Saving Since Last Step: 50%
% Saving Total: 96.875%
Further Compression
So I'm happy with the amount of compression that I've hit right now. One minute of uncompressed data would be equivalent to half an hour of compressed data. There are a few more things I could do to compress it further.
Firstly, I can use a static Huffman encoding, where I use the same tree for, say, a full second of frames. This means I don't have to send the encoding every frame.
Secondly, there's RLE (run-length encoding), but I haven't measured how much it would compress my data, so I'm not using it yet. RLE can be applied not only before Huffman encoding, but even after it. I'd have to look at the sort of data I'm getting in a bit more detail before deciding to implement RLE.
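RLE itself is tiny; a sketch like the one below (collapsing runs of identical pixels into pairs) would be the starting point, and the question is just whether my frames have long enough runs for it to pay off.

```python
def run_length_encode(pixels: list[int]) -> list[tuple[int, int]]:
    """Collapse runs of identical pixels into (pixel, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1] = (pixel, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((pixel, 1))              # start a new run
    return runs
```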
There are plenty of other compression algorithms that I could use, and I should look into them, but this is plenty compressed for what I want to do.
Furthermore, for my use case, I need this compression code to run at a minimum of 30fps. This means I should be worried not only about compression size, but also about speed.
Using what I knew about the data and how it was being used made it extremely easy to get rid of unnecessary data and compress the rest.