Cloud: IO limits gone full circle

Frits Hoogland - Nov 4 '21 - Dev Community

In the old days we used rotating disks, which had mechanical arms moving over the surface to read data. That meant a certain latency before data could be obtained, and a limited amount of bandwidth. The solution for getting more bandwidth was to use more disks (RAID). Still, overall usage was essentially bound by IOPS because of the mechanical arms/actuators. (There were other storage media before that, but those are outside the scope of this article.)

Then came solid state disks (SSD). Because these have no mechanical, rotating parts, latency is drastically reduced. In fact, this was such an improvement that the existing access protocols were found to be limiting SSDs, and new protocols were needed to take advantage of the parallelism and bandwidth that SSDs made possible (such as multipath IO and NVMe). Of course, new storage technology comes with its own problems, but that is beyond the scope of this article.

Fast forward further in time and we enter the cloud era. Now we can rent a (virtual) machine, elastically scale up and down, and let all the properties we obsessed about in the past, such as the number of disks, disk failure rates, and bandwidth, be the problem of the cloud vendor. We can just use the infrastructure...

Or can we? If you look carefully at the specifications of the virtual machines of all major cloud providers, you will notice that a cloud machine shape has obvious limits, such as the number of vCPUs and the amount of memory, but also has limits on disk IO, both at the level of the virtual machine and at the level of the individual disk.

The disk limits are less obvious, which also gives me the impression that they are presented in a way that makes them easy to miss.

But that is not what I wanted to discuss. If you look up and work out the IO limits for a cloud machine shape together with one or more disk devices, you will notice that the IO limits of especially the smaller machine shapes are quite low.
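To make "working out" the limits concrete, here is a minimal sketch of how a machine-level cap and per-disk caps might combine. It assumes the simplest model: IO is spread evenly over the attached disks, disk limits add up, and the machine-level limit caps the total. The names (IoLimit, effective_limit) and all numbers are hypothetical and not taken from any provider's documentation; real providers may apply additional rules such as bursting or size-dependent scaling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoLimit:
    iops: int      # maximum IO operations per second
    mibps: float   # maximum throughput in MiB/s


def effective_limit(vm: IoLimit, disks: list[IoLimit]) -> IoLimit:
    """Combine a VM-level cap with per-disk caps.

    Assumes IO is spread evenly over the disks, so disk limits add up,
    and the VM-level cap bounds the total. Whichever is lower wins.
    """
    disk_iops = sum(d.iops for d in disks)
    disk_mibps = sum(d.mibps for d in disks)
    return IoLimit(
        iops=min(vm.iops, disk_iops),
        mibps=min(vm.mibps, disk_mibps),
    )


# Hypothetical numbers for a small machine shape and a single disk type.
small_vm = IoLimit(iops=3_000, mibps=50.0)
disk = IoLimit(iops=2_000, mibps=100.0)

one_disk = effective_limit(small_vm, [disk])
two_disks = effective_limit(small_vm, [disk, disk])

print(one_disk)
print(two_disks)
```

With these made-up numbers, adding a second disk raises the IOPS ceiling only up to the machine-level cap, and does nothing for throughput, which the machine shape already limits.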

In fact, if you take the IO limits of such machines, it leaves me with the impression that we are essentially back at the disk limits of the era of rotating disks.

But it's not all nostalgia; there is another side to this. Disk IO sensitive applications that have to run on these machines must be tuned for limited IOPS again, and cannot assume close to unlimited IOPS and bandwidth. That means tuning such as using large IOs to reach the available bandwidth, because parallel use of small IOs will run into the IOPS limit first.
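The reasoning behind preferring large IOs is simple arithmetic: throughput is the number of IOs per second times the IO size, capped by the bandwidth limit. The sketch below illustrates that with a hypothetical function name (achievable_mibps) and hypothetical limits of 3,000 IOPS and 125 MiB/s; substitute the figures for your own machine shape and disks.

```python
def achievable_mibps(io_size_kib: int, iops_limit: int, mibps_limit: float) -> float:
    """Upper bound on throughput for a fixed IO size under both limits."""
    # Throughput if only the IOPS limit applied: IOs per second times size per IO.
    by_iops = iops_limit * io_size_kib / 1024
    # The bandwidth limit caps whatever the IOPS limit would allow.
    return min(by_iops, mibps_limit)


# Hypothetical limits, chosen only for illustration.
IOPS_LIMIT = 3_000
MIBPS_LIMIT = 125.0

for size_kib in (8, 64, 1024):
    mibps = achievable_mibps(size_kib, IOPS_LIMIT, MIBPS_LIMIT)
    bound = "bandwidth" if mibps == MIBPS_LIMIT else "IOPS"
    print(f"{size_kib:>5} KiB IOs: at most {mibps:.1f} MiB/s ({bound} bound)")
```

Under these assumed limits, 8 KiB IOs top out at about 23 MiB/s no matter how many are issued in parallel, while anything from roughly 43 KiB upward can reach the full 125 MiB/s.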
