The Mathematics of Algorithms

Kostas Kalafatis - Jul 12 - Dev Community

One of the most crucial considerations when selecting an algorithm is the speed with which it is likely to complete. Predicting this speed involves using mathematical methods. This post delves into those mathematical tools, aiming to clarify the terms commonly used both in this series and in the rest of the literature that describes algorithms.

Problem Instance Size

Think of the problem you are giving to an algorithm implementation as a recipe you are following. An instance of that problem is like the number of portions your recipe will create. Most of the time, the more portions (or the larger the dataset), the longer it takes to cook the dish (or for the implementation to run). On the other hand, compressing the dataset too aggressively might introduce unnecessary operations that eventually slow down the execution. Finding the sweet spot in how you represent the data to the computer is surprisingly challenging, because real-world problems aren't naturally in a format computers understand. They need to be translated into a suitable representation, and there are many ways to do that.

When we judge how good an implementation is at solving a problem, we want to focus on the program's core logic (the algorithm), not how the problem was presented. We don't want the way we formatted the data to be the main thing that decides how fast or well the program works. The way you represent the problem should really just depend on what the program needs to do with the data. For example, if you need to find the shortest route between cities, the best way to represent the map data is different than if you're sorting a list of names.

Designing efficient algorithms often starts with picking the right tools for the job. In this case, the tools are data structures – how you organize and store the information in the computer's memory. By choosing the right data structures, you set the stage for a program that solves the problem quickly and smoothly.
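
As a tiny, hypothetical Python sketch of that idea (the city names and distances below are invented for illustration), compare the flat list you might use for sorting names with the adjacency list you might use for route finding:

```python
# Hypothetical sketch: the "right" representation depends on the task.

# For sorting names, a flat list is all the structure we need.
names = ["Eleni", "Kostas", "Maria", "Nikos"]
sorted_names = sorted(names)

# For shortest-route questions, an adjacency list (neighbor -> distance)
# exposes exactly the relationships a route-finding algorithm has to explore.
road_map = {
    "Athens":       {"Thessaloniki": 502, "Patras": 215},
    "Thessaloniki": {"Athens": 502},
    "Patras":       {"Athens": 215},
}

print(sorted_names)
print(road_map["Athens"])  # neighbors of Athens and their distances
```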

Since there's no single, universal way to define the "size" of a problem for a computer program, we usually rely on common sense and practical conventions. For instance, when dealing with a task like sorting numbers, we typically assume that each number can be stored in a single 32-bit word (a standard unit of computer memory). So, if we're sorting 100 numbers, we say the size of the problem is 100.

Now, sometimes a number might be too big to fit into one word. But if it takes a consistent, fixed number of words (say, 2 or 3), our estimate of the problem size is only off by a small factor. For instance, an algorithm that works with 64-bit numbers might take roughly twice as long as one working with 32-bit numbers, even if they're doing the same kind of sorting.
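
If you want to see this constant-factor effect for yourself, here is a rough, informal benchmark sketch (assuming NumPy is installed; the timings vary by machine and say nothing rigorous about the algorithms themselves):

```python
# Sort the same quantity of 32-bit and 64-bit integers and compare wall-clock time.
# This is an illustration of a constant-factor difference, not a rigorous benchmark.
import time
import numpy as np

n = 1_000_000
rng = np.random.default_rng(42)
data32 = rng.integers(0, 2**31 - 1, size=n, dtype=np.int32)
data64 = data32.astype(np.int64)

start = time.perf_counter()
np.sort(data32)
t32 = time.perf_counter() - start

start = time.perf_counter()
np.sort(data64)
t64 = time.perf_counter() - start

print(f"32-bit sort: {t32:.3f}s, 64-bit sort: {t64:.3f}s, ratio: {t64 / t32:.2f}")
```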

The key point is this: we make these assumptions about how data is encoded so we can compare different algorithms fairly. By focusing on the size of the problem (like the number of items to be sorted) and assuming a reasonable way to store the data, we can get a good idea of how well an algorithm performs without getting bogged down in the details of data representation.

In the world of algorithm design, researchers accept that it's impossible to perfectly predict how the choice of data representation will impact the performance of an algorithm when implemented. To simplify things, they consider performance differences that are only a constant multiple of each other as being essentially the same, especially when dealing with very large problem sizes. This concept is called asymptotic equivalence.

Let's take an example. We know that working with 64-bit integers is generally slower than 32-bit integers. But if we have a good algorithm designed for sorting a million 32-bit integers, it's reasonable to assume it will also be a good algorithm for sorting a million 64-bit integers. The difference in processing time, while existing, becomes less significant as the problem size increases.

Of course, in real-life situations, nobody would be happy to pay a bill that's 1000 times larger than expected! But this simplification, where we disregard constant multiplicative factors, is a widely accepted way to compare the efficiency of different algorithms in theory. It helps researchers focus on the core logic of the algorithm and avoid getting bogged down in the details of implementation.

Best, Average and Worst Case Analysis

For a lot of problems, there's no one-size-fits-all solution. The best algorithm to choose depends on several factors:

  • The Problem Itself: What exactly are you trying to solve? Different problems demand different approaches.
  • The Data: What kind of data are you working with? How are they distributed? Algorithms can be optimized for specific types of data and distributions.
  • The Algorithms: How do the different algorithms you are considering behave under different circumstances? Some might be faster for small datasets, while others are more appropriate for larger ones.

To help navigate this decision-making process, algorithms are usually presented with three different common scenarios in mind.

Worst Case: This identifies a category of problem instances where an algorithm performs at its absolute slowest. Instead of pinpointing a single, specific input, algorithm designers usually describe characteristics of the input data that hinder the algorithm's efficiency. This could be things like a large input size, specific data patterns, or extreme outlier values.

Average Case: This focuses on the expected performance of an algorithm when faced with typical, randomly generated problem instances. While some specific cases might take longer due to unique characteristics, most instances should fall within a predictable range of execution time. This measure reflects the performance that a typical user can anticipate when using the algorithm.

Best Case: The best-case scenario represents a special class of problem instances that allow an algorithm to shine, performing at its absolute fastest with minimal work. These are often idealized or uncommon scenarios that rarely occur in practice.

Worst Case

When dealing with a fixed-size input, a computer program's execution time can still vary greatly depending on the specific data it receives. The worst-case execution time is the absolute longest it could take to run, considering all possible input scenarios of that size.

We focus on the worst-case scenario because it represents the upper limit of a program's slowness. This is important for applications where timeliness is critical. Additionally, analyzing the worst case is often easier than dealing with the average or best cases, providing valuable insights into overall efficiency.

To express it more formally, let S_n be the set of all problem instances s_i of size n. The function t(s_i) measures the amount of work (computational steps or time) an algorithm takes to process a specific instance s_i. The worst-case performance on S_n, denoted T_{wc}(n), is the maximum value of t(s_i) across all instances in S_n.

In simpler terms, T_{wc}(n) represents the absolute longest time the algorithm could possibly take to solve any problem instance of size n. This worst-case behavior provides an upper bound on the algorithm's running time, guaranteeing that it will never take longer than this amount of time, regardless of the specific input. The rate at which T_{wc}(n) grows as n increases defines the worst-case time complexity of the algorithm.
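
Here is a small Python sketch (my own illustration, not from the post) that computes T_wc(n) literally as a maximum over every instance, using insertion-sort comparison counts on all permutations of n keys:

```python
# Brute-force illustration of T_wc(n) = max over all instances s_i of t(s_i).
# The "instances" are all permutations of n distinct keys, and t(s_i) is the
# number of comparisons insertion sort performs on that permutation.
# Only practical for tiny n, since there are n! permutations.
from itertools import permutations

def insertion_sort_comparisons(items):
    """Sort a copy of items and return how many comparisons were made."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # compare key against a[j]
            if a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

n = 6
t_wc = max(insertion_sort_comparisons(p) for p in permutations(range(n)))
print(f"T_wc({n}) = {t_wc} comparisons")   # attained by reverse-sorted input
```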

Average Case

Imagine a massive telephone system built to handle a huge number of phones (let's say n phones). In the absolute worst-case scenario, we need to be prepared for a situation where half the people (n/2) decide to call the other half (n/2) simultaneously. This scenario would put immense strain on the system, and while it might not crash completely, building a system to handle this extreme case would be incredibly costly.

In reality, the chances of this exact scenario happening are very, very slim. It's highly unlikely that everyone would call a completely different person at the same time. Instead of building an exorbitantly expensive system to handle this improbable worst-case scenario, we can take a more practical approach. We can design a cheaper system and then use mathematical tools to analyze the likelihood of it crashing due to overload under more realistic conditions.

To do this, we assign probabilities to different scenarios or instances of phone calls. Each instance s_i has a size n (representing the total number of phones), and we give each instance a probability between 0 and 1. The sum of all probabilities for all possible instances of size n must equal 1.

For each set of problem instances of size n (represented as S_n), we assign a probability distribution Pr{s_i}. This means each individual instance s_i within S_n gets a probability value between 0 and 1. These probability values reflect the likelihood of encountering each specific instance. Importantly, the sum of all these probabilities across all instances in S_n must add up to 1, ensuring that we've covered all possible scenarios.

More formally, we can express this relationship as:

\sum_{s_i \in S_n} Pr \lbrace s_i \rbrace = 1

This equation simply states that when you add up the probabilities of all possible instances within a given set S_n, the total probability must equal 1 (or 100%).

If t(s_i) measures the work done by an algorithm on each instance, then the average-case work done by the algorithm on S_n is:

T_{ac}(n) = \sum_{s_i \in S_n} t(s_i) Pr \lbrace s_i \rbrace

In simpler terms, the average-case work done by an algorithm on a set of problem instances S_n is calculated by taking the work done on each specific instance (t(s_i)) and multiplying it by the probability of that instance actually occurring (Pr{s_i}). If the probability of an instance is zero, it doesn't contribute to the overall average work.

The average-case work, denoted as T_{ac}(n), gives us a more realistic picture of how an algorithm performs in real-world scenarios. It's a weighted average that takes into account the likelihood of encountering different problem instances. The rate at which T_{ac}(n) grows as n increases defines the average-case complexity of the algorithm, indicating how the algorithm's performance scales with the size of the input.
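
To make the weighted sum concrete, here is a minimal Python sketch (my own example) for sequential search, under the assumptions that the target is always present and is equally likely to sit at any of the n positions:

```python
# Average-case work as a probability-weighted sum, illustrated with sequential
# search: Pr{s_i} = 1/n (uniform over instances), and instance i costs i
# comparisons because the target sits at position i.
def sequential_search_average(n):
    instances = range(1, n + 1)   # instance i: target found after i comparisons
    pr = 1.0 / n                  # uniform probability of each instance
    return sum(i * pr for i in instances)

n = 1000
print(f"T_ac({n}) = {sequential_search_average(n)}")   # (n + 1) / 2 = 500.5
```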

Best Case

Understanding the best-case scenario for an algorithm, even though it rarely happens in real life, can be quite insightful. It gives us a glimpse into the ideal conditions that would allow the algorithm to perform optimally.

Let's take Sequential Search as an example. The best-case scenario for this algorithm is when the value you're looking for (let's call it v) happens to be the very first element in the list. It finds it right away, making it super efficient in this specific situation.

Now, imagine a slightly different approach called Counting Search. This algorithm counts how many times the value v appears in the list. If the count is zero, it means the item wasn't found. Otherwise, it's there.

Here's the key difference: Counting Search always goes through the entire list, even if it finds v early on. So, even though its worst-case performance is the same as Sequential Search (O(n)), its best-case performance is also O(n). This means Counting Search isn't able to take advantage of situations where it could potentially finish faster, like when the value is found at the beginning.
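
A minimal Python sketch of both searches (my own illustration of the comparison above):

```python
# Sequential search can stop as soon as it finds v (best case: one comparison);
# counting search always scans the whole list, so its best case is no better
# than its worst case.
def sequential_search(items, v):
    for item in items:
        if item == v:
            return True        # early exit: best case touches only the first element
    return False

def counting_search(items, v):
    count = 0
    for item in items:         # always walks the entire list
        if item == v:
            count += 1
    return count > 0

data = [7, 3, 9, 3, 1]
print(sequential_search(data, 7))   # True after a single comparison
print(counting_search(data, 7))     # True, but only after scanning all 5 elements
```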

In essence, while the best-case scenario might be rare, it reveals the potential for an algorithm to be even more efficient under optimal conditions. Understanding this can guide us in making decisions about which algorithm to choose for a particular task, especially if we know something about the characteristics of the data we're dealing with.

Lower and Upper Bounds

To simplify how we talk about "Big O" notation in this series, we'll focus on classifying how an algorithm behaves as it tackles problems of growing size, represented by n. We're going to use the notation O(f(n)), where f(n) is typically a function like n, n^2, n^3, or 2^n.

Let's, for example, consider an algorithm whose worst-case performance grows in direct proportion to the size of the input data, n, once that input gets big enough. This means there's a positive constant c and a threshold size n_0 such that the time it takes for the algorithm to run, t(n), is always less than or equal to c multiplied by n for all values of n greater than n_0, that is, t(n) \leq c \times n, \forall n \gt n_0. In simpler terms, the algorithm's runtime is "linear": as the size of the input data grows, the time it takes to process that data grows at most proportionally.

In this scenario, we classify the algorithm's worst-case performance as O(n), where n represents the input size. This notation, called Big O notation, describes the upper bound of the algorithm's time complexity.

Now, imagine that the same algorithm's best-case performance is also directly proportional to the input size. This means there's a (possibly different) positive constant c and a different threshold n_0 such that t(n) is always greater than or equal to c multiplied by n for all n greater than or equal to n_0, or in mathematical terms t(n) \geq c \times n, \forall n \ge n_0. In other words, the algorithm's runtime never gets faster than linear, even in the best-case scenario.

In this situation, we classify the algorithm's best-case performance as Ω(n), using Big Omega notation. This indicates the lower bound of the algorithm's time complexity.
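
As a concrete illustration (my own example, not from the post), finding the maximum of a list is an algorithm with exactly these two properties: no matter how the input is arranged, every element must be examined, so the work never exceeds a constant multiple of n and never drops below one either.

```python
# A simple algorithm whose running time is both O(n) and Ω(n): finding the
# largest element. Every input of size n requires exactly n - 1 comparisons.
def find_max(items):
    best = items[0]
    for item in items[1:]:     # always n - 1 comparisons, regardless of the data
        if item > best:
            best = item
    return best

print(find_max([4, 17, 2, 9]))   # 17, after exactly 3 comparisons
```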

To summarize, the actual formal notation is as follows:

  • The lower bound for the execution time of an algorithm is classified as Ω(f(n)) and corresponds to the best-case scenario.
  • The upper bound for the execution time of an algorithm is classified as O(f(n)) and corresponds to the worst-case scenario.
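
For reference, these two statements correspond to the standard textbook definitions:

T(n) \in O(f(n)) \iff \exists \, c \gt 0, n_0 \gt 0 \text{ such that } T(n) \leq c \times f(n), \forall n \gt n_0

T(n) \in \Omega(f(n)) \iff \exists \, c \gt 0, n_0 \gt 0 \text{ such that } T(n) \geq c \times f(n), \forall n \gt n_0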

Summary

In this post we discussed how we analyze algorithms and what Big O notation is. In the next post we are going to discuss the performance families of algorithms. See you in the next post!
