One of my favorite things about learning to code is that any time you think, "could this be shorter?", you're probably right. A nice example of this that I encountered recently is the difference between .max and .max_by when finding the maximum value(s) in a Ruby array.
Should I use .max or .max_by?
When I first saw the Ruby array methods .max and .max_by, I wondered why we needed both, or why we couldn't just always use .max, since it's shorter and I don't want to waste precious seconds typing an extra 3 characters if I don't need to. Spoiler alert: in many cases, using .max_by ends up making your code shorter and cleaner overall! I will discuss the different ways to use each here, but I also want to note that the same logic applies to .min / .min_by, .minmax / .minmax_by, and .sort / .sort_by. If you want to know some more terms: all of these methods are tied to Ruby's Enumerable module, and the comparisons they make rely on the <=> operator, which is also what the Comparable module is built on. Again, I focus here on .max, but really all of these methods come down to the fundamental principle of sorting!
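To make the family resemblance concrete, here is a small sketch of the _by variants side by side. The words array and the shortest_and_longest helper are made up for illustration and are not part of Ruby itself.

```ruby
# Hypothetical helper: returns the shortest and longest word as a pair.
def shortest_and_longest(words)
  words.minmax_by { |word| word.length }
end

words = %w(albatross dog horse)

p words.min_by { |word| word.length }   # "dog"
p words.max_by { |word| word.length }   # "albatross"
p shortest_and_longest(words)           # ["dog", "albatross"]
p words.sort_by { |word| word.length }  # ["dog", "horse", "albatross"]
```

Each call takes the same kind of block, which is why learning one of them mostly teaches you the rest.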
The "spaceship" operator: <=>
All of these methods (min, max, sort) use the <=> operator. The <=>, or "spaceship", operator combines the conventional comparison operators (<, <=, ==, >=, and >) into a single comparison:
a <=> b
if a < b then return -1
if a == b then return 0
if a > b then return 1
if a and b are not comparable then return nil
The result is then used by a collection, such as an array, to order its elements. I like to think of an array of numbers, say [4, 2, 7, 1], and imagine the computer asking, is 4 < 2? No. Is 4 > 2? Yes. Move it up in the list. And so on with 4 < 7, etc.
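The four possible outcomes can be checked directly. This is just a sketch: describe_comparison is a hypothetical helper written for this post, not a Ruby built-in.

```ruby
# Hypothetical helper: turns the result of <=> into a readable label.
def describe_comparison(a, b)
  case a <=> b
  when -1 then "less than"
  when 0  then "equal to"
  when 1  then "greater than"
  else         "not comparable" # <=> returned nil
  end
end

puts describe_comparison(4, 2)    # greater than
puts describe_comparison(2, 7)    # less than
puts describe_comparison(7, 7)    # equal to
puts describe_comparison(4, "2")  # not comparable

# Passing the comparison as a block spells out what sort does by default.
p [4, 2, 7, 1].sort { |a, b| a <=> b }  # [1, 2, 4, 7]
```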
It's a bit different with strings:
"hello" <=> "world" #=> -1
Some of the rules with <=> and strings are intuitive. As you might expect, you cannot use <=> to compare a string to an integer/float. But! You can compare a string with a string of an integer.
"abcdef" <=> 1 #=> nil
"abcdef" <=> "1" #=> 1
Second, if the strings are identical, we get '0' for equal.
"abcdef" <=> "abcdef" #=> 0
But then it gets a little tricky. You might be thinking, a string is 'less than' or 'equal to' another based on their lengths. And you would be partially correct.
Pretty much all that the official Ruby documentation seems to give on this point is "if the strings are of different lengths, and the strings are equal when compared up to the shortest length, then the longer string is considered greater than the shorter one."
"abcdef" <=> "abcde" #=> 1
"abcdef" <=> "abcdefg" #=> -1
But! This can be misleading, and for me it makes more sense to think in terms of alphabetical sorting:
"abcdef" <=> "ABCDEF" #=> 1
"horse" <=> "apple" #=> 1
"a" <=> "z" #=> -1
"categorically" <=> "category" #=> -1
I love the idea that everything can be represented by a number. And that idea is very important here, because the <=> operator with strings is actually comparing the numeric codes of the characters, one character at a time. So an a is 01100001 in binary, or 97 in decimal, and A is 01000001, or 65, which explains why an identical, but lowercase, string would be considered 'greater than' its capitalized version. And an a is less than z because each letter increases by one throughout the alphabet: z in binary is 01111010, or 122. Or, in more human terms, they are simply sorted in alphabetical order...
"z" <=> "apple" #=> 1
...and methods such as .max return the last string because "maximum" is the numerical way to think of a string closest to the end of the alphabet.
a = %w(dog albatross horse)
a.sort #=> ["albatross", "dog", "horse"]
a.max #=> "horse"
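If you'd like to see those character codes for yourself, here is a quick sketch. binary_for is a hypothetical helper of mine, and it assumes plain single-byte ASCII characters.

```ruby
# Hypothetical helper: the 8-bit binary form of a single ASCII character.
def binary_for(char)
  char.ord.to_s(2).rjust(8, "0")
end

p "a".ord          # 97
p binary_for("a")  # "01100001"
p "A".ord          # 65
p binary_for("A")  # "01000001"
p "z".ord          # 122
p binary_for("z")  # "01111010"

# The first character that differs decides the comparison.
p "categorically".bytes[7]        # 105, the code for "i"
p "category".bytes[7]             # 121, the code for "y"
p "categorically" <=> "category"  # -1
```

This is why the longer "categorically" still comes out as 'less than' "category": the comparison is settled at the first differing character, before length ever matters.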
While this is all very interesting background information on how these methods work, it still seems to me that the <=> operator and Ruby's Comparable mixin are most useful with methods such as .max when we are working directly with numbers. Sure, I just said everything is a number, but finding the maximum from a list of numbers seems more common to me than finding the maximum string.
When to use .max
[5, 1, 3, 4, 2].max #=> 5
[5, 1, 3, 4, 2].max(3) #=> [5, 4, 3]
.max is useful and concise if you just want to find the maximum value(s) from a list of numbers, such as an array or a range:
(10..20).max #=> 20
So if you just have that list of numbers, and you want to know the maximum or minimum number, .max is your fastest, shortest way there. Great! But what if you need to be more specific? This is where I start to question the usefulness of .max and wonder if there is a better way:
a = %w(albatross dog horse)
a.min(2) #=> ["albatross", "dog"]
a.max(2) #=> ["horse", "dog"]
a.max { |a, b| a.length <=> b.length } #=> "albatross"
Again, calling .max or .min on an array of strings will return the string(s) based on an alphabetical sort. When a count is given as an argument, .max returns that many strings in descending order and .min returns them in ascending order. This would be useful if you just wanted the first or last string alphabetically and didn't care about having or using the whole sorted array. If you needed that, you could just call .sort and then .first / .last etc., without the .max or .min.
c = %w(mouse house cat rat bat)
c.sort! #=> ["bat", "cat", "house", "mouse", "rat"]
c.first #=> "bat"
c.last #=> "rat"
c[2] #=> "house"
Special shoutout here to .sort!: it is destructive, so it modifies the original array. .sort is non-destructive, so it would still sort, but it returns a separate array, and so in the example above c would still have its original order and methods such as c.first would return mouse, etc.
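Here is a small sketch of that difference. Both helper methods are hypothetical and exist only to show what .first returns after each kind of sort.

```ruby
# Hypothetical helper: what .first returns after a non-destructive sort.
def first_after_sort(words)
  words.sort   # the sorted copy is discarded, words is unchanged
  words.first
end

# Hypothetical helper: what .first returns after a destructive sort.
def first_after_sort_bang(words)
  words.sort!  # words itself is reordered
  words.first
end

p first_after_sort(%w(mouse house cat rat bat))       # "mouse"
p first_after_sort_bang(%w(mouse house cat rat bat))  # "bat"
```

If you want to keep the original order around, assign the result of .sort to a new variable instead of using .sort!.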
It's more likely that we would want to return something like the longest word. Now, .max with the |a,b| block is where you might be thinking, this is useful! We can specify .length now! And you are right. Also, if we ever needed to, we could reverse the .max by reversing the order of a and b:
(10..20).max {|a,b| b <=> a} #=> 10
but this is just a longer way to write .min. Similarly, specifying a.length <=> b.length with .max is just a longer way to write... .max_by!
.max_by
array = ["albatross", "dog", "horse", "fish", "antelope", "zzzzzzzz"]
array.max #=> "zzzzzzzz"
array.max_by { |x| x.length } #=> "albatross"
This does the exact same work as .max with a block, you just only have to write .length once, so that's a game changer. Basically, remember that with .max you are always comparing two things, a and b, so you have to specify the attribute for both, whereas .max_by includes that comparison within the method and assumes you are comparing the same attribute.
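To see the two side by side, here is a sketch that asks the same "longest word" question both ways. The two helper names are hypothetical, chosen just for this comparison.

```ruby
# Hypothetical helper: longest word using .max with a two-argument block.
def longest_with_max(words)
  words.max { |a, b| a.length <=> b.length }
end

# Hypothetical helper: longest word using .max_by with a one-argument block.
def longest_with_max_by(words)
  words.max_by { |word| word.length }
end

array = ["albatross", "dog", "horse", "fish", "antelope", "zzzzzzzz"]
p longest_with_max(array)     # "albatross"
p longest_with_max_by(array)  # "albatross"
```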
And we're not even done yet! You can make this already short code even shorter with Ruby's Proc class. The & tells Ruby to turn the symbol :length into a Proc (via Symbol#to_proc) that "encapsulates" the call to .length:
array.max_by(&:length) #=> "albatross"
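A quick sketch of what that shorthand stands for. The longest helper and the length_proc variable are names I made up for illustration.

```ruby
# Hypothetical helper: the shorthand form wrapped in a method.
def longest(words)
  words.max_by(&:length)
end

array = ["albatross", "dog", "horse", "fish", "antelope", "zzzzzzzz"]

# :length.to_proc builds a Proc that calls .length on whatever it is given.
length_proc = :length.to_proc
p length_proc.call("dog")  # 3

# These three calls ask the same question.
p array.max_by { |x| x.length }  # "albatross"
p array.max_by(&length_proc)     # "albatross"
p longest(array)                 # "albatross"
```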
Final note for potentially unique .max use
The only scenario I have been able to imagine in which you would need .max and not be able to use .max_by is one where you, for some reason, need to compare different attributes.
arr1 = [1,2,0,0] #length > sum
arr2 = [4,4,4] #sum > length
array = [arr1, arr2] #=> [[1, 2, 0, 0], [4, 4, 4]]
array.max #=> [4, 4, 4]
array.max {|a,b| a.length <=> b.sum} #=> [1, 2, 0, 0]
At first, I thought of this example like asking, "For each array in this array of arrays, which array's length is greater than, equal to, or less than its sum?" and the result that gives us '1' for 'greater than' is returned. But! I don't think it's that simple...
[4, 4, 4].length <=> [1, 2, 0, 0].sum #=> 0
Without .max, and using only <=>, the idea generally works. We are comparing two different attributes on two different arrays. But how does this work for sorting?
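Before moving on to sorting, here is a sketch of how .max itself could arrive at [1, 2, 0, 0]. To be clear, naive_max is a hypothetical stand-in I wrote for illustration. It assumes .max keeps a running "best" and only replaces it when the block returns a positive number, and it is not Ruby's actual source code.

```ruby
# Hypothetical stand-in for .max with a block. This is an assumption about
# the general approach, not Ruby's real implementation.
def naive_max(items)
  best = items.first
  items.drop(1).each do |candidate|
    # Replace the current best only when the block says candidate is greater.
    best = candidate if yield(candidate, best) > 0
  end
  best
end

arr1 = [1, 2, 0, 0]  # length 4, sum 3
arr2 = [4, 4, 4]     # length 3, sum 12

# Only one comparison happens: arr2.length (3) against arr1.sum (3).
p arr2.length <=> arr1.sum                               # 0
p naive_max([arr1, arr2]) { |a, b| a.length <=> b.sum }  # [1, 2, 0, 0]
```

Under that assumption, the single comparison is a tie, so the first array is never replaced, which matches the result above.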
This example is a bit number heavy but demonstrates this unusual sorting process:
a = [1, 2, 3] # length 3, sum 6
b = [1, 0, 0] # length 3, sum 1
c = [1, 1, 1, 1, 1, 1] # length 6, sum 6
d = [0, 0, 0, 0, 0, 0, 0] # length 7, sum 0
e = [4, 4] # length 2, sum 8
nums = [a,b,c,d,e]
nums.sort! {|a,b| a.length <=> b.sum}
#=> [[0, 0, 0, 0, 0, 0, 0], [1, 0, 0], [1, 2, 3], [1, 1, 1, 1, 1, 1], [4, 4]]
This process goes through each array in an array of arrays and asks "Is the length of the array we call 'a' greater than, less than, or equal to the sum of array 'b'?" So 'e' is last because its sum, 8, is greater than any of the other lengths. 'c' and 'a' have equal sums but 'c' has the greater length and so has higher rank. Vice versa for 'b' and 'a' - they have equal lengths and so higher sum gets higher rank. 'd' has the longest length but lowest sum. I can also see this example as simple sorting by sum increasing, and only sorting by length if two sums are equal.
Good news is, it's probably pretty rare that you'd ever need to use .max like this. But, it is an interesting example to gain a better understanding of the <=> behind the magic. Let me know if you have a better explanation for this one!
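One possible piece of the puzzle, offered as a sketch rather than a definitive answer: a block that compares a length to a sum can contradict itself when the two arguments are swapped, so the final order may depend on which pairs the sort happens to compare. If "sum first, then length" is what you actually want, sort_by with an array key says so directly. Both method names below are hypothetical.

```ruby
# The block from the example above, wrapped in a method for easy reuse.
def length_vs_sum(x, y)
  x.length <=> y.sum
end

# Hypothetical alternative: sort by sum first, then by length to break ties.
def sort_by_sum_then_length(arrays)
  arrays.sort_by { |arr| [arr.sum, arr.length] }
end

a = [1, 2, 3]  # length 3, sum 6
e = [4, 4]     # length 2, sum 8

p length_vs_sum(a, e)  # -1, which says a belongs before e
p length_vs_sum(e, a)  # -1, which says e belongs before a

nums = [a, [1, 0, 0], [1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0], e]
p sort_by_sum_then_length(nums)
# [[0, 0, 0, 0, 0, 0, 0], [1, 0, 0], [1, 2, 3], [1, 1, 1, 1, 1, 1], [4, 4]]
```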