Intro
If you're into Cryptography For Beginners, you're in the right place. Maybe you're just getting into Rails and want to add a user login/logout function. Maybe you're just really into cryptographic hashing algorithms. I sure am! If you're really a beginner, a little website called Khan Academy has some especially great videos on cryptography.
This isn't really a tutorial, but when using BCrypt, always remember to uncomment the gem in your Rails Gemfile.
source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }
ruby '2.6.1'
...
# Use Active Model has_secure_password
# gem 'bcrypt', '~> 3.1.7'
gem 'bcrypt', '~> 3.1.7'
...
Now, you can do cool things like this:
# Encrypts password
my_password = BCrypt::Password.create('iLOVEdogs123')
=> "$2a$10$kypbnGGCpJ7UQlysnqzJG.6H.dUewn7UPVWA3Ip.E.8U4jlVnFNnu"
# Tests if input matches
my_password == 'iLOVEdogs123'
=> true
my_password == 'ilovedogs12'
=> false
# Not super important for this post, but this is why the above crypt begins with "$2a$"
my_password.version
=> "2a"
# "Cost" factor - how quickly the password is encrypted
my_password.cost
=> 10
Now there are a lot of things you can do with your app using BCrypt - authentication, authorization, login, logout, etc.! But there are already a lot of blog posts and helpful docs on how to code that. I am more interested in the behind the scenes action.
History
BCrypt is a hashing algorithm that was designed by Niels Provos and David Mazières of the OpenBSD Project in 1999. The B stands for...
Blowfish!
Blowfish is a symmetric-key block cipher, designed by Bruce Schneier in 1993. Now there's a more modern version called Twofish, but we don't care about that here!
It's pretty amazing that their original proposal over 20 years ago was titled "A Future-Adaptable Password Scheme" because it seems to have truly endured the test of time.
They envisioned an algorithm with computational cost that would increase as hardware improved. Or, as computers get faster and better able to guess passwords, encryption should be slower, or have more "cost." Notice in the following line:
pw = BCrypt::Password.create('password123')
=> "$2a$10$kypbnGGCpJ7UQlysnqzJG.6H.dUewn7UPVWA3Ip.E.8U4jlVnFNnu"
pw.cost
=> 10
cost
is 10. If cost increases, speed decreases, but the speed with which a hacker can guess your passwords also decreases. For example, an attacker using Ruby could check ~140,000 passwords a second with MD5 but only ~450 passwords a second with bcrypt
. BCrypt allows you to configure cost depending on how important the speed/security tradeoff is to you. Here's a nice video that shows some examples of different cost factors.
General Hash Function Background
In general, a hash algorithm or function takes data (i.e., the password) and maps to "fixed-size values," or creates a "digital fingerprint," or hash, of it. This hash is not exactly the same as the Ruby class, but they are similar. A hashing algorithm is like a key-value pair of passwords and their encryptions, but you wouldn't want to store or save them like that! The process is never truly "reversible," in the sense that if I hashed a list of passwords, and all you had was a list of unique crypts, the only way you could "hack" my passwords would be through something like brute force search. But you could never take a hashed value and return it to its original form!
This is why modular arithmetic and the XOR
gate/operator are so fundamental to cryptography and understanding the algorithm behind the BCrypt magic. XOR
stands for "exclusive or," and it is a logical relationship between A and B where one, and only one, must be true. So A XOR B
returns true if A is true or B is true, but not both. The XOR
gate also returns true or 1 if there is an odd number of 'true' inputs, and so is also thought of as addition mod 2
. To me, modular arithmetic is the clearest way to think about why all modern forms of cryptography, hashing, encryption, etc., are "irreversible" - 12 mod 7 = 5
and 40 mod 7 = 5
but even if you know the output is 5, and the algorithm is mod 7
, you mathematically cannot determine the original number in any way other than essentially guessing.
These are all pretty fundamental, basic ideas behind BCrypt, which is clearly way more complicated. But it's interesting to me how BCrypt has mostly built upon these ideas in a way that others haven't.
For example, some other common "general purpose" hash functions, MD5, SHA1, SHA2, SHA3 are fast, but insecure. A modern server can calculate the MD5 hash of about 330MB every second. For lowercase, alphanumeric, 6 character passwords, you can try every single option in about 40 seconds.
So how does BCrypt use all of this?
BCrypt isn't really saving or storing these hashes. It's actually hashing or encrypting everything entered, and comparing the hashes. If I save my password as 'password123' and enter 'password' on the login page of my Rails app, BCrypt hashes both of those strings and compares them for authentication. For example, when I save my password as 'password123,' BCrypt does the following:
pw = BCrypt::Password.create('password123')
=> "$2a$10$kypbnGGCpJ7UQlysnqzJG.6H.dUewn7UPVWA3Ip.E.8U4jlVnFNnu"
and if I entered an incorrect password, such as just 'password,' it is calculating a separate hash and comparing them.
not_pw = BCrypt::Password.create('password')
=> "$2a$10$vx6htugaV4KRG2ucXc8iHOo/Ch4FRfM7aa6Tpc79j9ecPo9U6APsu"
Which finally allows BCrypt to compare:
pw == 'password123'
=> true
pw == 'password'
=> false
You could never take either of those long crypts starting with $2a$10$
and go backwards. It might seem obvious, but it's cool to really think - it's not just that you can't go backwards because you're a human and not a computer. No one, no computer, not even an omnipotent/omniscient being could go backwards!! It would be truly impossible for BCrypt to even have a function or method that would take a hashed password and somehow reverse it and return the original password. All it can do is compare hashed values.
But good news for hackers, bad news for you - hackers don't need to be omniscient or mathematically rigorous here. In fact, the brute force solution will, theoretically, always work. If you think about a world where we had unlimited time and resources - probably anything would be hackable. So if you're ever feeling existential and sad about your own mortality, at least know that it means you can't be hacked as easily.
Rainbow Tables
But there's still a lot that hackers can do to use their time more wisely than a simple brute force attack. A rainbow table attack uses a list of possible passwords hashed with the same algorithm to compare. So you could theoretically compose a list of common passwords, hash them with an algorithm like BCrypt, and compare those hashes to the actual list of password hashes. The solution to this - salts! Kind of.
Sneaky Salt!
There are a lot of puns we can make here about salts, but all jokes aside, a salt is simply "random data that is used as an additional input to a one-way hash function." Salts add yet another layer of randomness to our already pretty random crypt. So, according to the bcrypt
Ruby gem docs, we have something like this:
hash(password) #=> <unique gibberish>
hash(salt + password) #=> <really unique gibberish>
There is also the concept of a pepper, which is again fun for pun purposes, but not explicitly used in BCrypt.
But! Salts on their own are not enough. They help prevent rainbow table attacks, but salts will not necessarily prevent dictionary or brute force attacks. Attackers can throw a list of potential passwords at each individual password:
hash(salt + "aadvark") =? <really unique gibberish>
hash(salt + "abacus") =? <really unique gibberish>
So the really revolutionary aspect of BCrypt is just that it's slow! Hash algorithms aren't usually designed to be slow, they're designed to turn a lot of data into secure fingerprints as quickly as possible. Furthermore, BCrypt is so impressive because it is truly adaptable - the work factor feature allows you to determine how 'expensive,' 'costly,' or 'slow' the hashing algorithm will be. So as hardware gets faster, you can make BCrypt slower, and keep up with Moore's Law!
In my earlier example, the default work factor or cost was set to 10:
pw = BCrypt::Password.create('password123')
=> "$2a$10$kypbnGGCpJ7UQlysnqzJG.6H.dUewn7UPVWA3Ip.E.8U4jlVnFNnu"
pw.cost
=> 10
Using a work factor of 12, BCrypt hashes the password yaaa
in about 0.3 seconds. MD5 takes less than a microsecond. This might seem like nothing, but think of it as cracking a password every 40 seconds vs. every 12 years. You might not need that kind of security, and you might need a faster algorithm - which is why it's great that BCrypt allows you to choose your balance of speed and security.
A Final Tutorial Note
If you ever need it - here's how the docs say to change the default cost factor, using either BCrypt::Engine.cost = new_value
or specifying :cost
as an additional argument in creation.
Option 1:
BCrypt::Engine.cost = 8
BCrypt::Password.create('password').cost
#=> 8
Option 2:
BCrypt::Password.create('password', :cost => 6).cost #=> 6
Conclusion
BCrypt is cool and safe. Unless you have very advanced and specific speed-security balance needs, learn to love it!