Earlier this week, I was trying to figure out how to export authentication data from Firebase to another database. As it turns out, Firebase provides a command line utility that allows users to export authentication data from Firebase. As I was reading through the documentation for this tool, I came across a description of the hashing algorithm that Firebase uses to secure passwords: scrypt. I thought it would be interesting to dive into the codebase for this encryption technique. As it turns out, although it was invented in 2009, it has gained a resurgence of source recently due to its use in cryptocurrencies.
Usually, I would dive in straight to the code base and start reading the code for a particular open source project. However, I’ve never read the code for a cryptography related codebase. That being said, I figured that it is best I do some reading on how scrypt works so I know what to look out for when reading the codebase.
I started browsing through the homepage to see what I could pick out. As I read through the webpage, I made a list of words that I didn’t completely understand and researched them further to determine some good definitions for them. It ended up only being two words. Here they are.
- “key derivation function”
- “hardware brute-force attacks”
I have a rough sense of what a “key derivation function” is based on the name. It’s a function that derives a secret key based on some internal algorithm. As it turns out, this is a pretty good definition of KDFs. I guess my imposter syndrome kicked in and I felt like I needed to know more here but I think this definition will suffice for now.
I know what a “brute-force attack” is, but what is a “hardware brute-force attack.” As it turns out, a “hardware brute-force attack” is essentially a brute-force attack that relies heavily on high-powered hardware, like GPUs for its functionality. GPUs are built with parallelism in mind, which means that GPUs can compute more in less time, which means they can execute brute-force attacks more effectively.
The next thing I wanted to look into the original research paper published alongside the release of scrypt. The paper is titled “Stronger Key Derivation via Sequential Memory-Hard Functions.” Ooooh boy. That’s a fun one. It’s not actually that bad in the scope of security research paper titles. The first thing I wanted to determine was what “memory-hard” meant. I had a pretty good idea, but I wanted to be sure. As it turns out, my hunch was correct. A memory-hard function is a function that utilizes an extensive amount of memory. They are used often in cryptography because in addition to something requiring a large amount of compute in order to be solved, having it require a large amount of memory will make it more resilient. The research paper linked above describes memory-hard functions with further detail.
A memory-hard algorithm is thus an algorithm which asymptotically uses almost as many memory locations as it uses operations5; it can also be thought of as an algorithm which comes close to using the most memory possible for a given number of operations…
The research paper highlights the HEKS key derivation algorithm, which was designed to utilize arbitrarily large amounts of memory. IMO, the paper didn’t do a really good job of explaining the algorithm so I went and read more about it at this webpage.
After scrolling through the research paper, I figured that it would be a good idea for me to write up a “plan of attack” for how I plan to read through the paper and what I’m hoping to get out of it. Specifically, I’d like to figure out.
- The key parameters that the scrypt KDF relies on.
- How the parameters affect the algorithms operations.
- What the algorithm does with each parameter.
I’ll reread the research paper and do more research on the scrypt KDF to answer the questions above in the next blog post.