Book Club: Grokking Algorithms. 7: Where to go next and Final Discussion

ruthmoog - Oct 12 '23 - - Dev Community

Grokking Algorithms An illustrated guide for programmers and other curious people, by Aditya Y. Bhargava


Chapter 11: Where to go next

If you read this chapter, please go ahead and discuss what you thought in the comments!

📝 This is the last chapter in the book, and is a summary of 10 algorithms for further optional reading. You can read my notes below.


Where to go Next

key topics covered: suggestions, binary search tree, The Fourier Transform, Parallel algorithms, MapReduce, Bloom Filters, HyperLogLog, SHA, Diffie-Hellman, key exchange, locality-sensitive, hashing, linear programming

Binary Search Tree

Good for learning about databases and data structures

A binary search tree is faster for insertions and deletions than a sorted array in average, because a binary search tree allows you to find or insert in a more granular but sorted way. They don't allow random access, and the good performance times are based on a balanced tree (too many nodes on one side will be imbalanced, and performance will be affected - red-black trees are self-balancing so that might be a good option).

What is and isn't a binary search tree

copyright Manning Publications, drawn by adit.io

Inverted Indexes

Good for learning about search

This data structur is a hash that maps words to places where they appear, and is commonly used in search engines.

The Fourier Transform

Good for compression and formats like MP3 or JPG, and music tech

This algorithm can take a digital music signal & break it into individual frequencies so that the file can be reduced (like in MP3) or frequecies can be boosted or reduced in music production. It also works for images in the same way, and can be used in applications like earthquake prediction and DNA analysis, with those digital signals.

Parallel algorithms

Good for speed optimisation

These allow you to spread work out in parallel so that it can be completed sooner. But they're hard to design, test, and predict the cost benefit of as the time gains aren't linear.

MapReduce

Good for speed and load balancing

This is a popular distributed algorithm (a subtype of parralel algorithm). Distributed algos are useful when you have a lot of work to do as they will cut out a lot of time to do it by spreading work across multiple machines. an example is Apache Hadoop.

Bloom filters

Good for quick approximations

These are probabilistic data structures (they give you a likely correct answer but it could be a false positive). This is an alternative to a hash (which will give you the correct answer) but for a large hash index might not be practical, and a bloom filter approximation might be good enough.

HyperLogLog

Good for quick approximations

A similar probabilistic algorithm that approximates the number of unique elements in a set.

SHA algorithms

Good for unbiased comparison
This is a secure hash algorithm (a SHA is a hash function, which generates a string in this context called a hash). Unlike has functions which goes from (e.g.) string to array index, SHA goes from string to string. You can use the SHA to tell if objects, files or strings (like passwords) are the same without revealing the original content. [note: this is not the gold-standard for password-hashing]

Localilty-sensitive hashing

Good for biased comparison

SHA is locality insensitive - similar starting strings will not have similar hashes. A locality-sensitive hash such as Simhash will generate a string that is similarly represented so that you can compare two hashes for similarity of content - e.g. for plagiarism checks.

Diffie-Hellman key exchange

Good for cryptography

The Diffie-Hellman algorithm is used for message encryption. It has a public key for sharing which is used to encrypt a message, and a private key which can decode the message and is 'owned' by one person. It's hard to decode without the private key and the cipher does not need to be communicated so it is still used in practise alongside RSA.

Linear programming

Good for optimization
Graph algorithms in the book are a subset of linear programming, which uses the complex Simplex algorithm.


Discussion for this chapter

  1. Are there any algorithms you're going to investigate that aren't recommended in this list or in the book?
  2. Have you used any of these 10 algorithms? Or, did you consider any of these but opted for a different algorithm instead?

Discussion for the entire book

Please visit the discussion questions post to reflect on the book, and add your answers to the comments along wit any of your own questions or reflections you have.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .