The Data Engineering Interview Study Guide

SeattleDataGuy - Apr 22 '21 - - Dev Community

Photo by Green Chameleon on Unsplash

Interviewing for any technical position generally requires preparing, studying, and long, all-day interviews.

Data engineering interviews, like other technical interviews, require plenty of preparation. There are a number of subjects that need to be covered in order to ensure you are ready for back-to-back questions.

Some positions require Hadoop, others SQL. Some roles require understanding statistics, while still others require heavy amounts of system design.

We have gathered many of the resources that we have used to study and get jobs at companies in the FAANG family as well as other major tech companies. We have yet to find one that requires you to know anything about Hadoop during the interview, so that has not been included in this study guide.

We recommend asking the recruiter if you aren't sure which type of interview you will be facing. Some companies are very good at keeping interviews consistent, but even then, teams can deviate depending on what they are looking for. Here are some examples of what we have noticed about some companies' data engineering interviews.

Amazon --- SQL- and database-design heavy as well as general ETL design. Surprisingly, no Python.

Netflix ---SQL- and code-heavy, with the expectation that you can not only write SQL and code but can optimize them.

"They asked about SQL queries to find time difference between two events given certain condition. " --- Netflix data engineer on Glassdoor

Expedia --- Big Data questions, like what is Spark and RDDs, as well as SQL and Python.

Due to this variance, we've created a checklist to keep track of what subject areas you have already studied and what you still need to cover: data engineering study checklist.

Also, I recently created a video guide to walk through the data engineering interview study guide.

Let's get started with SQL.


SQL

As a data engineer, it is almost inevitable that you will get some SQL questions. As someone who has participated in many interviews for a lot of top tech companies, like Amazon and Capital One, I know that they usually follow some similar patterns.

Typically there will be at least one question that requires an aggregation with a filter, another that requires a few joins, and then one that requires a subquery. Along with that, there might be a few curveball questions that require self-joins, recursions, and analytic functions. So let's look at a couple of concepts that are good to cover.

Pre-video SQL problems

These first few problems will help you gauge where you are on different concepts. That way you can take notes on the study guide, and go back and review what you feel you were not comfortable with.

  1. 262. Trips and Users
  2. 601. Human Traffic of Stadium
  3. 185. Department Top Three Salaries
  4. 197. Rising Temperature
  5. 626. Exchange Seats
  6. The Report
  7. 177. Nth Highest Salary
  8. Symmetric Pairs
  9. Occupations
  10. Ollivander's Inventory
  11. Placements

Videos

Learning about ROW_NUMBER and Analytic Functions

Advanced Implementation Of Analytic Functions Running Total

Advanced Implementation Of Analytic Functions Median

Wise Owl SQL Videos

Post-video SQL problems

Once you have finished watching the SQL videos above, consider trying the new problems below. Try to see if you feel like you are improving. Again, note down any specific topics you feel weak on.

  1. Binary Tree Nodes
  2. 595. Big Countries
  3. 626. Exchange Seats
  4. Weather Observation Station 18
  5. Challenges
  6. Print Prime Numbers
  7. SQL Interview Questions: 3 Tech Screening Exercises (For Data Analysts)

Join Our Newsletter

Before scrolling any further, why not join our team's newsletter to keep up to date on data science, data engineering and tech! Learn more here.


Databases, ETL, and Data Warehouses

Source: Stackoverflow

For database, ETL, and data warehouse design questions, we have gathered some books and videos we hope will help you out when it comes to explaining your design in an interview. In addition, we have listed a few plausible database/DW concepts you could attempt to design out on your own.

We recommend going through the videos and at least skimming the Data Warehouse Toolkit before attempting the self-practice problems.

The Data Warehouse Toolkitby Ralph Kimball

Designing a Traditional Relational Database Video

Data Warehouse Design Video

Self-practice problems

For this part of your interview practice, we are going to list a few business systems that you can try to design out. First, we recommend designing a relational database, then thinking about how you would design an ETL and DW that rely on that relational DB.

Note: In addition, we have found it very common that interviewers will base their interview questions on your design. So think about some of the questions you could answer with your DB and list them.

Design a database/ETL and DW for a:

  • dating app
  • bicycle rental service
  • music streaming app
  • job search website
  • Udemy-like website

These are just a few ideas. We hope they help you have a clearer idea of what you can practice modeling and designing. Take some time to think about how users interact with these websites before getting started.


Programming Problems

Data engineers do a significant amount of programming in their daily life. There are several specific languages data engineers use. Python is arguably the most common.

If the role requires a lot of Hadoop work, then Java is also a useful language to have. There are a few other useful languages, like Java and PowerShell (if you work at a Microsoft shop).

There are two types of questions we have experienced. Some interviewers will ask you more operational questions. Others will ask classic algorithm and data structure questions.


Operational Programming Problems

Operational interview questions are harder to prep for. There are no "classic" interview questions here. However, they are also often easier to figure out on the spot. Algorithm interview questions usually have some sort of trick. Like the balanced brackets problem: If you don't know you need to use queues, it will be very difficult to get to the correct answer.

Operational problems, however, will be more focused on workflows and business processes. So as long as you are good at walking through real problems, this should be easier. Here are some problems that are great for prepping. We find it is helpful to know how to use arrays and dictionaries. Beyond that, there isn't too much more required.

  1. Kangaroo problem
  2. Breaking records
  3. Find a string
  4. itertools.permutations()
  5. No idea!
  6. Days of the programmer
  7. Leaderboard
  8. Word order
  9. Sherlock and squares
  10. Equalize the array
  11. Apples and oranges
  12. More operational style questions

Algorithms and Data Structures

abstract image of data on a screen

Photo by Markus Spiske on Unsplash

Before going too deep into data structure and algorithms, let's do a quick check to see how you are currently doing in this area. We have listed eight LeetCode problems that vary in difficulty. Try these out and try to gauge yourself on how long it takes you, as well as how many hints you needed. If you are following along with the study guide, then note this down. At the end of this list are a few more questions. So once you have watched all the videos, consider doing those problems, and see if you feel like you are improving!

Pre-study problems

  1. 985. Sum of even numbers after queries
  2. 657. Robot return to origin
  3. 961. N-repeated element in size 2N array
  4. 110. Balanced binary tree
  5. 3. Longest substring without repeating characters
  6. 19. Remove Nth node from end of list
  7. 23. Merge k sorted lists
  8. 31. Next permutation

Now that you have gone through these eight questions and shaken off the rust, let's start reviewing these concepts.

Data structures

  1. Data Structures & Algorithms #1 --- What Are Data Structures?
  2. Data Structures: Linked Lists
  3. Data Structures: Trees
  4. Data Structures: Heaps
  5. Data Structures: Hash Tables
  6. Data Structures: Stacks and Queues
  7. Data Structures: Crash Course Computer Science #14
  8. Data Structures: Tries

Algorithms

  1. Python Algorithms for Interviews
  2. Algorithms: Graph Search, DFS, and BFS
  3. Algorithms: Binary Search
  4. Algorithms: Recursion
  5. Algorithms: Bubble Sort
  6. Algorithms: Merge Sort
  7. Algorithms: Quicksort

Big O notation

Introduction to Big O Notation and Time Complexity (Data Structures & Algorithms #7)

Some interview walk-throughs

Amazon Coding Interview Question --- Recursive Staircase Problem

Google Coding Interview --- Universal Value Tree Problem

Google Coding Interview Question and Answer #1: First Recurring Character

Post-video problems

Once you have finished the videos above, consider trying the algorithm and data structure problems below. Make sure you keep track of how comfortable you felt when working on the problems.

  1. Bigger is greater
  2. 6. Zigzag conversion
  3. 7. Reverse integer
  4. 40. Combination sum II
  5. 43. Multiply strings
  6. Larry's array
  7. Short palindrome
  8. 65. Valid number

If you still feel like you need help, then consider taking a course on algorithms and data structures.


Big Data Frameworks

Back in 2020, I made a video about practicing for a data engineering interview. Funny thing was, a person commented about the video and pointed back to my original data engineering study guide. Just by happenstance!

Small world.

They also added another section. In this case, they added Spark. So for those of you out there needing to study for Spark, here is what Paul Russel added to the checklist. What would you add?

Architecture Overview & Use Cases

Spark by Examples (tutorial documentation)

PySpark Syntax Cheat Sheet


Conclusion

We do hope this list will help you prepare for your next data engineering interview. Please let us know if you have any questions or need any future help.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .