DAY 21 - Edit Distance

Nayan Pahuja - Jun 24 '23 - - Dev Community

Hey Guys! Nayan here and we are on the day 21st day of 100-DAYS Challenge. Let's get going with the problem.

DAY 21 Problem : Edit Distance

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

Examples:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')'


Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

Intuition: Recursion

The question is pretty straightforward. It simply demands us that we need to convert string 1 to string 2 using minimal operations.
Minimization.

By observation at max none of the characters in either string will be same and we will have to replace every character or delete and insert.
Hence we will require at max N + M operations where N and M are sizes of strings.

One thing we can do is go through the naive approach and optimize it along the way.

We have three options here. Let's go through all possible cases where I use or don't use them on each character.

Approach:

  • Establishment of base cases: Here we are starting from the last index. So let's say we are traversing backward in both strings using iterators i and j.
  • If at some moment we go out of bounds for i, we will need j+1 operations to convert the current string into our result.
  • This is because our j >= 0 and we still have to convert the string, so we will require remaining length of the string + 1 operations because it's 0 indexed.
  • Similarly if we go out of bounds for the string 2 while traversion, we need i+1 steps more.

  • Recursion Relation Establishment: We have three steps, insert, delete, replace.

  • If our character's match, we don't need to do anything we simply move forward and decrement the iterators.

  • If it doesn't match, we use one of the three options and move ahead.

  • Call for return value for min of the value formed by this relation.

  • This will take exponential complexity so we will perform memoization on it.

  • We will declare a dp array of the same size and initialize it using -1.

  • Then we will store result at every recursive call into the dp array.

Code:

int util(string& word1, string& word2, int i, int j, vector<vector<int>>& dp){
    //base cases
    if(i == 0)
        return j;
    if(j == 0)
        return i;

    if(dp[i][j]!=-1) return dp[i][j];

    if(word1[i-1]==word2[j-1])
        return dp[i][j] =  0+util(word1,word2,i-1,j-1,dp);

    // Minimum of three choices
    else return dp[i][j] = 1+min(util(word1,word2,i-1,j-1,dp),
    min(util(word1,word2,i-1,j,dp),util(word1,word2,i,j-1,dp)));

}
    int minDistance(string word1, string word2) {
        int n = word1.size();
        int m = word2.size();

        vector<vector<int>> dp(n+1,vector<int>(m+1,-1));

Enter fullscreen mode Exit fullscreen mode

Complexity Analysis:

Time Complexity: O(N*M)
Space Complexity: O(N*M) + O(N+M)where N and M are sizes of strings. Extra space due to recursion stack.

Tabulation:

We can rewrite the code to eliminate the extra space complexity using tabulation method. We will go bottom up in this approach and start from the 0th index.
To perform tabulation, we need only 2 things.
We need the base case. We need the recursive relation.
We already have the second one.
If we look closely it just says that for every i from 0 to n dp[i] can take any value.
So we will do just that for our base case.

Code:

int minDistance(string word1, string word2) {
        int n = word1.size();
        int m = word2.size();

        vector<vector<int>> dp(n+1,vector<int>(m+1,0));

    //tabulation
    for(int i = 0; i <= n; i++){
        dp[i][0] = i;
    }
    for(int j = 0; j<=m; j++){
        dp[0][j] = j;
    }
        for(int i = 1; i <= n; i++){
            for(int j = 1; j <=m; j++){
                if(word1[i-1] == word2[j-1]){
                    dp[i][j] = dp[i-1][j-1];
                }
                else
                dp[i][j] = 1 + min(dp[i-1][j-1],min(dp[i-1][j], dp[i][j-1]));

            }
        }
        return dp[n][m];
        }
Enter fullscreen mode Exit fullscreen mode

Complexity Analysis:

Time Complexity: O(N*M)
Space Complexity: O(N*M)where N and M are sizes of strings.

We can still optimize into to 2 1-D arrays. We can see that the dp value doesn't depend on the whole table but rather just on the previous row.
So we can eliminate the extra space using this approach.

Code:

ector<int> prev(m+1,0);
    vector<int> curr(m+1,0);

    for(int j=0;j<=m;j++){
        prev[j] = j;
    }

    for(int i=1;i<n+1;i++){
        curr[0]=i; // $$ The value for the coloumn starting changes.
        for(int j=1;j<m+1;j++){
            if(word1[i-1]==word2[j-1])
                curr[j] = 0+prev[j-1];

            else curr[j] = 1+min(prev[j-1],min(prev[j],cur[j-1]));
        }
        prev = curr;
    }

    return prev[m];

}
Enter fullscreen mode Exit fullscreen mode

Complexity Analysis:

Time Complexity: O(N*M)
Space Complexity: O(2*M)where N and M are sizes of strings.

Thanks for reading. Feedback is highly appreciated.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .