DAY 18 - Longest Common Subsequence

Nayan Pahuja - Jun 21 '23 - - Dev Community

Hello to anyone reading this. I am still relatively new to coding and this platform as well, so please feel free to drop any corrections/advice for me.

I am trying to do the 100 Day-Challenge of consistent coding and this is Day-18.

Problem: Longest Common Subsequence Question Link

Given two strings text1 and text2, return the length of their longest common subsequence. If there is no common subsequence, return 0.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

For example, "ace" is a subsequence of "abcde".
A common subsequence of two strings is a subsequence that is common to both strings.

Examples:

Input: text1 = "abcde", text2 = "ace"
Output: 3

Explanation: The longest common subsequence is "ace" and its length is 3.


Input: s = "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.


Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: There is no such common subsequence, so the result is 0.

Intuition:

Let's start by understanding what a subsequence is first and then move to the question.

A subsequence is a string of characters from a parent string where some characters(or none at all) are deleted but their relative position order is the same.
A string of length N has 2^N subsequences.
The question demands us to find the longest common subsequence given two strings. The question itself isn't hard to understand.

Let's write down our observations:
We will need at least two indexes two traverse and compare both strings.
The simplest approach would be to find all the subsequences and find the max common subsequence between both.
This would take us to a time complexity of 2^N + 2^M as we will have to calculate both their subsequences.

We can lower down the time complexity using DP here by storing as we traverse the max LCS we have found until now and add from there.
Breaking this into parts of longest yet and then adding one by one.

Let's write down the approach with memoization.

Approach: Memoization + Recursion

  • Base case establishment.

If any of the indexes go out of bounds we don't need to go further.
We check we have already solved for the sub part in our dp.

  • If(S[ind1] == t[ind2])

In this case this common element will represent a single character common subsequence, so we can say that we have found one character and we can shrink both the strings by 1 to find the longest common subsequence in the remaining pair of strings.

  • If its not equal we go into two options.

We decrease both the indexes by 1 but individually to calculate all the cases instead of decreasing both by 1 at same time.
If we decrease both at same time we might miss a case where there a index position might have benefited us in the end.

Code:

#include <bits/stdc++.h>
int helper(int ind1, int ind2,string text1, string text2, vector<vector<int>>& dp){
        //base case
        if(ind1 < 0 || ind2 < 0)
        {
            return 0;
        }
        if(dp[ind1][ind2] != -1)
        {
            return dp[ind1][ind2];
        }
        if(text1[ind1] == text2[ind2]){
          return  dp[ind1][ind2] =  1 + helper(ind1-1, ind2-1, text1, text2,dp);
        }
    return dp[ind1][ind2] = max(helper(ind1, ind2-1, text1, text2,dp),helper(ind1-1, ind2, text1, text2,dp));    
    }
    int longestCommonSubsequence(string text1, string text2) {
        int ind1 = text1.size();
        int ind2 = text2.size();
        vector<vector<int>> dp(ind1, vector<int>(ind2,-1));
        return helper(ind1-1, ind2-1, text1, text2,dp);

    }

int lcs(string s, string t)
{
        int ind1 = s.size();
        int ind2 = t.size();
        vector<vector<int>> dp(ind1, vector<int>(ind2,-1));
        return helper(ind1-1,ind2-1,s,t,dp);
}
Enter fullscreen mode Exit fullscreen mode

Complexity Analysis:

Time Complexity: O(N*M)
Space Complexity: O(N*M) + O(N+M) where N and M are sizes of strings. Extra O(N+M) recursive stack space.

Approach: Tabulation

To convert the memoization approach to a tabulation one, create a dp array with the same size as done in memoization.

Since we went from the end of our strings, we will reverse our approach from to top-down to bottom-up.

In the recursive logic, we set the base case to if(ind1<0 || ind2<0) but we can’t set the dp array’s index to -1. Therefore we shift indexes by 1 positively.

Simply saying i = i+1 and j = j+1

Rest will remain the same.
We will simply initialize the dp array with 0 instead of -1 as we are not doing recursion calls here.

Code:

class Solution {
public:
    int longestCommonSubsequence(string text1, string text2) {
        int ind1 = text1.size();
        int ind2 = text2.size();
        vector<vector<int>> dp(ind1+1, vector<int>(ind2+1,0));

        for(int i = 1; i <=ind1; i++){
            for(int d = 1; d <=ind2; d++){
                if(text1[i-1] == text2[d-1])
                dp[i][d] = 1 + dp[i-1][d-1];
                else
                dp[i][d] = max(dp[i-1][d], dp[i][d-1]);
                }
            }
        return dp[ind1][ind2];
        }
    };
Enter fullscreen mode Exit fullscreen mode

Complexity Analysis:

Time Complexity: O(N*M)
Space Complexity: O(N*M) where N and M are sizes of strings.

There can still be space optimization done here. We can reduce the space into simply two arrays as we only require the previous row's value at each call not the full dp arrays value.

Thanks for reading. Please feel free to leave any feedback or appreciative comments.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .