Programming with Passion Algorithms, Mobile Development, Robotics

Sunday, May 21, 2017

LeetCode (Algorithms#10) ‒ Regular Expression Matching

Problem

Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

Solution

The problem may seem simple at first, but becomes complicated when considering some cases involving *. For instance isMatch("aaa","a*a"), which requires you to match a* with varying numbers of a to find the right match. This effectively eliminates any linear solution as we cannot simply iterate through the strings once.

The solution that stood out most to me was the recursive one. Recursion allows us to check each case of matching, and then backtrack to the last successful match to try a different branch. The primary cases of matching are to match a single character (s[i]==p[j] or p[j]='.'), or 0+ characters when a ".*" or "a*" is present. These evaluate into the two cases of recursive calls in the code below. Although it is slow as the recursion tree quickly branches out, and so a better solution can come from Dynamic Programming as we will see.

Time complexity: O(2^n)
Space complexity: O(2^n)
Runtime: 59 ms (28.61%)

public boolean isMatch(String s, String p) {
    return helper(s,p,0,0);
}
boolean helper(String s, String p, int i, int j) {
    if (j>=p.length()) return i>=s.length();
    if (j+1<p.length() && p.charAt(j+1)=='*') { // .* or c*
        return helper(s,p,i,j+2) || i<s.length() 
            && ((s.charAt(i)==p.charAt(j) || p.charAt(j)=='.') && helper(s,p,i+1,j));
    } else { // single character match
        return i<s.length() && (s.charAt(i)==p.charAt(j) || p.charAt(j)=='.')
            && helper(s,p,i+1,j+1);
    }
}

The Dynamic Programming solution follows the standard formula for DP. We formulate a 2D array and fill in values for where a character i in the string matches a character j in the pattern while considering previously computed cells. The solution then simply lies in the value of the last cell. The code below should make sense upon realizing the cases commented at the top. Note that in line 15, the first row of the array is filled with true for sequences of characters containing '*'. This allows us to match the string with "a*" zero times. Also note the significant improvements in complexity and runtime.

Time complexity: O(n^2)
Space complexity: O(n^2)
Runtime: 27 ms (97.46%)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
public boolean isMatch(String s_, String p_) {
    /**
    * DP[i][j] truth cases:
    *   1) s[i]==p[j] OR p[j]=='.' AND DP[row-1][col-1]
    *   2) p[j]=='*'
    *       a) s[i]!=p[j], then the character preceeding p[j] is counted 0 times
    *       b) s[i]==p[j] OR p[j]=='.', then we must consider counting it any number
    *           of times.
    **/
    char[] s = s_.toCharArray(), p = p_.toCharArray();
    int n=s_.length(), m=p_.length();
    boolean[][] DP = new boolean[n+1][m+1];
    
    DP[0][0]=true;
    for (int col=1; col<=m; col++) DP[0][col] = p[col-1]=='*' && col>1 && DP[0][col-2];

    for (int row=1; row<=n; row++) {
        for (int col=1; col<=m; col++) {
            if (p[col-1]=='*') {
                if (isCharMatch(s[row-1],p[col-2])) {
                    DP[row][col] = DP[row-1][col] || DP[row][col-1] || DP[row][col-2];
                } else {
                    DP[row][col] = DP[row][col-2];
                }
            } else {
                DP[row][col] = isCharMatch(s[row-1],p[col-1]) && DP[row-1][col-1];
            }
        }
    }
    return DP[n][m];
}

public boolean isCharMatch(char c, char pc) {
    return c==pc || pc=='.';
}

No comments:

Post a Comment