Rabin–Karp algorithm

I know it have been a long while that I do not update my website. Even missed the entire May…

Actually, I am confused about my future path during these two months. I got a bunch of offers from different places. However, I don’t even know where should I go ultimately.

So I just try to learn some new stuff as I can to kill the time…

Rabin-Karp is a kind of string searching algorithm which created by Richard M. Karp and Michael O. Rabin. It uses the rolling hash to find an exact match of pattern in a given text. Of course, it is also able to match for multiple patterns.

def search(pattern, text, mod):
    # Let d be the number of characters in the input set
    d = len(set(list(text)))
    
    # Length of pattern     
    l_p = len(pattern)
    
    # Length of text
    l_t = len(text)
    
    p = 0
    t = 0
    h = 1
    
    # Let us calculate the hash value of the pattern
    # hash value for pattern(p) = Σ(v * dm-1) mod 13 
    #                           = ((3 * 102) + (4 * 101) + (4 * 100)) mod 13 
    #                           = 344 mod 13 
    #                           = 6     
    for i in range(l_p - 1):
        h = (h * d) % mod

    # Calculate hash value for pattern and text
    for i in range(l_p):
        p = (d * p + ord(pattern[i])) % mod
        t = (d * t + ord(text[i])) % mod

    # Find the match
    for i in range(l_t - l_p + 1):
        if p == t:
            for j in range(l_p):
                if text[i+j] != pattern[j]:
                    break

            j += 1
            if j == l_p:
                print("Pattern is found at position: " + str(i+1))

        if i < l_t - l_p:
            t = (d*(t-ord(text[i])*h) + ord(text[i+l_p])) % mod

            if t < 0:
                t += mod


text = "ABCCCDCCDDAEFG"
pattern = "CDD"
search(pattern, text, 13)

Mitchell Approximation

A method of computer multiplication and division is proposed which uses binary logarithms. The logarithm of a binary number may be determined approximately from the number itself by simple shifting and counting. A simple add or subtract and shift operation is all that is required to multiply or divide.

#include<stdio.h>

int main() {
    float a = 12.3f;
    float b = 4.56f;
    int c = *(int*)&a + *(int*)&b - 0x3f800000;
    printf("Approximate result:%f\n", *(float*)&c);
    printf("Accurate result:%f\n", a * b);
    return 0;
}

Transformer Architecture

Recently, I found that my lovely website server got busted by Chinese National Firewall (aka. GFW), Maybe because of the ShadowSocks proxy running on this server. I tried to snapshot this server to other servers a couple of times, but the visit speed was really sick. I suppose it related with my domestic DNS binding.

Due to the epidemic outbreak, I am impressively working from home for almost three month. Although I still need to work, but you know at home, I can spend more time on my personal interestings, like machine learning, music and reading. (Hope my mentor will never find this article out)

Attached is a probably boring illustration about transformer architecture. Almost every BERT-ish paper would like to describe it with a bunch of paragraphs. Ironically, I have never carefully looked through this structure. So today I eventually decided to dive deeply into this structure in a precious weekend afternoon.

Expected Value about Normal Distribution

I could have completed this post in the last month, however I was too exhausted to write this article yesterday (the last day of Feb).

Recently, I am addicted in Probability & Statistics theorem and calculus. I found there is a lot of interesting formula derivation about Normal Distribution (aka Gaussian Distribution), so I’d like to proof it by myself. Here are several methods I tried:

The first formula is the expected value of lognormal distribution. We know, for a continuous function:

And the Probability Density Function (PDF) of lognormal distribution function is:

Hence, we have:

Because of,

Continue reading

Verily Phone Screen Interview

That is a sad story…

Just several days ago, I received an email from Verily, an Alphabet company which delicates in life science. Their HR passed my application and going to move me to the phone interview. However, I messed it up…

I have to say that interview is not a difficult one. The question is like a medium level question at Leetcode:

Given a 1-dimensional axis, a man can move left or right in each time unit. How many possibilities that the man stands on x point after t time units?

I stupidly tried DP at first and struggled in how to implement the state transform formula, that wastes a lot of time.

Today, I reviewed this question and found a fairly easy solution. We do not even need Dynamic Programming.

l + r = t   (1)
r - l = x   (2)

Once we solve this equation, we can directly calculate the number of combinations. For example, the total possibilities of that man stand on point 5 after 9 time units is C72 = 21.