Rabin–Karp algorithm

I know it have been a long while that I do not update my website. Even missed the entire May…

Actually, I am confused about my future path during these two months. I got a bunch of offers from different places. However, I don’t even know where should I go ultimately.

So I just try to learn some new stuff as I can to kill the time…

Rabin-Karp is a kind of string searching algorithm which created by Richard M. Karp and Michael O. Rabin. It uses the rolling hash to find an exact match of pattern in a given text. Of course, it is also able to match for multiple patterns.

def search(pattern, text, mod):
    # Let d be the number of characters in the input set
    d = len(set(list(text)))
    
    # Length of pattern     
    l_p = len(pattern)
    
    # Length of text
    l_t = len(text)
    
    p = 0
    t = 0
    h = 1
    
    # Let us calculate the hash value of the pattern
    # hash value for pattern(p) = Σ(v * dm-1) mod 13 
    #                           = ((3 * 102) + (4 * 101) + (4 * 100)) mod 13 
    #                           = 344 mod 13 
    #                           = 6     
    for i in range(l_p - 1):
        h = (h * d) % mod

    # Calculate hash value for pattern and text
    for i in range(l_p):
        p = (d * p + ord(pattern[i])) % mod
        t = (d * t + ord(text[i])) % mod

    # Find the match
    for i in range(l_t - l_p + 1):
        if p == t:
            for j in range(l_p):
                if text[i+j] != pattern[j]:
                    break

            j += 1
            if j == l_p:
                print("Pattern is found at position: " + str(i+1))

        if i < l_t - l_p:
            t = (d*(t-ord(text[i])*h) + ord(text[i+l_p])) % mod

            if t < 0:
                t += mod


text = "ABCCCDCCDDAEFG"
pattern = "CDD"
search(pattern, text, 13)

Generate Parentheses

Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses.

class Solution:
    def generateParenthesis(self, n: int) -> List[str]:
        ret = []

        # @functools.lru_cache(None)
        def dfs(curr, l, r):
            if l == n and r == n:
                ret.append(curr)
            
            if r > l: return 
            if l < n: dfs(curr + "(", l + 1, r)
            if r < n: dfs(curr + ")", l, r + 1)

        dfs('', 0, 0)

        return ret

Something interesting

不管哪个领域,都可以在上升期做科研,在平稳期做业务,在饱和期做教育,显然Andrew Ng是个明白人。

No matter what field you are in, you can do research in the growing period, dedicate into the industry in the steady period, and develop education in the saturation period. Obviously, Andrew Ng is a sensible person.

楽しい

按理说,我们二十五六岁,买车买房,工作体面,不能再抱怨了。但是房,车,体面,这些都是大路商品。我有,其他人可能有我的一百倍,一千倍还多。但是十年的青春时光,每个人都只有一次。

我希望看到这里的人,可以记得未来很重要,但是二十多岁一定要快意,可以努力奋斗也可以玩,可以富有也可以贫穷,但不能扭曲,要快意。

当你26岁决定自己要不要成为30岁的博士时,要记得这件事。

To be honest, we have a decent job, house, car at the age of 25, should not complain more. However, cars/houses/good jobs, all of them, are general commodities, others may have them of ten times or even of hundred times than of what I have. But the ten years of youth, everyone has only one time.

So, please remember this thing, when you are 26 years old and decide whether you want to be a 30-year-old Doctor.

Machine learning, can?

其实我们都已经意识到了,机器学习正在朝着超大规模参数的趋势发展。以后决定机器学习准确率的是GPU的数量和数据的使用权,而不再是研究机器学习的人。

Actually, we have been aware of the trending of massive scale parameters in machine learning. The number of CPUs and the access of data determines the final performance, but not the person who researches machine learning algorithms.

Mitchell Approximation

A method of computer multiplication and division is proposed which uses binary logarithms. The logarithm of a binary number may be determined approximately from the number itself by simple shifting and counting. A simple add or subtract and shift operation is all that is required to multiply or divide.

#include<stdio.h>

int main() {
    float a = 12.3f;
    float b = 4.56f;
    int c = *(int*)&a + *(int*)&b - 0x3f800000;
    printf("Approximate result:%f\n", *(float*)&c);
    printf("Accurate result:%f\n", a * b);
    return 0;
}

Semiring

import numpy as np
import networkx as nx
from functools import reduce
import matplotlib.pyplot as plt

connect_graph = np.array([[0, 1, 0, 0, 0], 
                          [0, 0, 0, 1, 0], 
                          [0, 0, 0, 1, 0], 
                          [0, 0, 0, 0, 1], 
                          [0, 0, 1, 0, 0]])

def ring_add(a, b):
    return a or b

def ring_multi(a, b):
    return a and b

def dot_product(i, j):
    row = connect_graph[i]
    column = connect_graph[:,j]
    return reduce(ring_add, [ring_multi(a, b) for a, b in zip(row, column)])

def next_generation(connect_graph):
    candidate_number = connect_graph.shape[0]

    new_connect_graph = np.zeros((candidate_number, candidate_number))

    for i in range(candidate_number):
        for j in range(candidate_number):
            new_connect_graph[i][j] = dot_product(i,j)
            
    return new_connect_graph

new_connect_graph = next_generation(connect_graph)

def draw_graph(connect_graph):
    G = nx.DiGraph()
    
    candidate_number = connect_graph.shape[0]
    
    node_name = list(range(candidate_number))
    G.add_nodes_from(node_name)
    
    for i in range(candidate_number):
        for j in range(candidate_number):
            if connect_graph[i][j]:
                G.add_edge(i, j)

    nx.draw(G, with_labels=True)

    plt.show()

draw_graph(new_connect_graph)