Rabin–Karp algorithm

I know it have been a long while that I do not update my website. Even missed the entire May…

Actually, I am confused about my future path during these two months. I got a bunch of offers from different places. However, I don’t even know where should I go ultimately.

So I just try to learn some new stuff as I can to kill the time…

Rabin-Karp is a kind of string searching algorithm which created by Richard M. Karp and Michael O. Rabin. It uses the rolling hash to find an exact match of pattern in a given text. Of course, it is also able to match for multiple patterns.

def search(pattern, text, mod):
    # Let d be the number of characters in the input set
    d = len(set(list(text)))
    
    # Length of pattern     
    l_p = len(pattern)
    
    # Length of text
    l_t = len(text)
    
    p = 0
    t = 0
    h = 1
    
    # Let us calculate the hash value of the pattern
    # hash value for pattern(p) = Σ(v * dm-1) mod 13 
    #                           = ((3 * 102) + (4 * 101) + (4 * 100)) mod 13 
    #                           = 344 mod 13 
    #                           = 6     
    for i in range(l_p - 1):
        h = (h * d) % mod

    # Calculate hash value for pattern and text
    for i in range(l_p):
        p = (d * p + ord(pattern[i])) % mod
        t = (d * t + ord(text[i])) % mod

    # Find the match
    for i in range(l_t - l_p + 1):
        if p == t:
            for j in range(l_p):
                if text[i+j] != pattern[j]:
                    break

            j += 1
            if j == l_p:
                print("Pattern is found at position: " + str(i+1))

        if i < l_t - l_p:
            t = (d*(t-ord(text[i])*h) + ord(text[i+l_p])) % mod

            if t < 0:
                t += mod


text = "ABCCCDCCDDAEFG"
pattern = "CDD"
search(pattern, text, 13)

Machine learning, can?

其实我们都已经意识到了,机器学习正在朝着超大规模参数的趋势发展。以后决定机器学习准确率的是GPU的数量和数据的使用权,而不再是研究机器学习的人。

Actually, we have been aware of the trending of massive scale parameters in machine learning. The number of CPUs and the access of data determines the final performance, but not the person who researches machine learning algorithms.

Mitchell Approximation

A method of computer multiplication and division is proposed which uses binary logarithms. The logarithm of a binary number may be determined approximately from the number itself by simple shifting and counting. A simple add or subtract and shift operation is all that is required to multiply or divide.

#include<stdio.h>

int main() {
    float a = 12.3f;
    float b = 4.56f;
    int c = *(int*)&a + *(int*)&b - 0x3f800000;
    printf("Approximate result:%f\n", *(float*)&c);
    printf("Accurate result:%f\n", a * b);
    return 0;
}

Heap sort

class Test():
    def heap_sort(self, nums):
        i, l = 0, len(nums)
        self.nums = nums
        
        # 构造大顶堆,从非叶子节点开始倒序遍历,因此是l//2 -1 就是最后一个非叶子节点
        for i in range(l//2-1, -1, -1): 
            self.build_heap(i, l-1)
        
        print(nums)
        
        # 上面的循环完成了大顶堆的构造,那么就开始把根节点跟末尾节点交换,然后重新调整大顶堆  
        for j in range(l-1, -1, -1):
            nums[0], nums[j] = nums[j], nums[0]
            self.build_heap(0, j-1)

        return nums
    
    def build_heap(self, i, l): 
        """构建大顶堆"""
        nums = self.nums
        left, right = 2*i+1, 2*i+2 ## 左右子节点的下标
        
        large_index = i 
        
        if left <= l and nums[i] < nums[left]:
            large_index = left

        if right <= l and nums[left] < nums[right]:
            large_index = right
 
        # 通过上面跟左右节点比较后,得出三个元素之间较大的下标,如果较大下表不是父节点的下标,说明交换后需要重新调整大顶堆
        if large_index != i:
            nums[i], nums[large_index] = nums[large_index], nums[i]
            self.build_heap(large_index, l)

Is a Balanced Binary Tree

For this problem, a height-balanced binary tree is defined as:

a binary tree in which the left and right subtrees of every node differ in height by no more than 1.

This problem is an easy-level at Leetcode. I probably did it more than five times, once and once again. Just like a muscle memory.

However, I found an interesting solution today, which literally changed my mind about Python…

Here is the code:

# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right

class Solution:        
    def isBalanced(self, root: TreeNode, h = 1) -> bool:
        if not root: return h
        l = self.isBalanced(root.left, h + 1)
        r = self.isBalanced(root.right, h + 1)
        return abs(l - r) <= 1 and max(l, r)

I am very confused at the last line, the max(l, r) part.

I thought that max(l, r) should be converted as a bool value even it returns a integer type value, because as the second component of the operation AND, max(l, r) should represent as a bool variable.

Following by my worst idea, I supposed that the function isBalanced would return either 1 (True) or 0 (False). However, I found a crazy truth after experiments, that Python executor actually return a integer value (the maximum value of l and r) if abs(l – r) <= 1 is matched.

So, it really makes sense. Gain new knowledge of Python 🙂

The Discrete Fourier Transform

This week, I welcomed a new apartment-mate, who was a brilliant theoretical physicist/ Product Manager/ Programmer/etc. In one word, he is amazing!

After a cheerful chat with him, I got known that he has extremely abundant experience spans many fields. He was used to be a postgraduate student at UCAS majored in theoretical physics, and RA at sustech, and be PM at MeiTuan (A Chinese take-away services pioneer), and be visit student at MPI.

And, you can’t imagine that, he is a very good programmer at BtyeDance currently XD.

Back to our points, I’d grasp this opportunity to learn some very fundamental knowledge about Quantum Computing. Speaking of which, I really wanna figure out the Shor’s Algorithm.

The most important concept of the Shor’s Algorithm should be how to find the period, in which we can take advantages of Quantum Computing.

Fine, my mentor calls me back… I will do this article lately. STAY TUNED!