歇斯底里 (Hysteria)

While reading a paper today, I came across a word: Hysteria.

I didn't recognize it, so I looked it up, and the definition the dictionary gave was remarkably brief, just four characters: 歇斯底里.

I had always assumed 歇斯底里 was a Chinese idiom; it turns out it is actually a loanword, a transliteration of "hysteria".

Rabin–Karp algorithm

I know it has been a long while since I last updated my website. I even missed the entire month of May…

Actually, I have been confused about my future path these past two months. I got a bunch of offers from different places, but I still don't know where I should ultimately go.

So I am just trying to learn some new stuff to kill the time…

Rabin–Karp is a string-searching algorithm created by Richard M. Karp and Michael O. Rabin. It uses a rolling hash to find exact matches of a pattern in a given text, and it can also be extended to search for multiple patterns at once.
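
Before the full implementation, here is a minimal sanity check of the rolling-hash idea; the base d and the modulus below are arbitrary choices for illustration:

# Rolling-hash sanity check: sliding the window from "ABC" to "BCD"
# with one O(1) update should match hashing "BCD" from scratch.
d, mod = 256, 101

def naive_hash(s):
    v = 0
    for ch in s:
        v = (d * v + ord(ch)) % mod
    return v

s = "ABCD"
old = naive_hash(s[0:3])  # hash of "ABC"
rolled = (d * (old - ord(s[0]) * pow(d, 2, mod)) + ord(s[3])) % mod
assert rolled == naive_hash(s[1:4])  # hash of "BCD"

The search below applies this same constant-time update at every position of the text.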

def search(pattern, text, mod):
    # Let d be the alphabet size: the number of distinct
    # characters that occur in the text
    d = len(set(text))
    
    # Lengths of pattern and text
    l_p = len(pattern)
    l_t = len(text)

    # Nothing can match if the pattern is longer than the text
    if l_p > l_t:
        return

    p = 0  # hash value of the pattern
    t = 0  # hash value of the current text window
    h = 1  # will become d^(l_p - 1) % mod below
    
    # Precompute h = d^(l_p - 1) % mod, the weight of the window's
    # leading character. The hash of a string s of length m is
    #     hash(s) = (s[0]*d^(m-1) + s[1]*d^(m-2) + ... + s[m-1]*d^0) mod q
    # e.g. with d = 10, q = 13 and the digit string "344":
    #     (3*10^2 + 4*10^1 + 4*10^0) mod 13 = 344 mod 13 = 6
    for i in range(l_p - 1):
        h = (h * d) % mod

    # Calculate hash value for pattern and text
    for i in range(l_p):
        p = (d * p + ord(pattern[i])) % mod
        t = (d * t + ord(text[i])) % mod

    # Slide the window across the text
    for i in range(l_t - l_p + 1):
        # On a hash match, verify character by character to rule out a
        # collision; the for/else reports only when every character matches
        if p == t:
            for j in range(l_p):
                if text[i+j] != pattern[j]:
                    break
            else:
                print("Pattern is found at position: " + str(i+1))

        # Roll the hash: drop the leading character, append the next one
        # (Python's % already yields a non-negative result)
        if i < l_t - l_p:
            t = (d*(t-ord(text[i])*h) + ord(text[i+l_p])) % mod


text = "ABCCCDCCDDAEFG"
pattern = "CDD"
search(pattern, text, 13)
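
With the inputs above, "CDD" occurs exactly once in the text, so the search prints:

Pattern is found at position: 8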

Machine learning, can?

Actually, we are all aware that machine learning is trending toward models with massive numbers of parameters. What will determine accuracy from now on is the number of GPUs and the right to use data, not the people who research machine learning algorithms.

Mitchell Approximation

A method of computer multiplication and division is proposed which uses binary logarithms. The logarithm of a binary number may be determined approximately from the number itself by simple shifting and counting. A simple add or subtract and shift operation is all that is required to multiply or divide.

#include <stdio.h>

int main() {
    float a = 12.3f;
    float b = 4.56f;
    // Reinterpreted as an integer, a float's bit pattern is (roughly) a
    // scaled, biased base-2 logarithm of its value. Adding the two bit
    // patterns therefore adds the logarithms, i.e. multiplies the values;
    // subtracting 0x3f800000 (the bit pattern of 1.0f) removes the
    // exponent bias that would otherwise be counted twice.
    int c = *(int*)&a + *(int*)&b - 0x3f800000;
    printf("Approximate result:%f\n", *(float*)&c);
    printf("Accurate result:%f\n", a * b);
    return 0;
}
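
For these inputs the shortcut prints roughly 53.68 against the exact 56.088, about 4% low. In effect it approximates log2(1+m) by m for each operand's mantissa, so the product is never overestimated; the worst case, with both mantissas at 0.5, comes out about 11% low.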

Heap sort

class Test():
    def heap_sort(self, nums):
        l = len(nums)
        self.nums = nums

        # Build the max-heap by traversing the non-leaf nodes in reverse;
        # l//2 - 1 is the index of the last non-leaf node
        for i in range(l//2-1, -1, -1):
            self.build_heap(i, l-1)

        # Show the array right after heap construction
        print(nums)

        # With the max-heap built, repeatedly swap the root with the last
        # element of the unsorted region, then re-heapify the shrunk heap
        for j in range(l-1, 0, -1):
            nums[0], nums[j] = nums[j], nums[0]
            self.build_heap(0, j-1)

        return nums

    def build_heap(self, i, l):
        """Sift nums[i] down within the max-heap bounded by index l."""
        nums = self.nums
        left, right = 2*i+1, 2*i+2  # indices of the left and right children

        large_index = i

        if left <= l and nums[large_index] < nums[left]:
            large_index = left

        if right <= l and nums[large_index] < nums[right]:
            large_index = right

        # After comparing the parent with both children, large_index is the
        # index of the largest of the three. If it is not the parent, swap
        # and keep sifting down from the child's position
        if large_index != i:
            nums[i], nums[large_index] = nums[large_index], nums[i]
            self.build_heap(large_index, l)
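
A quick check of the sketch above; the input list is arbitrary:

nums = [4, 10, 3, 5, 1, 2]
print(Test().heap_sort(nums))

The first line printed is the array right after heap construction, [10, 5, 3, 4, 1, 2]; the second is the sorted result, [1, 2, 3, 4, 5, 10].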