Transformer Architecture

Recently, I found that my lovely website server got busted by Chinese National Firewall (aka. GFW), Maybe because of the ShadowSocks proxy running on this server. I tried to snapshot this server to other servers a couple of times, but the visit speed was really sick. I suppose it related with my domestic DNS binding.

Due to the epidemic outbreak, I am impressively working from home for almost three month. Although I still need to work, but you know at home, I can spend more time on my personal interestings, like machine learning, music and reading. (Hope my mentor will never find this article out)

Attached is a probably boring illustration about transformer architecture. Almost every BERT-ish paper would like to describe it with a bunch of paragraphs. Ironically, I have never carefully looked through this structure. So today I eventually decided to dive deeply into this structure in a precious weekend afternoon.

Expected Value about Normal Distribution

I could have completed this post in the last month, however I was too exhausted to write this article yesterday (the last day of Feb).

Recently, I am addicted in Probability & Statistics theorem and calculus. I found there is a lot of interesting formula derivation about Normal Distribution (aka Gaussian Distribution), so I’d like to proof it by myself. Here are several methods I tried:

The first formula is the expected value of lognormal distribution. We know, for a continuous function:

And the Probability Density Function (PDF) of lognormal distribution function is:

Hence, we have:

Because of,

Continue reading

Semigroup and Monoid

Credit to reminder – My sweet piggy

Today is the last day of January. It would be my last chance to complete this blog, otherwise I will miss this lovely month.

In this month, I have went to a new company as a FTE, and simultaneously happened to meet an epic epidemic that overwhelming in Mainland China. The government suggests citizens stay at home, so I am suddenly aware of that I can enjoy this great time at home and read some books.

Semigroup is a fancy concept in math and programming. Let’s say there is a set S and a binary operation ✕ on this set: S ✕ S ➞ S, if ✕ meets the binding law, that is ∀ x, y, z ∈ S, (x ✕ y) ✕ z = x ✕ (y ✕ z). Then the ordered pair (S, ✕) is called a semigroup.

For example, S = {1,2,3,4,5, …}, (2 + 3) + 4 = 2 + (3 + 4) = 9.

infix operator <> : AdditionPrecedence

protocol Semigroup {
    static func <> (lhs: Self, rhs: Self) -> Self
}

Protocol semigroup declares a calculation method of that two arguments and the return value achieve an identical semigroup type. We called this method as Append. Following is the specific achievements of String and Array.

extension String: Semigroup {
    static func <> (lhs: String, rhs: String) -> String {
        return lhs + rhs
    }
}

extension Array: Semigroup {
    static func <> (lhs: [Element], rhs: [Element]) -> [Element] {
        return lhs + rhs
    }
}

func test() {
    let hello = "Hello "
    let world = "world"
    let helloWorld = hello <> world
    
    let one = [1,2,3]
    let two = [4,5,6,7]
    let three = one <> two
}

Next, we are moving to the Monoid part.

Monoid is a kind of semigroup, with an extra attribute named Identity, that is an element E on the set S of the semigroup. Any element A in the set S conforms to A ✕ E = E ✕ A = A.

e.g. 0 + x = x in the natural number set.

protocol Monoid: Semigroup {
    static var empty: Self { get }
}

extension String: Monoid {
    static var empty: String { return "" }
}

extension Array: Monoid {
    static var empty: [Element] { return [] }
}

func test() {
let str = "Hello world" <> String.empty // Always "Hello world"
let arr = [1,2,3] <> [Int].empty // Always [1,2,3]
}

This article is not finish, stay tuned.

How to buy and sell stock at the best time

  • Best Time to Buy and Sell Stock I
  • Best Time to Buy and Sell Stock II
  • Best Time to Buy and Sell Stock III
  • Best Time to Buy and Sell Stock IV
  • Best Time to Buy and Sell Stock with cooldown
  • Best Time to Buy and Sell Stock with transaction fee

Best Time to Buy and Sell Stock I
Say you have an array for which the ith element is the price of a given stock on day i.

If you were only permitted to complete at most one transaction (i.e., buy one and sell one share of the stock), design an algorithm to find the maximum profit.

Note that you cannot sell a stock before you buy one.

class Solution:
    def maxProfit(self, prices: List[int]) -> int:
        buy, sell = -0x7777777, 0

        for price in prices:
            buy = max(buy, -price)
            sell = max(sell, buy + price)

        return sell

Best Time to Buy and Sell Stock II
Say you have an array for which the ith element is the price of a given stock on day i.

Design an algorithm to find the maximum profit. You may complete as many transactions as you like (i.e., buy one and sell one share of the stock multiple times).

Note that you may not engage in multiple transactions at the same time (i.e., you must sell the stock before you buy again).

class Solution:
    def maxProfit(self, prices: List[int]) -> int:
        profit, buy = 0, 0x7777777

        for price in prices:
            buy = min(buy, price)

            tmp = price - buy
            
            it tmp > 0:
                profit += tmp
                buy = price

        return profit

Best Time to Buy and Sell Stock III
Say you have an array for which the ith element is the price of a given stock on day i.

Design an algorithm to find the maximum profit. You may complete at most two transactions.

Note that you may not engage in multiple transactions at the same time (i.e., you must sell the stock before you buy again).

class Solution:
    def maxProfit(self, prices: List[int]) -> int:
        buy_1, buy_2, sell_1, sell_2 = -0x7777777, -0x7777777, 0, 0

        for price in prices:
            buy_1 = max(buy_1, -price)
            sell_1 = max(sell_1, buy_1 + price)

            buy_2 = max(buy_2, sell_1 - price)
            sell_2 = max(sell_2, buy_2 + price)

        return sell_2

Best Time to Buy and Sell Stock IV
Say you have an array for which the i-th element is the price of a given stock on day i.

Design an algorithm to find the maximum profit. You may complete at most k transactions.

Note that you may not engage in multiple transactions at the same time (ie, you must sell the stock before you buy again).

class Solution:
    def maxProfit(self, k: int, prices: List[int]) -> int:
        if k > len(prices) >> 1:
            return sum(prices[i+1] - prices[i] for i in range(len(prices) - 1) if prices[i+1] > prices[i])
        
        cash, asset = [-0x7777777] * (k + 1), [0] * (k + 1)

        for price in prices:
            for i in range(1, k+1):
                cash[i] = max(cash[i], sell[i-1] - price)
                asset[i] = max(asset[i], cash[i] + price)

        return asset[k]

Best Time to Buy and Sell Stock with Cooldown

Say you have an array for which the ith element is the price of a given stock on day i.

Design an algorithm to find the maximum profit. You may complete as many transactions as you like (ie, buy one and sell one share of the stock multiple times) with the following restrictions:

  • You may not engage in multiple transactions at the same time (ie, you must sell the stock before you buy again).
  • After you sell your stock, you cannot buy stock on the next day. (ie, cooldown 1 day)
class Solution:
    def maxProfit(self, prices: List[int]) -> int: 
        if not prices:
            return 0
        
        sell, buy, prev_sell, prev_buy = 0, -prices[0], 0, 0
        
        for price in prices:
            prev_buy = buy
            buy = max(prev_buy, prev_sell - price)
            prev_sell = sell
            sell = max(prev_sell, prev_buy + price)
            
        return sell

Best Time to Buy and Sell Stock with Transaction Fee
You are given an array of integers prices, for which the i-th element is the price of a given stock on day i; and a non-negative integer fee representing a transaction fee.

You may complete as many transactions as you like, but you need to pay the transaction fee for each transaction. You may not buy more than 1 share of a stock at a time (ie. you must sell the stock share before you buy again.)

Return the maximum profit you can make.

class Solution:
    def maxProfit(self, prices: List[int], fee: int) -> int:
        sell, buy = 0, -prices[0]
        
        for p in prices[1:]:
            sell = max(sell, buy + p - fee)
            buy = max(buy, sell - p)
            
        return sell

Verily Phone Screen Interview

That is a sad story…

Just several days ago, I received an email from Verily, an Alphabet company which delicates in life science. Their HR passed my application and going to move me to the phone interview. However, I messed it up…

I have to say that interview is not a difficult one. The question is like a medium level question at Leetcode:

Given a 1-dimensional axis, a man can move left or right in each time unit. How many possibilities that the man stands on x point after t time units?

I stupidly tried DP at first and struggled in how to implement the state transform formula, that wastes a lot of time.

Today, I reviewed this question and found a fairly easy solution. We do not even need Dynamic Programming.

l + r = t   (1)
r - l = x   (2)

Once we solve this equation, we can directly calculate the number of combinations. For example, the total possibilities of that man stand on point 5 after 9 time units is C72 = 21.

Word Break

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

  • The same word in the dictionary may be reused multiple times in the segmentation.
  • You may assume the dictionary does not contain duplicate words.

Example 1:

Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
             Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false

Recently, I am going to pick up my leetcode skills.

This problem that I still remember It token me more than two days to consider, but this time it was aced in 5 minutes, as well as just in nearly 1 line core code.

It seems like practising is really useful!

from functools import lru_cache

class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        @lru_cache(None)
        def dfs(s):
            return True if not s else any(dfs(s[len(word):]) for word in wordDict if s.startswith(word))
            
        return dfs(s)

Max Increase to Keep City Skyline

In a 2 dimensional array grid, each value grid[i][j] represents the height of a building located there. We are allowed to increase the height of any number of buildings, by any amount (the amounts can be different for different buildings). Height 0 is considered to be a building as well. 

At the end, the “skyline” when viewed from all four directions of the grid, i.e. top, bottom, left, and right, must be the same as the skyline of the original grid. A city’s skyline is the outer contour of the rectangles formed by all the buildings when viewed from a distance. See the following example.

What is the maximum total sum that the height of the buildings can be increased?

Example:
Input: grid = [[3,0,8,4],[2,4,5,7],[9,2,6,3],[0,3,1,0]]
Output: 35
Explanation: 
The grid is:
[ [3, 0, 8, 4], 
  [2, 4, 5, 7],
  [9, 2, 6, 3],
  [0, 3, 1, 0] ]

The skyline viewed from top or bottom is: [9, 4, 8, 7]
The skyline viewed from left or right is: [8, 7, 9, 3]

The grid after increasing the height of buildings without affecting skylines is:

gridNew = [ [8, 4, 8, 7],
            [7, 4, 7, 7],
            [9, 4, 8, 7],
            [3, 3, 3, 3] ]

Notes:

  • 1 < grid.length = grid[0].length <= 50.
  • All heights grid[i][j] are in the range [0, 100].
  • All buildings in grid[i][j] occupy the entire grid cell: that is, they are a 1 x 1 x grid[i][j] rectangular prism.
class Solution:
    def maxIncreaseKeepingSkyline(self, grid: List[List[int]]) -> int:
        lr_view = [max(line) for line in grid]
        tb_view = [max(line) for line in zip(*grid)]
        
        result = 0
        
        for i in range(len(grid)):
            for j in range(len(grid[0])):
                diff = min(tb_view[j], lr_view[i]) - grid[i][j]
                result += max(0, diff)
                
        return result