Transformer Architecture

Recently, I found that my lovely website server got blocked by the Chinese national firewall (a.k.a. the GFW), probably because of the Shadowsocks proxy running on it. I tried restoring a snapshot of this server onto other servers a couple of times, but the access speed was terribly slow. I suppose it is related to my domestic DNS binding.

Due to the epidemic outbreak, I have been working from home for almost three months. Although I still need to work, at home I can spend more time on my personal interests, like machine learning, music, and reading. (I hope my mentor never finds this article.)

Attached is a probably boring illustration of the Transformer architecture. Almost every BERT-ish paper describes it with a bunch of paragraphs. Ironically, I had never carefully looked through this structure, so today I finally decided to dive deep into it on a precious weekend afternoon.

Expected Value of the Normal Distribution

I could have completed this post last month, but I was too exhausted to write it yesterday (the last day of February).

Recently, I have been addicted to probability & statistics and calculus. I found there are a lot of interesting formula derivations around the normal distribution (a.k.a. the Gaussian distribution), so I'd like to prove them myself. Here are several methods I tried:

The first formula is the expected value of the lognormal distribution. We know that, for a continuous random variable $X$ with density $f(x)$, the expected value is:

$$E[X] = \int_{-\infty}^{+\infty} x f(x)\, dx$$

And the Probability Density Function (PDF) of the lognormal distribution is:

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right), \quad x > 0$$

Hence, we have:

$$E[X] = \int_0^{+\infty} x \cdot \frac{1}{x \sigma \sqrt{2\pi}} \exp\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right) dx = \int_0^{+\infty} \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right) dx$$

Because of the substitution $t = \ln x$ (so $x = e^t$ and $dx = e^t \, dt$), this becomes:

$$E[X] = \int_{-\infty}^{+\infty} \frac{1}{\sigma \sqrt{2\pi}} \exp\left( t - \frac{(t - \mu)^2}{2\sigma^2} \right) dt$$

Completing the square, $t - \frac{(t - \mu)^2}{2\sigma^2} = \mu + \frac{\sigma^2}{2} - \frac{(t - \mu - \sigma^2)^2}{2\sigma^2}$, so what remains under the integral is just a Gaussian PDF that integrates to 1, and:

$$E[X] = e^{\mu + \sigma^2 / 2}$$
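As a quick sanity check on this result, here is a small NumPy sketch (the parameter values are an arbitrary choice of mine) comparing the empirical mean of lognormal samples against the closed form:

```python
import numpy as np

mu, sigma = 0.5, 0.8                    # arbitrary example parameters
rng = np.random.default_rng(0)

samples = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)
print(samples.mean())                   # empirical E[X], ~2.27
print(np.exp(mu + sigma ** 2 / 2))      # closed form e^{mu + sigma^2/2}, ~2.27
```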


Verily Phone Screen Interview

That is a sad story…

Just a few days ago, I received an email from Verily, an Alphabet company dedicated to life sciences. Their HR passed my application along and moved me to the phone interview. However, I messed it up…

I have to say the interview was not a difficult one. The question was like a medium-level question on LeetCode:

Given a one-dimensional axis, a man moves one unit left or right in each time unit. How many ways are there for the man to stand at point x after t time units?

I stupidly tried DP at first and struggled with how to implement the state transition formula, which wasted a lot of time.

Today, I reviewed this question and found a fairly easy solution; we do not even need dynamic programming. Let $l$ and $r$ be the number of left and right moves the man makes in the $t$ time units. Then:

$$l + r = t \tag{1}$$

$$r - l = x \tag{2}$$

Once we solve this system, $r = \frac{t + x}{2}$ and $l = \frac{t - x}{2}$, and the answer is simply the number of ways to place the $l$ left moves among the $t$ steps. For example, for the man to stand on point 5 after 9 time units, we get $r = 7$ and $l = 2$, so the total number of ways is $\binom{9}{2} = 36$.
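Here is a minimal Python sketch of this closed form (the function name `walk_count` is my own), including a guard for points that are unreachable because $t + x$ is odd or $|x| > t$:

```python
from math import comb

def walk_count(t, x):
    """Number of t-step left/right walks ending at point x."""
    # From l + r = t and r - l = x: r = (t + x) / 2.
    if (t + x) % 2 != 0 or abs(x) > t:
        return 0                  # x is unreachable in exactly t steps
    r = (t + x) // 2
    return comb(t, r)             # choose which r of the t steps go right

print(walk_count(9, 5))  # 36
```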

The Evolution of CNNs

(a) Regular convolution: the standard block used in AlexNet/VGG.

(b) Separable convolution block: splits the regular convolution into a depthwise and a pointwise convolution.

(c) Separable convolution with linear bottleneck: introduces the ResNet-style bottleneck into the separable convolution.

(d) Bottleneck with expansion layer: inverts the bottleneck (small – large – small).
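To make blocks (b) and (d) concrete, here is a minimal PyTorch sketch; the module names, channel sizes, and expansion factor are my own illustrative choices, not taken from any particular paper's reference code:

```python
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    """(b) Split a regular 3x3 convolution into depthwise + pointwise."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # depthwise: one 3x3 filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class InvertedBottleneck(nn.Module):
    """(d) Expansion layer: small -> large -> small, with a linear bottleneck."""
    def __init__(self, in_ch, out_ch, expansion=6, stride=1):
        super().__init__()
        hidden = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 expansion: small -> large
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            # 3x3 depthwise convolution in the expanded space
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            # 1x1 linear projection: large -> small, no activation (the
            # "linear bottleneck" of block (c))
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

x = torch.randn(1, 32, 56, 56)
print(SeparableConv(32, 64)(x).shape)       # torch.Size([1, 64, 56, 56])
print(InvertedBottleneck(32, 32)(x).shape)  # torch.Size([1, 32, 56, 56])
```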