# Newton’s method

# Transformer Architecture

Attached is a probably boring illustration of the transformer architecture. Almost every BERT-ish paper spends a bunch of paragraphs describing it. Ironically, I had never looked carefully through this structure, so today I finally decided to dive into it on a precious weekend afternoon.

# Expected Value of the Normal Distribution

Recently, I have been hooked on probability and statistics theorems and calculus. I found that there are a lot of interesting formula derivations around the normal distribution (aka Gaussian distribution), so I’d like to prove them myself. Here are several methods I tried:

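The first formula is the expected value of the lognormal distribution. We know that, for a continuous random variable X with density f(x):

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$

And the probability density function (PDF) of the lognormal distribution is:

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$$

Hence, we have:

$$E[X] = \int_{0}^{\infty} x \cdot \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right) dx = \int_{0}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right) dx$$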

Because of,
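As a numerical sanity check on the lognormal expectation, here is a minimal Python sketch. It assumes the standard parameterization, where `mu` and `sigma` are the mean and standard deviation of ln X, and compares a midpoint-rule approximation of the integral of x f(x) against the well-known closed form exp(mu + sigma^2 / 2). The function names, the integration range and the step count are illustrative choices, not part of the derivation.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """PDF of a lognormal variable whose logarithm is N(mu, sigma^2)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def lognormal_mean_numeric(mu, sigma, n=20_000):
    """Approximate E[X] = integral of x * f(x) dx with the midpoint rule.

    Uses the substitution x = exp(t), so dx = exp(t) dt, and integrates
    over t in [mu - 12*sigma, mu + 12*sigma] (an assumed, generous range).
    """
    lo, hi = mu - 12 * sigma, mu + 12 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        x = math.exp(t)
        total += x * lognormal_pdf(x, mu, sigma) * x * h  # x f(x) dx, dx = x dt
    return total

def lognormal_mean_closed_form(mu, sigma):
    """Closed-form mean of the lognormal distribution."""
    return math.exp(mu + sigma ** 2 / 2)

if __name__ == "__main__":
    print(lognormal_mean_numeric(0.0, 0.5), lognormal_mean_closed_form(0.0, 0.5))
```

The two printed numbers should agree to several decimal places, which is a quick way to catch a sign or factor mistake in a hand derivation.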

# Verily Phone Screen Interview

I received an email from Verily, an Alphabet company that specializes in life sciences. Their HR passed my application and is going to move me on to the phone interview.

I have to say that the interview was not a difficult one. The question was like a medium-level question on Leetcode:

Given a 1-dimensional axis, a man starts at the origin and moves one step left or right in each time unit. In how many ways can the man be standing at point x after t time units?

I stupidly tried DP at first and struggled with how to implement the state transition formula, which wasted a lot of time.

Today, I reviewed this question and found a fairly easy solution. We do not even need dynamic programming. Let l and r be the numbers of left and right moves:

```
l + r = t   (1)
r - l = x   (2)
```

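Once we solve these equations, we get r = (t + x) / 2 right moves and l = (t - x) / 2 left moves, and we can directly calculate the number of combinations: choose which l of the t moves go left, which gives C(t, l). For example, to stand on point 5 after 9 time units the man needs r = 7 and l = 2, so the total number of possibilities is C(9, 2) = 36. If t + x is odd or |x| > t, there is no valid solution and the answer is 0.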
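The counting argument can be sketched in a few lines of Python. This is a minimal sketch, assuming the man starts at 0 and moves exactly one unit per time step; `count_paths` and `count_paths_dp` are hypothetical names, and the second function is only a brute-force cross-check of the closed form.

```python
from math import comb

def count_paths(x: int, t: int) -> int:
    """Number of t-step left/right walks from 0 that end at x."""
    # l + r = t and r - l = x give r = (t + x) / 2, so t + x must be even
    # and x cannot be farther away than t steps.
    if abs(x) > t or (t + x) % 2 != 0:
        return 0
    r = (t + x) // 2
    return comb(t, r)  # choose which r of the t moves go right

def count_paths_dp(x: int, t: int) -> int:
    """Cross-check: ways[p] is the number of walks currently ending at p."""
    ways = {0: 1}
    for _ in range(t):
        nxt = {}
        for p, w in ways.items():
            nxt[p - 1] = nxt.get(p - 1, 0) + w
            nxt[p + 1] = nxt.get(p + 1, 0) + w
        ways = nxt
    return ways.get(x, 0)

if __name__ == "__main__":
    print(count_paths(5, 9), count_paths_dp(5, 9))
```

The closed form needs a single binomial coefficient, while the cross-check touches every reachable position at every step, so it is only useful for validating small cases.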

# The revolution of CNN

(a) Regular convolution:
AlexNet/VGG

(b) Separable convolution block:
Split the regular convolution into a depthwise and a pointwise convolution.

(c) Separable with linear bottleneck:
Bring the ResNet bottleneck into the separable convolution.

(d) Bottleneck with expansion layer:
Inverted bottleneck (Small – Large – Small).
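To make the difference between blocks (a), (b) and (d) concrete, here is a rough Python sketch that counts convolution weights for each. The function names, the 3x3 kernel, the channel counts and the expansion factor of 6 are illustrative assumptions, and biases, normalization layers and residual connections are ignored.

```python
def regular_conv_params(k: int, c_in: int, c_out: int) -> int:
    """(a) One k x k convolution mixing space and channels at once."""
    return k * k * c_in * c_out

def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """(b) Depthwise k x k (one filter per channel) plus pointwise 1 x 1."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

def inverted_bottleneck_params(k: int, c_in: int, c_out: int, expand: int) -> int:
    """(d) Small - Large - Small: 1 x 1 expand, depthwise k x k, 1 x 1 project."""
    hidden = c_in * expand
    expansion = c_in * hidden      # 1 x 1, widens the channels
    depthwise = k * k * hidden     # spatial filtering in the wide space
    projection = hidden * c_out    # 1 x 1, narrows the channels again
    return expansion + depthwise + projection

if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128
    print("regular:  ", regular_conv_params(k, c_in, c_out))
    print("separable:", separable_conv_params(k, c_in, c_out))
    print("inverted: ", inverted_bottleneck_params(k, c_in, c_out, expand=6))
```

With these assumed sizes the separable block uses far fewer weights than the regular convolution, while the inverted bottleneck spends most of its budget on the two 1 x 1 layers around the wide depthwise stage.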