# Newton’s method

How highly productive I am…

I just completed this post in an online LaTeX editor, which has a really impressive AI-powered formula inference system. It works well, except that I don't know how to export content elegantly as SVG~
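The derivation itself does not survive in this text. As a minimal sketch of the method the heading names, the update x_{n+1} = x_n - f(x_n) / f'(x_n) can be written in Python as below. The function name `newton`, the tolerance, and the iteration cap are illustrative choices, not taken from the post.

```python
from typing import Callable

def newton(f: Callable[[float], float],
           df: Callable[[float], float],
           x0: float,
           tol: float = 1e-12,
           max_iter: int = 50) -> float:
    """Find a root of f starting from x0 using the update x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        d = df(x)
        if d == 0:
            # The tangent is horizontal, so the Newton step is undefined.
            raise ZeroDivisionError("f'(x) is zero; Newton step undefined")
        step = f(x) / d
        x -= step
        # Stop once the update becomes negligible.
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Example: sqrt(2) as the positive root of x^2 - 2, starting from x0 = 1.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

Near a simple root the number of correct digits roughly doubles each iteration, which is why only a handful of steps are needed here.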

# Transformer Architecture

Recently, I found that my beloved website server had been blocked by the Great Firewall of China (aka GFW), probably because of the Shadowsocks proxy running on it. I tried migrating snapshots of this server to other servers a couple of times, but the access speed was still terrible. I suspect it is related to my domestic DNS binding.

Due to the epidemic outbreak, I have been working from home for almost three months. Although I still have work to do, at home I can spend more time on my personal interests, like machine learning, music, and reading. (I hope my mentor never finds this article.)

Attached is a probably boring illustration of the Transformer architecture. Almost every BERT-style paper describes it in a bunch of paragraphs. Ironically, I had never looked carefully at this structure, so today I finally decided to dive deep into it on a precious weekend afternoon.
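The illustration is not reproduced here. As a minimal sketch of the operation at the heart of each Transformer block, scaled dot-product attention computes softmax(Q K^T / sqrt(d_k)) V. The version below is a single head with no masking and no learned projections, written in plain Python lists so it runs without extra libraries; the helper names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head, without masking."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

# A zero query scores every key equally, so it returns the mean of v's rows.
out = scaled_dot_product_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(out)  # → [[2.0, 3.0]]
```

Each output row is a weighted average of the value rows, with weights given by how well the query matches each key.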

# Expected Values Related to the Normal Distribution

I could have finished this post last month, but I was too exhausted to write it yesterday (the last day of February).

Recently, I have become addicted to probability and statistics theory and calculus. I found many interesting formula derivations related to the Normal Distribution (aka the Gaussian Distribution), so I'd like to prove them myself. Here are several methods I tried:

The first formula is the expected value of the lognormal distribution. Recall that, for a continuous random variable X with probability density function f(x):

And the probability density function (PDF) of the lognormal distribution is:

Hence, we have:

Because of,
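The equations for the steps above are not preserved in this text. As a hedged sketch, assuming the standard parameterization in which ln X ~ N(μ, σ²), one common way to complete the derivation is the substitution y = ln x followed by completing the square. This may differ from the steps in the original figures.

$$
\begin{aligned}
E[X] &= \int_0^{\infty} x \cdot \frac{1}{x\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right) dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(y-\frac{(y-\mu)^2}{2\sigma^2}\right) dy \qquad (y=\ln x,\ dx=e^{y}\,dy) \\
&= e^{\mu+\sigma^2/2} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{\left(y-(\mu+\sigma^2)\right)^2}{2\sigma^2}\right) dy \\
&= e^{\mu+\sigma^2/2}
\end{aligned}
$$

The third line uses the identity y - (y - μ)²/(2σ²) = μ + σ²/2 - (y - (μ + σ²))²/(2σ²). The last step holds because the remaining integrand is the density of N(μ + σ², σ²), which integrates to 1.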

# Verily Phone Screen Interview

Just a few days ago, I received an email from Verily, an Alphabet company dedicated to the life sciences. Their HR approved my application and moved me on to the phone interview. However, I messed it up…

I have to say that the interview was not a difficult one. The question was like a medium-level LeetCode question:

Given a one-dimensional axis, a man starting at the origin moves one unit left or right in each time unit. In how many different ways can the man end up at point x after t time units?

I foolishly tried DP at first and struggled with implementing the state transition formula, which wasted a lot of time.
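For comparison, a dynamic-programming version of the count might look like the sketch below. It is a hypothetical reconstruction, not the code from the interview. It assumes the man starts at 0 and moves exactly one unit per time unit, so the transition is ways_t(p) = ways_{t-1}(p - 1) + ways_{t-1}(p + 1).

```python
from collections import defaultdict

def count_paths_dp(x: int, t: int) -> int:
    """Number of left/right walks of length t from 0 that end at x (DP over positions)."""
    ways = {0: 1}  # before any move, the man is at the origin
    for _ in range(t):
        nxt = defaultdict(int)
        for pos, cnt in ways.items():
            # Every walk ending at pos extends by one step left and one step right.
            nxt[pos - 1] += cnt
            nxt[pos + 1] += cnt
        ways = nxt
    return ways.get(x, 0)

print(count_paths_dp(5, 9))  # → 36
```

This runs in O(t²) time, since after k steps there are at most k + 1 reachable positions.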

Today, I reviewed this question and found a fairly easy solution; we do not even need dynamic programming. Let l and r be the numbers of steps taken to the left and to the right:

```
l + r = t   (1)
r - l = x   (2)
```

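Solving this system gives r = (t + x) / 2 and l = (t - x) / 2, so we can count the combinations directly: the answer is C(t, l), the number of ways to choose which l of the t steps go left (it is 0 if t + x is odd or |x| > t). For example, for the man to stand on point 5 after 9 time units, r = 7 and l = 2, so there are C(9, 2) = 36 possibilities.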
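The closed form above can be sketched in a few lines of Python. The function name `count_paths` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def count_paths(x: int, t: int) -> int:
    """Ways to end at x after t unit steps, from l + r = t and r - l = x."""
    # t + x must be even, and x must be reachable within t steps.
    if abs(x) > t or (t + x) % 2 != 0:
        return 0
    r = (t + x) // 2  # number of steps to the right
    # Choose which r of the t steps go right; the remaining l = t - r go left.
    return comb(t, r)

print(count_paths(5, 9))  # → 36
```

Because C(t, r) = C(t, l), choosing the right steps or the left steps gives the same count. The whole computation is O(t) at most, with no table of intermediate states.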

# The Evolution of CNNs

(a) Regular convolution:
AlexNet/VGG

(b) Separable convolution block:
Splits a regular convolution into a depthwise convolution and a pointwise (1×1) convolution.

(c) Separable with linear bottleneck: brings the ResNet bottleneck into the separable convolution block.

(d) Bottleneck with expansion layer:
Inverted bottleneck (small – large – small).
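To make the progression more concrete, the sketch below counts weights (ignoring biases and batch norm) for a regular convolution (a), a depthwise separable one (b), and an inverted-bottleneck block (d). The block layout (1×1 expansion with ReLU6, 3×3 depthwise with ReLU6, then a linear 1×1 projection) follows the MobileNetV2 style. The expansion factor of 6 and the channel counts are assumed example values, and the layer descriptions are illustrative tuples, not a framework API.

```python
from typing import List, Tuple

def regular_conv_params(k: int, c_in: int, c_out: int) -> int:
    """(a) A k x k convolution mapping c_in to c_out channels (weights only)."""
    return k * k * c_in * c_out

def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """(b) Depthwise k x k conv (one filter per channel) plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def inverted_bottleneck(c_in: int, c_out: int, expansion: int = 6) -> List[Tuple[str, int, int, str]]:
    """(d) Layers as (name, in_channels, out_channels, activation): small -> large -> small."""
    hidden = c_in * expansion
    return [
        ("1x1 conv (expand)", c_in, hidden, "ReLU6"),
        ("3x3 depthwise conv", hidden, hidden, "ReLU6"),
        # Linear bottleneck from (c): no non-linearity after projecting back down.
        ("1x1 conv (project)", hidden, c_out, "linear"),
    ]

def inverted_bottleneck_params(c_in: int, c_out: int, expansion: int = 6, k: int = 3) -> int:
    """Weights of the block above, ignoring biases and batch norm."""
    hidden = c_in * expansion
    return c_in * hidden + k * k * hidden + hidden * c_out

print(regular_conv_params(3, 128, 128))    # → 147456
print(separable_conv_params(3, 128, 128))  # → 17536
print(inverted_bottleneck_params(24, 24))  # → 8208
```

With a 3×3 kernel and 128 channels in and out, the separable version needs about 8.4× fewer weights. In general, its cost relative to a regular convolution is 1/c_out + 1/k².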