Transformer Architecture

Because of the epidemic outbreak, I have been working from home for almost three months, which is impressive in itself. I still need to work, but at home I can spend more time on my personal interests, like machine learning, music, and reading. (I hope my mentor never finds this article.)

Attached is a probably boring illustration of the transformer architecture. Almost every BERT-ish paper spends a bunch of paragraphs describing it. Ironically, I had never looked through this structure carefully. So today I finally decided to dive deep into it on a precious weekend afternoon.