Gate News reports that on March 17, Moonshot released a technical report on Attention Residuals, proposing to replace the fixed residual connections in Transformers with an attention mechanism. On the Kimi Linear 48B model, the technique delivers an effective saving of 25% in compute, at the cost of less than a 2% increase in inference latency. Elon Musk posted on X last night, “Impressive work from Kimi,” and Moonshot’s official account responded today on Weibo, “Your rocket building is also quite good!”
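The report itself is not excerpted here, so the exact formulation is unknown; purely as a rough illustration of the stated idea, the PyTorch sketch below lets each token attend over the hidden states of earlier layers and adds that learned mixture in place of the fixed skip connection. Every name, shape, and design choice in it is an assumption for illustration, not Moonshot’s actual implementation:

```python
import torch
import torch.nn as nn


class AttentionResidual(nn.Module):
    """Hypothetical sketch: replace the fixed skip connection
    y = x + f(x) with a learned, attention-weighted mixture of
    the hidden states produced by all earlier layers."""

    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model, bias=False)
        self.key = nn.Linear(d_model, d_model, bias=False)
        self.scale = d_model ** -0.5

    def forward(self, sublayer_out, history):
        # history: outputs of earlier layers, each [batch, seq, d_model]
        h = torch.stack(history, dim=2)            # [batch, seq, n_prev, d_model]
        q = self.query(sublayer_out).unsqueeze(2)  # [batch, seq, 1, d_model]
        k = self.key(h)                            # [batch, seq, n_prev, d_model]
        # one attention weight per previous layer, per token
        scores = (q * k).sum(dim=-1, keepdim=True) * self.scale
        attn = torch.softmax(scores, dim=2)        # normalize over previous layers
        residual = (attn * h).sum(dim=2)           # [batch, seq, d_model]
        return sublayer_out + residual             # learned mixture replaces fixed x


# Toy usage: 3 earlier layers, batch of 2, sequence length 8, width 64.
block = AttentionResidual(d_model=64)
history = [torch.randn(2, 8, 64) for _ in range(3)]
out = block(torch.randn(2, 8, 64), history)
print(out.shape)  # torch.Size([2, 8, 64])
```

If residual weights are data-dependent in this way, contributions from uninformative layers can be down-weighted per token; whether that is how the report achieves its claimed compute savings is not something the sketch can confirm.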
This tweet also drew attention to one of the paper’s co-first authors: Chen Guangyu (English name Nathan), who is 17 years old and still in high school. The other two co-first authors are Su Jianlin, the proposer of RoPE (Rotary Position Embedding), and Zhang Yu, the first author of Kimi Linear. Chen Guangyu joined Moonshot in November 2025; the open-source Flash Linear Attention project on GitHub was his entry point into machine learning.
Chen Guangyu also responded to the outside discussion on X, stating that “a paper that combines algorithm and infrastructure co-design, with both experimental and theoretical support, is unlikely to be written by one person.” The entire Kimi team contributed, he said, and Zhang Yu and Su Jianlin are both equal contributors. He reminded everyone, “Don’t believe rumors.”
Chen Guangyu’s LinkedIn profile shows that he studies at BASIS International School Park Lane Harbour in Huizhou. Moonshot Academy organized the “Moonshot 48” high-school hackathon in March 2025, which Chen Guangyu won.