docs: Chapter 3, Pretrained Language Models: fixes complete (except GLM); deepseek-v3 still to be added

This commit is contained in:
KMnO4-zx
2025-05-10 13:46:08 +08:00
parent 9821f37bc0
commit a1e533632e
2 changed files with 109 additions and 69 deletions

@@ -828,4 +828,4 @@ class Transformer(nn.Module):
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. (2023). *Attention Is All You Need.* arXiv preprint arXiv:1706.03762.
-[2] Jay Mody's article "An Intuition for Attention".
+[2] Jay Mody's article "An Intuition for Attention". Source: https://jaykmody.com/blog/attention-intuition/