docs: Chapter 3, Pretrained Language Models (all but GLM) fixes complete; DeepSeek-V3 still to be added
@@ -828,4 +828,4 @@ class Transformer(nn.Module):
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. (2017). *Attention Is All You Need.* arXiv preprint arXiv:1706.03762.
[2] Jay Mody. *An Intuition for Attention.* Source: https://jaykmody.com/blog/attention-intuition/
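The scaled dot-product attention described in [1] (and built on by the `Transformer` module this commit touches) can be sketched in a few lines. This is a minimal NumPy illustration, not the chapter's actual implementation; the function and variable names are my own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in [1]."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    # so the softmax does not saturate for large dimensions.
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # rows sum to 1
    return weights @ V, weights

# Toy example: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value rows, with mixing weights given by the softmaxed query-key similarities.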