A question about the attribution cited in the Solar-Open-100B modeling code comments

#15
by kelinliu - opened

Solar-Open-100B has been drawing a lot of discussion in Korean AI communities lately. To be honest, I only started browsing it out of curiosity, but as I read through the code I noticed a comment that seems slightly off, so I wanted to raise it with the maintainers. Take this purely as a reference.

https://huggingface.co/upstage/Solar-Open-100B/blob/main/modeling_solar_open.py

[screenshot: the attribution comment at the top of modeling_solar_open.py]

The top of the file states that the implementation is adapted from the GLM-4 Dense version. That gave me pause, because I had previously worked with the GLM-4 MoE version, and the overall style here looked rather close to it. So I pulled up modeling_glm4_moe.py from the Transformers library and compared the two carefully: Solar's implementation shares quite a few similarities with it across the board, in the RoPE, Norm, MoE, and Router components alike.

https://github.com/huggingface/transformers/blob/a7f29523361b2cc12e51c1f5133d95f122f6f45c/src/transformers/models/glm4_moe/modeling_glm4_moe.py#L457
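To make the RoPE part of that comparison concrete, here is a minimal sketch of the two rotate_half conventions involved. It paraphrases the recurring patterns in Transformers modeling files rather than quoting either file verbatim; the "Block" split below matches the convention modeling_glm4_moe.py follows, while the interleaved pairing matches the GLM-4 Dense file.

```python
import torch

def rotate_half_block(x: torch.Tensor) -> torch.Tensor:
    # "Block" style: split the head dimension into two contiguous halves.
    # This is the convention used by modeling_glm4_moe.py (and most
    # Llama-family modeling files in Transformers).
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def rotate_half_interleaved(x: torch.Tensor) -> torch.Tensor:
    # Interleaved style: pair up even/odd channels, as in the GLM-4 Dense
    # modeling file.
    x1 = x[..., 0::2]
    x2 = x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)
```

The two conventions are not interchangeable: weights trained under one produce wrong outputs under the other, which is why the convention a modeling file uses is a handy fingerprint when tracing its lineage.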


For example, the directions of change the comment mentions (introducing a hybrid MoE architecture, a Block-style RoPE strategy, a simplified LayerNorm structure) already exist in essentially the same form in modeling_glm4_moe.py. The MoE version itself uses the Block-style rotate_half, and its Router carries distinctive naming details such as e_score_correction_bias; put side by side, the resemblance is quite strong. Of course, this may simply mean that Solar's engineering drew on the MoE version's design choices, not that the Dense version contributed nothing.
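For readers unfamiliar with e_score_correction_bias: below is a simplified, hypothetical router sketch showing the role that buffer plays, in the spirit of the DeepSeek-V3-style auxiliary-loss-free load balancing that the glm4_moe router inherits. The class name and shapes are illustrative, not the exact upstream code.

```python
import torch
import torch.nn as nn

class ToyTopkRouter(nn.Module):
    # Hypothetical, simplified router. The buffer name follows the pattern
    # seen in transformers' Glm4MoeTopkRouter, but the rest is a sketch.
    def __init__(self, hidden_size: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.weight = nn.Parameter(torch.empty(n_experts, hidden_size))
        nn.init.normal_(self.weight, std=0.02)
        # The distinctive buffer the comparison hinges on: a per-expert bias
        # applied only when *choosing* experts, not when weighting outputs.
        self.register_buffer("e_score_correction_bias", torch.zeros(n_experts))

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (num_tokens, hidden_size)
        logits = nn.functional.linear(hidden_states, self.weight)
        scores = logits.sigmoid()
        # The bias shifts the ranking used for expert selection only.
        _, topk_idx = (scores + self.e_score_correction_bias).topk(self.top_k, dim=-1)
        # Gate weights still come from the unbiased scores.
        topk_weights = scores.gather(-1, topk_idx)
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
        return topk_idx, topk_weights
```

The point of the split is load balancing without an auxiliary loss term: during training the bias can be nudged up for underloaded experts and down for overloaded ones, steering routing while leaving the gate weights themselves untouched. A buffer with exactly this name and role is a fairly strong marker of the DeepSeek-V3/glm4_moe lineage.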

My only concern is that if the comment says the code is adapted from the Dense version alone, readers may assume the main changes were an architectural migration from Dense. People like me who come to the code later and follow the comment into the Dense version, looking for the corresponding implementation, could waste a fair amount of time. Then again, I may be misunderstanding something, and it is entirely possible the maintainers have other considerations.

So my suggestion is to change the attribution in the comment to point to modeling_glm4_moe.py, or to mention both upstream files, and briefly note what engineering adjustments and trade-offs Solar made on top of them. That way readers can see the code's lineage at a glance, instead of having to grind through a manual comparison, as I did, before being sure. Just my personal take; the maintainers/authors should handle it however they see fit.

hi @kelinliu, thanks a lot for your detailed analysis. you're correct, and we're working on clarifying this better in the new version of the file.

@keunwooupstage Looking forward to further discussion and collaboration.

hi @kelinliu, i wanted to follow up.
we had prepared a fixed version of the file, but around the same time we started working on this PR to transformers. while working on the PR, we realized the modeling file is going to be automatically generated, including the license part.

so, for the record: what you described is correct. but instead of adding another version of modeling_solar_open.py to our repo, we decided to keep the current file as-is until the aforementioned PR is merged. after that, we will simply remove the modeling file.

hope this clarifies your question, and thanks again for your interest.

keunwooupstage changed discussion status to closed
