Inconsistency in `configuration_longcat_flash.py` #11
opened by hb4ch
In the docstring: `norm_topk_prob` (`bool`, *optional*, defaults to `True`): Whether to normalize the weights of the routed experts.
Whereas in the function signature: `def __init__(self, ..., norm_topk_prob=False, ...)`
This could cause confusion and lead to incorrect community implementations.
I suggest either adding this parameter explicitly to the configuration file, or making `configuration_longcat_flash.py` consistent between the docstring and the signature.
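Until this is resolved, one way to avoid depending on the ambiguous default is to set `norm_topk_prob` explicitly when loading the config. A minimal sketch, assuming the checkpoint id and the use of `AutoConfig` with `trust_remote_code` (neither is stated in this discussion):

```python
from transformers import AutoConfig

# Sketch of a workaround: pass norm_topk_prob explicitly so behavior does not
# depend on whichever default the config file actually uses.
config = AutoConfig.from_pretrained(
    "meituan-longcat/LongCat-Flash-Chat",  # assumed repo id, for illustration only
    trust_remote_code=True,                # needed to load the custom configuration_longcat_flash.py
    norm_topk_prob=True,                   # state the intended value instead of relying on the default
)
print(config.norm_topk_prob)  # -> True, regardless of the file's default
```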
LongCat0830 changed discussion status to closed