Upload configuration_longcat_flash.py
configuration_longcat_flash.py (CHANGED)

```diff
@@ -53,7 +53,7 @@ class LongcatFlashConfig(PretrainedConfig):
             Dimension of the value heads.
         qk_nope_head_dim (`int`, *optional*, defaults to 128):
             Dimension of the query/key heads that don't use rotary position embeddings.
-        norm_topk_prob (`bool`, *optional*, defaults to `
+        norm_topk_prob (`bool`, *optional*, defaults to `False`):
             Whether to normalize the weights of the routed experts.
         hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
             The non-linear activation function (function or string) in the decoder.
```
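The flag documented here gates how routed-expert weights are combined. As a minimal sketch of the usual top-k MoE router convention (the `route_tokens` helper and its signature are hypothetical, not taken from the model's code): with `norm_topk_prob=False`, the documented default, the raw softmax probabilities of the selected experts are kept as-is; with `True`, they are renormalized to sum to 1 per token.

```python
import torch

# Minimal sketch of what a `norm_topk_prob`-style flag typically gates in a
# top-k MoE router. `route_tokens` is a hypothetical helper, not the model's
# actual routing code.
def route_tokens(router_logits: torch.Tensor, top_k: int,
                 norm_topk_prob: bool) -> tuple[torch.Tensor, torch.Tensor]:
    """Select top-k experts per token and return their mixing weights."""
    probs = torch.softmax(router_logits, dim=-1)       # (tokens, num_experts)
    topk_probs, topk_idx = probs.topk(top_k, dim=-1)   # (tokens, top_k)
    if norm_topk_prob:
        # Renormalize so the selected experts' weights sum to 1 per token.
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_probs, topk_idx

logits = torch.randn(4, 8)  # 4 tokens routed over 8 experts
# With the documented default (`False`), raw softmax weights are kept as-is.
weights, experts = route_tokens(logits, top_k=2, norm_topk_prob=False)
```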