Can't believe it, but 256M seems slower than internvl-1B?
#25 opened by josefph
As the title says, it's hard to believe that smolvlm-256M-instruct is slower than internvl-1B. Even after inspecting the input embeddings and parameter counts, I still can't figure out why.
internvl-1B:
inp_embed : (1, 547, 896)
trainable params: 17,596,416 || all params: 647,260,288 || trainable%: 2.7186
smolvlm-256M:
inp_embed : (1, 171, 576)
trainable params: 9,768,960 || all params: 172,742,976 || trainable%: 5.6552
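A quick back-of-envelope comparison of the numbers above may help frame the question. This is only a sketch: it assumes the common rule of thumb of roughly 2 FLOPs per parameter per token for one forward pass, and it crudely treats every parameter (including the vision encoder) as used once per language-model token. The helper names are made up for illustration.

```python
# Figures copied from the post above.
models = {
    "internvl-1B": {"seq_len": 547, "hidden": 896, "params": 647_260_288, "trainable": 17_596_416},
    "smolvlm-256M": {"seq_len": 171, "hidden": 576, "params": 172_742_976, "trainable": 9_768_960},
}

def rough_forward_flops(params: int, seq_len: int) -> int:
    """Assumed rule of thumb (hypothetical helper): ~2 FLOPs per parameter per token."""
    return 2 * params * seq_len

def summarize(m: dict) -> dict:
    """Derive embedding size, trainable share, and a crude forward-pass FLOP count."""
    return {
        "embed_elements": m["seq_len"] * m["hidden"],   # tokens x hidden size
        "trainable_pct": 100 * m["trainable"] / m["params"],
        "flops": rough_forward_flops(m["params"], m["seq_len"]),
    }

stats = {name: summarize(m) for name, m in models.items()}
big, small = stats["internvl-1B"], stats["smolvlm-256M"]

for name, s in stats.items():
    print(f"{name}: embed elements={s['embed_elements']:,}, "
          f"trainable={s['trainable_pct']:.4f}%, ~FLOPs={s['flops']:.3e}")
print(f"FLOPs ratio (internvl / smolvlm): {big['flops'] / small['flops']:.1f}x")
```

By this crude count smolvlm-256M should need roughly 12x fewer FLOPs per forward pass, so if it is slower in wall-clock time, the gap likely comes from something the estimate ignores (for example, per-image vision-encoder work, the number of generated tokens, or per-call overhead).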
Has anyone else run into a similar issue?
