Fix:

  1. Some parameter names did not match the ones Qwen2 uses.
  2. In some transformers versions, Qwen2Attention returns three values.
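
For illustration only, here is a minimal sketch of how calling code could tolerate both return shapes from item 2. The helper name `unpack_attention_outputs` is hypothetical and not part of this PR or of transformers, and the sketch assumes the first element is the attention output and the second is the attention weights.

```python
from typing import Any, Optional, Tuple


def unpack_attention_outputs(outputs: Tuple[Any, ...]) -> Tuple[Any, Optional[Any]]:
    """Return (attn_output, attn_weights) from a 2- or 3-element tuple.

    Hypothetical helper. Assumption: element 0 is the attention output and
    element 1 is the attention weights; a third element, when present, is
    ignored here.
    """
    if not isinstance(outputs, tuple) or len(outputs) not in (2, 3):
        raise ValueError(
            f"expected a tuple of 2 or 3 values from the attention module, got {outputs!r}"
        )
    # Index instead of unpacking so both tuple lengths are accepted.
    return outputs[0], outputs[1]


# Usage with placeholder values standing in for tensors:
attn_output, attn_weights = unpack_attention_outputs(("hidden", None, "cache"))
```

Indexing rather than fixed-length unpacking keeps the call site independent of how many values the installed version returns.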