This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the NuSLERP (Non-Uniform Spherical Linear Interpolation) merging method, with medalpaca-7b as a base.

* NuSLERP is an extension of the standard SLERP (Spherical Linear Interpolation) method used to combine the weights of two Large Language Models (LLMs).
* SLERP is a geometric approach that interpolates between the parameter vectors of two models (often denoted $\mathbf{W}_1$ and $\mathbf{W}_2$) along the shortest great arc on a hypersphere. This maintains a constant rate of change and prevents the interpolated weights from shrinking in magnitude, a problem that can occur with simple linear interpolation in high-dimensional spaces.
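
For reference, SLERP with interpolation factor $t \in [0, 1]$ is

$$\mathrm{slerp}(t; \mathbf{W}_1, \mathbf{W}_2) = \frac{\sin\bigl((1-t)\,\Omega\bigr)}{\sin\Omega}\,\mathbf{W}_1 + \frac{\sin(t\,\Omega)}{\sin\Omega}\,\mathbf{W}_2,$$

where $\Omega$ is the angle between the two (normalized) weight vectors. The snippet below is a minimal NumPy sketch of this formula, not mergekit's actual implementation:

```python
import numpy as np

def slerp(t: float, w1: np.ndarray, w2: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight vectors."""
    # Angle between the vectors, computed on normalized copies.
    v1 = w1 / (np.linalg.norm(w1) + eps)
    v2 = w2 / (np.linalg.norm(w2) + eps)
    omega = np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))
    # Nearly parallel vectors: fall back to plain linear interpolation.
    if np.sin(omega) < eps:
        return (1.0 - t) * w1 + t * w2
    # Interpolate along the great arc; unlike plain LERP, the norm of the
    # result does not collapse toward zero in high dimensions.
    return (np.sin((1.0 - t) * omega) * w1 + np.sin(t * omega) * w2) / np.sin(omega)
```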

#### NuSLERP's Key Differences

The "Non-Uniform" aspect in NuSLERP introduces flexibility to the standard SLERP method, primarily by allowing the interpolation to be applied differently across parts of the model. While standard SLERP treats the entire weight tensor (or model) as a single vector for interpolation, NuSLERP can:

* **Row-wise/Column-wise Interpolation:** Instead of treating an entire weight tensor as one high-dimensional vector, NuSLERP can apply the SLERP calculation to individual rows or columns of the weight matrices separately. In mergekit this is controlled by the `nuslerp_flatten` and `nuslerp_row_wise` configuration parameters.
* **Layer-Specific Parameters:** Like many merging techniques, NuSLERP allows layer-specific weighting, meaning the blend factor (the $\alpha$ or $t$ in the equation) can differ between the attention layers and the MLP layers, or between early and deep layers.

This non-uniform application allows a more granular and potentially more effective blend of knowledge, as different parts of the model (different layers, or different weight directions) may contribute differently to its overall capabilities. The row-wise variant is illustrated in the sketch below.
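
Building on the `slerp` function above, here is a minimal sketch of row-wise interpolation (the effect of setting `nuslerp_flatten: false` and `nuslerp_row_wise: true`); mergekit's real implementation is vectorized, and this helper is only illustrative:

```python
def nuslerp_row_wise(t: float, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Apply SLERP to each row of two 2-D weight matrices independently."""
    # Each row is treated as its own vector on its own hypersphere, so
    # different weight directions can blend along different arcs.
    return np.stack([slerp(t, row1, row2) for row1, row2 in zip(W1, W2)])

# Example: blend two projection matrices, leaning 70% toward W2.
# W_merged = nuslerp_row_wise(0.7, W_sft_layer, W_kd_layer)
```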

### Models Merged

The following models were included in the merge:

* medalpaca-sft
* medalpaca-kd

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: medalpaca-7b
dtype: bfloat16
merge_method: nuslerp
modules:
  default:
    slices:
    - sources:
      - layer_range: [0, 32]
        model: medalpaca-sft
        parameters:
          weight: 0.3
      - layer_range: [0, 32]
        model: medalpaca-kd
        parameters:
          weight: 0.7
      - layer_range: [0, 32]
        model: medalpaca-7b
```
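
The relative weights 0.3 and 0.7 normalize to an interpolation factor of $t = 0.7$ toward medalpaca-kd. Because `base_model` is set, mergekit's `nuslerp` interpolates the task vectors (each model's delta from the base) rather than the raw weights; the sketch below, reusing the `slerp` function from earlier, illustrates that behavior on hypothetical stand-in tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for one flattened weight tensor from each checkpoint.
W_base = rng.normal(size=4096)  # medalpaca-7b (base)
W_sft = rng.normal(size=4096)   # medalpaca-sft
W_kd = rng.normal(size=4096)    # medalpaca-kd

# Relative weights 0.3 and 0.7 normalize to t = 0.7 / (0.3 + 0.7) = 0.7.
t = 0.7 / (0.3 + 0.7)

# Interpolate the deltas from the base, then add the result back.
W_merged = W_base + slerp(t, W_sft - W_base, W_kd - W_base)
```

To reproduce the merge, save the configuration as `config.yaml` and run `mergekit-yaml config.yaml ./output-model`.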

Review all model benchmark metrics via the Benchmark Document Preview.
