SentenceTransformer based on microsoft/unixcoder-base

This is a sentence-transformers model finetuned from microsoft/unixcoder-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/unixcoder-base
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
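
The stack above is a Transformer module followed by CLS-token pooling. For reference, the same two-module model can be assembled by hand with the sentence_transformers building blocks; this is only a sketch (loading the published checkpoint as shown under Usage is the normal route):

from sentence_transformers import SentenceTransformer, models

transformer = models.Transformer(
    "microsoft/unixcoder-base",
    max_seq_length=256,  # Maximum Sequence Length from Model Details
)
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 768
    pooling_mode="cls",  # pooling_mode_cls_token=True in the module config above
)
model = SentenceTransformer(modules=[transformer, pooling])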

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Sid-the-sloth/leetcode_unixcoder_final")
# Run inference
sentences = [
    'Name: Arithmetic Slices | Code: // Time:  O(n)\n// Space: O(1)\n\nclass Solution {\npublic:\n    int numberOfArithmeticSlices(vector<int>& A) {\n        int res = 0, i = 0;\n        for (int i = 0; i + 2 < A.size(); ++i) {\n            const auto start = i;\n            while (i + 2 < A.size() && A[i + 2] + A[i] == 2 * A[i + 1]) {\n                res += (i++) - start + 1;\n            }\n        }\n        return res;\n    }\n};\n | Tags: Array,Dynamic Programming,Sliding Window',
    'Name: Arithmetic Subarrays | Code: // Time:  O(n * q)\n// Space: O(n)\n\nclass Solution {\npublic:\n    vector<bool> checkArithmeticSubarrays(vector<int>& nums, vector<int>& l, vector<int>& r) {\n        vector<bool> result(size(l));\n        for (int i = 0; i < size(l); ++i) {\n            result[i] = isArith(vector<int>(cbegin(nums) + l[i], cbegin(nums) + r[i] + 1));\n        }\n        return result;\n    }\n\nprivate:\n    bool isArith(const vector<int>& nums) {\n        unordered_set<int> lookup(cbegin(nums), cend(nums));\n        int mn = *min_element(cbegin(nums), cend(nums));\n        int mx = *max_element(cbegin(nums), cend(nums));\n        if (mx == mn) {\n            return true;\n        }\n        if ((mx - mn) % (size(nums) - 1)) {\n            return false;\n        }\n        int d = (mx - mn) / (size(nums) - 1);\n        for (int i = mn; i <= mx; i += d) {\n            if (!lookup.count(i)) {\n                return false;\n            }\n        }\n        return true;\n    }\n};\n | Tags: Array,Hash Table,Sorting',
    'Name: Reverse Odd Levels of Binary Tree | Code: // Time:  O(n)\n// Space: O(n)\n\n// bfs\nclass Solution {\npublic:\n    TreeNode* reverseOddLevels(TreeNode* root) {\n        vector<TreeNode*> q = {root};\n        for (int parity = 0; !empty(q); parity ^= 1) {\n            if (parity) {\n                for (int left = 0, right = size(q) - 1; left < right; ++left, --right) {\n                    swap(q[left]->val, q[right]->val);\n                }\n            }\n            if (!q[0]->left) {\n                break;\n            }\n            vector<TreeNode*> new_q;\n            for (const auto& node : q) {\n                new_q.emplace_back(node->left);\n                new_q.emplace_back(node->right);\n            }\n            q = move(new_q);\n        }\n        return root;\n    }\n};\n | Tags: Binary Tree,Breadth-First Search,Depth-First Search,Tree',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7905, 0.5824],
#         [0.7905, 1.0000, 0.6028],
#         [0.5824, 0.6028, 1.0000]])
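
Because the embeddings live in a shared vector space, the same model supports retrieval over a snippet corpus. A minimal sketch, continuing from the variables above (the query string here is made up for illustration):

# Rank the three snippets above against a new (hypothetical) query.
query = "Name: Arithmetic Slices II - Subsequence | Code: ..."

query_embedding = model.encode([query])
scores = model.similarity(query_embedding, embeddings)  # shape [1, 3]
best = int(scores.argmax())
print(sentences[best][:50], float(scores[0, best]))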

Evaluation

Metrics

Semantic Similarity

Metric            Value
pearson_cosine    0.8845
spearman_cosine   0.8840
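
These correlations are what sentence_transformers' EmbeddingSimilarityEvaluator reports, and the column name in the training logs below suggests the evaluator was named unixcoder_leetcode_eval. A minimal sketch, with hypothetical held-out pairs since the eval split itself is not published:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Hypothetical held-out pairs; the actual eval split is not part of this card.
texts1 = ["Name: Two Sum | Code: ...", "Name: 3Sum | Code: ...", "Name: 4Sum | Code: ..."]
texts2 = ["Name: 3Sum | Code: ...", "Name: 4Sum | Code: ...", "Name: Two Sum | Code: ..."]
gold = [0.7, 0.8, 0.6]

evaluator = EmbeddingSimilarityEvaluator(
    texts1, texts2, gold,
    name="unixcoder_leetcode_eval",  # name inferred from the training logs
)
model = SentenceTransformer("Sid-the-sloth/leetcode_unixcoder_final")
print(evaluator(model))  # includes pearson_cosine and spearman_cosine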

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,088 training samples
  • Columns: text1, text2, and score
  • Approximate statistics based on the first 1000 samples:

             text1            text2            score
    type     string           string           float
    min      51 tokens        77 tokens        0.14
    mean     224.9 tokens     226.23 tokens    0.43
    max      256 tokens       256 tokens       1.0
  • Samples:
    Sample 1 (score: 0.3427242051079267)

    text1:
    Name: As Far from Land as Possible | Code:
    // Time:  O(m * n)
    // Space: O(m * n)

    class Solution {
    public:
        int maxDistance(vector<vector<int>>& grid) {
            static const vector<pair<int, int>> directions{{0, 1}, {1, 0}, {0, -1}, {-1, 0}};
            queue<pair<int, int>> q;
            for (int i = 0; i < grid.size(); ++i) {
                for (int j = 0; j < grid[i].size(); ++j) {
                    if (grid[i][j]) {
                        q.emplace(i, j);
                    }
                }
            }
            if (q.size() == grid.size() * grid[0].size()) {
                return -1;
            }
            int level = -1;
            while (!q.empty()) {
                queue<pair<int, int>> next_q;
                while (!q.empty()) {
                    const auto [x, y] = q.front(); q.pop();
                    for (const auto& [dx, dy] : directions) {
                        const auto& nx = x + dx;
                        const auto& ny = y + dy;
                        if (!(0 <= nx && nx < grid.size() &&
    ...

    text2:
    Name: Maximum Manhattan Distance After K Changes | Code:
    // Time:  O(n)
    // Space: O(1)

    // greedy
    class Solution {
    public:
        int maxDistance(string s, int k) {
            int result = 0;
            for (int i = 0, x = 0, y = 0; i < size(s); ++i) {
                if (s[i] == 'E') {
                    ++x;
                } else if (s[i] == 'W') {
                    --x;
                } else if (s[i] == 'N') {
                    ++y;
                } else if (s[i] == 'S') {
                    --y;
                }
                result = max(result, min(abs(x) + abs(y) + 2 * k, i + 1));
            }
            return result;
        }
    };
    | Tags: Counting,Hash Table,Math,String

    Sample 2 (score: 0.7248856548541956)

    text1:
    Name: Wiggle Sort II | Code:
    // Time:  O(n) ~ O(n^2), O(n) on average.
    // Space: O(1)

    // Tri Partition (aka Dutch National Flag Problem) with virtual index solution. (44ms)
    class Solution {
    public:
        void wiggleSort(vector<int>& nums) {
            int mid = (nums.size() - 1) / 2;
            nth_element(nums.begin(), nums.begin() + mid, nums.end());  // O(n) ~ O(n^2) time
            reversedTriPartitionWithVI(nums, nums[mid]);  // O(n) time, O(1) space
        }

        void reversedTriPartitionWithVI(vector<int>& nums, int val) {
            const int N = nums.size() / 2 * 2 + 1;
            #define Nums(i) nums[(1 + 2 * (i)) % N]
            for (int i = 0, j = 0, n = nums.size() - 1; j <= n;) {
                if (Nums(j) > val) {
                    swap(Nums(i++), Nums(j++));
                } else if (Nums(j) < val) {
                    swap(Nums(j), Nums(n--));
                } else {
                    ++j;
                }
            }
        }
    };

    // Time:  O(n) ~ O(n^2)
    // Space: O(n)
    // Tri Partition (aka Dutch National Flag Pro...

    text2:
    Name: Array With Elements Not Equal to Average of Neighbors | Code:
    // Time:  O(n) ~ O(n^2), O(n) on average
    // Space: O(1)

    // Tri Partition (aka Dutch National Flag Problem) with virtual index solution
    class Solution {
    public:
        vector<int> rearrangeArray(vector<int>& nums) {
            int mid = (size(nums) - 1) / 2;
            nth_element(begin(nums), begin(nums) + mid, end(nums));  // O(n) ~ O(n^2) time
            reversedTriPartitionWithVI(nums, nums[mid]);  // O(n) time, O(1) space
            return nums;
        }

    private:
        void reversedTriPartitionWithVI(vector<int>& nums, int val) {
            const int N = size(nums) / 2 * 2 + 1;
            #define Nums(i) nums[(1 + 2 * (i)) % N]
            for (int i = 0, j = 0, n = size(nums) - 1; j <= n;) {
                if (Nums(j) > val) {
                    swap(Nums(i++), Nums(j++));
                } else if (Nums(j) < val) {
                    swap(Nums(j), Nums(n--));
                } else {
                    ++j;
                }
            }
        }
    };

    // Time:  O(nlogn)
    ...

    Sample 3 (score: 0.2964020586887101)

    text1:
    Name: Minimum Time to Visit a Cell In a Grid | Code:
    // Time:  O(m * n * log(m * n))
    // Space: O(m * n)

    // dijkstra's algorithm
    class Solution {
    public:
        int minimumTime(vector<vector<int>>& grid) {
            static const vector<pair<int, int>> DIRECTIONS = {{1, 0}, {0, 1}, {-1, 0}, {0, -1}};
            if (min(grid[0][1], grid[1][0]) > 1) {
                return -1;
            }
            const auto& dijkstra = [&](const pair<int, int>& start, const pair<int, int>& target) {
                vector<vector<int>> best(size(grid), vector<int>(size(grid[0]), numeric_limits<int>::max()));
                best[start.first][start.second] = 0;
                using Data = tuple<int, int, int>;
                priority_queue<Data, vector<Data>, greater<Data>> min_heap;
                min_heap.emplace(0, start.first, start.second);
                while (!empty(min_heap)) {
                    const auto [curr, i, j] = min_heap.top(); min_heap.pop();
                    if (best[i][j] < curr) {
                        continue;
    ...

    text2:
    Name: Sentence Similarity III | Code:
    // Time:  O(n)
    // Space: O(1)

    class Solution {
    public:
        bool areSentencesSimilar(string sentence1, string sentence2) {
            if (size(sentence1) > size(sentence2)) {
                swap(sentence1, sentence2);
            }
            int count = 0;
            for (int step = 0; step < 2; ++step) {
                for (int i = 0; i <= size(sentence1); ++i) {
                    char c1 = i != size(sentence1) ? sentence1[step == 0 ? i : size(sentence1) - 1 - i] : ' ';
                    char c2 = i != size(sentence2) ? sentence2[step == 0 ? i : size(sentence2) - 1 - i] : ' ';
                    if (c1 != c2) {
                        break;
                    }
                    if (c1 == ' ') {
                        ++count;
                    }
                }
            }
            return count >= count_if(cbegin(sentence1), cend(sentence1),
                                     [](char x) { return x == ' '; }) + 1;
        }
    };
    | Tags: Array,String,Two Pointers
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
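
Putting the dataset columns, loss, and hyperparameters together, the run could be reproduced along these lines. This is a sketch under stated assumptions: the dataset rows below are stand-ins (the real 4,088-pair dataset is unnamed and not published) and the output path is hypothetical; the loss scale, epochs, learning rate, warmup steps, batch size, and seeds come from this card.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
    models,
)

# Base model with CLS pooling, matching Full Model Architecture above.
transformer = models.Transformer("microsoft/unixcoder-base", max_seq_length=256)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
model = SentenceTransformer(modules=[transformer, pooling])

# Stand-in rows with the card's column names: text1, text2, score.
train_dataset = Dataset.from_dict({
    "text1": ["Name: Wiggle Sort II | Code: ..."],
    "text2": ["Name: Array With Elements Not Equal to Average of Neighbors | Code: ..."],
    "score": [0.72],
})

loss = losses.CoSENTLoss(model, scale=20.0)  # scale from the loss parameters above

args = SentenceTransformerTrainingArguments(
    output_dir="unixcoder-leetcode",  # hypothetical output path
    num_train_epochs=2,
    learning_rate=2e-5,
    warmup_steps=102,
    per_device_train_batch_size=8,
    seed=42,
    data_seed=42,
    # The run also used eval_strategy="steps", which additionally
    # requires an eval_dataset or evaluator; omitted in this sketch.
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()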
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_steps: 102
  • use_cpu: True
  • data_seed: 42
  • remove_unused_columns: False
  • load_best_model_at_end: True
  • dataloader_pin_memory: False
  • gradient_checkpointing: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 102
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: True
  • use_mps_device: False
  • seed: 42
  • data_seed: 42
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  unixcoder_leetcode_eval_spearman_cosine
1.6634  850   2.6601         -
1.7613  900   2.5066         0.8875
1.8591  950   2.3788         -
1.9569  1000  2.3400         0.8840
  • The step-1000 row denotes the saved checkpoint; its Spearman cosine of 0.8840 matches the metric reported under Evaluation.

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.55.2
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4
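
To approximate this environment, the versions above translate into pins like the following (assuming the standard PyPI package names; the +cu126 PyTorch build comes from the CUDA 12.6 wheel index):

pip install "sentence-transformers==5.1.0" "transformers==4.55.2" "accelerate==1.10.0" "datasets==4.0.0" "tokenizers==0.21.4"
pip install "torch==2.8.0" --index-url https://download.pytorch.org/whl/cu126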

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}