---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
library_name: sentence-transformers
pipeline_tag: feature-extraction
---

# WhereIsAI/UAE-Code-Large-V1

📢 `WhereIsAI/UAE-Code-Large-V1` **is licensed under MIT. Feel free to use it in any scenario.**
If you use it in academic papers, we would greatly appreciate it if you could cite us. 👉 [citation info](#citation).

This model builds on [WhereIsAI/UAE-Large-V1](https://huggingface.co/WhereIsAI/UAE-Large-V1) and is fine-tuned on the [GIS: GitHub Issue Similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset with the [AnglE](https://github.com/SeanLee97/AnglE) loss ([arXiv:2309.12871](https://arxiv.org/abs/2309.12871)).
It can be used to measure **code/issue similarity**, for example to find code snippets or GitHub issues that resemble a given one.

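In practice, measuring similarity usually means ranking candidates against a query, such as ranking existing issues against a newly filed one. Below is a minimal sketch in plain Python. It assumes the embeddings are sequences of floats, such as the rows returned by `model.encode`. The helpers `cosine_similarity` and `top_k_similar`, and the toy vectors, are hypothetical and are not part of `angle-emb` or `sentence-transformers`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (candidate_index, score) pairs for the k most similar candidates."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    # Highest similarity first; ties keep their original order (the sort is stable).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-d vectors standing in for real model embeddings (hypothetical values).
query_vec = [1.0, 0.0, 0.0]
candidate_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [1.0, 1.0, 0.0]]
print(top_k_similar(query_vec, candidate_vecs, k=2))
```

With real data, `query_vec` would be the embedding of the new issue and `candidate_vecs` the embeddings of existing ones. How high a score must be to count as a likely duplicate depends on your data.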

Results on the GIS test set:

- Spearman correlation (×100): 71.19
- Accuracy (%): 84.37

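The card does not describe its evaluation script, so the following is an illustration only. For a test set of similarity pairs, Spearman correlation is commonly computed between the model's cosine scores and the gold label of each pair. A value of 71.19 implies the coefficient is reported ×100. The pure-Python sketch below ranks both lists, giving tied values the average rank, and then takes the Pearson correlation of the ranks. The function names, scores, and labels are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical cosine scores for four test pairs and their gold labels (1 = similar).
scores = [0.82, 0.35, 0.67, 0.12]
labels = [1, 0, 1, 0]
print(f"Spearman (x100): {100 * spearman(scores, labels):.2f}")
```

On real data, `scipy.stats.spearmanr` computes the same statistic and also averages ranks for ties.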

## Usage

### 1. angle-emb

You can use it via `angle-emb` as follows:

install:

```bash
python -m pip install -U angle-emb
```

example:

```python
from scipy import spatial
from angle_emb import AnglE

# Load the fine-tuned model and move it to the GPU.
model = AnglE.from_pretrained('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)

# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

# Encode three snippets: a trivial function and the two sorting programs above.
vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])

# Cosine similarity = 1 - cosine distance.
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```

The two sorting implementations (1, 2) score far closer to each other than either does to the trivial `echo` function (0).

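Each `print` above computes one pair with `1 - spatial.distance.cosine(u, v)`, which is the cosine similarity of `u` and `v`. With more than a few snippets, it is often handier to compute every pair at once. The following is a minimal NumPy sketch that assumes `vecs` is a 2-D array with one embedding per row. The helper `cosine_matrix` and the toy vectors are hypothetical.

```python
import numpy as np


def cosine_matrix(vecs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between the rows of a 2-D array."""
    vecs = np.asarray(vecs, dtype=np.float64)
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    # Normalize each row to unit length; the clip avoids dividing by zero for all-zero rows.
    unit = vecs / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Toy 2-d vectors standing in for the three embeddings (hypothetical values).
toy_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = cosine_matrix(toy_vecs)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"cos sim ({i}, {j}): {sims[i, j]:.4f}")
```

Calling `cosine_matrix(vecs)` on the three real embeddings should give the same three values as above, up to floating-point rounding. Rows are normalized once, so the pairwise step is a single matrix product.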

### 2. sentence-transformers

You can also use it via `sentence-transformers`:

```python
from scipy import spatial
from sentence_transformers import SentenceTransformer

# Load the model with sentence-transformers and move it to the GPU.
model = SentenceTransformer('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)

# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

# Encode three snippets: a trivial function and the two sorting programs above.
vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])

# Cosine similarity = 1 - cosine distance.
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```

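The scores above are continuous, so deciding whether two snippets or issues count as "similar" requires a cutoff. The card does not state which cutoff, if any, underlies its Accuracy figure. One simple approach is to pick the threshold that maximizes accuracy on labeled pairs, shown here as a hypothetical sketch. The helper `best_threshold` and the data are illustrative only.

```python
from typing import Sequence, Tuple


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximizing the accuracy of `score >= threshold`.

    `labels` are 1 for similar pairs and 0 for dissimilar ones.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    best_t, best_acc = 0.0, -1.0
    # Candidate cutoffs: every observed score, plus +inf (predict every pair dissimilar).
    for t in sorted(set(scores)) + [float("inf")]:
        preds = [1 if s >= t else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # strict '>' keeps the lowest cutoff among ties
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical cosine scores for four labeled pairs.
toy_scores = [0.34, 0.36, 0.70, 0.81]
toy_labels = [0, 0, 1, 1]
threshold, accuracy = best_threshold(toy_scores, toy_labels)
print(f"threshold={threshold:.2f}, accuracy={accuracy:.2%}")
```

Choose the threshold on a validation split rather than the test set. Otherwise the reported accuracy will be optimistic.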

## Citation

```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```