---
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- pytorch
- jax
- code_x_glue_cc_defect_detection
- code
- roberta
- security
- vulnerability-detection
- codebert
- apache-2.0
license: apache-2.0
---
# CodeBERT fine-tuned for Java Vulnerability Detection

CodeBERT model fine-tuned for detecting security vulnerabilities in Java code.

## Model Description

This model is fine-tuned from [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for binary classification of Java code as secure or insecure.

## Intended Uses

- Detect security vulnerabilities in Java source code
- Binary classification: Safe (LABEL_0) vs Vulnerable (LABEL_1)

## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("mangsense/codebert_java")
model = AutoModelForSequenceClassification.from_pretrained("mangsense/codebert_java")

inputs = tokenizer("your code here", return_tensors="pt", truncation=True, padding="max_length")

# Run inference; no labels are needed at prediction time
with torch.no_grad():
    logits = model(**inputs).logits

prediction = logits.argmax(dim=-1).item()
print(prediction)  # 0 = Safe, 1 = Vulnerable
```
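Given the Safe/Vulnerable scheme under Intended Uses, the raw logits can be turned into a verdict and a confidence score with no extra dependencies. A minimal sketch — the logit values below are illustrative stand-ins, not real model output, and the `id2label` names simply mirror the mapping stated above:

```python
import math

# Hypothetical logits for one input, shaped [num_labels];
# the values are illustrative only.
logits = [-1.2, 2.3]

# Softmax converts logits into probabilities that sum to 1
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Label mapping from the Intended Uses section
id2label = {0: "Safe", 1: "Vulnerable"}

pred = max(range(len(probs)), key=probs.__getitem__)
print(id2label[pred], round(probs[pred], 3))
```

Here the second logit dominates, so the sketch reports the input as Vulnerable along with its softmax probability.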
## Training Data

Trained on the CodeXGLUE Defect Detection dataset.
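Each example in that dataset pairs a function body with a binary defect label. A minimal preprocessing sketch — the `func` (source string) and `target` (0/1) field names are assumptions about the dataset schema, and the record below is a made-up stand-in rather than a real download:

```python
# Illustrative record shaped like a defect-detection example;
# the `func`/`target` field names are assumptions about the schema.
record = {
    "func": "int f() { return 0; }",
    "target": 0,  # 0 = safe, 1 = defective/vulnerable
}

def to_training_pair(example):
    """Map a dataset record to the (text, label) pair the classifier trains on."""
    return example["func"], int(example["target"])

text, label = to_training_pair(record)
print(label)  # 0
```

At fine-tuning time, `text` is what gets tokenized and `label` is the classification target.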
## Limitations

- Focused on Java code only
- May not detect all types of vulnerabilities