---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
inference: true
widget:
- text: "public class HelloWorld {\n public static void main(String[] args) {"
  example_title: Hello world
  group: Java
---

# JavaCoder

## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

The JavaCoder models are 1B-parameter models trained on 80+ programming languages from [The Stack (v1.2)](https://huggingface.co/datasets/bigcode/the-stack), with opt-out requests excluded. The model uses [Multi Query Attention](https://arxiv.org/abs/1911.02150), [a context window of 8192 tokens](https://arxiv.org/abs/2205.14135), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 1 trillion tokens.

- **Repository:**
- **Project Website:**
- **Paper:**
- **Point of Contact:**
- **Languages:** 80+ programming languages

## Use

### Intended use

The model was trained on GitHub code. As such, it is _not_ an instruction model, and commands like "Write a function that computes the square root." do not work well. However, by using the [Tech Assistant prompt](https://huggingface.co/datasets/bigcode/ta-prompt) you can turn it into a capable technical assistant.

**Feel free to share your generations in the Community tab!**

### Generation

```python
# pip install -q transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "infosys/javacoder-1b"
device = "cuda"  # use "cpu" to run without a GPU

# Load the tokenizer and the model weights, then move the model to the device
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Encode a Java prompt and let the model continue it
inputs = tokenizer.encode("public class HelloWorld {\n public static void main(String[] args) {", return_tensors="pt").to(device)
# Default generation settings give a short continuation; pass max_new_tokens for a longer one
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
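
The model summary gives a context window of 8192 tokens, so the prompt and the generated tokens together have to fit in that budget. The following is a minimal sketch of one way to trim an over-long prompt before calling `generate`. The helper name `fit_to_context` and the choice to keep the most recent tokens (rather than the oldest) are assumptions made for illustration, not part of the model's API.

```python
CONTEXT_WINDOW = 8192  # context size stated in the model summary


def fit_to_context(token_ids, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Hypothetical helper: keep the most recent prompt tokens so that
    prompt length + max_new_tokens does not exceed the context window."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be between 1 and context_window - 1")
    # Tokens left for the prompt once room for the completion is reserved
    budget = context_window - max_new_tokens
    # Slicing from the end keeps the code closest to the cursor
    return list(token_ids[-budget:])


# Plain integers stand in for real token ids here
prompt_ids = list(range(10_000))
trimmed = fit_to_context(prompt_ids, max_new_tokens=256)
print(len(trimmed))
```

With a real tokenizer, the list passed in would be the encoded prompt ids, and the same `max_new_tokens` value would then be passed to `model.generate`.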

### Fill-in-the-middle

Fill-in-the-middle uses special tokens to mark the prefix, suffix, and middle parts of the input and output:

```python
# Reuses tokenizer, model, and device from the Generation example.
# The model generates the code that belongs between the prefix and the suffix.
input_text = "<fim_prefix>public class HelloWorld {\n public static void main(String[] args) {<fim_suffix>}\n}<fim_middle>"
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
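
The decoded output of a fill-in-the-middle call contains the whole prompt followed by the generated middle. The sketch below shows one possible way to build the prompt from a prefix and a suffix and to pull out only the infilled code. The helper names `build_fim_prompt` and `extract_middle` are hypothetical, and the end-of-text marker `<|endoftext|>` is an assumption about this tokenizer; check `tokenizer.eos_token` before relying on it.

```python
# Special tokens as they appear in the fill-in-the-middle example
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"
END_OF_TEXT = "<|endoftext|>"  # assumption: end-of-sequence token string


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: place the code before and after the gap
    in prefix, suffix, middle order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded: str) -> str:
    """Hypothetical helper: return only the generated infill."""
    # Everything after the last <fim_middle> marker is model output;
    # if the marker is absent, the whole string is returned unchanged.
    _, _, middle = decoded.rpartition(FIM_MIDDLE)
    # Drop the end-of-text marker and anything after it, if present
    return middle.split(END_OF_TEXT, 1)[0]


prompt = build_fim_prompt(
    "public class HelloWorld {\n public static void main(String[] args) {",
    "}\n}",
)
# A made-up decoded string stands in for real model output here
decoded = prompt + '\n System.out.println("Hello");\n ' + END_OF_TEXT
print(extract_middle(decoded))
```

In practice, `prompt` would be passed to `tokenizer.encode` and the result of `tokenizer.decode(outputs[0])` to `extract_middle`.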