Model Card for super-cool-instruct

This model is a fine-tuned version of codellama/CodeLlama-7b-Instruct-hf designed to enhance instruction-following capabilities. It was developed as part of a Master's thesis project.

Model Details

Model Description

The super-cool-instruct model is a large language model fine-tuned with QLoRA (4-bit quantization combined with LoRA adapters). The goal was to adapt the base CodeLlama model to follow user instructions more reliably while preserving its coding and reasoning capabilities.
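As a rough illustration of the LoRA idea (not the actual training code), the adapter learns two small matrices A and B whose scaled product is added to a frozen base weight; the rank and alpha values below match this model's configuration, everything else is made up for the sketch:

```python
import numpy as np

# Illustrative only: LoRA adds a low-rank update to a frozen weight matrix.
d, r, alpha = 512, 64, 16                 # hidden size (arbitrary here), LoRA rank, LoRA alpha
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))           # frozen base weight (never updated)
A = rng.standard_normal((r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, initialized to zero

# Effective weight at inference time: W + (alpha / r) * B @ A
W_eff = W + (alpha / r) * B @ A

# With B initialized to zero, the adapter starts as a no-op.
assert np.allclose(W_eff, W)
```

Only A and B are trained, which is why the adapter checkpoint is tiny compared with the 7B base model.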

  • Developed by: mingyue0101
  • Model type: Causal Language Model (Fine-tuned with PEFT/LoRA)
  • Language(s) (NLP): English, Chinese
  • License: Apache-2.0 (adapter weights; note that the base CodeLlama model is released under the Llama 2 Community License)
  • Finetuned from model: codellama/CodeLlama-7b-Instruct-hf

Uses

Direct Use

The model can be used for text generation, code assistance, and general-purpose instruction following. It is particularly suited for tasks where a balance of technical coding knowledge and conversational instruction following is required.
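The base CodeLlama-Instruct models expect the Llama-2-style `[INST] ... [/INST]` prompt format, which presumably carries over to this fine-tune; a minimal helper (the function name is ours, not part of the model's API) might look like:

```python
def build_prompt(instruction, system=""):
    """Wrap a user instruction in the Llama-2-style [INST] format used by
    the base CodeLlama-Instruct model (assumed to apply to this fine-tune)."""
    if system:
        instruction = f"<<SYS>>\n{system}\n<</SYS>>\n\n{instruction}"
    return f"[INST] {instruction} [/INST]"

print(build_prompt("Write a Python function to sort a list."))
# [INST] Write a Python function to sort a list. [/INST]
```

If your prompts underperform, check whether the fine-tuning data used this template; a mismatch between training and inference formats is a common cause of degraded instruction following.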

Out-of-Scope Use

The model should not be used for high-stakes decision-making, generating malicious code, or any application that violates the safety guidelines of the base CodeLlama model.

Bias, Risks, and Limitations

This model may inherit biases present in the training data or the base model. Since it was fine-tuned on a specific dataset (parquet02), it might exhibit limitations when handling domains outside of its training distribution. Users should expect potential hallucinations in complex reasoning tasks.

Recommendations

Users are encouraged to use safety filters when deploying this model in production and to perform domain-specific evaluation before use.

How to Get Started with the Model

Use the code below to load the model in 4-bit precision:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_id = "codellama/CodeLlama-7b-Instruct-hf"
peft_model_id = "mingyue0101/super-cool-instruct"

# Load 4-bit configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto"
)

# Load the fine-tuned adapter
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Inference
prompt = "Write a Python function to sort a list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Training Data

The model was trained on the mingyue0101/parquet02 dataset. This dataset contains instruction-response pairs formatted for Supervised Fine-Tuning (SFT).
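The exact column names of parquet02 are not documented here, but a typical SFT formatter flattens each pair into a single training string; the field names below (`instruction`, `response`) are assumptions for illustration:

```python
def format_example(example):
    """Hypothetical formatter: turn one instruction-response pair into a
    single SFT training string. The column names 'instruction' and
    'response' are assumed, not confirmed fields of parquet02."""
    return f"[INST] {example['instruction']} [/INST] {example['response']}"

row = {"instruction": "Reverse a string in Python.", "response": "Use s[::-1]."}
print(format_example(row))
# [INST] Reverse a string in Python. [/INST] Use s[::-1].
```

A function like this is what you would pass to TRL's SFTTrainer as its formatting function, so that loss is computed over the full instruction-plus-response sequence.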

Training Procedure

Training Hyperparameters

  • Training regime: QLoRA (4-bit NF4 quantization) with fp16 mixed precision
  • Learning rate: 2e-4
  • Optimizer: paged_adamw_32bit
  • Batch size: 4
  • Epochs: 1
  • LoRA Rank (r): 64
  • LoRA Alpha: 16
  • LoRA Dropout: 0.1
  • LR Scheduler: constant
  • Warmup Ratio: 0.03
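The hyperparameters above can be expressed as a peft/transformers configuration sketch; the argument names follow those libraries' APIs, while `output_dir` and any value not listed above are placeholders:

```python
from transformers import TrainingArguments
from peft import LoraConfig

# LoRA adapter configuration matching the listed hyperparameters.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

# Trainer arguments matching the listed hyperparameters.
training_args = TrainingArguments(
    output_dir="super-cool-instruct",   # placeholder path
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    fp16=True,
)
```

These objects would then be handed to TRL's SFTTrainer together with the 4-bit-loaded base model shown in the quickstart section.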

Technical Specifications

Model Architecture and Objective

Based on the Llama 2 architecture, this model utilizes grouped-query attention (GQA) and rotary positional embeddings (RoPE), fine-tuned with a causal language modeling objective.
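As a hedged illustration of the RoPE mechanism mentioned above (a simplified sketch, not the exact Llama implementation), pairs of features are rotated by a position-dependent angle, so relative offsets between tokens are encoded directly in the attention dot products:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Minimal rotary positional embedding: rotate feature pairs of a
    vector by position-dependent angles. Illustrative only."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

v = np.ones(8)
rotated = rope(v, pos=3)
# A pure rotation preserves the vector norm.
assert np.isclose(np.linalg.norm(rotated), np.linalg.norm(v))
```

Because the rotation at position 0 is the identity and angles grow linearly with position, the dot product between two rotated vectors depends only on their relative distance, which is what makes RoPE attractive for extrapolation.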

Compute Infrastructure

Software

  • PEFT 0.10.0
  • Transformers
  • Bitsandbytes
  • TRL (SFTTrainer)