---
license: cc-by-nc-4.0
language:
  - en
pipeline_tag: text-generation
model-index:
  - name: BrtGPT 124M-Base
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 1
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 1
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 1
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 1
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 1
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 1
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-3.2-instruct-78b
          name: Open LLM Leaderboard
---

BrtGPT-124M-Base

  • This model was trained on 306K (306,000) tokens of English sentences.
  • The model is not for question answering (unlike ChatGPT or LLaMA); it is only pre-trained on a raw corpus, so it is a base model.

Model Details

Model Description

  • Developed by: Bertug Gunel (Bertuğ Günel)
  • Funded by [optional]: Nobody
  • Shared by [optional]: Nobody
  • Model type: Decoder-Only Transformer
  • Language(s) (NLP): English
  • License: CC-BY-NC-4.0
  • Finetuned from model [optional]: None (trained from scratch)

Model Sources [optional]

  • Repository: Coming soon!
  • Paper [optional]: "Attention Is All You Need", arXiv:1706.03762
  • Demo [optional]: The model itself serves as the demo.

Uses

This code loads the model (BrtGPT-124M-Base) and generates text from a prompt, so you can try it:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Bertug1911/BrtGPT-124m-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_text = "They're hiding"
inputs = tokenizer(input_text, return_tensors="pt")

# repetition_penalty values to try (1.0 disables the penalty)
penalty_values = [1.0, 1.2, 1.5]

print(f"🧪 Input: {input_text}\n")

for penalty in penalty_values:
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=30,
        do_sample=True,
        temperature=1.2,
        top_p=0.95,
        repetition_penalty=penalty
    )

    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # The tokenizer emits byte-level BPE pieces: drop the literal spaces
    # between pieces, then turn the "Ġ" word-boundary marker into a real space.
    generated_text = generated_text.replace(" ", "")
    generated_text = generated_text.replace("Ġ", " ")

    print(f"📌 repetition_penalty={penalty}:\n{generated_text}\n{'-'*50}")

NEW: Repetition Penalty Added!
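For intuition, repetition_penalty down-weights the logits of tokens that have already been generated before the next token is sampled (the approach introduced with CTRL, Keskar et al. 2019). A minimal sketch of the idea, not necessarily transformers' exact implementation:

import torch

def apply_repetition_penalty(logits: torch.Tensor, generated_ids: torch.Tensor, penalty: float = 1.2) -> torch.Tensor:
    # For each token id that has already been generated, shrink its logit
    # so it becomes less likely to be picked again: positive scores are
    # divided by the penalty, negative scores are multiplied by it.
    for token_id in set(generated_ids.tolist()):
        score = logits[token_id]
        logits[token_id] = score / penalty if score > 0 else score * penalty
    return logits

With penalty=1.0 the logits are unchanged, which is why 1.0 effectively disables the feature.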

Direct Use

There is no hosted (web) demo yet; web use is coming soon. For now, load the model locally with the code above.

Out-of-Scope Use

The model only completes English sentences with English tokens; other languages, chat, question answering, and instruction following are out of scope.

Bias, Risks, and Limitations

The model can generate political content; use it at your own risk.

Recommendations

Beyond the note above, no significant risks or biases are known.

How to Get Started with the Model

You can use the model to generate English text with the code example above, or with the alternative sketch below.
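As an alternative, the checkpoint may also load through the transformers pipeline API. This is a hedged sketch, assuming the checkpoint works as a standard causal LM; note that it skips the "Ġ" post-processing shown earlier, so spacing in the raw output may look odd:

from transformers import pipeline

generator = pipeline("text-generation", model="Bertug1911/BrtGPT-124m-Base")
result = generator("They're hiding", max_new_tokens=30, do_sample=True, temperature=1.2, top_p=0.95)
print(result[0]["generated_text"])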

Training Details

Training Data

The model was trained on Train.csv (about 300k tokens, 100 lines):

| Data Type       | Training Type | Tokens (Total) |
|-----------------|---------------|----------------|
| Raw (sentences) | Pre-training  | About 306K     |
| Instruction     | Coming soon!  | Coming soon!   |
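For illustration, here is a hedged sketch of how a raw-sentence CSV like this could be loaded and tokenized for pre-training. The "text" column name is an assumption; the card does not describe Train.csv's schema:

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Bertug1911/BrtGPT-124m-Base")

# Hypothetical schema: assumes Train.csv has a "text" column of raw sentences.
dataset = load_dataset("csv", data_files="Train.csv")["train"]
tokenized = dataset.map(lambda rows: tokenizer(rows["text"]), batched=True)
print(tokenized)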

Training Procedure

The model was trained on a single NVIDIA A100 GPU for 15 minutes.

Training Hyperparameters

  • Training regime: FP32 precision, with sparsity enabled.
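Since FP32 is PyTorch's default precision, no autocast or GradScaler machinery is involved. A minimal, illustrative FP32 training step (names and setup here are assumptions, not the card's actual training code):

import torch

def train_step(model, batch, optimizer):
    # Plain FP32: default dtype, no torch.autocast / GradScaler needed.
    outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()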

Evaluation

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: GPU
  • Hours used: 0.25 (15 minutes)
  • Cloud Provider: "Runpod" (https://www.runpod.io/)
  • Compute Region: EU
  • Carbon Emitted: 0.024 kg (24 g) CO2eq
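As a sanity check, the figure is consistent with the ML Impact calculator's formula, emissions ≈ GPU power × hours × grid carbon intensity. The 400 W draw and the EU grid intensity below are assumptions, not values from the card:

# Rough reproduction of the carbon estimate (assumed inputs).
gpu_power_kw = 0.400          # assumed A100 board power, in kW
hours = 0.25                  # 15 minutes, from the card
intensity_kg_per_kwh = 0.24   # assumed EU grid carbon intensity (kg CO2eq/kWh)

energy_kwh = gpu_power_kw * hours                 # 0.1 kWh
emissions_kg = energy_kwh * intensity_kg_per_kwh  # 0.024 kg = 24 g
print(f"{emissions_kg:.3f} kg CO2eq")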

Model Card Authors [optional]

  • Bertug Gunel
  • Turkey/Eskisehir

Model Card Contact

bertugscpmail@gmail.com