Arsh-llm: A Compact 500M Parameter Powerhouse 🚀

Arsh-llm is a 500-million-parameter language model built on the Llama architecture, designed to shine in generating creative stories, coherent text, and functional code. Pretrained for 35 hours on a T4 GPU using a curated mix of small yet powerful datasets, and fine-tuned for 20 hours on conversational data, this model is a lean, mean, text-generating machine with massive potential. With a training loss between 1.2 and 1.9, it's already showing promise and is ready to level up with more training. Buckle up: this is just the beginning! 😎

Model Overview

  • Architecture: Llama-based causal language model
  • Parameters: 500M
  • Context Length: 128 tokens
  • Pretraining Duration: ~35 hours on NVIDIA T4 GPU
  • Fine-tuning Duration: ~20 hours on conversational datasets
  • Training Loss: 1.2–1.9 (with room to improve!)
  • Library: Transformers (Hugging Face)
  • License: MIT
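
As a rough guide to what the parameter count means in practice, the sketch below estimates the memory needed just to hold the weights. The data types are assumptions (the list above does not state the checkpoint precision), and activations and generation caches are not counted.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 500_000_000  # parameter count stated in the overview

# Assumed precisions, shown side by side for comparison.
for label, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, nbytes):.2f} GiB")
```

Either way the weights should fit comfortably on a single T4-class GPU.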

Datasets

Arsh-llm was trained on a diverse set of datasets to ensure versatility in storytelling, text generation, and code-related tasks:

  • roneneldan/TinyStories: Short, creative stories for narrative generation.
  • Salesforce/wikitext: Wikipedia-based text for general knowledge and coherence.
  • abhinand/alpaca-gpt4-sharegpt: Instruction-based conversational data for task-oriented responses.
  • shibing624/sharegpt_gpt4: High-quality conversational data for chat-like interactions.
  • ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions: Math problems with solutions to boost logical reasoning.

Fine-tuning was performed on a structured ShareGPT chat template to enhance conversational abilities, making Arsh-llm a great starting point for dialogue-based applications.
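
For readers preparing their own fine-tuning data, here is a minimal sketch of turning a ShareGPT-style record into the role/content messages that chat templates expect. It assumes the commonly used ShareGPT layout (a "conversations" list whose turns carry "from" and "value" keys). The datasets listed above may use different field names, and `sharegpt_to_messages` and `ROLE_MAP` are illustrative names, not part of this model's training code.

```python
# Assumed mapping from ShareGPT speaker tags to chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_messages(record: dict) -> list[dict]:
    """Convert one ShareGPT-style record into role/content messages."""
    messages = []
    for turn in record["conversations"]:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            raise ValueError(f"unknown speaker tag: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages


# A made-up record for illustration only.
example = {
    "conversations": [
        {"from": "human", "value": "Tell me a short story about a fox."},
        {"from": "gpt", "value": "Once upon a time, a small fox found a red ball."},
    ]
}
print(sharegpt_to_messages(example))
```

The resulting list can be passed to a tokenizer's apply_chat_template() method to produce training text.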

Use Cases

Arsh-llm is a versatile model with applications in:

  • Creative Writing: Generate engaging short stories or narrative prompts.
  • Code Generation: Produce functional code snippets for various programming tasks.
  • Conversational AI: Power chatbots or assistants with natural dialogue.
  • Educational Tools: Assist with math problem-solving or explain concepts step-by-step.

Note: This model is a work in progress. For production-grade performance, further pretraining on larger datasets and post-training on conversational data are recommended.

Getting Started

To use Arsh-llm, you can load it directly from Hugging Face:

import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "arshiaafshani/Arsh-llm"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Arsh llm is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history
conversation_history = []

conversation_history.append({"role": "system", "content": "You are a helpful assistant."})

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exited from the chat. Bye!")
        break

    # Append user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Prepare the messages with the conversation history and an empty assistant turn
    messages = conversation_history + [{"role": "assistant", "content": ""}]

    # Use the tokenizer's apply_chat_template() method to format the prompt.
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20
    )

    # The returned 'generated_text' includes the prompt plus the generation.
    full_text = response[0]["generated_text"]
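    # Extract the assistant's response by removing the prompt portion.
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Store the reply so that later turns include the full conversation.
    conversation_history.append({"role": "assistant", "content": bot_response})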

Training Details

  • Pretraining: Conducted on a T4 GPU for ~35 hours using a mix of TinyStories, WikiText, and other datasets to build a strong foundation in text and story generation.
  • Fine-tuning: 20 hours on ShareGPT-based conversational data with a structured chat template to enhance dialogue capabilities.
  • Hardware: NVIDIA T4 GPU (16 GB VRAM, about 15 GB usable).
  • Training Loss: Achieved 1.2–1.9, indicating solid performance with significant potential for improvement through extended training.
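
To put the loss range in more familiar terms, it can be converted to perplexity. This sketch assumes the reported loss is the mean per-token cross-entropy in nats, which is what Transformers causal language models return by default. The card does not say whether the figures are training or validation loss, so treat the result as indicative only.

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_loss_nats)


for loss in (1.2, 1.9):
    print(f"loss {loss:.1f} -> perplexity {perplexity(loss):.2f}")
# loss 1.2 -> perplexity 3.32
# loss 1.9 -> perplexity 6.69
```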

Limitations

  • Current Stage: Arsh-llm is not yet fully optimized. It performs well for its size but requires additional training to compete with larger models.
  • Dataset Size: Pretrained on relatively small datasets, which limits its generalization. Scaling up to larger datasets will unlock its full potential.
  • Context Length: Limited to 128 tokens, which may constrain performance on longer sequences.
  • Not Production-Ready: This model is best used as a base for further fine-tuning rather than as a standalone solution.
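
One way to work within the short context window is to drop the oldest chat turns before building each prompt. The sketch below is one possible approach, not part of the model's own tooling: `trim_history` and `count_words` are hypothetical helpers, and the word-based counter is a stand-in for a real tokenizer.

```python
from typing import Callable


def trim_history(
    messages: list[dict],
    count_tokens: Callable[[str], int],
    budget: int,
) -> list[dict]:
    """Keep system messages plus the newest turns that fit in `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    used = sum(count_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards so the most recent turns are considered first.
    for message in reversed(turns):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + kept[::-1]


# Stand-in counter for illustration: one token per whitespace-separated word.
def count_words(text: str) -> int:
    return len(text.split())


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a story about a brave little fox."},
    {"role": "assistant", "content": "A small fox once crossed a wide river."},
    {"role": "user", "content": "Now make it shorter."},
]
print(trim_history(history, count_words, budget=10))
```

With a real tokenizer, the counter could be something like `lambda text: len(tokenizer.encode(text))`. The budget should also leave room for the tokens added by the chat template and for the reply itself, which this sketch does not account for.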

Future Plans

The journey doesn't end here! Arsh-llm is set to evolve with:

  • Extended Pretraining: Leveraging larger datasets for broader knowledge and better generalization.
  • Conversational Fine-tuning: Enhancing dialogue capabilities with advanced post-training techniques.
  • Benchmarking: Evaluating performance against similar models (e.g., TinyLlama, Phi-1.5) on tasks like MMLU, HumanEval, and GSM8K.
  • Community Feedback: Incorporating user insights to refine and improve the model.

Stay tuned: Arsh-llm is on its way to becoming a legend! 🔥

License

This model is licensed under the MIT License, allowing for flexible use in both research and commercial applications. Feel free to build upon, modify, or share it!

Acknowledgments

  • Built with ❤️ by Arshia Afshani.
  • Powered by the Hugging Face Transformers library.
  • Thanks to the open-source community for providing the amazing datasets that made this model possible.

Ready to take Arsh-llm for a spin? Clone it, train it, and let's make it a superstar together! 🌟 For questions, feedback, or collabs, reach out via Hugging Face or open an issue in the repo.
