Instructions for using tinycompany/ShawtyIsBad-ib with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use tinycompany/ShawtyIsBad-ib with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tinycompany/ShawtyIsBad-ib")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tinycompany/ShawtyIsBad-ib")
model = AutoModelForCausalLM.from_pretrained("tinycompany/ShawtyIsBad-ib")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use tinycompany/ShawtyIsBad-ib with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "tinycompany/ShawtyIsBad-ib"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "tinycompany/ShawtyIsBad-ib",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker:

```bash
docker model run hf.co/tinycompany/ShawtyIsBad-ib
```
- SGLang
How to use tinycompany/ShawtyIsBad-ib with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tinycompany/ShawtyIsBad-ib" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "tinycompany/ShawtyIsBad-ib",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images:

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tinycompany/ShawtyIsBad-ib" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "tinycompany/ShawtyIsBad-ib",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

- Docker Model Runner
How to use tinycompany/ShawtyIsBad-ib with Docker Model Runner:
```bash
docker model run hf.co/tinycompany/ShawtyIsBad-ib
```
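To call the vLLM or SGLang server from Python instead of curl, here is a minimal standard-library sketch. It builds the same POST request to the OpenAI-compatible `/v1/completions` endpoint used in the curl commands above. The helper names `build_completion_request` and `complete` are hypothetical, and the base URLs assume the default ports shown above (8000 for vLLM, 30000 for SGLang).

```python
import json
import urllib.request

MODEL_ID = "tinycompany/ShawtyIsBad-ib"


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl examples.

    Hypothetical helper: base_url is e.g. "http://localhost:8000" (vLLM)
    or "http://localhost:30000" (SGLang).
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the first completion's text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("http://localhost:8000", "Once upon a time,")` returns the generated continuation. Keeping request construction separate from sending makes the payload easy to inspect without a live server.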
WE are COOKED
Test Log 08 March 2025
First Test:
Mean perplexity on wikitext-2-raw-v1 (~2k English samples): 1420.7414870547489
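Perplexity is the exponential of the average per-token negative log-likelihood (NLL). The script behind the number above isn't included, so the sketch below only illustrates the arithmetic. It assumes "mean perplexity" means the average of per-sample perplexities. The hypothetical `corpus_perplexity` shows the token-weighted alternative, which generally gives a different value.

```python
import math


def sample_perplexity(token_nlls):
    """Perplexity of one sample: exp of the mean per-token NLL (in nats)."""
    if not token_nlls:
        raise ValueError("need at least one scored token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def mean_perplexity(samples):
    """Average of per-sample perplexities (assumed meaning of 'Mean Perplexity')."""
    return sum(sample_perplexity(s) for s in samples) / len(samples)


def corpus_perplexity(samples):
    """Token-weighted alternative: exp of the mean NLL over all tokens pooled."""
    all_nlls = [nll for s in samples for nll in s]
    return sample_perplexity(all_nlls)


# Two toy samples whose per-sample perplexities are 2 and 8.
toy = [[math.log(2)] * 2, [math.log(8)] * 2]
print(mean_perplexity(toy), corpus_perplexity(toy))
```

On the toy input the per-sample average is 5 while the pooled value is 4. The per-sample average is also sensitive to a few very high-perplexity samples. With Transformers, one sample's perplexity can be obtained as `torch.exp(model(input_ids, labels=input_ids).loss)`, since `loss` is the mean token cross-entropy of a causal LM.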
Second Test:
Evaluated the tokenizer's performance on:
- Unicode coverage
- Token distribution
- Tokenization complexity across different scripts
- Encoding and decoding capabilities
- Edge cases (e.g., special characters and numbers)

Data: 1k samples (500 Hindi, 500 English).
1. Edge Case Handling
| Language | Test Type | Token Count | Unique Tokens |
|---|---|---|---|
| Hindi | Script Test | 14 | 13 |
| Hindi | Unicode Test | 21 | 21 |
| Hindi | Special Characters | 19 | 19 |
| English | Script Test | 16 | 15 |
| English | Unicode Test | 14 | 14 |
| English | Special Characters | 18 | 18 |
2. Unicode Coverage
| Language | Coverage Ratio | Token Count | Unique Tokens |
|---|---|---|---|
| Hindi | 100% | 21 | 21 |
| English | 100% | 14 | 14 |
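The card doesn't define "Coverage Ratio". One plausible definition, sketched below, is the share of distinct characters in a sample that survive an encode→decode round trip on their own. Here `encode` and `decode` are assumed callables. For a Hugging Face tokenizer they could be `lambda s: tok.encode(s, add_special_tokens=False)` and `tok.decode`. The two stand-in encoders are illustrative only.

```python
def unicode_coverage(text, encode, decode):
    """Fraction of distinct characters in `text` that round-trip individually."""
    chars = sorted(set(text))
    if not chars:
        return 1.0
    covered = sum(1 for ch in chars if decode(encode(ch)) == ch)
    return covered / len(chars)


# Illustrative stand-in: a pure byte-level "tokenizer" (one token per UTF-8 byte).
def byte_encode(text):
    return list(text.encode("utf-8"))


def byte_decode(ids):
    return bytes(ids).decode("utf-8", errors="replace")


# Lossy stand-in for contrast: non-ASCII characters collapse to "?".
def ascii_encode(text):
    return list(text.encode("ascii", errors="replace"))
```

A byte-level BPE tokenizer can represent every UTF-8 byte, so 100% is what this definition yields for it. The `Ġ`-prefixed token strings in the logs suggest that is the kind of tokenizer tested here. A lossy encoder such as `ascii_encode` scores below 100% on Devanagari text.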
3. Complexity
| Language | Original Length (chars) | Token Count | Avg Token Length (bytes) | Token Diversity (unique/total) |
|---|---|---|---|---|
| Hindi | 49 | 14 | 9.07 | 0.928 |
| English | 65 | 16 | 4.06 | 0.937 |
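The complexity columns can be recomputed from a list of decoded token pieces. Token Diversity is unique tokens over total tokens, and Avg Token Length is consistent with total UTF-8 bytes over token count. For the Hindi row that is 127 bytes / 14 tokens ≈ 9.07; for English it is 65 / 16 ≈ 4.06. The sketch below assumes tokens are given as decoded text pieces. `complexity_stats` is a hypothetical helper.

```python
def complexity_stats(text, token_pieces):
    """Per-text tokenization stats in the spirit of the Complexity table.

    `token_pieces` are decoded token strings (e.g. " India", not "ĠIndia").
    """
    n_tokens = len(token_pieces)
    n_unique = len(set(token_pieces))
    # Byte length of each piece; Devanagari letters take 3 bytes in UTF-8.
    total_bytes = sum(len(piece.encode("utf-8")) for piece in token_pieces)
    return {
        "original_length": len(text),  # Unicode code points
        "token_count": n_tokens,
        "unique_tokens": n_unique,
        "avg_token_length_bytes": total_bytes / n_tokens,
        "token_diversity": n_unique / n_tokens,
    }


pieces = ["Hello", ",", " I", " am", " from", " India", ".",
          " Delhi", " is", " a", " big", " city", "."]
print(complexity_stats("Hello, I am from India. Delhi is a big city.", pieces))
```

For the English sentence from the encoding-decoding logs this gives 13 tokens, 12 unique, and 44/13 bytes per token. With a byte-level BPE tokenizer, summing `len(t)` over the raw strings from `convert_ids_to_tokens` also counts bytes, because each byte is displayed as one character.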
4. Encoding-Decoding Capabilities
Hindi Analysis:
Original Text: नमस्ते, मैं भारत से हूँ। दिल्ली बहुत बड़ा शहर है।
Token IDs Count: 14
Token Strings: ['नम', 'स्ते', ',', 'Ġमैं', 'Ġभारत', 'Ġसे', 'Ġहूँ', '।', 'Ġदिल्ली', 'Ġबहुत', 'Ġबड़ा', 'Ġशहर', 'Ġहै', '।']
Decoded Text: नमस्ते, मैं भारत से हूँ। दिल्ली बहुत बड़ा शहर है।
Text Reconstruction: True
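The raw token strings in these logs appear to use GPT-2-style byte-level BPE notation. Each UTF-8 byte is shown as one printable character, so 'Ġ' is the space byte (0x20) and a Devanagari letter becomes three characters ('à¤Ń' is भ). The sketch below rebuilds that byte↔character table, using the same construction as GPT-2's `bytes_to_unicode`, and turns raw token strings back into text. It assumes this tokenizer uses that standard mapping.

```python
def bytes_to_unicode():
    """GPT-2 style map from each byte value (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # '¡' .. '¬'
        + list(range(0xAE, 0x100))  # '®' .. 'ÿ'
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Control bytes, space, etc. are moved to code points 256 + n.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def token_to_text(token):
    """Decode one raw byte-level token string (e.g. 'ĠIndia') to text."""
    raw = bytes(CHAR_TO_BYTE[c] for c in token)
    return raw.decode("utf-8", errors="replace")


def text_to_token_chars(text):
    """Inverse: render text the way byte-level BPE token strings display it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))
```

A token that ends mid-character decodes to a replacement character on its own, which is why the table is applied to whole token sequences when reconstructing text. With the actual tokenizer, `tokenizer.convert_tokens_to_string(tokens)` performs this conversion.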
Hindi Analysis:
Original Text: हिंदी भाषा बहुत सुंदर है।
Token IDs Count: 7
Token Strings: ['ह', 'िंदी', 'Ġभाषा', 'Ġबहुत', 'Ġसुंदर', 'Ġहै', '।']
Decoded Text: हिंदी भाषा बहुत सुंदर है।
Text Reconstruction: True
Hindi Analysis:
Original Text: मुझे किताबें पढ़ना पसंद है।
Token IDs Count: 7
Token Strings: ['म', 'ुझे', 'Ġकिताबें', 'Ġपढ़ना', 'Ġपसंद', 'Ġहै', '।']
Decoded Text: मुझे किताबें पढ़ना पसंद है।
Text Reconstruction: True
Hindi Analysis:
Original Text: यह एक उदाहरण वाक्य है।
Token IDs Count: 6
Token Strings: ['यह', 'Ġएक', 'Ġउदाहरण', 'Ġवाक्य', 'Ġहै', '।']
Decoded Text: यह एक उदाहरण वाक्य है।
Text Reconstruction: True
English Analysis:
Original Text: Hello, I am from India. Delhi is a big city.
Token IDs Count: 13
Token Strings: ['Hello', ',', 'ĠI', 'Ġam', 'Ġfrom', 'ĠIndia', '.', 'ĠDelhi', 'Ġis', 'Ġa', 'Ġbig', 'Ġcity', '.']
Decoded Text: Hello, I am from India. Delhi is a big city.
Text Reconstruction: True
English Analysis:
Original Text: The English language is widely spoken.
Token IDs Count: 7
Token Strings: ['The', 'ĠEnglish', 'Ġlanguage', 'Ġis', 'Ġwidely', 'Ġspoken', '.']
Decoded Text: The English language is widely spoken.
Text Reconstruction: True
English Analysis:
Original Text: I enjoy reading books.
Token IDs Count: 5
Token Strings: ['I', 'Ġenjoy', 'Ġreading', 'Ġbooks', '.']
Decoded Text: I enjoy reading books.
Text Reconstruction: True
English Analysis:
Original Text: This is an example sentence.
Token IDs Count: 6
Token Strings: ['This', 'Ġis', 'Ġan', 'Ġexample', 'Ġsentence', '.']
Decoded Text: This is an example sentence.
Text Reconstruction: True
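Each analysis above reports the token count, raw token strings, decoded text, and whether decoding reproduces the input. The sketch below produces the same fields using the Hugging Face tokenizer methods `encode`, `convert_ids_to_tokens`, and `decode`. `analyze_round_trip` is a hypothetical helper. Passing `add_special_tokens=False` is an assumption, since BOS/EOS tokens would otherwise inflate the counts.

```python
def analyze_round_trip(tokenizer, text):
    """Report the fields shown in the Encoding-Decoding logs for one text.

    `tokenizer` is assumed to follow the Hugging Face tokenizer interface.
    """
    ids = tokenizer.encode(text, add_special_tokens=False)
    tokens = tokenizer.convert_ids_to_tokens(ids)
    decoded = tokenizer.decode(ids)
    return {
        "token_ids_count": len(ids),
        "token_strings": tokens,
        "decoded_text": decoded,
        "text_reconstruction": decoded == text,
    }
```

Typical use would load the tokenizer with `AutoTokenizer.from_pretrained("tinycompany/ShawtyIsBad-ib")` and call `analyze_round_trip(tok, "I enjoy reading books.")` for each sample sentence. Comparing `decoded == text` exactly is strict, so any whitespace normalization during decoding shows up as a failed reconstruction.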