LFM2-VL-450M-jp (Japanese)
Model Description
LFM2-VL-450M-jp is a Japanese fine-tuned variant of LiquidAI/LFM2-VL-450M, optimized for Japanese vision-language tasks. This model maintains the efficiency and low-latency characteristics of the original LFM2-VL architecture while specializing in Japanese language understanding and image description.
- Developed by: Alfaxad
- Base Model: LiquidAI/LFM2-VL-450M
- Model type: Vision-Language Model (Multimodal)
- Language: Japanese (日本語)
- License: LFM Open License v1.0
- Fine-tuned from: LiquidAI/LFM2-VL-450M (450M parameters)
Key Features
- Japanese Language Support: Specialized for Japanese image understanding and description tasks
- Efficient Architecture: Keeps the base model's size of roughly 450M parameters in total, including the 350M language model and the 86M vision encoder
- Low Latency: Optimized for edge AI applications and resource-constrained environments
- Multi-turn Conversations: Trained on conversational data for interactive vision-language tasks
- Native Resolution Processing: Processes images at their native resolution up to 512×512 pixels without upscaling; larger images are split into non-overlapping 512×512 tiles
Model Details
| Property | Value |
|---|---|
| Parameters (LM only) | 350M |
| Vision encoder | SigLIP2 NaFlex base (86M) |
| Backbone layers | Hybrid convolution + attention |
| Context (text) | 32,768 tokens |
| Image tokens | Dynamic, user-tunable |
| Vocabulary size | 65,536 |
| Precision | bfloat16 |
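To get a feel for the image-token budget, the sketch below estimates how many tokens a single image at native resolution produces. It assumes (this is not stated in this card) that the SigLIP2 encoder uses 16×16-pixel patches and that the projector merges each 2×2 block of patch features into one token, which gives H·W/1024 tokens per image. Under these assumptions a full 512×512 image maps to 256 tokens, which lines up with the max_image_tokens value of 256 recommended in this card. The function name is hypothetical, and tiling of larger images is deliberately not modelled.

```python
# ASSUMPTIONS (not stated in this card): 16x16-pixel encoder patches and a
# 2x2 pixel-unshuffle in the projector (4 patch features -> 1 image token).
PATCH_SIZE = 16
DOWNSAMPLE = 2


def image_tokens_for_tile(height: int, width: int) -> int:
    """Estimate image tokens for one image at native resolution (<= 512x512)."""
    step = PATCH_SIZE * DOWNSAMPLE  # each token covers 32x32 pixels here
    if height % step or width % step:
        raise ValueError(f"dimensions must be multiples of {step} in this sketch")
    if height > 512 or width > 512:
        raise ValueError("larger images are tiled; not modelled in this sketch")
    patches = (height // PATCH_SIZE) * (width // PATCH_SIZE)
    return patches // (DOWNSAMPLE * DOWNSAMPLE)


print(image_tokens_for_tile(512, 512))  # 256
print(image_tokens_for_tile(256, 384))  # 96
```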
Training Data
The model was fine-tuned on approximately 98,000 multi-turn conversational samples from:
- Dataset: llm-jp/ja-vg-vqa-conversation
- Content: Japanese visual question-answering conversations
- Format: Multi-turn dialogues with image context
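The card does not document the dataset's schema. To illustrate how one multi-turn VQA sample can be turned into training messages, the sketch below assumes (hypothetically) that a sample is an image plus an ordered list of question/answer pairs. It emits the same message structure used in the Basic Usage example and attaches the image to the first user turn only. The helper name `to_chat_messages` is hypothetical; it is not part of any library or of the author's training code.

```python
from typing import Any


def to_chat_messages(image: Any, qa_pairs: list[tuple[str, str]]) -> list[dict]:
    """Convert one image plus (question, answer) pairs into chat messages.

    HYPOTHETICAL input layout: the real dataset schema is not documented here.
    """
    messages: list[dict] = []
    for turn, (question, answer) in enumerate(qa_pairs):
        user_content: list[dict] = []
        if turn == 0:
            # Attach the image once, to the first user turn only.
            user_content.append({"type": "image", "image": image})
        user_content.append({"type": "text", "text": question})
        messages.append({"role": "user", "content": user_content})
        messages.append(
            {"role": "assistant", "content": [{"type": "text", "text": answer}]}
        )
    return messages


# Two turns: "What is in the picture?" / "A dog." and "What colour is the dog?" / "Brown."
sample = to_chat_messages(
    "dog.jpg",
    [("何が写っていますか?", "犬です。"), ("犬は何色ですか?", "茶色です。")],
)
print(len(sample))  # 4
```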
Intended Use
Primary Use Cases
- Japanese image captioning and description
- Visual question answering in Japanese
- Multi-turn conversations about images in Japanese
- Japanese document understanding and OCR tasks
- Edge AI applications requiring Japanese language support
Recommended Applications
- Japanese e-commerce product description
- Japanese accessibility tools for visual content
- Japanese educational applications
- Japanese content moderation and analysis
- Japanese chatbots with visual understanding
How to Use
Installation
```bash
pip install -U transformers pillow torch accelerate
```
Basic Usage
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

# Load model and processor
model_id = "Alfaxad/LFM2-VL-450M-jp"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",  # requires the accelerate package
    torch_dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load an image (URL or local path) and build a Japanese conversation
image = load_image("your_image_url_or_path.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            # "What is shown in this image?"
            {"type": "text", "text": "この画像には何が写っていますか?"},
        ],
    },
]

# Tokenize the prompt (text + image) and move it to the model's device
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

# Generate, then decode only the newly generated tokens (drop the prompt)
outputs = model.generate(**inputs, max_new_tokens=128)
new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(response)
```
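For multi-turn use, append the model's reply and the next question to the conversation, then run `apply_chat_template` and `generate` again. The helper below (the name `continue_conversation` is hypothetical) returns a new list instead of mutating its input. Pass it a reply that contains only the newly generated text, not the decoded prompt, so earlier turns are not duplicated.

```python
def continue_conversation(
    conversation: list[dict], assistant_reply: str, next_question: str
) -> list[dict]:
    """Return a new conversation with the reply and a follow-up user turn appended."""
    return conversation + [
        {"role": "assistant", "content": [{"type": "text", "text": assistant_reply}]},
        {"role": "user", "content": [{"type": "text", "text": next_question}]},
    ]


# Standalone demo with placeholder text; in practice pass the `conversation`
# list and the decoded reply from the example above.
history = [
    {"role": "user", "content": [{"type": "text", "text": "この画像には何が写っていますか?"}]}
]
# Reply: "A dog is running in a park." Follow-up: "What colour is the dog?"
history = continue_conversation(history, "公園で犬が走っています。", "犬は何色ですか?")
print(len(history))  # 3
```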
Recommended Generation Parameters
- Temperature: 0.1
- min_p: 0.15
- repetition_penalty: 1.05
- min_image_tokens: 64
- max_image_tokens: 256
- do_image_splitting: True
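One way to apply these recommendations with transformers is sketched below. The text settings are ordinary `generate` keyword arguments; `do_sample=True` is our addition, because `temperature` and `min_p` only take effect when sampling is enabled. Passing the image settings to `AutoProcessor.from_pretrained` is an assumption (the card does not say where they belong), so check that the loaded processor actually picks them up. The dictionary names are illustrative.

```python
# Text-side recommendations as keyword arguments for transformers' model.generate.
# do_sample=True is added here: temperature and min_p are ignored without sampling.
TEXT_GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.1,
    "min_p": 0.15,
    "repetition_penalty": 1.05,
    "max_new_tokens": 128,
}

# Vision-side recommendations. ASSUMPTION: these are image-processor options
# that can be passed as keyword overrides when loading the processor.
IMAGE_PROCESSOR_KWARGS = {
    "min_image_tokens": 64,
    "max_image_tokens": 256,
    "do_image_splitting": True,
}

# Usage (needs the model download, so shown as comments):
# processor = AutoProcessor.from_pretrained(
#     model_id, trust_remote_code=True, **IMAGE_PROCESSOR_KWARGS
# )
# outputs = model.generate(**inputs, **TEXT_GENERATION_KWARGS)
```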
Chat Template
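The model uses a ChatML-like format. In the example below, the system prompt reads "You are a helpful multimodal assistant by Liquid AI," and the user asks "Please describe this image."
```
<|startoftext|><|im_start|>system
あなたはLiquid AIによる有用なマルチモーダルアシスタントです。<|im_end|>
<|im_start|>user
<image>この画像を説明してください。<|im_end|>
<|im_start|>assistant
この画像には...<|im_end|>
```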
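To make the layout concrete, here is a minimal pure-Python sketch that assembles the same string from a list of messages whose content is plain text. It is an approximation for illustration only (the function name `render_chatml` is hypothetical): in practice `processor.apply_chat_template` renders the prompt and the processor expands each `<image>` placeholder into image tokens, so do not feed this string to the model directly.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Approximate the ChatML-like prompt layout used by this model card."""
    parts = ["<|startoftext|>"]
    for message in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "あなたはLiquid AIによる有用なマルチモーダルアシスタントです。"},
    {"role": "user", "content": "<image>この画像を説明してください。"},
])
print(prompt)
```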
Training Details
Training Procedure
- Base Model: LiquidAI/LFM2-VL-450M
- Fine-tuning Method: Supervised Fine-Tuning (SFT) with LoRA adapters
- Framework: Hugging Face TRL (Transformer Reinforcement Learning)
- Training Data: ~98,000 multi-turn conversations
- Training Regime: bfloat16 mixed precision
Training Hyperparameters
- Training approach: LoRA (Low-Rank Adaptation) fine-tuning
- Dataset size: ~98,000 samples
- Data format: Multi-turn conversational VQA
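The card does not list the LoRA rank, alpha, or target modules. To show why LoRA keeps fine-tuning cheap, the sketch below counts the trainable parameters an adapter adds to one weight matrix: LoRA freezes W (d_out × d_in) and learns the update W + (α/r)·B·A, with B of shape d_out × r and A of shape r × d_in, so it trains r·(d_out + d_in) values instead of d_out·d_in. The 1024 × 1024 matrix, rank 16, and alpha 32 are hypothetical values chosen for illustration, not this model's settings.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values in one LoRA adapter: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_scaling(alpha: float, rank: int) -> float:
    """Scale applied to the low-rank update B @ A in the standard LoRA formulation."""
    return alpha / rank


# HYPOTHETICAL sizes and hyperparameters, for illustration only.
d_out, d_in, rank, alpha = 1024, 1024, 16, 32
full = d_out * d_in
adapter = lora_trainable_params(d_out, d_in, rank)
print(full, adapter, adapter / full)  # 1048576 32768 0.03125
print(lora_scaling(alpha, rank))  # 2.0
```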
Performance Considerations
As a fine-tuned variant of LFM2-VL-450M:
- Optimized for Japanese: Best performance on Japanese language tasks
- Resource Efficient: Suitable for edge devices and constrained environments
- Recommended Use: Fine-tune further on specific Japanese use cases for optimal performance
Note: This is a specialized model for Japanese. For English tasks, consider using the original LiquidAI/LFM2-VL-450M.
Limitations
- Language Specialization: Primarily designed for Japanese; performance on other languages may be limited
- Model Size: As a 450M parameter model, it may not match the capabilities of larger models on complex reasoning tasks
- Domain Specificity: Performance is optimized for the types of conversations present in the training data
- Safety: Not intended for safety-critical decisions without additional validation
- Narrow Use Cases: Best results when fine-tuned on specific downstream tasks
Ethical Considerations
- Bias: The model may reflect biases present in the training data (ja-vg-vqa-conversation dataset)
- Misuse Potential: Should not be used for generating misleading or harmful content
- Privacy: Do not process images containing sensitive personal information without appropriate consent
- Cultural Context: Trained on Japanese data; cultural nuances should be considered
Citation
If you use this model, please cite both the original LFM2-VL model and this fine-tuned variant:
```bibtex
@misc{lfm2-vl-450m-jp,
  author = {Alfaxad},
  title = {LFM2-VL-450M-jp: Japanese Fine-tuned Vision-Language Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Alfaxad/LFM2-VL-450M-jp}
}

@misc{liquid-lfm2-vl,
  author = {Liquid AI},
  title = {LFM2-VL: Efficient Vision-Language Models},
  year = {2025},
  url = {https://huggingface.co/LiquidAI/LFM2-VL-450M}
}
```
Acknowledgments
- Base Model: Liquid AI for the LFM2-VL architecture
- Training Data: llm-jp for the ja-vg-vqa-conversation dataset
- Framework: Hugging Face for transformers and TRL libraries
Additional Resources
- Original Model: LiquidAI/LFM2-VL-450M
- Training Dataset: llm-jp/ja-vg-vqa-conversation
- LFM2-VL Blog Post: Liquid AI Blog