LFM2-VL-450M-jp (Japanese)

Model Description

LFM2-VL-450M-jp is a Japanese fine-tuned variant of LiquidAI/LFM2-VL-450M, optimized for Japanese vision-language tasks. It keeps the efficiency and low latency of the original LFM2-VL architecture while specializing in Japanese language understanding and image description.

Developed by: Alfaxad
Base Model: LiquidAI/LFM2-VL-450M
Model type: Vision-Language Model (Multimodal)
Language: Japanese (日本語)
License: LFM Open License v1.0
Finetuned from: LiquidAI/LFM2-VL-450M (450M parameters)

Key Features

Japanese Language Support: Specialized for Japanese image understanding and description tasks
Efficient Architecture: Keeps the base model's compact size of roughly 450M parameters (350M language model + 86M vision encoder)
Low Latency: Optimized for edge AI applications and resource-constrained environments
Multi-turn Conversations: Trained on conversational data for interactive vision-language tasks
Native Resolution Processing: Handles images up to 512×512 pixels without upscaling

Model Details

Property	Value
Parameters (LM only)	350M
Vision encoder	SigLIP2 NaFlex base (86M)
Backbone layers	hybrid conv+attention
Context (text)	32,768 tokens
Image tokens	dynamic, user-tunable
Vocab size	65,536
Precision	bfloat16

Training Data

The model was fine-tuned on approximately 98,000 multi-turn conversational samples from:

Dataset: llm-jp/ja-vg-vqa-conversation
Content: Japanese visual question-answering conversations
Format: Multi-turn dialogues with image context

Intended Use

Primary Use Cases

Japanese image captioning and description
Visual question answering in Japanese
Multi-turn conversations about images in Japanese
Japanese document understanding and OCR tasks
Edge AI applications requiring Japanese language support

Recommended Applications

Japanese e-commerce product description
Japanese accessibility tools for visual content
Japanese educational applications
Japanese content moderation and analysis
Japanese chatbots with visual understanding

How to Use

Installation

pip install -U transformers pillow

Basic Usage

from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

# Load model and processor
model_id = "Alfaxad/LFM2-VL-450M-jp"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load image and create conversation in Japanese
image = load_image("your_image_url_or_path.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "この画像には何が写っていますか？"},
        ],
    },
]

# Generate response
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

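outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens; generate() returns the prompt tokens followed by the reply
prompt_length = inputs["input_ids"].shape[1]
response = processor.batch_decode(outputs[:, prompt_length:], skip_special_tokens=True)[0]
print(response)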
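
Multi-turn Usage (sketch)

Because the model was trained on multi-turn conversations, a follow-up question can be asked by appending the model's reply and a new user turn to the same conversation list, then calling apply_chat_template and generate again. The helper below is a minimal sketch of that bookkeeping. `extend_conversation` is a hypothetical name, not part of transformers; the reply text is a placeholder rather than real model output; and attaching the image only to the first user turn is an assumption.

```python
def extend_conversation(conversation, assistant_reply, follow_up):
    """Return a new conversation with the reply and a follow-up question appended.

    Hypothetical helper for illustration only; not part of transformers.
    """
    extended = list(conversation)  # shallow copy, so the caller's list is not mutated
    extended.append(
        {"role": "assistant", "content": [{"type": "text", "text": assistant_reply}]}
    )
    extended.append(
        {"role": "user", "content": [{"type": "text", "text": follow_up}]}
    )
    return extended


# A string stands in for the PIL image so the sketch runs without an image file.
first_turn = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "<PIL image goes here>"},
            {"type": "text", "text": "この画像には何が写っていますか？"},
        ],
    },
]

# Placeholder reply (assumption), followed by a second question about the same image.
conversation = extend_conversation(
    first_turn,
    assistant_reply="公園で犬が走っています。",
    follow_up="犬は何色ですか？",
)
print([message["role"] for message in conversation])
```

The extended list can be passed to processor.apply_chat_template in place of the single-turn conversation.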

Recommended Generation Parameters

temperature: 0.1
min_p: 0.15
repetition_penalty: 1.05
min_image_tokens: 64
max_image_tokens: 256
do_image_splitting: True
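
The sampling settings above can be passed to generate() as keyword arguments. The sketch below is one way to wire them in, not the author's confirmed setup. It adds do_sample=True on the assumption that sampling is intended, since temperature and min_p have no effect under greedy decoding. The three image settings look like image-processor options; this card does not say where they are passed, so they are only collected in a dictionary here. `generate_reply` is a hypothetical helper name.

```python
# Sampling settings from the list above, as keyword arguments for model.generate().
GENERATION_KWARGS = {
    "do_sample": True,           # assumption: needed for temperature/min_p to apply
    "temperature": 0.1,
    "min_p": 0.15,
    "repetition_penalty": 1.05,
    "max_new_tokens": 128,       # taken from the basic usage example, not a recommendation
}

# Image-side settings from the same list. How they are passed to the processor
# is not stated in this card, so they are NOT wired into the call below.
IMAGE_SETTINGS = {
    "min_image_tokens": 64,
    "max_image_tokens": 256,
    "do_image_splitting": True,
}


def generate_reply(model, processor, conversation):
    """Generate one reply using the recommended sampling settings (sketch)."""
    inputs = processor.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
        tokenize=True,
    ).to(model.device)
    outputs = model.generate(**inputs, **GENERATION_KWARGS)
    # Drop the prompt tokens so only the newly generated reply is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    return processor.batch_decode(
        outputs[:, prompt_length:], skip_special_tokens=True
    )[0]
```

With the model, processor, and conversation from Basic Usage, this would be called as generate_reply(model, processor, conversation).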

Chat Template

The model uses a ChatML-like format:

<|startoftext|><|im_start|>system
あなたはLiquid AIによる有用なマルチモーダルアシスタントです。<|im_end|>
<|im_start|>user
<image>この画像を説明してください。<|im_end|>
<|im_start|>assistant
この画像には...<|im_end|>
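
To make the layout above concrete, the following sketch renders a list of plain-text messages into the same ChatML-like string. It is an illustration only: `render_chatml` is a hypothetical helper, and the exact whitespace and image-placeholder handling of the real template are assumptions. In practice, processor.apply_chat_template should be used, since the template ships with the model.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render plain-text messages in the ChatML-like layout shown above.

    Hypothetical helper for illustration; the real template ships with the
    processor and also takes care of image placeholders.
    """
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>"
        for message in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "<|startoftext|>" + "\n".join(parts)


messages = [
    {"role": "system", "content": "あなたはLiquid AIによる有用なマルチモーダルアシスタントです。"},
    {"role": "user", "content": "<image>この画像を説明してください。"},
]
prompt = render_chatml(messages, add_generation_prompt=True)
print(prompt)
```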

Training Details

Training Procedure

Base Model: LiquidAI/LFM2-VL-450M
Fine-tuning Method: Supervised Fine-Tuning (SFT) with LoRA adapters
Framework: Hugging Face TRL (Transformer Reinforcement Learning)
Training Data: ~98,000 multi-turn conversations
Training Regime: bfloat16 mixed precision

Training Hyperparameters

Training approach: LoRA (Low-Rank Adaptation) fine-tuning
Dataset size: ~98,000 samples
Data format: Multi-turn conversational VQA
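
For context on why LoRA suits a model of this size: LoRA keeps each adapted weight matrix W (d_out × d_in) frozen and trains two low-rank factors, B (d_out × r) and A (r × d_in), which adds r × (d_in + d_out) trainable parameters per matrix. This card does not state the rank or the target modules, so the dimensions in the sketch below are hypothetical and chosen only to show the arithmetic.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank); W itself stays frozen.
    return rank * d_in + d_out * rank


# Hypothetical example: a 1024 x 1024 projection adapted with rank 16.
# Neither number comes from this model card.
full_params = 1024 * 1024
added_params = lora_trainable_params(d_in=1024, d_out=1024, rank=16)
print(f"full matrix: {full_params}, LoRA factors: {added_params}")
```

Under these assumed dimensions, the adapter trains a small fraction of the parameters that full fine-tuning of the same matrix would update.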

Performance Considerations

As a fine-tuned variant of LFM2-VL-450M:

Optimized for Japanese: Best performance on Japanese language tasks
Resource Efficient: Suitable for edge devices and constrained environments
Recommended Use: Fine-tune further on specific Japanese use cases for optimal performance

Note: This is a specialized model for Japanese. For English tasks, consider using the original LiquidAI/LFM2-VL-450M.

Limitations

Language Specialization: Primarily designed for Japanese; performance on other languages may be limited
Model Size: As a 450M parameter model, it may not match the capabilities of larger models on complex reasoning tasks
Domain Specificity: Performance is optimized for the types of conversations present in the training data
Safety: Not intended for safety-critical decisions without additional validation
Narrow Use Cases: Best results when fine-tuned on specific downstream tasks

Ethical Considerations

Bias: The model may reflect biases present in the training data (ja-vg-vqa-conversation dataset)
Misuse Potential: Should not be used for generating misleading or harmful content
Privacy: Do not process images containing sensitive personal information without appropriate consent
Cultural Context: Trained on Japanese data; cultural nuances should be considered

Citation

If you use this model, please cite both the original LFM2-VL model and this fine-tuned variant:

@misc{lfm2-vl-450m-jp,
  author = {Alfaxad},
  title = {LFM2-VL-450M-jp: Japanese Fine-tuned Vision-Language Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Alfaxad/LFM2-VL-450M-jp}
}

@misc{liquid-lfm2-vl,
  author = {Liquid AI},
  title = {LFM2-VL: Efficient Vision-Language Models},
  year = {2025},
  url = {https://huggingface.co/LiquidAI/LFM2-VL-450M}
}

Acknowledgments

Base Model: Liquid AI for the LFM2-VL architecture
Training Data: llm-jp for the ja-vg-vqa-conversation dataset
Framework: Hugging Face for transformers and TRL libraries

Additional Resources

Original Model: LiquidAI/LFM2-VL-450M
Training Dataset: llm-jp/ja-vg-vqa-conversation
LFM2-VL Blog Post: Liquid AI Blog
