JPharmatron-7B
JPharmatron-7B is a 7B large language model designed for pharmaceutical applications and research.
Model Details
Model Description
JPharmatron-7B is continually pre-trained on 8.8B tokens from Japanese and English datasets, starting from Qwen2.5-7B. Compared to the JPharmatron-7B-base model, JPharmatron-7B has enhanced chat capabilities, obtained by adding the chat vector extracted from Qwen2.5-7B-Instruct (a sketch of this merging step follows the list below).
- Developed by: EQUES Inc.
- Funded by: GENIAC Project
- Model type: Causal decoder-only
- Language(s) (NLP): Japanese, English
- License: CC-BY-SA-4.0
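The chat-vector step mentioned above can be illustrated roughly as follows. This is a minimal sketch, not the authors' released script: it assumes the continually pre-trained checkpoint is available locally as `JPharmatron-7B-base` and that the public Qwen2.5-7B and Qwen2.5-7B-Instruct checkpoints are used to extract the parameter delta.

```python
# Minimal sketch of a chat-vector merge; the local path and behavior are assumptions,
# not the authors' released recipe. All three checkpoints must share the same architecture.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", torch_dtype=torch.bfloat16)
instruct = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16)
domain = AutoModelForCausalLM.from_pretrained("JPharmatron-7B-base", torch_dtype=torch.bfloat16)

base_sd = base.state_dict()
instruct_sd = instruct.state_dict()

with torch.no_grad():
    for name, param in domain.named_parameters():
        # Chat vector = instruct weights minus base weights; adding it to the
        # domain-adapted model transfers chat ability without further fine-tuning.
        param.add_(instruct_sd[name] - base_sd[name])

domain.save_pretrained("JPharmatron-7B-merged")
```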
Model Sources
- Repository: https://github.com/EQUES-Inc/pharma-LLM-eval
- Paper: A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP (arXiv:2505.16661)
Uses
This model is intended for applications in pharmaceutical paperwork and research. It has not been validated for medical use or any other risk-sensitive use.
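A minimal inference sketch with Hugging Face transformers is shown below. The repository id and the example prompt are illustrative assumptions and should be checked against where the model is actually hosted.

```python
# Hedged usage sketch; the repo id below is an assumption, not a confirmed location.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EQUES/JPharmatron-7B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example question about drug nomenclature (Japanese): "Explain the difference
# between a drug's generic name and its brand name."
messages = [{"role": "user", "content": "医薬品の一般名と販売名の違いを説明してください。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```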
Evaluation
We evaluated JPharmatron-7B against other general and domain-specific models of similar size.
Testing Data
JPharmaBench (YakugakuQA, NayoseQA, and SogoCheck) and two existing benchmarks, JMMLU (pharma) and IgakuQA, were used.
Results
Compared to Meditron3-Qwen2.5-7B and Llama3.1-Swallow-8B-Instruct-v0.3, JPharmatron-7B achieved the highest scores on all five benchmarks.
Citation
BibTeX:
@misc{sukeda_japanese_2025,
title = {A {Japanese} {Language} {Model} and {Three} {New} {Evaluation} {Benchmarks} for {Pharmaceutical} {NLP}},
url = {http://arxiv.org/abs/2505.16661},
doi = {10.48550/arXiv.2505.16661},
abstract = {We present a Japanese domain-specific language model for the pharmaceutical field, developed through continual pretraining on 2 billion Japanese pharmaceutical tokens and 8 billion English biomedical tokens. To enable rigorous evaluation, we introduce three new benchmarks: YakugakuQA, based on national pharmacist licensing exams; NayoseQA, which tests cross-lingual synonym and terminology normalization; and SogoCheck, a novel task designed to assess consistency reasoning between paired statements. We evaluate our model against both open-source medical LLMs and commercial models, including GPT-4o. Results show that our domain-specific model outperforms existing open models and achieves competitive performance with commercial ones, particularly on terminology-heavy and knowledge-based tasks. Interestingly, even GPT-4o performs poorly on SogoCheck, suggesting that cross-sentence consistency reasoning remains an open challenge. Our benchmark suite offers a broader diagnostic lens for pharmaceutical NLP, covering factual recall, lexical variation, and logical consistency. This work demonstrates the feasibility of building practical, secure, and cost-effective language models for Japanese domain-specific applications, and provides reusable evaluation resources for future research in pharmaceutical and healthcare NLP. Our model, codes, and datasets are released at https://github.com/EQUES-Inc/pharma-LLM-eval.},
urldate = {2025-05-30},
publisher = {arXiv},
author = {Sukeda, Issey and Fujii, Takuro and Buma, Kosei and Sasaki, Shunsuke and Ono, Shinnosuke},
month = may,
year = {2025},
note = {arXiv:2505.16661 [cs]},
annote = {Comment: 15 pages, 9 tables, 5 figures}
}
More Information
See our preprint: A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP.