---
license: mit
datasets:
- vector-institute/open-pmc
metrics:
- accuracy
- f1
- recall
---

<div align="center">
  <img src="https://github.com/VectorInstitute/pmc-data-extraction/blob/0a969136344a07267bb558d01f3fe76b36b93e1a/media/open-pmc-pipeline.png?raw=true"
       alt="Open-PMC Pipeline"
       width="1000" />
</div>

<p align="center">
  <strong>Paper:</strong> <a href="http://arxiv.org/abs/2503.14377" target="_blank">arXiv</a>
  |
  <strong>Code:</strong> <a href="https://github.com/VectorInstitute/pmc-data-extraction" target="_blank">Open-PMC GitHub</a>
  |
  <strong>Dataset:</strong> <a href="https://huggingface.co/datasets/vector-institute/open-pmc" target="_blank">Hugging Face</a>
</p>

## Model Overview

This model is a checkpoint trained on the **Open-PMC** dataset. It uses a **Vision Transformer (ViT-B/16)** backbone to extract visual features and **PubMedBERT** to encode text. The model is trained with **contrastive learning**, using the **vanilla Info-NCE loss** to align image and text representations in a shared embedding space.

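For reference, here is a minimal sketch of the symmetric Info-NCE objective described above. The temperature value is illustrative, not the setting used in training.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(image_embeds: torch.Tensor,
                  text_embeds: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Vanilla Info-NCE over a batch of paired image/text embeddings.

    Row i of each tensor is a matched pair (positive); every other
    in-batch combination serves as a negative.
    """
    # Normalize so the dot product equals cosine similarity.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Pairwise similarity matrix of shape (batch, batch).
    logits = image_embeds @ text_embeds.t() / temperature

    # The i-th image matches the i-th text, so targets lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```
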
## Model Architecture

- **Vision Backbone**: ViT-B/16 (pretrained on ImageNet)
- **Text Backbone**: PubMedBERT (pretrained on PubMed Central abstracts)
- **Training Objective**: Contrastive learning with the **Info-NCE loss**

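The sketch below shows one way these pieces can be wired together as a dual encoder. The Hugging Face model IDs, projection dimension, and [CLS]-token pooling are illustrative assumptions; see the Open-PMC repository for the exact configuration.

```python
import timm
import torch
import torch.nn as nn
from transformers import AutoModel

class BiomedicalCLIP(nn.Module):
    """Illustrative dual encoder pairing ViT-B/16 with PubMedBERT."""

    def __init__(self, proj_dim: int = 512):  # proj_dim is an assumption
        super().__init__()
        # ViT-B/16 vision backbone; num_classes=0 yields pooled features.
        self.vision = timm.create_model(
            "vit_base_patch16_224", pretrained=True, num_classes=0
        )
        # PubMedBERT text backbone (abstracts variant).
        self.text = AutoModel.from_pretrained(
            "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"
        )
        # Linear heads projecting both modalities into a shared space.
        self.vision_proj = nn.Linear(self.vision.num_features, proj_dim)
        self.text_proj = nn.Linear(self.text.config.hidden_size, proj_dim)

    def encode_image(self, pixel_values: torch.Tensor) -> torch.Tensor:
        return self.vision_proj(self.vision(pixel_values))

    def encode_text(self, input_ids, attention_mask) -> torch.Tensor:
        out = self.text(input_ids=input_ids, attention_mask=attention_mask)
        # Pool with the [CLS] token representation.
        return self.text_proj(out.last_hidden_state[:, 0])
```
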
## Training Framework

The model was trained using the **mmlearn** framework, which is designed for multimodal learning. More information and the framework itself are available [here](https://github.com/VectorInstitute/mmlearn).

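Conceptually, a single training step combines the two sketches above as follows. mmlearn wraps this logic in its own task and data abstractions, so this is only an illustration; the optimizer choice and learning rate are placeholders.

```python
import torch

# Hypothetical wiring of the illustrative pieces defined above.
model = BiomedicalCLIP()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # placeholder lr

def training_step(images, input_ids, attention_mask):
    """One contrastive update on a batch of paired images and captions."""
    image_embeds = model.encode_image(images)
    text_embeds = model.encode_text(input_ids, attention_mask)
    loss = info_nce_loss(image_embeds, text_embeds)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
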
## How to Use

Please visit our [GitHub repository](https://github.com/VectorInstitute/pmc-data-extraction) for instructions on how to run benchmarks with this checkpoint.

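In the meantime, the sketch below shows how a dual-encoder checkpoint of this kind is typically used for image-text matching. It reuses the illustrative `BiomedicalCLIP` class from above; the checkpoint filename and tokenizer ID are placeholders, not an official API.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer

# Placeholder path; reuses the illustrative BiomedicalCLIP class above.
model = BiomedicalCLIP()
state = torch.load("open_pmc_checkpoint.pt", map_location="cpu")
model.load_state_dict(state, strict=False)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"
)
captions = ["chest X-ray showing pneumonia", "histology slide of liver tissue"]
batch = tokenizer(captions, padding=True, return_tensors="pt")

with torch.no_grad():
    text_embeds = model.encode_text(batch["input_ids"], batch["attention_mask"])
    # Stand-in for a batch of preprocessed 224x224 images.
    image_embeds = model.encode_image(torch.randn(2, 3, 224, 224))
    # Cosine similarity between every image and every caption.
    sims = F.normalize(image_embeds, dim=-1) @ F.normalize(text_embeds, dim=-1).t()

print(sims)
```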