Add pipeline tag
This PR adds `pipeline_tag: any-to-any` to the model card metadata so that the model is discoverable through the Hugging Face model search (https://huggingface.co/models?pipeline_tag=any-to-any).
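As a rough illustration of the metadata change described above, the sketch below reads a model card's front matter and builds the matching search URL. It makes several assumptions: `README_TEXT` is an abbreviated stand-in for the updated README.md (the tag list is shortened), and `read_front_matter` and `search_url` are hypothetical helpers, not part of any Hugging Face library. The parser handles only the flat `key: value` and `- item` subset of YAML used here; a real YAML parser would be the more robust choice.

```python
from urllib.parse import urlencode

# Abbreviated, hypothetical stand-in for the top of README.md after this PR.
# The tag list is shortened for illustration.
README_TEXT = """---
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-458m
library_name: biomed-multi-alignment
license: apache-2.0
tags:
- drug-discovery
- ibm
pipeline_tag: any-to-any
---

## Model Summary
"""


def read_front_matter(text):
    """Parse the flat `key: value` / `- item` subset of YAML used above."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no metadata block at the top of the file
    metadata = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter of the metadata block
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            if current_key is not None:
                metadata[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            metadata[key] = value
            current_key = None
        else:
            metadata[key] = []  # a list of `- item` lines follows
            current_key = key
    return metadata


def search_url(pipeline_tag):
    """Build the model search URL for a given pipeline tag."""
    return "https://huggingface.co/models?" + urlencode({"pipeline_tag": pipeline_tag})


metadata = read_front_matter(README_TEXT)
print(metadata["pipeline_tag"])
print(search_url(metadata["pipeline_tag"]))
```

The URL printed at the end has the same form as the search link in the PR description.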
README.md
CHANGED
|
@@ -1,4 +1,8 @@
|
|
| 1 |
---
| 2 |
tags:
|
| 3 |
- drug-discovery
|
| 4 |
- ibm
|
|
@@ -9,32 +13,18 @@ tags:
|
|
| 9 |
- affinity
|
| 10 |
- safetensors
|
| 11 |
- biomed-multi-alignment
|
| 12 |
-
|
| 13 |
-
library_name: biomed-multi-alignment
|
| 14 |
-
base_model:
|
| 15 |
-
- ibm/biomed.omics.bl.sm.ma-ted-458m
|
| 16 |
---
|
| 17 |
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
to the vast variability in TCR sequences. Accurate prediction of TCR-peptide binding
|
| 28 |
-
from sequence data could revolutionize immunology by offering deeper insights
|
| 29 |
-
into a patient’s immune status and disease history. This capability holds potential
|
| 30 |
-
applications in personalized immunotherapy, early diagnosis, and the treatment of
|
| 31 |
-
diseases such as cancer and autoimmune disorders. In silico tools designed to model
|
| 32 |
-
TCR-peptide interactions could also facilitate the study of therapeutic T-cell efficacy
|
| 33 |
-
and assess cross-reactivity risks, presenting a transformative opportunity for precision
|
| 34 |
-
medicine.
|
| 35 |
-
|
| 36 |
-
The benchmark defined in: https://academic.oup.com/bioinformatics/article/37/Supplement_1/i237/6319659?login=false
|
| 37 |
-
Data retrieved from: https://tdcommons.ai/multi_pred_tasks/tcrepitope
|
| 38 |
|
| 39 |
## Model Summary
|
| 40 |
|
|
@@ -54,7 +44,7 @@ pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git
|
|
| 54 |
|
| 55 |
A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-458m.tcr_epitope_bind`:
|
| 56 |
|
| 57 |
-
```
|
| 58 |
from mammal.examples.tcr_epitope_binding.main_infer import load_model, task_infer
|
| 59 |
|
| 60 |
tcr_beta_seq = "NAGVTQTPKFQVLKTGQSMTLQCAQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSWDRVLEQYFGPGTRLTVT"
|
|
@@ -70,7 +60,7 @@ result = task_infer(
|
|
| 70 |
print(f"The prediction for {epitope_seq} and {tcr_beta_seq} is {result}")
|
| 71 |
```
|
| 72 |
|
| 73 |
-
See our detailed example at:
|
| 74 |
|
| 75 |
|
| 76 |
## Citation
|
|
|
|
| 1 |
---
|
| 2 |
+
base_model:
|
| 3 |
+
- ibm/biomed.omics.bl.sm.ma-ted-458m
|
| 4 |
+
library_name: biomed-multi-alignment
|
| 5 |
+
license: apache-2.0
|
| 6 |
tags:
|
| 7 |
- drug-discovery
|
| 8 |
- ibm
|
|
|
|
| 13 |
- affinity
|
| 14 |
- safetensors
|
| 15 |
- biomed-multi-alignment
|
| 16 |
+
pipeline_tag: any-to-any
|
| 17 |
---
|
| 18 |
|
| 19 |
+
# Paper title and link
|
| 20 |
+
|
| 21 |
+
The model was presented in the paper [MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language](https://huggingface.co/papers/2410.22367).
|
| 22 |
+
|
| 23 |
+
# Paper abstract
|
| 24 |
+
|
| 25 |
+
The abstract of the paper is the following:
|
| 26 |
+
|
| 27 |
+
Drug discovery typically consists of multiple steps, including identifying a target protein key to a disease's etiology, validating that interacting with this target could prevent symptoms or cure the disease, discovering a small molecule or biologic therapeutic to interact with it, and optimizing the candidate molecule through a complex landscape of required properties. Drug discovery related tasks often involve prediction and generation while considering multiple entities that potentially interact, which poses a challenge for typical AI models. For this purpose we present MAMMAL - Molecular Aligned Multi-Modal Architecture and Language - a method that we applied to create a versatile multi-task multi-align foundation model that learns from large-scale biological datasets (2 billion samples) across diverse modalities, including proteins, small molecules, and genes. We introduce a prompt syntax that supports a wide range of classification, regression, and generation tasks. It allows combining different modalities and entity types as inputs and/or outputs. Our model handles combinations of tokens and scalars and enables the generation of small molecules and proteins, property prediction, and transcriptomic lab test predictions. We evaluated the model on 11 diverse downstream tasks spanning different steps within a typical drug discovery pipeline, where it reaches new SOTA in 9 tasks and is comparable to SOTA in 2 tasks. This performance is achieved while using a unified architecture serving all tasks, in contrast to the original SOTA performance achieved using tailored architectures. The model code and pretrained weights are publicly available at https://github.com/BiomedSciAI/biomed-multi-alignment and https://huggingface.co/ibm/biomed.omics.bl.sm.ma-ted-458m.
|
| 28 |
|
| 29 |
## Model Summary
|
| 30 |
|
|
|
|
| 44 |
|
| 45 |
A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-458m.tcr_epitope_bind`:
|
| 46 |
|
| 47 |
+
```python
|
| 48 |
from mammal.examples.tcr_epitope_binding.main_infer import load_model, task_infer
|
| 49 |
|
| 50 |
tcr_beta_seq = "NAGVTQTPKFQVLKTGQSMTLQCAQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSWDRVLEQYFGPGTRLTVT"
|
|
|
|
| 60 |
print(f"The prediction for {epitope_seq} and {tcr_beta_seq} is {result}")
|
| 61 |
```
|
| 62 |
|
| 63 |
+
See our detailed example at: https://github.com/BiomedSciAI/biomed-multi-alignment
|
| 64 |
|
| 65 |
|
| 66 |
## Citation
|