Add pipeline tag

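This PR adds `pipeline_tag: any-to-any` to the model card metadata so that the model is discoverable through Hugging Face model search (https://huggingface.co/models?pipeline_tag=any-to-any). It also replaces the README's introductory background text with a link to the MAMMAL paper and its abstract, tags the usage example's code fence as `python`, and fixes the malformed link to the detailed example.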
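
To check locally that the tag is present, the README's YAML front matter can be inspected. The snippet below is a minimal sketch, not part of this PR: `read_front_matter` is a hypothetical helper that handles only flat `key: value` pairs and simple `- item` lists (a real YAML parser would be more robust), and `SAMPLE_README` is an abbreviated stand-in for the updated file.

```python
def read_front_matter(readme_text: str) -> dict:
    """Parse flat `key: value` pairs and `- item` lists from YAML front matter.

    Hypothetical helper for illustration only; it does not implement full YAML.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    metadata = {}
    current_list = None  # list currently being filled, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return metadata  # closing delimiter reached
        if stripped.startswith("- ") and current_list is not None:
            current_list.append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            value = value.strip()
            if value:
                metadata[key.strip()] = value
                current_list = None
            else:
                # A key with no inline value starts a list.
                current_list = []
                metadata[key.strip()] = current_list
    return {}  # front matter was never closed


# Abbreviated stand-in for the README metadata after this PR.
SAMPLE_README = """---
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-458m
library_name: biomed-multi-alignment
license: apache-2.0
tags:
- drug-discovery
- ibm
pipeline_tag: any-to-any
---
## Model Summary
"""

metadata = read_front_matter(SAMPLE_README)
print(metadata.get("pipeline_tag"))
```

The same check could be pointed at the real file by reading `README.md` from a local clone of the model repository.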

Files changed (1)

README.md +16 -26

README.md

@@ -1,4 +1,8 @@
 ---
 tags:
 - drug-discovery
 - ibm
@@ -9,32 +13,18 @@ tags:
 - affinity
 - safetensors
 - biomed-multi-alignment
-license: apache-2.0
-library_name: biomed-multi-alignment
-base_model:
-- ibm/biomed.omics.bl.sm.ma-ted-458m
 ---
-T-cell receptor (TCR) binding to immunogenic peptides (epitopes) presented by major
-histocompatibility complex (MHC) molecules is a critical mechanism in the adaptive
-immune system, essential for antigen recognition and triggering immune responses.
-The T-cell receptor (TCR) repertoire exhibits considerable diversity, consisting of an
-α-chain and a β-chain that function together to enable T cells to recognize a wide
-array of epitopes. The β-chain is especially significant, as it is crucial for the early
-stages of T-cell development and possesses greater variability, which enhances the
-TCR’s capacity to identify diverse pathogens effectively. However, understanding the
-specific interactions between TCRs and epitopes remains a significant challenge due
-to the vast variability in TCR sequences. Accurate prediction of TCR-peptide binding
-from sequence data could revolutionize immunology by offering deeper insights
-into a patient’s immune status and disease history. This capability holds potential
-applications in personalized immunotherapy, early diagnosis, and the treatment of
-diseases such as cancer and autoimmune disorders. In silico tools designed to model
-TCR-peptide interactions could also facilitate the study of therapeutic T-cell efficacy
-and assess cross-reactivity risks, presenting a transformative opportunity for precision
-medicine.
-The benchmark defined in: https://academic.oup.com/bioinformatics/article/37/Supplement_1/i237/6319659?login=false
-Data retrieved from: https://tdcommons.ai/multi_pred_tasks/tcrepitope
 ## Model Summary
@@ -54,7 +44,7 @@ pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git
 A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-458m.tcr_epitope_bind`:
-```
 from mammal.examples.tcr_epitope_binding.main_infer import load_model, task_infer
 tcr_beta_seq = "NAGVTQTPKFQVLKTGQSMTLQCAQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSWDRVLEQYFGPGTRLTVT"
@@ -70,7 +60,7 @@ result = task_infer(
 print(f"The prediction for {epitope_seq} and {tcr_beta_seq} is {result}")
 ```
-See our detailed example at: on `https://github.com/BiomedSciAI/biomed-multi-alignment`
 ## Citation

 ---
+base_model:
+- ibm/biomed.omics.bl.sm.ma-ted-458m
+library_name: biomed-multi-alignment
+license: apache-2.0
 tags:
 - drug-discovery
 - ibm
 - affinity
 - safetensors
 - biomed-multi-alignment
+pipeline_tag: any-to-any
 ---
+# Paper title and link
+The model was presented in the paper [MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language](https://huggingface.co/papers/2410.22367).
+# Paper abstract
+The abstract of the paper is the following:
+Drug discovery typically consists of multiple steps, including identifying a target protein key to a disease's etiology, validating that interacting with this target could prevent symptoms or cure the disease, discovering a small molecule or biologic therapeutic to interact with it, and optimizing the candidate molecule through a complex landscape of required properties. Drug discovery related tasks often involve prediction and generation while considering multiple entities that potentially interact, which poses a challenge for typical AI models. For this purpose we present MAMMAL - Molecular Aligned Multi-Modal Architecture and Language - a method that we applied to create a versatile multi-task multi-align foundation model that learns from large-scale biological datasets (2 billion samples) across diverse modalities, including proteins, small molecules, and genes. We introduce a prompt syntax that supports a wide range of classification, regression, and generation tasks. It allows combining different modalities and entity types as inputs and/or outputs. Our model handles combinations of tokens and scalars and enables the generation of small molecules and proteins, property prediction, and transcriptomic lab test predictions. We evaluated the model on 11 diverse downstream tasks spanning different steps within a typical drug discovery pipeline, where it reaches new SOTA in 9 tasks and is comparable to SOTA in 2 tasks. This performance is achieved while using a unified architecture serving all tasks, in contrast to the original SOTA performance achieved using tailored architectures. The model code and pretrained weights are publicly available at https://github.com/BiomedSciAI/biomed-multi-alignment and https://huggingface.co/ibm/biomed.omics.bl.sm.ma-ted-458m.
 ## Model Summary
 A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-458m.tcr_epitope_bind`:
+```python
 from mammal.examples.tcr_epitope_binding.main_infer import load_model, task_infer
 tcr_beta_seq = "NAGVTQTPKFQVLKTGQSMTLQCAQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSWDRVLEQYFGPGTRLTVT"
 print(f"The prediction for {epitope_seq} and {tcr_beta_seq} is {result}")
 ```
+See our detailed example at: https://github.com/BiomedSciAI/biomed-multi-alignment
 ## Citation