nielsr (HF Staff) committed
Commit a042447 · verified · 1 Parent(s): f8371c4

Add pipeline tag

This PR adds the `pipeline_tag: any-to-any` to the model card metadata, ensuring the model is discoverable through the Hugging Face model search functionality (https://huggingface.co/models?pipeline_tag=any-to-any).
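
As a quick sanity check for this kind of metadata change, the front matter can be read back and the new key inspected. The snippet below is a minimal sketch using only the standard library: `README_TEXT` is a hypothetical stand-in that mirrors the keys touched by this PR, and `read_front_matter` is an illustrative helper, not part of any Hugging Face library. It only understands flat `key: value` pairs and simple `- item` lists; a real check would more likely rely on a full YAML parser.

```python
import re

# Hypothetical README text mirroring the metadata keys touched by this PR.
README_TEXT = """---
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-458m
library_name: biomed-multi-alignment
license: apache-2.0
tags:
- drug-discovery
- ibm
pipeline_tag: any-to-any
---

## Model Summary
"""


def read_front_matter(text: str) -> dict:
    """Parse a flat YAML front-matter block (illustrative helper).

    Handles only `key: value` pairs and `key:` followed by `- item` lines.
    """
    match = re.match(r"^---\n(.*?)\n---\n", text, flags=re.DOTALL)
    if match is None:
        return {}
    metadata = {}
    current_key = None  # name of the list currently being filled, if any
    for line in match.group(1).splitlines():
        stripped = line.strip()
        if stripped.startswith("- ") and current_key is not None:
            metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                metadata[key] = value
                current_key = None
            else:
                metadata[key] = []
                current_key = key
    return metadata


metadata = read_front_matter(README_TEXT)
print(metadata.get("pipeline_tag"))
```

Running the same check against the pre-change front matter would yield `None` for `pipeline_tag`, which is the gap this PR closes.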

Files changed (1)
  1. README.md +16 -26
README.md CHANGED
@@ -1,4 +1,8 @@
  ---
+ base_model:
+ - ibm/biomed.omics.bl.sm.ma-ted-458m
+ library_name: biomed-multi-alignment
+ license: apache-2.0
  tags:
  - drug-discovery
  - ibm
@@ -9,32 +13,18 @@ tags:
  - affinity
  - safetensors
  - biomed-multi-alignment
- license: apache-2.0
- library_name: biomed-multi-alignment
- base_model:
- - ibm/biomed.omics.bl.sm.ma-ted-458m
+ pipeline_tag: any-to-any
  ---

- T-cell receptor (TCR) binding to immunogenic peptides (epitopes) presented by major
- histocompatibility complex (MHC) molecules is a critical mechanism in the adaptive
- immune system, essential for antigen recognition and triggering immune responses.
- The T-cell receptor (TCR) repertoire exhibits considerable diversity, consisting of an
- α-chain and a β-chain that function together to enable T cells to recognize a wide
- array of epitopes. The β-chain is especially significant, as it is crucial for the early
- stages of T-cell development and possesses greater variability, which enhances the
- TCR’s capacity to identify diverse pathogens effectively. However, understanding the
- specific interactions between TCRs and epitopes remains a significant challenge due
- to the vast variability in TCR sequences. Accurate prediction of TCR-peptide binding
- from sequence data could revolutionize immunology by offering deeper insights
- into a patient’s immune status and disease history. This capability holds potential
- applications in personalized immunotherapy, early diagnosis, and the treatment of
- diseases such as cancer and autoimmune disorders. In silico tools designed to model
- TCR-peptide interactions could also facilitate the study of therapeutic T-cell efficacy
- and assess cross-reactivity risks, presenting a transformative opportunity for precision
- medicine.
-
- The benchmark defined in: https://academic.oup.com/bioinformatics/article/37/Supplement_1/i237/6319659?login=false
- Data retrieved from: https://tdcommons.ai/multi_pred_tasks/tcrepitope
+ # Paper title and link
+
+ The model was presented in the paper [MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language](https://huggingface.co/papers/2410.22367).
+
+ # Paper abstract
+
+ The abstract of the paper is the following:
+
+ Drug discovery typically consists of multiple steps, including identifying a target protein key to a disease's etiology, validating that interacting with this target could prevent symptoms or cure the disease, discovering a small molecule or biologic therapeutic to interact with it, and optimizing the candidate molecule through a complex landscape of required properties. Drug discovery related tasks often involve prediction and generation while considering multiple entities that potentially interact, which poses a challenge for typical AI models. For this purpose we present MAMMAL - Molecular Aligned Multi-Modal Architecture and Language - a method that we applied to create a versatile multi-task multi-align foundation model that learns from large-scale biological datasets (2 billion samples) across diverse modalities, including proteins, small molecules, and genes. We introduce a prompt syntax that supports a wide range of classification, regression, and generation tasks. It allows combining different modalities and entity types as inputs and/or outputs. Our model handles combinations of tokens and scalars and enables the generation of small molecules and proteins, property prediction, and transcriptomic lab test predictions. We evaluated the model on 11 diverse downstream tasks spanning different steps within a typical drug discovery pipeline, where it reaches new SOTA in 9 tasks and is comparable to SOTA in 2 tasks. This performance is achieved while using a unified architecture serving all tasks, in contrast to the original SOTA performance achieved using tailored architectures. The model code and pretrained weights are publicly available at https://github.com/BiomedSciAI/biomed-multi-alignment and https://huggingface.co/ibm/biomed.omics.bl.sm.ma-ted-458m.

  ## Model Summary

@@ -54,7 +44,7 @@ pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git

  A simple example for a task already supported by `ibm/biomed.omics.bl.sm.ma-ted-458m.tcr_epitope_bind`:

- ```
+ ```python
  from mammal.examples.tcr_epitope_binding.main_infer import load_model, task_infer

  tcr_beta_seq = "NAGVTQTPKFQVLKTGQSMTLQCAQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSWDRVLEQYFGPGTRLTVT"
@@ -70,7 +60,7 @@ result = task_infer(
  print(f"The prediction for {epitope_seq} and {tcr_beta_seq} is {result}")
  ```

- See our detailed example at: on `https://github.com/BiomedSciAI/biomed-multi-alignment`
+ See our detailed example at: https://github.com/BiomedSciAI/biomed-multi-alignment


  ## Citation