I want to design a "Certificate of Experience", basically an internship certificate for 8 weeks, for the person whose internship report follows: "Application of Graph Neural Networks in Drug Discovery

Internship Report by Sonia Bara
Supervisor: Vatsal Pravinbhai Patel, Senior Research Engineer and Manager, HawkFranklin Research
July 2025

Acknowledgements

I would like to express my heartfelt gratitude to my advisor, Mr Vatsal Patel, for his invaluable guidance, encouragement, and support throughout the course of this internship. His expertise and insights have been instrumental in helping me navigate challenges and refine my ideas. This internship project would not have been possible without his mentorship, for which I am deeply appreciative.

Contents

1 Introduction
  1.1 Background
    1.1.1 Emergence of Graph Neural Networks (GNNs) as state-of-the-art tools for molecular representation
    1.1.2 Relevance of combination therapy in pancreatic ductal adenocarcinoma (PDAC)
  1.2 Objective
2 Literature Review
  2.1 Graph Neural Networks (GNNs)
  2.2 Drug Synergy Metrics
  2.3 MIT ComboNet Architecture
3 Application of GCN: MIT model
  3.0.1 Data Pipeline
  3.0.2 Model Configuration
  3.0.3 Evaluation Metrics
  3.0.4 Findings and Current Progress
  3.0.5 Interim Baseline Results under GPU Constraints
  3.0.6 Comparison with NCATS and UNC Pipelines
4 Future Directions and Enhancements
References

Chapter 1: Introduction

Cancer remains one of the leading causes of mortality because effective single-agent therapies are rare and resistance evolves quickly; consequently, clinicians increasingly rely on synergistic drug combinations. Exhaustively testing the millions of possible pairings in wet-lab screens is prohibitively slow and expensive. By representing molecules as graphs and applying Graph Neural Networks (GNNs), modern machine-learning models can learn the structural rules that govern synergy and rapidly prioritise the most promising combinations. This research therefore seeks to harness GNN-based prediction to streamline combination-drug discovery in oncology, with pancreatic cancer as the primary use case.

1.1 Background

1.1.1 Emergence of Graph Neural Networks (GNNs) as state-of-the-art tools for molecular representation

Molecules can be viewed as graphs in which atoms are nodes and bonds are edges. Graph Neural Networks pass "messages" along these edges so that each atom embedding captures its full chemical environment. This learned, task-specific representation has been shown to outperform traditional fingerprints on most property-prediction benchmarks because it can model stereochemistry, conjugation and long-range interactions automatically. Because the same set of weights is reused for every edge, GNNs also generalise well when labelled data are scarce, a frequent situation in drug discovery. These advantages make GNNs the current state-of-the-art starting point for any machine-learning pipeline that reasons about small molecules.
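To make the molecules-as-graphs view concrete, here is a minimal sketch that turns a SMILES string into node features and edge pairs. It assumes RDKit is available; the function name and the particular feature choices are ours for illustration, not from the report's codebase.

```python
from rdkit import Chem

def smiles_to_graph(smiles: str):
    """Convert a SMILES string into a simple (nodes, edges) graph.

    Nodes carry minimal atom features; edges are bond index pairs.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # One feature tuple per atom (node): atomic number, degree, aromaticity.
    nodes = [
        (atom.GetAtomicNum(), atom.GetDegree(), int(atom.GetIsAromatic()))
        for atom in mol.GetAtoms()
    ]
    # One (i, j) pair per bond (edge); undirected, so store both directions.
    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [(i, j), (j, i)]
    return nodes, edges

# Aspirin as a small worked example.
nodes, edges = smiles_to_graph("CC(=O)OC1=CC=CC=C1C(=O)O")
print(len(nodes), "atoms,", len(edges) // 2, "bonds")
```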
1.1.2 Relevance of combination therapy in pancreatic ductal adenocarcinoma (PDAC)

PDAC is one of the deadliest cancers, with a 5-year survival rate below 10% [6]. Single-agent drugs seldom produce durable responses because tumours quickly activate resistance pathways. Combining two or more agents can block these escape routes, lower the required dose of each drug and reduce systemic toxicity. However, the number of possible pairs among even a modest library of 1,000 compounds exceeds half a million, making exhaustive wet-lab testing impractical. Accurate in-silico synergy prediction therefore offers a cost-effective way to prioritise the small subset of combinations most likely to succeed in preclinical and clinical studies for PDAC.

1.2 Objective

This study aims to accomplish the following tasks:

1. Replication: re-implement and execute MIT's GCN-Chemprop pipeline on the full dataset comprising 47,000 single-agent assays and 4,000 drug-combination assays.

2. Validation: benchmark the reproduced model by comparing its area under the ROC curve (target: AUC ≈ 0.84) against the metrics reported in the reference study.

3. Code Audit: document the integration of graph-convolution layers within Chemprop, paying particular attention to
   - the drug synergy metrics used;
   - the multi-task loss that couples single-agent activity with combination-synergy prediction.

Chapter 2: Literature Review

2.1 Graph Neural Networks (GNNs)

Graph-structured data are non-Euclidean: distances are encoded by an adjacency matrix rather than Cartesian coordinates. Classical machine-learning models, which expect vectors or grids, cannot natively exploit this relational information. Graph Neural Networks (GNNs) fill this gap by learning directly on graphs whose nodes carry feature vectors and whose edges encode pairwise interactions.

Message-Passing Framework. All contemporary GNN variants can be expressed by the generic message-passing recursion

    h_u^{(k+1)} = \mathrm{Update}^{(k)}\left( h_u^{(k)},\ \mathrm{Aggregate}^{(k)}\left( \{ h_v^{(k)} : v \in \mathcal{N}(u) \} \right) \right),

where
- h_u^{(k)} is the embedding of node u after k layers;
- \mathcal{N}(u) denotes the neighbours of u given by the adjacency matrix;
- Aggregate collects the messages from the neighbours;
- Update combines the aggregate with the node's current state to yield the new embedding of node u.

Different GNN families (e.g. GraphSAGE, GAT, D-MPNN) differ mainly in their choices of Aggregate (mean, attention-weighted sum, edge-conditioned sum, ...) and Update (linear, GRU, MLP, ...) functions. After T layers every node has integrated signals from its T-hop neighbourhood, yielding a permutation-invariant embedding h_u^{(T)} suitable for tasks such as node classification, link prediction or whole-graph classification.

Example: Graph Convolutional Network (GCN). The spectral GCN of Kipf & Welling [3] instantiates

    \mathrm{Aggregate}^{(k)}\left( \{ h_v^{(k)} \} \right) = \sum_{v \in \mathcal{N}(u)} \tilde{A}_{uv}\, h_v^{(k)}, \qquad \mathrm{Update}^{(k)}\left( h_u^{(k)}, m_u^{(k)} \right) = \sigma\left( W^{(k)} m_u^{(k)} \right),

with a renormalised adjacency \tilde{A} = D^{-1/2}(A + I)D^{-1/2} and trainable weight matrix W^{(k)}.

Consider the example graph in Figure 2.1. (a) In the first layer, node 1 aggregates features only from its immediate blue neighbours (nodes 2, 3, 4) and itself, updating its embedding. (b) Passing the resulting embeddings through a second layer lets node 1 absorb information that originally resided two hops away (e.g. the green node 5). This hierarchical receptive field mirrors how a learnable convolutional kernel in a CNN grows its spatial context by stacking layers, only here the "pixels" are irregularly connected atoms.

[Figure 2.1: Two-layer GCN: successive neighbourhood aggregation for node 1.]
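As a worked illustration of the renormalised-adjacency update above, the following PyTorch sketch implements a single GCN layer on a toy four-node graph. It is an expository sketch, not the Chemprop implementation; shapes and weights are arbitrary.

```python
import torch

def gcn_layer(H, A, W):
    """One GCN layer: H' = ReLU(Ã H W), with Ã = D^{-1/2}(A + I)D^{-1/2}.

    H: (n, d_in) node features; A: (n, n) adjacency; W: (d_in, d_out) weights.
    """
    n = A.shape[0]
    A_hat = A + torch.eye(n)                    # add self-loops
    deg = A_hat.sum(dim=1)                      # node degrees (>= 1)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))      # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # renormalised adjacency Ã
    return torch.relu(A_norm @ H @ W)

# Toy graph: node 1 (index 0) connected to nodes 2-4, as in Figure 2.1.
A = torch.tensor([[0., 1., 1., 1.],
                  [1., 0., 0., 0.],
                  [1., 0., 0., 0.],
                  [1., 0., 0., 0.]])
H = torch.randn(4, 8)                           # random 8-dim node features
W1, W2 = torch.randn(8, 16), torch.randn(16, 16)
H1 = gcn_layer(H, A, W1)                        # 1-hop receptive field
H2 = gcn_layer(H1, A, W2)                       # 2-hop context after stacking
```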
Why GNNs Matter for Molecular Modelling

- Expressivity: they learn chemically aware features that encode stereochemistry, conjugation and distal functional-group interactions better than fixed fingerprints.
- Data efficiency: weight sharing over edges reduces the number of parameters and mitigates overfitting when labelled assays are scarce.
- Extensibility: edge and global graph descriptors allow seamless fusion with physicochemical, proteomic or genomic covariates, which is crucial for multi-modal drug-cancer synergy prediction.

Chemprop's Directed Message-Passing Neural Network (D-MPNN), the encoder inside MIT's ComboNet pipeline, leverages these benefits.

2.2 Drug Synergy Metrics

Definition of Drug Synergy. A drug pair is said to be synergistic when its combined therapeutic effect exceeds the effect expected from the two agents acting independently. If the observed combined effect equals the expectation the interaction is termed additive, while a lower-than-expected effect indicates antagonism.

Bliss Independence Assumption [1].

(a) Single-drug effects: E_A is the fraction of cells killed by drug A at dose x, and E_B the fraction killed by drug B at dose y. (Example: E_A = 0.30 means 30% of cells die and 70% survive.)

(b) Independent survival probabilities:

    P(\text{survive } A) = 1 - E_A, \quad P(\text{survive } B) = 1 - E_B, \quad \text{hence} \quad P(\text{survive both}) = (1 - E_A)(1 - E_B).

(c) Expected combined inhibition (the Bliss equation):

    E_{AB}^{\mathrm{Bliss}} = 1 - (1 - E_A)(1 - E_B) = E_A + E_B - E_A E_B.

(d) Interpretation: E_{AB}^{\mathrm{obs}} > E_{AB}^{\mathrm{Bliss}} indicates synergy; E_{AB}^{\mathrm{obs}} = E_{AB}^{\mathrm{Bliss}} indicates additivity; E_{AB}^{\mathrm{obs}} < E_{AB}^{\mathrm{Bliss}} indicates antagonism.

(e) Intuitive derivation: drug A removes a fraction E_A; drug B then acts only on the survivors, contributing E_B(1 - E_A). Summing the two terms reproduces the Bliss expectation: E_A + E_B(1 - E_A) = E_A + E_B - E_A E_B.

Bespoke Synergy Score Based on Bliss Independence. To evaluate the synergistic effect of a drug combination (A, B), we first predict the individual inhibition scores of each drug, s(A) and s(B). Under the Bliss independence assumption, the expected combined inhibition is

    s(AB) = s(A) + s(B) - s(A) \cdot s(B).

Denoting the actual predicted inhibition score of the combination as c(AB), the bespoke synergy score is defined as

    \mathrm{Synergy}(A, B) = c(AB) - s(AB).

A positive synergy score indicates that the combination performs better than expected under the assumption of independent action, suggesting a synergistic interaction; higher values reflect stronger synergy.
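The Bliss bookkeeping above reduces to two one-line functions. This plain-Python sketch (function names are ours, for illustration) reproduces the worked example from step (a):

```python
def bliss_expected(e_a: float, e_b: float) -> float:
    """Expected combined inhibition under Bliss independence."""
    return e_a + e_b - e_a * e_b

def synergy_score(c_ab: float, e_a: float, e_b: float) -> float:
    """Observed (or predicted) combination effect minus the Bliss expectation."""
    return c_ab - bliss_expected(e_a, e_b)

# Example: drug A kills 30% of cells, drug B kills 40%.
expected = bliss_expected(0.30, 0.40)   # 0.30 + 0.40 - 0.12 = 0.58
print(synergy_score(0.70, 0.30, 0.40))  # 0.70 - 0.58 = 0.12 > 0: synergistic
```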
2.3 MIT ComboNet Architecture

The ComboNet [2] model leverages the Chemprop framework (a message-passing neural-network library for molecular property prediction) to extract graph-based molecular features using a graph convolutional network (GCN), which are then used for the drug-synergy prediction task.

1) Model Overview

- Architecture. ComboNet consists of two primary components: (i) a Drug-Target Interaction (DTI) module, which encodes molecular features and predicts target-binding probabilities, and (ii) a Target-Disease Association module, which maps these molecular representations to task-specific outcomes such as single-drug activity or drug-pair synergy.
- Molecular Encoding via D-MPNN. Each drug is represented as a molecular graph, where atoms are nodes and bonds are directed edges. A Directed Message-Passing Neural Network (D-MPNN) propagates information along edges for T steps. At each step, edge embeddings are updated and aggregated to refine node (atom) states. A final readout operation (e.g., sum-pooling) condenses the per-atom representations into a fixed-dimensional vector h_drug, used across all tasks.
- Processing Pipeline. SMILES → graph construction → T iterations of message passing → atom-level aggregation → molecular embedding h_drug → task-specific prediction layers (e.g., activity or synergy).

2) Single-Agent Activity Prediction

- The learned molecular embedding h_drug (i.e., z_A) is passed through a sigmoid-activated linear layer to compute the probability p_A that drug A inhibits viral replication below a cytotoxicity threshold:

    p_A = \sigma(w^\top z_A + b),

  where w and b are learnable parameters and σ denotes the sigmoid activation function.
- The model is trained using binary cross-entropy loss between the predicted probability p_A and experimentally measured single-agent activity labels. This task is trained jointly with the synergy prediction task to ensure the model learns chemically and biologically relevant representations, especially in scenarios with limited combination data.

3) Drug Synergy Prediction

- To predict drug-drug synergy, ComboNet takes as input a pair of drugs (A, B) and generates their individual embeddings z_A and z_B using the shared DTI network (p_A = f(z_A)).
- These embeddings are combined via the Bliss-inspired aggregation

    z_{AB} = z_A + z_B - z_A \odot z_B,

  where ⊙ denotes element-wise multiplication. This formulation captures independent effects while discounting overlapping interactions, consistent with the Bliss independence assumption.
- The combined representation z_{AB} is passed through a sigmoid-activated linear layer to compute the predicted combination activity:

    p_{AB} = \sigma(w^\top z_{AB} + b).

- Given the individual activities p_A and p_B, the expected inhibition score under Bliss independence is

    s(AB) = p_A + p_B - p_A \cdot p_B.

- The final predicted synergy score is

    \mathrm{Synergy}(A, B) = p_{AB} - s(AB).

This quantity, referred to earlier as the bespoke synergy score in Section 2.2, measures how much better the predicted combination effect is compared to the expected Bliss-independent outcome. A higher score indicates stronger synergy.
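The following PyTorch sketch mirrors the fusion and scoring just described. It is our simplification for exposition, assuming a latent size of 100 as in the report; it is not ComboNet's actual code, and the class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class SynergyHead(nn.Module):
    """Bliss-style fusion of two drug embeddings, as described in Sec. 2.3."""

    def __init__(self, latent_size: int = 100):
        super().__init__()
        self.single = nn.Linear(latent_size, 1)  # single-agent activity head
        self.combo = nn.Linear(latent_size, 1)   # combination activity head

    def forward(self, z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
        p_a = torch.sigmoid(self.single(z_a))    # P(drug A active)
        p_b = torch.sigmoid(self.single(z_b))    # P(drug B active)
        z_ab = z_a + z_b - z_a * z_b             # Bliss-inspired fusion
        p_ab = torch.sigmoid(self.combo(z_ab))   # predicted combination activity
        s_ab = p_a + p_b - p_a * p_b             # Bliss expectation
        return p_ab - s_ab                       # bespoke synergy score

head = SynergyHead()
z_a, z_b = torch.rand(1, 100), torch.rand(1, 100)  # stand-in drug embeddings
print(head(z_a, z_b).item())
```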
Chapter 3: Application of GCN: MIT model

3.0.1 Data Pipeline

- Ingestion: the raw CSV files are read into memory. Each record contains the drug SMILES strings, the cell-line identifier and the experimentally measured γ label (synergy score). 47k single-agent and 4k combination data points were used to train the model. The training dataset spans assays from 40 distinct cancer cell lines: single-drug responses are sourced from the NCI-60 panel, while drug-pair synergy measurements come from the NCI ALMANAC screen.
- Splitting: the curated dataset is divided into 80% training, 10% validation and 10% test sets.
- Featurization: each SMILES string is converted into a molecular graph and passed through Chemprop's Directed Message-Passing Neural Network (D-MPNN) featuriser.

3.0.2 Model Configuration

1. Shared Molecular Encoder

- Architecture: Directed Message-Passing Neural Network (D-MPNN) implemented in chemprop/mpn.py.
- Key hyper-parameters:
  - Hidden size 300 (projected to latent_size = 100).
  - Message-passing depth T = 3.
  - Dropout p = 0.0.
  - Undirected messages (averaged with reverse edges).
  - Activation function: ReLU.
  - Attention mechanism: disabled.
  - Atom messages: disabled (atom_messages=False).
- Read-out: mean-pooling over atom embeddings yields a fixed vector h_drug ∈ R^{hidden_size}.

2. Task-Specific Feed-Forward Heads

Head          | Input     | Layers           | Output        | Loss weight (λ)
DTI (aux.)    | h_drug    | Lin → ReLU → Lin | 45 tasks      | λ_dti = 1
Single-agent  | h_drug    | Linear           | 60 probs      | λ_single = 1
Combination   | h_A, h_B  | Linear           | Synergy score | λ_combo = 1

Table 3.1: Specification of task-specific heads in ComboNet.

Bliss-style Fusion Mechanism. To predict the synergy score of a drug combination, the model employs a Bliss-style fusion strategy grounded in probability theory and inspired by the Bliss independence assumption described in Section 2.2. First, the individual molecular embeddings of drug A and drug B, denoted h_A and h_B, are fused into a single combination representation:

    h_{AB} = h_A + h_B - h_A \odot h_B.

Here ⊙ denotes element-wise multiplication. This fusion models the additive contribution of the two drugs while subtracting overlapping (redundant) features, consistent with the Bliss independence assumption.

The model then produces two types of outputs:
- score_AB: the raw synergy prediction for the drug combination;
- p_A and p_B: the predicted probabilities of single-agent effectiveness for drugs A and B, respectively (obtained via sigmoid activations).

To reflect the expected effect under non-interaction, the model computes the final combo score as

    \text{Combo score} = \text{score}_{AB} - \log(p_A + p_B - p_A p_B).

This subtracts the logarithm of the Bliss expected response, ensuring that the model predicts only the excess effect beyond what would be anticipated if the two drugs acted independently. The logarithm stabilizes the subtraction by compressing the probability scale and aiding gradient-based optimization. This formulation enables the model to distinguish positive synergy (when the combo effect exceeds expectation) from antagonism (when the combo underperforms), leading to more accurate and biologically meaningful synergy predictions.

3. Multi-Task Training Loop

- Each optimisation step consumes five mini-batches (DTI, source single, target single, source combo, target combo) to balance label scarcity.
- Losses are computed with per-label masking to handle sparse assay labels.
- Aggregate loss:

    L = \lambda_{dti} L_{DTI} + \lambda_{single}(L_{src} + L_{tgt}) + \lambda_{combo}(L_{srcCombo} + L_{tgtCombo}).

- Optimiser: AdamW, learning rate 1 × 10⁻³; scheduler: 2-epoch warm-up then linear decay (HuggingFace WarmupLinearSchedule); a sketch of this setup follows this list.
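A minimal sketch of that optimisation setup, assuming the transformers library's get_linear_schedule_with_warmup (the current replacement for the older WarmupLinearSchedule the report names); the model, epoch and step counts are illustrative stand-ins.

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(100, 1)                      # stand-in for ComboNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

steps_per_epoch, epochs = 500, 30                    # illustrative sizes
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=2 * steps_per_epoch,            # 2-epoch warm-up
    num_training_steps=epochs * steps_per_epoch,     # then linear decay to 0
)

for step in range(epochs * steps_per_epoch):
    # loss.backward() would precede these calls in a real training loop.
    optimizer.step()
    scheduler.step()                                 # advance the LR schedule
    optimizer.zero_grad()
```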
4. Data-Handling Choices

- Duplication: target single-drug data repeated ×4 and source combo data repeated ×2 to up-weight scarce synergy samples.
- Splits: 80/10/10 random split for target combos; source data reshuffled each epoch.
- Batch size: batch_size = 100.

5. Check-pointing and Model Selection

- Best epoch chosen by highest ROC-AUC on the target-combo validation set.
- A secondary .dti checkpoint is saved when the validation score is within ±0.02 of the best, easing later embedding visualisation.

6. Practical Implications

- While full convergence typically requires at least 30 epochs, training was limited to 15 epochs by Colab's free GPU limits, yet achieved a 5-fold cross-validated ROC-AUC of 0.824 (± 0.050). This is close to the original MIT benchmark (0.84 ± 0.04), suggesting that further gains are likely with continued training.
- Tuning the loss-weighting coefficients (λ values) and the latent_size offers straightforward levers to emphasize synergy prediction without compromising performance on the auxiliary tasks (DTI and single-agent response). These hyperparameters can be adjusted depending on the application context, e.g. discovery vs. interpretability.

3.0.3 Evaluation Metrics

The predictive performance of the replicated model is quantified using two complementary metrics that measure discrimination power and practical utility for downstream experimental validation; a short computation sketch follows this section.

1. ROC-AUC (Receiver Operating Characteristic, Area Under Curve). This metric quantifies the model's ability to distinguish between synergistic and non-synergistic drug combinations. Binary synergy labels are assigned based on experimentally derived γ scores, where γ < 0.95 [4] indicates synergy. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at varying threshold levels, and the area under this curve (AUC) summarizes the model's global discriminative ability. The original MIT model achieved a test ROC-AUC of 0.84 ± 0.04 across five stratified folds on the PANC-1 dataset, indicating strong classification performance for synergy prediction.

2. Hit Rate at k (HR@k). This metric reflects real-world screening settings where only a fixed number of top-ranked predictions can be experimentally validated. Following the MIT study, the top k = 30 predicted drug pairs are selected, and the hit rate is computed as

    \mathrm{HR@30} = \frac{\#\,\text{validated synergistic pairs in top 30}}{30}.

In the MIT study, 25 of the top-30 predicted combinations were confirmed as synergistic in wet-lab experiments, yielding a hit rate of 0.83 (83%). This is a critical metric for validating the practical impact of the model in guiding lab experiments.

These metrics were computed on the held-out test fold following full training over 30 epochs. While ROC-AUC offers a threshold-independent performance summary, HR@30 directly quantifies experimental efficiency and thus serves as the primary figure of merit for translational validation.
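Both metrics are straightforward to compute. Here is a sketch with synthetic labels and scores, assuming scikit-learn and NumPy are available (all values are illustrative, not the study's data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)       # 1 = synergistic (γ < 0.95)
y_score = y_true * 0.3 + rng.random(500)    # model scores, loosely informative

# Threshold-independent discrimination.
print("ROC-AUC:", roc_auc_score(y_true, y_score))

# Hit rate among the k top-ranked predictions (k = 30 in the MIT study).
k = 30
top_k = np.argsort(y_score)[::-1][:k]       # indices of the 30 highest scores
print("HR@30:", y_true[top_k].sum() / k)
```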
3.0.4 Findings and Current Progress

Code Walk-through

- D-MPNN Encoder. Implemented in chemprop/models/mpn.py; this class modifies the standard message-passing routine to propagate information along directed bonds instead of atom-to-atom edges, improving molecular representation fidelity.
- Multi-task heads. The task-specific heads for DTI prediction, single-agent response, and drug-combination synergy are implemented using a shared encoder architecture.

Data Sanity

The following datasets were successfully loaded:

- NCI single-agent (source): 47,133 records loaded; 1 invalid SMILES skipped.
- DTI matrix: 22,025 drug-target rows with 45 target columns.
- PANC-1 single-agent: 1,769 entries.
- NCI combinations (source): 4,002 valid combinations; 4 invalid SMILES skipped.
- PANC-1 combinations (target): 496 valid combinations used for classification.

SMILES parsing used RDKit; minor warnings related to hydrogen atoms or malformed strings were encountered but did not halt execution.

Model Configuration

This section summarizes the architecture and settings used in the replicated model via cancer_train.py.

- Task Breakdown
  - Number of DTI tasks: 45. Auxiliary binary classification tasks predicting drug-target binding across 45 protein targets from UniProt IDs.
  - Number of source tasks: 60. Regression tasks for single-agent response over the 60 NCI-60 cancer cell lines.
  - Number of target tasks: 1. Single-agent efficacy prediction for the PANC-1 pancreatic cell line.
  - Number of target-combo tasks: 1. Main synergy prediction task for PANC-1 drug combinations.
- Architecture & Hyperparameters
  - Hidden size: 300. Intermediate representation dimensionality in the encoder.
  - Latent size: 100. Size of the latent drug embedding fed to task-specific heads.
  - Loss weights (λ values): λ_dti = 1, λ_single = 1, λ_combo = 1. Ensures equal contribution from each task type during backpropagation.
  - Dropout: 0.0. No stochastic dropout used; the model is regularized via multi-task supervision.
  - Message-passing depth (T): 3. Three rounds of directed message passing over molecular graphs.
  - Atom messages enabled? False. Only bond-centered messages are used (D-MPNN), which avoids atom-centric aggregation.
  - Attention mechanism enabled? False. Embedding aggregation uses uniform averaging (no attention weighting).

The configuration follows a multi-task paradigm tailored to improve generalization on low-resource combo prediction by leveraging abundant DTI and single-drug data. The design emphasizes structural regularization via auxiliary tasks and interpretable molecular embeddings for downstream synergy scoring.

Loss Functions

The model uses task-specific loss functions based on the nature of each prediction task. In this replication study, we follow the classification-based setup used in the original ComboNet implementation. A minimal masking sketch follows this list.

- Drug-Target Interaction (DTI) and Single-Agent Tasks: both are treated as binary classification problems. Labels indicate whether a drug binds to a target (DTI) or is effective against a cell line (single-agent). The loss function used is BCEWithLogitsLoss, which combines a sigmoid activation with binary cross-entropy loss:

    L_{BCE} = -\left[ y \cdot \log \sigma(x) + (1 - y) \cdot \log(1 - \sigma(x)) \right],

  where y ∈ {0, 1} is the binary label and σ(x) is the sigmoid of the raw model output.
- Combination (Synergy) Task: the experimentally measured synergy scores (γ) are binarized using a threshold (γ < 0.95 indicating synergy, consistent with Section 3.0.3). Hence the synergy task is also framed as binary classification and optimized using the same BCEWithLogitsLoss.
- Loss Reduction: all losses are computed with reduction='none' to enable per-label masking, ensuring that missing or undefined labels (common in sparse bioassays) do not affect gradient updates.

Overall, binary cross-entropy loss is used consistently across tasks to train the model in a multi-task classification setting. This aligns with the original MIT ComboNet framework, and evaluation is subsequently performed using ROC-AUC for synergy prediction.
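A minimal sketch of the per-label masking described above, assuming PyTorch; tensor shapes are illustrative, and the masking logic is our reading of the description rather than the repository's exact code.

```python
import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss(reduction="none")   # keep per-element losses

logits = torch.randn(8, 45)                        # e.g. 45 DTI tasks, batch of 8
labels = torch.randint(0, 2, (8, 45)).float()      # binary assay labels
mask = torch.rand(8, 45) > 0.5                     # True where a label is observed

per_label = loss_fn(logits, labels)                # element-wise BCE, unreduced
loss_dti = (per_label * mask).sum() / mask.sum()   # mean over observed labels only

# The aggregate objective then weights each task family, as in Sec. 3.0.2:
# L = λ_dti·L_DTI + λ_single·(L_src + L_tgt) + λ_combo·(L_srcCombo + L_tgtCombo)
```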
3.0.5 Interim Baseline Results under GPU Constraints

To ensure end-to-end reproducibility within the constraints of free Colab T4 GPUs, training was limited to 15 epochs per fold, in contrast to the 30-epoch schedule adopted in the original MIT study. Under this partially converged regime, the 5-fold cross-validated ROC-AUC averaged 0.82 ± 0.05, compared to the published benchmark of 0.84 ± 0.04. The remaining shortfall of approximately 0.02 AUC points is attributed primarily to the limited training duration rather than to any differences in model architecture or training procedure. This result confirms that the model continues to improve with training duration and supports the expectation that full convergence (at ≥ 30 epochs) would enable replication of the published performance.

3.0.6 Comparison with NCATS and UNC Pipelines

The MIT modeling approach was evaluated alongside two independently developed pipelines from NCATS (the National Center for Advancing Translational Sciences) and the University of North Carolina (UNC). All three teams were provided with the same curated PANC-1 drug-combination dataset and tasked with predicting synergistic drug pairs.

- NCATS Pipeline: this approach combined Random Forest (RF), XGBoost, and Deep Neural Networks (DNN), emphasizing engineered chemical and biological descriptors, including molecular fingerprints, physicochemical properties, biological features from the NCATS predictor, and mechanisms of action. Hit rate: 53% (16/30 correct predictions).
- UNC Pipeline: the UNC team developed an ensemble of RF, Gradient Boosting, DNN, and Graph Convolutional Networks (GCN), utilizing a diverse set of descriptors including physicochemical properties, molecular fingerprints, simplex descriptors, and mechanism of action. Hit rate: 40% (12/30 correct predictions).
- MIT Pipeline (ComboNet): the MIT model employed a graph-based Directed Message-Passing Neural Network (D-MPNN) as a shared encoder, integrated within a multi-task learning framework. It simultaneously optimized drug-target interaction (DTI) and single-agent efficacy tasks, with a synergy head leveraging a Bliss-style fusion mechanism for drug-pair scoring. Hit rate: 83% (25/30 correct predictions).

Blinded Evaluation Results. On a common held-out validation set of 88 blinded drug combinations [5], the MIT model significantly outperformed both the NCATS and UNC pipelines:

- MIT: ROC-AUC = 0.78
- UNC: ROC-AUC = 0.60
- NCATS: ROC-AUC = 0.56

Conclusion. The MIT pipeline demonstrated superior generalization and predictive accuracy, likely due to its graph-based molecular encoding and auxiliary-task supervision. These findings reinforce the advantage of multi-task learning and structure-aware representations for low-data drug-synergy prediction. Using a consistent synergy threshold (γ < 0.95), all three pipelines achieved notable hit rates, though the MIT model emerged as the most robust and reliable across metrics.

Chapter 4: Future Directions and Enhancements

Although our reproduction of the MIT ComboNet pipeline under constrained GPU resources produced promising results, several avenues remain to maximize both performance and reusability:

- Full Convergence & Curriculum Training. Resume training from existing checkpoints to the originally intended 30-50 epochs, or employ a curriculum-learning schedule (e.g. gradual unfreezing of the encoder) to accelerate convergence under limited GPU quotas.
- Adaptive Loss Weighting. Replace the fixed λ coefficients with learnable weights (e.g. an uncertainty-based multi-task loss; a sketch follows this list) to dynamically balance the DTI, single-agent, and combination objectives.
- Graph Attention & Edge Features. Incorporate an attention mechanism or additional edge attributes (e.g. atom-pair distances, bond orders) in the D-MPNN to focus on the most informative substructures for synergy.
- Transfer & Meta-Learning. Pretrain the encoder on larger public synergy or bioactivity datasets (e.g. NCI-ALMANAC, ChEMBL), then fine-tune on PANC-1 combinations to boost generalization in low-data regimes.
- Data Augmentation & Active Learning. Generate in-silico perturbations (e.g. SMILES augmentations or virtual docking scores) and deploy an active-learning loop to iteratively select the most informative combinations for additional wet-lab validation.
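As one possible realisation of adaptive loss weighting (not part of the original ComboNet, shown only as a sketch), the homoscedastic-uncertainty scheme of Kendall et al. scales each task loss by a learnable log-variance:

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """Learnable task weights via homoscedastic uncertainty (Kendall et al.)."""

    def __init__(self, num_tasks: int = 3):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # log σ_i²

    def forward(self, losses):
        # L = Σ_i [ exp(-log σ_i²) · L_i + log σ_i² ]
        total = torch.tensor(0.0)
        for i, loss in enumerate(losses):
            total = total + torch.exp(-self.log_vars[i]) * loss + self.log_vars[i]
        return total

weighting = UncertaintyWeighting(num_tasks=3)   # DTI, single-agent, combo
dummy_losses = [torch.tensor(0.7), torch.tensor(1.2), torch.tensor(0.4)]
print(weighting(dummy_losses))                  # combined, learnably weighted loss
```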
Together, these enhancements will not only close the remaining performance gap but also strengthen the portability and extensibility of the synergy-prediction framework across different cell lines and disease contexts.

References

[1] Milad Besharatifard and Fatemeh Vafaee. A review on graph neural networks for predicting synergistic drug combinations. Artificial Intelligence Review, 57, February 2024.
[2] Wengong Jin, Jonathan M. Stokes, Richard T. Eastman, Zina Itkin, Alexey V. Zakharov, James J. Collins, Tommi S. Jaakkola, and Regina Barzilay. Deep learning identifies synergistic drug combinations for treating COVID-19. Proceedings of the National Academy of Sciences, 118(39):e2105070118, 2021.
[3] Thomas Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. September 2016.
[4] Lesley A. Mathews Griner, Rajarshi Guha, Paul Shinn, Robert M. Young, Jennifer M. Keller, Daphne Liu, Isaac S. Goldlust, Adam Yasgar, Colleen McKnight, Matthew B. Boxer, Daniel Y. Duveau, Jian-Kang Jiang, Shyamal Michael, Thomas Mierzwa, Wei Huang, Michelle J. Walsh, Bryan T. Mott, Purva Patel, William Leister, David J. Maloney, Cathy A. Leclair, Ganesha Rai, Ajit Jadhav, Bruce D. Peyser, Christopher P. Austin, Sherry E. Martin, Anton Simeonov, Marc Ferrer, Louis M. Staudt, and Craig J. Thomas. High-throughput combinatorial screening identifies drugs that cooperate with ibrutinib to kill activated B-cell-like diffuse large B-cell lymphoma cells. Proceedings of the National Academy of Sciences of the United States of America, 111(6):2349–2354, February 2014.
[5] Mahya Pourmousa, Shobhit Jain, Ekaterina Barnaeva, et al. AI-driven discovery of synergistic drug combinations against pancreatic cancer. Nature Communications, 16:4020, 2025.
[6] Panagiotis Sarantis, Eirini Koustas, Aikaterini Papadimitropoulou, Athanasios G. Papavassiliou, and Michalis V. Karamouzis. Pancreatic ductal adenocarcinoma: treatment hurdles, tumor microenvironment and immunotherapy. World Journal of Gastrointestinal Oncology, 12(2):173–181, February 2020."

You can mention, very briefly, that she worked under the supervision of Vatsal on the topic "Application of Graph Neural Networks in Drug Discovery"; that she carried out the necessary research reviews and analysed the codebases of existing machine-learning models in drug discovery; and that her performance was excellent. We only needed to mentor her minimally, and she could drive independent work once given a strategy. We wish her all the best on her journey.