|
---
license: mit
tags:
- molecular-property-prediction
- graph-neural-network
- chemistry
- pytorch
- molecular-dynamics
- force-fields
datasets:
- qm9
- spice
- pfas
metrics:
- mse
- mae
pipeline_tag: graph-ml
library_name: moml
---
|
|
|
# MoML-CA: Molecular Machine Learning for Coarse-grained Applications |
|
|
|
This repository contains the **DJMGNN** (Dense Jump Multi-Graph Neural Network) models from the MoML-CA project, designed for molecular property prediction and coarse-grained molecular modeling applications. |
|
|
|
## Models Available
|
|
|
### 1. Base Model (`base_model/`) |
|
- **Pre-trained DJMGNN** trained jointly on multiple molecular datasets
|
- **Datasets**: QM9, SPICE, PFAS |
|
- **Task**: General molecular property prediction |
|
- **Use case**: Starting point for transfer learning or direct molecular property prediction |
|
|
|
### 2. Fine-tuned Model (`finetuned_model/`) |
|
- **PFAS-specialized DJMGNN** model fine-tuned for PFAS molecular properties |
|
- **Base**: Initialized from the base model above
|
- **Specialization**: Per- and polyfluoroalkyl substances (PFAS) |
|
- **Use case**: Optimized for PFAS molecular property prediction |
|
|
|
## Architecture
|
|
|
**DJMGNN** (Dense Jump Multi-Graph Neural Network) features: |
|
- **Multi-task learning**: Simultaneous node-level and graph-level predictions |
|
- **Jump connections**: Enhanced information flow between layers |
|
- **Dense blocks**: Improved gradient flow and feature reuse |
|
- **Supernode aggregation**: Global graph representation |
|
- **RBF features**: Radial basis function encoding for distance information (see the sketch below)
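
To make the RBF encoding concrete, here is a minimal sketch of a Gaussian radial basis expansion of interatomic distances. The helper name `rbf_expand`, the cutoff, and the basis widths are illustrative assumptions rather than the exact implementation in `moml`; only the basis size mirrors the `rbf_K=32` argument used in the Usage section below.

```python
import torch

def rbf_expand(dist: torch.Tensor, K: int = 32, cutoff: float = 5.0) -> torch.Tensor:
    """Expand interatomic distances into K Gaussian radial basis functions (sketch)."""
    centers = torch.linspace(0.0, cutoff, K)       # evenly spaced RBF centers
    gamma = 1.0 / (centers[1] - centers[0]) ** 2   # width tied to the center spacing
    return torch.exp(-gamma * (dist.unsqueeze(-1) - centers) ** 2)

# One distance per edge becomes one K-dimensional feature vector per edge.
dist = torch.tensor([0.96, 1.41, 2.30])  # hypothetical distances in Angstrom
edge_rbf = rbf_expand(dist)              # shape: [3, 32]
```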
|
|
|
### Architecture Details |
|
- **Hidden Dimensions**: 128 |
|
- **Number of Blocks**: 3-4 |
|
- **Layers per Block**: 6 |
|
- **Input Node Dimensions**: 11-29 (depending on featurization) |
|
- **Node Output Dimensions**: 3 (forces/properties per atom) |
|
- **Graph Output Dimensions**: 19 (molecular descriptors) |
|
- **Energy Output Dimensions**: 1 (total energy) |
|
|
|
## Training Details
|
|
|
### Datasets |
|
- **QM9**: ~130k small organic molecules with quantum mechanical properties |
|
- **SPICE**: Molecular dynamics trajectories with forces and energies |
|
- **PFAS**: Per- and polyfluoroalkyl substances dataset with specialized descriptors |
|
|
|
### Training Configuration |
|
- **Optimizer**: Adam |
|
- **Learning Rate**: 3e-5 (fine-tuning), 1e-3 (base training) |
|
- **Batch Size**: 4-8 (node tasks), 8-32 (graph tasks) |
|
- **Loss Functions**: MSE for regression, weighted multi-task loss |
|
- **Regularization**: Dropout (0.2), gradient clipping (a minimal loop sketch follows this list)
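
As a concrete illustration of these settings, the sketch below shows a minimal fine-tuning step using the model constructed in the Usage section. It is an assumption, not the project's training script: `loader`, the target `batch.y`, the single-task MSE objective, and the clipping `max_norm` are placeholders.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)  # 1e-3 for base training
loss_fn = torch.nn.MSELoss()

model.train()
for batch in loader:  # assumed: a torch_geometric DataLoader with batch size 4-8
    optimizer.zero_grad()
    out = model(
        x=batch.x,
        edge_index=batch.edge_index,
        edge_attr=batch.edge_attr,
        batch=batch.batch,
    )
    # Single-task example; the full objective is a weighted multi-task loss.
    loss = loss_fn(out["graph_pred"], batch.y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # assumed value
    optimizer.step()
```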
|
|
|
## Usage
|
|
|
### Loading the Base Model |
|
|
|
```python
import torch
from moml.models.mgnn.djmgnn import DJMGNN

# Initialize model architecture
model = DJMGNN(
    in_node_dim=29,  # Adjust based on your featurization
    in_edge_dim=0,
    hidden_dim=128,
    n_blocks=4,
    layers_per_block=6,
    node_output_dims=3,
    graph_output_dims=19,
    energy_output_dims=1,
    jk_mode="attention",
    dropout=0.2,
    use_supernode=True,
    use_rbf=True,
    rbf_K=32,
)

# Load base model checkpoint
checkpoint = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/saketh11/MoML-CA/resolve/main/base_model/pytorch_model.pt"
)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
|
|
|
### Loading the Fine-tuned Model |
|
|
|
```python
# Same architecture setup as above, then:
checkpoint = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/saketh11/MoML-CA/resolve/main/finetuned_model/pytorch_model.pt"
)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
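
If you prefer to download and cache the checkpoint through the Hugging Face Hub client instead of `torch.hub`, the following sketch uses the standard `huggingface_hub` API (swap the subfolder to `base_model` for the pre-trained weights):

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="saketh11/MoML-CA",
    filename="pytorch_model.pt",
    subfolder="finetuned_model",  # or "base_model" for the pre-trained checkpoint
)
checkpoint = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```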
|
|
|
### Making Predictions |
|
|
|
```python
# Assuming you have a molecular graph 'data' (torch_geometric.data.Data)
with torch.no_grad():
    output = model(
        x=data.x,
        edge_index=data.edge_index,
        edge_attr=data.edge_attr,
        batch=data.batch,
    )

# Extract predictions
node_predictions = output["node_pred"]      # Per-atom properties/forces
graph_predictions = output["graph_pred"]    # Molecular descriptors
energy_predictions = output["energy_pred"]  # Total energy
```
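
If you do not yet have featurized graphs, the snippet below shows the tensor shapes the model expects, using a hypothetical random example; real node features should come from the featurization pipeline in the MoML-CA repository.

```python
import torch
from torch_geometric.data import Batch, Data

# Hypothetical molecule: 3 atoms, 2 bonds stored in both directions, 29-dim node features.
x = torch.randn(3, 29)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
data = Batch.from_data_list([Data(x=x, edge_index=edge_index)])

with torch.no_grad():
    output = model(x=data.x, edge_index=data.edge_index,
                   edge_attr=data.edge_attr, batch=data.batch)
```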
|
|
|
## Performance
|
|
|
### Base Model |
|
- Trained on diverse molecular datasets for robust generalization |
|
- Multi-task learning across node and graph-level properties |
|
- Suitable for transfer learning to specialized domains |
|
|
|
### Fine-tuned Model |
|
- Specialized for PFAS molecular properties |
|
- Improved accuracy on fluorinated compounds |
|
- Optimized for environmental and toxicological applications |
|
|
|
## Applications
|
|
|
- **Molecular Property Prediction**: HOMO/LUMO, dipole moments, polarizability |
|
- **Force Field Development**: Atomic forces and energies for MD simulations |
|
- **Environmental Chemistry**: PFAS behavior and properties |
|
- **Drug Discovery**: Molecular screening and optimization |
|
- **Materials Science**: Polymer and surface properties |
|
|
|
## Links
|
|
|
- **GitHub Repository**: [SAKETH11111/MoML-CA](https://github.com/SAKETH11111/MoML-CA) |
|
- **Documentation**: See repository README and docs/ |
|
- **Issues**: Report bugs and request features on GitHub |
|
|
|
## License
|
|
|
This project is licensed under the MIT License. See the LICENSE file for details. |
|
|
|
## Contributing
|
|
|
Contributions are welcome! Please see the contributing guidelines in the GitHub repository. |
|
|
|
--- |
|
|
|
*For questions or support, please open an issue in the [GitHub repository](https://github.com/SAKETH11111/MoML-CA).* |