---
license: apache-2.0
tags:
- vision-transformer
- image-classification
- pytorch
- timm
- mlp-mixer
- gravitational-lensing
- strong-lensing
- astronomy
- astrophysics
datasets:
- parlange/gravit-c21
metrics:
- accuracy
- auc
- f1
paper:
- title: "GraViT: A Gravitational Lens Discovery Toolkit with Vision Transformers"
  url: "https://arxiv.org/abs/2509.00226"
  authors: "Parlange et al."
model-index:
- name: MLP-Mixer-a1
  results:
  - task:
      type: image-classification
      name: Strong Gravitational Lens Discovery
    dataset:
      type: common-test-sample
      name: Common Test Sample (More et al. 2024)
    metrics:
    - type: accuracy
      value: 0.8203
      name: Average Accuracy
    - type: auc
      value: 0.8352
      name: Average AUC-ROC
    - type: f1
      value: 0.4975
      name: Average F1-Score
---

# 🌌 mlp-mixer-gravit-a1 🔭

This model is part of **GraViT**: Transfer Learning with Vision Transformers and MLP-Mixer for Strong Gravitational Lens Discovery.

🔗 **GitHub Repository**: [https://github.com/parlange/gravit](https://github.com/parlange/gravit)

## 🛰️ Model Details

- **🤖 Model Type**: MLP-Mixer
- **🧪 Experiment**: A1 - C21-classification-head
- **🌌 Dataset**: C21
- **🪐 Fine-tuning Strategy**: classification-head

## 💻 Quick Start

```python
import torch
import timm

# Load the model directly from the Hub
model = timm.create_model(
    'hf-hub:parlange/mlp-mixer-gravit-a1',
    pretrained=True
)
model.eval()

# Example inference
dummy_input = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    output = model(dummy_input)
    predictions = torch.softmax(output, dim=1)
    print(f"Lens probability: {predictions[0][1]:.4f}")
```

## ⚡️ Training Configuration

**Training Dataset:** C21 (Cañameras et al.
2021)

**Fine-tuning Strategy:** classification-head

| 🔧 Parameter | 📝 Value |
|--------------|----------|
| Batch Size | 192 |
| Epochs | 100 |
| Patience | 10 |
| Optimizer | AdamW |
| Scheduler | ReduceLROnPlateau |
| Image Size | 224x224 |
| Fine Tune Mode | classification_head |
| Stochastic Depth Probability | 0.1 |

## 📈 Training Curves

![Combined Training Metrics](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/training_curves/MLP-Mixer_combined_metrics.png)

## 🏁 Final Epoch Training Metrics

| Metric | Training | Validation |
|:---------:|:-----------:|:-------------:|
| 📉 Loss | 0.4397 | 0.3864 |
| 🎯 Accuracy | 0.7943 | 0.8250 |
| 📊 AUC-ROC | 0.8769 | 0.9199 |
| ⚖️ F1 Score | 0.7967 | 0.8351 |

## ☑️ Evaluation Results

### ROC Curves and Confusion Matrices

Performance across all test datasets (a through l) in the Common Test Sample (More et al. 2024):

![ROC + Confusion Matrix - Dataset A](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_a.png)
![ROC + Confusion Matrix - Dataset B](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_b.png)
![ROC + Confusion Matrix - Dataset C](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_c.png)
![ROC + Confusion Matrix - Dataset D](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_d.png)
![ROC + Confusion Matrix - Dataset E](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_e.png)
![ROC + Confusion Matrix - Dataset F](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_f.png)
![ROC + Confusion Matrix - Dataset G](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_g.png)
![ROC + Confusion Matrix - Dataset H](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_h.png)
![ROC + Confusion Matrix - Dataset I](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_i.png)
![ROC + Confusion Matrix - Dataset J](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_j.png)
![ROC + Confusion Matrix - Dataset K](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_k.png)
![ROC + Confusion Matrix - Dataset L](https://huggingface.co/parlange/mlp-mixer-gravit-a1/resolve/main/roc_confusion_matrix/MLP-Mixer_roc_confusion_matrix_l.png)

### 📋 Performance Summary

Average performance across the 12 test datasets of the Common Test Sample (More et al. 2024):

| Metric | Value |
|-----------|----------|
| 🎯 Average Accuracy | 0.8203 |
| 📈 Average AUC-ROC | 0.8352 |
| ⚖️ Average F1-Score | 0.4975 |

## 📘 Citation

If you use this model in your research, please cite:

```bibtex
@misc{parlange2025gravit,
  title={GraViT: Transfer Learning with Vision Transformers and MLP-Mixer for Strong Gravitational Lens Discovery},
  author={René Parlange and Juan C. Cuevas-Tello and Octavio Valenzuela and Omar de J. Cabrera-Rosas and Tomás Verdugo and Anupreeta More and Anton T. Jaelani},
  year={2025},
  eprint={2509.00226},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.00226},
}
```

---

## Model Card Contact

For questions about this model, please contact the author through: https://github.com/parlange/
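
## 🔧 Fine-Tuning Sketch

The classification-head strategy from the Training Configuration (freeze the backbone, train only the final classifier with AdamW and ReduceLROnPlateau) can be sketched as follows. This is a minimal illustration with a tiny hypothetical stand-in network, not the GraViT training code; in practice the backbone is the timm MLP-Mixer loaded as in Quick Start, and `scheduler.step()` receives the validation loss each epoch.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in backbone + head for illustration only; the real model
# is timm.create_model('hf-hub:parlange/mlp-mixer-gravit-a1', pretrained=True).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.GELU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 2)  # two classes: non-lens / lens
model = nn.Sequential(backbone, head)

# classification-head strategy: freeze every backbone parameter
for p in backbone.parameters():
    p.requires_grad = False

# Optimize only the head; reduce the learning rate when the monitored loss
# stops improving for `patience` epochs (matching the table above).
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=10
)

# One illustrative training step on random data
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
scheduler.step(loss.item())  # pass the validation loss here in practice

# Only the head's weights and biases remain trainable
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / total: {total}")
```

Freezing the backbone keeps the pretrained representation intact and makes each epoch cheap, since gradients are computed and stored only for the small classification head.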