Safe and Robust Watermark Injection with a Single OoD Image • arXiv:2309.01786 • Published Sep 4, 2023
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning • arXiv:2506.15606 • Published Jun 18, 2025
Cross-Modal Safety Alignment: Is textual unlearning all you need? • arXiv:2406.02575 • Published May 27, 2024
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities • arXiv:2502.05209 • Published Feb 3, 2025
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning • arXiv:2506.05523 • Published Jun 5, 2025
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models • arXiv:2502.14302 • Published Feb 20, 2025
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression • arXiv:2403.15447 • Published Mar 18, 2024
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations • arXiv:2402.12348 • Published Feb 19, 2024
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data • arXiv:2401.17600 • Published Jan 31, 2024
DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer • arXiv:2312.03724 • Published Nov 27, 2023
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark • arXiv:2402.11592 • Published Feb 18, 2024
Data-Free Knowledge Distillation for Heterogeneous Federated Learning • arXiv:2105.10056 • Published May 20, 2021