FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference. https://github.com/embedl/flash-head
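The page does not describe how FlashHead works internally. For context, the component it is described as replacing is the standard dense classification head. At every decoding step, the final hidden state is multiplied by a vocabulary-sized weight matrix, producing one logit per token. The pure-Python sketch below shows only that baseline, assuming greedy decoding for illustration. It is not FlashHead's method, and the function names are hypothetical.

```python
from typing import List


def dense_lm_head(hidden: List[float], weight: List[List[float]]) -> List[float]:
    """Baseline classification head: one dot product per vocabulary entry.

    hidden has length d; weight has vocab_size rows, each of length d.
    """
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


def greedy_next_token(hidden: List[float], weight: List[List[float]]) -> int:
    """Return the index of the highest-scoring vocabulary entry (greedy decoding)."""
    logits = dense_lm_head(hidden, weight)
    return max(range(len(logits)), key=logits.__getitem__)


def head_macs_per_token(hidden_size: int, vocab_size: int) -> int:
    """Multiply-accumulate operations the dense head performs per generated token."""
    return hidden_size * vocab_size


# Toy example: a 3-token vocabulary with a 2-dimensional hidden state.
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.2, 0.9]
print(greedy_next_token(hidden, weight))  # 2
```

The work grows with hidden_size × vocab_size for every generated token. With large vocabularies, the head can therefore account for a noticeable share of per-token compute, and that is the cost an efficient replacement head would aim to reduce.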
nvidia/Cosmos-Reason2 multimodal reasoning models optimized by Embedl.
- embedl/Cosmos-Reason2-2B-W4A16-Edge2 (Image-Text-to-Text, 2B params, 11.9k downloads, 12 likes)
- embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead (Image-Text-to-Text, 2B params, 1.71k downloads, 7 likes)
- embedl/Cosmos-Reason2-2B-NVFP4A16 (Image-Text-to-Text, 2B params, 466 downloads, 1 like)
- embedl/Cosmos-Reason2-2B-W4A16 (Image-Text-to-Text, 2B params, 692 downloads, 7 likes)
Ultra-efficient model variants optimized for NVIDIA Jetson Orin Nano, designed for constrained edge environments that require a low memory footprint.
- embedl/Cosmos-Reason2-2B-W4A16-Edge2 (Image-Text-to-Text, 2B params, 11.9k downloads, 12 likes)
- embedl/Cosmos-Reason2-2B-W4A16 (Image-Text-to-Text, 2B params, 692 downloads, 7 likes)
- embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead (Image-Text-to-Text, 2B params, 1.71k downloads, 7 likes)
Edge Inference Benchmarks
On-device benchmarks across devices and models.
Models validated and performance-optimized for NVIDIA Jetson AGX Thor, tailored for high-performance edge AI workloads.
- embedl/Cosmos-Reason2-2B-NVFP4A16 (Image-Text-to-Text, 2B params, 466 downloads, 1 like)
- embedl/Cosmos-Reason2-2B-W4A16-Edge2 (Image-Text-to-Text, 2B params, 11.9k downloads, 12 likes)
- embedl/Cosmos-Reason2-2B-W4A16 (Image-Text-to-Text, 2B params, 692 downloads, 7 likes)
- embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead (Image-Text-to-Text, 2B params, 1.71k downloads, 7 likes)
W4A16 quantization strategy: most weights are quantized to INT4, activations remain in FP16, and quantization-sensitive layers are kept in FP16.
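The strategy can be made concrete with a small pure-Python sketch. It assumes symmetric per-group INT4 quantization with one FP16 scale per group of 4 weights, and it emulates FP16 activations with the half-precision format of Python's `struct` module. The group size, the rounding scheme, and the layer names (`mlp.up_proj`, `lm_head`) are illustrative assumptions, not details of Embedl's pipeline. Production kernels would pack the 4-bit codes densely and run on the GPU; this shows only the arithmetic.

```python
import struct
from typing import Dict, List, Set, Tuple

QuantRow = Tuple[List[int], List[float]]  # (INT4 codes, one FP16 scale per group)


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int4(row: List[float], group_size: int = 4) -> QuantRow:
    """Symmetric per-group INT4 quantization: codes in [-8, 7], FP16 scales."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = to_fp16(max_abs / 7) or 1.0  # fall back to 1.0 for all-zero groups
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def w4a16_linear(x: List[float], rows: List[QuantRow], group_size: int = 4) -> List[float]:
    """y = W @ x with INT4 weights dequantized on the fly and FP16 activations.

    In this sketch the accumulation itself uses Python floats; only the
    inputs and outputs are rounded to FP16.
    """
    x16 = [to_fp16(v) for v in x]
    out = []
    for codes, scales in rows:
        acc = sum(c * scales[i // group_size] * x16[i] for i, c in enumerate(codes))
        out.append(to_fp16(acc))
    return out


def quantize_model(layers: Dict[str, List[List[float]]], sensitive: Set[str],
                   group_size: int = 4) -> Dict[str, tuple]:
    """Quantize every layer to INT4 except those named in `sensitive`, kept in FP16."""
    result = {}
    for name, weight in layers.items():
        if name in sensitive:
            result[name] = ("fp16", [[to_fp16(w) for w in row] for row in weight])
        else:
            result[name] = ("int4", [quantize_int4(row, group_size) for row in weight])
    return result


# Hypothetical two-layer model; treating "lm_head" as sensitive is an assumption.
model = quantize_model(
    {"mlp.up_proj": [[0.5, -1.0, 0.25, 0.0]], "lm_head": [[1.0, 1.0, 1.0, 1.0]]},
    sensitive={"lm_head"},
)
kind, rows = model["mlp.up_proj"]
print(kind, w4a16_linear([1.0, 2.0, 3.0, 4.0], rows))
```

Keeping one scale per small group, rather than one per row, limits how far a single outlier weight can stretch the quantization range. Exempting sensitive layers trades some memory savings for accuracy where INT4 rounding hurts most.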
Qwen/Qwen3.5 variants optimized by Embedl.
- embedl/Qwen3.5-0.8B-FlashHead (Image-Text-to-Text, 0.9B params, 350 downloads)
- embedl/Qwen3.5-2B-FlashHead (Image-Text-to-Text, 2B params, 416 downloads)
- embedl/Qwen3.5-4B-FlashHead (Image-Text-to-Text, 5B params, 401 downloads)
- embedl/Qwen3.5-9B-FlashHead (Image-Text-to-Text, 10B params, 386 downloads)
Models optimized and benchmarked for NVIDIA Jetson AGX Orin: memory-efficient, latency-optimized variants designed for real-time edge inference.
- embedl/Cosmos-Reason2-2B-W4A16-Edge2 (Image-Text-to-Text, 2B params, 11.9k downloads, 12 likes)
- embedl/Cosmos-Reason2-2B-W4A16 (Image-Text-to-Text, 2B params, 692 downloads, 7 likes)
- embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead (Image-Text-to-Text, 2B params, 1.71k downloads, 7 likes)
- embedl/Qwen3.5-0.8B-FlashHead (Image-Text-to-Text, 0.9B params, 350 downloads)