AI & ML interests
None defined yet.
Recent Activity

dalle2 authored 8 papers 21 days ago
Shears: Unstructured Sparsity with Neural Low-rank Adapter Search • Paper • 2404.10934 • Published
SQFT: Low-cost Model Adaptation in Low-precision Sparse Foundation Models • Paper • 2410.03750 • Published • 2
Post-Training Statistical Calibration for Higher Activation Sparsity • Paper • 2412.07174 • Published • 1
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models • Paper • 2501.17088 • Published • 2
MultiPruner: Balanced Structure Removal in Foundation Models • Paper • 2501.09949 • Published
TokenButler: Token Importance is Predictable • Paper • 2503.07518 • Published • 1
KVCrush: Key value cache size-reduction using similarity in head-behaviour • Paper • 2503.00022 • Published
SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs • Paper • 2502.12444 • Published

dalle2 authored a paper 9 months ago