FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference Paper • 2505.22758 • Published 10 days ago
PaTH Attention: Position Encoding via Accumulating Householder Transformations Paper • 2505.16381 • Published 17 days ago
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published Feb 14
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11
Granite 3.1 Language Models Collection A series of language models with 128K context length, trained by IBM and licensed under the Apache 2.0 license. • 9 items • Updated May 2 • 61
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models Paper • 2409.04787 • Published Sep 7, 2024 • 1
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5 • 266
Granite 3.0 Language Models Collection A series of language models trained by IBM and licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated May 2 • 96
Power-LM Collection Dense & MoE LLMs trained with the Power learning rate scheduler. • 4 items • Updated Oct 17, 2024 • 15