Apply for a GPU community grant: Personal project
Hi Hugging Face team,
I'm building a small side project to provide a free, open-source API endpoint for the Qwen1.5-0.5B-Chat model. The goal is to let developers and researchers experiment with a lightweight chat model without needing their own GPU.
I have already set up a Space with the model and a Flask-based API (/v1/chat/completions). However, on the free CPU instance the model runs out of memory (or is extremely slow), and float16 inference is not supported on CPU. I've tried the optimisations available to me, but the hardware is simply insufficient.
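For context, here is a minimal sketch of the endpoint's shape. The model call is stubbed out (generate_reply is a hypothetical placeholder name for illustration; the real Space invokes Qwen1.5-0.5B-Chat at that point), and the response mirrors the usual /v1/chat/completions layout:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_reply(messages):
    # Placeholder: the actual Space runs Qwen1.5-0.5B-Chat here.
    return "stub reply"

@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json(force=True)
    reply = generate_reply(body.get("messages", []))
    return jsonify({
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }
        ],
    })
```

Keeping the response schema compatible with the common chat-completions format means existing client libraries can point at the Space with only a base-URL change.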
If granted a T4 small (or any GPU with at least 4GB VRAM), I can:
· Run the model in float16 mode, which fits comfortably.
· Keep the API publicly accessible for the open-source community.
· Document the setup so others can replicate it.
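The "fits comfortably" claim follows from simple arithmetic: roughly 0.5B parameters at 2 bytes each in float16 is under 1 GB of weights, leaving headroom for activations and the KV cache on a 4 GB card. A quick sketch (weight_memory_gb is a hypothetical helper for illustration, and it counts weights only, not runtime overhead):

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights alone."""
    return n_params * bytes_per_param / 1024**3

fp16 = weight_memory_gb(0.5e9, 2)  # float16: 2 bytes per parameter
fp32 = weight_memory_gb(0.5e9, 4)  # float32: 4 bytes per parameter
print(f"float16 weights: ~{fp16:.2f} GB")  # ~0.93 GB
print(f"float32 weights: ~{fp32:.2f} GB")  # ~1.86 GB
```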
The Space is public and the code will remain open source. This is purely a non-commercial, educational project.
Thank you for considering my request!
Best,
[wd21]
Space: [model]