prithivMLmods 
posted an update Aug 29
Introducing prithivMLmods/DeepCaption-VLA-7B, a multimodal VLM designed for reasoning with long-shot captions (Captioning and Vision-Language Attribution). It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥

✦︎ Models: prithivMLmods/DeepCaption-VLA-7B. The collection also includes prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.

✦︎ Try the demo here: prithivMLmods/VisionScope-R2

✦︎ Try it now on Google Colab, with support for T4 GPUs using 4-bit quantization: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb

✦︎ Collection: prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a


To learn more, visit the model card of the respective model!

Can we apply it to traffic engineering, for example to detect vehicles, understand road scenarios, and estimate the level of traffic congestion?

·

Yes, you can! @mahbubchula
Try the simple workflow I’ve created below:

↗️notebook demo: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Behemoth-3B-070225-post0.1_Traffic_Analysis/Behemoth_3B_070225_post0_1_Traffic_Analysis.ipynb

Here, I used https://huggingface.co/prithivMLmods/Behemoth-3B-070225-post0.1, which is close in functionality to DeepCaption-VLA-7B. I switched to this model because it fits more comfortably within the VRAM of a T4 Colab instance. You can swap in a different model depending on your use case, requirements, and available resources.
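For a rough sense of why model size and quantization matter on a T4, here is a back-of-the-envelope sketch of weight memory only. It assumes the nominal parameter counts implied by the model names (7B and 3B) and the T4's nominal 16 GB of VRAM; the helper `weight_memory_gib` is hypothetical and written for this illustration. It ignores activations, the KV cache, image tokens, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: nominal T4 VRAM; usable memory on Colab is somewhat lower.
T4_VRAM_GIB = 16

# Assumption: parameter counts are taken from the "7B" / "3B" model names.
for name, params in [("7B model", 7e9), ("3B model", 3e9)]:
    for bits in (16, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{name} at {bits}-bit: ~{gib:.1f} GiB of weights "
              f"(T4 nominal: {T4_VRAM_GIB} GiB)")
```

By this estimate, a 7B model in 16-bit leaves very little headroom on a T4, while a 3B model in 16-bit or a 7B model in 4-bit leaves room for activations and long captions.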

For detection functionality, refer to the following app: https://huggingface.co/spaces/sergiopaniego/vlm_object_understanding

Demo UI Image Inference

~ prithivsakthi ur

Thank you so much. If I apply your model in my research, how can I inform you or cite you? Would you be willing to hear more from me? I would like to connect with you. My email is mahbub.hassan@ieee.org. Thank you.

·

@mahbubchula
You can just cite the DOI, or simply mention that the model is available here:

https://huggingface.co/prithivMLmods/DeepCaption-VLA-7B?doi=true