LLaVA (Large Language and Vision Assistant) is a powerful multimodal model that combines the strengths of language and vision. Built on OpenAI’s CLIP vision encoder and a fine-tuned version of Meta’s Llama 2 7B model, LLaVA uses visual instruction tuning to follow natural-language instructions grounded in images and to perform visual reasoning. This allows LLaVA to perform a range of tasks (a brief usage sketch follows the list below), including:
Visual question answering: answering questions based on image content
Caption generation: generating text descriptions of images
Optical character recognition (OCR): identifying text in images
Multimodal dialogue: engaging in conversations that involve both text and images
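
The quickest way to try these capabilities is with an off-the-shelf checkpoint. The snippet below is a minimal sketch of visual question answering, assuming the community `llava-hf/llava-1.5-7b-hf` weights on the Hugging Face Hub and a recent `transformers` release that includes `LlavaForConditionalGeneration`; the image URL is only a placeholder.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed checkpoint name; swap in whichever LLaVA weights you use.
model_id = "llava-hf/llava-1.5-7b-hf"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Load an example image (any RGB image works).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 prompt format: the <image> token marks where visual features are inserted.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The same prompt-plus-image pattern covers the other tasks in the list: asking for a description gives caption generation, asking what text appears in the image approximates OCR, and appending further turns to the prompt supports multimodal dialogue.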