Groq: Add multi-modal model llava-v1.5-7b-4096-preview support
complete
teslop
LLaVA stands for Large Language and Vision Assistant, a multimodal model that combines the strengths of language and vision. Built on OpenAI’s CLIP visual encoder and a fine-tuned version of Meta’s Llama 2 7B model, LLaVA uses visual instruction tuning to support image-based instruction following and visual reasoning. This allows LLaVA to perform a range of tasks, including the following (a usage sketch appears after the list):
Visual question answering: answering questions based on image content
Caption generation: generating text descriptions of images
Optical character recognition (OCR): identifying text within images
Multimodal dialogue: engaging in conversations that involve both text and images
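As a rough illustration, the sketch below sends an image-plus-text prompt to this model through Groq's OpenAI-compatible chat completions API. The use of the `groq` Python SDK, the `GROQ_API_KEY` environment variable, the message format with an `image_url` content part, and the local file `receipt.jpg` are assumptions made for this example rather than details from the original post.

```python
# Minimal sketch: asking llava-v1.5-7b-4096-preview an OCR-style question
# about a local image via Groq's chat completions API (assumed usage).
import base64
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])


def image_to_data_url(path: str) -> str:
    # Encode a local image as a base64 data URL so it can be sent inline.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"


completion = client.chat.completions.create(
    model="llava-v1.5-7b-4096-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What text appears in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": image_to_data_url("receipt.jpg")},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```

Sending the image as a base64 data URL keeps the example self-contained; a publicly reachable HTTPS image URL should work the same way.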
MindMac marked this post as complete
These models have been added as of version 1.9.22. Please upgrade on your end.
MindMac marked this post as in progress