Groq: Add multi-modal model llava-v1.5-7b-4096-preview support
complete
teslop
LLaVA (Large Language and Vision Assistant) is a multimodal model that combines the strengths of language and vision. Built on OpenAI’s CLIP and a fine-tuned version of Meta’s Llama 2 7B model, LLaVA uses visual instruction tuning to follow image-based natural-language instructions and perform visual reasoning. This allows LLaVA to handle a range of tasks, including the following (a request sketch follows the list):
Visual question answering: answering questions based on image content
Caption generation: generating text descriptions of images
Optical character recognition (OCR): identifying text in images
Multimodal dialogue: engaging in conversations that involve both text and images
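For context, a request to this model through Groq's OpenAI-compatible chat completions API might look like the minimal sketch below. Assumptions (not confirmed by this thread): the official `groq` Python SDK is installed, a `GROQ_API_KEY` environment variable is set, the endpoint accepts `image_url` content parts for this model, and the image URL is a placeholder.

```python
import os

from groq import Groq

# Assumes GROQ_API_KEY is set in the environment.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Ask the vision model a question about an image; the URL is illustrative only.
completion = client.chat.completions.create(
    model="llava-v1.5-7b-4096-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.jpg"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```

The same message shape should cover the tasks listed above (visual QA, captioning, OCR, multimodal dialogue) by varying the text prompt alongside the image.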
MindMac
complete
These models have been added as of version 1.9.22. Please upgrade on your end.
MindMac
in progress