Add Mistral Pixtral-12B support
complete
teslop
the first-ever multimodal Mistral model.
Pixtral 12B in short:
Natively multimodal, trained with interleaved image and text data
Strong performance on multimodal tasks, excels in instruction following
Maintains state-of-the-art performance on text-only benchmarks
Architecture:
New 400M parameter vision encoder trained from scratch
12B parameter multimodal decoder based on Mistral Nemo
Supports variable image sizes and aspect ratios
Supports multiple images in the long context window of 128k tokens
MindMac
complete
Now Pixtral Large & Pixtral 12B have been added since the version 1.9.23. Please upgrade on your end.
MindMac
in progress