Voice Input/Output | Voters

Voice Input/Output

complete

MindMac

Support Speech to Text to input prompt and Text to Speech for response

November 1, 2023

Dave

can you add eleven labs as well

MindMac

Dave: Yes, we will work to support ElevenLabs soon. Please stay tuned.

MindMac

marked this post as

complete

Voice Input (Speech to Text) and Voice Output (Text to Speech) now is available on the version 1.9.16.

You can use Apple native capability or OpenAI Whisper/TTS/TTS HD for this feature.

Victor R

MindMac Thanks for adding this!!

MindMac

marked this post as

in progress

Victor R

Looking forward to this feature to be added.

Sascha

I would recommend to wait until ChatGPT Voice API
 becomes public (if not already). This would make the implementation maybe easier for the MindMac developer. (as long it also creates and deploys the conversation data as like in the app does)
Can be tested since today for free-users in the offical ChatGPT app on Andorid and iOS too.
Otherwise there would be first a call to WhisperAI necessary and then one to the target API.
Hint
: Macs dictation feature also got improved with MacOS Sonoma. This could be a good workaround, putting it on your function-keys, for the time being.
So my final conclusion, even if I wish this feature too since the beginning. Now with ChatGPT Voice, I would recommend to just wait up the see the development of it.

Christopher Davis

Dragon Nuance was a game changer for those suffering from Repetitive Strain Injury or Carpal Tunnel Syndrome 30 years ago. As the only competitive system-wide app, it provided relief to many. However, it is not compatible with Mac and has always been a love/hate app for users like me. Requires 70% less time typing and 110% more time correcting errors. Moreover, Whisper already boasts far superior accuracy.

Yves

Same/Similar feature request, maybe you can merge them:

https://mindmac.canny.io/feature-requests/p/add-feature-to-transcribe-and-translate-mp3-file-using-openai-whisper-model

Also it would be a nice added-benefit to include features to mitigate insufficiencies by the Whisper model. (https://platform.openai.com/docs/guides/speech-to-text)

How to handle inputs larger than 25MB -> PyDub?
Mitigate spelling mistakes -> Automatic Post-processing with GPT-4

MindMac

marked this post as

planned