server : add Speech Recognition & Synthesis to UI #8679

Merged
merged 2 commits into from
Jul 25, 2024

Conversation

ElYaiko
Contributor

@ElYaiko ElYaiko commented Jul 25, 2024

This PR adds Speech Recognition & Synthesis to the UI (a simple voice mode).

Screenshot 2024-07-24 at 21-23-02 llama cpp - chat

Features added:

  • Talk button: Initiates speech-to-text (STT).
  • Send after talk option: Sends the message after STT.
  • Voice option: Text-to-speech (TTS) voice used for the bot.
  • Play/pause message: Plays or pauses a message with the selected TTS voice.
  • Play message after completion.
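
The PR description does not show the implementation, so the following is only a minimal sketch of how the support check and the "Voice" option could be approached, assuming the UI relies on the browser's Web Speech API (`SpeechRecognition` / `webkitSpeechRecognition` and `speechSynthesis`). The helper names `detectSpeechSupport` and `pickVoice` and the `VoiceLike` and `SpeechSupport` types are hypothetical and are not taken from the PR.

```typescript
// Sketch only: all names below are hypothetical, not taken from the PR.

// The subset of a browser SpeechSynthesisVoice that this sketch relies on.
interface VoiceLike {
  name: string;
  lang: string;
  default: boolean;
}

interface SpeechSupport {
  recognition: boolean; // speech-to-text entry point is present
  synthesis: boolean;   // text-to-speech entry point is present
}

// Report which Web Speech API entry points exist on a global-like object.
// Some browsers expose recognition only under the "webkit" prefix.
function detectSpeechSupport(scope: object): SpeechSupport {
  return {
    recognition:
      "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope,
    synthesis: "speechSynthesis" in scope,
  };
}

// Choose a voice: the saved choice if it is still available, otherwise the
// browser default, otherwise the first listed voice, otherwise null.
function pickVoice(
  voices: VoiceLike[],
  savedName: string | null
): VoiceLike | null {
  let first: VoiceLike | null = null;
  let browserDefault: VoiceLike | null = null;
  for (const voice of voices) {
    if (savedName !== null && voice.name === savedName) return voice;
    if (first === null) first = voice;
    if (browserDefault === null && voice.default) browserDefault = voice;
  }
  return browserDefault !== null ? browserDefault : first;
}
```

In a browser, these would presumably be called as `detectSpeechSupport(window)` and `pickVoice(window.speechSynthesis.getVoices(), savedName)`. Taking the global object and the voice list as parameters keeps both helpers testable outside a browser.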

Tested browsers:

  • Chrome
  • Firefox
  • Safari

Tested OS:

  • Windows
  • macOS
  • Linux (Requires additional packages for TTS: Guide)
  • Android
  • iOS

Member

@ggerganov ggerganov left a comment

Nice! Just need to fix the trailing whitespace (see the CI).

Collaborator

@ngxson ngxson left a comment

LGTM. Small detail: I'd prefer to add a small message saying "TTS and speech recognition are not provided by llama.cpp", to make it clear to the user that the quality depends on their browser, not on llama.cpp or the model itself.

Contributor Author

ElYaiko commented Jul 25, 2024

@ngxson What do you think?

2024-07-25-164752_1366x768_scrot

@ElYaiko ElYaiko requested a review from ngxson July 25, 2024 20:54
Collaborator

ngxson commented Jul 25, 2024

Yes, LGTM. We can merge once the CI passes.

@ngxson ngxson merged commit 01aec4a into ggml-org:master Jul 25, 2024
11 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024
* server : add Speech Recognition & Synthesis to UI

* server : add Speech Recognition & Synthesis to UI (fixes)
Contributor

jboero commented Aug 15, 2024

Wow, I just saw this update. Kudos on merging Whisper and TTS, this is brilliant. Well done.

4 participants