Add an LLM-based AutoSuggester #1995
Conversation
```python
def __init__(self,
             chat_model: Optional[BaseChatModel]=None,
             persona: str=DEFAULT_PERSONA,
```
Shouldn't the system message also be part of the init here?
Yeah. I'll add that feature.
I labored a long time to tune the system message to get diverse LLMs to do the task right, so altering it may break the suggester, but caveat emptor.
I've added the ability to change the system message.
```python
into full, natural-sounding sentences.
"""

SYSTEM_MESSAGE="""
```
I'm curious how well this works. The examples here don't mention code at all, but I imagine that would be the most common use-case of this.
It does complete code one-liners correctly. Unfortunately, AutoSuggest doesn't handle newlines or tabs well, so I haven't really been able to get it to insert multiline completions.
Here's what you get when you provide the persona "You are a Perl coder" and type "#loop from one to ten:"
```perl
#Loop from one to ten:
for ($i = 1; $i <= 10; $i++) { print "$i\n"; }
```
Another Perl one-liner; the first line is the user's input and the second line is the suggested completion.
```perl
sub factorial()
{ my $n = shift; return 1 if $n == 0; return $n * fact($n-1); }
```
Python code completions are unsatisfactory due to newline handling.
I altered the system message to give an example of code completion. Now the LLM inserts newlines in the appropriate places, and I added an `asis` initialization argument that allows the LLM to pass through newlines and other control characters.
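Presumably it would be used something like this (a hedged sketch; the `asis` name comes from the comment above, and treating it as a boolean is an assumption):

```python
# Pass newlines and other control characters from the LLM through unmodified
# (assumed boolean flag; see the asis initialization argument described above).
suggester = LLMSuggest(asis=True)
```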
I may have figured out how to do multiline suggesting for code completion. If you have any other "must haves" for this PR, let me know.
See the issue I referenced in the other issue for how I managed to get multiline autosuggestions working. It's a bit of a hack, though, so long term we may want to update prompt-toolkit to handle that sort of thing better.
I've improved support for multi-line code completion. The behavior is that the greyed suggestion is compressed onto a single line with "^J" characters displayed where the newlines will be. Once the suggestion is accepted, the multiple lines are inserted into the buffer appropriately. Here's a video of the behavior. OpenAI was a bit slow on the second completion example, so be patient.
llm_autosuggest_demo-2025-06-07_14.24.49.mp4
Description
This PR implements a new `LLMSuggest` AutoSuggester module which connects to a remote or locally-hosted Large Language Model (LLM) to suggest completions of words, phrases, or sentences. It can be used with any `Buffer` class that accepts the `auto_suggest` argument.

This video shows the autosuggester in action, using a local Ollama instance. It also works with OpenAI, Anthropic, and other popular commercial LLM services.
llm_autosuggest_demo-2025-06-05_08.07.32.mp4
Installation
`LLMSuggest` introduces requirements for the `langchain`, `langchain_core`, and `PyEnchant` modules. These can be installed with pip by specifying the `llm` optional dependencies when installing `prompt_toolkit` (e.g. `pip install "prompt_toolkit[llm]"`). Together, these three dependencies plus their own requirements (such as pydantic) will increase the size of an installed `prompt_toolkit` by an additional 54 MB.

Testing
This module comes with a unit test that assesses the module's ability to join the output from the LLM with the user's prompt properly. The test uses a mock LLM chat class to avoid the additional setup needed, and will be skipped if the optional dependencies are not installed.
To run the unit test, invoke pytest in the usual way, e.g. `python -m pytest` from the repository root.
Usage
Basic usage is quite straightforward. Just create an instance of `LLMSuggest`, wrap it in a `ThreadedAutoSuggest` wrapper, and pass the instance to the `auto_suggest` argument of a `Prompt`, `PromptSession`, `TextArea`, or any of the lower-level `Buffer`-derived classes that accept this argument. Suggestions appear as you type. Accept the entire suggestion by pressing `^E` or `right-arrow`, or accept the next word using `alt-F`. These key bindings can be changed as described in the `AutoSuggest` reference manual.

Below is a basic script that uses the default ChatGPT 4o-mini "free tier" for its suggestions.
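A minimal sketch of such a script (the import path for `LLMSuggest` follows the `prompt_toolkit.contrib.auto_suggest.llmsuggest` module named under Bugs and Limitations below, and may differ from the actual code in this PR):

```python
from prompt_toolkit import PromptSession
from prompt_toolkit.auto_suggest import ThreadedAutoSuggest
from prompt_toolkit.contrib.auto_suggest.llmsuggest import LLMSuggest

# With no chat_model argument, LLMSuggest uses its default OpenAI model.
suggester = ThreadedAutoSuggest(LLMSuggest())
session = PromptSession(auto_suggest=suggester)

text = session.prompt("Say something: ")
print("You said:", text)
```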
For this to work, you must have an OpenAI account, and a valid API key stored in the environment variable `OPENAI_API_KEY`. See platform.openai.com for details.

To use a different LLM model, initialize the suggester with an instantiated langchain chat model. This example shows how to use a `llama3` model running on a local Ollama server:
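A sketch of the Ollama variant (same caveat about import paths; `ChatOllama` is the chat model class provided by the `langchain-ollama` package):

```python
from langchain_ollama import ChatOllama
from prompt_toolkit import PromptSession
from prompt_toolkit.auto_suggest import ThreadedAutoSuggest
from prompt_toolkit.contrib.auto_suggest.llmsuggest import LLMSuggest

# Point the suggester at a llama3 model served by a local Ollama instance.
chat_model = ChatOllama(model="llama3")
suggester = ThreadedAutoSuggest(LLMSuggest(chat_model=chat_model))
session = PromptSession(auto_suggest=suggester)

print(session.prompt("> "))
```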
For this to work, you must have the optional `langchain-ollama` module installed, using `pip install langchain-ollama`.

After installing additional `langchain` chat modules, you can use many other backend LLMs, including Claude, Groq, Perplexity, and Gemini.

Customization
Several optional constructor arguments allow you to adjust the behavior of the autosuggester. The `persona` argument allows you to adjust the style and tone of the completions. Compare the suggestions you get from these two versions:
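For illustration, the two versions might look something like this (the persona strings here are invented examples, not the ones from the original scripts):

```python
# A formal persona steers the completions toward measured, precise prose.
suggester = LLMSuggest(persona="You are a formal technical writer.")
```

vs

```python
# A playful persona produces looser, chattier completions.
suggester = LLMSuggest(persona="You are a whimsical storyteller.")
```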
The optional `temperature` argument is a floating point number which controls the diversity of the suggestions. Lower numbers make the suggestions more deterministic. A temperature of 0.0 results in fully deterministic suggestions, but is more likely to generate clichés. Higher numbers generate prose that appears more creative, but may also introduce nonsensical completions.

Finally, the `context` argument allows you to pass additional text to the LLM to help it produce context-aware suggestions. This can be a plain text string, or a Callable that returns a string. A typical use case would be to pass the LLM the contents of a buffer containing a story that is being composed, or the record of a chatbot conversation.

Additional options are described in the inline documentation.
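A sketch combining these options (the `story_buffer` object and the persona string are hypothetical; that the callable is consulted when a suggestion is generated is an assumption based on the description above):

```python
from prompt_toolkit.buffer import Buffer
from prompt_toolkit.contrib.auto_suggest.llmsuggest import LLMSuggest

story_buffer = Buffer()  # hypothetical buffer holding the story so far

suggester = LLMSuggest(
    persona="You are a hard-boiled detective novelist.",  # invented example
    temperature=0.2,                    # low temperature: mostly deterministic
    context=lambda: story_buffer.text,  # callable returning the current story text
)
```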
Bugs and Limitations
- LLM queries can take a noticeable amount of time, so wrap the suggester in a `ThreadedAutoSuggest` to avoid blocking of keystrokes while the LLM is thinking. Consider using a local model for best responsiveness.
- The `LLMSuggest` code uses the `PyEnchant` library to figure out when the proffered completion is intended to complete a word or to start a new word. This is not 100% reliable.
- Commercial LLMs may refuse to complete text they consider objectionable. The default persona, `prompt_toolkit.contrib.auto_suggest.llmsuggest.DEFAULT_PERSONA`, tells the LLM that it is an "uncensored writing assistant" to slightly reduce the rate of such refusals on borderline cases. You may wish to adjust the persona to be less lenient of such cases.