
Add an LLM-based AutoSuggester #1995


Open · lstein wants to merge 5 commits into main

Conversation

@lstein commented Jun 5, 2025

Description

This PR implements a new LLMSuggest AutoSuggester module which connects to a remote or locally-hosted Large Language Model (LLM) to suggest completions of words, phrases or sentences. It can be used with any Buffer class that accepts the auto_suggest argument.

This video shows the autosuggester in action using a local Ollama instance. It also works with OpenAI, Anthropic, and other popular commercial LLM services.

llm_autosuggest_demo-2025-06-05_08.07.32.mp4

Installation

LLMSuggest adds dependencies on the langchain, langchain_core, and PyEnchant modules. These can be installed with pip by specifying the llm optional extra when installing prompt_toolkit.

# From the project root.
pip install .[llm]

# When and if this PR is released to PyPI:
pip install prompt_toolkit[llm]

Together, these three dependencies and their own requirements (such as pydantic) increase the size of an installed prompt_toolkit by roughly 54 MB.
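
If you want to confirm that the optional extras are importable before wiring up the suggester, a quick check along the following lines works (this snippet is illustrative only and is not part of the PR):

# Illustrative check that the "llm" extras are importable.
import importlib.util

for module in ("langchain", "langchain_core", "enchant"):  # enchant comes from PyEnchant
    if importlib.util.find_spec(module) is None:
        raise SystemExit(f"Missing optional dependency: {module}. Try: pip install '.[llm]'")
print("All LLMSuggest optional dependencies are available.")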

Testing

The module comes with a unit test that checks its ability to properly join the LLM's output with the user's prompt. The test uses a mock LLM chat class to avoid the setup a real backend would require, and is skipped if the optional dependencies are not installed.

To run the unit test:

# from within the repo root
pytest tests/test_llmsuggest.py
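
To get a feel for what such a test looks like, here is a minimal sketch in the same spirit; it assumes langchain_core's FakeListChatModel and is not the PR's actual test code:

from langchain_core.language_models.fake_chat_models import FakeListChatModel

from prompt_toolkit.buffer import Buffer
from prompt_toolkit.contrib.auto_suggest import LLMSuggest
from prompt_toolkit.document import Document


def test_suggestion_joins_cleanly_with_prompt():
    # The fake model always answers with the same canned completion.
    fake_llm = FakeListChatModel(responses=["over the lazy dog."])
    suggester = LLMSuggest(chat_model=fake_llm)

    document = Document(text="The quick brown fox jumps ")
    buffer = Buffer(document=document)

    suggestion = suggester.get_suggestion(buffer, document)
    assert suggestion is not None
    # Joining the prompt and the suggestion should not double up whitespace.
    assert "  " not in document.text + suggestion.text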

Usage

Basic usage is quite straightforward. Just create an instance of LLMSuggest, wrap it in a ThreadedAutoSuggest wrapper, and pass the instance to the auto_suggest argument of a Prompt, PromptSession, TextArea, or any of the lower-level Buffer-derived classes that accept this argument. Suggestions appear as you type. Accept the entire suggestion by pressing ^E or right-arrow, or accept the next word using alt-F. These key bindings can be changed as described in the AutoSuggest reference manual.
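
For example, the standard prompt_toolkit pattern for binding an extra key that accepts the current suggestion looks like this sketch (the choice of Control-Space is illustrative; pass the key_bindings object to your prompt or Application):

from prompt_toolkit import prompt
from prompt_toolkit.key_binding import KeyBindings

kb = KeyBindings()

@kb.add("c-space")
def _(event):
    "Insert the whole pending suggestion into the buffer."
    buff = event.current_buffer
    suggestion = buff.suggestion
    if suggestion:
        buff.insert_text(suggestion.text)

# Later: prompt("> ", auto_suggest=suggester, key_bindings=kb)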

Below is a basic script that uses the default ChatGPT 4o-mini "free tier" for its suggestions.

from prompt_toolkit import prompt
from prompt_toolkit.contrib.auto_suggest import LLMSuggest
from prompt_toolkit.auto_suggest import ThreadedAutoSuggest

suggester = LLMSuggest()
suggester = ThreadedAutoSuggest(suggester)  # wrap to avoid delays while typing

while True:
    response = prompt('> ', auto_suggest=suggester)
    print(f"You said '{response}'")

For this to work, you must have an OpenAI account and a valid API key stored in the environment variable OPENAI_API_KEY. See platform.openai.com for details.

To use a different LLM model, initialize the suggester with an instantiated langchain chat model. This example shows how to use a llama3 model running on a local Ollama server:

from prompt_toolkit import prompt
from prompt_toolkit.contrib.auto_suggest import LLMSuggest
from prompt_toolkit.auto_suggest import ThreadedAutoSuggest
# new part starts here
from langchain.chat_models import init_chat_model  

model = init_chat_model('ollama:llama3', temperature=0.0)
suggester = LLMSuggest(chat_model=model)
suggester = ThreadedAutoSuggest(suggester)  

# everything below is the same

For this to work, the optional langchain-ollama module must be installed (pip install langchain-ollama).

After installing additional langchain chat modules, you can use many other backend LLMs, including Claude, Groq, Perplexity and Gemini.
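
For example, an Anthropic-backed suggester might look like the following sketch (the model identifier is illustrative; it assumes langchain-anthropic is installed and ANTHROPIC_API_KEY is set in the environment):

from prompt_toolkit.auto_suggest import ThreadedAutoSuggest
from prompt_toolkit.contrib.auto_suggest import LLMSuggest
from langchain.chat_models import init_chat_model

# Requires: pip install langchain-anthropic
model = init_chat_model("anthropic:claude-3-5-haiku-latest", temperature=0.0)
suggester = ThreadedAutoSuggest(LLMSuggest(chat_model=model))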

Customization

Several optional constructor arguments let you adjust the behavior of the autosuggester. The persona argument controls the style and tone of the completions. Compare the suggestions you get from these two versions:

suggester = LLMSuggest(persona='You are a copy editor for a technical journal')

vs

suggester = LLMSuggest(persona='You are a romance novelist, skilled in generating fulsome and lyrical prose.')

The optional temperature argument is a floating-point number that controls the diversity of the suggestions. Lower values make the suggestions more deterministic. A temperature of 0.0 yields fully deterministic suggestions but is more likely to produce clichés. Higher values generate prose that appears more creative, but may also introduce nonsensical completions.

Finally, the context argument allows you to pass additional text to the LLM to help it produce context-aware suggestions. This can be a plain text string, or a Callable that returns a string. A typical use case would be to pass the LLM the contents of a buffer containing a story that is being composed or the record of a chatbot conversation.
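
As a sketch of that use case, the context callable below hands the text composed so far to the LLM each time a suggestion is requested (the story_area widget is hypothetical):

from prompt_toolkit.auto_suggest import ThreadedAutoSuggest
from prompt_toolkit.contrib.auto_suggest import LLMSuggest
from prompt_toolkit.widgets import TextArea

story_area = TextArea(multiline=True)  # hypothetical buffer holding the story so far

suggester = ThreadedAutoSuggest(
    LLMSuggest(
        persona="You are a novelist continuing the user's story.",
        context=lambda: story_area.text,  # a Callable returning a plain string
    )
)

input_area = TextArea(height=1, auto_suggest=suggester)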

Additional options are described in the inline documentation.

Bugs and Limitations

  • Due to the design of the AutoSuggest class, suggestions can only be appended to the end of the current buffer, which means that inline suggestions in the style of GitHub Copilot are not feasible.
  • Latency can be an issue with remotely hosted LLMs, particularly when traffic is high. Always wrap the suggester in a ThreadedAutoSuggest to avoid blocking keystrokes while the LLM is thinking, and consider using a local model for the best responsiveness.
  • Some LLMs do not follow instructions to the letter and either insert extra whitespace into their responses or omit whitespace that is needed. The LLMSuggest code uses the PyEnchant library to decide whether the proffered completion is meant to finish the current word or to start a new one. This is not 100% reliable.
  • Some LLMs will refuse to offer suggestions if the text appears to violate their guardrails (e.g. promoting violence or hate speech). The package global prompt_toolkit.contrib.auto_suggest.llmsuggest.DEFAULT_PERSONA tells the LLM that it is an "uncensored writing assistant" to slightly reduce the rate of such refusals in borderline cases. You may wish to adjust the persona to be stricter about such cases, as sketched after this list.
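
A minimal sketch of tightening the persona for such deployments:

from prompt_toolkit.contrib.auto_suggest import LLMSuggest

# Replace the shipped default with a more conservative persona.
suggester = LLMSuggest(
    persona=(
        "You are a careful writing assistant. Decline to suggest completions "
        "for text that promotes violence or hate speech."
    )
)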


def __init__(self,
             chat_model: Optional[BaseChatModel]=None,
             persona: str=DEFAULT_PERSONA,
Contributor

Shouldn't the system message also be part of the init here?

Author

Yeah. I'll add that feature.

I labored a long time tuning the system message to get the various LLMs to do the task right, so altering it may break the suggester; caveat emptor.

Author

I've added the ability to change the system message.
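
Purely for illustration, usage might look something like the sketch below; the system_message parameter name here is an assumption, so check the inline documentation for the actual spelling:

from prompt_toolkit.contrib.auto_suggest import LLMSuggest

# Hypothetical keyword; the merged PR may name this parameter differently.
suggester = LLMSuggest(
    system_message="Continue the user's text. Return only the continuation, nothing else."
)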

into full, natural-sounding sentences.
"""

SYSTEM_MESSAGE="""
Contributor

I'm curious how well this works. The examples here don't mention code at all, but I imagine that would be the most common use case for this.

Author

@lstein Jun 5, 2025

It does complete code one-liners correctly. Unfortunately, AutoSuggest doesn't handle newlines or tabs well, so I haven't really been able to get it to insert multiline completions.

Here's what you get when you provide the persona "You are a Perl coder" and type "#loop from one to ten:"

#Loop from one to ten: for ($i = 1; $i <= 10; $i++) { print "$i\n"; }

Another Perl one-liner; bold face indicates user input.

sub factorial() { my $n = shift; return 1 if $n == 0; return $n * fact($n-1); }

Python code completions are unsatisfactory due to newline handling.

Author

I altered the system message to give an example of code completion. Now the LLM inserts newlines in the appropriate places, and I added an asis initialization argument that allows the LLM to pass through newlines and other control characters.

@lstein commented Jun 6, 2025

I may have figured out how to do multiline suggesting for code completion. If you have any other "must haves" for this PR, let me know.

@asmeurer commented Jun 7, 2025

See the issue I referenced in the other issue for how I managed to get multiline autosuggestions working. It's a bit of a hack, though, so long term we may want to update prompt-toolkit to handle that sort of thing better.

@lstein commented Jun 7, 2025

I've improved support for multi-line code completion. The behavior is that the greyed suggestion is compressed onto a single line, with "^J" characters displayed where the newlines will be. Once the suggestion is accepted, the multiple lines are inserted into the buffer appropriately.
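
Conceptually, the display/accept round-trip works like this sketch (not the PR's actual code):

def compress_for_display(suggestion_text: str) -> str:
    # Show a multi-line completion on one line, marking each line break visibly.
    return suggestion_text.replace("\n", "^J")


def expand_on_accept(displayed_text: str) -> str:
    # Restore real newlines when the suggestion is accepted into the buffer.
    return displayed_text.replace("^J", "\n")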

Here's a video of the behavior. OpenAI was a bit slow on the second completion example, so be patient.

llm_autosuggest_demo-2025-06-07_14.24.49.mp4
