The [`llama_cpp_for_codeshell`](https://github.com/WisdomShell/llama_cpp_for_codeshell) project provides the 4-bit quantized model service of the [CodeShell](https://github.com/WisdomShell/codeshell) LLM, named `codeshell-chat-q4_0.gguf`. Here are the steps to deploy the model service:

For Mac users with non-Apple Silicon chips, you can disable Metal builds during compilation using the CMake options `LLAMA_NO_METAL=1` or `LLAMA_METAL=OFF` to ensure the model runs properly.
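
For illustration, on an Intel-based Mac the Metal backend can be switched off with either build system. This is a minimal sketch that assumes the usual llama.cpp-style build targets; defer to the repository's own build instructions if they differ:

```bash
# Build without the Metal backend on an Intel-based Mac (llama.cpp-style flags)
make LLAMA_NO_METAL=1

# Equivalent when building with CMake
cmake -B build -DLLAMA_METAL=OFF
cmake --build build --config Release
```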
+ Windows
You can either compile the code with the Linux approach inside the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/about), or follow the instructions in the [llama.cpp repository](https://github.com/ggerganov/llama.cpp#build): set up [w64devkit](https://github.com/skeeto/w64devkit/releases) and then compile using the same Linux method.
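
As a rough sketch of the WSL route (assuming an Ubuntu distribution and the repository's GitHub location), the build is the same as on Linux once a toolchain is installed:

```bash
# Inside a WSL shell (e.g. Ubuntu): install a C/C++ toolchain, then build as on Linux
sudo apt update && sudo apt install -y build-essential git
git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
cd llama_cpp_for_codeshell
make
```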
### Download the model
On the [Hugging Face Hub](https://huggingface.co/WisdomShell), we provide three different models: [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B), [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat), and [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4). Below are the steps to download these models.
- To perform inference using the [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4) model, download the model to your local machine and place it in the `llama_cpp_for_codeshell/models` folder as indicated in the code above (see the download sketch after this list).
- For performing inference using [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models, after placing the models in a local folder, you can utilize [TGI (Text Generation Inference)](https://github.com/WisdomShell/text-generation-inference.git) to load these local models and initiate the model service.
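
For example, the quantized model file can be fetched with the `huggingface-cli` tool. This is only a sketch that assumes the `huggingface_hub` CLI is installed; downloading `codeshell-chat-q4_0.gguf` manually from the model page works just as well:

```bash
# Fetch the 4-bit GGUF file into the folder the llama_cpp_for_codeshell server reads from
huggingface-cli download WisdomShell/CodeShell-7B-Chat-int4 codeshell-chat-q4_0.gguf \
  --local-dir ./llama_cpp_for_codeshell/models
```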
### Load the model
- The `CodeShell-7B-Chat-int4` model can be served as an API using the `server` command within the `llama_cpp_for_codeshell` project.

Note: On macOS, the Metal architecture is enabled by default, and enabling Metal allows models to be loaded and executed on the GPU, significantly improving performance. For Mac users with non-Apple Silicon chips, you can disable Metal build during compilation using CMake options `LLAMA_NO_METAL=1` or `LLAMA_METAL=OFF` to ensure that the model functions properly.
Note: In cases where Metal is enabled during compilation, if you encounter runtime exceptions, you can explicitly disable Metal GPU inference by adding the `-ngl 0` parameter in the command line to ensure the proper functioning of the model.
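
Putting the notes above together, a typical invocation of the `server` command looks roughly like the sketch below; the relative paths, host, and port are assumptions to adapt to your own build and setup:

```bash
# Serve the 4-bit quantized model over HTTP on 127.0.0.1:8080
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080

# Same invocation with Metal GPU inference explicitly disabled (-ngl 0 keeps all layers on the CPU)
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080 -ngl 0
```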
- The [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models can be served by loading the local models with [TGI](https://github.com/WisdomShell/text-generation-inference.git) and starting the model service.

### Load the model locally

After downloading the model from the [Hugging Face Hub](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/blob/main/codeshell-chat-q4_0.gguf) to your local machine, place it in the `llama_cpp_for_codeshell/models` folder referenced in the code above to load it locally.

## Model Service [NVIDIA GPU]

For users wishing to use NVIDIA GPUs for inference, the [`text-generation-inference`](https://github.com/huggingface/text-generation-inference) project can be used to deploy the [CodeShell Large Model](https://github.com/WisdomShell/codeshell). Below are the steps to deploy the model service:
### Download the Model
After downloading the model from the [Hugging Face Hub](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) to your local machine, place the model under the path of the `$HOME/models` folder, and you can load the model locally.
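
For example, the chat model can be pulled into that folder with git; this sketch assumes `git-lfs` is installed so the weight files are actually downloaded:

```bash
# Download CodeShell-7B-Chat into $HOME/models (weights are stored with Git LFS)
mkdir -p "$HOME/models" && cd "$HOME/models"
git lfs install
git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat
```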
The default deployment is on local port 8080, and it can be called through the POST method.
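
As an illustration of such a deployment, the sketch below starts TGI in Docker and then calls the service's `generate` endpoint over POST. The image and tag, port mapping, and launcher flags are assumptions (the project itself points to WisdomShell's fork of text-generation-inference), so adjust them to your environment:

```bash
# Launch the model service (container port 80 mapped to local port 8080)
docker run --gpus all --rm -p 8080:80 -v "$HOME/models":/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data/CodeShell-7B-Chat

# Call the service through the POST method
curl http://127.0.0.1:8080/generate \
  -X POST -H 'Content-Type: application/json' \
  -d '{"inputs": "def quick_sort(arr):", "parameters": {"max_new_tokens": 64}}'
```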
For a more detailed explanation of the parameters, please refer to the [text-generation-inference project documentation](https://github.com/huggingface/text-generation-inference).
## Configure the Plugin
- Set the address for the CodeShell service
- Configure whether to enable automatic code completion suggestions
- Specify the maximum number of tokens for code completion
- Specify the maximum number of tokens for Q&A
- Configure the model runtime environment
Note: Different model runtime environments can be configured within the plugin. For the [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4) model, you can choose the `Use CPU Mode(with llama.cpp)` option in the `Model Runtime Environment` menu. However, for the [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models, you should select the `Use GPU Model(with TGI framework)` option.

During coding, when you stop typing, code suggestions will automatically trigger (configurable with a delay of 1 to 3 seconds in the plugin settings).
When the plugin provides code suggestions, they are displayed in gray at the editor's cursor location. You can press the Tab key to accept the suggestion or continue typing to ignore it.