
Commit e1aa0ab
<merge>:merge 'feature/select-model' to master
2 parents: 2774f17 + 7758354

22 files changed: +439 -120 lines

README.md

Lines changed: 76 additions & 14 deletions
@@ -35,60 +35,122 @@ git clone https://github.com/WisdomShell/codeshell-intellij.git
 
 ## 模型服务
 
-[`llama_cpp_for_codeshell`](https://github.com/WisdomShell/llama_cpp_for_codeshell)项目提供[CodeShell大模型](https://github.com/WisdomShell/codeshell) 4bits量化后的模型服务,模型名称为`codeshell-chat-q4_0.gguf`。以下为部署模型服务步骤:
+[`llama_cpp_for_codeshell`](https://github.com/WisdomShell/llama_cpp_for_codeshell)项目提供[CodeShell大模型](https://github.com/WisdomShell/codeshell) 4bits量化后的模型,模型名称为`codeshell-chat-q4_0.gguf`。以下为部署模型服务步骤:
 
-### 获取代码
+### 编译代码
+
++ Linux / Mac(Apple Silicon设备)
+
+```bash
+git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
+cd llama_cpp_for_codeshell
+make
+```
+
+在 macOS 上,默认情况下启用了Metal,启用Metal可以将模型加载到 GPU 上运行,从而显著提升性能。
+
++ Mac(非Apple Silicon设备)
+
+```bash
+git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
+cd llama_cpp_for_codeshell
+LLAMA_NO_METAL=1 make
+```
+
+对于非 Apple Silicon 芯片的 Mac 用户,在编译时可以使用 `LLAMA_NO_METAL=1` 或 `LLAMA_METAL=OFF` 的 CMake 选项来禁用Metal构建,从而使模型正常运行。
+
++ Windows
+
+您可以选择在[Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/about)中按照Linux的方法编译代码,也可以选择参考[llama.cpp仓库](https://github.com/ggerganov/llama.cpp#build)中的方法,配置好[w64devkit](https://github.com/skeeto/w64devkit/releases)后再按照Linux的方法编译。
+
+### 下载模型
+
+[Hugging Face Hub](https://huggingface.co/WisdomShell)上,我们提供了三种不同的模型,分别是[CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B)、[CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat)和[CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4)。以下是下载模型的步骤。
+
+- 使用[CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4)模型推理,将模型下载到本地后,放置在以上代码中的 `llama_cpp_for_codeshell/models` 文件夹路径下
+
+```
+git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/blob/main/codeshell-chat-q4_0.gguf
+```
+
+- 使用[CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B)、[CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat)推理,将模型放置在本地文件夹后,使用[TGI](https://github.com/WisdomShell/text-generation-inference.git)加载本地模型,启动模型服务
+
+### 加载模型
+
+- `CodeShell-7B-Chat-int4`模型使用`llama_cpp_for_codeshell`项目中的`server`命令即可提供API服务
 
 ```bash
-git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
-cd llama_cpp_for_codeshell
-make
+./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080
 ```
 
-注意:在 macOS 上,默认情况下启用了Metal架构,启用Metal可以将模型加载到 GPU 上运行,从而显著提升性能。对于非 Apple Silicon 芯片的 Mac 用户,在编译时可以使用 `LLAMA_NO_METAL=1` 或 `LLAMA_METAL=OFF` 的 CMake 选项来禁用Metal构建,从而使模型正常运行。
+注意:对于编译时启用了 Metal 的情况下,若运行时出现异常,您也可以在命令行添加参数 `-ngl 0` 显式地禁用Metal GPU推理,从而使模型正常运行。
+
+- [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B)和[CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat)模型,使用[TGI](https://github.com/WisdomShell/text-generation-inference.git)加载本地模型,启动模型服务
+
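服务启动后,可以先用一个简单的 HTTP 请求验证上面第一种方式(`llama_cpp_for_codeshell` 的 `server`)是否正常工作。以下仅为参考示例:假设服务按上文监听在 127.0.0.1:8080,且接口与 llama.cpp 风格的 `/completion` 路径(插件 CPU 模式使用的路径)一致,prompt 与参数可自行替换:

```bash
# 向本地 llama.cpp 风格的 server 发送一次生成请求(示例,按需调整 prompt 与参数)
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "# 用Python实现快速排序\n",
        "n_predict": 128,
        "temperature": 0.2
      }'
```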
+## 模型服务[NVIDIA GPU]
+
+对于希望使用NVIDIA GPU进行推理的用户,可以使用[`text-generation-inference`](https://github.com/huggingface/text-generation-inference)项目部署[CodeShell大模型](https://github.com/WisdomShell/codeshell)。以下为部署模型服务步骤:
 
 ### 下载模型
 
-[Hugging Face Hub](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/blob/main/codeshell-chat-q4_0.gguf)将模型下载到本地后,将模型放置在以上代码中的 `llama_cpp_for_codeshell/models` 路径,即可从本地加载模型。
+[Hugging Face Hub](https://huggingface.co/WisdomShell/CodeShell-7B-Chat)将模型下载到本地后,将模型放置在 `$HOME/models` 文件夹的路径下,即可从本地加载模型。
 
 ```bash
-git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/blob/main/codeshell-chat-q4_0.gguf
+git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat
 ```
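注意:Hugging Face 上的模型权重通过 Git LFS 存储,直接 `git clone` 可能只会下载指针文件,通常需要先安装并初始化 git-lfs 再克隆模型仓库。以下为参考做法(以 Debian/Ubuntu 的安装方式为例,请按自己的系统调整):

```bash
# 安装并初始化 git-lfs(Debian/Ubuntu 示例),再将权重克隆到 $HOME/models 下
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat $HOME/models/CodeShell-7B-Chat
```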
 
 ### 部署模型
 
-使用`llama_cpp_for_codeshell`项目中的`server`命令即可提供API服务。
+使用以下命令即可用text-generation-inference进行GPU加速推理部署:
 
 ```bash
-./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080
+docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
+        --env LOG_LEVEL="info,text_generation_router=debug" \
+        ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
+        --model-id /data/CodeShell-7B-Chat --num-shard 1 \
+        --max-total-tokens 5000 --max-input-length 4096 \
+        --max-stop-sequences 12 --trust-remote-code
 ```
 
-注意:对于编译时启用了 Metal 的情况下,您也可以在命令行添加参数 `-ngl 0` 显式地禁用Metal GPU推理,从而使模型正常运行
+更详细的参数说明请参考[text-generation-inference项目文档](https://github.com/huggingface/text-generation-inference)
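容器启动后,服务映射在本地 9090 端口,可以直接调用 TGI 的 `/generate` 接口(即插件 GPU 模式使用的路径)验证部署是否成功。以下仅为参考示例,`inputs` 与参数可自行替换:

```bash
# 向本地 TGI 服务发送一次生成请求(示例)
curl -s http://127.0.0.1:9090/generate \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": "# 用Python实现二分查找\n",
        "parameters": {"max_new_tokens": 128}
      }'
```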
 
 ## 配置插件
 
 - 设置CodeShell大模型服务地址
 - 配置是否自动触发代码补全建议
 - 配置补全的最大tokens数量
 - 配置问答的最大tokens数量
+- 配置模型运行环境
+
+注意:不同的模型运行环境可以在插件中进行配置。对于[CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4)模型,您可以在`Model Runtime Environment`选项中选择`Use CPU Mode(with llama.cpp)`选项。而对于[CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B)和[CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat)模型,应选择`Use GPU Model(with TGI framework)`选项。
 
 ![插件配置截图](https://resource.zsmarter.cn/appdata/codeshell-intellij/screenshots/code_config.png)
 
 ## 功能特性
 
-### 1. 代码辅助
+### 1. 代码补全
+
+- 自动触发代码建议
+
+在编码时,当您停止输入时,代码建议将自动触发(可在插件设置中配置延迟时间为1到3秒)。
+
+当插件提供代码建议时,建议内容以灰色显示在编辑器光标位置,您可以按下Tab键来接受该建议,或者继续输入以忽略该建议。
+
+![代码建议截图](https://resource.zsmarter.cn/appdata/codeshell-vscode/screenshots/docs_completion.png)
+
+### 2. 代码辅助
 
 - 对一段代码进行解释/优化/清理
 - 为一段代码生成注释/单元测试
 - 检查一段代码是否存在性能/安全性问题
 
 在IDE侧边栏中打开插件问答界面,在编辑器中选中一段代码,在鼠标右键CodeShell菜单中选择对应的功能项,插件将在问答界面中给出相应的答复。
 
-在问答界面的代码块中,可以点击复制按钮复制该代码块,也可点击插入按钮将该代码块内容插入到编辑器光标处
+在问答界面的代码块中,可以点击复制按钮复制该代码块。
 
 ![代码辅助截图](https://resource.zsmarter.cn/appdata/codeshell-intellij/screenshots/code_inte.png)
 
-### 2. 智能问答
+### 3. 智能问答
 
 - 支持多轮对话
 - 可编辑问题,重新提问

README_EN.md

Lines changed: 76 additions & 15 deletions
@@ -37,48 +37,109 @@ git clone https://github.com/WisdomShell/codeshell-intellij.git
 
 The [`llama_cpp_for_codeshell`](https://github.com/WisdomShell/llama_cpp_for_codeshell) project provides the 4-bit quantized model service of the [CodeShell](https://github.com/WisdomShell/codeshell) LLM, named `codeshell-chat-q4_0.gguf`. Here are the steps to deploy the model service:
 
-### Get the Code
+### Compile the code
+
++ Linux / Mac (Apple Silicon Devices)
+
+```bash
+git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
+cd llama_cpp_for_codeshell
+make
+```
+
+On macOS, Metal is enabled by default, which allows loading the model onto the GPU for significant performance improvements.
+
++ Mac (Non-Apple Silicon Devices)
+
+```bash
+git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
+cd llama_cpp_for_codeshell
+LLAMA_NO_METAL=1 make
+```
+
+For Mac users with non-Apple Silicon chips, you can disable Metal builds during compilation using the CMake options `LLAMA_NO_METAL=1` or `LLAMA_METAL=OFF` to ensure the model runs properly.
+
++ Windows
+
+You can either compile the code within the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/about) following the Linux approach, or follow the instructions in the [llama.cpp repository](https://github.com/ggerganov/llama.cpp#build): set up [w64devkit](https://github.com/skeeto/w64devkit/releases) and then compile using the Linux method.
+
+
+### Download the model
+
+On the [Hugging Face Hub](https://huggingface.co/WisdomShell), we provide three different models: [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B), [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat), and [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4). Below are the steps to download these models.
+
+- To perform inference using the [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4) model, download the model to your local machine and place it in the `llama_cpp_for_codeshell/models` folder referenced in the code above.
+
+```
+git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/blob/main/codeshell-chat-q4_0.gguf
+```
+
+- To perform inference using the [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models, place the models in a local folder and use [TGI (Text Generation Inference)](https://github.com/WisdomShell/text-generation-inference.git) to load the local model and start the model service.
+
+### Load the model
+
+- The `CodeShell-7B-Chat-int4` model can be served as an API using the `server` command within the `llama_cpp_for_codeshell` project.
 
 ```bash
-git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
-cd llama_cpp_for_codeshell
-make
+./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080
 ```
 
-Note: On macOS, the Metal architecture is enabled by default, and enabling Metal allows models to be loaded and executed on the GPU, significantly improving performance. For Mac users with non-Apple Silicon chips, you can disable Metal build during compilation using CMake options `LLAMA_NO_METAL=1` or `LLAMA_METAL=OFF` to ensure that the model functions properly.
+Note: If Metal was enabled at compile time and you encounter runtime exceptions, you can explicitly disable Metal GPU inference by adding the `-ngl 0` parameter on the command line so that the model runs properly.
+
+- For the [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models, load the local model with [TGI](https://github.com/WisdomShell/text-generation-inference.git) and start the model service.
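Once the `server` from the first bullet above is running, a quick sanity check is a direct HTTP call. The sketch below is illustrative only: it assumes the service is listening on 127.0.0.1:8080 as configured above and exposes a llama.cpp-style `/completion` endpoint (the same path the plugin uses in CPU mode); the prompt and parameters are placeholders.

```bash
# Example request against the local llama.cpp-style server (adjust prompt/parameters as needed)
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "# Write a quicksort function in Python\n",
        "n_predict": 128,
        "temperature": 0.2
      }'
```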
 
-### Load the model locally
+## Model Service [NVIDIA GPU]
 
-After downloading the model from the [Hugging Face Hub](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/blob/main/codeshell-chat-q4_0.gguf) to your local machine, placing the model in the `llama_cpp_for_codeshell/models` folder path in the above code will allow you to load the model locally.
+For users wishing to use NVIDIA GPUs for inference, the [`text-generation-inference`](https://github.com/huggingface/text-generation-inference) project can be used to deploy the [CodeShell Large Model](https://github.com/WisdomShell/codeshell). Below are the steps to deploy the model service:
+
+### Download the Model
+
+After downloading the model from the [Hugging Face Hub](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) to your local machine, place the model under the `$HOME/models` folder, and you can load the model locally.
 
 ```bash
-git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/blob/main/codeshell-chat-q4_0.gguf
+git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat
 ```
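Note: model weights on the Hugging Face Hub are stored with Git LFS, so a plain `git clone` may only fetch pointer files unless git-lfs is installed and initialized first. A possible sequence (install command shown for Debian/Ubuntu; adjust for your OS):

```bash
# Install and initialize git-lfs (Debian/Ubuntu example), then clone the weights into $HOME/models
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat $HOME/models/CodeShell-7B-Chat
```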
 
 ### Deploy the Model
 
-Use the `server` command in the `llama_cpp_for_codeshell` project to provide API services.
+The following command can be used for GPU-accelerated inference deployment with text-generation-inference:
 
 ```bash
-./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080
+docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
+        --env LOG_LEVEL="info,text_generation_router=debug" \
+        ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
+        --model-id /data/CodeShell-7B-Chat --num-shard 1 \
+        --max-total-tokens 5000 --max-input-length 4096 \
+        --max-stop-sequences 12 --trust-remote-code
 ```
 
-The default deployment is on local port 8080, and it can be called through the POST method.
-
-Note: In cases where Metal is enabled during compilation, you can also explicitly disable Metal GPU inference by adding the command-line parameter `-ngl 0`, ensuring that the model functions properly.
+For a more detailed explanation of the parameters, please refer to the [text-generation-inference project documentation](https://github.com/huggingface/text-generation-inference).
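With the container running, the service is mapped to local port 9090 and can be exercised with a direct call to TGI's `/generate` endpoint (the same path the plugin uses in GPU mode). A rough sketch; the input text and generation parameters are placeholders:

```bash
# Example request against the local TGI service (adjust inputs/parameters as needed)
curl -s http://127.0.0.1:9090/generate \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": "# Write a binary search in Python\n",
        "parameters": {"max_new_tokens": 128}
      }'
```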
 
 ## Configure the Plugin
 
 - Set the address for the CodeShell service
 - Configure whether to enable automatic code completion suggestions
 - Specify the maximum number of tokens for code completion
 - Specify the maximum number of tokens for Q&A
+- Configure the model runtime environment
+
+Note: Different model runtime environments can be configured within the plugin. For the [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4) model, select the `Use CPU Mode(with llama.cpp)` option in the `Model Runtime Environment` setting. For the [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models, select the `Use GPU Model(with TGI framework)` option.
 
 ![插件配置截图](https://resource.zsmarter.cn/appdata/codeshell-intellij/screenshots/code_config.png)
 
 ## Features
 
-### 1. Code Assistance
+### 1. Code Completion
+
+- Automatic Code Suggestions
+
+During coding, when you stop typing, code suggestions will automatically trigger (configurable with a delay of 1 to 3 seconds in the plugin settings).
+
+When the plugin provides code suggestions, they are displayed in gray at the editor's cursor location. You can press the Tab key to accept the suggestion or continue typing to ignore it.
+
+![代码建议截图](https://resource.zsmarter.cn/appdata/codeshell-vscode/screenshots/docs_completion.png)
+
+### 2. Code Assistance
 
 - Explain/Optimize/Cleanse a Code Segment
 - Generate Comments/Unit Tests for Code

@@ -90,7 +151,7 @@ Within the Q&A interface's code block, you can click the copy button to copy the
 
 ![代码辅助截图](https://resource.zsmarter.cn/appdata/codeshell-intellij/screenshots/code_inte.png)
 
-### 2. Code Q&A
+### 3. Code Q&A
 
 - Support for Multi-turn Conversations
 - Edit Questions and Rephrase Inquiries

build.gradle.kts

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ plugins {
 }
 
 group = "com.codeshell.intellij"
-version = "0.0.1"
+version = "0.0.2"
 
 repositories {
     mavenCentral()

src/main/java/com/codeshell/intellij/enums/ChatMaxToken.java

Lines changed: 4 additions & 4 deletions
@@ -1,10 +1,10 @@
 package com.codeshell.intellij.enums;
 
 public enum ChatMaxToken {
-    LOW("128"),
-    MEDIUM("512"),
-    HIGH("1024"),
-    ULTRA("2048");
+    LOW("1024"),
+    MEDIUM("2048"),
+    HIGH("4096"),
+    ULTRA("8192");
 
     private final String description;
 

src/main/java/com/codeshell/intellij/enums/CodeShellURI.java

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+package com.codeshell.intellij.enums;
+
+public enum CodeShellURI {
+
+    CPU_COMPLETE("/infill"),
+    CPU_CHAT("/completion"),
+    GPU_COMPLETE("/generate"),
+    GPU_CHAT("/generate_stream");
+
+    private final String uri;
+
+    CodeShellURI(String uri) {
+        this.uri = uri;
+    }
+
+    public String getUri() {
+        return uri;
+    }
+
+}
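These enum values are the URI paths the plugin appends to the configured model service address: the `CPU_*` entries target the llama.cpp-style server, the `GPU_*` entries target TGI. As an illustration only, a CPU-mode fill-in-the-middle request might look like the sketch below, assuming the server runs locally on port 8080 as in the README and follows the llama.cpp `/infill` request shape (field names may differ in `llama_cpp_for_codeshell`):

```bash
# Hypothetical CPU-mode completion request (CodeShellURI.CPU_COMPLETE -> /infill)
curl -s http://127.0.0.1:8080/infill \
  -H "Content-Type: application/json" \
  -d '{
        "input_prefix": "def quick_sort(arr):\n    ",
        "input_suffix": "\n",
        "n_predict": 64
      }'
```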

src/main/java/com/codeshell/intellij/enums/CompletionMaxToken.java

Lines changed: 4 additions & 4 deletions
@@ -2,10 +2,10 @@
 
 public enum CompletionMaxToken {
 
-    LOW("128"),
-    MEDIUM("512"),
-    HIGH("1024"),
-    ULTRA("2048");
+    LOW("32"),
+    MEDIUM("64"),
+    HIGH("128"),
+    ULTRA("256");
 
     private final String description;
 
0 commit comments
