The [`llama_cpp_for_codeshell`](https://github.com/WisdomShell/llama_cpp_for_codeshell) project provides the 4-bit quantized model service of the [CodeShell](https://github.com/WisdomShell/codeshell) LLM, named `codeshell-chat-q4_0.gguf`. Here are the steps to deploy the model service:

For Mac users with non-Apple Silicon chips, you can disable Metal builds during compilation using the CMake options `LLAMA_NO_METAL=1` or `LLAMA_METAL=OFF` to ensure the model runs properly.
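
For illustration, on an Intel-based Mac the Metal backend can be switched off with either build system. This is a minimal sketch that assumes the usual llama.cpp-style build targets; defer to the repository's own build instructions if they differ:

```bash
# Build without the Metal backend on an Intel-based Mac (llama.cpp-style flags)
make LLAMA_NO_METAL=1

# Equivalent when building with CMake
cmake -B build -DLLAMA_METAL=OFF
cmake --build build --config Release
```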
+ Windows
You can either compile the code with the Linux approach inside the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/about), or follow the instructions in the [llama.cpp repository](https://github.com/ggerganov/llama.cpp#build): set up [w64devkit](https://github.com/skeeto/w64devkit/releases) and then compile using the same Linux method.
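
As a rough sketch of the WSL route (assuming an Ubuntu distribution and the repository's GitHub location), the build is the same as on Linux once a toolchain is installed:

```bash
# Inside a WSL shell (e.g. Ubuntu): install a C/C++ toolchain, then build as on Linux
sudo apt update && sudo apt install -y build-essential git
git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
cd llama_cpp_for_codeshell
make
```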
### Download the model
On the [Hugging Face Hub](https://huggingface.co/WisdomShell), we provide three different models: [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B), [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat), and [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4). Below are the steps to download these models.
- To perform inference using the [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4) model, download the model to your local machine and place it in the `llama_cpp_for_codeshell/models` folder as indicated in the code above (see the download sketch after this list).
- For performing inference using [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models, after placing the models in a local folder, you can utilize [TGI (Text Generation Inference)](https://github.com/WisdomShell/text-generation-inference.git) to load these local models and initiate the model service.
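
For example, the quantized model file can be fetched with the `huggingface-cli` tool. This is only a sketch that assumes the `huggingface_hub` CLI is installed; downloading `codeshell-chat-q4_0.gguf` manually from the model page works just as well:

```bash
# Fetch the 4-bit GGUF file into the folder the llama_cpp_for_codeshell server reads from
huggingface-cli download WisdomShell/CodeShell-7B-Chat-int4 codeshell-chat-q4_0.gguf \
  --local-dir ./llama_cpp_for_codeshell/models
```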
### Load the model
- The `CodeShell-7B-Chat-int4` model can be served as an API using the `server` command within the `llama_cpp_for_codeshell` project.

Note: On macOS, the Metal architecture is enabled by default, and enabling Metal allows models to be loaded and executed on the GPU, significantly improving performance. For Mac users with non-Apple Silicon chips, you can disable Metal build during compilation using CMake options `LLAMA_NO_METAL=1` or `LLAMA_METAL=OFF` to ensure that the model functions properly.
Note: In cases where Metal is enabled during compilation, if you encounter runtime exceptions, you can explicitly disable Metal GPU inference by adding the `-ngl 0` parameter in the command line to ensure the proper functioning of the model.
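
Putting the notes above together, a typical invocation of the `server` command looks roughly like the sketch below; the relative paths, host, and port are assumptions to adapt to your own build and setup:

```bash
# Serve the 4-bit quantized model over HTTP on 127.0.0.1:8080
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080

# Same invocation with Metal GPU inference explicitly disabled (-ngl 0 keeps all layers on the CPU)
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080 -ngl 0
```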
- The [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models can be served by loading the local models with [TGI](https://github.com/WisdomShell/text-generation-inference.git) and starting the model service.

### Load the model locally

After downloading the model from the [Hugging Face Hub](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/blob/main/codeshell-chat-q4_0.gguf) to your local machine, place it in the `llama_cpp_for_codeshell/models` folder referenced in the code above to load it locally.

## Model Service [NVIDIA GPU]

For users wishing to use NVIDIA GPUs for inference, the [`text-generation-inference`](https://github.com/huggingface/text-generation-inference) project can be used to deploy the [CodeShell Large Model](https://github.com/WisdomShell/codeshell). Below are the steps to deploy the model service:
### Download the Model
After downloading the model from the [Hugging Face Hub](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) to your local machine, place the model under the path of the `$HOME/models` folder, and you can load the model locally.
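
For example, the chat model can be pulled into that folder with git; this sketch assumes `git-lfs` is installed so the weight files are actually downloaded:

```bash
# Download CodeShell-7B-Chat into $HOME/models (weights are stored with Git LFS)
mkdir -p "$HOME/models" && cd "$HOME/models"
git lfs install
git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat
```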
The default deployment is on local port 8080, and it can be called through the POST method.
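
As an illustration of such a deployment, the sketch below starts TGI in Docker and then calls the service's `generate` endpoint over POST. The image and tag, port mapping, and launcher flags are assumptions (the project itself points to WisdomShell's fork of text-generation-inference), so adjust them to your environment:

```bash
# Launch the model service (container port 80 mapped to local port 8080)
docker run --gpus all --rm -p 8080:80 -v "$HOME/models":/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data/CodeShell-7B-Chat

# Call the service through the POST method
curl http://127.0.0.1:8080/generate \
  -X POST -H 'Content-Type: application/json' \
  -d '{"inputs": "def quick_sort(arr):", "parameters": {"max_new_tokens": 64}}'
```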
For a more detailed explanation of the parameters, please refer to the [text-generation-inference project documentation](https://github.com/huggingface/text-generation-inference).
## Configure the Plugin
- Set the address for the CodeShell service
- Configure whether to enable automatic code completion suggestions
- Specify the maximum number of tokens for code completion
- Specify the maximum number of tokens for Q&A
- Configure the model runtime environment
Note: Different model runtime environments can be configured within the plugin. For the [CodeShell-7B-Chat-int4](https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4) model, you can choose the `Use CPU Mode(with llama.cpp)` option in the `Model Runtime Environment` menu. However, for the [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell-7B) and [CodeShell-7B-Chat](https://huggingface.co/WisdomShell/CodeShell-7B-Chat) models, you should select the `Use GPU Model(with TGI framework)` option.

During coding, when you stop typing, code suggestions will automatically trigger (configurable with a delay of 1 to 3 seconds in the plugin settings).
When the plugin provides code suggestions, they are displayed in gray at the editor's cursor location. You can press the Tab key to accept the suggestion or continue typing to ignore it.