| Sentiment Analysis | RoBERTa | [SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions) | |
### Docker
@@ -101,7 +102,7 @@ model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
- docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model --revision $revision
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.3 --model-id $model --revision $revision
```
And then you can make requests like
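For example, a minimal `/embed` request against the container started above (a sketch; it assumes the server is reachable on port 8080, as mapped in the `docker run` command):

```
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```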
@@ -163,9 +164,11 @@ Options:
[env: POOLING=]
Possible values:
- - cls: Select the CLS token as embedding
- - mean: Apply Mean pooling to the model embeddings
- - splade: Apply SPLADE (Sparse Lexical and Expansion) to the model embeddings. This option is only available if the loaded model is a `ForMaskedLM` Transformer model
+ - cls: Select the CLS token as embedding
+ - mean: Apply Mean pooling to the model embeddings
+ - splade: Apply SPLADE (Sparse Lexical and Expansion) to the model embeddings. This option is only
+ available if the loaded model is a `ForMaskedLM` Transformer model
The maximum amount of concurrent requests for this particular deployment.
@@ -199,6 +202,37 @@ Options:
[env: MAX_CLIENT_BATCH_SIZE=]
[default: 32]
+ --auto-truncate
+ Automatically truncate inputs that are longer than the maximum supported size
+
+ Unused for gRPC servers
+
+ [env: AUTO_TRUNCATE=]
+
+ --default-prompt-name <DEFAULT_PROMPT_NAME>
+ The name of the prompt that should be used by default for encoding. If not set, no prompt will be applied.
+
+ Must be a key in the `Sentence Transformers` configuration `prompts` dictionary.
+
+ For example if ``default_prompt_name`` is "query" and the ``prompts`` is {"query": "query: ", ...}, then the
+ sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because
+ the prompt text will be prepended before any text to encode.
+
+ The argument '--default-prompt-name <DEFAULT_PROMPT_NAME>' cannot be used with '--default-prompt <DEFAULT_PROMPT>'
+
+ [env: DEFAULT_PROMPT_NAME=]
+
+ --default-prompt <DEFAULT_PROMPT>
+ The prompt that should be used by default for encoding. If not set, no prompt will be applied.
+
+ For example if ``default_prompt`` is "query: " then the sentence "What is the capital of France?" will be
+ encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text
+ to encode.
+
+ The argument '--default-prompt <DEFAULT_PROMPT>' cannot be used with '--default-prompt-name <DEFAULT_PROMPT_NAME>'
+
+ [env: DEFAULT_PROMPT=]
+
--hf-api-token <HF_API_TOKEN>
Your HuggingFace hub token
@@ -224,9 +258,10 @@ Options:
[default: /tmp/text-embeddings-inference-server]
--huggingface-hub-cache <HUGGINGFACE_HUB_CACHE>
- The location of the huggingface hub cache. Used to override the location if you want to provide a mounted disk for instance
+ The location of the huggingface hub cache. Used to override the location if you want to provide a mounted disk
+ for instance

- [env: HUGGINGFACE_HUB_CACHE=/data]
+ [env: HUGGINGFACE_HUB_CACHE=]
--payload-limit <PAYLOAD_LIMIT>
Payload size limit in bytes
@@ -239,7 +274,8 @@ Options:
--api-key <API_KEY>
Set an api key for request authorization.

- By default the server responds to every request. With an api key set, the requests must have the Authorization header set with the api key as Bearer token.
+ By default the server responds to every request. With an api key set, the requests must have the Authorization
+ header set with the api key as Bearer token.
[env: API_KEY=]
@@ -254,12 +290,14 @@ Options:
[env: OTLP_ENDPOINT=]
--otlp-service-name <OTLP_SERVICE_NAME>
- The service name for opentelemetry.
+ The service name for opentelemetry. e.g. `text-embeddings-inference.server`
[env: OTLP_SERVICE_NAME=]
[default: text-embeddings-inference.server]
--cors-allow-origin <CORS_ALLOW_ORIGIN>
+ Unused for gRPC servers
+
[env: CORS_ALLOW_ORIGIN=]
```
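As a sketch of how the newly added flags compose (illustrative values only: `--default-prompt-name query` assumes the model's Sentence Transformers `prompts` configuration defines a `query` key, and `$TEI_API_KEY` is a hypothetical shell variable holding your key):

```
# Start the server with auto-truncation, a default prompt, and request authorization
docker run --gpus all -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.3 \
    --model-id $model --revision $revision \
    --auto-truncate \
    --default-prompt-name query \
    --api-key $TEI_API_KEY

# With --api-key set, requests must carry the key as a Bearer token
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is the capital of France?"}' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $TEI_API_KEY"
```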
@@ -269,13 +307,13 @@ Text Embeddings Inference ships with multiple Docker images that you can use to
- As explained here [MPS-Ready, ARM64 Docker Image](https://github.com/pytorch/pytorch/issues/81224), Metal / MPS is not supported via Docker. As such inference will be CPU bound and most likely pretty slow when using this docker image on an M1/M2 ARM CPU.
+ As explained here [MPS-Ready, ARM64 Docker Image](https://github.com/pytorch/pytorch/issues/81224), Metal / MPS is not
+ supported via Docker. As such inference will be CPU bound and most likely pretty slow when using this docker image on an
+ M1/M2 ARM CPU.