1 parent 4236ccc commit 940b1e8
llama.cpp
@@ -12618,8 +12618,7 @@ static int llama_decode_internal(
     std::vector<llama_seq_id *> seq_id_arr;
     std::vector<std::vector<llama_seq_id>> seq_id;
 
-    // this indicates we are doing pooling on an embedding model. non-embedding models always
-    // use "output_ids" so we need to preserve all outputs in that case (somewhat inefficiently)
+    // this indicates we are doing pooled embedding, so we ignore batch.logits and output all tokens
     bool embed_pooled = cparams.embeddings && cparams.pooling_type != LLAMA_POOLING_TYPE_NONE;
 
     // count outputs