Commit 9e04102

llama : suppress conversion from 'size_t' to 'int' (#9046)
* llama : suppress conversion from 'size_t' to 'int'

This commit updates llm_tokenizer_spm.tokenize to suppress/remove the following warnings that are generated on Windows when using MSVC:
```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument': conversion from 'size_t' to 'int', possible loss of data
```
This is done by adding a cast for the size_t returned from symbols.size(). I believe this is safe as it seems unlikely that symbols, which stores an entry for each UTF8 character, would become larger than INT_MAX.

The motivation for this change is to reduce the number of warnings that are currently generated when building on Windows.

* squash! llama : suppress conversion from 'size_t' to 'int'

Move cast into for loop.
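The commit message rests on the assumption that symbols.size() never exceeds INT_MAX. As a minimal sketch of how that assumption could be made explicit, the helper below asserts the bound before narrowing. The name `checked_int_size` is hypothetical and is not part of llama.cpp; the commit itself uses a plain `(int)` cast.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in llama.cpp): narrow a container's size to int,
// checking the "fits in int" assumption in debug builds.
template <typename Container>
int checked_int_size(const Container & c) {
    const std::size_t n = c.size();
    // The assert compiles away when NDEBUG is defined, leaving only the cast.
    assert(n <= static_cast<std::size_t>(INT_MAX));
    return static_cast<int>(n);
}
```

In a release build this behaves the same as the plain cast; in a debug build it would fail loudly if the container ever grew past INT_MAX entries.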
1 parent dbf18e4 commit 9e04102

1 file changed (+2 -2 lines changed)
src/llama-vocab.cpp

Lines changed: 2 additions & 2 deletions
@@ -221,7 +221,7 @@ struct llm_tokenizer_spm_session {
         }
 
         // seed the work queue with all possible 2-character tokens.
-        for (size_t i = 1; i < symbols.size(); ++i) {
+        for (int i = 1; i < (int) symbols.size(); ++i) {
             try_add_bigram(i - 1, i);
         }
 
@@ -563,7 +563,7 @@ struct llm_tokenizer_bpe_session {
                 index++;
                 symbols.emplace_back(sym);
             }
-            for (size_t i = 1; i < symbols.size(); ++i) {
+            for (int i = 1; i < (int) symbols.size(); ++i) {
                 add_new_bigram(i - 1, i);
             }
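The loop shape from both hunks can be tried in isolation. The sketch below is a stand-in, not the real tokenizer session: the struct name `bigram_seed_sketch`, the `work_queue` member, and the body of `try_add_bigram` are illustrative assumptions. The `int` parameters are inferred from the C4267 'argument' warning text rather than copied from the source file.

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for a tokenizer session; only the loop mirrors the diff.
struct bigram_seed_sketch {
    std::vector<std::string> symbols;
    std::vector<std::pair<int, int>> work_queue;

    // Assumed to take int indices, so passing a size_t index would narrow
    // at the call site and trigger the MSVC warning.
    void try_add_bigram(int left, int right) {
        work_queue.emplace_back(left, right);
    }

    void seed() {
        // Patched form: int index, with the size cast in the loop condition
        // so the comparison stays signed on both sides.
        for (int i = 1; i < (int) symbols.size(); ++i) {
            try_add_bigram(i - 1, i);
        }
    }
};
```

Casting in the condition rather than at the call keeps `i - 1` and `i` as plain `int` expressions, so no conversion happens at the argument. An empty `symbols` vector is still handled, since `1 < 0` is false and the loop body never runs.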
