👾 Fix chunk count not incrementing when parsing additional chunks in a local file #767

Merged

Conversation

ryan-the-crayon
Contributor

When parsing a local GGUF file whose metadata section does not fit in a single chunk, the parser tries to load more chunks. However, in the local file implementation of the range view, the chunk count was not incremented, so no additional data was actually loaded.
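
To illustrate the failure mode, here is a minimal sketch of the pattern (hypothetical class and member names, not the actual diff): the local-file range view reads the file chunk by chunk, and each additional fetch must advance the chunk counter, otherwise the same byte range is read again and the parser never sees the rest of the metadata.

import * as fs from "node:fs";

// Hypothetical sketch, not the real implementation in the gguf package.
class LocalFileRangeView {
    private chunk = 0;
    private buffer = new Uint8Array(0);

    constructor(
        private readonly fd: number,
        private readonly chunkSize = 10 * 1024 * 1024
    ) {}

    // Read the next chunk of the file and append it to the in-memory buffer.
    fetchChunk(): void {
        const next = new Uint8Array(this.chunkSize);
        fs.readSync(this.fd, next, 0, this.chunkSize, this.chunk * this.chunkSize);
        const merged = new Uint8Array(this.buffer.length + next.length);
        merged.set(this.buffer, 0);
        merged.set(next, this.buffer.length);
        this.buffer = merged;
        this.chunk += 1; // the missing increment: without it, the same range is fetched again and again
    }
}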

@mishig25
Collaborator

cc: @ngxson

@ngxson
Member

ngxson commented Jun 20, 2024

LGTM. Thanks! I don't understand why I missed this line of code 👀

@mishig25
Collaborator

@ryan-the-crayon thanks a lot for the contribution! Also, could you add a test case?

it("should parse a local file", async () => {
// download the file and save to .cache folder
if (!fs.existsSync(".cache")) {
fs.mkdirSync(".cache");
}
const res = await fetch(URL_V1);
const arrayBuf = await res.arrayBuffer();
fs.writeFileSync(".cache/model.gguf", Buffer.from(arrayBuf));
const { metadata } = await gguf(".cache/model.gguf", { allowLocalFile: true });
expect(metadata).toMatchObject({ "general.name": "tinyllamas-stories-260k" });
});

@ngxson
Member

ngxson commented Jun 20, 2024

FYI, I made a gguf with 32MB of metadata (metadata usually takes less than 2MB): https://huggingface.co/ngxson/test_gguf_models/blob/main/gguf_test_big_metadata.gguf

The test would be:

const parsedGguf = await gguf(".cache/model.gguf", { allowLocalFile: true });
const { metadata } = (parsedGguf as GGUFParseOutput<{ strict: false }>); // custom metadata arch, no need for typing
expect(metadata['dummy.1']).toBeDefined(); // first metadata in the list
expect(metadata['dummy.32767']).toBeDefined(); // last metadata in the list
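
Wrapped into a test case, a hedged sketch could look like the following (the test name and local path are illustrative, and the big-metadata file is assumed to already be in `.cache`):

it("should parse a local file with metadata larger than one chunk", async () => {
    // assumes .cache/gguf_test_big_metadata.gguf was downloaded beforehand
    const parsedGguf = await gguf(".cache/gguf_test_big_metadata.gguf", { allowLocalFile: true });
    const { metadata } = parsedGguf as GGUFParseOutput<{ strict: false }>; // custom architecture, so non-strict typing
    expect(metadata["dummy.1"]).toBeDefined(); // first metadata entry in the list
    expect(metadata["dummy.32767"]).toBeDefined(); // last metadata entry in the list
});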
C++ code used to generate the gguf file:
#include "ggml.h"
#include <cstring>
#include <string>

int main(int argc, char ** argv) {
    struct gguf_context * ctx = gguf_init_empty();
    gguf_set_val_str(ctx, "general.architecture", "gguf_test_big_metadata");

    // 1 KB string filled with 'A'
    char str1kb[1025];
    std::memset(str1kb, 'A', 1024);
    str1kb[1024] = '\0';

    // write 32 * 1024 string entries of 1 KB each: roughly 32 MB of metadata
    for (int i = 0; i < 32 * 1024; i++) {
        gguf_set_val_str(ctx, ("dummy." + std::to_string(i)).c_str(), str1kb);
    }

    gguf_write_to_file(ctx, "gguf_test_big_metadata.gguf", false);
    gguf_free(ctx);
}

mishig25 merged commit 70a27aa into huggingface:main on Jun 21, 2024
4 checks passed
@mishig25
Collaborator

@ryan-the-crayon @ngxson merged, and the new version of @huggingface/gguf.js with the fix has been released.

mishig25 pushed a commit that referenced this pull request Jun 28, 2024
Ref comment:
#767 (comment)

This file has 32MB of metadata only (no tensors), which can be useful for
testing or benchmarking.

Due to its large size, running the download inside the test case
may cause a timeout, so I moved it to `beforeAll`.
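
As a minimal sketch of that pattern (the URL constant, local path, and timeout value are assumptions based on the file linked above), the download would live in a `beforeAll` hook with a generous per-hook timeout:

// Assumed download URL for the big-metadata test file linked in the comment above.
const URL_BIG_METADATA = "https://huggingface.co/ngxson/test_gguf_models/resolve/main/gguf_test_big_metadata.gguf";

beforeAll(async () => {
    if (!fs.existsSync(".cache")) {
        fs.mkdirSync(".cache");
    }
    const res = await fetch(URL_BIG_METADATA);
    const arrayBuf = await res.arrayBuffer();
    fs.writeFileSync(".cache/gguf_test_big_metadata.gguf", Buffer.from(arrayBuf));
}, 30_000); // per-hook timeout in milliseconds (supported by vitest and jest)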