issue with (Meta_Synthetic_Data_Llama3_2_(3B).ipynb) #39

Open
ramixpe opened this issue May 5, 2025 · 0 comments

ramixpe commented May 5, 2025

When looping over more than 3 chunks, e.g.:

import time

# Process 3 chunks for now -> can increase but slower!
for filename in filenames[:5]:
    !synthetic-data-kit \
        -c synthetic_data_kit_config.yaml \
        create {filename} \
        --num-pairs 25 \
        --type "qa"
    time.sleep(2)  # Sleep some time to leave some room for processing

it looks like vLLM stops responding, or hits some timeout?!
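
A guard that waits for the server before launching each file might help narrow this down. Below is a minimal sketch of what I mean, assuming the vLLM OpenAI-compatible server from the notebook at http://localhost:8000/v1 (which exposes a /v1/models endpoint) and that the requests package is available in the environment; the wait_for_vllm helper is mine, not part of synthetic-data-kit:

import time
import requests

def wait_for_vllm(base_url="http://localhost:8000/v1", timeout=300, poll=5):
    # Poll the OpenAI-compatible /models endpoint until the server answers or we give up.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/models", timeout=5).status_code == 200:
                return True
        except requests.RequestException:
            pass  # server busy or restarting; keep polling
        time.sleep(poll)
    return False

for filename in filenames[:5]:
    if not wait_for_vllm():
        print("vLLM server did not respond in time; stopping early")
        break
    !synthetic-data-kit \
        -c synthetic_data_kit_config.yaml \
        create {filename} \
        --num-pairs 25 \
        --type "qa"
    time.sleep(2)  # Sleep some time to leave some room for processing

This only detects the outage, of course; it does not explain why the server stops responding after the first few files.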

Cell logs:
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processing 7 chunks to generate QA pairs...
Batch processing complete.
Generated 26 QA pairs total
Saving result to data/generated/unoc_document_0_qa_pairs.json
Successfully wrote test file to data/generated/test_write.json
Successfully wrote result to data/generated/unoc_document_0_qa_pairs.json
Generating qa content from data/output/unoc_document_0.txt...
Content saved to data/generated/unoc_document_0_qa_pairs.json
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processing 4 chunks to generate QA pairs...
Batch processing complete.
Generated 24 QA pairs total
Saving result to data/generated/unoc_document_1_qa_pairs.json
Successfully wrote test file to data/generated/test_write.json
Successfully wrote result to data/generated/unoc_document_1_qa_pairs.json
Generating qa content from data/output/unoc_document_1.txt...
Content saved to data/generated/unoc_document_1_qa_pairs.json
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processing 5 chunks to generate QA pairs...
Batch processing complete.
Generated 0 QA pairs total
Saving result to data/generated/unoc_document_2_qa_pairs.json
Successfully wrote test file to data/generated/test_write.json
Successfully wrote result to data/generated/unoc_document_2_qa_pairs.json
Generating qa content from data/output/unoc_document_2.txt...
Content saved to data/generated/unoc_document_2_qa_pairs.json
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Error: VLLM server not available at http://localhost:8000/v1
Please start the VLLM server with:
vllm serve unsloth/Llama-3.2-3B-Instruct
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Error: VLLM server not available at http://localhost:8000/v1
Please start the VLLM server with:
vllm serve unsloth/Llama-3.2-3B-Instruct

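If it turns out the server process itself has died, a hypothetical recovery step would be to relaunch it with the exact command printed in the error above and wait for the endpoint to come back (reusing the wait_for_vllm helper from the sketch earlier; I'm not sure how the notebook originally starts the server, so treat this purely as an assumption):

import subprocess

# Relaunch vLLM with the command from the error message, then block until
# /v1/models responds again (or give up after 10 minutes).
server = subprocess.Popen(["vllm", "serve", "unsloth/Llama-3.2-3B-Instruct"])
if not wait_for_vllm(timeout=600):
    raise RuntimeError("vLLM server still not reachable after relaunch")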