When processing more than 3 chunks:
import time

# Process 3 chunks for now -> can increase but slower!
for filename in filenames[:5]:
    !synthetic-data-kit \
        -c synthetic_data_kit_config.yaml \
        create {filename} \
        --num-pairs 25 \
        --type "qa"
    time.sleep(2)  # Sleep some time to leave some room for processing
It looks like vLLM stops responding, or some timeout is hit?
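To tell a crashed server apart from a slow one, the loop could probe the server before each file. The following is a minimal sketch, not part of the notebook: `server_is_up` and `wait_for_server` are hypothetical helper names, and it assumes the vLLM OpenAI-compatible server answers `GET /v1/models` at the base URL shown in the error message.

```python
import time
import urllib.error
import urllib.request


def server_is_up(base_url: str = "http://localhost:8000/v1", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/models answers with HTTP 200 (assumed endpoint)."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat the server as unavailable.
        return False


def wait_for_server(base_url: str = "http://localhost:8000/v1",
                    retries: int = 10,
                    delay: float = 3.0,
                    probe=server_is_up) -> bool:
    """Call the probe up to `retries` times, sleeping `delay` seconds after each failure."""
    for _ in range(retries):
        if probe(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_server()` at the top of each loop iteration and stopping when it returns `False` would show at which file the server goes away, instead of letting the remaining files fail one by one.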
cell logs:
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processing 7 chunks to generate QA pairs...
Batch processing complete.
Generated 26 QA pairs total
Saving result to data/generated/unoc_document_0_qa_pairs.json
Successfully wrote test file to data/generated/test_write.json
Successfully wrote result to data/generated/unoc_document_0_qa_pairs.json
⠴ Generating qa content from data/output/unoc_document_0.txt...
Content saved to data/generated/unoc_document_0_qa_pairs.json
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processing 4 chunks to generate QA pairs...
Batch processing complete.
Generated 24 QA pairs total
Saving result to data/generated/unoc_document_1_qa_pairs.json
Successfully wrote test file to data/generated/test_write.json
Successfully wrote result to data/generated/unoc_document_1_qa_pairs.json
⠋ Generating qa content from data/output/unoc_document_1.txt...
Content saved to data/generated/unoc_document_1_qa_pairs.json
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processing 5 chunks to generate QA pairs...
Batch processing complete.
Generated 0 QA pairs total
Saving result to data/generated/unoc_document_2_qa_pairs.json
Successfully wrote test file to data/generated/test_write.json
Successfully wrote result to data/generated/unoc_document_2_qa_pairs.json
⠧ Generating qa content from data/output/unoc_document_2.txt...
Content saved to data/generated/unoc_document_2_qa_pairs.json
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Error: VLLM server not available at http://localhost:8000/v1
Please start the VLLM server with:
vllm serve unsloth/Llama-3.2-3B-Instruct
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Error: VLLM server not available at http://localhost:8000/v1
Please start the VLLM server with:
vllm serve unsloth/Llama-3.2-3B-Instruct
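The repeated `huggingface/tokenizers` warning is probably unrelated to the server failure, but it makes the logs harder to read. As the warning itself suggests, it can be silenced by setting `TOKENIZERS_PARALLELISM` before the loop runs. A small sketch, assuming the variable is set in the notebook kernel so that the processes started with `!` inherit it:

```python
import os

# Set before any `!synthetic-data-kit` call; child processes inherit the kernel's environment.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```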