Fix KeyError occurring using fine_tunes.prepare_data #125
Awesome! Sorry for the delay. A few nits and then I think this is good to go. Can you explain a bit more about why this works?
Hi @hallacy, it works because in the original code, when the indices to drop for long examples are computed (originally at line 164), no recommendations have been applied yet. For example, a row that is both a long example AND a duplicate may exist at index 4. When the current script removes the duplicates, it removes index 4, but index 4 is still 'scheduled' to be deleted when dropping long examples at original line 170. When the By recalculating the long_indexes when actually applying the
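The failure mode described in this comment can be reproduced with a minimal pandas sketch. The toy data, the `LONG_THRESHOLD` value, and the `find_long_indexes` helper are hypothetical stand-ins for illustration only, not the script's actual code; the only behavior relied on is that `DataFrame.drop` raises `KeyError` for a missing label by default.

```python
import pandas as pd

LONG_THRESHOLD = 10  # hypothetical cutoff, chosen only for this toy example


def find_long_indexes(df):
    """Return index labels of rows whose prompt + completion is too long."""
    lengths = df["prompt"].str.len() + df["completion"].str.len()
    return df.index[lengths > LONG_THRESHOLD].tolist()


# Row 4 duplicates row 0, and both rows are long examples.
df = pd.DataFrame({
    "prompt": ["a", "b", "c", "d", "a"],
    "completion": ["x" * 50, "y", "z", "w", "x" * 50],
})

# Computed up front, before any recommendation has been applied.
stale_long_indexes = find_long_indexes(df)

# Removing duplicates drops row 4 (the first occurrence, row 0, is kept).
deduped = df.drop_duplicates(subset=["prompt", "completion"])

# Label 4 is still scheduled for removal but no longer exists.
try:
    deduped.drop(stale_long_indexes)
    failed = False
except KeyError:
    failed = True
```

Here the stale list still names label 4, so the final `drop` fails even though each step is correct in isolation.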
Love it! Thank you!
* Initial commit
* Add fix
* Reinstate reset_index()
* Add suggestions
* Remove print stmt
* punctuation
* Add test for fine_tunes.prepare_data
* Renamed file, added docstrings
* Move comment placement
Description
KeyError occurs when running
openai tools fine_tunes.prepare_data -f training_file.jsonl
when training_file.jsonl contains a prompt/completion that is BOTH a duplicate and a long example. Without trying to change too much, this fix would move the computation of long_examples and long_indexes into a function, calling it preemptively within long_examples_validator to provide analysis information about how many rows are long examples, then calling it again within optional_fn when actually dropping rows, and informing the user if the keys being dropped have changed.
Related Issue
Fixes #121
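The approach in the Description can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `get_long_indexes`, `drop_long_examples`, `LONG_THRESHOLD`, and the message text are hypothetical names, and the real long_examples_validator and optional_fn are not reproduced here.

```python
import pandas as pd

LONG_THRESHOLD = 10  # hypothetical cutoff, chosen only for this toy example


def get_long_indexes(df):
    """Return index labels of rows whose prompt + completion is too long."""
    lengths = df["prompt"].str.len() + df["completion"].str.len()
    return df.index[lengths > LONG_THRESHOLD].tolist()


def drop_long_examples(df, reported_indexes):
    """Recompute the long rows at apply time instead of trusting a stale list."""
    current_indexes = get_long_indexes(df)
    if set(current_indexes) != set(reported_indexes):
        # Earlier recommendations (e.g. duplicate removal) changed the rows.
        print(f"Rows to drop changed: {reported_indexes} -> {current_indexes}")
    return df.drop(current_indexes)


# Row 4 duplicates row 0, and both rows are long examples.
df = pd.DataFrame({
    "prompt": ["a", "b", "c", "d", "a"],
    "completion": ["x" * 50, "y", "z", "w", "x" * 50],
})

reported = get_long_indexes(df)  # analysis pass, before anything is dropped
deduped = df.drop_duplicates(subset=["prompt", "completion"])
result = drop_long_examples(deduped, reported)  # no KeyError
```

Because the labels are recomputed against the frame as it exists when rows are dropped, a label already removed by an earlier step is never passed to `drop`.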
Other Notes
I am also happy to provide a file that can be used to reproduce this error.
Example Output:
When the error would normally occur, instead you would see: