Expose Encoding attributes via the buffer protocol interface #1789

Open
wants to merge 1 commit into
base: main
from

Conversation

mariosasko
Contributor

This PR exposes the underlying buffers of an Encoding object via the buffer protocol interface, allowing efficient conversion from Rust to Python for libraries that support that interface (e.g., NumPy, PyTorch, PyArrow).

Based on my benchmarks, this can cut tokenization time by more than 20% when tokenizing datasets with longer sequences.
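
To illustrate why buffer protocol access can be cheaper than building Python lists, here is a minimal sketch using only the standard library. It does not use this PR's API: the `array.array` below is a hypothetical stand-in for the token ids held by an Encoding, and the variable names are assumptions made for the example.

```python
from array import array

# Hypothetical stand-in for the token ids held by an Encoding. In the PR the
# data lives in Rust-owned memory; here a stdlib array plays that role.
ids = array("I", [101, 7592, 2088, 102])

# memoryview() requests the data through the buffer protocol, so it refers to
# the same memory as `ids` and copies no elements.
view = memoryview(ids)

# Converting to a list, by contrast, creates one Python int object per token.
copied = list(ids)

# A write to the underlying storage is visible through the view but not in
# the list copy, which shows that the view shares memory with `ids`.
ids[1] = 0

print(view.tolist())  # contents seen through the shared buffer
print(copied)         # independent copy taken before the write
```

Buffer-aware consumers can wrap such an object in the same way, for example with `numpy.frombuffer`, which is where the saving over per-element list conversion would be expected to come from.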
