Introduce XNNPACKHeader to manage flatbuffer data and constant data #1523
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/1523
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit 694e067 with merge base 428da4f:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D52497977
ac62d04 to 9c0e41a
…ytorch#1523)

Summary:
Introducing the XNNPACKHeader to manage the flatbuffer data and constant data.

Previously, we serialized constant data along with the flatbuffer. However, with large weights and large tensors in general, this takes a large amount of time and memory converting our dataclass --> json --> flatbuffer. This has become a blocker on some larger models.

To fix this, we circumvent serializing constant tensors via flatbuffer by appending the constant data after the flatbuffer payload. In order to do this, we need an XNNPACKHeader which will give us the flatbuffer offset, flatbuffer size, constant data offset, and constant data size.

It will look something like this:
```
┌───────────────────────────────────┐
│XNNPACK Header                     │
├───────────────────────────────────┤
│Padding for 16 byte alignment      │
├───────────────────────────────────┤
│Flatbuffer-serialized payload data │
│                                   │
│                                   │
├───────────────────────────────────┤
│Padding for 16 byte alignment      │
├───────────────────────────────────┤
│Constant Data                      │
│                                   │
│                                   │
└───────────────────────────────────┘
```

Within the XNNPACK Header, we hold the following:
- 4 bytes to offset the header magic
- 4 bytes for the header magic
- 4 bytes for the header length
- 8 bytes for the flatbuffer offset
- 8 bytes for the flatbuffer size
- 8 bytes for constant data offset
- 8 bytes for constant data size

Differential Revision: D52497977
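The padding and offset arithmetic implied by the diagram can be sketched as follows. This is only an illustration, not code from this PR: the helper names `padding_needed`, `compute_layout`, and `assemble` are hypothetical, and it assumes zero bytes for padding and offsets measured from the start of the blob.

```python
ALIGNMENT = 16  # alignment named in the summary


def padding_needed(offset: int, alignment: int = ALIGNMENT) -> int:
    """Bytes required to move `offset` up to the next multiple of `alignment`."""
    return (-offset) % alignment


def compute_layout(header_len: int, flatbuffer_len: int, constant_len: int) -> dict:
    """Offsets and sizes of each region, measured from the start of the blob (assumption)."""
    flatbuffer_offset = header_len + padding_needed(header_len)
    flatbuffer_end = flatbuffer_offset + flatbuffer_len
    constant_offset = flatbuffer_end + padding_needed(flatbuffer_end)
    return {
        "flatbuffer_offset": flatbuffer_offset,
        "flatbuffer_size": flatbuffer_len,
        "constant_data_offset": constant_offset,
        "constant_data_size": constant_len,
    }


def assemble(header: bytes, flatbuffer: bytes, constant_data: bytes) -> bytes:
    """Concatenate header, padding, flatbuffer, padding, constant data."""
    layout = compute_layout(len(header), len(flatbuffer), len(constant_data))
    blob = bytearray(header)
    # Zero-fill up to the aligned start of the flatbuffer payload.
    blob += b"\x00" * (layout["flatbuffer_offset"] - len(blob))
    blob += flatbuffer
    # Zero-fill up to the aligned start of the constant data.
    blob += b"\x00" * (layout["constant_data_offset"] - len(blob))
    blob += constant_data
    return bytes(blob)
```

With the 44-byte header listed in this revision (4 + 4 + 4 + 8 + 8 + 8 + 8), the flatbuffer would start at byte 48 under these assumptions.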
9c0e41a to ea1e619
ea1e619 to 3c54713
3c54713 to 994536f
…ytorch#1523)

Summary:
Introducing the XNNPACKHeader to manage the flatbuffer data and constant data.

Previously, we serialized constant data along with the flatbuffer. However, with large weights and large tensors in general, this takes a large amount of time and memory converting our dataclass --> json --> flatbuffer. This has become a blocker on some larger models.

To fix this, we circumvent serializing constant tensors via flatbuffer by appending the constant data after the flatbuffer payload. In order to do this, we need an XNNPACKHeader which will give us the flatbuffer offset, flatbuffer size, constant data offset, and constant data size.

It will look something like this:
```
┌───────────────────────────────────┐
│XNNPACK Header                     │
├───────────────────────────────────┤
│Padding for 16 byte alignment      │
├───────────────────────────────────┤
│Flatbuffer-serialized payload data │
│                                   │
│                                   │
├───────────────────────────────────┤
│Padding for 16 byte alignment      │
├───────────────────────────────────┤
│Constant Data                      │
│                                   │
│                                   │
└───────────────────────────────────┘
```

Within the XNNPACK Header, we hold the following:
- 4 bytes to offset the header magic
- 4 bytes for the header magic
- 2 bytes for the header length
- 4 bytes for the flatbuffer offset
- 4 bytes for the flatbuffer size
- 4 bytes for constant data offset
- 4 bytes for constant data size

Differential Revision: D52497977
…ytorch#1523)

Summary:
Introducing the XNNPACKHeader to manage the flatbuffer data and constant data.

Previously, we serialized constant data along with the flatbuffer. However, with large weights and large tensors in general, this takes a large amount of time and memory converting our dataclass --> json --> flatbuffer. This has become a blocker on some larger models.

To fix this, we circumvent serializing constant tensors via flatbuffer by appending the constant data after the flatbuffer payload. In order to do this, we need an XNNPACKHeader which will give us the flatbuffer offset, flatbuffer size, constant data offset, and constant data size.

It will look something like this:
```
┌───────────────────────────────────┐
│XNNPACK Header                     │
├───────────────────────────────────┤
│Padding for 16 byte alignment      │
├───────────────────────────────────┤
│Flatbuffer-serialized payload data │
│                                   │
│                                   │
├───────────────────────────────────┤
│Padding for 16 byte alignment      │
├───────────────────────────────────┤
│Constant Data                      │
│                                   │
│                                   │
└───────────────────────────────────┘
```

Within the XNNPACK Header, we hold the following:
- 4 bytes to offset the header magic
- 4 bytes for the header magic
- 2 bytes for the header length
- 4 bytes for the flatbuffer offset
- 4 bytes for the flatbuffer size
- 4 bytes for constant data offset
- 8 bytes for constant data size

Differential Revision: D52497977
994536f to 665195e
…ytorch#1523)

Summary:
Introducing the XNNPACKHeader to manage the flatbuffer data and constant data.

Previously, we serialized constant data along with the flatbuffer. However, with large weights and large tensors in general, this takes a large amount of time and memory converting our dataclass --> json --> flatbuffer. This has become a blocker on some larger models.

To fix this, we circumvent serializing constant tensors via flatbuffer by appending the constant data after the flatbuffer payload. In order to do this, we need an XNNPACKHeader which will give us the flatbuffer offset, flatbuffer size, constant data offset, and constant data size.

It will look something like this:
```
┌───────────────────────────────────┐
│XNNPACK Header                     │
├───────────────────────────────────┤
│Padding for 16 byte alignment      │
├───────────────────────────────────┤
│Flatbuffer-serialized payload data │
│                                   │
│                                   │
├───────────────────────────────────┤
│Padding for 16 byte alignment      │
├───────────────────────────────────┤
│Constant Data                      │
│                                   │
│                                   │
└───────────────────────────────────┘
```

Within the XNNPACK Header, we hold the following:
- 4 bytes to offset the header magic
- 4 bytes for the header magic
- 2 bytes for the header length
- 4 bytes for the flatbuffer offset
- 4 bytes for the flatbuffer size
- 4 bytes for constant data offset
- 8 bytes for constant data size

Reviewed By: digantdesai

Differential Revision: D52497977
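For reference, the field widths in this revision add up to 30 bytes, which can be written down with Python's `struct` module. This is a hedged sketch rather than the PR's implementation: the little-endian byte order, the zero-filled first 4 bytes, the magic value `b"XH00"`, and the function name `pack_header` are all assumptions made for illustration.

```python
import struct

# "<" selects little-endian with no implicit padding (byte order is an assumption).
# 4s: 4 bytes that offset the header magic (zero-filled here, an assumption)
# 4s: header magic
# H : header length (2 bytes)
# I : flatbuffer offset (4 bytes)
# I : flatbuffer size (4 bytes)
# I : constant data offset (4 bytes)
# Q : constant data size (8 bytes)
HEADER_FORMAT = "<4s4sHIIIQ"
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)  # 4 + 4 + 2 + 4 + 4 + 4 + 8 = 30

MAGIC = b"XH00"  # hypothetical magic value, chosen only for this sketch


def pack_header(flatbuffer_offset: int, flatbuffer_size: int,
                constant_data_offset: int, constant_data_size: int) -> bytes:
    """Serialize the header fields in the order listed in the summary."""
    return struct.pack(
        HEADER_FORMAT,
        b"\x00" * 4,
        MAGIC,
        HEADER_LENGTH,
        flatbuffer_offset,
        flatbuffer_size,
        constant_data_offset,
        constant_data_size,
    )
```

Under this layout the magic occupies bytes 4 to 8, the header length bytes 8 to 10, and the 8-byte constant data size bytes 22 to 30.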
665195e to 7dc1eda
7dc1eda to 694e067
This pull request has been merged in 4361d62.
…shader data

## Context

This changeset is essentially a mirror of #1523 in the XNNPACK delegate, which is the first step toward enabling constant weight data to be serialized outside the flatbuffer blob to speed up deserialization time for large models.

`VulkanDelegateHeader` is introduced, which will be used later on to help interpret what the different sections of the serialized binary blob correspond to. The primary difference compared to `XNNHeader`, which was added for the XNNPACK delegate, is that fields are added to support custom compute shaders that will be serialized with the model.

Differential Revision: [D53957853](https://our.internmc.facebook.com/intern/diff/D53957853/)

[ghstack-poisoned]
…ata (#2013)

Summary:
Pull Request resolved: #2013

## Context

This changeset is essentially a mirror of #1523 in the XNNPACK delegate, which is the first step toward enabling constant weight data to be serialized outside the flatbuffer blob to speed up deserialization time for large models.

`VulkanDelegateHeader` is introduced, which will be used later on to help interpret what the different sections of the serialized binary blob correspond to. The primary difference compared to `XNNHeader`, which was added for the XNNPACK delegate, is that fields are added to support custom compute shaders that will be serialized with the model.

ghstack-source-id: 215799821
exported-using-ghexport

Reviewed By: mcr229

Differential Revision: D53957853

fbshipit-source-id: 233e8e58d5b8fc9777313f188748f5ee60c83726
Summary:
Introducing the XNNPACKHeader to manage the flatbuffer data and constant data.
Previously, we serialized constant data along with the flatbuffer. However, with large weights and large tensors in general, this takes a large amount of time and memory converting our dataclass --> json --> flatbuffer. This has become a blocker on some larger models.
To fix this, we circumvent serializing constant tensors via flatbuffer by appending the constant data after the flatbuffer payload. In order to do this, we need an XNNPACKHeader which will give us the flatbuffer offset, flatbuffer size, constant data offset, and constant data size.
It will look something like this:
Within the XNNPACK Header, we hold the following:
Differential Revision: D52497977
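On the reading side, a consumer could use the header to slice the flatbuffer and the constant data out of the combined blob without deserializing the constants. The sketch below is an assumption-laden illustration, not the runtime code from this PR: it assumes the field widths from the merged revision of the summary (4, 4, 2, 4, 4, 4, 8 bytes), little-endian byte order, offsets measured from the start of the blob, and a hypothetical magic value `b"XH00"`; `split_blob` is a made-up name.

```python
import struct

HEADER_FORMAT = "<4s4sHIIIQ"  # assumed little-endian; widths from the merged summary
MAGIC = b"XH00"  # hypothetical magic value used only in this sketch


def split_blob(blob: bytes) -> tuple:
    """Return (flatbuffer_bytes, constant_data_bytes) from a header-prefixed blob."""
    header_size = struct.calcsize(HEADER_FORMAT)
    if len(blob) < header_size:
        raise ValueError("blob is shorter than the header")

    (_pad, magic, _header_len,
     fb_offset, fb_size,
     const_offset, const_size) = struct.unpack_from(HEADER_FORMAT, blob, 0)

    if magic != MAGIC:
        raise ValueError("header magic not found")
    # Reject headers that describe regions outside the blob.
    if fb_offset + fb_size > len(blob) or const_offset + const_size > len(blob):
        raise ValueError("header describes a region outside the blob")

    view = memoryview(blob)  # slice without intermediate copies
    flatbuffer = bytes(view[fb_offset:fb_offset + fb_size])
    constant_data = bytes(view[const_offset:const_offset + const_size])
    return flatbuffer, constant_data
```

Because the constant data is addressed by offset and size alone, it never passes through the dataclass --> json --> flatbuffer conversion that the summary identifies as the bottleneck.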