Summary:
Pull Request resolved: #2994
## Overview
Adopted ExecuTorch (ET) library methods to replace our home-grown logic.
- The model and input flatbuffers are migrated to a bundled program flatbuffer (`.bpte`).
- Jarvis runtime memory allocation is migrated to the ExecuTorch MemoryManager, with memory regions defined as ExecuTorch `Span`s.
- Input memory allocation is migrated to method-based data pointer assignment.
- Output and debug buffers are **partially** migrated to ETDump.
- Model output validation is **partially** migrated to method-based verification in the bundled program.
## Input flow:
- Take the edge program manager
- Build test suites from methods. Only the `forward` method is supported, and it is hardcoded.
- Build the bundled program
- Serialize the bundled program and store it as a flatbuffer
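The input flow above can be modeled, very loosely, as grouping (inputs, expected outputs) pairs per method and refusing anything other than `forward`. This is a hypothetical pure-Python sketch: `BundledCase`, `BundledSuite`, and `build_test_suites` are illustrative names, not ExecuTorch APIs, and the real flow serializes the result into a `.bpte` flatbuffer rather than returning Python objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


# Hypothetical stand-ins for bundled-program test data. These are NOT the
# ExecuTorch config classes; they only model what gets bundled.
@dataclass
class BundledCase:
    inputs: Sequence[Any]             # positional inputs for one run
    expected_outputs: Sequence[Any]   # reference outputs for that run


@dataclass
class BundledSuite:
    method_name: str
    cases: List[BundledCase] = field(default_factory=list)


# Only `forward` is handled by this flow, and it is hardcoded.
SUPPORTED_METHOD = "forward"


def build_test_suites(cases_by_method: Dict[str, List[BundledCase]]) -> List[BundledSuite]:
    """Build one suite per method name, rejecting any method but `forward`."""
    suites: List[BundledSuite] = []
    for method_name, cases in cases_by_method.items():
        if method_name != SUPPORTED_METHOD:
            raise ValueError(f"unsupported method: {method_name!r}")
        suites.append(BundledSuite(method_name, list(cases)))
    return suites


suites = build_test_suites({"forward": [BundledCase(inputs=[1.0, 2.0], expected_outputs=[3.0])]})
print(suites[0].method_name, len(suites[0].cases))  # forward 1
```

Failing fast on any other method name keeps the hardcoded `forward` restriction explicit instead of silently skipping methods.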
## Output flow:
- A bundled program is loaded from the serialized flatbuffer
- The program is executed on a selected backend.
- The output is generated.
- Validation: compare the expected and actual outputs using (1) the original Jarvis compare method (ENABLED) and (2) the method-based `VerifyResultWithBundledExpectedOutput` (DISABLED)
- **Note**: the sink flow was reverted back to a series of `.npy` output files that are unflattened with `torch.utils._pytree.tree_unflatten`, to re-enable legacy tests. ET/Bolt adopted a new flow that saves outputs as `.bin` files and loads them with `np.fromfile`. ETDump gets its output from the debug buffer. **These will be investigated in stage2.**
TODO: T185104750 T185106115
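The two sink formats differ in an important way: a `.npy` file carries a header with dtype and shape, while a raw `.bin` file read by `np.fromfile` is just bytes, so the reader must already know the dtype and element count and restore the shape separately. The sketch below uses the standard-library `array` module as a dependency-free stand-in for NumPy; `save_bin`, `load_bin`, and the file name are hypothetical, and float32 (C `float`) is assumed.

```python
import array
import os
import tempfile
from math import prod
from typing import List, Sequence


def save_bin(path: str, values: Sequence[float]) -> None:
    """Write values as raw C floats (float32 on common platforms), no header."""
    with open(path, "wb") as f:
        array.array("f", values).tofile(f)


def load_bin(path: str, shape: Sequence[int]) -> List[float]:
    """Read back a flat list; the file itself records neither dtype nor shape."""
    count = prod(shape)  # caller must know the shape; only the count is used here
    buf = array.array("f")
    with open(path, "rb") as f:
        buf.fromfile(f, count)  # raises EOFError if the file holds fewer items
    return buf.tolist()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output_0.bin")  # hypothetical file name
    save_bin(path, [0.5, 1.5, -2.0, 4.0])     # a 2x2 tensor, flattened
    print(os.path.getsize(path))              # 16 with 4-byte floats
    print(load_bin(path, shape=(2, 2)))       # [0.5, 1.5, -2.0, 4.0]
```

Because nothing in the file describes its layout, a wrong dtype or shape on the reading side is not detected unless the element count runs past the end of the file.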
## Memory Allocation
Re-enabled Jarvis custom memory planning and added support for running on different backends (e.g. HIFI4).
- Enabled `alloc_graph_input` and `alloc_graph_output`.
- Defined memory regions as ExecuTorch `Span`s.
- **Note**: `alloc_graph_output` uses deprecated ET APIs (`set_data()`, `mutable_data_ptr()`) and has a memory misalignment issue when migrated to the new flow. **These will be investigated in stage2.**
TODO: T185104439
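Conceptually, a memory plan assigns each planned tensor an arena (`mem_id`), an offset, and a size; the runtime supplies one pre-allocated buffer per arena and hands out views into it without copying, which is the role the ExecuTorch `Span`s play here. The pure-Python sketch below is hypothetical: `PlannedTensor`, `required_sizes`, `bind`, and the 16-byte alignment are illustrative assumptions, not ExecuTorch or Jarvis APIs. It shows two checks relevant to this section: that each arena is large enough for the plan, and that offsets are aligned.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PlannedTensor:
    """Hypothetical memory-plan entry: which arena, where, and how many bytes."""
    name: str
    mem_id: int
    offset: int
    nbytes: int


def required_sizes(plan: List[PlannedTensor]) -> Dict[int, int]:
    """Smallest size each arena needs so every planned tensor fits."""
    sizes: Dict[int, int] = {}
    for t in plan:
        sizes[t.mem_id] = max(sizes.get(t.mem_id, 0), t.offset + t.nbytes)
    return sizes


def bind(plan: List[PlannedTensor], arenas: Dict[int, bytearray],
         alignment: int = 16) -> Dict[str, memoryview]:
    """Return zero-copy views into pre-allocated arenas (span-like slices).

    Alignment is checked relative to the arena start only; real code must
    also align the arena's base address, which this sketch does not control.
    """
    views: Dict[str, memoryview] = {}
    for t in plan:
        arena = arenas[t.mem_id]
        if t.offset % alignment != 0:
            raise ValueError(f"{t.name}: offset {t.offset} is not {alignment}-byte aligned")
        if t.offset + t.nbytes > len(arena):
            raise ValueError(f"{t.name}: does not fit in arena {t.mem_id}")
        views[t.name] = memoryview(arena)[t.offset:t.offset + t.nbytes]
    return views


plan = [
    PlannedTensor("input", mem_id=1, offset=0, nbytes=64),
    PlannedTensor("hidden", mem_id=1, offset=64, nbytes=128),
    PlannedTensor("output", mem_id=2, offset=0, nbytes=32),
]
sizes = required_sizes(plan)                       # {1: 192, 2: 32}
arenas = {mem_id: bytearray(n) for mem_id, n in sizes.items()}
views = bind(plan, arenas)
views["output"][0] = 7                             # writes land in arena 2 directly
print(arenas[2][0])                                # 7
```

Sizing arenas from the plan itself (rather than hand-picked constants) is one way to check that the planned buffer sizes are correct before execution.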
## Output Validation
Outputs are verified with `torch::executor::bundled_program::VerifyResultWithBundledExpectedOutput`. For quantized tests, which need a very high rtol, this is currently only a dummy validation: their error thresholds are set to arbitrarily large values (1e5 and 1e7). **These will be investigated in stage2.**
TODO: T180249993 T185104615 T185104862
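Both comparison styles mentioned in this PR can be sketched in plain Python. Assumptions: the rtol/atol rule shown is the NumPy-style `allclose` condition `|actual - expected| <= atol + rtol * |expected|`, which may differ in detail from what `VerifyResultWithBundledExpectedOutput` does internally; `rms_error` is a generic RMS metric, and the exact formula and threshold of the Jarvis compare method are not specified here. The example shows why tolerances on the order of 1e5/1e7 make the check effectively a no-op.

```python
import math
from typing import Sequence


def allclose(actual: Sequence[float], expected: Sequence[float],
             rtol: float, atol: float) -> bool:
    """NumPy-style closeness: |a - e| <= atol + rtol * |e| for every element."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


def rms_error(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Root-mean-square of the elementwise difference (generic metric)."""
    if len(actual) != len(expected) or len(actual) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    squared = sum((a - e) ** 2 for a, e in zip(actual, expected))
    return math.sqrt(squared / len(actual))


expected = [1.0, 2.0, 3.0, 4.0]
actual = [1.0, 2.0, 3.0, 6.0]  # last element is clearly wrong

print(allclose(actual, expected, rtol=1e-3, atol=1e-5))  # False
print(allclose(actual, expected, rtol=1e5, atol=1e7))    # True: tolerance swamps the error
print(rms_error(actual, expected))                       # 1.0
```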
# Design
Major design decisions (ADR).
## Method 1 [ADOPTED]
Modify `executor.cpp` to consume a bundled_program flatbuffer and execute it on a different BUCK host.
- Pros: maximum reuse of the existing configuration for custom Jarvis ops.
- Cons: runtime performance overhead from starting a new host.
## Method 2 [ABANDONED]
Use ET pybinding APIs to consume the bundled program as an input and execute it in the runtime.
- Pros: all ET APIs are encapsulated in Python, which fits well with the existing infrastructure.
- Cons: poor extensibility, since the backend is fixed (CPU) at startup and cannot be switched on the fly.
- Cons: custom ops are missing from the runtime on the same BUCK host, so dependencies would have to be duplicated and hardcoded.
# Progress
Program Ingestion (Input)
- [x] POC run of aten_relu_out and quantized_linear_out
- [x] Obtain Jarvis custom ops in runtime
Program Sink (Output)
- [x] Get etdump as etdp
- [x] Get Inspector object from etdump
- [x] Get program output from method
- [x] Re-enable scuba profile
- [x] Get debug buffer binary
- [x] Enable dump output from etdump
- [x] Get output from etdump
- [ ] Migrate sink flow to etdump
- [ ] Adjust memory config for dump
Verification
- [x] `verify_result_with_bundled_expected_output` with rtol and atol. A very large rtol and atol are set so that quantized tests pass the validation.
- [x] Compare output with expected_output by original Jarvis compare (RMS)
Memory Planning
- [x] Define the memory planning input: MemoryConfig
- [x] Understand what the ET MemoryManager actually takes
- [x] Migrate to the ET MemoryManager with three new arguments
- [x] Re-enable alloc_graph_input
- [x] Re-enable alloc_graph_output
- [x] Update legacy usage of HierarchicalAllocator
- [x] Verify that the sizes of the planned buffers are correct
Misc.
- [ ] Verify whether the input is memcpy'd into a custom input buffer in the bundled program when input memory is not allocated. Use `set_input`.
- [ ] Investigate whether test suites run serially or, like Buck, in parallel
- [ ] Investigate the output.bin workflow, using Bolt as a reference
- [ ] Refactor to reuse module.h, module.cpp, and data_module.cpp
- [ ] Refactor based on TODOs
- [x] Clean up legacy code
Reviewed By: tarun292, skrtskrtfb, mcremon-meta
Differential Revision: D53870154
fbshipit-source-id: 05efdd48da040f089c0cc65ee7ad5f2cb14be5bd