Commit 290ee8d

Update on "[ET][Memory planning] Improve greedy memory planning."
This diff replaces the old greedy algorithm. The older algorithm produced plans 35% worse than the theoretical optimum. This matters even more for long contexts, since the additional overhead can be a few hundred MB. For example, the theoretical optimum for the llama3_2 8B, 4-bit quantized model with a context length of 2k is about 1 GB of memory. This theoretical bound can be observed by looking at the peaks in the memory profile. The current algorithm resulted in about 1.6 GB of planned memory. The new algorithm reduces that to about 1.1 GB.

Differential Revision: [D68448332](https://our.internmc.facebook.com/intern/diff/D68448332/)

cc JacobSzwejbka angelayi

[ghstack-poisoned]
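To illustrate the comparison made in the message above, here is a minimal Python sketch of how a planner's result can be measured against the peak of the memory profile. It is not the algorithm from this diff: the `Buf` record, `peak_live_bytes`, and `greedy_plan` are hypothetical names, and the largest-first, lowest-offset placement is just one common greedy heuristic chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buf:
    """Hypothetical tensor buffer with an inclusive lifetime [start, end]."""
    name: str
    size: int   # bytes
    start: int  # index of the first op that uses the buffer
    end: int    # index of the last op that uses the buffer


def lifetimes_overlap(a, b):
    return a.start <= b.end and b.start <= a.end


def peak_live_bytes(bufs):
    """Lower bound on any plan: the largest total size live at one step."""
    if not bufs:
        return 0
    last = max(b.end for b in bufs)
    return max(
        sum(b.size for b in bufs if b.start <= t <= b.end)
        for t in range(last + 1)
    )


def greedy_plan(bufs):
    """Place buffers largest-first at the lowest offset that is free
    for the buffer's whole lifetime (illustrative heuristic only)."""
    placed = []  # list of (Buf, offset)
    for b in sorted(bufs, key=lambda x: x.size, reverse=True):
        # Address ranges taken by buffers that are live at the same time.
        busy = sorted(
            (off, off + o.size) for o, off in placed if lifetimes_overlap(o, b)
        )
        offset = 0
        for lo, hi in busy:
            if offset + b.size <= lo:
                break  # the gap before this range is big enough
            offset = max(offset, hi)
        placed.append((b, offset))
    total = max((off + b.size for b, off in placed), default=0)
    return {b.name: off for b, off in placed}, total


bufs = [Buf("a", 4, 0, 1), Buf("b", 2, 1, 2), Buf("c", 4, 2, 3)]
offsets, planned = greedy_plan(bufs)
print(peak_live_bytes(bufs), planned, offsets)
```

In this toy case the planned size matches the peak of 6 bytes because `a` and `c` never coexist and can share offset 0. The ratio `planned / peak_live_bytes(bufs)` is the kind of overhead figure the message quotes.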
2 parents ecb11ed + 2e63ab7 commit 290ee8d

10 files changed: +361 -524 lines changed
backends/cadence/hifi/kernels/targets.bzl

Lines changed: 5 additions & 0 deletions
@@ -2,12 +2,17 @@ load("@fbsource//tools/build_defs:platform_defs.bzl", "CXX")
 load("@fbsource//xplat/executorch/build:runtime_wrapper.bzl", "runtime")
 
 def define_common_targets():
+    common_deps = [
+        "//executorch/runtime/kernel:kernel_includes",
+    ]
+
     runtime.cxx_library(
         name = "kernels",
         srcs = ["kernels.cpp"],
         exported_headers = [
             "kernels.h",
         ],
+        deps = common_deps,
         visibility = [
             "//executorch/backends/cadence/...",
         ],

backends/cadence/hifi/operators/op_where.cpp

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ namespace impl {
 namespace HiFi {
 namespace native {
 
-Tensor& where_out(
+Tensor& where_self_out(
     RuntimeContext& ctx,
     const Tensor& cond,
     const Tensor& a,

backends/cadence/hifi/third-party/nnlib/xa_nn_elm_clamp_f32_broadcast.c

Lines changed: 5 additions & 6 deletions
@@ -19,12 +19,11 @@
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
 ******************************************************************************/
-#include "nnlib-hifi4/xa_nnlib/include/xa_type_def.h"
-#include "nnlib-hifi4/xa_nnlib/algo/common/include/xa_nnlib_common_fpu.h"
-#include "nnlib-hifi4/xa_nnlib/algo/common/include/xa_nn_common.h"
-#include "nnlib-hifi4/xa_nnlib/algo/common/include/xa_nnlib_err_chk.h"
-#include "nnlib-hifi4/xa_nnlib/algo/kernels/basic/hifi4/xa_nn_basic_state.h"
-#include "nnlib-hifi4/xa_nnlib/include/nnlib/xa_nnlib_kernels_api.h"
+#include "xa_type_def.h"
+#include "xa_nnlib_common_fpu.h"
+#include "xa_nn_common.h"
+#include "xa_nnlib_err_chk.h"
+#include "xa_nnlib_kernels_api.h"
 
 
 #if !HAVE_VFPU
