Commit 20ebaf3

GregoryComer authored and facebook-github-bot committed
Default to cores/2 threads in JNI layer
Summary:
Default to using cores/2 threadpool threads. The long-term plan is to improve performant core detection in CPUInfo, but for now we can use cores/2 as a sane default.

Based on testing, this is almost universally faster than using all cores, as efficiency cores can be quite slow. In extreme cases, using all cores can be 10x slower than using cores/2. This also matches Lite Interpreter's default behavior when it doesn't have a more precise heuristic for the target hardware.

Differential Revision: D64107326
1 parent e540bcb commit 20ebaf3
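
The cores/2 default described in the summary can be sketched in isolation. This is a minimal illustration, not the code from the commit: `default_thread_count` and `default_thread_count_for_this_machine` are hypothetical names, and `std::thread::hardware_concurrency()` stands in for `cpuinfo_get_processors_count()` so that the snippet builds without cpuinfo.

```cpp
#include <thread>

// Hypothetical helper (not part of the commit): halve the logical processor
// count with integer division, as the change to jni_layer.cpp does.
// A result of 0 means "do not resize the threadpool".
inline int default_thread_count(unsigned int logical_processors) {
  return static_cast<int>(logical_processors / 2);
}

// Assumption: std::thread::hardware_concurrency() is used here only as a
// stand-in for cpuinfo_get_processors_count(). It may return 0 when the
// count cannot be determined, in which case this also returns 0.
inline int default_thread_count_for_this_machine() {
  return default_thread_count(std::thread::hardware_concurrency());
}
```

With 8 logical processors this yields 4 threads. With a single processor the integer division yields 0, which the commit treats as a signal to leave the threadpool at its existing size.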

2 files changed: 23 additions, 0 deletions

extension/android/jni/BUCK

Lines changed: 3 additions & 0 deletions
@@ -1,4 +1,5 @@
 load("@fbsource//tools/build_defs/android:fb_android_cxx_library.bzl", "fb_android_cxx_library")
+load("@fbsource//xplat/executorch/backends/xnnpack/third-party:third_party_libs.bzl", "third_party_dep")
 load("@fbsource//xplat/executorch/codegen:codegen.bzl", "executorch_generated_lib")
 
 oncall("executorch")
@@ -41,6 +42,8 @@ fb_android_cxx_library(
         "//xplat/executorch/extension/module:module_static",
         "//xplat/executorch/extension/runner_util:inputs_static",
         "//xplat/executorch/extension/tensor:tensor_static",
+        "//xplat/executorch/extension/threadpool:threadpool",
+        third_party_dep("cpuinfo"),
     ],
 )
 

extension/android/jni/jni_layer.cpp

Lines changed: 20 additions & 0 deletions
@@ -17,9 +17,12 @@
 
 #include "jni_layer_constants.h"
 
+#include <cpuinfo.h>
+
 #include <executorch/extension/module/module.h>
 #include <executorch/extension/runner_util/inputs.h>
 #include <executorch/extension/tensor/tensor.h>
+#include <executorch/extension/threadpool/threadpool.h>
 #include <executorch/runtime/core/portable_type/tensor_impl.h>
 #include <executorch/runtime/platform/log.h>
 #include <executorch/runtime/platform/platform.h>
@@ -260,6 +263,23 @@ class ExecuTorchJni : public facebook::jni::HybridClass<ExecuTorchJni> {
     }
 
     module_ = std::make_unique<Module>(modelPath->toStdString(), load_mode);
+
+    // Default to using cores/2 threadpool threads. The long-term plan is to
+    // improve performant core detection in CPUInfo, but for now we can use
+    // cores/2 as a sane default.
+    //
+    // Based on testing, this is almost universally faster than using all
+    // cores, as efficiency cores can be quite slow. In extreme cases, using
+    // all cores can be 10x slower than using cores/2.
+    //
+    // TODO Allow overriding this default from Java.
+    auto threadpool = executorch::extension::threadpool::get_threadpool();
+    if (threadpool) {
+      int thread_count = cpuinfo_get_processors_count() / 2;
+      if (thread_count > 0) {
+        threadpool->_unsafe_reset_threadpool(thread_count);
+      }
+    }
   }
 
   facebook::jni::local_ref<facebook::jni::JArrayClass<JEValue>> forward(
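
The TODO in the hunk above leaves a Java-side override for later. One possible way to resolve such an override in native code is sketched below, purely as an assumption: `resolve_thread_count` and its `requested` parameter are hypothetical and do not exist in ExecuTorch. A result of 0 is used here to mean "do not reset the threadpool", matching the `thread_count > 0` guard in the commit.

```cpp
// Hypothetical helper, not part of ExecuTorch.
// requested:  thread count passed down from Java; <= 0 means "no override".
// processors: logical processor count, e.g. the value returned by
//             cpuinfo_get_processors_count().
// Returns the thread count to apply, or 0 to keep the pool unchanged.
inline int resolve_thread_count(int requested, int processors) {
  if (requested > 0) {
    return requested;  // An explicit override wins over the heuristic.
  }
  const int half = processors / 2;  // The cores/2 default from the commit.
  return half > 0 ? half : 0;
}
```

A caller would pass the result to the threadpool reset only when it is greater than zero, so a single-core device without an override keeps whatever size the pool already has.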
