Commit c49f48a

Update on "[ExecuTorch] Dramatically improve op_clamp build time"
Instead of building `O(|CTYPE_IN| * |CTYPE_MIN| * |CTYPE_MAX| * |CTYPE_OUT|)` kernel code (where |T| means the number of possibilities for type T), we build `O((|CTYPE_IN| + |CTYPE_MIN| + |CTYPE_MAX| + |CTYPE_COMMON|) * |CTYPE_OUT|)` kernel code. (Concretely, `ET_SWITCH_REALHB_TYPES` has 9 possibilities, so I estimate that we went from 9**4 = 6561 template instantiations to 9 * 4 * 9 = 324 instantiations, or a 20x reduction.)

Differential Revision: [D63681034](https://our.internmc.facebook.com/intern/diff/D63681034/)

[ghstack-poisoned]
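The estimate in the commit message can be checked at compile time. The sketch below is only an arithmetic check. It assumes, as the message states, that each type switch has 9 possibilities; the constant names are hypothetical and do not appear in ExecuTorch.

```cpp
#include <cstdint>

// Assumption taken from the commit message: ET_SWITCH_REALHB_TYPES
// covers 9 dtypes. These names are illustrative only.
constexpr std::int64_t kNumTypes = 9;

// Before: one instantiation per (in, min, max, out) combination.
constexpr std::int64_t kBefore =
    kNumTypes * kNumTypes * kNumTypes * kNumTypes;

// After: four independent switches (in, min, max, common), each
// crossed only with the output type.
constexpr std::int64_t kAfter =
    (kNumTypes + kNumTypes + kNumTypes + kNumTypes) * kNumTypes;

static_assert(kBefore == 6561, "9**4");
static_assert(kAfter == 324, "9 * 4 * 9");
// Integer division: 6561 / 324 is 20.25, so roughly a 20x reduction.
static_assert(kBefore / kAfter == 20, "about 20x fewer instantiations");
```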
2 parents efd6c08 + e296b2c commit c49f48a

1 file changed: +4 -6 lines changed


kernels/portable/cpu/util/broadcast_util.h

Lines changed: 4 additions & 6 deletions
@@ -326,18 +326,16 @@ inline void apply_binary_elementwise_fn(
  * void(CTYPE_COMMON, void*), convert the given element to CTYPE_OUT,
  * and store it to the given location.
  */
-template <
-    typename CTYPE_COMMON,
-    typename Op>
+template <typename CTYPE_COMMON, typename Op>
 inline void apply_ternary_elementwise_fn(
     const Op& compute_fun,
     const Tensor& a,
     const Tensor& b,
     const Tensor& c,
     const Tensor& out,
-    CTYPE_COMMON(*load_a_to_common)(const void*),
-    CTYPE_COMMON(*load_b_to_common)(const void*),
-    CTYPE_COMMON(*load_c_to_common)(const void*),
+    CTYPE_COMMON (*load_a_to_common)(const void*),
+    CTYPE_COMMON (*load_b_to_common)(const void*),
+    CTYPE_COMMON (*load_c_to_common)(const void*),
     void (*store_common_to_out)(CTYPE_COMMON, void*)) {
   const bool a_is_broadcasted = !out.sizes().equals(a.sizes());
   const bool b_is_broadcasted = !out.sizes().equals(b.sizes());
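The signature above takes type-erased load and store function pointers, so the elementwise loop is templated only on `CTYPE_COMMON` and `Op` rather than on every input and output dtype. The following is a minimal sketch of that pattern, not the ExecuTorch implementation: it uses raw buffers with explicit element sizes instead of `Tensor`, ignores broadcasting, and every name in it (`load_to_common`, `store_common_to_out`, `ternary_elementwise_sketch`, `clamp_int32_with_float_bounds`) is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read one CTYPE_IN element through a type-erased
// pointer and convert it to CTYPE_COMMON.
template <typename CTYPE_COMMON, typename CTYPE_IN>
CTYPE_COMMON load_to_common(const void* p) {
  return static_cast<CTYPE_COMMON>(*static_cast<const CTYPE_IN*>(p));
}

// Hypothetical helper: convert a CTYPE_COMMON value to CTYPE_OUT and
// store it through a type-erased pointer.
template <typename CTYPE_COMMON, typename CTYPE_OUT>
void store_common_to_out(CTYPE_COMMON v, void* p) {
  *static_cast<CTYPE_OUT*>(p) = static_cast<CTYPE_OUT>(v);
}

// Simplified stand-in for a ternary elementwise loop. It is instantiated
// once per (CTYPE_COMMON, Op); the input and output dtypes enter only
// through the function pointers and element sizes.
template <typename CTYPE_COMMON, typename Op>
void ternary_elementwise_sketch(
    const Op& compute_fun,
    const void* a, std::size_t a_elem_size,
    const void* b, std::size_t b_elem_size,
    const void* c, std::size_t c_elem_size,
    void* out, std::size_t out_elem_size,
    std::size_t numel,
    CTYPE_COMMON (*load_a)(const void*),
    CTYPE_COMMON (*load_b)(const void*),
    CTYPE_COMMON (*load_c)(const void*),
    void (*store_out)(CTYPE_COMMON, void*)) {
  const char* a_bytes = static_cast<const char*>(a);
  const char* b_bytes = static_cast<const char*>(b);
  const char* c_bytes = static_cast<const char*>(c);
  char* out_bytes = static_cast<char*>(out);
  for (std::size_t i = 0; i < numel; ++i) {
    const CTYPE_COMMON result = compute_fun(
        load_a(a_bytes + i * a_elem_size),
        load_b(b_bytes + i * b_elem_size),
        load_c(c_bytes + i * c_elem_size));
    store_out(result, out_bytes + i * out_elem_size);
  }
}

// Example caller: clamp int32 inputs between float bounds, computing in
// float and writing double outputs.
inline void clamp_int32_with_float_bounds(
    const std::int32_t* in,
    const float* lo,
    const float* hi,
    double* out,
    std::size_t n) {
  ternary_elementwise_sketch<float>(
      [](float x, float l, float h) { return std::min(std::max(x, l), h); },
      in, sizeof(std::int32_t),
      lo, sizeof(float),
      hi, sizeof(float),
      out, sizeof(double),
      n,
      load_to_common<float, std::int32_t>,
      load_to_common<float, float>,
      load_to_common<float, float>,
      store_common_to_out<float, double>);
}
```

In this sketch each dtype contributes one small loader or storer instantiation, while the loop body is shared across all input dtypes. The trade-off is an indirect call per element, which a real kernel may or may not consider acceptable.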
