
Commit 852338d

lint on "Reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)""
Stack was reverted due to internal CI failures. Reapplying as an exported internal diff so that we make sure to catch any more of those.

New fixes:
- straightforward op_sub build fixes
- s/EXPECT_EQ/EXPECT_FLOAT_EQ/ in vectorized_math_test
- define ET_USE_PYTORCH_HEADERS to detect whether exceptions are enabled, and use `#if` instead of `#ifdef` to check the macro so that we don't use PyTorch headers if exceptions are disabled (otherwise, we might have problems with e.g. TORCH_CHECK)

Original summary for #11204: Set of math functions that work on both scalars and at::vec::Vectorized, to be used in #9432.

Original summary for #11205: Make sure we test the optimized versions of portable kernels even if they are shadowed by optimized implementations. Intended to support #9432.

Original summary for #9432: This is a first cut at #9241. In this PR I've vectorized a small initial set of ops: atan2, clamp, fmod_Scalar, maximum, minimum, mul, pow, and sigmoid. In addition, the following ops should have gotten vectorized automatically because they already used generic lambdas: add, div, rsub, sub. I've left covering ops that use the `unary_ufunc_*` utilities in [pattern.h](https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/pattern/pattern.h) for a follow-up push, because pattern.h and elementwise_util need some work before we can migrate pattern.h's utilities to be backed by elementwise_util.

This PR adds an interesting testing problem: in theory, *all* operators might need test cases long enough to tickle vectorization, because we might accidentally vectorize ops and break their lambdas due to unanticipated differences in semantics. I address this issue by using Vectorized for the scalar prologue/epilogue in debug mode (we run tests in both debug and release) so that we can detect broken lambdas. I additionally intentionally introduced a bug in the vectorized path in elementwise_util and manually verified that we saw test failures for each vectorized op called out above.

Differential Revision: [D76467389](https://our.internmc.facebook.com/intern/diff/D76467389/)

[ghstack-poisoned]
1 parent a482d8e commit 852338d
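As a minimal sketch of the header-guard pattern described in the "New fixes" list: only the `ET_USE_PYTORCH_HEADERS` name comes from the commit message; how the build actually derives the value is an assumption here (keyed on the standard `__cpp_exceptions` feature-test macro for illustration).

```cpp
// Sketch only: the exception-detection condition is an assumption, not the
// actual ExecuTorch build logic.
#ifndef ET_USE_PYTORCH_HEADERS
#if defined(__cpp_exceptions) && __cpp_exceptions
#define ET_USE_PYTORCH_HEADERS 1
#else
#define ET_USE_PYTORCH_HEADERS 0
#endif
#endif

// Checking with #if (not #ifdef) means a build that defines the macro to 0
// still skips the ATen include; with #ifdef, merely defining it to 0 would
// have pulled in PyTorch headers (and e.g. TORCH_CHECK, which can throw).
#if ET_USE_PYTORCH_HEADERS
#include <ATen/cpu/vec/vec.h>
#endif
```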

1 file changed: +1, -1 lines changed

kernels/portable/cpu/op_sub.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -61,7 +61,7 @@ Tensor& sub_out(
       op_name,
       utils::SupportedTensorDtypes::REALHBF16>(
       [val_alpha](const auto val_a, const auto val_b) {
-        return val_a - (decltype(val_b))(val_alpha) * val_b;
+        return val_a - (decltype(val_b))(val_alpha)*val_b;
       },
       ctx,
       a,
```
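The lambda touched by this lint change is one of the generic lambdas that #9432 relies on: because the parameters are `const auto`, the same body compiles for plain scalars and for at::vec::Vectorized. Below is a self-contained sketch of that mechanism; it is not the ExecuTorch elementwise_util API, and the buffer setup and names are purely illustrative.

```cpp
// Hypothetical standalone demo: the same generic lambda works on float and
// on at::vec::Vectorized<float>, because decltype(val_b) adapts per call.
#include <ATen/cpu/vec/vec.h>
#include <array>
#include <cstdio>

int main() {
  const float val_alpha = 2.0f;
  // Same shape as the lambda in op_sub.cpp: the cast of val_alpha becomes a
  // scalar conversion in the scalar case and a broadcast in the vector case.
  auto sub_with_alpha = [val_alpha](const auto val_a, const auto val_b) {
    return val_a - (decltype(val_b))(val_alpha)*val_b;
  };

  // Scalar instantiation: 10 - 2 * 3 = 4.
  float scalar_result = sub_with_alpha(10.0f, 3.0f);

  // Vectorized instantiation over one register's worth of elements.
  using Vec = at::vec::Vectorized<float>;
  std::array<float, Vec::size()> a{}, b{}, out{};
  a.fill(10.0f);
  b.fill(3.0f);
  Vec vec_result = sub_with_alpha(Vec::loadu(a.data()), Vec::loadu(b.data()));
  vec_result.store(out.data());

  std::printf("scalar: %f, vector lane 0: %f\n", scalar_result, out[0]);
  return 0;
}
```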
