[libc] memmove optimizations #70043

Merged
merged 1 commit into from
Oct 26, 2023

dvyukov (Collaborator) commented Oct 24, 2023

  1. Remove the is_disjoint check for smaller sizes and reduce code bloat.
    inline_memmove can handle some small sizes as efficiently
    as inline_memcpy, so for these sizes we skip the is_disjoint check.
    This both avoids extra code on the path for the most frequent, smaller
    sizes and reduces code bloat (the memcpy logic is not needed for small sizes).
    This relies heavily on inlining and dead code elimination: the first
    inline_memmove call should compile down to only the small-size handling,
    and the second inline_memmove call and the inline_memcpy call should
    compile down to only the larger-size handling.

  2. Use the memcpy thresholds for memmove.
    The memcpy thresholds were tuned more carefully.
    This matters more now that memmove is always used
    for all small sizes.

  3. Fix boundary conditions for sizes = 16/32/64.
    See the added comment for explanations.
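
The two-stage structure described in the first item can be sketched as follows. This is a minimal, self-contained illustration rather than the library's code: `sketch_memmove`, `move_impl`, `demo_dispatch`, and `kSlowSize` are hypothetical names, `std::memmove` and `std::memcpy` stand in for the hand-written size dispatch, and the threshold value is an assumption (the patch uses the macro `LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE`, whose value is not shown here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed threshold, chosen only for illustration.
constexpr std::size_t kSlowSize = 129;

// True when [dst, dst + count) and [src, src + count) do not overlap.
inline bool is_disjoint(const void *dst, const void *src, std::size_t count) {
  auto d = reinterpret_cast<std::uintptr_t>(dst);
  auto s = reinterpret_cast<std::uintptr_t>(src);
  return d >= s ? d - s >= count : s - d >= count;
}

// Stand-in for inline_memmove(dst, src, count, fast_only).
// Sizes below kSlowSize are always handled here. With fast_only set,
// larger sizes are left untouched so the caller can pick a strategy.
inline void move_impl(void *dst, const void *src, std::size_t count,
                      bool fast_only) {
  if (count < kSlowSize) {
    std::memmove(dst, src, count);
    return;
  }
  if (fast_only)
    return;
  std::memmove(dst, src, count);
}

void *sketch_memmove(void *dst, const void *src, std::size_t count) {
  // First call: only the small-size handling does any work.
  move_impl(dst, src, count, /*fast_only=*/true);
  if (count >= kSlowSize) {
    // Only larger sizes pay for the overlap check.
    if (is_disjoint(dst, src, count))
      std::memcpy(dst, src, count);
    else
      move_impl(dst, src, count, /*fast_only=*/false);
  }
  return dst;
}

// Exercises one large overlapping move and one small overlapping move.
bool demo_dispatch() {
  unsigned char buf[400];
  for (std::size_t i = 0; i < 400; ++i)
    buf[i] = static_cast<unsigned char>(i);
  sketch_memmove(buf + 10, buf, 200); // large, overlapping
  for (std::size_t i = 0; i < 200; ++i)
    assert(buf[10 + i] == static_cast<unsigned char>(i));
  sketch_memmove(buf + 1, buf, 8); // small, overlapping
  for (std::size_t i = 0; i < 8; ++i)
    assert(buf[1 + i] == static_cast<unsigned char>(i));
  return true;
}
```

In the sketch, a small `count` never reaches `is_disjoint`, which mirrors the intent of skipping the check for the most frequent sizes. In the real code the benefit depends on the compiler inlining both calls and eliminating the branches that cannot be taken.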

Memmove function size drops from 885 to 715 bytes
due to removed duplication.

                 │  baseline   │             small-size              │
                 │   sec/op    │   sec/op     vs base                │
memmove/Google_A   3.208n ± 0%   2.911n ± 0%   -9.25% (n=100)
memmove/Google_B   4.113n ± 1%   3.428n ± 0%  -16.65% (n=100)
memmove/Google_D   5.838n ± 0%   4.158n ± 0%  -28.78% (n=100)
memmove/Google_S   4.712n ± 1%   3.899n ± 0%  -17.25% (n=100)
memmove/Google_U   3.609n ± 0%   3.247n ± 1%  -10.02% (n=100)
memmove/0          2.982n ± 0%   2.169n ± 0%  -27.26% (n=50)
memmove/1          3.253n ± 0%   2.168n ± 0%  -33.34% (n=50)
memmove/2          3.255n ± 0%   2.169n ± 0%  -33.38% (n=50)
memmove/3          3.259n ± 2%   2.175n ± 0%  -33.27% (p=0.000 n=50)
memmove/4          3.259n ± 0%   2.168n ± 5%  -33.46% (p=0.000 n=50)
memmove/5          2.488n ± 0%   1.926n ± 0%  -22.57% (p=0.000 n=50)
memmove/6          2.490n ± 0%   1.928n ± 0%  -22.59% (p=0.000 n=50)
memmove/7          2.492n ± 0%   1.927n ± 0%  -22.65% (p=0.000 n=50)
memmove/8          2.737n ± 0%   2.711n ± 0%   -0.97% (p=0.000 n=50)
memmove/9          2.736n ± 0%   2.711n ± 0%   -0.94% (p=0.000 n=50)
memmove/10         2.739n ± 0%   2.711n ± 0%   -1.04% (p=0.000 n=50)
memmove/11         2.740n ± 0%   2.711n ± 0%   -1.07% (p=0.000 n=50)
memmove/12         2.740n ± 0%   2.711n ± 0%   -1.09% (p=0.000 n=50)
memmove/13         2.744n ± 0%   2.711n ± 0%   -1.22% (p=0.000 n=50)
memmove/14         2.742n ± 0%   2.711n ± 0%   -1.14% (p=0.000 n=50)
memmove/15         2.742n ± 0%   2.711n ± 0%   -1.15% (p=0.000 n=50)
memmove/16         2.997n ± 0%   2.981n ± 0%   -0.52% (p=0.000 n=50)
memmove/17         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/18         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/19         2.999n ± 0%   2.982n ± 0%   -0.59% (p=0.000 n=50)
memmove/20         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/21         3.000n ± 0%   2.981n ± 0%   -0.61% (p=0.000 n=50)
memmove/22         3.002n ± 0%   2.981n ± 0%   -0.68% (p=0.000 n=50)
memmove/23         3.002n ± 0%   2.981n ± 0%   -0.67% (p=0.000 n=50)
memmove/24         3.002n ± 0%   2.981n ± 0%   -0.70% (n=50)
memmove/25         3.002n ± 0%   2.981n ± 0%   -0.68% (p=0.000 n=50)
memmove/26         3.004n ± 0%   2.982n ± 0%   -0.74% (p=0.000 n=50)
memmove/27         3.005n ± 0%   2.981n ± 0%   -0.79% (n=50)
memmove/28         3.005n ± 0%   2.982n ± 0%   -0.77% (n=50)
memmove/29         3.009n ± 0%   2.981n ± 0%   -0.92% (n=50)
memmove/30         3.008n ± 0%   2.981n ± 0%   -0.89% (n=50)
memmove/31         3.007n ± 0%   2.982n ± 0%   -0.86% (n=50)
memmove/32         3.540n ± 0%   2.998n ± 0%  -15.31% (p=0.000 n=50)
memmove/33         3.544n ± 0%   2.997n ± 0%  -15.44% (p=0.000 n=50)
memmove/34         3.546n ± 0%   2.999n ± 0%  -15.42% (n=50)
memmove/35         3.545n ± 0%   2.999n ± 0%  -15.40% (n=50)
memmove/36         3.548n ± 0%   2.998n ± 0%  -15.52% (p=0.000 n=50)
memmove/37         3.546n ± 0%   3.000n ± 0%  -15.41% (n=50)
memmove/38         3.549n ± 0%   2.999n ± 0%  -15.49% (p=0.000 n=50)
memmove/39         3.549n ± 0%   2.999n ± 0%  -15.48% (p=0.000 n=50)
memmove/40         3.549n ± 0%   3.000n ± 0%  -15.46% (p=0.000 n=50)
memmove/41         3.550n ± 0%   3.001n ± 0%  -15.47% (n=50)
memmove/42         3.549n ± 0%   3.001n ± 0%  -15.43% (n=50)
memmove/43         3.552n ± 0%   3.001n ± 0%  -15.52% (p=0.000 n=50)
memmove/44         3.552n ± 0%   3.001n ± 0%  -15.51% (n=50)
memmove/45         3.552n ± 0%   3.002n ± 0%  -15.48% (n=50)
memmove/46         3.554n ± 0%   3.001n ± 0%  -15.55% (p=0.000 n=50)
memmove/47         3.556n ± 0%   3.002n ± 0%  -15.58% (p=0.000 n=50)
memmove/48         3.555n ± 0%   3.003n ± 0%  -15.54% (n=50)
memmove/49         3.557n ± 0%   3.002n ± 0%  -15.59% (p=0.000 n=50)
memmove/50         3.557n ± 0%   3.004n ± 0%  -15.55% (p=0.000 n=50)
memmove/51         3.556n ± 0%   3.004n ± 0%  -15.53% (p=0.000 n=50)
memmove/52         3.561n ± 0%   3.004n ± 0%  -15.65% (p=0.000 n=50)
memmove/53         3.558n ± 0%   3.004n ± 0%  -15.57% (p=0.000 n=50)
memmove/54         3.561n ± 0%   3.005n ± 0%  -15.62% (n=50)
memmove/55         3.560n ± 0%   3.006n ± 0%  -15.57% (n=50)
memmove/56         3.562n ± 0%   3.006n ± 0%  -15.60% (p=0.000 n=50)
memmove/57         3.563n ± 0%   3.006n ± 0%  -15.64% (n=50)
memmove/58         3.565n ± 0%   3.007n ± 0%  -15.64% (p=0.000 n=50)
memmove/59         3.564n ± 0%   3.006n ± 0%  -15.66% (p=0.000 n=50)
memmove/60         3.570n ± 0%   3.008n ± 0%  -15.74% (p=0.000 n=50)
memmove/61         3.566n ± 0%   3.009n ± 0%  -15.63% (p=0.000 n=50)
memmove/62         3.567n ± 0%   3.007n ± 0%  -15.70% (p=0.000 n=50)
memmove/63         3.568n ± 0%   3.008n ± 0%  -15.71% (p=0.000 n=50)
memmove/64         4.104n ± 0%   3.008n ± 0%  -26.70% (p=0.000 n=50)
memmove/65         4.126n ± 0%   3.662n ± 0%  -11.26% (p=0.000 n=50)
memmove/66         4.128n ± 0%   3.662n ± 0%  -11.29% (n=50)
memmove/67         4.129n ± 0%   3.662n ± 0%  -11.31% (n=50)
memmove/68         4.129n ± 0%   3.661n ± 0%  -11.33% (p=0.000 n=50)
memmove/69         4.130n ± 0%   3.662n ± 0%  -11.34% (p=0.000 n=50)
memmove/70         4.130n ± 0%   3.662n ± 0%  -11.33% (n=50)
memmove/71         4.132n ± 0%   3.662n ± 0%  -11.38% (p=0.000 n=50)
memmove/72         4.131n ± 0%   3.661n ± 0%  -11.39% (n=50)
memmove/73         4.135n ± 0%   3.661n ± 0%  -11.45% (p=0.000 n=50)
memmove/74         4.137n ± 0%   3.662n ± 0%  -11.49% (n=50)
memmove/75         4.138n ± 0%   3.662n ± 0%  -11.51% (p=0.000 n=50)
memmove/76         4.139n ± 0%   3.661n ± 0%  -11.56% (p=0.000 n=50)
memmove/77         4.136n ± 0%   3.662n ± 0%  -11.47% (p=0.000 n=50)
memmove/78         4.143n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/79         4.142n ± 0%   3.661n ± 0%  -11.60% (n=50)
memmove/80         4.142n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/81         4.140n ± 0%   3.661n ± 0%  -11.57% (n=50)
memmove/82         4.146n ± 0%   3.661n ± 0%  -11.69% (n=50)
memmove/83         4.143n ± 0%   3.661n ± 0%  -11.63% (p=0.000 n=50)
memmove/84         4.143n ± 0%   3.661n ± 0%  -11.63% (n=50)
memmove/85         4.147n ± 0%   3.661n ± 0%  -11.73% (p=0.000 n=50)
memmove/86         4.142n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/87         4.147n ± 0%   3.661n ± 0%  -11.72% (p=0.000 n=50)
memmove/88         4.148n ± 0%   3.661n ± 0%  -11.74% (n=50)
memmove/89         4.152n ± 0%   3.661n ± 0%  -11.84% (n=50)
memmove/90         4.151n ± 0%   3.661n ± 0%  -11.81% (n=50)
memmove/91         4.150n ± 0%   3.661n ± 0%  -11.78% (n=50)
memmove/92         4.153n ± 0%   3.661n ± 0%  -11.86% (n=50)
memmove/93         4.158n ± 0%   3.661n ± 0%  -11.95% (n=50)
memmove/94         4.157n ± 0%   3.661n ± 0%  -11.95% (p=0.000 n=50)
memmove/95         4.155n ± 0%   3.661n ± 0%  -11.90% (p=0.000 n=50)
memmove/96         4.149n ± 0%   3.660n ± 0%  -11.79% (n=50)
memmove/97         4.157n ± 0%   3.661n ± 0%  -11.94% (n=50)
memmove/98         4.157n ± 0%   3.661n ± 0%  -11.94% (n=50)
memmove/99         4.168n ± 0%   3.661n ± 0%  -12.17% (p=0.000 n=50)
memmove/100        4.159n ± 0%   3.660n ± 0%  -12.00% (p=0.000 n=50)
memmove/101        4.161n ± 0%   3.660n ± 0%  -12.03% (p=0.000 n=50)
memmove/102        4.165n ± 0%   3.660n ± 0%  -12.12% (p=0.000 n=50)
memmove/103        4.164n ± 0%   3.661n ± 0%  -12.08% (n=50)
memmove/104        4.164n ± 0%   3.660n ± 0%  -12.11% (n=50)
memmove/105        4.165n ± 0%   3.660n ± 0%  -12.12% (p=0.000 n=50)
memmove/106        4.166n ± 0%   3.660n ± 0%  -12.15% (n=50)
memmove/107        4.171n ± 0%   3.660n ± 1%  -12.26% (p=0.000 n=50)
memmove/108        4.173n ± 0%   3.660n ± 0%  -12.30% (p=0.000 n=50)
memmove/109        4.170n ± 0%   3.660n ± 0%  -12.24% (n=50)
memmove/110        4.174n ± 0%   3.660n ± 0%  -12.31% (n=50)
memmove/111        4.176n ± 0%   3.660n ± 0%  -12.35% (p=0.000 n=50)
memmove/112        4.174n ± 0%   3.659n ± 0%  -12.34% (p=0.000 n=50)
memmove/113        4.176n ± 0%   3.660n ± 0%  -12.35% (n=50)
memmove/114        4.182n ± 0%   3.660n ± 0%  -12.49% (n=50)
memmove/115        4.185n ± 0%   3.660n ± 0%  -12.55% (n=50)
memmove/116        4.184n ± 0%   3.659n ± 0%  -12.54% (n=50)
memmove/117        4.182n ± 0%   3.660n ± 0%  -12.50% (n=50)
memmove/118        4.188n ± 0%   3.660n ± 0%  -12.61% (n=50)
memmove/119        4.186n ± 0%   3.660n ± 0%  -12.57% (p=0.000 n=50)
memmove/120        4.189n ± 0%   3.659n ± 0%  -12.63% (n=50)
memmove/121        4.187n ± 0%   3.660n ± 0%  -12.60% (n=50)
memmove/122        4.186n ± 0%   3.660n ± 0%  -12.58% (n=50)
memmove/123        4.187n ± 0%   3.660n ± 0%  -12.60% (n=50)
memmove/124        4.189n ± 0%   3.659n ± 0%  -12.65% (n=50)
memmove/125        4.195n ± 0%   3.659n ± 0%  -12.78% (n=50)
memmove/126        4.197n ± 0%   3.659n ± 0%  -12.81% (n=50)
memmove/127        4.194n ± 0%   3.659n ± 0%  -12.75% (n=50)
memmove/128        5.035n ± 0%   3.659n ± 0%  -27.32% (n=50)
memmove/129        5.127n ± 0%   5.164n ± 0%   +0.73% (p=0.000 n=50)
memmove/130        5.130n ± 0%   5.176n ± 0%   +0.88% (p=0.000 n=50)
memmove/131        5.127n ± 0%   5.180n ± 0%   +1.05% (p=0.000 n=50)
memmove/132        5.131n ± 0%   5.169n ± 0%   +0.75% (p=0.000 n=50)
memmove/133        5.137n ± 0%   5.179n ± 0%   +0.81% (p=0.000 n=50)
memmove/134        5.140n ± 0%   5.178n ± 0%   +0.74% (p=0.000 n=50)
memmove/135        5.141n ± 0%   5.187n ± 0%   +0.88% (p=0.000 n=50)
memmove/136        5.133n ± 0%   5.184n ± 0%   +0.99% (p=0.000 n=50)
memmove/137        5.148n ± 0%   5.186n ± 0%   +0.73% (p=0.000 n=50)
memmove/138        5.143n ± 0%   5.189n ± 0%   +0.88% (p=0.000 n=50)
memmove/139        5.142n ± 0%   5.192n ± 0%   +0.97% (p=0.000 n=50)
memmove/140        5.141n ± 0%   5.192n ± 0%   +1.01% (p=0.000 n=50)
memmove/141        5.155n ± 0%   5.188n ± 0%   +0.64% (p=0.000 n=50)
memmove/142        5.146n ± 0%   5.192n ± 0%   +0.90% (p=0.000 n=50)
memmove/143        5.142n ± 0%   5.203n ± 0%   +1.19% (p=0.000 n=50)
memmove/144        5.146n ± 0%   5.197n ± 0%   +0.99% (p=0.000 n=50)
memmove/145        5.146n ± 0%   5.196n ± 0%   +0.97% (p=0.000 n=50)
memmove/146        5.151n ± 0%   5.207n ± 0%   +1.10% (p=0.000 n=50)
memmove/147        5.151n ± 0%   5.205n ± 0%   +1.06% (p=0.000 n=50)
memmove/148        5.156n ± 0%   5.190n ± 0%   +0.66% (p=0.000 n=50)
memmove/149        5.158n ± 0%   5.212n ± 0%   +1.04% (p=0.000 n=50)
memmove/150        5.160n ± 0%   5.203n ± 0%   +0.84% (p=0.000 n=50)
memmove/151        5.167n ± 0%   5.210n ± 0%   +0.83% (p=0.000 n=50)
memmove/152        5.157n ± 0%   5.206n ± 0%   +0.94% (p=0.000 n=50)
memmove/153        5.170n ± 0%   5.211n ± 0%   +0.80% (p=0.000 n=50)
memmove/154        5.169n ± 0%   5.222n ± 0%   +1.02% (p=0.000 n=50)
memmove/155        5.171n ± 0%   5.215n ± 0%   +0.87% (p=0.000 n=50)
memmove/156        5.174n ± 0%   5.214n ± 0%   +0.78% (p=0.000 n=50)
memmove/157        5.171n ± 0%   5.218n ± 0%   +0.92% (p=0.000 n=50)
memmove/158        5.168n ± 0%   5.224n ± 0%   +1.09% (p=0.000 n=50)
memmove/159        5.179n ± 0%   5.218n ± 0%   +0.76% (p=0.000 n=50)
memmove/160        5.170n ± 0%   5.219n ± 0%   +0.95% (p=0.000 n=50)
memmove/161        5.187n ± 0%   5.220n ± 0%   +0.64% (p=0.000 n=50)
memmove/162        5.189n ± 0%   5.234n ± 0%   +0.86% (p=0.000 n=50)
memmove/163        5.199n ± 0%   5.250n ± 0%   +0.99% (p=0.000 n=50)
memmove/164        5.205n ± 0%   5.260n ± 0%   +1.04% (p=0.000 n=50)
memmove/165        5.208n ± 0%   5.261n ± 0%   +1.01% (p=0.000 n=50)
memmove/166        5.227n ± 0%   5.275n ± 0%   +0.91% (p=0.000 n=50)
memmove/167        5.233n ± 0%   5.281n ± 0%   +0.92% (p=0.000 n=50)
memmove/168        5.236n ± 0%   5.295n ± 0%   +1.12% (p=0.000 n=50)
memmove/169        5.256n ± 0%   5.297n ± 0%   +0.79% (p=0.000 n=50)
memmove/170        5.259n ± 0%   5.302n ± 0%   +0.80% (p=0.000 n=50)
memmove/171        5.269n ± 0%   5.321n ± 0%   +0.97% (p=0.000 n=50)
memmove/172        5.266n ± 0%   5.318n ± 0%   +0.98% (p=0.000 n=50)
memmove/173        5.272n ± 0%   5.330n ± 0%   +1.09% (p=0.000 n=50)
memmove/174        5.284n ± 0%   5.331n ± 0%   +0.89% (p=0.000 n=50)
memmove/175        5.284n ± 0%   5.322n ± 0%   +0.72% (p=0.000 n=50)
memmove/176        5.298n ± 0%   5.337n ± 0%   +0.74% (p=0.000 n=50)
memmove/177        5.282n ± 0%   5.338n ± 0%   +1.04% (p=0.000 n=50)
memmove/178        5.299n ± 0%   5.337n ± 0%   +0.71% (p=0.000 n=50)
memmove/179        5.296n ± 0%   5.343n ± 0%   +0.88% (p=0.000 n=50)
memmove/180        5.292n ± 0%   5.343n ± 0%   +0.97% (p=0.000 n=50)
memmove/181        5.303n ± 0%   5.335n ± 0%   +0.60% (p=0.000 n=50)
memmove/182        5.305n ± 0%   5.338n ± 0%   +0.62% (p=0.000 n=50)
memmove/183        5.298n ± 0%   5.329n ± 0%   +0.59% (p=0.000 n=50)
memmove/184        5.299n ± 0%   5.333n ± 0%   +0.64% (p=0.000 n=50)
memmove/185        5.291n ± 0%   5.330n ± 0%   +0.73% (p=0.000 n=50)
memmove/186        5.296n ± 0%   5.332n ± 0%   +0.68% (p=0.000 n=50)
memmove/187        5.297n ± 0%   5.320n ± 0%   +0.44% (p=0.000 n=50)
memmove/188        5.286n ± 0%   5.314n ± 0%   +0.53% (p=0.000 n=50)
memmove/189        5.293n ± 0%   5.318n ± 0%   +0.46% (p=0.000 n=50)
memmove/190        5.294n ± 0%   5.318n ± 0%   +0.45% (p=0.000 n=50)
memmove/191        5.292n ± 0%   5.314n ± 0%   +0.40% (p=0.032 n=50)
memmove/192        5.272n ± 0%   5.304n ± 0%   +0.60% (p=0.000 n=50)
memmove/193        5.279n ± 0%   5.310n ± 0%   +0.57% (p=0.000 n=50)
memmove/194        5.294n ± 0%   5.308n ± 0%   +0.26% (p=0.018 n=50)
memmove/195        5.302n ± 0%   5.311n ± 0%   +0.18% (p=0.010 n=50)
memmove/196        5.301n ± 0%   5.316n ± 0%   +0.28% (p=0.023 n=50)
memmove/197        5.302n ± 0%   5.327n ± 0%   +0.47% (p=0.000 n=50)
memmove/198        5.310n ± 0%   5.326n ± 0%   +0.30% (p=0.003 n=50)
memmove/199        5.303n ± 0%   5.319n ± 0%   +0.30% (p=0.009 n=50)
memmove/200        5.312n ± 0%   5.330n ± 0%   +0.35% (p=0.001 n=50)
memmove/201        5.307n ± 0%   5.333n ± 0%   +0.50% (p=0.000 n=50)
memmove/202        5.311n ± 0%   5.334n ± 0%   +0.44% (p=0.000 n=50)
memmove/203        5.313n ± 0%   5.335n ± 0%   +0.41% (p=0.006 n=50)
memmove/204        5.312n ± 0%   5.332n ± 0%   +0.36% (p=0.002 n=50)
memmove/205        5.318n ± 0%   5.345n ± 0%   +0.50% (p=0.000 n=50)
memmove/206        5.311n ± 0%   5.333n ± 0%   +0.42% (p=0.002 n=50)
memmove/207        5.310n ± 0%   5.338n ± 0%   +0.52% (p=0.000 n=50)
memmove/208        5.319n ± 0%   5.341n ± 0%   +0.40% (p=0.004 n=50)
memmove/209        5.330n ± 0%   5.346n ± 0%   +0.30% (p=0.004 n=50)
memmove/210        5.329n ± 0%   5.349n ± 0%   +0.38% (p=0.002 n=50)
memmove/211        5.318n ± 0%   5.340n ± 0%   +0.41% (p=0.000 n=50)
memmove/212        5.339n ± 0%   5.343n ± 0%        ~ (p=0.396 n=50)
memmove/213        5.329n ± 0%   5.343n ± 0%   +0.25% (p=0.017 n=50)
memmove/214        5.339n ± 0%   5.358n ± 0%   +0.35% (p=0.035 n=50)
memmove/215        5.342n ± 0%   5.346n ± 0%        ~ (p=0.063 n=50)
memmove/216        5.338n ± 0%   5.359n ± 0%   +0.39% (p=0.002 n=50)
memmove/217        5.341n ± 0%   5.362n ± 0%   +0.39% (p=0.015 n=50)
memmove/218        5.354n ± 0%   5.373n ± 0%   +0.36% (p=0.041 n=50)
memmove/219        5.352n ± 0%   5.362n ± 0%        ~ (p=0.143 n=50)
memmove/220        5.344n ± 0%   5.370n ± 0%   +0.50% (p=0.001 n=50)
memmove/221        5.345n ± 0%   5.373n ± 0%   +0.53% (p=0.000 n=50)
memmove/222        5.348n ± 0%   5.360n ± 0%   +0.23% (p=0.014 n=50)
memmove/223        5.354n ± 0%   5.377n ± 0%   +0.43% (p=0.024 n=50)
memmove/224        5.352n ± 0%   5.363n ± 0%        ~ (p=0.052 n=50)
memmove/225        5.372n ± 0%   5.380n ± 0%        ~ (p=0.481 n=50)
memmove/226        5.368n ± 0%   5.386n ± 0%   +0.34% (p=0.004 n=50)
memmove/227        5.386n ± 0%   5.402n ± 0%   +0.29% (p=0.028 n=50)
memmove/228        5.400n ± 0%   5.408n ± 0%        ~ (p=0.174 n=50)
memmove/229        5.423n ± 0%   5.427n ± 0%        ~ (p=0.444 n=50)
memmove/230        5.411n ± 0%   5.429n ± 0%   +0.33% (p=0.020 n=50)
memmove/231        5.420n ± 0%   5.433n ± 0%   +0.24% (p=0.034 n=50)
memmove/232        5.435n ± 0%   5.441n ± 0%        ~ (p=0.235 n=50)
memmove/233        5.446n ± 0%   5.462n ± 0%        ~ (p=0.590 n=50)
memmove/234        5.467n ± 0%   5.461n ± 0%        ~ (p=0.921 n=50)
memmove/235        5.472n ± 0%   5.478n ± 0%        ~ (p=0.883 n=50)
memmove/236        5.466n ± 0%   5.478n ± 0%        ~ (p=0.324 n=50)
memmove/237        5.471n ± 0%   5.489n ± 0%        ~ (p=0.132 n=50)
memmove/238        5.485n ± 0%   5.489n ± 0%        ~ (p=0.460 n=50)
memmove/239        5.484n ± 0%   5.488n ± 0%        ~ (p=0.833 n=50)
memmove/240        5.483n ± 0%   5.495n ± 0%        ~ (p=0.095 n=50)
memmove/241        5.498n ± 0%   5.514n ± 0%        ~ (p=0.077 n=50)
memmove/242        5.518n ± 0%   5.517n ± 0%        ~ (p=0.481 n=50)
memmove/243        5.514n ± 0%   5.511n ± 0%        ~ (p=0.503 n=50)
memmove/244        5.510n ± 0%   5.497n ± 0%   -0.24% (p=0.038 n=50)
memmove/245        5.516n ± 0%   5.505n ± 0%        ~ (p=0.317 n=50)
memmove/246        5.513n ± 1%   5.494n ± 0%        ~ (p=0.147 n=50)
memmove/247        5.518n ± 0%   5.499n ± 0%   -0.36% (p=0.011 n=50)
memmove/248        5.503n ± 0%   5.492n ± 0%        ~ (p=0.267 n=50)
memmove/249        5.498n ± 0%   5.497n ± 0%        ~ (p=0.765 n=50)
memmove/250        5.485n ± 0%   5.493n ± 0%        ~ (p=0.348 n=50)
memmove/251        5.503n ± 0%   5.482n ± 0%   -0.37% (p=0.013 n=50)
memmove/252        5.497n ± 0%   5.485n ± 0%        ~ (p=0.077 n=50)
memmove/253        5.489n ± 0%   5.496n ± 0%        ~ (p=0.850 n=50)
memmove/254        5.497n ± 0%   5.491n ± 0%        ~ (p=0.548 n=50)
memmove/255        5.484n ± 1%   5.494n ± 0%        ~ (p=0.888 n=50)
memmove/256        6.952n ± 0%   7.676n ± 0%  +10.41% (p=0.000 n=50)
geomean            4.406n        4.127n        -6.33%
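
In the table above, sizes 64 and 128 now cost the same as the sizes just below them (3.008n and 3.659n), whereas in the baseline they matched the sizes just above. That is consistent with a head-and-tail copy of block size N being used for every count in [N, 2N], including 2N itself. The sketch below illustrates that technique under stated assumptions: `head_tail_move` and `demo_head_tail` are hypothetical helpers, not the library's `generic::Memmove<T>::head_tail`, and plain byte arrays stand in for vector registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Overlap-safe move of `count` bytes, valid for N <= count <= 2 * N.
// Both blocks are loaded into temporaries before either is stored, so
// overlapping src/dst ranges need no direction check. When count == 2 * N
// the two blocks are exactly adjacent; for smaller counts they overlap.
template <std::size_t N>
void head_tail_move(unsigned char *dst, const unsigned char *src,
                    std::size_t count) {
  assert(count >= N && count <= 2 * N);
  unsigned char head[N];
  unsigned char tail[N];
  std::memcpy(head, src, N);
  std::memcpy(tail, src + count - N, N);
  std::memcpy(dst, head, N);
  std::memcpy(dst + count - N, tail, N);
}

// Moves exactly 2 * N = 64 bytes forward by 8 within one buffer.
bool demo_head_tail() {
  unsigned char buf[80];
  for (std::size_t i = 0; i < 80; ++i)
    buf[i] = static_cast<unsigned char>(i);
  head_tail_move<32>(buf + 8, buf, 64);
  for (std::size_t i = 0; i < 64; ++i)
    if (buf[8 + i] != static_cast<unsigned char>(i))
      return false;
  return true;
}
```

Because the upper bound is inclusive, a dispatch that tests `count <= 2 * N` rather than `count < 2 * N` keeps the boundary size on the cheaper path.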

dvyukov requested a review from gchatelet October 24, 2023 13:55
llvmbot added the libc label Oct 24, 2023

llvmbot (Member) commented Oct 24, 2023

@llvm/pr-subscribers-libc

Author: Dmitry Vyukov (dvyukov)

Changes

See individual commits for description.

                     │  baseline   │          small-size-check           │
                     │   sec/op    │   sec/op     vs base                │
    memmove/Google_A   3.208n ± 0%   2.909n ± 0%   -9.31% (n=100)
    memmove/0          2.982n ± 0%   2.168n ± 0%  -27.27% (n=50)
    memmove/1          3.253n ± 0%   2.169n ± 0%  -33.34% (n=50)
    memmove/2          3.255n ± 0%   2.168n ± 6%  -33.40% (n=50)
    memmove/3          3.259n ± 2%   2.175n ± 0%  -33.26% (n=50)
    memmove/4          3.259n ± 0%   2.168n ± 0%  -33.45% (p=0.000 n=50)
    memmove/5          2.488n ± 0%   1.926n ± 0%  -22.57% (n=50)
    memmove/6          2.490n ± 0%   1.928n ± 0%  -22.58% (p=0.000 n=50)
    memmove/7          2.492n ± 0%   1.928n ± 0%  -22.63% (n=50)
    memmove/8          2.737n ± 0%   2.711n ± 0%   -0.97% (p=0.000 n=50)
    memmove/9          2.736n ± 0%   2.711n ± 0%   -0.94% (p=0.000 n=50)
    memmove/10         2.739n ± 0%   2.711n ± 0%   -1.04% (p=0.000 n=50)
    memmove/11         2.740n ± 0%   2.711n ± 0%   -1.07% (p=0.000 n=50)
    memmove/12         2.740n ± 0%   2.711n ± 0%   -1.09% (p=0.000 n=50)
    memmove/13         2.744n ± 0%   2.711n ± 0%   -1.22% (p=0.000 n=50)
    memmove/14         2.742n ± 0%   2.711n ± 0%   -1.14% (p=0.000 n=50)
    memmove/15         2.742n ± 0%   2.711n ± 0%   -1.15% (p=0.000 n=50)
    memmove/16         2.997n ± 0%   2.982n ± 0%   -0.52% (p=0.000 n=50)
    memmove/17         2.998n ± 0%   2.982n ± 0%   -0.55% (p=0.000 n=50)
    memmove/18         2.998n ± 0%   2.982n ± 0%   -0.54% (p=0.000 n=50)
    memmove/19         2.999n ± 0%   2.981n ± 0%   -0.59% (p=0.000 n=50)
    memmove/20         2.998n ± 0%   2.982n ± 0%   -0.55% (p=0.000 n=50)
    memmove/21         3.000n ± 0%   2.982n ± 0%   -0.61% (p=0.000 n=50)
    memmove/22         3.002n ± 0%   2.982n ± 0%   -0.68% (p=0.000 n=50)
    memmove/23         3.002n ± 0%   2.981n ± 0%   -0.67% (p=0.000 n=50)
    memmove/24         3.002n ± 0%   2.981n ± 0%   -0.70% (p=0.000 n=50)
    memmove/25         3.002n ± 0%   2.982n ± 0%   -0.68% (p=0.000 n=50)
    memmove/26         3.004n ± 0%   2.982n ± 0%   -0.74% (n=50)
    memmove/27         3.005n ± 0%   2.982n ± 0%   -0.79% (p=0.000 n=50)
    memmove/28         3.005n ± 0%   2.982n ± 0%   -0.77% (p=0.000 n=50)
    memmove/29         3.009n ± 0%   2.982n ± 0%   -0.92% (n=50)
    memmove/30         3.008n ± 0%   2.982n ± 0%   -0.89% (n=50)
    memmove/31         3.007n ± 0%   2.981n ± 0%   -0.86% (n=50)
    memmove/32         3.540n ± 0%   2.999n ± 0%  -15.30% (p=0.000 n=50)
    memmove/33         3.544n ± 0%   2.998n ± 0%  -15.41% (p=0.000 n=50)
    memmove/34         3.546n ± 0%   2.999n ± 0%  -15.42% (n=50)
    memmove/35         3.545n ± 0%   2.999n ± 0%  -15.41% (p=0.000 n=50)
    memmove/36         3.548n ± 0%   2.998n ± 0%  -15.51% (p=0.000 n=50)
    memmove/37         3.546n ± 0%   2.999n ± 0%  -15.44% (n=50)
    memmove/38         3.549n ± 0%   2.999n ± 0%  -15.50% (p=0.000 n=50)
    memmove/39         3.549n ± 0%   2.999n ± 0%  -15.48% (p=0.000 n=50)
    memmove/40         3.549n ± 0%   3.000n ± 0%  -15.48% (p=0.000 n=50)
    memmove/41         3.550n ± 0%   3.000n ± 0%  -15.49% (p=0.000 n=50)
    memmove/42         3.549n ± 0%   3.001n ± 0%  -15.45% (n=50)
    memmove/43         3.552n ± 0%   3.000n ± 0%  -15.54% (p=0.000 n=50)
    memmove/44         3.552n ± 0%   3.002n ± 0%  -15.49% (n=50)
    memmove/45         3.552n ± 0%   3.001n ± 0%  -15.49% (p=0.000 n=50)
    memmove/46         3.554n ± 0%   3.002n ± 0%  -15.52% (p=0.000 n=50)
    memmove/47         3.556n ± 0%   3.003n ± 0%  -15.56% (p=0.000 n=50)
    memmove/48         3.555n ± 0%   3.002n ± 0%  -15.56% (p=0.000 n=50)
    memmove/49         3.557n ± 0%   3.002n ± 0%  -15.58% (p=0.000 n=50)
    memmove/50         3.557n ± 0%   3.003n ± 0%  -15.58% (p=0.000 n=50)
    memmove/51         3.556n ± 0%   3.004n ± 0%  -15.52% (p=0.000 n=50)
    memmove/52         3.561n ± 0%   3.004n ± 0%  -15.63% (p=0.000 n=50)
    memmove/53         3.558n ± 0%   3.004n ± 0%  -15.57% (p=0.000 n=50)
    memmove/54         3.561n ± 0%   3.005n ± 0%  -15.61% (p=0.000 n=50)
    memmove/55         3.560n ± 0%   3.006n ± 0%  -15.58% (n=50)
    memmove/56         3.562n ± 0%   3.006n ± 0%  -15.60% (p=0.000 n=50)
    memmove/57         3.563n ± 0%   3.010n ± 0%  -15.52% (p=0.000 n=50)
    memmove/58         3.565n ± 0%   3.006n ± 0%  -15.66% (p=0.000 n=50)
    memmove/59         3.564n ± 0%   3.006n ± 0%  -15.66% (p=0.000 n=50)
    memmove/60         3.570n ± 0%   3.008n ± 0%  -15.75% (p=0.000 n=50)
    memmove/61         3.566n ± 0%   3.008n ± 0%  -15.67% (p=0.000 n=50)
    memmove/62         3.567n ± 0%   3.008n ± 0%  -15.68% (p=0.000 n=50)
    memmove/63         3.568n ± 0%   3.008n ± 0%  -15.69% (p=0.000 n=50)
    memmove/64         4.104n ± 0%   3.008n ± 0%  -26.70% (p=0.000 n=50)
    memmove/65         4.126n ± 0%   3.662n ± 0%  -11.25% (p=0.000 n=50)
    memmove/66         4.128n ± 0%   3.662n ± 0%  -11.28% (n=50)
    memmove/67         4.129n ± 0%   3.662n ± 0%  -11.31% (p=0.000 n=50)
    memmove/68         4.129n ± 0%   3.661n ± 0%  -11.32% (p=0.000 n=50)
    memmove/69         4.130n ± 0%   3.662n ± 0%  -11.35% (n=50)
    memmove/70         4.130n ± 0%   3.662n ± 0%  -11.34% (p=0.000 n=50)
    memmove/71         4.132n ± 0%   3.662n ± 0%  -11.37% (p=0.000 n=50)
    memmove/72         4.131n ± 0%   3.662n ± 0%  -11.37% (p=0.000 n=50)
    memmove/73         4.135n ± 0%   3.662n ± 0%  -11.43% (p=0.000 n=50)
    memmove/74         4.137n ± 0%   3.662n ± 0%  -11.49% (n=50)
    memmove/75         4.138n ± 0%   3.662n ± 0%  -11.51% (n=50)
    memmove/76         4.139n ± 0%   3.661n ± 0%  -11.55% (p=0.000 n=50)
    memmove/77         4.136n ± 0%   3.661n ± 0%  -11.48% (n=50)
    memmove/78         4.143n ± 0%   3.661n ± 0%  -11.62% (n=50)
    memmove/79         4.142n ± 0%   3.661n ± 0%  -11.60% (n=50)
    memmove/80         4.142n ± 0%   3.661n ± 0%  -11.61% (p=0.000 n=50)
    memmove/81         4.140n ± 0%   3.661n ± 0%  -11.56% (n=50)
    memmove/82         4.146n ± 0%   3.661n ± 0%  -11.68% (p=0.000 n=50)
    memmove/83         4.143n ± 0%   3.661n ± 0%  -11.63% (p=0.000 n=50)
    memmove/84         4.143n ± 0%   3.661n ± 0%  -11.62% (n=50)
    memmove/85         4.147n ± 0%   3.661n ± 0%  -11.72% (n=50)
    memmove/86         4.142n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
    memmove/87         4.147n ± 0%   3.661n ± 0%  -11.73% (p=0.000 n=50)
    memmove/88         4.148n ± 0%   3.661n ± 0%  -11.75% (p=0.000 n=50)
    memmove/89         4.152n ± 0%   3.661n ± 0%  -11.83% (n=50)
    memmove/90         4.151n ± 0%   3.661n ± 0%  -11.82% (n=50)
    memmove/91         4.150n ± 0%   3.661n ± 0%  -11.78% (p=0.000 n=50)
    memmove/92         4.153n ± 0%   3.660n ± 0%  -11.87% (p=0.000 n=50)
    memmove/93         4.158n ± 0%   3.661n ± 0%  -11.95% (n=50)
    memmove/94         4.157n ± 0%   3.661n ± 0%  -11.95% (p=0.000 n=50)
    memmove/95         4.155n ± 0%   3.661n ± 0%  -11.90% (p=0.000 n=50)
    memmove/96         4.149n ± 0%   3.660n ± 0%  -11.80% (p=0.000 n=50)
    memmove/97         4.157n ± 0%   3.661n ± 0%  -11.94% (n=50)
    memmove/98         4.157n ± 0%   3.661n ± 0%  -11.94% (p=0.000 n=50)
    memmove/99         4.168n ± 0%   3.661n ± 0%  -12.17% (n=50)
    memmove/100        4.159n ± 0%   3.660n ± 0%  -12.00% (n=50)
    memmove/101        4.161n ± 0%   3.661n ± 0%  -12.03% (n=50)
    memmove/102        4.165n ± 0%   3.660n ± 0%  -12.12% (n=50)
    memmove/103        4.164n ± 0%   3.661n ± 0%  -12.08% (p=0.000 n=50)
    memmove/104        4.164n ± 0%   3.660n ± 0%  -12.11% (p=0.000 n=50)
    memmove/105        4.165n ± 0%   3.660n ± 0%  -12.12% (n=50)
    memmove/106        4.166n ± 0%   3.660n ± 0%  -12.15% (n=50)
    memmove/107        4.171n ± 0%   3.660n ± 0%  -12.25% (p=0.000 n=50)
    memmove/108        4.173n ± 0%   3.660n ± 0%  -12.29% (p=0.000 n=50)
    memmove/109        4.170n ± 0%   3.660n ± 0%  -12.24% (p=0.000 n=50)
    memmove/110        4.174n ± 0%   3.660n ± 0%  -12.31% (p=0.000 n=50)
    memmove/111        4.176n ± 0%   3.660n ± 0%  -12.35% (n=50)
    memmove/112        4.174n ± 0%   3.660n ± 0%  -12.33% (p=0.000 n=50)
    memmove/113        4.176n ± 0%   3.660n ± 0%  -12.35% (p=0.000 n=50)
    memmove/114        4.182n ± 0%   3.660n ± 0%  -12.48% (p=0.000 n=50)
    memmove/115        4.185n ± 0%   3.660n ± 0%  -12.55% (n=50)
    memmove/116        4.184n ± 0%   3.660n ± 0%  -12.54% (n=50)
    memmove/117        4.182n ± 0%   3.660n ± 0%  -12.49% (n=50)
    memmove/118        4.188n ± 0%   3.660n ± 0%  -12.60% (n=50)
    memmove/119        4.186n ± 0%   3.660n ± 0%  -12.56% (p=0.000 n=50)
    memmove/120        4.189n ± 0%   3.660n ± 0%  -12.62% (n=50)
    memmove/121        4.187n ± 0%   3.668n ± 0%  -12.40% (n=50)
    memmove/122        4.186n ± 0%   3.667n ± 0%  -12.39% (p=0.000 n=50)
    memmove/123        4.187n ± 0%   3.668n ± 0%  -12.41% (p=0.000 n=50)
    memmove/124        4.189n ± 0%   3.667n ± 0%  -12.46% (n=50)
    memmove/125        4.195n ± 0%   3.662n ± 1%  -12.72% (p=0.000 n=50)
    memmove/126        4.197n ± 0%   3.669n ± 0%  -12.59% (n=50)
    memmove/127        4.194n ± 0%   3.668n ± 0%  -12.53% (p=0.000 n=50)
    memmove/128        5.035n ± 0%   3.656n ± 2%  -27.38% (p=0.000 n=50)

Full diff: https://github.com/llvm/llvm-project/pull/70043.diff

8 Files Affected:

  • (modified) libc/benchmarks/LibcMemoryBenchmarkMain.cpp (+10-4)
  • (modified) libc/src/string/memmove.cpp (+15-4)
  • (modified) libc/src/string/memory_utils/aarch64/inline_memmove.h (+4-1)
  • (modified) libc/src/string/memory_utils/generic/builtin.h (+2-2)
  • (modified) libc/src/string/memory_utils/generic/byte_per_byte.h (+1-1)
  • (modified) libc/src/string/memory_utils/inline_memmove.h (+11-2)
  • (modified) libc/src/string/memory_utils/riscv/inline_memmove.h (+5-3)
  • (modified) libc/src/string/memory_utils/x86_64/inline_memmove.h (+28-7)
diff --git a/libc/benchmarks/LibcMemoryBenchmarkMain.cpp b/libc/benchmarks/LibcMemoryBenchmarkMain.cpp
index acd7c30717597a1..bc6fd8b38cb6ddc 100644
--- a/libc/benchmarks/LibcMemoryBenchmarkMain.cpp
+++ b/libc/benchmarks/LibcMemoryBenchmarkMain.cpp
@@ -42,9 +42,15 @@ static cl::opt<std::string>
     SizeDistributionName("size-distribution-name",
                          cl::desc("The name of the distribution to use"));
 
-static cl::opt<bool>
-    SweepMode("sweep-mode",
-              cl::desc("If set, benchmark all sizes from 0 to sweep-max-size"));
+static cl::opt<bool> SweepMode(
+    "sweep-mode",
+    cl::desc(
+        "If set, benchmark all sizes from sweep-min-size to sweep-max-size"));
+
+static cl::opt<uint32_t>
+    SweepMinSize("sweep-min-size",
+                 cl::desc("The minimum size to use in sweep-mode"),
+                 cl::init(0));
 
 static cl::opt<uint32_t>
     SweepMaxSize("sweep-max-size",
@@ -185,7 +191,7 @@ struct MemfunctionBenchmarkSweep final : public MemfunctionBenchmarkBase {
     BO.InitialIterations = 100;
     auto &Measurements = Study.Measurements;
     Measurements.reserve(NumTrials * SweepMaxSize);
-    for (size_t Size = 0; Size <= SweepMaxSize; ++Size) {
+    for (size_t Size = SweepMinSize; Size <= SweepMaxSize; ++Size) {
       CurrentSweepSize = Size;
       runTrials(BO, Measurements);
     }
diff --git a/libc/src/string/memmove.cpp b/libc/src/string/memmove.cpp
index 7d473afc0b42ee7..a6478629d514027 100644
--- a/libc/src/string/memmove.cpp
+++ b/libc/src/string/memmove.cpp
@@ -15,10 +15,21 @@ namespace LIBC_NAMESPACE {
 
 LLVM_LIBC_FUNCTION(void *, memmove,
                    (void *dst, const void *src, size_t count)) {
-  if (is_disjoint(dst, src, count))
-    inline_memcpy(dst, src, count);
-  else
-    inline_memmove(dst, src, count);
+  // inline_memmove may handle some small sizes as efficiently
+  // as inline_memcpy. For these sizes we may not do is_disjoint check.
+  // This both avoids additional code for the most frequent smaller sizes
+  // and removes code bloat (we don't need the memcpy logic for small sizes).
+  // Here we heavily rely on inlining and dead code elimination: from the first
+  // inline_memmove we should get only handling of small sizes, and from
+  // the second inline_memmove and inline_memcpy we should get only handling
+  // of larger sizes.
+  inline_memmove(dst, src, count, true);
+  if (count >= LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE) {
+    if (is_disjoint(dst, src, count))
+      inline_memcpy(dst, src, count);
+    else
+      inline_memmove(dst, src, count);
+  }
   return dst;
 }
 
diff --git a/libc/src/string/memory_utils/aarch64/inline_memmove.h b/libc/src/string/memory_utils/aarch64/inline_memmove.h
index ca28655c916820c..5e5c23e34be62d1 100644
--- a/libc/src/string/memory_utils/aarch64/inline_memmove.h
+++ b/libc/src/string/memory_utils/aarch64/inline_memmove.h
@@ -18,7 +18,8 @@
 
 namespace LIBC_NAMESPACE {
 
-LIBC_INLINE void inline_memmove_aarch64(Ptr dst, CPtr src, size_t count) {
+LIBC_INLINE void inline_memmove_aarch64(Ptr dst, CPtr src, size_t count,
+                                        bool fast_only) {
   static_assert(aarch64::kNeon, "aarch64 supports vector types");
   using uint128_t = generic_v128;
   using uint256_t = generic_v256;
@@ -39,6 +40,8 @@ LIBC_INLINE void inline_memmove_aarch64(Ptr dst, CPtr src, size_t count) {
     return generic::Memmove<uint256_t>::head_tail(dst, src, count);
   if (count <= 128)
     return generic::Memmove<uint512_t>::head_tail(dst, src, count);
+  if (fast_only)
+    return;
   if (dst < src) {
     generic::Memmove<uint256_t>::align_forward<Arg::Src>(dst, src, count);
     return generic::Memmove<uint512_t>::loop_and_tail_forward(dst, src, count);
diff --git a/libc/src/string/memory_utils/generic/builtin.h b/libc/src/string/memory_utils/generic/builtin.h
index 5239329f653b341..1dabc856053d191 100644
--- a/libc/src/string/memory_utils/generic/builtin.h
+++ b/libc/src/string/memory_utils/generic/builtin.h
@@ -26,8 +26,8 @@ inline_memcpy_builtin(Ptr dst, CPtr src, size_t count, size_t offset = 0) {
   __builtin_memcpy(dst + offset, src + offset, count);
 }
 
-[[maybe_unused]] LIBC_INLINE void inline_memmove_builtin(Ptr dst, CPtr src,
-                                                         size_t count) {
+[[maybe_unused]] LIBC_INLINE void
+inline_memmove_builtin(Ptr dst, CPtr src, size_t count, bool fast_only) {
   __builtin_memmove(dst, src, count);
 }
 
diff --git a/libc/src/string/memory_utils/generic/byte_per_byte.h b/libc/src/string/memory_utils/generic/byte_per_byte.h
index a666c5da3136041..89497382aede338 100644
--- a/libc/src/string/memory_utils/generic/byte_per_byte.h
+++ b/libc/src/string/memory_utils/generic/byte_per_byte.h
@@ -29,7 +29,7 @@ inline_memcpy_byte_per_byte(Ptr dst, CPtr src, size_t count,
 }
 
 [[maybe_unused]] LIBC_INLINE void
-inline_memmove_byte_per_byte(Ptr dst, CPtr src, size_t count) {
+inline_memmove_byte_per_byte(Ptr dst, CPtr src, size_t count, bool fast_only) {
   if (count == 0 || dst == src)
     return;
   if (dst < src) {
diff --git a/libc/src/string/memory_utils/inline_memmove.h b/libc/src/string/memory_utils/inline_memmove.h
index f72ea24ab538d69..0440bbe94d542e9 100644
--- a/libc/src/string/memory_utils/inline_memmove.h
+++ b/libc/src/string/memory_utils/inline_memmove.h
@@ -14,27 +14,36 @@
 #if defined(LIBC_TARGET_ARCH_IS_X86)
 #include "src/string/memory_utils/x86_64/inline_memmove.h"
 #define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE inline_memmove_x86
+#define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE 129
 #elif defined(LIBC_TARGET_ARCH_IS_AARCH64)
 #include "src/string/memory_utils/aarch64/inline_memmove.h"
 #define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE inline_memmove_aarch64
+#define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE 129
 #elif defined(LIBC_TARGET_ARCH_IS_ANY_RISCV)
 #include "src/string/memory_utils/riscv/inline_memmove.h"
 #define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE inline_memmove_riscv
+#define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE 0
 #elif defined(LIBC_TARGET_ARCH_IS_ARM)
 #include "src/string/memory_utils/generic/byte_per_byte.h"
 #define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE inline_memmove_byte_per_byte
+#define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE 0
 #elif defined(LIBC_TARGET_ARCH_IS_GPU)
 #include "src/string/memory_utils/generic/builtin.h"
 #define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE inline_memmove_builtin
+#define LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE 0
 #else
 #error "Unsupported architecture"
 #endif
 
 namespace LIBC_NAMESPACE {
 
-LIBC_INLINE void inline_memmove(void *dst, const void *src, size_t count) {
+LIBC_INLINE void inline_memmove(void *dst, const void *src, size_t count,
+                                bool fast_only = false) {
+  if (LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE == 0 && fast_only)
+    return;
   LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE(reinterpret_cast<Ptr>(dst),
-                                       reinterpret_cast<CPtr>(src), count);
+                                       reinterpret_cast<CPtr>(src), count,
+                                       fast_only);
 }
 
 } // namespace LIBC_NAMESPACE
diff --git a/libc/src/string/memory_utils/riscv/inline_memmove.h b/libc/src/string/memory_utils/riscv/inline_memmove.h
index 1c26917a96d9d18..5e34b2817729972 100644
--- a/libc/src/string/memory_utils/riscv/inline_memmove.h
+++ b/libc/src/string/memory_utils/riscv/inline_memmove.h
@@ -17,9 +17,11 @@
 
 namespace LIBC_NAMESPACE {
 
-[[maybe_unused]] LIBC_INLINE void
-inline_memmove_riscv(Ptr __restrict dst, CPtr __restrict src, size_t count) {
-  return inline_memmove_byte_per_byte(dst, src, count);
+[[maybe_unused]] LIBC_INLINE void inline_memmove_riscv(Ptr __restrict dst,
+                                                       CPtr __restrict src,
+                                                       size_t count,
+                                                       bool fast_only) {
+  return inline_memmove_byte_per_byte(dst, src, count, fast_only);
 }
 
 } // namespace LIBC_NAMESPACE
diff --git a/libc/src/string/memory_utils/x86_64/inline_memmove.h b/libc/src/string/memory_utils/x86_64/inline_memmove.h
index 95ad07f75219581..ee397c63471f1ad 100644
--- a/libc/src/string/memory_utils/x86_64/inline_memmove.h
+++ b/libc/src/string/memory_utils/x86_64/inline_memmove.h
@@ -18,40 +18,61 @@
 
 namespace LIBC_NAMESPACE {
 
-LIBC_INLINE void inline_memmove_x86(Ptr dst, CPtr src, size_t count) {
+LIBC_INLINE void inline_memmove_x86(Ptr dst, CPtr src, size_t count,
+                                    bool fast_only) {
 #if defined(__AVX512F__)
+  constexpr size_t vector_size = 64;
   using uint128_t = generic_v128;
   using uint256_t = generic_v256;
   using uint512_t = generic_v512;
 #elif defined(__AVX__)
+  constexpr size_t vector_size = 32;
   using uint128_t = generic_v128;
   using uint256_t = generic_v256;
   using uint512_t = cpp::array<generic_v256, 2>;
 #elif defined(__SSE2__)
+  constexpr size_t vector_size = 16;
   using uint128_t = generic_v128;
   using uint256_t = cpp::array<generic_v128, 2>;
   using uint512_t = cpp::array<generic_v128, 4>;
 #else
+  constexpr size_t vector_size = 8;
   using uint128_t = cpp::array<uint64_t, 2>;
   using uint256_t = cpp::array<uint64_t, 4>;
   using uint512_t = cpp::array<uint64_t, 8>;
 #endif
+  (void)vector_size;
   if (count == 0)
     return;
   if (count == 1)
     return generic::Memmove<uint8_t>::block(dst, src);
-  if (count <= 4)
-    return generic::Memmove<uint16_t>::head_tail(dst, src, count);
-  if (count <= 8)
+  if (count == 2)
+    return generic::Memmove<uint16_t>::block(dst, src);
+  if (count == 3)
+    return generic::Memmove<cpp::array<uint8_t, 3>>::block(dst, src);
+  if (count == 4)
+    return generic::Memmove<uint32_t>::block(dst, src);
+  if (count < 8)
     return generic::Memmove<uint32_t>::head_tail(dst, src, count);
-  if (count <= 16)
+  // If count is equal to a power of 2, we can handle it as head-tail
+  // of both smaller size and larger size (head-tail are either
+  // non-overlapping for smaller size, or completely collapsed
+  // for larger size). It seems to be more profitable to do the copy
+  // with the larger size, if it's natively supported (e.g. doing
+  // 2 collapsed 32-byte moves for count=64 if AVX2 is supported).
+  // But it's not profitable to use larger size if it's not natively
+  // supported: we will both use more instructions and handle fewer
+  // sizes in earlier branches.
+  if (count < 16 + (vector_size <= sizeof(uint64_t)))
     return generic::Memmove<uint64_t>::head_tail(dst, src, count);
-  if (count <= 32)
+  if (count < 32 + (vector_size <= sizeof(uint128_t)))
     return generic::Memmove<uint128_t>::head_tail(dst, src, count);
-  if (count <= 64)
+  if (count < 64 + (vector_size <= sizeof(uint256_t)))
     return generic::Memmove<uint256_t>::head_tail(dst, src, count);
   if (count <= 128)
     return generic::Memmove<uint512_t>::head_tail(dst, src, count);
+  if (fast_only)
+    return;
   if (dst < src) {
     generic::Memmove<uint256_t>::align_forward<Arg::Src>(dst, src, count);
     return generic::Memmove<uint512_t>::loop_and_tail_forward(dst, src, count);

@dvyukov dvyukov force-pushed the dvyukov-memmove-small-size branch from 9914fbe to 9e2694f Compare October 24, 2023 13:56
@dvyukov
Collaborator Author

dvyukov commented Oct 24, 2023

Oh, "Rebase and merge" is not enabled, that's sad.
Do you know if I am supposed to apply this to the main branch locally and retry pushing until it succeeds? Or is there a better way?

@lntue lntue changed the title memmove optimizations [libc] memmove optimizations Oct 24, 2023
@gchatelet
Contributor

Oh, "Rebase and merge" is not enabled, that's sad. Do you know if I am supposed to apply this to the main branch locally and retry pushing until it succeeds? Or is there a better way?

Yeah, we'll need to push individual commits as separate PRs.

The first one, concerning the benchmark, LGTM and can go in independently. A word of caution on the sweeping benchmark, though: it exercises the same size many times in a row to accumulate measurement precision. As a consequence, it benefits from branch prediction and gives idealized results that may not be realized in production. It still gives an upper bound, though.

For the second one, I think it's a good idea to factor in small sizes. The code duplication is suboptimal. But I'm not sure that introducing the fast_only argument is the way to go; it spreads the logic across several files and makes it harder to reason about. I need to think about it.

The third patch looks good on the intent, but I don't get the boolean addition in the ifs, e.g.,

if (count < 16 + (vector_size <= sizeof(uint64_t)))

This is either if (count < 16) or if (count < 17) depending on vector size. Is this the logic you meant to implement? If so, it's a bit too cryptic as-is and would benefit from more comments or more self-explanatory code.
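
To make the arithmetic concrete, here is a small sketch (not code from the patch; `threshold_16` is a hypothetical helper name) that evaluates the upper bound of that condition for each `vector_size` value used in the diff. It relies only on the standard rule that a `bool` operand is promoted to `0` or `1` before the addition.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the exclusive upper bound of the uint64_t head_tail
// branch, spelled the same way as the condition quoted above.
constexpr std::size_t threshold_16(std::size_t vector_size) {
  // The comparison yields a bool, which is promoted to 0 or 1.
  return 16 + (vector_size <= sizeof(std::uint64_t));
}

// Only 8-byte registers available: the bound is 17, i.e. count <= 16,
// so count == 16 stays in the uint64_t branch.
static_assert(threshold_16(8) == 17, "behaves like count <= 16");

// Native 16-byte or wider vectors: the bound is 16, i.e. count < 16,
// so count == 16 falls through to the uint128_t branch.
static_assert(threshold_16(16) == 16, "behaves like count < 16");
static_assert(threshold_16(32) == 16, "behaves like count < 16");
static_assert(threshold_16(64) == 16, "behaves like count < 16");
```

The `static_assert`s are checked at compile time, so building this translation unit is enough to confirm both readings of the condition.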

@dvyukov
Collaborator Author

dvyukov commented Oct 25, 2023

But I'm not sure that introducing the fast_only argument is the way to go; it spreads the logic across several files and makes it harder to reason about. I need to think about it.

A slightly cleaner way to do it may be to return true/false from inline_memmove instead of the separate LIBC_SRC_STRING_MEMORY_UTILS_MEMMOVE_SLOW_SIZE.

bool inline_memmove(void *dst, const void *src, size_t count, bool fast_only) {
 if (count == 0) return true;
 ...
 if (fast_only) return false;
 ...
}

void memmove(...) {
  if (inline_memmove(..., true))
    return;
}

I did not do it only because it breaks all of the nice returns of voids :)

  if (count == 1)
    return generic::Memmove<uint8_t>::block(dst, src);

@dvyukov
Collaborator Author

dvyukov commented Oct 25, 2023

A slightly cleaner way to do it may be to return true/false from inline_memmove

Does it look better?

Is this the logic you meant to implement?

Yes.

Does the following look better?

  if (vector_size >= 32 ? count < 32 : count <= 32)
    return generic::Memmove<uint128_t>::head_tail(dst, src, count);

@gchatelet
Contributor

And how about splitting inline_memmove into several functions:

  • bool inline_memmove_small_size() returning whether the size was handled
  • inline_memmove_follow_up (or inline_memmove_larger_sizes)

We'd have the following code

LLVM_LIBC_FUNCTION(void *, memmove, (void *dst, const void *src, size_t count)) {
  if (inline_memmove_small_size(dst, src, count))
    return dst;
  if (is_disjoint(dst, src, count))
    inline_memcpy(dst, src, count);
  else 
    inline_memmove_follow_up(dst, src, count);
  return dst;
}

Each architecture would be responsible for providing both implementations side by side, so we can visually check that they complement each other. For architectures where we don't have a fast path, inline_memmove_small_size would be:

constexpr bool inline_memmove_small_size(void *dst, const void *src, size_t count) {
  return false;
}

Inlining would remove the branch altogether.
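
As a rough, self-contained illustration of this split (a sketch only: `small_size_move`, `no_fast_path`, `follow_up_move`, `sketch_memmove`, and the 8-byte limit are all hypothetical and not the libc implementation), the fast path reports whether it handled the size, and the caller falls back to the slow path otherwise:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fast path: handles only small sizes and reports whether it did.
// The 8-byte limit is chosen for illustration only.
inline bool small_size_move(unsigned char *dst, const unsigned char *src,
                            std::size_t count) {
  if (count > 8)
    return false;
  unsigned char tmp[8];
  std::memcpy(tmp, src, count); // load everything first
  std::memcpy(dst, tmp, count); // then store, so overlap is harmless
  return true;
}

// Hypothetical variant for an architecture without a fast path.
constexpr bool no_fast_path(unsigned char *, const unsigned char *,
                            std::size_t) {
  return false;
}
static_assert(!no_fast_path(nullptr, nullptr, 0), "never handles a size");

// Hypothetical slow path, standing in for the
// is_disjoint / inline_memcpy / inline_memmove_follow_up part.
inline void follow_up_move(unsigned char *dst, const unsigned char *src,
                           std::size_t count) {
  assert(count > 8 && "only reached for sizes the fast path declined");
  std::memmove(dst, src, count);
}

inline void *sketch_memmove(void *dst, const void *src, std::size_t count) {
  auto *d = static_cast<unsigned char *>(dst);
  auto *s = static_cast<const unsigned char *>(src);
  if (small_size_move(d, s, count))
    return dst;
  follow_up_move(d, s, count);
  return dst;
}

// Usage: two overlapping moves inside one buffer, one per path.
inline bool sketch_self_check() {
  unsigned char buf[32];
  for (std::size_t i = 0; i < sizeof(buf); ++i)
    buf[i] = static_cast<unsigned char>(i);
  sketch_memmove(buf + 2, buf, 6); // served by the fast path
  bool ok = buf[2] == 0 && buf[7] == 5;
  sketch_memmove(buf + 10, buf + 8, 16); // served by the follow-up path
  return ok && buf[10] == 8 && buf[25] == 23;
}
```

Because `no_fast_path` is a constant `false`, a compiler that inlines it can be expected to drop the `if` entirely, which is the effect described above.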

WDYT?

@gchatelet
Contributor

I missed your reply before sending mine...

Does the following look better?

  if (vector_size >= 32 ? count < 32 : count <= 32)
    return generic::Memmove<uint128_t>::head_tail(dst, src, count);

It's more explicit, yes.
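
For readers unfamiliar with the `head_tail` strategy that these thresholds select between, here is a minimal sketch of the idea as I understand it. `head_tail_move` and `matches_memmove` are hypothetical names, and plain byte arrays stand in for the vector types, so this is not the `generic::Memmove` code itself. A size between one and two blocks is covered by one block at the start and one block at the end, and both are loaded before anything is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a head_tail move with a BlockSize-byte block.
// Valid for BlockSize <= count <= 2 * BlockSize.
template <std::size_t BlockSize>
void head_tail_move(unsigned char *dst, const unsigned char *src,
                    std::size_t count) {
  assert(count >= BlockSize && count <= 2 * BlockSize);
  unsigned char head[BlockSize];
  unsigned char tail[BlockSize];
  // Load both blocks before storing anything, so overlapping src/dst
  // ranges cannot clobber bytes that still have to be read.
  std::memcpy(head, src, BlockSize);
  std::memcpy(tail, src + count - BlockSize, BlockSize);
  std::memcpy(dst, head, BlockSize);
  std::memcpy(dst + count - BlockSize, tail, BlockSize);
}

// Self-check: move `count` bytes forward by `shift` positions inside one
// buffer (overlapping ranges) and compare the result with std::memmove.
template <std::size_t BlockSize>
bool matches_memmove(std::size_t count, std::size_t shift) {
  unsigned char a[128], b[128];
  for (std::size_t i = 0; i < sizeof(a); ++i)
    a[i] = b[i] = static_cast<unsigned char>(i * 7 + 1);
  head_tail_move<BlockSize>(a + shift, a, count);
  std::memmove(b + shift, b, count);
  return std::memcmp(a, b, sizeof(a)) == 0;
}
```

With `count == BlockSize` the head and tail blocks coincide, and with `count == 2 * BlockSize` they sit back to back. These are the two ways a power-of-two size can be handled, and the boundary conditions above choose between them.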

@dvyukov
Collaborator Author

dvyukov commented Oct 26, 2023

Moved the benchmarks change to #70302.
And I am going to squash the 2 remaining memmove changes into 1 for now.

@dvyukov dvyukov force-pushed the dvyukov-memmove-small-size branch from 9e2694f to 65dfb6d Compare October 26, 2023 09:16
@dvyukov
Collaborator Author

dvyukov commented Oct 26, 2023

All done. PTAL.

@github-actions

github-actions bot commented Oct 26, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@dvyukov dvyukov force-pushed the dvyukov-memmove-small-size branch from 65dfb6d to 7a694d9 on October 26, 2023 09:21
Contributor

@gchatelet gchatelet left a comment

Thx!

@dvyukov dvyukov merged commit 0e110fb into llvm:main Oct 26, 2023
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Oct 26, 2023
1. Remove is_disjoint check for smaller sizes and reduce code bloat.

inline_memmove may handle some small sizes as efficiently
as inline_memcpy. For these sizes we can skip the is_disjoint check.
This both avoids additional code for the most frequent smaller sizes
and removes code bloat (we don't need the memcpy logic for small sizes).
Here we rely heavily on inlining and dead code elimination: from the
first inline_memmove we should get only the handling of small sizes, and
from the second inline_memmove and inline_memcpy we should get only the
handling of larger sizes.

2. Use the memcpy thresholds for memmove.
Memcpy thresholds were more carefully tuned.
This matters more now that small sizes always go through
the memmove path.

3. Fix boundary conditions for sizes = 16/32/64.
See the added comment for explanations.
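
The dispatch described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `move_small`, `move_bytes` and `copy_head_tail`, the 64-byte cutoff, and the byte loops for large overlapping copies are all assumptions made for the example, and `std::memcpy` stands in for inline_memcpy. The point it shows is that small sizes are copied by loading everything before storing anything, which is safe for overlapping buffers, so the is_disjoint check is only reached for larger sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {

// Hypothetical helper: true when [dst, dst+n) and [src, src+n) do not overlap.
inline bool is_disjoint(const void *dst, const void *src, std::size_t n) {
  const auto d = reinterpret_cast<std::uintptr_t>(dst);
  const auto s = reinterpret_cast<std::uintptr_t>(src);
  return d >= s ? d - s >= n : s - d >= n;
}

// Copies n bytes, N <= n <= 2*N, using one N-byte head block and one N-byte
// tail block. Both loads complete before any store, so overlap is harmless.
template <std::size_t N>
inline void copy_head_tail(char *dst, const char *src, std::size_t n) {
  assert(n >= N && n <= 2 * N);
  unsigned char head[N], tail[N];
  std::memcpy(head, src, N);
  std::memcpy(tail, src + n - N, N);
  std::memcpy(dst, head, N);
  std::memcpy(dst + n - N, tail, N);
}

// Small sizes: no overlap check needed. Returns false if n is too large.
// The 64-byte cutoff is illustrative only.
inline bool move_small(char *dst, const char *src, std::size_t n) {
  if (n == 0) return true;
  if (n <= 2) { copy_head_tail<1>(dst, src, n); return true; }
  if (n <= 4) { copy_head_tail<2>(dst, src, n); return true; }
  if (n <= 8) { copy_head_tail<4>(dst, src, n); return true; }
  if (n <= 16) { copy_head_tail<8>(dst, src, n); return true; }
  if (n <= 32) { copy_head_tail<16>(dst, src, n); return true; }
  if (n <= 64) { copy_head_tail<32>(dst, src, n); return true; }
  return false;
}

inline void *move_bytes(void *dst, const void *src, std::size_t n) {
  char *d = static_cast<char *>(dst);
  const char *s = static_cast<const char *>(src);
  if (move_small(d, s, n)) return dst;  // most frequent sizes end here
  if (is_disjoint(d, s, n)) {           // large and disjoint: memcpy-style path
    std::memcpy(d, s, n);
    return dst;
  }
  // Large and overlapping: pick the direction that never clobbers unread bytes.
  if (reinterpret_cast<std::uintptr_t>(d) < reinterpret_cast<std::uintptr_t>(s)) {
    for (std::size_t i = 0; i < n; ++i) d[i] = s[i];
  } else {
    for (std::size_t i = n; i > 0; --i) d[i - 1] = s[i - 1];
  }
  return dst;
}

}  // namespace sketch
```

Calling `sketch::move_bytes(buf + 3, buf, 40)` on an overlapping buffer takes the small-size path and never evaluates `is_disjoint`. Because the small-size branches return before the large-size code, a compiler that inlines everything only has to emit the small-size logic once.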

Memmove function size drops from 885 to 715 bytes
due to removed duplication.
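
On the boundary conditions in item 3: the comment added by the PR is not reproduced on this page, so the following is only one plausible reading, not the author's explanation. In the benchmark results for this PR, memmove/64 drops from 4.104n to 3.008n, matching sizes 32 to 63, while memmove/65 lands at 3.662n. That is consistent with an inclusive upper bound: a head block plus a tail block of N bytes each covers every size from N to 2N inclusive. The helper `head_tail_covers` below is hypothetical and only checks that arithmetic.

```cpp
#include <cstddef>

namespace sketch {

// True when the blocks [0, N) and [n - N, n) together touch every byte of
// [0, n). The tail block must fit in the buffer (n >= N) and must start no
// later than where the head block ends (n - N <= N).
constexpr bool head_tail_covers(std::size_t N, std::size_t n) {
  return n >= N && n - N <= N;
}

}  // namespace sketch

static_assert(sketch::head_tail_covers(32, 32), "size N: head and tail coincide");
static_assert(sketch::head_tail_covers(32, 64), "size 2N: head and tail are adjacent");
static_assert(!sketch::head_tail_covers(32, 65), "size 2N+1: one byte is left uncovered");
```

Under this reading, a size-class test written as `n < 2 * N` would send exactly 2N to the next, more expensive size class even though the cheaper one already handles it.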

3 participants