Skip to content

Add ffmpeg as potential external project #188

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

mstorsjo
Copy link
Member

To add the external source, run
"git clone -b release/7.1 https://git.ffmpeg.org/ffmpeg.git" to clone the ffmpeg sources in the
llvm-test-suite/test-suite-externals directory, or configure with TEST_SUITE_FFMPEG_ROOT pointing at a directory containing this source.

(As this points to a branch, not a tag, it may be required to check out the known matching commit 10aaf84f855dbcedb8ee2e3fce307e9b98320946.)

This builds two targets; the ffmpeg command line executable (which isn't executed as a test) and the ffmpeg checkasm test executable (which runs compiler generated functions and compares them with handwritten assembly versions of them). The checkasm execuable can also be run manually to microbenchmark functions, e.g. "External/ffmpeg/ffmpeg_checkasm --test=vp9dsp --bench=vp9_avg_8tap_smooth* 0".

It is not very meaningful to benchmark the execution of the whole checkasm executable, as it runs a different numbers of functions depending on the number of SIMD extensions available on the target CPU.

(Benchmarking on aarch64 on Linux currently requires direct access to the pmccntr_el0 register. To currently use a different timer register, edit ffmpeg/libavutil/aarch64/timer.h and change it to use cntvct_el0 instead of pmccntr_el0 on Linux too.)

This uses a static configuration of the project (when building the upstream project with their own build system, there's a number of build options that can be configured). Assembly is hooked up and enabled on i386, x86_64, arm and aarch64.

For architectures other than those, the checkasm test won't have any reference for detecting e.g. miscompilations of functions, but building it can still be meaningful (for testing compilation, or benchmarking the execution of the C version of functions).

Instead of having the CMake configuration check for a large number of things and generate a config.h file at configure time, things are either hardcoded, or set up with preprocessor conditions within the hand-tweaked config.h file.

@mstorsjo
Copy link
Member Author

This is modelled and set up in exactly the same way as for dav1d. Most of it is the same - the main key differences are:

  • ffmpeg is a much much larger codebase, with around 2200 files to be built, instead of a bit over 100. This provides much more surface for hitting miscompilations that e.g. trigger asserts in LLVM
  • While it contains more code, the fraction of the code that has assembly variants obviously is smaller, but it still has a fair amount of assembly code, and corresponding checkasm tests. Unfortunately, not all assembly code has got corresponding checkasm tests (so not everything that should be possible to microbenchmark is set up for that), but it still has a quite large amount of functions hooked up.
  • There are minor details that differ between the implementations of checkasm, in how to pick a subset of functions to benchmark (--bench --function=<pattern> in dav1d, --bench=<pattern> in ffmpeg), how to reduce the number of iterations to benchmark (--runs=8 in ffmpeg, edit checkasm.h in dav1d), and which file to edit to opt out of pmccntr_el0 on Linux. Most of it is very similar though.

I've kept the README.md separate from the dav1d one but rewritten the bits that differ - including finding other examples for good/bad autovectorization, and a similar small guide for mapping a checkasm benchmark back to the C source; I think it's more user friendly to have it that way than to have one readme refer to the other, even if there's some duplication in the bulk text in it.

There's no tag that contains exactly what we need here, but it is usable on a recent release branch, so the git clone instructions check out that branch (and directions to a recent known-good commit on it). I.e., if there'd be larger changes on the branch, we'd need to update the CMakeLists.txt accordingly (adding/removing source files). Once there's a new tag on that branch (potentially in a month or two), I'll update the instructions to clone and point to that instead.

@sjoerdmeijer
Copy link
Contributor

sjoerdmeijer commented Dec 11, 2024

This looks a very good idea to me: it's completely optional, and a very lightweight way to optionally pull in a relatively big and real life workload that can increase our test coverage.

My very first question about how this is different from dav1d is mostly answered here I think:

ffmpeg is a much much larger codebase, with around 2200 files to be built, instead of a bit over 100. This provides much more surface for hitting miscompilations that e.g. trigger asserts in LLVM

This is a good thing, but just out of curiousity I wanted to ask if you can tell if there are other differences? For example, are these codes very similar, or are they different in any way? More testing is good enough on itself, but just trying to get a sense of the added value.

I will give this a try to see how this works and will report back soon.

@mstorsjo
Copy link
Member Author

My very first question about how this is different from dav1d is mostly answered here I think:

ffmpeg is a much much larger codebase, with around 2200 files to be built, instead of a bit over 100. This provides much more surface for hitting miscompilations that e.g. trigger asserts in LLVM

This is a good thing, but just out of curiousity I wanted to ask if you can tell if there are other differences? For example, are these codes very similar, or are they different in any way? More testing is good enough on itself, but just trying to get a sense of the added value.

That's probably the main difference indeed.

dav1d is mostly developed by longtime ffmpeg contributors, so the code style and structure, etc, are very similar. dav1d was developed externally, outside of ffmpeg for license/process reasons as a clean-slate project - dav1d is simplified BSD, while ffmpeg is mostly LGPL (with some minor bits being GPL and BSD or similar). When dav1d was started, some minor changes were made, like using meson as the upstream build system, rather than a custom configure shell script+gnu make, but from the point of view of llvm-test-suite it's quite much the same.

From the point of view of higher level benchmarking, ffmpeg is much more complex though :-)

Outside of microbenchmarking with checkasm, it's kinda easy to benchmark decoding with dav1d; use one single input video clip that exercises most video format features, decode it and measure the time. It's very easy to get a number of e.g. the effect on real-world decoding performance from autovectorization, like measured in https://discourse.llvm.org/t/autovectorization-measurements-from-dav1d-and-ffmpeg/82623. E.g. built with clang without vectorization, decodes at 118 fps, vectorization takes it to 177 fps. Built with GCC without vectorization gives 128 fps, with vectorization 185 fps. And with all handwritten assembly, it's 450 fps.

With ffmpeg supporting hundreds of different input formats, and also supporting encoding many dozens of formats, it'd require a quite significant test matrix, where benchmarking usually wouldn't see much differences, unless you targetedly know that changes that were made specifically affect one specific input/output format. One can of course benchmark the performance of one or a couple specific known important cases though, e.g. decoding H264 or HEVC or similar.

@fhahn
Copy link
Contributor

fhahn commented Dec 17, 2024

Just had a chance to give this a try, but it looks like it traps in memcpy_chk on macOS. macOS uses _chk variants of memory function by default while most other platforms don't, so maybe the issue doesn't show up on other platforms?

frame #2: ffmpeg_checkasm`check_loopfilter_16y + 560
frame #1: libsystem_c.dylib`__memcpy_chk + 40
frame #0: libsystem_c.dylib`__chk_fail_overflow + 24
libsystem_c.dylib`__chk_fail_overflow:
->  0x1817cbd70 <+24>: brk    #0x1

@mstorsjo
Copy link
Member Author

Just had a chance to give this a try, but it looks like it traps in memcpy_chk on macOS. macOS uses _chk variants of memory function by default while most other platforms don't, so maybe the issue doesn't show up on other platforms?

frame #2: ffmpeg_checkasm`check_loopfilter_16y + 560
frame #1: libsystem_c.dylib`__memcpy_chk + 40
frame #0: libsystem_c.dylib`__chk_fail_overflow + 24
libsystem_c.dylib`__chk_fail_overflow:
->  0x1817cbd70 <+24>: brk    #0x1

That's odd - this code does run and gets tested very regularly on macOS.

I just retested this on macOS, built via the llvm-test-suite wrapping, and ran ffmpeg_checkasm without hitting this. I built with stock Clang from Xcode 16.2, but I don't think that bit should matter. I ran it on macOS 14.6.1; I haven't yet upgraded to Sequoia though - that could possibly make for some differences, but it also sounds kinda implausible (any overflows or similar should be kinda consistent across OSes)?

Does this happen for any run, or only with certain input seeds?

If the issue seems to be within check_loopfilter_16y, that's in tests/checkasm/vp8dsp.c; there is one memcpy there, but as far as I can see, it should be entirely within its boundaries. Also, all the ways I try to compile this file, the one memcpy in check_loopfilter_16y just gets expanded to inline ldp/stp instructions as well, no actual function call that can get handled by __memcpy_chk.

I also tested building this with address sanitizer, and it ran checkasm fine without complaining about any of these memcpies.

@sjoerdmeijer
Copy link
Contributor

Forgot to post my results, but I've tried it too and found that it all works OK on our AArch64 linux platforms.

To add the external source, run
"git clone -b release/7.1 https://git.ffmpeg.org/ffmpeg.git"
to clone the ffmpeg sources in the
llvm-test-suite/test-suite-externals directory, or configure with
TEST_SUITE_FFMPEG_ROOT pointing at a directory containing this source.

(As this points to a branch, not a tag, it may be required to check
out the known matching commit 10aaf84f855dbcedb8ee2e3fce307e9b98320946.)

This builds two targets; the ffmpeg command line executable (which
isn't executed as a test) and the ffmpeg checkasm test executable
(which runs compiler generated functions and compares them with
handwritten assembly versions of them). The checkasm execuable
can also be run manually to microbenchmark functions, e.g.
"External/ffmpeg/ffmpeg_checkasm --test=vp9dsp --bench=vp9_avg_8tap_smooth* 0".

It is not very meaningful to benchmark the execution of the whole
checkasm executable, as it runs a different numbers of functions
depending on the number of SIMD extensions available on the target
CPU.

(Benchmarking on aarch64 on Linux currently requires direct access to
the pmccntr_el0 register. To currently use a different
timer register, edit ffmpeg/libavutil/aarch64/timer.h and change it
to use cntvct_el0 instead of pmccntr_el0 on Linux too.)

This uses a static configuration of the project (when building the
upstream project with their own build system, there's a number of
build options that can be configured). Assembly is hooked up and
enabled on i386, x86_64, arm and aarch64.

For architectures other than those, the checkasm test won't have
any reference for detecting e.g. miscompilations of functions,
but building it can still be meaningful (for testing compilation,
or benchmarking the execution of the C version of functions).

Instead of having the CMake configuration check for a large number
of things and generate a config.h file at configure time, things are
either hardcoded, or set up with preprocessor conditions within
the hand-tweaked config.h file.
@mstorsjo
Copy link
Member Author

Forgot to post my results, but I've tried it too and found that it all works OK on our AArch64 linux platforms.

Thanks for checking!

BTW, I noticed that I had forgot to actually commit the final amendments, so the cmakelists and readme pointed to the wrong commit to check out (which lacked a recent commit to make sure it prints benchmarks for all functions, also functions without corresponding assembly implementations) - if you checked out a specific commit manually, check that you're on 10aaf84f855dbcedb8ee2e3fce307e9b98320946 or later. I included that in the update to the PR here as well now.

@mstorsjo
Copy link
Member Author

Just had a chance to give this a try, but it looks like it traps in memcpy_chk on macOS. macOS uses _chk variants of memory function by default while most other platforms don't, so maybe the issue doesn't show up on other platforms?

frame #2: ffmpeg_checkasm`check_loopfilter_16y + 560
frame #1: libsystem_c.dylib`__memcpy_chk + 40
frame #0: libsystem_c.dylib`__chk_fail_overflow + 24
libsystem_c.dylib`__chk_fail_overflow:
->  0x1817cbd70 <+24>: brk    #0x1

Hmm, now I rebuilt with a fresh upstream llvm.org Clang, and I did manage to hit this issue; will look into it!

@fhahn
Copy link
Contributor

fhahn commented Dec 17, 2024

Just had a chance to give this a try, but it looks like it traps in memcpy_chk on macOS. macOS uses _chk variants of memory function by default while most other platforms don't, so maybe the issue doesn't show up on other platforms?

frame #2: ffmpeg_checkasm`check_loopfilter_16y + 560
frame #1: libsystem_c.dylib`__memcpy_chk + 40
frame #0: libsystem_c.dylib`__chk_fail_overflow + 24
libsystem_c.dylib`__chk_fail_overflow:
->  0x1817cbd70 <+24>: brk    #0x1

Hmm, now I rebuilt with a fresh upstream llvm.org Clang, and I did manage to hit this issue; will look into it!

Yep I am also hitting it with a Clang built from main. Might be that the object size computation changed somehow and is now incorrect?

@mstorsjo
Copy link
Member Author

Hmm, now I rebuilt with a fresh upstream llvm.org Clang, and I did manage to hit this issue; will look into it!

Yep I am also hitting it with a Clang built from main. Might be that the object size computation changed somehow and is now incorrect?

Possibly, maybe... But when building ffmpeg's checkasm with the original upstream ffmpeg build system, with the same version of Clang, I'm not hitting this issue, so it's also possible that it's caused by something in the fixed static configuration used here in the llvm-test-suite integration (which I use rather than running one million configure checks for various OS/toolchain features which we usually don't care about) - or if it is an upstream Clang bug which only happens to get uncovered in specific circumstances.

I'm not hitting the same issue when building with a Clang 19.1.x release, so it seems to be related to something that has happened on git main since the last few months, so I can bisect that as well.

@mstorsjo
Copy link
Member Author

Yep I am also hitting it with a Clang built from main. Might be that the object size computation changed somehow and is now incorrect?

Yes, you were right on the money here. I bisected the regression to llvm/llvm-project@f8c1a22 / llvm/llvm-project#117849.

But when building ffmpeg's checkasm with the original upstream ffmpeg build system, with the same version of Clang, I'm not hitting this issue, so it's also possible that it's caused by something in the fixed static configuration used here in the llvm-test-suite integration

Yes, this also was what was happening. Upstream ffmpeg only assumes __attribute__((align())) to work for local variables in some architecture/compiler combinations, and it turns out that nobody has enabled that (to trust it) for aarch64 targets. In the config.h I'm adding here, I have that option hardcoded to #define HAVE_LOCAL_ALIGNED 1; as we're supposed to test the compiler within llvm-test-suite, I assumed we can always trust the compiler to be able to manage it. (Without that, the codebase overallocates local arrays and manually aligns the pointer within them; this throws off the object size calculation.) If I add a suitable enable local_aligned in ffmpeg/configure for aarch64, I also reproduce the same issue with the upstream build system. I can also reproduce the same LLVM bug for x86_64 Linux, if I add -D_FORTIFY_SOURCE=2 while building ffmpeg there.

So, apparently a legitimate compiler bug, caught by the integration of ffmpeg into llvm-test-suite, even before it's merged :-)

With llvm/llvm-project@f8c1a22 reverted, it builds and runs fine on macOS for me. Alternatively, you can also edit the hardcoded External/ffmpeg/config.h included here in llvm-test-suite, and change HAVE_LOCAL_ALIGNED to 0 - that also confuses the object size calculation enough to not make any false assumptions :-)

Copy link
Contributor

@sjoerdmeijer sjoerdmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with this, thanks for doing this.

serge-sans-paille added a commit to serge-sans-paille/llvm-project that referenced this pull request Dec 18, 2024
@mstorsjo
Copy link
Member Author

Seems like this already was merged in 1cea1a9 :-) Closing the PR then.

That commit does seem to lack a couple of the later updates to this PR though (a typo fix in the readme, and fixes for the pinned ffmpeg commit), I'll push fixup commits for that.

@mstorsjo mstorsjo closed this Dec 19, 2024
mstorsjo added a commit that referenced this pull request Dec 19, 2024
1cea1a9 was committed without the
last updates from the originating PR, #188 - fix the references
to the known-good git commits (which were correct in the
commit message, but had been missed to be updated in the files)
and fix a typo.
mstorsjo added a commit to mstorsjo/llvm-project that referenced this pull request Dec 19, 2024
Since llvm/llvm-test-suite#182 and
llvm/llvm-test-suite#188, these projects
can now be added as external projects within llvm-test-suite.
mstorsjo added a commit to mstorsjo/llvm-project that referenced this pull request Dec 19, 2024
Since llvm/llvm-test-suite#182 and
llvm/llvm-test-suite#188, these projects
can now be added as external projects within llvm-test-suite.
serge-sans-paille added a commit to llvm/llvm-project that referenced this pull request Dec 20, 2024
…ated mode instead of … (#120424)

…generating empty location

Fix the regression detected by
llvm/llvm-test-suite#188
mstorsjo added a commit to llvm/llvm-project that referenced this pull request Dec 20, 2024
Since llvm/llvm-test-suite#182 and
llvm/llvm-test-suite#188, these projects can now
be added as external projects within llvm-test-suite.
@mstorsjo mstorsjo deleted the ffmpeg branch December 20, 2024 12:39
@mstorsjo
Copy link
Member Author

For reference here, the fix for the crash in checkasm on macos was fixed in llvm/llvm-project#120424 / llvm/llvm-project@e4db3f0 - and now the ffmpeg checkasm test runs fine on macOS when built through llvm-test-suite.

@fhahn
Copy link
Contributor

fhahn commented Dec 20, 2024

Great news, thanks!

github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 10, 2025
…in approximated mode instead of … (#120424)

…generating empty location

Fix the regression detected by
llvm/llvm-test-suite#188
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 10, 2025
Since llvm/llvm-test-suite#182 and
llvm/llvm-test-suite#188, these projects can now
be added as external projects within llvm-test-suite.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants