Skip to content

Add flag to build compiler options with G++ on macOS/MacPorts for Tesseract 4.x. #1474

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 13, 2018

Conversation

ic
Copy link
Contributor

@ic ic commented Apr 13, 2018

Building with G++ on Darwin breaks with the MacPorts toolchain, when either AVX, AVX2, or SSE4.1 compiler option is set, unless G++ is actually CLANG.

This PR allows building with G++, by asking G++ to delegate assembly to the clang integrated assembler, instead of the GNU one.

Situation

GNU compilers choke when assembling with AS. After autogen.sh and configure, make ends quickly with such error trace:

# ... Cut the beginning of the trace
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I.. -O2 -DNDEBUG -I../ccstruct -I../ccutil -I../viewer -I/opt/local/include/leptonica -I/opt/local/include/pango-1.0 -I/opt/local/include/glib-2.0 -I/opt/local/lib/glib-2.0/include -I/opt/local/include -I/opt/local/include/cairo -I/opt/local/include/glib-2.0 -I/opt/local/lib/glib-2.0/include -I/opt/local/include -I/opt/local/include/pixman-1 -I/opt/local/include -I/opt/local/include/freetype2 -I/opt/local/include -I/opt/local/include/freetype2 -I/opt/local/include/libpng16 -I/opt/local/include -mavx -g -O2 -std=c++11 -MT libtesseract_avx_la-dotproductavx.lo -MD -MP -MF .deps/libtesseract_avx_la-dotproductavx.Tpo -c dotproductavx.cpp  -fno-common -DPIC -o .libs/libtesseract_avx_la-dotproductavx.o
/var/folders/dy/7c6v04gn4x96qz9w1xrl_6r00000gn/T//cc9mtjB1.s:46:no such instruction: `vmovapd (%rdi), %ymm0'
/var/folders/dy/7c6v04gn4x96qz9w1xrl_6r00000gn/T//cc9mtjB1.s:47:no such instruction: `vmulpd (%rsi), %ymm0,%ymm0'
/var/folders/dy/7c6v04gn4x96qz9w1xrl_6r00000gn/T//cc9mtjB1.s:61:no such instruction: `vmovapd (%rdi,%rax,8), %ymm1'
# ... several more missing instructions
make[3]: *** [libtesseract_avx_la-dotproductavx.lo] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

This is a common issue on macOS, where we need to ask the compiler to use the clang integrated assembler by passing -Wa,-q.

The problem happens at least from tag 4.0.0-beta.1 to the present master at 742a0875077e16f0010e99f1bf05823cd9bb725d.

Tests conducted

Configuration

  • Commit: 742a0875077e16f0010e99f1bf05823cd9bb725d (current master at this time)
  • macOS 10.12.6
  • MacPorts 2.4.2
  • Build with /opt/local/bin/g++, itself built with MacPorts (g++ (MacPorts gcc6 6.4.0_0) 6.4.0)
  • Leptonica 1.75.3
  • Machine with AVX, AVX2, SSE4.1 support.

Tests after the fix

Simple make

  • Builds fine until completion.
  • CLI trial, from the repository home:
export TEST_IMG=/tmp/tesseract_test.jpg
curl -o $TEST_IMG https://i.ytimg.com/vi/4KhEQKyByXs/maxresdefault.jpg
export TESSDATA_PREFIX=/path/to/4.0/tessdata
./api/.libs/tesseract $TEST_IMG stdout -l eng

Results in:

Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 2122
SAMPLE TEXT

make training

  • Builds fine until completion.
  • Same CLI trial as above.

Extra Notes

  • I am aware of this issue/documentation where the build seems to work fine on macOS 10.13 with Homebrew. Here I build with MacPorts, so the problem may be due to this toolchain.

ic added 2 commits April 13, 2018 09:39
Building with G++ on Darwin breaks when either AVX, AVX2, or SSE4.1
compiler option is set, unless G++ is actually CLANG.

This commit allows to build with G++, by asking G++ to delegate assembly
to the clang integrated assembler, instead of the GNU one.
@zdenop zdenop merged commit 1d4959c into tesseract-ocr:master Apr 13, 2018
@amitdo
Copy link
Collaborator

amitdo commented Apr 13, 2018

@stweil
Copy link
Member

stweil commented Apr 20, 2018

This change introduced a regression: now building with the standard compiler (clang++) no longer works on macOS.

if $avx -o $avx2 -o $sse41; then
case "${host_os}" in
*darwin* | *-macos10*)
if test -z "$CLANG"; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$CLANG is always undefined here, so the condition is always true.

# When using AVX, AVX2, or SSE4.1:
# Must tell AS to use clang integrated assembler,
# instead of the GNU based system assembler.
CXXFLAGS="$CXXFLAGS -Wa,-q"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang++ fails with those flags which are always set.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLANG is defined here:
[CLANG=yes], [CLANG=no])

Your code is placed above it.

@amitdo
Copy link
Collaborator

amitdo commented Apr 20, 2018

.travis.yml:

os:
- linux
#- osx

Why?

@stweil
Copy link
Member

stweil commented Apr 20, 2018

It was removed in commit 63e1862.

@stweil
Copy link
Member

stweil commented Apr 20, 2018

As Travis uses cmake, it would not have found the regression which was only for autoconf.

@amitdo
Copy link
Collaborator

amitdo commented Apr 20, 2018

Do you know how to change travis.yml to use autotools on macOS?

@stweil
Copy link
Member

stweil commented Apr 20, 2018

Yes, I think so. At least for Linux it should be easy:‌ just replace the build commands for Leptonica and Tesseract.

@amitdo
Copy link
Collaborator

amitdo commented Apr 20, 2018

I didn't realize it uses CMake on Linux too.

@ic
Copy link
Contributor Author

ic commented Apr 22, 2018 via email

@amitdo
Copy link
Collaborator

amitdo commented Apr 22, 2018

The other places don't have this issue.

@stweil
Copy link
Member

stweil commented Apr 22, 2018

@ic, I was not able to successfully build git master with x86_64-apple-darwin17-g++ from MacPorts. It did not find a clang assember in /opt/local/bin. What about reverting your commit until there is a better solution? That would at least fix the default build with Apple's clang compiler.

@stweil
Copy link
Member

stweil commented Apr 22, 2018

I wonder whether we really have to support a g++ which cannot handle AVX instructions. Isn't that a buggy installation which has to be fixed in MacPorts?

And if we work around that problem, it should be sufficient to add -Wa,q for those files in arch which really use AVX (and maybe also SSE4.1) instructions.

@ic
Copy link
Contributor Author

ic commented Apr 23, 2018

@stweil I have tried building from master now with x86_64-apple-darwin17-g++ from MacPorts (I was on darwin16 so far). It worked, but I already have clang installed with MacPorts too. Would that be a hidden requirement? I could not reproduce your error.

Fixing the default build is top priority, though. So I have followed the lead from @amitdo and removed the dependency to CLANG. The result for now:

  • It builds fine on the latest 4.x tag (40f4311) with g++ and clang++ (Autoconf).
  • It now builds fine on master It does not work on master with clang++ under Autoconf, ending with a ld: symbol(s) not found for architecture x86_64 (clang version 6.0.0 (tags/RELEASE_600/final), x86_64-apple-darwin17.5.0).
  • It builds on master with g++ under Autoconf, and with clang++ under Cmake.

Build environments:

> g++ --version
g++ (MacPorts gcc6 6.4.0_0) 6.4.0
> clang++ --version
clang version 6.0.0 (tags/RELEASE_600/final)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /opt/local/libexec/llvm-6.0/bin

The patch, before any tentative PR or rollback:

diff --git a/configure.ac b/configure.ac
index 6e5673ef..8f6bbce0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -135,8 +135,8 @@ fi
 if $avx -o $avx2 -o $sse41; then
   case "${host_os}" in
     *darwin* | *-macos10*)
-       if test -z "$CLANG"; then
-         # When using AVX, AVX2, or SSE4.1:
+       if test g++ = "$CXX"; then
+         # When using AVX, AVX2, or SSE4.1 with g++:
          # Must tell AS to use clang integrated assembler,
          # instead of the GNU based system assembler.
          CXXFLAGS="$CXXFLAGS -Wa,-q"

On g++ from MacPorts (and HomeBrew):
The as documentation on Darwin provides a flag to switch between clang and GNU assemblers (see also @amitdo link to Homebrew/legacy-homebrew#28984). Is it a bug or just how Apple's stack has been designed? No idea, but the Apple ecosystem seems to lead to using that flag for AVX and the likes low-level optimizations. The baseline to me is it would be great the build script just deals with the different scenarios.

@ic
Copy link
Contributor Author

ic commented Apr 23, 2018

For the record, if this PR is rolled back, here is how I was proceeding before:

./autogen.sh && ./configure CXXFLAGS=-Wa,-q

When building with g++.

Please note that the configure.ac commands default to g++:

AC_PROG_CXX([g++ clang++])

On line 13 at this time.

@ic
Copy link
Contributor Author

ic commented Apr 23, 2018

For TravisCI to test building with Autotools on macOS, any interest in scripts like this one ? This is to build Tesseract with Autotools, and I have another one for Leptonica (both used to packaged for running locally on Darwin and remotely on AWS Lambda).

@stweil
Copy link
Member

stweil commented Apr 23, 2018

Should we change the preferred compiler order for macOS?

@ic
Copy link
Contributor Author

ic commented Apr 23, 2018

@stweil Anyone deeper into the build system to ask to?

Simple opinion: Given the situation and cmake using clang anyway, it makes sense.

ic added a commit to ic/tesseract that referenced this pull request Apr 23, 2018
g++ builds require extra flags rejected by clang++. The bug is that the
flags are actually added unconditionally. This commit fixes the
condition.

See tesseract-ocr#1474
@ic
Copy link
Contributor Author

ic commented Apr 23, 2018

#1516 is a proposal based on the discussion here. It just addresses fixing the build with clang. The other discussion items should probably be different PRs.

noahmetzger pushed a commit to noahmetzger/tesseract that referenced this pull request Jul 31, 2018
g++ builds require extra flags rejected by clang++. The bug is that the
flags are actually added unconditionally. This commit fixes the
condition.

See tesseract-ocr#1474
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants