Skip to content

Avoid branches in ZEND_BOOL/ZEND_BOOL_NOT #4912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Nov 15, 2019

gcc 5/9 will add 2 (IS_FALSE) to the boolean result of 1/0 to compute
the value to store for the ZVAL_BOOL call.

Additionally, this puts all of the reads of Z_TYPE_INFO_P before writes
to memory (ZVAL_BOOL),
so that php doesn't have to load from memory into a register again.

(gcc 5 would use branches for evaluating ZVAL_BOOL if ZVAL_BOOL was
executed before comparing against IS_UNDEF)

A contrived example heavily using ZVAL_BOOL with poor branch prediction

// With gcc 9, cpufreq 2.5GZ
// before: 0.305 seconds with 1000, 10000
// after   0.286 seconds with 1000, 10000
function loop_bool_not(int $a, int $b) {
  srand($a);
  $values = [];
  for ($i = 0; $i < $a; $i++) {
    $values[] = (bool)(\rand() & 1);
  }
  $count = [];
  for ($i = 0; $i < $b; $i++) {
    foreach ($values as $v) {
      $count[!$v]++;
    }
  }
  return $count;
}

The true then falsey check was first added in
e6180eb

Miscellaneous thoughts:

  • extremely predictable loops might perform worse due to extra instructions - real use cases are hopefully not like that
  • this should benefit non-bool cases such as return !$int slightly due to one less comparison (SCCP and opcache help remove this opcode for obvious booleans such as (bool)is_string($x))

Before:

00000000006b1600 <ZEND_BOOL_SPEC_CV_HANDLER>:
  6b1600:	53                   	push   rbx
  6b1601:	49 63 7f 08          	movsxd rdi,DWORD PTR [r15+0x8]
  6b1605:	4c 01 f7             	add    rdi,r14
  6b1608:	83 7f 08 03          	cmp    DWORD PTR [rdi+0x8],0x3 -- This loads Z_TYPE_INFO_P the first time
  6b160c:	74 22                	je     6b1630 <ZEND_BOOL_SPEC_CV_HANDLER+0x30> -- this compares == IS_TRUE and jumps
  6b160e:	77 30                	ja     6b1640 <ZEND_BOOL_SPEC_CV_HANDLER+0x40> -- this jumps for > IS_TRUE
  6b1610:	49 63 47 10          	movsxd rax,DWORD PTR [r15+0x10]
  6b1614:	41 c7 44 06 08 02 00 	mov    DWORD PTR [r14+rax*1+0x8],0x2 -- This stores IS_FALSE into the return value
  6b161b:	00 00 
  6b161d:	8b 4f 08             	mov    ecx,DWORD PTR [rdi+0x8] -- This loads Z_TYPE_INFO_P from RAM a second time
  6b1620:	85 c9                	test   ecx,ecx
  6b1622:	74 44                	je     6b1668 <ZEND_BOOL_SPEC_CV_HANDLER+0x68>
  6b1624:	49 83 c7 20          	add    r15,0x20
  6b1628:	5b                   	pop    rbx
  6b1629:	c3                   	ret 
... the rest of the instructions are left out - 

After:

00000000006b1660 <ZEND_BOOL_SPEC_CV_HANDLER>:
  6b1660:	55                   	push   rbp
  6b1661:	53                   	push   rbx
  6b1662:	48 83 ec 08          	sub    rsp,0x8
  6b1666:	49 63 7f 08          	movsxd rdi,DWORD PTR [r15+0x8]
  6b166a:	4c 01 f7             	add    rdi,r14
  6b166d:	8b 47 08             	mov    eax,DWORD PTR [rdi+0x8]
  6b1670:	83 f8 03             	cmp    eax,0x3
  6b1673:	77 2b                	ja     6b16a0 <ZEND_BOOL_SPEC_CV_HANDLER+0x40>  -- this jumps for > IS_BOOL
  6b1675:	85 c0                	test   eax,eax
  6b1677:	74 4f                	je     6b16c8 <ZEND_BOOL_SPEC_CV_HANDLER+0x68> -- this jumps for IS_UNDEF
  6b1679:	49 63 4f 10          	movsxd rcx,DWORD PTR [r15+0x10]
  6b167d:	83 f8 03             	cmp    eax,0x3 -- this computes IS_FALSE or IS_TRUE without branching
  6b1680:	be 02 00 00 00       	mov    esi,0x2
  6b1685:	0f 45 c6             	cmovne eax,esi
  6b1688:	4d 8d 7f 20          	lea    r15,[r15+0x20]
  6b168c:	41 89 44 0e 08       	mov    DWORD PTR [r14+rcx*1+0x8],eax
  6b1691:	48 83 c4 08          	add    rsp,0x8
  6b1695:	5b                   	pop    rbx
  6b1696:	5d                   	pop    rbp
  6b1697:	c3                   	ret    
... the rest of the instructions are left out - 

gcc 5/9 will add 2 (IS_FALSE) to the boolean result of 1/2 to compute
the value to store for the ZVAL_BOOL call.

Additionally, this puts all of the reads of Z_TYPE_INFO_P before writes
to memory (ZVAL_BOOL),
so that php doesn't have to load from memory into a register again.

(gcc 5 would use branches for evaluating ZVAL_BOOL if ZVAL_BOOL was
executed before comparing against IS_UNDEF)

A contrived example heavily using ZVAL_BOOL with poor branch prediction

```php
// With gcc 9, cpufreq 2.5GZ
// before: 0.305 seconds with 1000, 10000
// after   0.286 seconds with 1000, 10000
function loop_bool_not(int $a, int $b) {
  srand($a);
  $values = [];
  for ($i = 0; $i < $a; $i++) {
    $values[] = (bool)(\rand() & 1);
  }
  $count = [];
  for ($i = 0; $i < $b; $i++) {
    foreach ($values as $v) {
      $count[!$v]++;
    }
  }
  return $count;
}
```

The true then falsey check was first added in
e6180eb
@dstogov
Copy link
Member

dstogov commented Nov 15, 2019

@TysonAndre I don't think this micro-optimization makes a lot of sense.

It may improve some special cases in hot loops, but also may lead to degradation, if CPU can't predict two conditional branches before the one that you optimize.

Reloading type from L1 cache is less expensive, than complication of prologue/epilogue.

Anyway, the effect it too small even on synthetic test.

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Nov 15, 2019

If nobody thinks this refactoring makes sense, there's still the below refactoring in BOOL_NOT and BOOL I could make. This avoids the second read from memory into the register of Z_TYPE_INFO_P, because the compiler can't tell that ZVAL_TRUE isn't writing to the same byte as the return value's type.

--- a/Zend/zend_vm_def.h
+++ b/Zend/zend_vm_def.h
@@ -940,8 +940,9 @@ ZEND_VM_COLD_CONST_HANDLER(14, ZEND_BOOL_NOT, CONST|TMPVAR|CV, ANY)
        if (Z_TYPE_INFO_P(val) == IS_TRUE) {
                ZVAL_FALSE(EX_VAR(opline->result.var));
        } else if (EXPECTED(Z_TYPE_INFO_P(val) <= IS_TRUE)) {
+               const uint32_t val_type = Z_TYPE_INFO_P(val);
                ZVAL_TRUE(EX_VAR(opline->result.var));
-               if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(val) == IS_UNDEF)) {
+               if (OP1_TYPE == IS_CV && UNEXPECTED(val_type == IS_UNDEF)) {

@dstogov
Copy link
Member

dstogov commented Nov 15, 2019

The VM optimization become an endless fight with C compilers long time ago :)
I see your point, but I don't think we should go so deep without good results (may be go much deeper and reimplement VM on lower level, like LuaJIT does).
It's a question, what is better reload from L1 or add instruction to load value into register.
In any case, this is not a bottleneck for PHP typical loads: ~70% CPU frontend cycles idle, I suppose, mainly because of poor branch prediction of indirect branches and I-cache misses, because of code bloat and bad code locality (PGO makes ~10% improvement on real-life apps).

TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Nov 15, 2019
And `$x = !$x`

Noticed while working on phpGH-4912

The included test would not emit undefined variable errors in php 8.0
with opcache enabled. The command used:

```
php -d zend_extension=opcache.so --no-php-ini -d error_reporting=E_ALL \
    -d opcache.file_cache= -d opcache.enable_cli=1  test.php
```
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Nov 15, 2019
And `$x = !$x`

Noticed while working on phpGH-4912

The included test would not emit undefined variable errors in php 8.0
with opcache enabled. The command used:

```
php -d zend_extension=opcache.so --no-php-ini -d error_reporting=E_ALL \
    -d opcache.file_cache= -d opcache.enable_cli=1  test.php
```
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Nov 15, 2019
And `$x = !$x`

Noticed while working on phpGH-4912

The included test would not emit undefined variable errors in php 8.0
with opcache enabled. The command used:

```
php -d zend_extension=opcache.so --no-php-ini -d error_reporting=E_ALL \
    -d opcache.file_cache= -d opcache.enable_cli=1  test.php
```
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Nov 15, 2019
And `$x = !$x`

Noticed while working on phpGH-4912

The included test would not emit undefined variable errors in php 8.0
with opcache enabled. The command used:

```
php -d zend_extension=opcache.so --no-php-ini -d error_reporting=E_ALL \
    -d opcache.file_cache= -d opcache.enable_cli=1  test.php
```
@TysonAndre
Copy link
Contributor Author

Closing in favor of #4914 - this gets called far less often than I thought.

@TysonAndre TysonAndre closed this Nov 16, 2019
php-pulls pushed a commit that referenced this pull request Nov 18, 2019
And `$x = !$x`

Noticed while working on GH-4912

The included test would not emit undefined variable errors in php 8.0
with opcache enabled. The command used:

```
php -d zend_extension=opcache.so --no-php-ini -d error_reporting=E_ALL \
    -d opcache.file_cache= -d opcache.enable_cli=1  test.php
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants