Skip to content

Operator Precedence

R. Bernstein edited this page Jan 11, 2022 · 5 revisions

What is Operator Precedence?

In Python you can write expressions, sometimes in various ways. Consider:

a*x**2 + b*x + c

This is a concise way of writing:

(a*(x**2)) + (b*x) + c

In the more concise expression, parentheses do not appear. It is understood that when an expression with exponentiation (**) occurs combined with an expression with multiplication (*), do the exponentiation first and then combine with the multiplication. Similarly, in the absense of parentheses dictating otherwise, perform multiplication (*) before addition (+).

Decompilation outputs expressions with parenthesis omitted when they are not necessary. The process by which the semantic actions, or tree phase of the decompiler figures this out is by using a precedence table which specifies which operator to perform when it is appears next to another operator.

Again, operator precedence is used to decide whether or not to add parenthesis around an expression. Note that an operator like multiplication is involved here and in the context of another expression. In the expression:

sin(2 * theta) + pi

the multiplication appears in the expression 2 * theta, but that expression doesn't butt up with another expression. Instead the expression is used as the argument in a function call. The call is used as the first operand of an addition operation, but the important thing is that two expressions with operators don't appear next to each other as was the case in a*x**2 + b*x + c.

How does operator precedence work?

To recap, in a*x**2 the expression with a multiplication operator appears next to an expression with an exponentiation operator. How can we specify whether this means (a*x)**2 or a*(x**2) to someone who does not know this?

In Python's the reference manual operators are separated into a hierarchy. Here is an excerpt of a much larger table:

operator Notes
.. . ...
** Exponentiation
+x, -x, ~x Positive, negative, bitwise NOT
*, @, /, //, % Multiplication, matrix multiplication, division, floor division, remainder
... ...

In this table, we see that ** appears in a table entry line before *. And that means to put the expression with ** in a group before grouping an expression with * when the two expressions are adjacent. Using the table above, grouping behavior would be identical if we had written a@x**2, a/x**2, etc.

The way this is specified in the decompiler though is to give for each operator a numeric value. Here is the corresponding code in consts.py that captures the above:

PRECEDENCE = {
    # ...
    "BINARY_POWER":           4,  # Exponentiation: **
    "unary_op":               6,  # Positive, negative, bitwise NOT: +x, -x, ~x
    "BINARY_MULTIPLY":        8,  # *
    "BINARY_MATRIX_MULTIPLY": 8,  # @
    "BINARY_DIVIDE":          8,  # /
    "BINARY_FLOOR_DIVIDE":    8,  # //
    "BINARY_MODULO":          8,  # Remainder, %
	# ...
	}

In the actual dictionary in the Python code, we list things from high number (8) to low (4) rather than as shown above. I ordered things this way just to make the correspondences with the Python reference manual more clear.

The thing to note is that lower numbers, like 4 group first, and higher numbers like 8, group last.

Examples

Above, we described how we see things from the human side, and how to specify precedence in a way the decompiler can act on. But from the standpoint of the decompiler, how does this all work?

We will describe this in terms of some examples before summarizing with the general process. Using the example a*x**2, The decompiler in the semantic or tree phase sees:

 1  expr
 2  bin_op (BINARY_MULTIPLY, precedence 8)
 3       0. expr
 4            L.   1        0  LOAD_NAME                a
 5       1. expr
 6          bin_op (BINARY_POWER, precedence 4)
 7               0. expr
 8                          2  LOAD_NAME                x
 9               1. expr
10                   constant
11                          4  LOAD_CONST               2
12               2. binary_operator
13                          6  BINARY_POWER
14       2. binary_operator
15                          8  BINARY_MULTIPLY
...

As we traverse from the top-level tree for expr at line 1 down, four more exprs are encountered below it. The expr at line 3 we don't have to worry about since that covers LOAD_NAME at line 4 and LOAD_NAME is not an operator. However, the tree node for the expr at line 5 we do have to take into account operator precedence.

At the expr tree node at line 5, the precedence value set from line 2 in BINARY_MULTIPLY is passed down. The expr first child in line 6 is bin_op, an operator which has a precedence of 4. Since parent precedence value 8 is greater than child precedence value 4, the expression with the parent should group first. When the parent groups first, we don't have to add parenthesis.

The other exprs under the second bin_op we don't have to worry about either since, again, those involve LOAD_NAME or LOAD_CONST which are not operators.

Now let's compare that with (a*x)**2:

The decompiler in the semantic or tree phase sees:

 1 expr
 2  bin_op (BINARY_POWER, precedence 4)
 3       0. expr
 4          bin_op (BINARY_MULTIPLY, precedence 8)
 5               0. expr
 6                   L.   1         0  LOAD_NAME                a
 7               1. expr
 8                                  2  LOAD_NAME                x
 9               2. binary_operator
10                                  4  BINARY_MULTIPLY
11       1. expr
12           constant
13                                  6  LOAD_CONST               2
14       2. binary_operator
15                                  8  BINARY_POWER

Again, as we descend the top-level tree for expr at line 1, our more exprs are seen: the first expr at line 3 involving another bin_op at line 4. Passed down to the expr at line 3 is the precedence BINARY_POWER with value 4. But in contrast to its bin_op parent at line 2, the bin_op child at line 4 has a precedence value 8. Since the parent value 4 is less than the child value 8, the child expression needs to group first and that is done by adding a parentheses in the handling of the expr at line 3.

In general, in the function that handles expressions, n_expr, a precedence value is passed to it via self.prec, That is compared to the expression's single child, if that happens to be an operator with a precedence value. If the precedence value of the child is less than the precedence value of the parent, then a set parenthesis needs to be added.

%p and %P specifiers

to be continued...

Clone this wiki locally