-
-
Notifications
You must be signed in to change notification settings - Fork 434
Operator Precedence
In Python you can write expressions, sometimes in various ways. Consider:
a*x**2 + b*x + c
This is a concise way of writing:
(a*(x**2)) + (b*x) + c
In the more concise expression, parentheses do not appear. It is understood that when an expression with exponentiation (**
) occurs combined with an expression with multiplication (*
), do the exponentiation first and then combine with the multiplication. Similarly, in the absense of parentheses dictating otherwise, perform multiplication (*
) before addition (+
).
Decompilation outputs expressions with parenthesis omitted when they are not necessary. The process by which the semantic actions, or tree phase of the decompiler figures this out is by using a precedence table which specifies which operator to perform when it is appears next to another operator.
Again, operator precedence is used to decide whether or not to add parenthesis around an expression. Note that an operator like multiplication is involved here and in the context of another expression. In the expression:
sin(2 * theta) + pi
the multiplication appears in the expression 2 * theta
, but that expression doesn't butt up with another expression. Instead the expression is used as the argument in a function call. The call is used as the first operand of an addition operation, but the important thing is that two expressions with operators don't appear next to each other as was the case in a*x**2 + b*x + c
.
To recap, in a*x**2
the expression with a multiplication operator appears next to an expression with an exponentiation operator. How can we specify whether this means (a*x)**2
or a*(x**2)
to someone who does not know this?
In Python's the reference manual operators are separated into a hierarchy. Here is an excerpt of a much larger table:
operator | Notes |
---|---|
.. . | ... |
** |
Exponentiation |
+x , -x , ~x
|
Positive, negative, bitwise NOT |
* , @ , / , // , %
|
Multiplication, matrix multiplication, division, floor division, remainder |
... | ... |
In this table, we see that **
appears in a table entry line before *
. And that means to put the expression with **
in a group before grouping an expression with *
when the two expressions are adjacent.
Using the table above, grouping behavior would be identical if we had written a@x**2
, a/x**2
, etc.
The way this is specified in the decompiler though is to give for each operator a numeric value. Here is the corresponding code in consts.py
that captures the above:
PRECEDENCE = {
# ...
"BINARY_POWER": 4, # Exponentiation: **
"unary_op": 6, # Positive, negative, bitwise NOT: +x, -x, ~x
"BINARY_MULTIPLY": 8, # *
"BINARY_MATRIX_MULTIPLY": 8, # @
"BINARY_DIVIDE": 8, # /
"BINARY_FLOOR_DIVIDE": 8, # //
"BINARY_MODULO": 8, # Remainder, %
# ...
}
In the actual dictionary in the Python code, we list things from high number (8) to low (4) rather than as shown above. I ordered things this way just to make the correspondences with the Python reference manual more clear.
The thing to note is that lower numbers, like 4 group first, and higher numbers like 8, group last.
Above, we described how we see things from the human side, and how to specify precedence in a way the decompiler can act on. But from the standpoint of the decompiler, how does this all work?
We will describe this in terms of some examples before summarizing with the general process. Using the example a*x**2
, The decompiler in the semantic or tree phase sees:
1 expr
2 bin_op (BINARY_MULTIPLY, precedence 8)
3 0. expr
4 L. 1 0 LOAD_NAME a
5 1. expr
6 bin_op (BINARY_POWER, precedence 4)
7 0. expr
8 2 LOAD_NAME x
9 1. expr
10 constant
11 4 LOAD_CONST 2
12 2. binary_operator
13 6 BINARY_POWER
14 2. binary_operator
15 8 BINARY_MULTIPLY
...
As we traverse from the top-level tree for expr
at line 1 down, four more expr
s are encountered below it. The expr
at line 3 we don't have to worry about since that covers LOAD_NAME
at line 4 and LOAD_NAME
is not an operator. However, the tree node for the expr
at line 5 we do have to take into account operator precedence.
At the expr
tree node at line 5, the precedence value set from line 2 in BINARY_MULTIPLY
is passed down. The expr
first child in line 6 is bin_op
, an operator which has a precedence of 4. Since parent precedence value 8 is greater than child precedence value 4, the expression with the parent should group first. When the parent groups first, we don't have to add parenthesis.
The other expr
s under the second bin_op
we don't have to worry about either since, again, those involve LOAD_NAME
or LOAD_CONST
which are not operators.
Now let's compare that with (a*x)**2
:
The decompiler in the semantic or tree phase sees:
1 expr
2 bin_op (BINARY_POWER, precedence 4)
3 0. expr
4 bin_op (BINARY_MULTIPLY, precedence 8)
5 0. expr
6 L. 1 0 LOAD_NAME a
7 1. expr
8 2 LOAD_NAME x
9 2. binary_operator
10 4 BINARY_MULTIPLY
11 1. expr
12 constant
13 6 LOAD_CONST 2
14 2. binary_operator
15 8 BINARY_POWER
Again, as we descend the top-level tree for expr
at line 1, our more expr
s are seen: the first expr
at line 3 involving another bin_op
at line 4.
Passed down to the expr
at line 3 is the precedence BINARY_POWER
with value 4. But in contrast to its bin_op
parent at line 2, the bin_op
child at line 4 has a precedence value 8. Since the parent value 4 is less than the child value 8, the child expression needs to group first and that is done by adding a parentheses in the handling of the expr
at line 3.
In general, in the function that handles expressions, n_expr
, a precedence value is passed to it via self.prec
, That is compared to the expression's single child, if that happens to be an operator with a precedence value. If the precedence value of the child is less than the precedence value of the parent, then a set parenthesis needs to be added.
to be continued...