Skip to content

Adding a tree transformation phase to uncompyle6

R. Bernstein edited this page Jul 11, 2018 · 20 revisions

Table of Contents

Adding a tree-transformation phase to uncompyle6

The problem

In Issue #178 "Poor if statement formatting" it was noted that

if True:
    pass
elif False:
    pass
elif not True:
    pass

is decompiled from bytecode to

if True:
    pass
else:
    if False:
        pass
    else:
        if not True:
            pass

Thoughts around fixing and a note about AST's.

My thoughts on how to structure a decompiler are in Decompilation at Runtime and the Design of a Decompiler for Dynamic Interpreted Languages and in that you'll see in Figure 1 that there is an optional phase in between tree building and final text generation a box called "Macro expansion" where you can transform the tree according to your liking. Here it would be to transform the else: if into elsif while adjusting the indentation.

Since Alexander Sieusahai has expressed interest in undertaking this. Thanks Alex!

He mentions though "ast", and already there is a possible source of confusion here. First, the tree we produce is "abstract" in the sense that it imposes a structure on top of the instructions. And yes, it definitely is a "tree". But it is not abstract in the other sense that that it is used to mean in that it is compact to the point of having only the essential nodes and removes unnecessary syntax. In the case of the source code that would be semicolons or line feed, or instruction in the instruction stream intact.

Also, it is not the same thing as Python's AST. We do however try to keep the nonterminal or tree node names as similar as possible. So while uncompyle2 may have a tree name like designator, in uncompyle6 it is called store to match the corresponding Python AST name Store. Note that in contrast to AST's CamelCase use lowercase snake_case to remind people that this ain't Python's AST. (Are there any volunteers out there to write a translator to convert uncompyle6's tree into Python's AST?)

Design of a Tree-Transformation Phase

Ok, so with all of that stuff out of the way, let's get to work on this!

We basically want to transform one tree into another. This will mean adding another phase to uncompyle6, but to start out let's just add this to the semantics directory, and perhaps later we'll split that out.

Let's look at the tree with this cool recently-added option --tree+ where I include the templating instructions. The code for above you get:

    ifelsestmt (5): ('%|if %c:\n%+%c%-%|else:\n%+%c%-', 0, 1, 3)
         0. testexpr
            testfalse (2)
                 0. expr
                    L.   1       0  LOAD_NAME             0  'True'
                 1. jmp_false (2)
                     0.              3  JUMP_IF_FALSE         4  'to 10'
                     1.              6  POP_TOP
         1. pass: ('%|pass\n',)
         2. jf_cf_pop (3)
             0. L.   2       7  JUMP_FORWARD         23  'to 33'
             1.           10_0  COME_FROM             3  '3'
             2.             10  POP_TOP
         3. else_suite
            suite_stmts
                stmt
                    ifelsestmt (5): ('%|if %c:\n%+%c%-%|else:\n%+%c%-', 0, 1, 3)
                         0. testexpr
                            testfalse (2)
         4.           33_2  COME_FROM             7  '7'
...

Boiling this down a bit we see the that basic templating rule used is: ifelsestmt: ('%|if %c:\n%+%c%-%|else:\n%+%c%-', 0, 1, 3)

The (5) previously shown after ifelsestmt just mean that there are 5 children, but we really only use 3 of them at index 0, 1, and 3 respectively.

If you look at uncompyle6/semantics/consts.py around line 264 you'll see

    'ifelifstmt':	( '%|if %c:\n%+%c%-%c', 0, 1, 3 ),

so this is what we want to get used instead. So what needs to get done is to write some transformation system might have a rule that says if we see a tree pattern like:

ifelsestmt [testexpr testfalse . else_suite [suite_stmts [stmt [ifelsestmt] . ]]

transform that to:

ifelifstmt [testexpr testfalse . ifelsestmt]

A little about the jargon used above to direct tree transformation. Something like ifelsestmt represents a AST object defined in uncompyle6/parsers/astnode. (And yes, I see that name is poorly named. I will have to change that sometime.) The "kind" field of the object would be change from ifelsestmt to ifelifstmt. The brackets indicate that what is enclosed in the brackets are children of the tree node that precede it. And the dot means that you can match any AST object.

I'm sure code to do this kind of thing has been written many times, especially in the LISP community. For example Emacs Lisp has a pcase pattern matching macro. The think can be done stupidly by iterating over the entire tree for each kind of transform. Or clever though is to iterate once and try the set of patterns on each node.

Other places this can be used

The next most glaring situation is where uncompyle6 produces

if not expr:
   RaiseAssertion ...

instead of:

assert expr

Finally I'll point out that there already is a bit of tree hackery inside uncompyle/semantics/pysource.py to remove an assignment to __doc__ from the tree, (see around line 2121) which has:

   if do_doc and self.hide_internal:

or extraneous return statements from the end of the main program, the end of the module, or the end of some functions; See function is_return_none() in pysource.py.

If we had the better tree transformation phase and the rules in a nice dictionary, all of that code would be much cleaner.

Clone this wiki locally