Implement foundation for detecting partially defined vars #13601

ilinum · 2022-09-03T23:13:08Z

Description

This diff lays the foundation for detecting partially defined variables. Think of the following situation:

if foo():
    x = 1
print(x)  # Error: "x" may be undefined.

With this diff, mypy reports an error in such a case.

Note that this diff is not complete. It still generates a lot of false positives. Running it on mypy itself generated 182 errors.
Therefore, this feature is hidden behind a flag and I will implement it in multiple PRs.
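
For readers who want to see the idea in miniature, here is a rough sketch built on the standard `ast` module. It is not the visitor added in this PR. The function name `find_possibly_undefined` and the two-set bookkeeping (`definite` / `maybe`) are assumptions made for illustration, and it only understands simple assignments and `if`/`else`.

```python
import ast


def find_possibly_undefined(source: str) -> list[tuple[str, int]]:
    """Return (name, line) for reads of names bound on some paths but not all.

    Illustrative only: names that are never assigned in the snippet
    (builtins, undefined names) are ignored and left to other checks.
    """
    errors: list[tuple[str, int]] = []

    def check_reads(node: ast.AST, definite: set[str], maybe: set[str]) -> None:
        for sub in ast.walk(node):
            if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Load):
                if sub.id in maybe and sub.id not in definite:
                    errors.append((sub.id, sub.lineno))

    def visit_block(
        stmts: list[ast.stmt], definite: set[str], maybe: set[str]
    ) -> tuple[set[str], set[str]]:
        for stmt in stmts:
            if isinstance(stmt, ast.If):
                check_reads(stmt.test, definite, maybe)
                body_def, body_maybe = visit_block(stmt.body, set(definite), set(maybe))
                else_def, else_maybe = visit_block(stmt.orelse, set(definite), set(maybe))
                # Defined afterwards only if both branches define it.
                definite = body_def & else_def
                maybe = body_maybe | else_maybe
            elif isinstance(stmt, ast.Assign):
                check_reads(stmt.value, definite, maybe)
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        definite = definite | {target.id}
                        maybe = maybe | {target.id}
            else:
                check_reads(stmt, definite, maybe)
        return definite, maybe

    visit_block(ast.parse(source).body, set(), set())
    return errors
```

Calling it on the three-line snippet from the description flags the read of `x` on line 3, and stops flagging it once an `else` branch also assigns `x`.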

Test Plan

Added tests.

ilinum · 2022-09-03T23:13:52Z

mypy/build.py

@@ -2340,6 +2342,11 @@ def finish_passes(self) -> None:
        manager = self.manager
        if self.options.semantic_analysis_only:
            return
+        if manager.options.disallow_undefined_vars:
+            manager.errors.set_file(self.xpath, self.tree.fullname, options=manager.options)
+            self.tree.accept(


I'm not sure if this was the right place to plug this in. It needed to be done before self.free_state() is called.

I think that you could add another method, such as detect_partially_undefined_vars (feel free to alter the name) that performs the analysis if enabled, and call it from process_stale_scc after calling type_check_second_pass and before calling finish_passes. I.e., this would be an additional pass after type checking.
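
A toy illustration of the ordering suggested here, in case it helps. `ToyState` is a hypothetical stand-in and not mypy's real build state class; only the method names are borrowed from this thread, and the bodies just record the call order.

```python
class ToyState:
    """Hypothetical stand-in for a per-module build state (not mypy's class)."""

    def __init__(self, check_enabled: bool) -> None:
        self.check_enabled = check_enabled
        self.log: list[str] = []

    def type_check_second_pass(self) -> None:
        self.log.append("type_check_second_pass")

    def detect_partially_undefined_vars(self) -> None:
        # Extra pass after type checking; a no-op unless the check is enabled.
        if self.check_enabled:
            self.log.append("detect_partially_undefined_vars")

    def finish_passes(self) -> None:
        # Stand-in for the step after which per-module state may be freed.
        self.log.append("finish_passes")


def process_stale_scc(states: list[ToyState]) -> None:
    """Run the new pass after type checking and before finishing."""
    for state in states:
        state.type_check_second_pass()
    for state in states:
        state.detect_partially_undefined_vars()
    for state in states:
        state.finish_passes()
```

The point of the ordering is that the analysis still sees the tree and type information, since nothing has been released yet when it runs.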

ilinum · 2022-09-03T23:17:15Z

mypy/undefined_vars.py

+    """DefinedVariableTracker manages the state and scope for the UndefinedVariablesVisitor."""
+
+    def __init__(self) -> None:
+        # todo(stas): we should initialize this with some variables.


I'd like to evaluate imports in some way in order to avoid false positives. Currently, this won't work:

from foo import x
if bool():
    x = 10
y = x  # This will incorrectly generate an error.

Is there a good way of doing this? Anywhere I should start looking?

Can you treat from foo import x similarly to the assignment x = ... (the value of x doesn't make a difference to the analysis)?

That would work some of the time. However, this doesn't work for * imports (i.e. from foo import *). After semantic analyzer, mypy should know which symbols are being imported, so it should be possible to get this info somehow.
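
To make the "treat an import like an assignment" idea concrete, here is a small sketch using the standard `ast` module. The helper name `names_bound_by` is made up for illustration. Star imports are deliberately left unresolved, since the imported names are only known once the target module has been analyzed.

```python
import ast


def names_bound_by(stmt: ast.stmt) -> set[str]:
    """Names that one import or simple assignment binds in the current scope."""
    bound: set[str] = set()
    if isinstance(stmt, (ast.Import, ast.ImportFrom)):
        for alias in stmt.names:
            if alias.name == "*":
                # The bound names depend on the imported module; this sketch
                # cannot know them, so it binds nothing here.
                continue
            # "import a.b" binds "a"; an "as" clause binds the alias instead.
            bound.add(alias.asname or alias.name.split(".")[0])
    elif isinstance(stmt, ast.Assign):
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                bound.add(target.id)
    return bound
```

For example, `names_bound_by(ast.parse("from foo import x").body[0])` yields the same set as it does for `x = 10`, which is all the analysis needs.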

cdce8p

I'm not sure if it makes sense for this check to be part of mypy directly as it isn't related to typing itself. E.g. I know that pylint does implement something similar. From my experience there, this is one of the most common sources for false-positives.

Some examples I found while testing the changes here

from typing import Any

def f1(val: Any) -> None:
    for x in (1, 2, 3):
        if x == val:
            x = 1
        print(x)  # false-positive


def f2() -> None:
    d: dict[str, Any] = {}
    if (x := d.get("val")) is None:
        x = 2
    print(x)  # false-positive


def f3(val: Any) -> None:
    x = 1
    if val:
        if val == 2:
            x = 2
        print(x)  # false-positive

cdce8p · 2022-09-04T10:47:56Z

mypy/main.py

+    # Experiment flag to detect undefined variables being used.
+    add_invertible_flag("--disallow-undefined-vars", default=False, help=argparse.SUPPRESS)


An alternative would be an error code which is disabled by default. Take a look at mypy/errorcodes.py and TRUTHY_BOOL as example.

That is interesting! I don't have a strong opinion here. I think we'll want an error code eventually for type: ignore[partially-defined]. I'm guessing that if we have an error code, we shouldn't have a separate flag? So we should probably just make it an error code now. Is my understanding correct here?

> I think we'll want an error code eventually for type: ignore[partially-defined]

IMO it would definitely make sense for it to have a separate error-code. Reusing misc doesn't feel right.

> I'm guessing that if we have an error code, we shouldn't have a separate flag? So we should probably just make it an error code now. Is my understanding correct here?

Some error codes might also have flags but my understanding is that these are all legacy cases. The idea is to move more things to error-codes in the future.

I agree that it would be better to have a dedicated error code for this (that is not enabled by default) and not have a dedicated flag. We'd only run the analysis if the error code is enabled.

These error codes are nice :) Done!
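
A sketch of the "disabled by default, run the analysis only when enabled" arrangement discussed above. `ToyErrorCode`, `is_enabled`, and `run_checks` are hypothetical names for illustration and not mypy's API. The code string `partially-defined` is taken from the `type: ignore[partially-defined]` example earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyErrorCode:
    """Hypothetical error code record (mypy's own live in mypy/errorcodes.py)."""

    code: str
    default_enabled: bool = True


# Assumed name, based on the type: ignore example in this thread.
PARTIALLY_DEFINED = ToyErrorCode("partially-defined", default_enabled=False)


def is_enabled(code: ToyErrorCode, enabled: set[str], disabled: set[str]) -> bool:
    """Explicit user choices override the code's default."""
    if code.code in disabled:
        return False
    if code.code in enabled:
        return True
    return code.default_enabled


def run_checks(enabled: set[str], disabled: set[str]) -> list[str]:
    """Return the passes that would run; the extra pass is skipped when off."""
    passes = ["type_check"]
    if is_enabled(PARTIALLY_DEFINED, enabled, disabled):
        passes.append("partially_defined_check")
    return passes
```

With no user configuration the extra pass is skipped entirely, so users who don't opt in pay no cost.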

ilinum · 2022-09-04T18:11:13Z

Thank you for taking a look!

> I'm not sure if it makes sense for this check to be part of mypy directly as it isn't related to typing itself.

I guess it depends on what you think mypy is. If you think of mypy as a type checker, then yes, this doesn't fit. If you think of mypy as a static analysis tool, then this feature does fit. Note that mypy has similar features already (stuff like --warn-unreachable). I am wondering: perhaps this feature is better implemented as a plugin?

> ... I know that pylint does implement something similar. this is one of the most common sources for false-positives.

The feature we have today is incomplete (e.g. it doesn't support for loops). Therefore, with the current implementation, there will be lots of false positives. The first two you pointed out are just from unsupported language features. The last one was a bug (which is now fixed).

I would love to hear what pylint finds as false positives right now. My assertion is that once this feature is fully implemented over multiple PRs there should be relatively few false positives. However, there may be things that I am missing that make this inherently impossible to do -- I would love to hear about this now and not when I've spent more time working on this :)

cdce8p · 2022-09-05T10:51:15Z

> > I'm not sure if it makes sense for this check to be part of mypy directly as it isn't related to typing itself.
>
> I guess it depends on what you think mypy is. If you think of mypy as a type checker, then yes, this doesn't fit. If you think of mypy as a static analysis tool, then this feature does fit. Note that mypy has similar features already (stuff like --warn-unreachable). I am wondering: perhaps this feature is better implemented as a plugin?

True, the unreachable check is fairly similar to this one, even with the number of potential issues it has (see the topic-reachability label, "Detecting unreachable code").
Another argument would be that pyright has something similar with possibly unbound.

> > ... I know that pylint does implement something similar. this is one of the most common sources for false-positives.
>
> The feature we have today is incomplete (e.g. it doesn't support for loops). Therefore, with the current implementation, there will be lots of false positives. The first two you pointed out are just from unsupported language features. The last one was a bug (which is now fixed).
>
> I would love to hear what pylint finds as false positives right now. My assertion is that once this feature is fully implemented over multiple PRs there should be relatively few false positives. However, there may be things that I am missing that make this inherently impossible to do -- I would love to hear about this now and not when I've spent more time working on this :)

I don't think it's impossible, it'll just require constant work. There will always be some obscure edge case that wasn't considered and of course new language features which are added over time won't make it any easier.

One of the most common issues for pylint AFAIK, and any other static code analyzer for that matter, is understanding control flow. At some point the added complexity just isn't worth the benefit and you just accept a false-positive instead.

> My assertion is that once this feature is fully implemented over multiple PRs there should be relatively few false positives.

I would recommend fixing the known false-positives as early as possible. The bug reports will add up quickly anyway.

Some more false-positives I found

from typing import Any

def f4(val: Any) -> None:
    if val == 1:
        x = 1
    else:
        return None

    print(x)  # false-positive


def f5(val: Any) -> None:
    if val == 1:
        x = 1
    elif val == 2:
        x = 2
    else:
        return None

    print(x)  # false-positive


def f6(val: list[int]) -> None:
    for i in val:
        x = 1
        if i == 2:
            break
    else:
        return None

    print(x)  # false-positive


def f7(val: Any) -> None:
    def x() -> None: ...
    if val:
        x = x
    print(x)  # false-positive


def f8(val: Any) -> None:
    if val == 1:
        x = 1
    else:
        assert False

    print(x)  # false-positive


def f9(val: Any) -> None:
    while True:
        if val == 1:
            x = 1
        elif val == 2:
            continue
        else:
            break
        print(x)  # false-positive

JukkaL

This is a good start! I looked at the false positives from self check, and many of them would be fixed by adding support for return statements.

JukkaL · 2022-09-05T15:01:09Z

mypy/main.py

+    # Experiment flag to detect undefined variables being used.
+    add_invertible_flag("--disallow-undefined-vars", default=False, help=argparse.SUPPRESS)


I agree that it would be better to have a dedicated error code for this (that is not enabled by default) and not have a dedicated flag. We'd only run the analysis if the error code is enabled.

JukkaL · 2022-09-05T15:15:17Z

mypy/build.py

@@ -2340,6 +2342,11 @@ def finish_passes(self) -> None:
        manager = self.manager
        if self.options.semantic_analysis_only:
            return
+        if manager.options.disallow_undefined_vars:
+            manager.errors.set_file(self.xpath, self.tree.fullname, options=manager.options)
+            self.tree.accept(


I think that you could add another method, such as detect_partially_undefined_vars (feel free to alter the name) that performs the analysis if enabled, and call it from process_stale_scc after calling type_check_second_pass and before calling finish_passes. I.e., this would be an additional pass after type checking.

JukkaL · 2022-09-05T15:19:18Z

mypy/undefined_vars.py

+    """DefinedVariableTracker manages the state and scope for the UndefinedVariablesVisitor."""
+
+    def __init__(self) -> None:
+        # todo(stas): we should initialize this with some variables.


Can you treat from foo import x similarly to the assignment x = ... (the value of x doesn't make a difference to the analysis)?

ilinum · 2022-09-06T17:56:03Z

mypy/server/update.py

@@ -651,6 +651,7 @@ def restore(ids: list[str]) -> None:
    state.type_checker().reset()
    state.type_check_first_pass()
    state.type_check_second_pass()
+    state.detect_partially_defined_vars()


I think this is necessary for the mypy daemon (dmypy)?

ilinum · 2022-09-06T17:59:19Z

> I don't think it's impossible, it'll just require constant work. There will always be some obscure edge case that wasn't considered and of course new language features which are added over time won't make it any easier.
>
> One of the most common issues for pylint AFAIK, and any other static code analyzer for that matter, is understanding control flow. At some point the added complexity just isn't worth the benefit and you just accept a false-positive instead.

I don't have that much experience with mypy and it's hard for me to judge whether this feature is worth the extra mypy work required. Perhaps, @JukkaL can chime in.

ilinum · 2022-09-06T18:05:05Z

I'm happy to either keep working on this branch or merge this PR and keep working in follow-up PRs.

It probably just depends on how hard it is to review :) I would expect the code will grow quite a bit as we add more support for more features.

Personally, I don't really care; just let me know which one you prefer.

github-actions · 2022-09-06T18:51:45Z

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

JukkaL · 2022-09-07T17:33:43Z

> I don't have that much experience with mypy and it's hard for me to judge whether this feature is worth the extra mypy work required. Perhaps, @JukkaL can chime in.

We can decide how far we are willing to go in terms of complexity once we have a better understanding of the frequencies and kinds of false positives (after the implementation can handle the easy cases). There will certainly be some false positives, but it would be nice if the rate could be low enough that we can enable this by default. It's too early to tell yet, however.

JukkaL

Looks good! It makes sense to merge this and implement additional functionality as follow-up PRs.

There's still a conflict that needs to be fixed.

JukkaL · 2022-09-07T17:37:30Z

mypy/server/update.py

@@ -651,6 +651,7 @@ def restore(ids: list[str]) -> None:
    state.type_checker().reset()
    state.type_check_first_pass()
    state.type_check_second_pass()
+    state.detect_partially_defined_vars()


JukkaL · 2022-09-07T17:41:25Z

> There's still a conflict that needs to be fixed.

Actually maybe it's enough to mark this as "Ready for review" and merge master.

This builds on #13601 to add support for statements like `continue`, `break`, `return`, `raise` in the partially defined variables check. The simplest example is:

```python
def f1() -> int:
    if int():
        x = 1
    else:
        return 0
    return x
```

Previously, mypy would generate a false positive on the last line of the example. See test cases for more details.

Adding this support was relatively simple, given all the already existing code.

Things that aren't supported yet: `match`, `with`, and detecting unreachable blocks.

After this PR, when enabling this check on mypy itself, it generates 18 errors, all of which are potential bugs.
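
One way to picture the jump-statement handling described above is the sketch below. It is an illustrative toy on top of the standard `ast` module, not the code from the follow-up PR, and `names_after` is a made-up helper. It covers only `return` and `raise`. `break` and `continue` need loop-aware handling that is not shown.

```python
import ast
from typing import Optional

# Statements after which control never reaches the end of the block.
# break/continue are left out on purpose: they need loop-aware handling.
JUMPS = (ast.Return, ast.Raise)


def names_after(
    stmts: list[ast.stmt], defined: frozenset[str]
) -> Optional[frozenset[str]]:
    """Names bound on every path that falls through the block.

    Returns None when no path falls through (the block always jumps).
    """
    for stmt in stmts:
        if isinstance(stmt, JUMPS):
            return None
        if isinstance(stmt, ast.Assign):
            defined = defined | {
                t.id for t in stmt.targets if isinstance(t, ast.Name)
            }
        elif isinstance(stmt, ast.If):
            body = names_after(stmt.body, defined)
            orelse = names_after(stmt.orelse, defined)
            if body is None and orelse is None:
                return None
            if body is None:
                # Only the else branch reaches the code after the if.
                defined = orelse
            elif orelse is None:
                defined = body
            else:
                defined = body & orelse
    return defined
```

Applied to the statements before `return x` in `f1`, the `else` branch contributes nothing because it never falls through, so `x` counts as defined. Without the `return 0`, the two branches are intersected and `x` drops out.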

ilinum requested a review from JukkaL September 3, 2022 23:13

Implement foundation for detecting partially defined vars

5f1c4d7

ilinum force-pushed the basic-detect-undefined-vars branch from ae01a05 to 5f1c4d7 Compare September 3, 2022 23:15

ilinum commented Sep 3, 2022

cdce8p reviewed Sep 4, 2022

implement scope inheritance

177c015

ilinum added 2 commits September 4, 2022 12:13

fix scope inheritance for non-primary branches

2b64875

implement for loop

bca8133

JukkaL reviewed Sep 5, 2022

ilinum added 2 commits September 6, 2022 10:53

address code review comments

1c49f68

Merge branch 'master' into basic-detect-undefined-vars

8b86a53

ilinum commented Sep 6, 2022

fix mypy error

8f07a4c

ilinum added 4 commits September 6, 2022 11:07

remove extra newline

3aeb8be

remove comment + rename test file

dc48a6e

rename file again

02ba3ea

use error code when reporting

5c564ba

JukkaL approved these changes Sep 7, 2022

ilinum marked this pull request as ready for review September 7, 2022 18:05

JukkaL merged commit 82a97f7 into python:master Sep 8, 2022

ilinum deleted the basic-detect-undefined-vars branch September 8, 2022 16:21

ilinum mentioned this pull request Sep 8, 2022

Add support for jump statements in partially defined vars check #13632

Merged

dosisod mentioned this pull request Oct 6, 2022

Ensure variable is bound in FURB127 dosisod/refurb#47

Open
