Skip to content

Implement foundation for detecting partially defined vars #13601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 8, 2022

Conversation

ilinum
Copy link
Collaborator

@ilinum ilinum commented Sep 3, 2022

Description

This diff lays the foundation for detecting partially defined variables. Think of the following situation:

if foo():
    x = 1
print(x)  # Error: "x" may be undefined.

Now, mypy will generate the error in such a case.

Note that this diff is not complete. It still generates a lot of false positives. Running it on mypy itself generated 182 errors.
Therefore, this feature is hidden behind a flag and I will implement it in multiple PRs.

Test Plan

Added tests.

@ilinum ilinum requested a review from JukkaL September 3, 2022 23:13
@ilinum ilinum force-pushed the basic-detect-undefined-vars branch from ae01a05 to 5f1c4d7 Compare September 3, 2022 23:15
@@ -2340,6 +2342,11 @@ def finish_passes(self) -> None:
manager = self.manager
if self.options.semantic_analysis_only:
return
if manager.options.disallow_undefined_vars:
manager.errors.set_file(self.xpath, self.tree.fullname, options=manager.options)
self.tree.accept(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this was the right place to plug this in. It needed to be done before self.free_state() is called.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you could add another method, such as detect_partially_undefined_vars (feel free to alter the name) that performs the analysis if enabled, and call it from process_stale_scc after calling type_check_second_pass and before calling finish_passes. I.e., this would be an additional pass after type checking.

"""DefinedVariableTracker manages the state and scope for the UndefinedVariablesVisitor."""

def __init__(self) -> None:
# todo(stas): we should initialize this with some variables.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to evaluate imports in some way in order to avoid false positives. Currently, this won't work:

from foo import x

if bool():
  x = 10
y = x  # This will incorrectly generate an error.

Is there a good way of doing this? Anywhere I should start looking?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you treat from foo import x similarly to the assignment x = ... (the value of x doesn't make a difference to the analysis)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work some of the time. However, this doesn't work for * imports (i.e. from foo import *). After semantic analyzer, mypy should know which symbols are being imported, so it should be possible to get this info somehow.

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

Copy link
Collaborator

@cdce8p cdce8p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it makes sense for this check to be part of mypy directly as it isn't related to typing itself. E.g. I know that pylint does implement something similar. From my experience there, this is one of the most common sources for false-positives.

Some examples I found while testing the changes here

from typing import Any

def f1(val: Any) -> None:
    for x in (1, 2, 3):
        if x == val:
            x = 1
        print(x)  # false-positive


def f2() -> None:
    d: dict[str, Any] = {}
    if (x := d.get("val")) is None:
        x = 2
    print(x)  # false-positive


def f3(val: Any) -> None:
    x = 1
    if val:
        if val == 2:
            x = 2
        print(x)  # false-positive

mypy/main.py Outdated
Comment on lines 826 to 827
# Experiment flag to detect undefined variables being used.
add_invertible_flag("--disallow-undefined-vars", default=False, help=argparse.SUPPRESS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be an error code which is disabled by default. Take a look at mypy/errorcodes.py and TRUTHY_BOOL as example.

Copy link
Collaborator Author

@ilinum ilinum Sep 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is interesting! I don't have a strong opinion here. I think we'll want an error code eventually for type: ignore[partially-defined]. I'm guessing that if we have an error code, we shouldn't have a separate flag? So we should probably just make it an error code now. Is my understanding correct here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want an error code eventually for type: ignore[partially-defined]

IMO it would definitely make sense for it to have a separate error-code. Reusing misc doesn't feel right.

I'm guessing that if we have an error code, we shouldn't have a separate flag? So we should probably just make it an error code now. Is my understanding correct here?

Some error codes might also have flags but my understanding is that these are all legacy cases. The idea is to move more things to error-codes in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be better to have a dedicated error code for this (that is not enabled by default) and not have a dedicated flag. We'd only run the analysis if the error code is enabled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These error codes are nice :) Done!

@ilinum
Copy link
Collaborator Author

ilinum commented Sep 4, 2022

Thank you for taking a look!

I'm not sure if it makes sense for this check to be part of mypy directly as it isn't related to typing itself.

I guess it depends on what you think mypy is. If you think of mypy as a type checker, then yes, this doesn't fit. If you think of mypy as a static analysis tool, then this feature does fit. Note that mypy has similar features already (stuff like --warn-unreachable). I am wondering: perhaps this feature is better implemented as a plugin?

... I know that pylint does implement something similar. this is one of the most common sources for false-positives.

The feature we have today is incomplete (e.g. it doesn't support for loops). Therefore, with the current implementation, there will be lots of false positives. The first two you pointed out are just from unsupported language features. The last one was a bug (which is now fixed).

I would love to hear what pylint finds as false positives right now. My assertion is that once this feature is fully implemented over multiple PRs there should be relatively few false positives. However, there may be things that I am missing that make this inherently impossible to do -- I would love to hear about this now and not when I've spent more time working on this :)

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@cdce8p
Copy link
Collaborator

cdce8p commented Sep 5, 2022

I'm not sure if it makes sense for this check to be part of mypy directly as it isn't related to typing itself.

I guess it depends on what you think mypy is. If you think of mypy as a type checker, then yes, this doesn't fit. If you think of mypy as a static analysis tool, then this feature does fit. Note that mypy has similar features already (stuff like --warn-unreachable). I am wondering: perhaps this feature is better implemented as a plugin?

True, the unreachable check is fairly similar to this one. Even with the number of potential issues topic-reachability Detecting unreachable code
Another argument would be that pyright has something similar with possibly unbound.

... I know that pylint does implement something similar. this is one of the most common sources for false-positives.

The feature we have today is incomplete (e.g. it doesn't support for loops). Therefore, with the current implementation, there will be lots of false positives. The first two you pointed out are just from unsupported language features. The last one was a bug (which is now fixed).

I would love to hear what pylint finds as false positives right now. My assertion is that once this feature is fully implemented over multiple PRs there should be relatively few false positives. However, there may be things that I am missing that make this inherently impossible to do -- I would love to hear about this now and not when I've spent more time working on this :)

I don't think it's impossible, it'll just require constant work. There will always be some obscure edge case that wasn't considered and of course new language features which are added over time won't make it any easier.

One of the most common issues for pylint AFAIK, and any other static code analyzer for that matter, is understanding control flow. At some point the added complexity just isn't worth the benefit and you just accept a false-positive instead.

My assertion is that once this feature is fully implemented over multiple PRs there should be relatively few false positives.

I would recommend fixing the known false-positives as early as possible. The bug reports will add up quickly anyway.

Some more false-positives I found
from typing import Any

def f4(val: Any) -> None:
    if val == 1:
        x = 1
    else:
        return None

    print(x)  # false-positive


def f5(val: Any) -> None:
    if val == 1:
        x = 1
    elif val == 2:
        x = 2
    else:
        return None

    print(x)  # false-positive


def f6(val: list[int]) -> None:
    for i in val:
        x = 1
        if i == 2:
            break
    else:
        return None

    print(x)  # false-positive


def f7(val: Any) -> None:
    def x() -> None: ...
    if val:
        x = x
    print(x)  # false-positive


def f8(val: Any) -> None:
    if val == 1:
        x = 1
    else:
        assert False

    print(x)  # false-positive


def f9(val: Any) -> None:
    while True:
        if val == 1:
            x = 1
        elif val == 2:
            continue
        else:
            break
        print(x)  # false-positive

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start! I looked at the false positives from self check, and many of them would be fixed by adding support for return statements.

mypy/main.py Outdated
Comment on lines 826 to 827
# Experiment flag to detect undefined variables being used.
add_invertible_flag("--disallow-undefined-vars", default=False, help=argparse.SUPPRESS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be better to have a dedicated error code for this (that is not enabled by default) and not have a dedicated flag. We'd only run the analysis if the error code is enabled.

@@ -2340,6 +2342,11 @@ def finish_passes(self) -> None:
manager = self.manager
if self.options.semantic_analysis_only:
return
if manager.options.disallow_undefined_vars:
manager.errors.set_file(self.xpath, self.tree.fullname, options=manager.options)
self.tree.accept(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you could add another method, such as detect_partially_undefined_vars (feel free to alter the name) that performs the analysis if enabled, and call it from process_stale_scc after calling type_check_second_pass and before calling finish_passes. I.e., this would be an additional pass after type checking.

"""DefinedVariableTracker manages the state and scope for the UndefinedVariablesVisitor."""

def __init__(self) -> None:
# todo(stas): we should initialize this with some variables.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you treat from foo import x similarly to the assignment x = ... (the value of x doesn't make a difference to the analysis)?

@@ -651,6 +651,7 @@ def restore(ids: list[str]) -> None:
state.type_checker().reset()
state.type_check_first_pass()
state.type_check_second_pass()
state.detect_partially_defined_vars()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is necessary for mypyd?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.

@ilinum
Copy link
Collaborator Author

ilinum commented Sep 6, 2022

I don't think it's impossible, it'll just require constant work. There will always be some obscure edge case that wasn't considered and of course new language features which are added over time won't make it any easier.

One of the most common issues for pylint AFAIK, and any other static code analyzer for that matter, is understanding control flow. At some point the added complexity just isn't worth the benefit and you just accept a false-positive instead.

I don't have that much experience with mypy and it's hard for me to judge whether this feature is worth the extra mypy work required. Perhaps, @JukkaL can chime in.

@ilinum
Copy link
Collaborator Author

ilinum commented Sep 6, 2022

I'm happy to either keep working on this branch or merge this PR and keep working in follow-up PRs.

It probably just depends on how hard it is to review :) I would expect the code will grow quite a bit as we add more support for more features.

Personally, I don't really care; just let me know which one you prefer.

@github-actions

This comment has been minimized.

4 similar comments
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2022

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@JukkaL
Copy link
Collaborator

JukkaL commented Sep 7, 2022

I don't have that much experience with mypy and it's hard for me to judge whether this feature is worth the extra mypy work required. Perhaps, @JukkaL can chime in.

We can decide how far we are willing to go in terms of complexity once we have a better understanding of the frequencies and kinds of false positives (after the implementation can handle the easy cases). There will certainly be some false positives, but it would be nice if the rate could be low enough that we can enable this by default. It's too early to tell yet, however.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! It makes sense to merge this and implement additional functionality as follow-up PRs.

There's still a conflict that needs to be fixed.

@@ -651,6 +651,7 @@ def restore(ids: list[str]) -> None:
state.type_checker().reset()
state.type_check_first_pass()
state.type_check_second_pass()
state.detect_partially_defined_vars()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.

@JukkaL
Copy link
Collaborator

JukkaL commented Sep 7, 2022

There's still a conflict that needs to be fixed.

Actually maybe it's enough to mark this as "Ready for review" and merge master.

@ilinum ilinum marked this pull request as ready for review September 7, 2022 18:05
@JukkaL JukkaL merged commit 82a97f7 into python:master Sep 8, 2022
@ilinum ilinum deleted the basic-detect-undefined-vars branch September 8, 2022 16:21
JukkaL pushed a commit that referenced this pull request Sep 12, 2022
This builds on #13601 to add support for statements like `continue`,
`break`, `return`, `raise` in partially defined variables check. The
simplest example is:
```python
def f1() -> int:
    if int():
        x = 1
    else:
        return 0
    return x
```

Previously, mypy would generate a false positive on the last line of
example. See test cases for more details.

Adding this support was relatively simple, given all the already
existing code.

Things that aren't supported yet: `match`, `with`, and detecting
unreachable blocks.

After this PR, when enabling this check on mypy itself, it generates 18
errors, all of them are potential bugs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants