-
-
Notifications
You must be signed in to change notification settings - Fork 3k
Implement foundation for detecting partially defined vars #13601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ae01a05
to
5f1c4d7
Compare
@@ -2340,6 +2342,11 @@ def finish_passes(self) -> None: | |||
manager = self.manager | |||
if self.options.semantic_analysis_only: | |||
return | |||
if manager.options.disallow_undefined_vars: | |||
manager.errors.set_file(self.xpath, self.tree.fullname, options=manager.options) | |||
self.tree.accept( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this was the right place to plug this in. It needed to be done before self.free_state()
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you could add another method, such as detect_partially_undefined_vars
(feel free to alter the name) that performs the analysis if enabled, and call it from process_stale_scc
after calling type_check_second_pass
and before calling finish_passes
. I.e., this would be an additional pass after type checking.
mypy/undefined_vars.py
Outdated
"""DefinedVariableTracker manages the state and scope for the UndefinedVariablesVisitor.""" | ||
|
||
def __init__(self) -> None: | ||
# todo(stas): we should initialize this with some variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to evaluate imports in some way in order to avoid false positives. Currently, this won't work:
from foo import x
if bool():
x = 10
y = x # This will incorrectly generate an error.
Is there a good way of doing this? Anywhere I should start looking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you treat from foo import x
similarly to the assignment x = ...
(the value of x
doesn't make a difference to the analysis)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work some of the time. However, this doesn't work for *
imports (i.e. from foo import *
). After semantic analyzer, mypy should know which symbols are being imported, so it should be possible to get this info somehow.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it makes sense for this check to be part of mypy directly as it isn't related to typing itself. E.g. I know that pylint does implement something similar. From my experience there, this is one of the most common sources for false-positives.
Some examples I found while testing the changes here
from typing import Any
def f1(val: Any) -> None:
for x in (1, 2, 3):
if x == val:
x = 1
print(x) # false-positive
def f2() -> None:
d: dict[str, Any] = {}
if (x := d.get("val")) is None:
x = 2
print(x) # false-positive
def f3(val: Any) -> None:
x = 1
if val:
if val == 2:
x = 2
print(x) # false-positive
mypy/main.py
Outdated
# Experiment flag to detect undefined variables being used. | ||
add_invertible_flag("--disallow-undefined-vars", default=False, help=argparse.SUPPRESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be an error code which is disabled by default. Take a look at mypy/errorcodes.py
and TRUTHY_BOOL
as example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is interesting! I don't have a strong opinion here. I think we'll want an error code eventually for type: ignore[partially-defined]
. I'm guessing that if we have an error code, we shouldn't have a separate flag? So we should probably just make it an error code now. Is my understanding correct here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want an error code eventually for
type: ignore[partially-defined]
IMO it would definitely make sense for it to have a separate error-code. Reusing misc
doesn't feel right.
I'm guessing that if we have an error code, we shouldn't have a separate flag? So we should probably just make it an error code now. Is my understanding correct here?
Some error codes might also have flags but my understanding is that these are all legacy cases. The idea is to move more things to error-codes in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it would be better to have a dedicated error code for this (that is not enabled by default) and not have a dedicated flag. We'd only run the analysis if the error code is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These error codes are nice :) Done!
Thank you for taking a look!
I guess it depends on what you think mypy is. If you think of mypy as a type checker, then yes, this doesn't fit. If you think of mypy as a static analysis tool, then this feature does fit. Note that mypy has similar features already (stuff like
The feature we have today is incomplete (e.g. it doesn't support I would love to hear what pylint finds as false positives right now. My assertion is that once this feature is fully implemented over multiple PRs there should be relatively few false positives. However, there may be things that I am missing that make this inherently impossible to do -- I would love to hear about this now and not when I've spent more time working on this :) |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
True, the
I don't think it's impossible, it'll just require constant work. There will always be some obscure edge case that wasn't considered and of course new language features which are added over time won't make it any easier. One of the most common issues for pylint AFAIK, and any other static code analyzer for that matter, is understanding control flow. At some point the added complexity just isn't worth the benefit and you just accept a false-positive instead.
I would recommend fixing the known false-positives as early as possible. The bug reports will add up quickly anyway. Some more false-positives I foundfrom typing import Any
def f4(val: Any) -> None:
if val == 1:
x = 1
else:
return None
print(x) # false-positive
def f5(val: Any) -> None:
if val == 1:
x = 1
elif val == 2:
x = 2
else:
return None
print(x) # false-positive
def f6(val: list[int]) -> None:
for i in val:
x = 1
if i == 2:
break
else:
return None
print(x) # false-positive
def f7(val: Any) -> None:
def x() -> None: ...
if val:
x = x
print(x) # false-positive
def f8(val: Any) -> None:
if val == 1:
x = 1
else:
assert False
print(x) # false-positive
def f9(val: Any) -> None:
while True:
if val == 1:
x = 1
elif val == 2:
continue
else:
break
print(x) # false-positive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start! I looked at the false positives from self check, and many of them would be fixed by adding support for return statements.
mypy/main.py
Outdated
# Experiment flag to detect undefined variables being used. | ||
add_invertible_flag("--disallow-undefined-vars", default=False, help=argparse.SUPPRESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it would be better to have a dedicated error code for this (that is not enabled by default) and not have a dedicated flag. We'd only run the analysis if the error code is enabled.
@@ -2340,6 +2342,11 @@ def finish_passes(self) -> None: | |||
manager = self.manager | |||
if self.options.semantic_analysis_only: | |||
return | |||
if manager.options.disallow_undefined_vars: | |||
manager.errors.set_file(self.xpath, self.tree.fullname, options=manager.options) | |||
self.tree.accept( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you could add another method, such as detect_partially_undefined_vars
(feel free to alter the name) that performs the analysis if enabled, and call it from process_stale_scc
after calling type_check_second_pass
and before calling finish_passes
. I.e., this would be an additional pass after type checking.
mypy/undefined_vars.py
Outdated
"""DefinedVariableTracker manages the state and scope for the UndefinedVariablesVisitor.""" | ||
|
||
def __init__(self) -> None: | ||
# todo(stas): we should initialize this with some variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you treat from foo import x
similarly to the assignment x = ...
(the value of x
doesn't make a difference to the analysis)?
@@ -651,6 +651,7 @@ def restore(ids: list[str]) -> None: | |||
state.type_checker().reset() | |||
state.type_check_first_pass() | |||
state.type_check_second_pass() | |||
state.detect_partially_defined_vars() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is necessary for mypyd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
I don't have that much experience with mypy and it's hard for me to judge whether this feature is worth the extra mypy work required. Perhaps, @JukkaL can chime in. |
I'm happy to either keep working on this branch or merge this PR and keep working in follow-up PRs. It probably just depends on how hard it is to review :) I would expect the code will grow quite a bit as we add more support for more features. Personally, I don't really care; just let me know which one you prefer. |
This comment has been minimized.
This comment has been minimized.
4 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉 |
We can decide how far we are willing to go in terms of complexity once we have a better understanding of the frequencies and kinds of false positives (after the implementation can handle the easy cases). There will certainly be some false positives, but it would be nice if the rate could be low enough that we can enable this by default. It's too early to tell yet, however. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! It makes sense to merge this and implement additional functionality as follow-up PRs.
There's still a conflict that needs to be fixed.
@@ -651,6 +651,7 @@ def restore(ids: list[str]) -> None: | |||
state.type_checker().reset() | |||
state.type_check_first_pass() | |||
state.type_check_second_pass() | |||
state.detect_partially_defined_vars() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
Actually maybe it's enough to mark this as "Ready for review" and merge master. |
This builds on #13601 to add support for statements like `continue`, `break`, `return`, `raise` in partially defined variables check. The simplest example is: ```python def f1() -> int: if int(): x = 1 else: return 0 return x ``` Previously, mypy would generate a false positive on the last line of example. See test cases for more details. Adding this support was relatively simple, given all the already existing code. Things that aren't supported yet: `match`, `with`, and detecting unreachable blocks. After this PR, when enabling this check on mypy itself, it generates 18 errors, all of them are potential bugs.
Description
This diff lays the foundation for detecting partially defined variables. Think of the following situation:
Now, mypy will generate the error in such a case.
Note that this diff is not complete. It still generates a lot of false positives. Running it on mypy itself generated 182 errors.
Therefore, this feature is hidden behind a flag and I will implement it in multiple PRs.
Test Plan
Added tests.