Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn against reusing exception names after the except: block on Python 3 #59

Merged
merged 1 commit into from
May 4, 2016

Conversation

ambv
Copy link
Member

@ambv ambv commented Mar 11, 2016

The following code is invalid on Python 3:

try:
    raise ValueError('ve')
except ValueError as exc:
    pass

print("Last exception:", exc)    # raises an UnboundLocalError

This pull request makes the Checker warn against this problem. There are new tests that pass on both Python 2 and Python 3.


Last line will raise UnboundLocalError on both Python 2 and
Python 3 because the existence of that exception name creates a local
scope placeholder for it, obscuring any globals, etc."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that is sneaky. Good catch.

@bitglue
Copy link
Member

bitglue commented Mar 11, 2016

Looks pretty good, thanks for the patch.

There's one more case to consider which might need to be addressed:

# test.py
def f():
    e = 'foo'
    try:
        pass
    except Exception as e:
        pass

    print(e)

f()
$ python3 test.py
foo
$ pyflakes test.py 
test.py:8: undefined name 'e'

There's a similar difference in the scope of bound names between Python 2 and 3 with generator expressions. You might look at GENERATOREXP for a partial solution. The problem there is I don't think it's possible to bind additional names within a generator expression, but that is possible in a try: block, as test_namesDeclaredInExceptBlocks already demonstrates. If you self.pushScope() before handling the try: block, then all those names will go out of scope when you self.popScope() 😕

@ambv
Copy link
Member Author

ambv commented Mar 11, 2016

@bitglue, this case is handled correctly as is. The code in your example would fail if the exception is raised. You're only including exception handling because, well, sometimes those exceptions will be raised. You don't want to be surprised exactly in this scenario.

Even if the code executes just fine without the exception, PyFlakes should warn that in the case of an exception there's going to be trouble.

@bitglue
Copy link
Member

bitglue commented Mar 11, 2016

What about the slightly more complicated:

def maybe_raise_exception():
    import random
    if random.randint(0, 1):
        raise Exception('whoops')

def f():
    e = 'foo'
    try:
        maybe_raise_exception()
    except Exception as e:
        print(e)
        raise SystemExit(1)

    print(e)

f()

@ambv
Copy link
Member Author

ambv commented Mar 11, 2016

Haha, sure, if we're re-raising or otherwise exiting the flow of the function in the except: clause then print(e) would work. For PyFlakes, we'd have to implement a complex heuristic to overcome that, looking for return, yield, raise, continue, break statements within the except: block. But that would still be prone to conditionals, nested functions raising, etc. It's almost easier to just execute the code and unit test it at this point.

Instead of going that path, let's agree that your last example is still a code smell in the sense that there's no need for the exception name to shadow an unrelated local (or global) name. So, how about I implement warning against reusing the exception name if it already exists in any scope above? It would be in the spirit of "function redefinition".

# the exception, which is not a Name node, but a simple string.
if isinstance(node.name, str):
self.handleNodeStore(node)
if not isinstance(node.name, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use if PY2: instead of type inspection, so that it is more clear that this is version specific code

@jayvdb
Copy link
Member

jayvdb commented Mar 11, 2016

This should only generate an error on Python 3. It is valid to refer to the exception variable in Python 2

@jayvdb
Copy link
Member

jayvdb commented Mar 11, 2016

Reusing names as an exception variable name is valid, and is quite common based on my analysis done for #57.

Reusing a function name is different; the previous definition is replaced and becomes inaccessible, which isnt appropriate for a public symbol. Very likely the code should have better usage of module level if statements if needs alternative implementations of a public symbol. And if it was a private symbol (prefixed by _) there is no good reason not to give each function a separate name -- the only benefit of reusing the same symbol is to keep the module dictionary smaller.

Anyways, you don't need complex heuristics to avoid the problematic case that @bitglue has pointed out. There are two solutions:

  1. A try block is given its own scope on Python 3 at least, so that the variables in that scope die with that scope. It would be neater for a new scope to also be created on Python 2, and then the names in the try scope to be copied into the outer scope at the end of try block.
  2. An exception variable needs a new class, which has different characteristics than the base class.
    2.a. it doesnt replace an existing name (of a different class) in the scope, as an existing name persists after the try block
    2.b. names that are instances of this new class are deleted after the try block is completed.

@ambv
Copy link
Member Author

ambv commented Mar 11, 2016

  1. Agreed that all the errors raised by this pull request only apply to Python 3.
  2. I will convert the pull request to use if PY2 instead. I like that approach better, too.
  3. As pointed out by tests in this pull request, shadowing names of locals or nonlocals by exception names is a landmine, the behavior is different depending on if the exception was raised or not and whether the shadowed name is a previous exception name, a pre-existing local or a nonlocal. It is best to avoid it, hence my suggestion of warning against this use. To paraphrase you, @jayvdb: "there is no good reason not to give each exception a separate name".
  4. Your suggestions are both incorrect. The first one states that try blocks have their separate scope which is not true, look at the example given in test_namesDeclaredInExceptBlocks. It is just the exception name that is being unbound immediately after leaving the except: block. The second suggestion you're making suggests that the exception name only temporarily shadows an existing name but this is not true. If we're shadowing a local name, it doesn't shadow anything at all if the exception is not raised but it does if it is (and even unbinds the local on Python 3 in that case after exiting the except: block). You never get the old value back. If we're shadowing a global name and the exception is not raised, the code fails with an UnboundLocalError on both Python 2 and Python 3 (see test_undefinedExceptionNameObscuringGlobalVariable).

@jayvdb
Copy link
Member

jayvdb commented Mar 11, 2016

Re 2, the comparison is wrong. the exception names are local; if the code works, naming of variables is a style issue and doesnt affect anyone else. function names are public.

There are good reasons to re-use existing names for exception names. Existing very large code-bases do it. The code @bitglue has given above is effectively what is really done. Is it nice? no. But it is common enough that we cant say it is illegal.

Wrt my suggestions, I didnt go into every niggly implementation detail in order that they could be understood at a high level.

@ambv
Copy link
Member Author

ambv commented Mar 11, 2016

Remember that we're talking about Python 3 code here.

"if the code works" - the entire point is that on Python 3 it doesn't because the behavior is different depending on whether the exception was raised or not. That's not style, that's correctness. Warning the programmer that he's redeclaring the name would help finding bugs.

If there are reasons to bind exception names to locals, this should be done by explicit assignment, like the example in test_namesDeclaredInExceptBlocks shows. This way it's clear what's going on and the name is not going to magically unbind after exiting the except: block.

@bitglue
Copy link
Member

bitglue commented Mar 12, 2016

Instead of going that path, let's agree that your last example is still a code smell in the sense that there's no need for the exception name to shadow an unrelated local (or global) name. So, how about I implement warning against reusing the exception name if it already exists in any scope above? It would be in the spirit of "function redefinition".

Not quite, I think. An example of function redefinition, with pyflakes will complain about:

def f():
    print("foo")

def f():
    print("bar")

This program, no matter how you run it, can never print "foo". This is not just a smell: there is no reason at all to want the first definition of f().

Change the program just a little bit:

def f():
    print("foo")

f()

def f():
    print("bar")

Now there is a reason for the first f(), and pyflakes will pass this code without a complaint. Even though it smells.

Would it be possible to relax the constraints somewhat so there may be false negatives, but there are no false positives? I think this would be more in line with the design principles.

@sigmavirus24
Copy link
Member

To be clear, on both Python 2 and Python 3, the code sample that @bitglue mentioned here and reproduced below will absolutely replace that local value. In Python 3, that means that if there's already a local with that name then this warning should be skipped. (I haven't reviewed the code or tested this PR out yet though so that may already be the case.)

def maybe_raise_exception():
    import random
    if random.randint(0, 1):
        raise Exception('whoops')

def f():
    e = 'foo'
    try:
        maybe_raise_exception()
    except Exception as e:
        print(e)
        raise SystemExit(1)

    print(e)

f()

In other words in f() if you didn't have the raise SystemExit you would have the exception printed twice. This is the only way (I'm aware of) to reference a bound exception outside an except clause if necessary on Python 3.

@ambv
Copy link
Member Author

ambv commented Mar 14, 2016

Thanks for taking the time to review this!

@sigmavirus24, the behavior is different on Python 3 from what you describe. Let's separate two parts of the issue here. First, let's start with the full exception situation and then discuss false positives, "styling suggestions", etc.

Availability of exception names after the except: block

I'm sorry I failed to explain what the behavior actually is. Let me try one more time. Please read this through, run the examples yourself if necessary.

Consider the following gist: https://gist.github.com/ambv/1143f0d0b86977059bed

There are six examples. In the first three examples the exception is not raised. In example1 there's no previous name clashing with our exception name. In example2 there's a previous local name clashing with our exception name. In example3 there's a global (or more correctly, nonlocal) name clashing with our exception name.

The second three examples are copies of the first three where the exception is raised. You can run the examples on both Python 2 and Python 3 to confirm that the comments on expected output don't lie.

As you can see, in the case where no exception is raised, Python 2 behaves like Python 3. When the exception is raised, Python 2 shadows whatever value there was before, and keeps the value. Python 3 shadows whatever value there was before, and unbinds it after leaving the except: block.

Let's agree on a few findings those examples show us.

  1. Code that involves exception handling but only behaves as expected when the exception is not raised, is incorrect.
  2. Code that involves exception handling but only behaves as expected when the exception is raised, is incorrect.
  3. Python 3 code that tries to reference exception names after the except: block is always incorrect. (it raises UnboundLocalError in every case but one, when there was a local name and no exception was raised, but see Finding 1)
  4. Python 3 code might be corrected by making an explicit assignment to a name within the exception block. This removes all quirkiness of unbinding names after the except: block and states programmer's intent better. An example of this is in test_namesDeclaredInExceptBlocks in this pull request.

False positives

Based on the three findings above, this pull request proposes to point out exception names as undefined after exiting the except: block on Python 3, like so:

/tmp/exceptions.py:11: undefined name 'e1'
/tmp/exceptions.py:23: undefined name 'e2'
/tmp/exceptions.py:46: undefined name 'e4'
/tmp/exceptions.py:58: undefined name 'e5'

Since doing that correctly for globals is prohibitively complex (note that warnings for 'e3' and 'e6' are missing), I propose to additionally warn against shadowing locals and globals by exception names, again, just on Python 3.

Now, I understand that pyflakes tries hard to never issue false positives. In our case, a false positive would be issued for example5 if the except: block contained a statement breaking the function flow (with return, raise, yield, break, or continue). Taking that into account, I'm proposing the following:

  1. To modify the pull request to not issue "undefined name" warnings when shadowing a pre-existing local occurred. This solves the possible false positive.
  2. To add a warning called "variable 'x' (defined in enclosing scope on line N) redefined as exception name" that would cover local shadowing and global shadowing. This is useful both to warn against unexpected namespace pollution and to denote that the original variable becomes inaccessible within the except: block. In this sense it's strictly correct and is not a false positive. Please note that the situation we're warning against is different than just redefining a value by an explicit assignment somewhere. That's the point of the warning. It's not a style suggestion.

If you strongly feel that 2. has no place in pyflakes, please suggest where in the linter stack does it belong and I'll add it there. Give it five minutes, I really think this would find many bugs before hitting production.

@jayvdb
Copy link
Member

jayvdb commented Mar 14, 2016

I disagree with your first finding "Code that involves exception handling but only behaves as expected when the exception is not raised, is incorrect." as written. Consider the following:

def example10():
    e10 = 'foo'
    try:
        pass
    except Exception as e10:
        do_logging(e10)
        raise

    print(e10)

In the above, print(e10) only occurs if the exceptions are not raised. To avoid false positives, you would need to track breakages of execution in the except: clauses; i.e. looking for return or raise statements. And the finally: block too I suppose, but I expect it to be very rare that someone re-declares a variable in a finally: to workaround the Python 3 UnboundLocalError . Using e10 as the exception name seems like it can only be horrendous code style without any justification, but that is irrelevant here.

It is not trivial to determine whether the code from the except: block flows to the block that follows, but it also isnt terribly complex. Once it is done, stricter processing of exception variable names is possible without false positives.

The same applies to "Code that involves exception handling but only behaves as expected when the exception is raised, is incorrect." as written. As before, to avoid false positives you would need to track breakages of execution in the else: clause, e.g. looking for return or raise statements.

Consider the following for a rather sensible bit of code, where reusing the exception name is desirable:

def example11():
    try:
        intentionally_detect_exception_raised()
    except TypeError as e11:
        pass
    except ValueError as e11:
        pass
    else:
        return

    print(e11)

Note that your global variable examples (3 and 6) are not failing for the reason you believe they are failing. As e3 and e6 are modified in each function, those variables are always local variables, and the global variable is completely irrelevant. i.e. If you added print(e3) at the beginning of example3, you will find it does not refer to the global on Python 2 or 3.

Also note that it should not be prohibitively complex to behave differently if a variable is global; the variables that are from the global scope are easy to identify within pyflakes.

@ambv
Copy link
Member Author

ambv commented Mar 14, 2016

I do agree that handling local name shadowing while supporting flow control breaks would be a nice feature. But saying "it is not trivial to determine whether the code from the except: block flows to the block that follows, but it also isn't terribly complex" just goes to show that you didn't think it through, @jayvdb . What if the re-raise (https://gist.github.com/ambv/5925849ac299cd15386d, but really any exception applies) happens within a function that you call in the except: block? Still not terribly complex to discover? If so, please create a patch for this, I'd be very excited to see this work.

So, we're back in square one where, as I commented before, I'm proposing to do what @sigmavirus24 suggested, which is to silence the "undefined name" warnings for local name shadowing. I would do the same for global name shadowing since it has the same false positive problem. But in this case I don't have to because that warning didn't work for global name shadowing in the first place. In PyFlakes, either something's bound in the current scope or it doesn't exist (and then we're looking at nonlocal scopes to find it). Unbound local discovery in PyFlakes is done by checking assignments against names from outer scopes "used" before within the current scope. In our problem the unbound read happens after the implicit assignment by the except: block so we'd have to create a new concept of an "unbound but present name" in the current scope.

And to close this, please read line 73 in test_undefined_names.py in this pull request and tell me again that I don't understand why example3 and example6 fail. Yet again, @jayvdb, your suggestions are incorrect and your attitude is not helpful.

@jayvdb
Copy link
Member

jayvdb commented Mar 14, 2016

Sure, an exception occurring within an except: block may from code that we cant analyse, and the simplest assumption is every line can cause an exception. My point is that there are some blocks of code that we can know exactly how they effect flow control, and only in those cases you can make stronger assertions like "Code that involves exception handling but only behaves as expected when the exception is not raised, is incorrect."

And my apologies for assuming that you didnt understand that example3 and example6 are merely restating example1 and example4.

In your proposed solution, would there be an error on print(e10) and print(e11) in my examples?

@bitglue
Copy link
Member

bitglue commented Mar 14, 2016

To add a warning called "variable 'x' (defined in enclosing scope on line N) redefined as exception name" that would cover local shadowing and global shadowing.

Keep in mind that pyflakes doesn't have warnings. It only has errors. It is important to me, and I suspect many other people who put pyflakes in their automated testing process, that it continues to not emit errors for any code which does not have actual errors.

If I'm understanding correctly, you are proposing this change as a solution for example5. Here it is, for discussion:

def example5():
    e5 = 'old local value'
    try:
        raise ValueError('ve5')
    except Exception as e5:
        pass

    print(e5)
    # py2: prints 've5'
    # py3: raises UnboundLocalError

I think in this case we can only be sure there's an error on Python 3 because we can be sure the except: block won't raise an exception or exit. It is somewhat common to have just pass in an except: block, so if you wanted to special-case that scenario and emit an error in that case, that would be fine. It may also be so complex it's not worth the effort, and that's fine too. There are plenty of kinds of errors that pyflakes does not catch: it just tries to catch the obvious ones.

However, many times there will be something other than pass in that except: block. I don't think even complex heuristics will be sufficient here. Firstly, because any statement might raise an exception. In some cases the author may put a statement there he knows raises an exception, but pyflakes probably doesn't know that.

Secondly, consider this:

def example5b():
    e5 = 'old local value'
    failed = False
    try:
        raise ValueError('ve5')
    except Exception as e5:
        failed = True

    if not failed:
        print(e5)

This prints nothing but also does not have any undefined names in Python 2 or 3. Pyflakes does not currently have any concept of conditional branches, so it treats this code as if the if not failed: isn't even there, identically to example 5. If you try to handle conditional branches like this in any accurate way you end up being a Python interpreter.

I'm reluctant to concede that because this case is complicated, pyflakes should just emit an error because it might be a problem and the author might have made a mistake. This just isn't in the design principles, and it's a big part of what differentiates pyflakes from pylint.

If you strongly feel that 2. has no place in pyflakes, please suggest where in the linter stack does it belong and I'll add it there.

My personal belief is this sort of thing belongs in unit tests. Pyflakes is there to catch problems that a unit test won't, like unused imports, or dead code. It just happens to catch some other errors, like undefined names, because it's easy for pyflakes to do so.

I've changed my mind on issues like this before. The single most persuasive argument you can make is to run your proposed change against a lot of existing Python code and demonstrate that there are no false positives.

@ambv
Copy link
Member Author

ambv commented Mar 14, 2016

@bitglue, okay, what you said makes perfect sense to me. I'm modifying the pull request to only issue errors on 100% clear cut cases (e.g. Python 3 with no shadowing at all so no controversy and false positives). PyFlakes doesn't have warnings so my suggestion to introduce the "redefinition check" is not applicable. Fair enough, I agree.

@sigmavirus24, @bitglue, where would this redefinition warning fit? Would pep8 be a good home for this?

@bitglue
Copy link
Member

bitglue commented Mar 14, 2016

pep8 probably not, since I don't think there's anything in PEP 8 about it. Personally, I'll catch these errors in unit tests. You could also check out pylint.

@ambv ambv force-pushed the master branch 2 times, most recently from fa8861e to 3d1f8de Compare March 14, 2016 21:55
@ambv
Copy link
Member Author

ambv commented Mar 14, 2016

As discussed, I updated the pull request to only issue warnings if no shadowing is happening at all. I left all discussed false positives as tests to make sure they never issue warnings if we do end up extending this functionality in the future.

I decided to write a custom flake8 plugin to handle warning against exception name redefining a local/nonlocal. This way projects can enable this should they choose to.

@ambv
Copy link
Member Author

ambv commented Mar 17, 2016

@bitglue, is there anything else here that you'd like me to do before merging this?

@bitglue
Copy link
Member

bitglue commented Mar 18, 2016

Looks good -- can you please rebase onto master so I can fast-forward merge it?

@ambv
Copy link
Member Author

ambv commented Apr 12, 2016

Done!

@bitglue bitglue merged commit 2a698f8 into PyCQA:master May 4, 2016
@jayvdb
Copy link
Member

jayvdb commented May 4, 2016

as expected, this is causing false positive errors on python 3

@jayvdb
Copy link
Member

jayvdb commented May 4, 2016

(ughh.. ignore me going off half cocked, it is finding lots of real errors as far as I can see.)

@ambv
Copy link
Member Author

ambv commented May 4, 2016

If you see any false positives, give us examples here. I cut the pull request's scope by a lot to ensure there's no false positives (see my last message on March 14). If there's still any, we should fix it.

@jayvdb
Copy link
Member

jayvdb commented May 6, 2016

Found an exception https://bugs.launchpad.net/pyflakes/+bug/1578903

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants