Bug report
On second thought this issue should be an enhancement instead of a bug report. Sorry for the wrong template.
Bug description:
There are some cases where _pyrepl auto-indentation does not work well.
Cases
- A line ending with : in a multi-line string is wrongly indented.
Observed
>>> s = '''
... Note:
... ␣␣␣␣|
Expected
>>> s = '''
... Note:
... |
- A # inside a string is treated as the start of a comment, so the : that follows is ignored and the next line is not indented.
Observed
>>> if ' ' == '#':
... |
Expected
>>> if ' ' == '#':
... ␣␣␣␣|
- When the entire cursor line is a comment and is already indented, pressing Enter gives a further indent.
Observed
>>> def f():
... # foo⤶
... ␣␣␣␣␣␣␣␣|
Expected
>>> def f():
... # foo⤶
... ␣␣␣␣|
Possible solution
Currently, _should_auto_indent() scans the buffer from right to left and stops at the first newline it encounters, so only the last non-comment line of the buffer is examined.
But when parsing from right to left, we can't tell whether a # starts a comment or is part of a string. For example, if we encounter a " first and then a # while moving from right to left, we don't know whether the # starts a comment. To know that, we need to know whether the " is a string boundary, but the # might itself comment out the ", so we can't be sure. This is a circular information dependency.
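To make the ambiguity concrete, here is a minimal sketch of a right-to-left scan of the kind described above. It is an illustration only, not the actual CPython source: the name should_indent_rtl and the choice to take a plain str are assumptions made for this example.

```python
def should_indent_rtl(buffer: str) -> bool:
    """Hypothetical right-to-left check: does the last code line end with ':'?"""
    last_char = None
    pos = len(buffer)
    while pos > 0:
        pos -= 1
        ch = buffer[pos]
        if last_char is None:
            # Skip trailing whitespace, newlines and '#' characters.
            if ch not in " \t\n#":
                last_char = ch
        else:
            if ch == "\n":
                break  # reached the start of the candidate line
            if ch == "#":
                # Assume everything seen so far was comment text,
                # even though this '#' may sit inside a string.
                last_char = None
    return last_char == ":"


print(should_indent_rtl("if x:"))                 # True, as wanted
print(should_indent_rtl("s = '''\nNote:"))        # True, but we are inside a string
print(should_indent_rtl("if ' ' == '#':"))        # False, the ':' was discarded
print(should_indent_rtl("def f():\n    # foo"))   # True, taken from the line above
```

The scan has no way to know, at the moment it meets a quote or a #, which of the two is "real", because that depends on characters it has not visited yet.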
To fix this, I changed the function to parse the buffer from left to right, keeping track of whether the current character is inside a string or a comment. This approach solves the three cases above. However, the whole buffer is parsed on every call to _should_auto_indent(), so with a very long buffer there might be a noticeable delay when pressing Enter.
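The left-to-right idea could look roughly like the sketch below. This is not the patch itself, just a simplified illustration under some assumptions: the name should_indent_ltr is made up, the buffer is a plain str, string prefixes and backslash line continuations are ignored, and the "last code character" is reset at every newline.

```python
def should_indent_ltr(buffer: str) -> bool:
    """Hypothetical left-to-right check that tracks string and comment state."""
    quote = None            # active delimiter (', ", ''' or """), None outside strings
    in_comment = False
    last_code_char = None   # last non-blank code character on the current line
    i, n = 0, len(buffer)
    while i < n:
        ch = buffer[i]
        if in_comment:
            if ch == "\n":
                in_comment = False
                last_code_char = None
        elif quote is not None:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif buffer.startswith(quote, i):
                i += len(quote) - 1
                last_code_char = ch  # a closing quote counts as code
                quote = None
            elif ch == "\n" and len(quote) == 1:
                # Unterminated single-quoted string: give up on it at end of line.
                quote = None
                last_code_char = None
        elif ch == "#":
            in_comment = True
        elif ch in "'\"":
            quote = ch * 3 if buffer.startswith(ch * 3, i) else ch
            i += len(quote) - 1
        elif ch == "\n":
            last_code_char = None
        elif ch not in " \t":
            last_code_char = ch
        i += 1
    # Indent only if we ended outside a string and the cursor line's code ends with ':'.
    return quote is None and last_code_char == ":"


print(should_indent_ltr("s = '''\nNote:"))        # False
print(should_indent_ltr("if ' ' == '#':"))        # True
print(should_indent_ltr("def f():\n    # foo"))   # False
print(should_indent_ltr("def f():  # comment"))   # True
```

Because the state at any position depends only on the characters before it, a possible way to reduce the cost on long buffers would be to cache the state at the last line boundary and resume from there, although that adds bookkeeping when the buffer is edited in the middle.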
I think this is a big change, since it affects how _should_auto_indent() works as a whole. I am hesitant to open a PR right away, so I am posting it here first in the hope of getting feedback.
CPython versions tested on:
3.15
Operating systems tested on:
Linux