Make IndexOpsMixin (and Index) generic #760
Most of the remaining errors caused by |
Noticed some things to consider.
Am wondering how MultiIndex will work as a subclass of Index[S1]. Is everything good because we just say class MultiIndex(Index), so the S1 is ignored? But will some of the methods of Index that refer to S1 still work right if the index is a MultiIndex?
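A minimal sketch of the question above, using toy stand-in classes. The `Index`, `MultiIndex`, and `S1` definitions here are hypothetical illustrations, not the pandas-stubs ones. It assumes the usual type-checker rule that a subclass inheriting from a generic class without a type argument gets `Any` for that parameter, so inherited methods that mention `S1` evaluate to `Any` on the subclass.

```python
from typing import Generic, List, TypeVar

S1 = TypeVar("S1")  # toy stand-in for the stubs' S1


class Index(Generic[S1]):
    """Toy generic index; not the real pandas class."""

    def __init__(self, values: List[S1]) -> None:
        self._values = list(values)

    def first(self) -> S1:
        # The return type follows the type argument of the class.
        return self._values[0]


class MultiIndex(Index):  # no type argument given, so S1 is implicitly Any
    """Toy subclass declared in the same style as `class MultiIndex(Index)`."""


int_index = Index([1, 2, 3])              # checkers infer Index[int]
multi = MultiIndex([(1, "a"), (2, "b")])

first_int = int_index.first()   # checkers infer int
first_pair = multi.first()      # checkers infer Any, so nothing is checked here
```

Under that assumption the inherited methods keep working for the subclass, but they lose precision rather than raising errors.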
I'm not sure, same applies to edit: Leaving |
…..]] but keeping it as Index[Any]
@twoertwein I'm headed out of town for 3 days, so won't look at this until Monday. |
This reverts commit 4929ecb.
…bclasses in parent classes)
… methods that aren't already provided by object
This is looking pretty good, IMHO
tests/test_frame.py
Outdated
@@ -1761,7 +1761,8 @@ def test_getmultiindex_columns() -> None:
         [(i, s) for i in [1] for s in df.columns.get_level_values(1)]
     ]
     res4: pd.DataFrame = df[[df.columns[0]]]
-    check(assert_type(df[df.columns[0]], pd.Series), pd.Series)
+    column: Scalar = df.columns[0]
+    check(assert_type(df[column], pd.Series), pd.Series)
This would be annoying if one can't write df[df.columns[0]] without doing what you have done here.
I'm not sure if this can be fixed. Maybe we can do some gymnastics with the order of the overloads. df.columns is Index[Any] and would therefore probably match the first overload.
Given that neither mypy nor pyright limits a bounded TypeVar to its bound when it is unknown/Any, we have only the following options:
- make people cast the Index to the appropriate type
- return Scalar
- return Scalar now; in the future, make pd.DataFrame generic in terms of the Index; return S1
I have no particular preference, except that making DataFrame generic might not happen anytime soon
By the way, I will be offline from Friday to Sunday. You are welcome to push changes to this PR or to merge it. Since this is a large PR and the tests might not cover everything, I would prefer if you or others could run this version of pandas-stubs on your internal codebase before the next release to catch potential regressions.
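To illustrate the caller-side choices discussed in this comment, here is a rough sketch with toy definitions. The `Index` class, the simplified `Scalar` alias, and the `lookup` function are hypothetical stand-ins, not the pandas-stubs API. It shows casting an `Index[Any]` to a concrete element type, and the temporary-variable workaround from the test diff, where the element is annotated as `Scalar`.

```python
from typing import Any, Generic, List, TypeVar, Union, cast

Scalar = Union[str, int, float, bool]  # simplified stand-in for the stubs' Scalar alias
S1 = TypeVar("S1")


class Index(Generic[S1]):
    """Toy generic index; not the real pandas class."""

    def __init__(self, values: List[S1]) -> None:
        self._values = list(values)

    def __getitem__(self, position: int) -> S1:
        return self._values[position]


def lookup(column: Scalar) -> str:
    """Hypothetical function that only accepts a Scalar key."""
    return f"column {column!r}"


# Simulates df.columns being typed as Index[Any].
columns: Index[Any] = Index(["a", "b"])

# Caller casts the index to the element type they expect (a no-op at runtime).
typed_columns = cast(Index[str], columns)
first: str = typed_columns[0]

# Workaround from the test diff: a temporary variable annotated as Scalar.
column: Scalar = columns[0]
result = lookup(column)
```

Both forms only change what the type checker sees; `cast` returns its argument unchanged at runtime.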
Unrelated: it seems that the newest version of numexpr broke CI/pandas.
Given that neither mypy nor pyright limits a bounded TypeVar to its bound when it is unknown/Any, we have only the following options:
- make people cast the Index to the appropriate type
- return Scalar
- return Scalar now; in the future, make pd.DataFrame generic in terms of the Index; return S1
I have no particular preference, except that making DataFrame generic might not happen anytime soon
Would changing DataFrame.columns() to return Index[Scalar] work?
By the way, I will be offline from Friday to Sunday. You are welcome to push changes to this PR or to merge it. Since this is a large PR and the tests might not cover everything, I would prefer if you or others could run this version of pandas-stubs on your internal codebase before the next release to catch potential regressions.
I will see if I can give this a try.
Since this is a large PR and the tests might not cover everything, I would prefer if you or others could run this version of pandas-stubs on your internal codebase before the next release to catch potential regressions.
I tried the version I placed in your repo with the PR on two large code bases and no new errors appeared, so I think once you merge that in, I can approve and merge in this PR.
Thank you for testing it and finding a way to let mypy (and pyright) clearly indicate the unintended calls!
Can you test if using DataFrame.columns() returning Index[Scalar] would solve the problem noted in another comment, so that the expression df[df.columns[0]] would work without having to do a cast or create a temporary variable?
tests/test_scalars.py
Outdated
assert_type(
    md_int64_index // td, Never  # pyright: ignore[reportGeneralTypeIssues]
)
assert_type(
    md_float_index // td, Never  # pyright: ignore[reportGeneralTypeIssues]
)
Concerned about the above change. Previously, mypy was seeing that this is an invalid operation. If you make the above change, mypy is no longer detecting that. That's not a good thing.
I agree, I don't know how to fix it. I think the reason why it worked previously was a behavior of Never when it is being used on input arguments: it is interpreted by mypy/pyright to indicate that that call is invalid. Now, with the generic class, we need to have annotations on the input arguments (we don't have the luxury of using Never there anymore). We can still return Never (but that has a slightly different semantic meaning).
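The distinction drawn in this comment can be sketched with two toy classes. `StrictIndex`, `LenientIndex`, and `floordiv_error` are hypothetical names for illustration, not part of pandas-stubs. The first annotates the operand as `Never`, which, as described above, makes a checker treat every call as invalid because no value has that type. The second only returns `Never`, so the call itself type-checks and the checker merely knows it does not return.

```python
import sys
from typing import NoReturn

if sys.version_info >= (3, 11):
    from typing import Never
else:
    Never = NoReturn  # older spelling of the same bottom type


class StrictIndex:
    """Hypothetical class whose operand is annotated as Never."""

    def __floordiv__(self, other: Never) -> Never:
        # No value has type Never, so a checker rejects every call site.
        raise TypeError("StrictIndex does not support //")


class LenientIndex:
    """Hypothetical class that accepts the operand and only returns Never."""

    def __floordiv__(self, other: int) -> Never:
        # The call is accepted by a checker; only the result is "never returns".
        raise TypeError("LenientIndex does not support //")


def floordiv_error(index: object) -> str:
    """Runtime check only: return the TypeError message raised by `index // 2`."""
    try:
        index // 2  # type: ignore[operator]
    except TypeError as exc:
        return str(exc)
    return "no error"
```

At runtime both classes behave the same; the difference exists only for static analysis.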
I found a fix. I created a PR against your branch at twoertwein#3. Key was to remove __floordiv__() from OpsMixin and let any subclass of OpsMixin define only the valid values.
I think it's a mypy bug. See python/mypy#15861
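A rough sketch of the shape of that fix, with toy classes. The `OpsMixin`, `IntIndex`, and `LabelIndex` definitions below are hypothetical and much simpler than the real stubs. The mixin declares no `__floordiv__` at all, and only the subclass for which the operation is valid defines it, restricted to the operand types it accepts.

```python
from typing import List


class OpsMixin:
    """Toy mixin that deliberately does not define __floordiv__."""

    def describe(self) -> str:
        return type(self).__name__


class IntIndex(OpsMixin):
    """Toy integer index: floor division by an int is valid here."""

    def __init__(self, values: List[int]) -> None:
        self.values = list(values)

    def __floordiv__(self, other: int) -> "IntIndex":
        # Only the valid operand type is declared; anything else is a type error.
        return IntIndex([value // other for value in self.values])


class LabelIndex(OpsMixin):
    """Toy string index: no __floordiv__, so `//` is rejected outright."""

    def __init__(self, values: List[str]) -> None:
        self.values = list(values)


halved = IntIndex([7, 9]) // 2

try:
    LabelIndex(["a", "b"]) // 2  # type: ignore[operator]
    label_floordiv_failed = False
except TypeError:
    label_floordiv_failed = True
```

Because nothing is inherited for the operator, there is no base-class signature that a subclass would need to "override" with `Never`.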
So the mypy folks say it's not a bug, and pyright says that its implementation was incorrect. So I think the only way to do this is with what I did: you can't use Never or NoReturn to "override" the implementation in a subclass that matches Any.
tests/test_scalars.py
Outdated
assert_type(
    md_int64_index / td, Never  # pyright: ignore[reportGeneralTypeIssues]
)
assert_type(
    md_float_index / td, Never  # pyright: ignore[reportGeneralTypeIssues]
)
Same as above
It sounds as if it should work but it creates different problems:
I will try to see how far I get on my local branch, but I'm not too optimistic that there is an ideal solution for this problem (except for DataFrame being generic in index and columns) edit: |
Yes, I agree. For the cases where that was incorrect, then you have to do a |
I will see if I can find some time to test that one issue related to why |
I believe you have an open issue at mypy related to that :) In theory, mypy should still warn about unreachable code after those lines and prevent people from using the return value. |
…tects it as invalid
Suggestion to avoid using npt in the constructors
fix issue with floordiv for mypy
Thanks @twoertwein. Great contribution!
- … __iter__() for DatetimeIndex #723, closes Index.intersection does not return correct sub type #744, closes Bug with iterating df.columns #502, and closes Generic type for Indexes #340
- … assert_type() to assert the type of any return value
Mostly working, but still a few failing tests and I haven't added new tests.
xref microsoft/pyright#5642