Describe the bug

Running test_dataframe_corr_with with MARS_CI_BACKEND=ray set in the environment fails:
    mars/dataframe/statistics/tests/test_statistics_execution.py:166 (test_dataframe_corr_with)
    setup = <mars.deploy.oscar.session.SyncSession object at 0x1572b63a0>

        def test_dataframe_corr_with(setup):
            rs = np.random.RandomState(0)
            raw_df = rs.rand(20, 10)
            raw_df = pd.DataFrame(
                np.where(raw_df > 0.4, raw_df, np.nan), columns=list("ABCDEFGHIJ")
            )
            raw_df2 = rs.rand(20, 10)
            raw_df2 = pd.DataFrame(
                np.where(raw_df2 > 0.4, raw_df2, np.nan), columns=list("ACDEGHIJKL")
            )
            raw_s = rs.rand(20)
            raw_s = pd.Series(np.where(raw_s > 0.4, raw_s, np.nan))
            raw_s2 = rs.rand(10)
            raw_s2 = pd.Series(np.where(raw_s2 > 0.4, raw_s2, np.nan), index=raw_df2.columns)

            df = DataFrame(raw_df)
            df2 = DataFrame(raw_df2)

            result = df.corrwith(df2)
            pd.testing.assert_series_equal(result.execute().fetch(), raw_df.corrwith(raw_df2))

            result = df.corrwith(df2, axis=1)
            pd.testing.assert_series_equal(
                result.execute().fetch(), raw_df.corrwith(raw_df2, axis=1)
            )

            result = df.corrwith(df2, method="kendall")
            pd.testing.assert_series_equal(
                result.execute().fetch(), raw_df.corrwith(raw_df2, method="kendall")
            )

            df = DataFrame(raw_df, chunk_size=4)
            df2 = DataFrame(raw_df2, chunk_size=6)
            s = Series(raw_s, chunk_size=5)
            s2 = Series(raw_s2, chunk_size=5)

            with pytest.raises(Exception):
                df.corrwith(df2, method="kendall").execute()

            result = df.corrwith(df2)
    >       pd.testing.assert_series_equal(
                result.execute().fetch().sort_index(), raw_df.corrwith(raw_df2).sort_index()
            )

    mars/dataframe/statistics/tests/test_statistics_execution.py:207:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    pandas/_libs/testing.pyx:53: in pandas._libs.testing.assert_almost_equal
        ???
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    >   ???
    E   AssertionError: Series are different
    E
    E   Series values are different (66.66667 %)
    E   [index]: [A, B, C, D, E, F, G, H, I, J, K, L]
    E   [left]:  [-0.999999999999993, nan, -0.7101574548120652, 0.9999999999999998, -0.9999999999999999, nan, nan, 0.4734592586091424, -0.0859298220144991, nan, nan, nan]
    E   [right]: [0.2138665453062885, nan, 0.49514640730296355, 0.12552323646958655, -0.6144321615177657, nan, -0.11419110881167645, 0.4737487368091909, -0.23766427699623854, -0.33167205059238336, nan, nan]

    pandas/_libs/testing.pyx:168: AssertionError
However, if the chunk_size argument is removed, the test passes. The bug may be in the rechunk/align operations.
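To decide which side of the failing comparison is correct without involving any chunking, the correlation can be recomputed one column at a time in plain pandas. The sketch below is only an illustration and is not part of the Mars test suite; the names all_labels, shared and per_column are made up for this example. It uses the same seed and masking as the test and assumes only the documented behaviour of Series.corr and DataFrame.corrwith (pairwise Pearson correlation, NaN for labels present in just one frame).

```python
import numpy as np
import pandas as pd

# Same data as in the failing test (seed 0, values <= 0.4 masked to NaN).
rs = np.random.RandomState(0)
raw_df = rs.rand(20, 10)
raw_df = pd.DataFrame(
    np.where(raw_df > 0.4, raw_df, np.nan), columns=list("ABCDEFGHIJ")
)
raw_df2 = rs.rand(20, 10)
raw_df2 = pd.DataFrame(
    np.where(raw_df2 > 0.4, raw_df2, np.nan), columns=list("ACDEGHIJKL")
)

# Reference result computed by pandas in one go.
reference = raw_df.corrwith(raw_df2).sort_index()

# Chunk-independent check: correlate each shared column on its own.
all_labels = raw_df.columns.union(raw_df2.columns)
shared = raw_df.columns.intersection(raw_df2.columns)
per_column = pd.Series(np.nan, index=all_labels, dtype="float64")
for label in shared:
    per_column[label] = raw_df[label].corr(raw_df2[label])
per_column = per_column.sort_index()

# Both computations should agree within the default tolerance.
pd.testing.assert_series_equal(reference, per_column)
```

If a chunked Mars result disagrees with per_column on the shared labels, the difference would point at the distributed execution path rather than at pandas semantics.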
Strangely, if I copy the code from the unit test and run it directly with the Mars backend, errors also occur:
    import numpy as np
    import pandas as pd
    import mars.dataframe as md
    from mars.deploy.oscar.tests.session import new_test_session

    sess = new_test_session()
    sess.as_default()

    rs = np.random.RandomState(0)
    raw_df = rs.rand(20, 10)
    raw_df = pd.DataFrame(
        np.where(raw_df > 0.4, raw_df, np.nan), columns=list("ABCDEFGHIJ")
    )
    raw_df2 = rs.rand(20, 10)
    raw_df2 = pd.DataFrame(
        np.where(raw_df2 > 0.4, raw_df2, np.nan), columns=list("ACDEGHIJKL")
    )

    df = md.DataFrame(raw_df)  # no chunk_size: output 1; chunk_size=4: output 2
    df2 = md.DataFrame(raw_df2, chunk_size=6)
    result = df.corrwith(df2)
    pd.testing.assert_series_equal(
        result.execute().fetch().sort_index(), raw_df.corrwith(raw_df2).sort_index()
    )
Output 1 (df created without chunk_size):
      File "/Users/po.lb/Work/mars/mars/services/subtask/worker/processor.py", line 473, in run
        await self._execute_graph(chunk_graph)
      File "/Users/po.lb/Work/mars/mars/services/subtask/worker/processor.py", line 237, in _execute_graph
        await to_wait
      File "/Users/po.lb/Work/mars/mars/lib/aio/_threads.py", line 36, in to_thread
        return await loop.run_in_executor(None, func_call)
      File "/Users/po.lb/.pyenv/versions/3.8.13/lib/python3.8/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/Users/po.lb/Work/mars/mars/services/subtask/worker/tests/subtask_processor.py", line 84, in _execute_operand
        self.assert_object_consistent(out, ctx[out.key])
      File "/Users/po.lb/Work/mars/mars/tests/core.py", line 493, in assert_object_consistent
        self.assert_dataframe_consistent(expected, real)
      File "/Users/po.lb/Work/mars/mars/tests/core.py", line 355, in assert_dataframe_consistent
        self.assert_shape_consistent(expected.shape, real.shape)
      File "/Users/po.lb/Work/mars/mars/tests/core.py", line 274, in assert_shape_consistent
        raise AssertionError(
    AssertionError: [address=127.0.0.1:62877, pid=4188] shape in metadata (6, 6) is not consistent with real shape (6, 8)
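The real shape (6, 8) reported above is at least consistent with an inner join on column labels, since the two frames share eight columns. The pandas-only sketch below checks that arithmetic. It is a guess at what the shapes mean, not a trace of Mars internals: that Mars produces this chunk through such an alignment, and that the (6, 6) in the metadata derives from chunk_size=6, are both assumptions the traceback does not confirm. The names left, right, common and row_block are illustrative.

```python
import numpy as np
import pandas as pd

# Frames with the same column labels as in the reproduction script.
rs = np.random.RandomState(0)
raw_df = pd.DataFrame(rs.rand(20, 10), columns=list("ABCDEFGHIJ"))
raw_df2 = pd.DataFrame(rs.rand(20, 10), columns=list("ACDEGHIJKL"))

# Align on columns only, keeping the labels present in both frames.
left, right = raw_df.align(raw_df2, join="inner", axis=1)
common = sorted(left.columns)

# A block of six rows (one row chunk when chunk_size=6) after alignment.
row_block = left.iloc[:6]

print(common)           # labels shared by both frames
print(row_block.shape)  # rows in the block, number of shared columns
```

Under these assumptions, a chunk whose metadata still records six columns while the data carries eight would trigger exactly this kind of consistency check.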
Output 2 (df created with chunk_size=4):
    Traceback (most recent call last):
      File "/Users/po.lb/Work/mars/t3.py", line 22, in <module>
        pd.testing.assert_series_equal(
      File "/Users/po.lb/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/_testing/asserters.py", line 1077, in assert_series_equal
        _testing.assert_almost_equal(
      File "pandas/_libs/testing.pyx", line 53, in pandas._libs.testing.assert_almost_equal
      File "pandas/_libs/testing.pyx", line 168, in pandas._libs.testing.assert_almost_equal
      File "/Users/po.lb/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/_testing/asserters.py", line 665, in raise_assert_detail
        raise AssertionError(msg)
    AssertionError: Series are different

    Series values are different (66.66667 %)
    [index]: [A, B, C, D, E, F, G, H, I, J, K, L]
    [left]:  [-0.999999999999993, nan, -0.7101574548120652, 0.9999999999999998, -0.9999999999999999, nan, nan, 0.4734592586091424, -0.0859298220144991, nan, nan, nan]
    [right]: [0.2138665453062885, nan, 0.49514640730296355, 0.12552323646958655, -0.6144321615177657, nan, -0.11419110881167645, 0.4737487368091909, -0.23766427699623854, -0.33167205059238336, nan, nan]
Output 2 from the Mars backend is the same as the CI output with the Ray backend.