GH1033 Add overloads of engine for pd.read_json #1035

Open
wants to merge 1 commit into base: main
Conversation

loicdiridollou (Contributor, Author):
For engine="pyarrow", you are forced to pass lines=True, and you cannot pass a StringIO buffer; it has to be a file path or a ReadBuffer[bytes].
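A minimal runtime illustration of that constraint (assuming pandas is installed; the engine="pyarrow" call is shown commented out because it additionally requires the pyarrow package):

```python
import io

import pandas as pd

data = b'{"a": 1}\n{"a": 2}\n'

# The default ujson engine happily accepts a StringIO buffer:
df = pd.read_json(io.StringIO(data.decode()), lines=True)
print(df["a"].tolist())  # [1, 2]

# engine="pyarrow" requires lines=True and a file path or a bytes buffer;
# a StringIO source is rejected (and the pyarrow package must be installed):
# df = pd.read_json(io.BytesIO(data), lines=True, engine="pyarrow")
```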

@loicdiridollou loicdiridollou changed the title GHXXX Add overloads of engine for pd.read_json GH1033 Add overloads of engine for pd.read_json Nov 16, 2024
Dr-Irv (Collaborator) left a comment:

I'm a little concerned about the misuse of ellipses with default arguments. An ellipsis should only be used when the argument is optional; when you want a specific result to happen as a result of the argument being specified, you don't use an ellipsis. The overloads that require values to be specified (i.e., the ones without ellipses) should come before the ones that use ellipses, and the ones with ellipses should have "broad" types. So writing something like engine: Literal["pyarrow"] = ... can't be correct: the default value of engine is "ujson", so if the stub is to work without that parameter being specified, it would have to be engine: Literal["ujson", "pyarrow"] = ... .
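The ordering rule can be sketched with a toy pair of overloads. This is a hypothetical, heavily simplified stand-in, not the actual pandas-stubs signatures (the real function returns a DataFrame and has many more parameters): the overload that requires engine="pyarrow" carries no ellipsis defaults and comes first, while the catch-all overload that also matches calls omitting the argument uses the broad default type.

```python
from typing import Literal, overload


@overload
def read_json(path: str, *, lines: Literal[True], engine: Literal["pyarrow"]) -> str: ...
@overload
def read_json(
    path: str, *, lines: bool = ..., engine: Literal["ujson", "pyarrow"] = ...
) -> str: ...
def read_json(path: str, *, lines: bool = False, engine: str = "ujson") -> str:
    # Toy implementation so the sketch is runnable and checkable.
    return f"{engine}:{lines}"
```

A call that omits both keyword arguments falls through to the second, broad overload, while read_json(p, lines=True, engine="pyarrow") is matched by the first, stricter one.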

Comment on lines +145 to +151
lines: Literal[True] = ...,
chunksize: None = ...,
compression: CompressionOptions = ...,
nrows: int | None = ...,
storage_options: StorageOptions = ...,
dtype_backend: DtypeBackend | NoDefault = ...,
engine: Literal["pyarrow"] = ...,
Collaborator:

Since the default value of lines is False, having ... for both lines and engine means that you don't have to specify either. So I think you don't want the ellipses here on either argument.

@@ -72,6 +98,7 @@ def read_json(
nrows: int | None = ...,
storage_options: StorageOptions = ...,
dtype_backend: DtypeBackend | NoDefault = ...,
engine: Literal["pyarrow"] = ...,
Collaborator:

I think the ellipses here should be removed.

lines: Literal[True],
chunksize: int,
compression: CompressionOptions = ...,
nrows: int | None = ...,
storage_options: StorageOptions = ...,
dtype_backend: DtypeBackend | NoDefault = ...,
engine: Literal["pyarrow"] = ...,
Collaborator:

Remove the ellipses.

Comment on lines +195 to +201
lines: Literal[True] = ...,
chunksize: None = ...,
compression: CompressionOptions = ...,
nrows: int | None = ...,
storage_options: StorageOptions = ...,
dtype_backend: DtypeBackend | NoDefault = ...,
engine: Literal["pyarrow"] = ...,
Collaborator:

Same comment about the ellipses.

Comment on lines +1646 to +1652
check(
assert_type(
pd.read_json(dd, lines=True, engine="pyarrow"),
pd.DataFrame,
),
pd.DataFrame,
)
Collaborator:

You should add a test with TYPE_CHECKING_INVALID_USAGE that makes sure we disallow lines=False with engine="pyarrow".
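A hedged sketch of what such a negative test could look like. The pattern guards the deliberately invalid call behind a constant that is False at runtime, so the call is only seen by the type checker; the constant is defined inline here for self-containment (pandas-stubs provides its own in the test helpers), and the file name is illustrative:

```python
from typing import TYPE_CHECKING

import pandas as pd

# False at runtime, True for the type checker: code under this guard is
# type-checked but never executed.
TYPE_CHECKING_INVALID_USAGE = TYPE_CHECKING

if TYPE_CHECKING_INVALID_USAGE:
    # Should be flagged by the checker: the pyarrow engine only supports
    # lines=True, so this call must not match any overload.
    pd.read_json("data.json", lines=False, engine="pyarrow")  # type: ignore[call-overload]
```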

Successfully merging this pull request may close these issues:

Added new argument engine to read_json() to support parsing JSON with pyarrow by specifying engine="pyarrow"