
Recommended way to write to S3? #374

Open
codeananda opened this issue Mar 11, 2024 · 2 comments


codeananda commented Mar 11, 2024

I can easily read from S3 out of the box (assuming the required env variables are set).

But I cannot write to S3 out of the box.

This works:

import geopandas
from dotenv import load_dotenv

load_dotenv(".env")

a = geopandas.read_file("s3://bucket-name/key.gpkg", engine='pyogrio')

But this doesn't:

a.to_file("s3://bucket-name/written_by_geopandas.gpkg", engine='pyogrio')

Any ideas? Here is the full log and traceback from the failing write:

2024-03-11 15:11:06.305 | INFO     | __main__:_write_updated_titiles_to_disk:149 - Writing updated titles GeoDataFrame to disk.
2024-03-11 15:11:06.746 | ERROR    | writers:write:46 - 
        Could not write to file: s3://landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg
        gdf.columns=#columns here...
      dtype='object')
        gdf.head()=    POLY_ID                                           geometry  ...  column_here another_column_here
0  56352124  POLYGON ((348033.380 169232.193, 348033.380 16...  ...                                         NaN                                NaN
1  54913918  POLYGON ((360220.150 169892.100, 360216.954 16...  ...                                         NaN                                NaN
2  56739811  POLYGON ((361916.946 179819.353, 361912.807 17...  ...                                         NaN                                NaN
3  54956921  POLYGON ((359997.850 167736.400, 359998.050 16...  ...                                         NaN                                NaN
4  19424703  POLYGON ((355649.200 176617.900, 355651.400 17...  ...                                         NaN                                NaN

[5 rows x 54 columns]
2024-03-11 15:11:06.747 | ERROR    | writers:write:51 - sqlite3_open(/vsis3/landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg) failed: unable to open database file
Traceback (most recent call last):

  File "pyogrio/_io.pyx", line 1603, in pyogrio._io.ogr_create
    ogr_dataset = exc_wrap_pointer(GDALCreate(ogr_driver, path_c, 0, 0, 0, GDT_Unknown, options))
  File "pyogrio/_err.pyx", line 179, in pyogrio._err.exc_wrap_pointer
    raise exc

pyogrio._err.CPLE_OpenFailedError: sqlite3_open(/vsis3/landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg) failed: unable to open database file


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/home/ec2-user/big_bertha/compute_distances_and_intersections.py", line 699, in <module>
    output_gdf = intersector.compute_distances_and_intersections()
                 │           └ <function GridIntersector.compute_distances_and_intersections at 0x7f988cd51ab0><__main__.GridIntersector object at 0x7f988d7ec9a0>

  File "/home/ec2-user/big_bertha/compute_distances_and_intersections.py", line 116, in compute_distances_and_intersections
    output_file = self._write_updated_titiles_to_disk(titles_gdf)
                  │    │                              └         POLY_ID                                           geometry  ...  os_air_travel_interchange_pct_intersection  os_trans...
                  │    └ <function GridIntersector._write_updated_titiles_to_disk at 0x7f988cd51bd0><__main__.GridIntersector object at 0x7f988d7ec9a0>

  File "/home/ec2-user/big_bertha/compute_distances_and_intersections.py", line 154, in _write_updated_titiles_to_disk
    self._gdf_writer.write(titles_gdf, titles_output_file)
    │    │           │     │           └ S3Path('s3://landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg')
    │    │           │     └         POLY_ID                                           geometry  ...  os_air_travel_interchange_pct_intersection  os_trans...
    │    │           └ <function GDFWriter.write at 0x7f988cd516c0>
    │    └ <writers.GDFWriter object at 0x7f988d7ee5c0><__main__.GridIntersector object at 0x7f988d7ec9a0>

> File "/home/ec2-user/big_bertha/writers.py", line 44, in write
    gdf.to_file(output_path, mode=mode)
    │   │       │                 └ 'w'
    │   │       └ S3Path('s3://landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg')
    │   └ <function GeoDataFrame.to_file at 0x7f988ea9d630>POLY_ID                                           geometry  ...  os_air_travel_interchange_pct_intersection  os_trans...

  File "/home/ec2-user/.cache/pypoetry/virtualenvs/big-bertha-zeGGBcQi-py3.10/lib/python3.10/site-packages/geopandas/geodataframe.py", line 1246, in to_file
    _to_file(self, filename, driver, schema, index, **kwargs)
    │        │     │         │       │       │        └ {'mode': 'w'}
    │        │     │         │       │       └ None
    │        │     │         │       └ None
    │        │     │         └ None
    │        │     └ S3Path('s3://landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg')
    │        └         POLY_ID                                           geometry  ...  os_air_travel_interchange_pct_intersection  os_trans...
    └ <function _to_file at 0x7f988ea9f130>
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/big-bertha-zeGGBcQi-py3.10/lib/python3.10/site-packages/geopandas/io/file.py", line 635, in _to_file
    _to_file_pyogrio(df, filename, driver, schema, crs, mode, **kwargs)
    │                │   │         │       │       │    │       └ {}
    │                │   │         │       │       │    └ 'w'
    │                │   │         │       │       └ None
    │                │   │         │       └ None
    │                │   │         └ 'GPKG'
    │                │   └ S3Path('s3://landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg')
    │                └         POLY_ID                                           geometry  ...  os_air_travel_interchange_pct_intersection  os_trans...
    └ <function _to_file_pyogrio at 0x7f988ea9f250>
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/big-bertha-zeGGBcQi-py3.10/lib/python3.10/site-packages/geopandas/io/file.py", line 685, in _to_file_pyogrio
    pyogrio.write_dataframe(df, filename, driver=driver, **kwargs)
    │       │               │   │                │         └ {}
    │       │               │   │                └ 'GPKG'
    │       │               │   └ S3Path('s3://landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg')
    │       │               └         POLY_ID                                           geometry  ...  os_air_travel_interchange_pct_intersection  os_trans...
    │       └ <function write_dataframe at 0x7f988cd03760><module 'pyogrio' from '/home/ec2-user/.cache/pypoetry/virtualenvs/big-bertha-zeGGBcQi-py3.10/lib/python3.10/site-packages/py...
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/big-bertha-zeGGBcQi-py3.10/lib/python3.10/site-packages/pyogrio/geopandas.py", line 548, in write_dataframe
    write(
    └ <function write at 0x7f988cd035b0>
  File "/home/ec2-user/.cache/pypoetry/virtualenvs/big-bertha-zeGGBcQi-py3.10/lib/python3.10/site-packages/pyogrio/raw.py", line 530, in write
    ogr_write(
    └ <cyfunction ogr_write at 0x7f988cec69b0>
  File "pyogrio/_io.pyx", line 1799, in pyogrio._io.ogr_write
    ogr_dataset = ogr_create(path_c, driver_c, dataset_options)
  File "pyogrio/_io.pyx", line 1612, in pyogrio._io.ogr_create
    raise DataSourceError(str(exc))
          └ <class 'pyogrio.errors.DataSourceError'>

pyogrio.errors.DataSourceError: sqlite3_open(/vsis3/landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg) failed: unable to open database file
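The error message points at the cause: pyogrio turned the s3:// URL into the GDAL path /vsis3/landstack-big-bertha/grid_test_1/titles_with_distances_and_intersections_20240311_150903.gpkg, and the GeoPackage driver then failed in `sqlite3_open`. A GeoPackage is an SQLite database, which is written with random access, whereas GDAL's /vsis3/ handler documents write support only for sequential writing of new files. That would explain why `read_file` works while `to_file` fails. The sketch below is a hypothetical pre-flight check, not part of geopandas or pyogrio: `s3_url_to_vsis3` mirrors the rewrite visible in the error, and `needs_local_staging` flags SQLite-backed suffixes (the suffix set is an illustrative assumption, not an exhaustive list) so a caller can write those files locally and upload them afterwards.

```python
from urllib.parse import urlparse

# Hypothetical, non-exhaustive list: suffixes of formats GDAL writes as
# SQLite databases. These need random-access writes, which /vsis3/ does
# not provide (assumption based on GDAL's virtual file system docs).
SQLITE_BACKED_SUFFIXES = (".gpkg", ".sqlite")


def s3_url_to_vsis3(url: str) -> str:
    """Rewrite s3://bucket/key into GDAL's /vsis3/bucket/key form."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return f"/vsis3/{parsed.netloc}{parsed.path}"


def needs_local_staging(url: str) -> bool:
    """Return True if the target looks SQLite-backed, i.e. a direct
    /vsis3/ write is expected to fail and the file should be written
    locally first."""
    return urlparse(url).path.lower().endswith(SQLITE_BACKED_SUFFIXES)


target = "s3://bucket-name/written_by_geopandas.gpkg"
print(s3_url_to_vsis3(target))      # /vsis3/bucket-name/written_by_geopandas.gpkg
print(needs_local_staging(target))  # True
```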
@codeananda
Author

Update: this works. It writes the file locally first and then uploads it with boto3:

import boto3
from cloudpathlib import S3Path
from loguru import logger

output_path = S3Path("s3://bucket-name/key.gpkg")

if isinstance(output_path, S3Path):
    # Write the GeoDataFrame `a` (from the snippet above) to a local file
    # named after the last part of the S3 key, e.g. "key.gpkg".
    a.to_file(output_path.name, engine='pyogrio')
    logger.info(f"Written to disk locally {output_path.name}")

    # Upload the local file to the bucket under the original key.
    s3 = boto3.resource('s3')
    s3.Bucket(output_path.bucket).upload_file(output_path.name, output_path.key)
    logger.info("Uploaded to S3!")
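In the workaround above, cloudpathlib's `S3Path` is used only for its `.bucket`, `.key` and `.name` attributes. If you'd rather not add that dependency, the standard library's `urllib.parse.urlparse` can produce the same three pieces. `split_s3_url` is a hypothetical helper, not part of any library in this thread; the boto3 lines at the end are commented out because they need credentials and a real bucket.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str, str]:
    """Split "s3://bucket/dir/key.gpkg" into (bucket, key, filename).

    These match what the workaround reads from S3Path.bucket,
    S3Path.key and S3Path.name.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    key = parsed.path.lstrip("/")
    if not key:
        raise ValueError(f"no object key in {url!r}")
    filename = key.rsplit("/", 1)[-1]
    return parsed.netloc, key, filename


print(split_s3_url("s3://bucket-name/dir/key.gpkg"))
# ('bucket-name', 'dir/key.gpkg', 'key.gpkg')

# Usage with boto3 (needs AWS credentials; `a` is the GeoDataFrame):
# import boto3
# bucket, key, filename = split_s3_url("s3://bucket-name/key.gpkg")
# a.to_file(filename, engine="pyogrio")
# boto3.client("s3").upload_file(filename, bucket, key)
```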


lclous commented Aug 19, 2024

If using cloudpathlib, this has also been working for me:

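import tempfile
from pathlib import Path

from cloudpathlib import S3Path

output_path = S3Path("s3://bucket-name/key.gpkg")

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a temporary local file (`a` is the GeoDataFrame to write)
    tmp_path = Path(tmp_dir) / output_path.name
    a.to_file(tmp_path)

    # Upload to S3
    output_path.upload_from(tmp_path)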
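Both workarounds in this thread follow the same pattern: write to a local file, then upload it. A small context manager can package that pattern so the temporary directory is always removed and the upload is skipped if the local write raises. `staged_upload` is a hypothetical helper, a sketch rather than a library API; it accepts any upload callable, such as cloudpathlib's `S3Path.upload_from` as used above.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def staged_upload(filename: str, upload: Callable[[Path], None]) -> Iterator[Path]:
    """Yield a local temp path; upload it once the with-block exits cleanly.

    If the with-block raises, the exception propagates and `upload` is
    never called. The temporary directory is removed in both cases.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = Path(tmp_dir) / filename
        yield local_path
        # Only reached if the with-block did not raise.
        upload(local_path)


# Usage with cloudpathlib (needs AWS credentials; `a` is the GeoDataFrame):
# from cloudpathlib import S3Path
# output_path = S3Path("s3://bucket-name/key.gpkg")
# with staged_upload(output_path.name, output_path.upload_from) as tmp_path:
#     a.to_file(tmp_path, engine="pyogrio")
```

Passing the uploader in as a callable keeps the helper independent of boto3 and cloudpathlib, which also makes it easy to exercise with a fake uploader.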
