Use huggingface_hub cache #7105

Merged
merged 19 commits into from
Aug 21, 2024
.github/conda/meta.yaml (4 changes: 2 additions & 2 deletions)
@@ -24,7 +24,7 @@ requirements:
- dataclasses
- multiprocess
- fsspec
- - huggingface_hub >=0.21.2,<1.0.0
+ - huggingface_hub >=0.22.0,<1.0.0
- packaging
- aiohttp
run:
@@ -41,7 +41,7 @@ requirements:
- dataclasses
- multiprocess
- fsspec
- - huggingface_hub >=0.21.2,<1.0.0
+ - huggingface_hub >=0.22.0,<1.0.0
- packaging
- aiohttp

.github/workflows/ci.yml (2 changes: 1 addition & 1 deletion)
@@ -65,7 +65,7 @@ jobs:
run: uv pip install --system --upgrade pyarrow huggingface-hub dill
- name: Install dependencies (minimum versions)
if: ${{ matrix.deps_versions != 'deps-latest' }}
- run: uv pip install --system pyarrow==15.0.0 huggingface-hub==0.21.2 transformers dill==0.3.1.1
+ run: uv pip install --system pyarrow==15.0.0 huggingface-hub==0.22.0 transformers dill==0.3.1.1
- name: Test with pytest
run: |
python -m pytest -rfExX -m ${{ matrix.test }} -n 2 --dist loadfile -sv ./tests/
docs/source/audio_dataset.mdx (2 changes: 0 additions & 2 deletions)
@@ -523,8 +523,6 @@ The reason you need to use a combination of [`~DownloadManager.download`] and [`
```py
def _split_generators(self, dl_manager):
"""Returns SplitGenerators."""
- dl_manager.download_config.ignore_url_params = True
-
audio_path = dl_manager.download(_AUDIO_URL)
local_extracted_archive = dl_manager.extract(audio_path) if not dl_manager.is_streaming else None
path_to_clips = "librivox-indonesia"
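In `datasets`, `DownloadConfig.ignore_url_params` is documented as stripping all query parameters and fragments from a download URL before the URL is used for caching the file. The sketch below illustrates only that idea with `urllib.parse`; `cache_key_for_url` is a hypothetical helper, and keying the cache by a SHA-256 of the URL is an assumption, not how the library names cached files.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def cache_key_for_url(url: str, ignore_url_params: bool = False) -> str:
    """Hypothetical helper: derive a cache key from a download URL.

    When ignore_url_params is True, the query string and fragment are dropped
    first, so URLs that differ only in those parts share one cache key.
    """
    if ignore_url_params:
        parts = urlsplit(url)
        # Keep scheme, host and path; blank out the query and the fragment.
        url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Assumption: the cache key is simply a hash of the (possibly stripped) URL.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


signed_a = "https://example.com/audio.tar.gz?token=abc#part1"
signed_b = "https://example.com/audio.tar.gz?token=xyz"

# Without stripping, each URL variant gets its own cache entry.
print(cache_key_for_url(signed_a) == cache_key_for_url(signed_b))  # False
# With stripping, both collapse to the key of the bare URL.
print(cache_key_for_url(signed_a, True) == cache_key_for_url(signed_b, True))  # True
```

The trade-off is that stripping parameters makes differently parameterized URLs indistinguishable to the cache, which is only safe when those parameters do not change the file content.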
setup.py (2 changes: 1 addition & 1 deletion)
@@ -133,7 +133,7 @@
# for data streaming via http
"aiohttp",
# To get datasets from the Datasets Hub on huggingface.co
- "huggingface-hub>=0.21.2",
+ "huggingface-hub>=0.22.0",
# Utilities from PyPA to e.g., compare versions
"packaging",
# To parse YAML metadata from dataset cards
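Across the conda recipe, the minimum-versions CI job, and `setup.py`, the PR raises the `huggingface-hub` floor from 0.21.2 to 0.22.0. A runtime guard enforcing that floor could look like the minimal sketch below; `parse_release` and `meets_minimum` are hypothetical names, and they assume plain dotted release strings such as `0.22.0`. Real code should prefer `packaging.version.Version`, since `packaging` is already a dependency for comparing versions.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain release string like '0.22.0' into (0, 22, 0).

    Assumption: no pre-release, post-release, or local-version suffixes.
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "0.22.0") -> bool:
    """Hypothetical guard: True if `installed` >= `minimum`."""
    a, b = parse_release(installed), parse_release(minimum)
    # Pad the shorter tuple with zeros so '0.22' compares equal to '0.22.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


for candidate in ["0.21.2", "0.22.0", "0.24.5"]:
    print(candidate, meets_minimum(candidate))
# 0.21.2 False
# 0.22.0 True
# 0.24.5 True
```

This only checks the lower bound; the conda recipe's `<1.0.0` upper bound would need a second comparison.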
src/datasets/builder.py (9 changes: 0 additions & 9 deletions)
@@ -51,7 +51,6 @@
from .dataset_dict import DatasetDict, IterableDatasetDict
from .download.download_config import DownloadConfig
from .download.download_manager import DownloadManager, DownloadMode
- from .download.mock_download_manager import MockDownloadManager
from .download.streaming_download_manager import StreamingDownloadManager, xjoin
from .exceptions import DatasetGenerationCastError, DatasetGenerationError, FileFormatError, ManualDownloadError
from .features import Features
@@ -931,14 +930,6 @@ def download_and_prepare(
)

is_local = not is_remote_filesystem(self._fs)
-
- if (
- isinstance(dl_manager, MockDownloadManager)
- or not is_local
- or file_format != "arrow"
- or max_shard_size is not None
- ):
- try_from_hf_gcs = False
self.dl_manager = dl_manager

# Prevent parallel local disk operations
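For reference, the deleted block in `download_and_prepare` forced `try_from_hf_gcs` to `False` whenever any one of four conditions held. The sketch restates that logic as a standalone predicate; `removed_try_from_hf_gcs` is a hypothetical function written only to make the removed condition explicit, and the boolean `is_mock` stands in for the `isinstance(dl_manager, MockDownloadManager)` check.

```python
from typing import Optional, Union


def removed_try_from_hf_gcs(
    requested: bool,
    is_mock: bool,
    is_local: bool,
    file_format: str,
    max_shard_size: Optional[Union[int, str]],
) -> bool:
    """Restate the condition deleted from download_and_prepare.

    The requested value survived only if none of the disabling conditions applied.
    """
    disabled = (
        is_mock  # stand-in for isinstance(dl_manager, MockDownloadManager)
        or not is_local  # is_local = not is_remote_filesystem(self._fs)
        or file_format != "arrow"
        or max_shard_size is not None
    )
    return False if disabled else requested


print(removed_try_from_hf_gcs(True, False, True, "arrow", None))  # True
print(removed_try_from_hf_gcs(True, False, True, "parquet", None))  # False
```

After the PR, neither this block nor the `MockDownloadManager` import remains in `builder.py`; `is_local` is still computed for the local-disk checks that follow.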
src/datasets/commands/datasets_cli.py (2 changes: 0 additions & 2 deletions)
@@ -4,7 +4,6 @@
from datasets.commands.convert import ConvertCommand
from datasets.commands.convert_to_parquet import ConvertToParquetCommand
from datasets.commands.delete_from_hub import DeleteFromHubCommand
- from datasets.commands.dummy_data import DummyDataCommand
from datasets.commands.env import EnvironmentCommand
from datasets.commands.test import TestCommand
from datasets.utils.logging import set_verbosity_info
@@ -25,7 +24,6 @@ def main():
ConvertCommand.register_subcommand(commands_parser)
EnvironmentCommand.register_subcommand(commands_parser)
TestCommand.register_subcommand(commands_parser)
- DummyDataCommand.register_subcommand(commands_parser)
ConvertToParquetCommand.register_subcommand(commands_parser)
DeleteFromHubCommand.register_subcommand(commands_parser)

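Each CLI command is wired in through a `register_subcommand(commands_parser)` call, so dropping `DummyDataCommand` only takes removing its import and its registration line. The sketch shows that registration pattern with `argparse` subparsers; `HelloCommand`, its `hello` subcommand, the `set_defaults(func=...)` dispatch, and the assumption that `commands_parser` comes from `add_subparsers()` are illustrative, not copied from `datasets`.

```python
from argparse import ArgumentParser, Namespace


class HelloCommand:
    """Hypothetical command following the register_subcommand pattern."""

    def __init__(self, name: str):
        self._name = name

    @staticmethod
    def register_subcommand(commands_parser) -> None:
        # commands_parser is assumed to be the object returned by add_subparsers().
        hello_parser = commands_parser.add_parser("hello", help="Print a greeting.")
        hello_parser.add_argument("--name", default="world")
        # Attach a factory so main() can build the command from parsed args.
        hello_parser.set_defaults(func=lambda args: HelloCommand(args.name))

    def run(self) -> str:
        return f"hello, {self._name}"


def main(argv: list[str]) -> str:
    parser = ArgumentParser("example-cli", usage="example-cli <command> [<args>]")
    commands_parser = parser.add_subparsers(help="example-cli command helpers")
    # Registering a command is one call; unregistering it means deleting that call.
    HelloCommand.register_subcommand(commands_parser)

    args: Namespace = parser.parse_args(argv)
    if not hasattr(args, "func"):
        parser.print_help()
        raise SystemExit(1)
    return args.func(args).run()


print(main(["hello", "--name", "datasets"]))  # hello, datasets
```

Because each command owns its sub-parser setup, the entry point never needs to know a command's arguments; removing one leaves the others untouched.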