The instructions below describe how to generate the crops used for pre-training CroCo v2 from the following real-world datasets: ARKitScenes, MegaDepth, 3DStreetView and IndoorVL.
First, download the metadata and put it in ./data/:
mkdir -p data
cd data/
wget https://download.europe.naverlabs.com/ComputerVision/CroCo/data/crop_metadata.zip
unzip crop_metadata.zip
rm crop_metadata.zip
cd ..
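As an optional sanity check before moving on, the short Python sketch below lists which per-dataset crop lists are still missing. It assumes the archive unpacks to ./data/crop_metadata/<dataset>/crops_release.txt, which is the path the extraction command later in this document reads from. The function name missing_crop_lists is only illustrative and is not part of the repository.

```python
from pathlib import Path

# Dataset names as used in this document.
DATASETS = ["ARKitScenes", "MegaDepth", "3DStreetView", "IndoorVL"]


def missing_crop_lists(metadata_root="./data/crop_metadata"):
    """Return the expected crops_release.txt paths that do not exist yet."""
    root = Path(metadata_root)
    expected = [root / name / "crops_release.txt" for name in DATASETS]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_crop_lists()
    if missing:
        print("Missing crop lists:")
        for path in missing:
            print("  " + path)
    else:
        print("All crop lists found.")
```

An empty result means every dataset has its crop list in place.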
Second, download the original datasets in ./data/original_datasets/.
mkdir -p data/original_datasets
Download the raw dataset from https://github.com/apple/ARKitScenes/blob/main/DATA.md and put it in ./data/original_datasets/ARKitScenes/.
The resulting file structure should look like this:
./data/original_datasets/ARKitScenes/
└───Training
    └───40753679
    │ │ ultrawide
    │ │ ...
    └───40753686
    │
    ...
Download the MegaDepth v1 dataset from https://www.cs.cornell.edu/projects/megadepth/ and put it in ./data/original_datasets/MegaDepth/.
The resulting file structure should look like this:
./data/original_datasets/MegaDepth/
└───0000
│ └───images
│ │ │ 1000557903_87fa96b8a4_o.jpg
│ │ └ ...
│ └─── ...
└───0001
│ │
│ └ ...
└─── ...
Download the 3D_Street_View dataset from https://github.com/amir32002/3D_Street_View and put it in ./data/original_datasets/3DStreetView/.
The resulting file structure should look like this:
./data/original_datasets/3DStreetView/
└───dataset_aligned
│ └───0002
│ │ │ 0000002_0000001_0000002_0000001.jpg
│ │ └ ...
│ └─── ...
└───dataset_unaligned
│ └───0003
│ │ │ 0000003_0000001_0000002_0000001.jpg
│ │ └ ...
│ └─── ...
Download the IndoorVL datasets using Kapture.
pip install kapture
mkdir -p ./data/original_datasets/IndoorVL
cd ./data/original_datasets/IndoorVL
kapture_download_dataset.py update
kapture_download_dataset.py install "HyundaiDepartmentStore_*"
kapture_download_dataset.py install "GangnamStation_*"
cd -
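Before extracting crops, it can help to confirm that the datasets landed where the file structures above say they should. The following Python sketch checks only the directory names that appear in this document. IndoorVL and MegaDepth are checked at the top level only, since their inner folder names are not fixed here. The helper name missing_dataset_dirs is hypothetical.

```python
from pathlib import Path

# Sub-directories taken from the file structures shown in this document.
# An empty list means only the dataset's top-level folder is checked.
EXPECTED_SUBDIRS = {
    "ARKitScenes": ["Training"],
    "MegaDepth": [],
    "3DStreetView": ["dataset_aligned", "dataset_unaligned"],
    "IndoorVL": [],
}


def missing_dataset_dirs(root="./data/original_datasets"):
    """Return the expected dataset directories that are absent under root."""
    root = Path(root)
    missing = []
    for dataset, subdirs in EXPECTED_SUBDIRS.items():
        candidates = [root / dataset] + [root / dataset / s for s in subdirs]
        for path in candidates:
            if not path.is_dir():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    for path in missing_dataset_dirs():
        print("missing:", path)
```

This is a presence check only; it does not verify that the downloads are complete.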
Now, extract the crops for each dataset:
for dataset in ARKitScenes MegaDepth 3DStreetView IndoorVL;
do
python3 datasets/crops/extract_crops_from_images.py --crops ./data/crop_metadata/${dataset}/crops_release.txt --root-dir ./data/original_datasets/${dataset}/ --output-dir ./data/${dataset}_crops/ --imsize 256 --nthread 8 --max-subdir-levels 5 --ideal-number-pairs-in-dir 500;
done
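For those who prefer to drive the extraction from Python, the loop above can be sketched as follows. The script path and all flags are copied from the shell command. The wrapper functions build_extract_command and extract_all are illustrative names, not part of the repository. Unlike the shell loop, this sketch stops at the first dataset whose extraction fails, and it defaults to a dry run that only prints the commands.

```python
import subprocess

DATASETS = ["ARKitScenes", "MegaDepth", "3DStreetView", "IndoorVL"]


def build_extract_command(dataset, data_dir="./data"):
    """Build the argument list for one dataset, mirroring the shell command."""
    return [
        "python3", "datasets/crops/extract_crops_from_images.py",
        "--crops", f"{data_dir}/crop_metadata/{dataset}/crops_release.txt",
        "--root-dir", f"{data_dir}/original_datasets/{dataset}/",
        "--output-dir", f"{data_dir}/{dataset}_crops/",
        "--imsize", "256",
        "--nthread", "8",
        "--max-subdir-levels", "5",
        "--ideal-number-pairs-in-dir", "500",
    ]


def extract_all(datasets=DATASETS, dry_run=True):
    """Print (dry run) or execute the extraction command for each dataset."""
    for dataset in datasets:
        command = build_extract_command(dataset)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    extract_all(dry_run=True)
```

Calling extract_all(dry_run=False) from the repository root runs the actual extraction.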
Due to legal constraints, we can only release 144,228 pairs out of the 1,593,689 pairs used in the paper. To compensate for this in terms of the number of pre-training iterations, the pre-training command in this repository uses 125 training epochs, including 12 warm-up epochs, and a learning rate cosine schedule of 250, instead of 100, 10 and 200 respectively. The impact on performance is negligible.
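To make the two configurations easier to compare, here is a minimal sketch of a learning rate schedule with linear warm-up followed by a half-cycle cosine decay. The exact schedule shape is an assumption (it is a common choice in similar pre-training scripts, but this document does not spell it out), as is the reading of the cosine schedule length as a number of epochs. The function lr_factor is hypothetical.

```python
import math


def lr_factor(epoch, warmup_epochs, decay_epochs):
    """Learning rate as a fraction of its peak value (assumed schedule shape)."""
    if epoch < warmup_epochs:
        return epoch / warmup_epochs  # linear warm-up from 0 to 1
    progress = (epoch - warmup_epochs) / (decay_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # half-cycle cosine decay


# (training epochs, warm-up epochs, cosine schedule length) as stated in the text
SETTINGS = {
    "this repository": (125, 12, 250),
    "paper": (100, 10, 200),
}

for label, (epochs, warmup, decay) in SETTINGS.items():
    factor = lr_factor(epochs, warmup, decay)
    print(f"{label}: factor at last epoch = {factor:.3f}")
```

Under this assumed shape, both settings stop training slightly before the midpoint of the cosine curve, so the three values are scaled together and the schedule keeps the same relative profile.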