Unable to get kaggle_criteo_weekly.txt
#1
Comments
Hey, |
Should the sparse features be converted from the 32-bit hex IDs to contiguous indices? (similar to the |
I have forgotten what TorchRec needs. We convert the hex IDs to integers, assigning each unique ID its own integer. I am happy to share pre-processed data if it helps you. |
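The mapping described in the reply above (each unique hex ID gets its own integer) can be sketched as follows. This is only an illustration, not the maintainers' preprocessing script: the helper names `build_vocab` and `encode` are hypothetical, the sample hex strings are made up, and the thread does not say whether IDs are numbered per column or across all columns, so the sketch assumes a single column numbered in first-seen order.

```python
from typing import Dict, Iterable, List


def build_vocab(hex_ids: Iterable[str]) -> Dict[str, int]:
    """Give each distinct hex ID the next unused integer, in first-seen order."""
    vocab: Dict[str, int] = {}
    for hex_id in hex_ids:
        if hex_id not in vocab:
            vocab[hex_id] = len(vocab)  # indices stay contiguous: 0, 1, 2, ...
    return vocab


def encode(hex_ids: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Replace every hex ID with its integer index from the vocabulary."""
    return [vocab[hex_id] for hex_id in hex_ids]


# Made-up sample values for one sparse feature column (assumption).
column = ["68fd1e64", "05db9164", "68fd1e64", "8cf07265"]
vocab = build_vocab(column)
encoded = encode(column, vocab)
print(encoded)  # [0, 1, 0, 2]
```

Because the indices are contiguous, `len(vocab)` is also the number of rows an embedding table for that column would need.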
Sure. That would be great! It would also help if you could share the script you use to create the JSONL from the npz or the raw dataset. |
Okay, share your email and I can send you a link to download the data. |
My email is '[email protected]'. Thanks. |
Shared the data file. Replace the csv processed file with the folder I have shared with you. |
Got it. I will have a look. Thank you for your help. |
Hi. I read your paper and find your ideas interesting. Thank you for opening your source code.
However, when I try to run the Oracle Cacher, I cannot find any indication of how to get the kaggle_criteo_weekly.txt file that is required by --processed-csv. Can you please give me some instructions on how to generate that file from the Criteo Kaggle or Terabyte datasets?
Also, I saw that your CSVLoader uses orjson to parse every line, so I am confused about whether it is actually a CSV file or a JSONL file. (I have both CSV and npz versions of the Terabyte dataset, but neither of them seems to work for that argument.)