At first, I used the DumpDataAll mode of dump_bin to import data, and it worked fine.
Part of the imported data is as follows:
df[df['instrument']=='SH600306']
Out[35]:
instrument datetime $volume $factor $close
41691 SH600306 2024-04-23 1022018.0 0.281253 0.686257
41692 SH600306 2024-04-24 1372334.0 0.281253 0.652507
41693 SH600306 2024-04-25 951008.0 0.281253 0.618756
41694 SH600306 2024-04-26 1968818.0 0.281253 0.587818
41695 SH600306 2024-04-29 1532764.0 0.281253 0.559693
But when I append new data with DumpDataUpdate, the result is wrong: the new values end up stored under the wrong dates.
The original data after 2024-04-29 is as follows:
dfraw.loc[(dfraw['date']>'2024-04-29'),['instrument','date','close']]
Out[54]:
instrument date close
4356 SH600306 2024-05-29 0.098438
4357 SH600306 2024-05-30 0.092813
4358 SH600306 2024-05-31 0.101251
4359 SH600306 2024-06-03 0.092813
4360 SH600306 2024-06-04 0.095626
4361 SH600306 2024-06-05 0.092813
4362 SH600306 2024-06-06 0.092813
4363 SH600306 2024-06-07 0.095626
4364 SH600306 2024-06-11 0.090001
4365 SH600306 2024-06-12 0.090001
4366 SH600306 2024-06-13 0.087188
4367 SH600306 2024-06-14 0.081563
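The raw data jumps from 2024-04-29 straight to 2024-05-29, so SH600306 has no rows on the trading days in between. The sketch below lists those missing days before an update is run. It assumes `calendar` is a list of trading days (for example, read from the calendar file of the qlib data directory) and that the raw frame has `instrument` and `date` columns like `dfraw`. `missing_trading_days` is a hypothetical helper, not part of dump_bin.py, and the five-day calendar is a hand-picked subset of dates from the tables in this report.

```python
import pandas as pd


def missing_trading_days(raw: pd.DataFrame, calendar, instrument: str, last_stored: str) -> list:
    """Hypothetical helper: trading days after `last_stored` (up to the instrument's
    latest new row) on which `raw` has no row for `instrument`."""
    cal = pd.DatetimeIndex(pd.to_datetime(calendar))
    dates = pd.to_datetime(raw.loc[raw["instrument"] == instrument, "date"])
    new_dates = pd.DatetimeIndex(dates[dates > pd.Timestamp(last_stored)])
    if new_dates.empty:
        return []
    # Trading days in the update window, according to the global calendar.
    window = cal[(cal > pd.Timestamp(last_stored)) & (cal <= new_dates.max())]
    # Days the calendar has but the instrument does not (e.g. suspension days).
    return list(window.difference(new_dates))


# Toy data modeled on the tables in this report.
calendar = ["2024-04-29", "2024-04-30", "2024-05-06", "2024-05-29", "2024-05-30"]
raw = pd.DataFrame({
    "instrument": ["SH600306", "SH600306"],
    "date": ["2024-05-29", "2024-05-30"],
    "close": [0.098438, 0.092813],
})
gaps = missing_trading_days(raw, calendar, "SH600306", last_stored="2024-04-29")
print(gaps)
```

A non-empty result means the update has to account for those days somehow, or the appended values will not line up with the calendar.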
I debugged dump_bin.py to find the problem. Stepping through it, I reached the code below, which may be the cause:
```python
def _data_to_bin(self, df: pd.DataFrame, calendar_list: List[pd.Timestamp], features_dir: Path):
    if df.empty:
        logger.warning(f"{features_dir.name} data is None or empty")
        return
    if not calendar_list:
        logger.warning("calendar_list is empty")
        return
    # align index
    _df = self.data_merge_calendar(df, calendar_list)
    if _df.empty:
        logger.warning(f"{features_dir.name} data is not in calendars")
        return
    # ... (rest of the method not shown)
```
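When aligning the index, calendar_list does not contain dates such as 2024-05-06, on which SH600306 has no data. As far as I can tell, no empty rows are created for those days, so the new values are written immediately after 2024-04-29.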
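To see why the missing dates matter, here is a toy model of appending to a feature file. It assumes, as I understand dump_bin's format, that a feature .bin file stores the calendar index of its first date followed by one little-endian float32 per trading day, and that an update only appends values to the end. `write_bin`, `append_bin`, `read_bin` and the six-day `CALENDAR` are hypothetical stand-ins, not qlib code.

```python
import numpy as np
import pandas as pd

# Hypothetical trading calendar: a few trading days from this report, for illustration only.
CALENDAR = pd.DatetimeIndex(pd.to_datetime([
    "2024-04-26", "2024-04-29", "2024-04-30", "2024-05-06", "2024-05-29", "2024-05-30",
]))


def write_bin(start: pd.Timestamp, values) -> np.ndarray:
    """Toy layout: [calendar index of first value, value_0, value_1, ...] as float32."""
    start_idx = CALENDAR.get_loc(start)
    return np.hstack([[start_idx], values]).astype("<f")


def append_bin(bin_arr: np.ndarray, values) -> np.ndarray:
    """Appending only adds values; no dates are stored, so position implies date."""
    return np.hstack([bin_arr, np.asarray(values, dtype="<f")])


def read_bin(bin_arr: np.ndarray) -> pd.Series:
    """Map each stored value back to a date by its position in the calendar."""
    start_idx = int(bin_arr[0])
    values = bin_arr[1:]
    return pd.Series(values, index=CALENDAR[start_idx:start_idx + len(values)])


# Initial full dump: closes for 2024-04-26 and 2024-04-29.
old = write_bin(pd.Timestamp("2024-04-26"), [0.587818, 0.559693])
# New raw rows: nothing between 2024-04-29 and 2024-05-29.
new_raw = pd.Series([0.098438, 0.092813], index=pd.to_datetime(["2024-05-29", "2024-05-30"]))

# (a) Align only to the stock's own new dates: no rows for the gap days.
own_dates = new_raw.index
bad = append_bin(old, new_raw.reindex(own_dates).to_numpy())

# (b) Align to the global calendar after the last stored date: gap days become NaN.
last_stored = pd.Timestamp("2024-04-29")
global_dates = CALENDAR[(CALENDAR > last_stored) & (CALENDAR <= new_raw.index.max())]
good = append_bin(old, new_raw.reindex(global_dates).to_numpy())

bad_series = read_bin(bad)
good_series = read_bin(good)
print(bad_series)
print(good_series)
```

In case (a) the 2024-05-29 close is read back on 2024-04-30, the same kind of shift visible in the imported data for SH600306. In case (b) the suspension days are padded with NaN, so positions stay aligned with the calendar. This is only one way to keep the file consistent under the stated assumptions, not necessarily how dump_bin should be fixed.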
My guess is that your data is not normalized, which causes this issue. I downloaded the data with the command:
python scripts/get_data.py qlib_data --target_dir <user data dir> --region cn
confirmed that SH600306 exists in this data, and then ran an incremental update on it with:
python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir <user data dir> --end_date <end date>
The problem you describe did not occur. I recommend using this method for incremental updates.
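Some of the data imported by DumpDataUpdate is shown below. The close values from 2024-05-29 onward appear starting at 2024-04-30 (for example, 0.098438 is stored on 2024-04-30 instead of 2024-05-29):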
dfnew[dfnew.instrument=='SH600306']
Out[8]:
instrument datetime $volume $factor $close
10288 SH600306 2024-04-22 363992.0 0.281253 0.722820
10289 SH600306 2024-04-23 1022018.0 0.281253 0.686257
10290 SH600306 2024-04-24 1372334.0 0.281253 0.652507
10291 SH600306 2024-04-25 951008.0 0.281253 0.618756
10292 SH600306 2024-04-26 1968818.0 0.281253 0.587818
10293 SH600306 2024-04-29 1532764.0 0.281253 0.559693
10294 SH600306 2024-04-30 188390272.0 0.281253 0.098438
10295 SH600306 2024-05-06 117053368.0 0.281253 0.092813
10296 SH600306 2024-05-07 99965448.0 0.281253 0.101251
10297 SH600306 2024-05-08 85975896.0 0.281253 0.092813
10298 SH600306 2024-05-09 46003664.0 0.281253 0.095626
10299 SH600306 2024-05-10 61825620.0 0.281253 0.092813
10300 SH600306 2024-05-13 26138518.0 0.281253 0.092813
10301 SH600306 2024-05-14 19884768.0 0.281253 0.095626
10302 SH600306 2024-05-15 24197052.0 0.281253 0.090001
10303 SH600306 2024-05-16 12483558.0 0.281253 0.090001
10304 SH600306 2024-05-17 9390678.0 0.281253 0.087188
10305 SH600306 2024-05-20 27141916.0 0.281253 0.081563
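A misaligned update like this can be caught by joining the imported values back to the raw data on instrument and date. The sketch below assumes the column names from the tables in this report (`instrument`, `datetime`, `$close` for the imported frame; `instrument`, `date`, `close` for the raw frame) and that, as in those tables, the raw `close` holds the same values that end up in `$close`. `find_misaligned` and its `since` and `atol` parameters are hypothetical.

```python
import pandas as pd


def find_misaligned(imported: pd.DataFrame, raw: pd.DataFrame, since: str,
                    field: str = "close", atol: float = 1e-6) -> pd.DataFrame:
    """Hypothetical check: imported rows after `since` whose `$<field>` does not
    match the raw `<field>` for the same instrument and date."""
    left = imported[["instrument", "datetime", f"${field}"]].copy()
    left.columns = ["instrument", "date", "imported"]
    left["date"] = pd.to_datetime(left["date"])
    left = left[left["date"] > pd.Timestamp(since)]

    right = raw[["instrument", "date", field]].copy()
    right.columns = ["instrument", "date", "raw"]
    right["date"] = pd.to_datetime(right["date"])

    merged = left.merge(right, on=["instrument", "date"], how="left")
    # No raw row for that date, or a differing value, points to a misaligned write.
    mismatch = merged["raw"].isna() | ((merged["imported"] - merged["raw"]).abs() > atol)
    return merged[mismatch].reset_index(drop=True)


# Toy data modeled on the tables in this report.
imported = pd.DataFrame({
    "instrument": ["SH600306"] * 3,
    "datetime": pd.to_datetime(["2024-04-29", "2024-04-30", "2024-05-06"]),
    "$close": [0.559693, 0.098438, 0.092813],
})
raw = pd.DataFrame({
    "instrument": ["SH600306"] * 2,
    "date": ["2024-05-29", "2024-05-30"],
    "close": [0.098438, 0.092813],
})
bad_rows = find_misaligned(imported, raw, since="2024-04-29")
print(bad_rows)
```

The `since` cutoff keeps rows from the earlier full dump out of the comparison when the raw frame only covers the new dates.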