Stock splits values sometimes shifted by one day #142
Hi @micktg, when did you first notice this? Are the close prices similarly misaligned, or is it just the split data?
I first noticed this not long ago, a few weeks at most. I read in the changelog for v2.3.0 that the history method has been refactored, so my guess is that this might have something to do with that. The weird thing is that I could never provoke that behavior (wrong history data or wrong split dates) when I test it, and my code seems to collect the correct data for various other stock symbols as well.
Could certainly be related to v2.3. I've had a look through the changes, although nothing's immediately jumping out at me. A few more questions:
Ideally, could you offer a concrete minimal example of how price requests for the same symbol made at different times have returned differently indexed data? Thanks for raising this.
If I run the code (see above) right now, I get the following result:
"2022-09-23": 67.95999908447266,
"2022-09-26": 66.30000305175781, <-- Monday (Sunday missing)
"2022-09-27": 67.16999816894531,
"2022-09-28": 68.36000061035156,
"2022-09-29": 64.13999938964844,
"2022-09-30": 63.36000061035156,
"2022-10-03": 66.11000061035156,
This is my database entry from production:
"2022-09-23": 67.95999908447266,
"2022-09-25": 66.30000305175781, <-- Sunday
"2022-09-26": 66.30000305175781, <-- Monday
"2022-09-27": 67.16999816894531,
"2022-09-28": 68.36000061035156,
"2022-09-29": 64.13999938964844,
"2022-09-30": 63.36000061035156,
"2022-10-02": 66.11000061035156,
"2022-10-03": 66.11000061035156,
Somehow, in production there are entries for the weekend. This in itself is nothing bad, but it suggested to me that something behaves differently than before, either in the Yahoo history response or maybe in your history function.
So, it's NASDAQ, NYSE, and even a German stock exchange. This does not seem to be behavior specific to one exchange.
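To pin down exactly which keys differ between the two outputs quoted above, the two mappings can be diffed offline. The sketch below hard-codes the close prices from this comment; the helper name `extra_dates` is hypothetical. It checks that both extra entries fall on a Sunday and carry the same close as the following Monday, which is consistent with a Monday bar being recorded one day early.

```python
from datetime import date, timedelta

# Close prices quoted above: a fresh run vs. the production database entry.
fresh = {
    "2022-09-23": 67.95999908447266,
    "2022-09-26": 66.30000305175781,
    "2022-09-27": 67.16999816894531,
    "2022-09-28": 68.36000061035156,
    "2022-09-29": 64.13999938964844,
    "2022-09-30": 63.36000061035156,
    "2022-10-03": 66.11000061035156,
}
# The stored data contains everything above plus two weekend entries.
stored = {
    **fresh,
    "2022-09-25": 66.30000305175781,
    "2022-10-02": 66.11000061035156,
}


def extra_dates(old, new):
    """Return keys present in `old` but missing from `new`, in date order (hypothetical helper)."""
    return sorted(set(old) - set(new))


for key in extra_dates(stored, fresh):
    day = date.fromisoformat(key)
    next_day = (day + timedelta(days=1)).isoformat()
    # Print the weekday and whether the close equals the next calendar day's close.
    print(key, day.strftime("%A"), stored[key] == stored.get(next_day))
```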
Here is an excerpt:
if (not model.delisted and (not last_update or is_new_day(last_update))):
    close_history, splits_temp = get_price_close_history(ticker, model.symbol)
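The excerpt calls an `is_new_day` helper that isn't shown. If such a helper compares dates in the machine's local timezone, its answer can depend on where the job runs. A minimal sketch, assuming `last_update` is stored as a timezone-aware `datetime`; the body is hypothetical, not the author's code. UTC is used only because it is a fixed reference; the exchange's own timezone would be an equally valid design choice.

```python
from datetime import datetime, timedelta, timezone


def is_new_day(last_update, now=None):
    """Return True if `now` falls on a later UTC calendar day than `last_update`.

    Hypothetical helper: both arguments must be timezone-aware datetimes, and
    `now` defaults to the current time. Comparing in UTC keeps the result
    independent of the machine's local timezone.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if last_update.tzinfo is None or now.tzinfo is None:
        raise ValueError("datetimes must be timezone-aware")
    return now.astimezone(timezone.utc).date() > last_update.astimezone(timezone.utc).date()


cet = timezone(timedelta(hours=1))
# 23:30 and 00:30 CET are 22:30 and 23:30 UTC on the same day, so not a new UTC day.
print(is_new_day(datetime(2023, 2, 10, 23, 30, tzinfo=cet),
                 now=datetime(2023, 2, 11, 0, 30, tzinfo=cet)))
```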
I think I've worked out what's happening. The data being returned by
How daily data is indexed was changed in v2.3 to fix some bugs and to provide more consistent and meaningful indexing. The 'date' level of the index always has dtype 'object'. Closed sessions are represented by a
Why am I saying all this? Your code is using
import yahooquery as yq
amd = yq.Ticker("AMD")
df = amd.history(interval="1d", start="2023-02-01", end="2023-02-04")
df
df.reset_index()
However, if there is a live index, then this is happening...
df = amd.history(interval="1d", start="2023-02-01", end="2023-02-07")
df
df.reset_index()
It appears that, on creating a column from the 'date' level, pandas is coercing the dtype from 'object' to a tz-aware dtype based on the live index, in this case
The issue is not with
Hope that helps.
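One way to sidestep that coercion is to normalize every value of the 'date' level yourself before building a column from it. A minimal sketch, assuming closed sessions arrive as `datetime.date` values and the live session as a timezone-aware timestamp; since `pd.Timestamp` subclasses `datetime.datetime`, the same check should apply to it. The helper name `to_session_date` is hypothetical. Calling `.date()` on an aware value keeps its calendar day in its own (exchange) timezone rather than the machine's.

```python
from datetime import date, datetime, timedelta, timezone


def to_session_date(value):
    """Map a 'date' index value to a plain datetime.date (hypothetical helper).

    datetime is a subclass of date, so it must be checked first. A tz-aware
    value is reduced to its calendar day in its own timezone, without
    converting it to UTC or to the local system time.
    """
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    raise TypeError(f"unexpected index value: {value!r}")


# Two closed sessions followed by a live session stamped in exchange time (UTC-5),
# mirroring the QM=F rows shown later in this thread.
index = [
    date(2023, 2, 8),
    date(2023, 2, 9),
    datetime(2023, 2, 10, 16, 59, 45, tzinfo=timezone(timedelta(hours=-5))),
]
print([to_session_date(v).isoformat() for v in index])
```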
First of all, thank you for the effort and the detailed explanation. What I have not quite understood yet is: how can I provoke a live index being displayed, so that I can debug my code? Furthermore, I will take your advice into account and do without
No worries.
Yup. Although, if it's just to play around, then use a symbol that trades 24 hours, for example a future or a currency.
Can you give me some more hints on how to get a live index during testing? Here are some examples:
QM=F
EURUSD=X
I don't get the live indices and cannot recreate the behavior with the shifted dates. I am definitely running on
I get a live interval for all of these (at the moment, at least).
>>> import yahooquery as yq
>>> yq.__version__
'2.3.0'
Are you processing the returned data? Can you include the full code that you ran to get those returns?
I just copied and pasted your snippet:
Somehow, it does not display the live indices.
That is odd. Also, that call should not return data for 2023-02-07. What versions of pandas and numpy are you using? If they aren't the latest versions, could you upgrade them and try again? I'm assuming the
Also, could you let me have your return from
I created a
# Create virtualenv with desired python version (here v3.11)
virtualenv -p python3.11 venv; source ./venv/bin/activate
# Install all requirements
./venv/bin/python3.11 -m pip install -r requirements.txt
# requirements.txt
# -----------------------------------------------------------------------------
# Financial provider
# -----------------------------------------------------------------------------
# Yahoo Finance API
# https://github.com/dpguthrie/yahooquery
yahooquery==2.3.0
# https://github.com/ranaroussi/yfinance
# yfinance==0.2.3
# -----------------------------------------------------------------------------
# Dev
# -----------------------------------------------------------------------------
# logs
loguru==0.6.0
# formatter
autopep8==2.0.1
# linting
pylint==2.15.9
Here is the result of
Package Version
------------------ ---------
astroid 2.13.2
autopep8 2.0.1
certifi 2022.12.7
charset-normalizer 2.1.1
dill 0.3.6
idna 3.4
isort 5.11.4
lazy-object-proxy 1.9.0
loguru 0.6.0
lxml 4.9.2
mccabe 0.7.0
numpy 1.24.1
pandas 1.5.2
pip 23.0
platformdirs 2.6.2
pycodestyle 2.10.0
pylint 2.15.9
python-dateutil 2.8.2
pytz 2022.7
requests 2.28.1
requests-futures 1.0.0
setuptools 65.6.3
six 1.16.0
tomlkit 0.11.6
tqdm 4.64.1
typing_extensions 4.4.0
urllib3 1.26.13
wheel 0.38.4
wrapt 1.14.1
yahooquery 2.3.0
These are the versions of the requested libraries. I did not install them directly; they were installed as dependencies of some other library:
numpy 1.24.1
pandas 1.5.2
No hacks on my side.
yq.__version__: 2.3.0
pd.__version__: 1.5.2
np.__version__: 1.24.1
pd.Timestamp.now(): 2023-02-10 21:37:39.073704
open high low close volume adjclose
symbol date
QM=F 2023-02-07 74.500000 77.599998 74.349998 77.150002 15376 77.150002
2023-02-08 77.525002 78.574997 77.074997 78.474998 12726 78.474998
2023-02-09 78.474998 78.849998 76.525002 78.050003 12726 78.050003
2023-02-11 77.599998 80.324997 77.449997 79.849998 13089 79.849998
I guess the last value should not be dated the 11th of February. It was still the 10th of February when I posted this.
I'm in the same timezone (at least according to the Spanish state), so I don't think it's to do with that. I've also run it with Python 3.11 and I'm getting back the live index as before. Could you run the following and post the printout?
import yahooquery as yq
import pandas as pd
symbol = "QM=F"
ticker = yq.Ticker(symbol)
start = yq.utils._convert_to_timestamp("2023-02-08")
end = yq.utils._convert_to_timestamp(None, start=False)
print(f"{start=}, {end=}")
params = {"period1": start, "period2": end, "interval": "1d"}
data = ticker._get_data("chart", params)
index = data[symbol]["timestamp"]
print(f"{index=}")
dti = pd.to_datetime(index, unit="s")
print(f"{dti=}")
Here is the result:
start=1675810800, end=1676065639
index=[1675746000, 1675832400, 1675918800, 1676064967]
dti=DatetimeIndex(['2023-02-07 05:00:00', '2023-02-08 05:00:00',
'2023-02-09 05:00:00', '2023-02-10 21:36:07'],
dtype='datetime64[ns]', freq=None)
These datetime values make sense to me. Here again is the output from:
import yahooquery as yq
import pandas as pd
import numpy as np
print("yq.__version__:", yq.__version__)
print("pd.__version__:", pd.__version__)
print("np.__version__:", np.__version__)
print("pd.Timestamp.now():", pd.Timestamp.now())
ticker = yq.Ticker("QM=F")
print(ticker.history(start="2023-02-08"))
Output:
yq.__version__: 2.3.0
pd.__version__: 1.5.2
np.__version__: 1.24.1
pd.Timestamp.now(): 2023-02-10 22:49:02.618941
open high low close volume adjclose
symbol date
QM=F 2023-02-07 74.500000 77.599998 74.349998 77.150002 15376 77.150002
2023-02-08 77.525002 78.574997 77.074997 78.474998 12726 78.474998
2023-02-09 78.474998 78.849998 76.525002 78.050003 12726 78.050003
2023-02-11 77.599998 80.324997 77.449997 79.824997 13296 79.824997
Just to confirm: there is some difference in the same environment.
At least part of the bug (possibly all of it) is in
I'll add a fix as another commit to the current PR #141. In the meantime, try changing the
def _convert_to_timestamp(date=None, start=True):
    if date is not None:
        return int(pd.Timestamp(date).timestamp())
    if start:
        return int(pd.Timestamp("1942-01-01").timestamp())
    return int(pd.Timestamp.now().timestamp())
Let me know how you get on!
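The effect of that change can be checked offline. The stdlib sketch below compares the old v2.3.0 logic (`time.mktime` on the parsed date string, as in the commented-out function quoted in this thread) with a UTC-based conversion via `calendar.timegm`, which, like a naive `pd.Timestamp(date).timestamp()`, reads the date as UTC midnight. It temporarily forces Central European Time with a POSIX TZ rule string, so no tz database is needed; `time.tzset()` exists on Unix only. The function names `old_start` and `utc_start` are hypothetical.

```python
import calendar
import os
import time


def old_start(date_str):
    """Old logic: interprets the date string as *local* midnight."""
    return int(time.mktime(time.strptime(date_str, "%Y-%m-%d")))


def utc_start(date_str):
    """UTC-based logic: interprets the date string as UTC midnight."""
    return calendar.timegm(time.strptime(date_str, "%Y-%m-%d"))


saved_tz = os.environ.get("TZ")
# Central European Time as a POSIX rule (UTC+1 in winter). Unix only.
os.environ["TZ"] = "CET-1CEST,M3.5.0,M10.5.0/3"
time.tzset()
try:
    old = old_start("2023-02-08")
    new = utc_start("2023-02-08")
finally:
    # Restore the previous timezone setting.
    if saved_tz is None:
        os.environ.pop("TZ", None)
    else:
        os.environ["TZ"] = saved_tz
    time.tzset()

print(old, new, new - old)
```

Under CET this gives 1675810800 and 1675814400, the same `start` values that appear in the two debug runs in this thread: with the old logic, `period1` points at 23:00 UTC on the previous day.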
I did change the function:
# def _convert_to_timestamp(date=None, start=True):
#     if date is None:
#         date = int((-858880800 * start) + (time.time() * (not start)))
#     elif isinstance(date, datetime.datetime):
#         date = int(time.mktime(date.timetuple()))
#     else:
#         date = int(time.mktime(time.strptime(str(date), "%Y-%m-%d")))
#     return date
def _convert_to_timestamp(date=None, start=True):
    print(f"TODO test new '_convert_to_timestamp' function")
    if date is not None:
        return int(pd.Timestamp(date).timestamp())
    if start:
        return int(pd.Timestamp("1942-01-01").timestamp())
    return int(pd.Timestamp.now().timestamp())
and ran the following code:
import yahooquery as yq
import pandas as pd
import numpy as np
symbol = "QM=F"
ticker = yq.Ticker(symbol)
start = yq.utils._convert_to_timestamp("2023-02-08")
end = yq.utils._convert_to_timestamp(None, start=False)
print(f"{start=}, {end=}")
params = {"period1": start, "period2": end, "interval": "1d"}
data = ticker._get_data("chart", params)
index = data[symbol]["timestamp"]
print(f"{index=}")
dti = pd.to_datetime(index, unit="s")
print(f"{dti=}")
print("yq.__version__:", yq.__version__)
print("pd.__version__:", pd.__version__)
print("np.__version__:", np.__version__)
print("pd.Timestamp.now():", pd.Timestamp.now())
ticker = yq.Ticker("QM=F")
print(ticker.history(start="2023-02-08"))
Result:
TODO test new '_convert_to_timestamp' function
TODO test new '_convert_to_timestamp' function
start=1675814400, end=1676074563
index=[1675832400, 1675918800, 1676066385]
dti=DatetimeIndex(['2023-02-08 05:00:00', '2023-02-09 05:00:00',
'2023-02-10 21:59:45'],
dtype='datetime64[ns]', freq=None)
yq.__version__: 2.3.0
pd.__version__: 1.5.2
np.__version__: 1.24.1
pd.Timestamp.now(): 2023-02-11 00:16:03.289537
TODO test new '_convert_to_timestamp' function
TODO test new '_convert_to_timestamp' function
open high low close volume adjclose
symbol date
QM=F 2023-02-08 77.525002 78.574997 77.074997 78.474998 12726 78.474998
2023-02-09 78.474998 78.849998 76.525002 78.050003 12726 78.050003
2023-02-11 77.599998 80.324997 77.449997 79.824997 13408 79.824997
I found a Python snippet to simulate another timezone:
import os, time
print(time.strftime('%X %x %Z'))
os.environ['TZ'] = 'Europe/London'
time.tzset()
print(time.strftime('%X %x %Z'))
When I ran the code again, I also got the live indices:
00:26:44 02/11/23 CET
23:26:44 02/10/23 GMT
symbol date
QM=F 2023-02-08 77.525002 78.574997 77.074997 78.474998 12726 78.474998
2023-02-09 78.474998 78.849998 76.525002 78.050003 12726 78.050003
2023-02-10 16:59:45-05:00 77.599998 80.324997 77.449997 79.824997 13408 79.824997
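For repeatable tests, the snippet above can be wrapped in a context manager that restores the original setting afterwards. A minimal sketch; the name `system_timezone` is hypothetical and `time.tzset()` is Unix-only. It is shown with a POSIX TZ rule for Central European Time so it works without a tz database; a zone name such as 'Europe/London' or 'Europe/Berlin' also works where one is installed.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def system_timezone(tz):
    """Temporarily set the process timezone via the TZ environment variable."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()
    try:
        yield
    finally:
        # Put back whatever was there before (including "unset").
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()


# 2023-02-08 00:00:00 UTC, the start value from the debug run with the new conversion.
ts = 1675814400
with system_timezone("CET-1CEST,M3.5.0,M10.5.0/3"):
    local = time.localtime(ts)
print(time.strftime("%Y-%m-%d %H:%M", local), local.tm_gmtoff)
```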
👍 The data coming back from Yahoo is now as required, so the bug seems to be between line 1279 of ticker.py (the call to _historical_data_to_dataframe in ticker.history) and the end of ticker.history. As you've shown, it appears to be to do with the timezone. I'll try to have another look over the weekend.
Great! Let me know if I can provide any help. My guess is that you should be able to reproduce the behavior I get by setting the timezone to mine, like this:
import os, time
os.environ['TZ'] = 'Europe/Berlin'
time.tzset()
I think I've found it... yahooquery/yahooquery/utils/__init__.py Lines 125 to 126 in 81b3e09
It's a bug in the v2.3 implementation. I wrongly assumed that
I'll add another commit to #141 to fix it. With that, I'm hopeful that all the queries raised in this issue, and the bugs brought to light by it, are now resolved. Let me know otherwise. To get it working in the meantime, replace the two lines above in your local install with the following single line:
Cheers for raising it.
Fixes a v2.3 bug due to wrongly assuming that `pd.Timestamp.fromtimestamp` converts based on UTC by default (actually converts based on system time).
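The behaviour described in that commit message can be reproduced with the standard library alone: the commit describes `pd.Timestamp.fromtimestamp` as converting based on system time, and `datetime.datetime.fromtimestamp` does the same when no `tz` is passed. A minimal sketch under Central European Time (POSIX TZ rule, Unix-only `time.tzset()`), using the old `start` value from this thread, 23:00 UTC on 7 February:

```python
import os
import time
from datetime import datetime, timezone

saved_tz = os.environ.get("TZ")
os.environ["TZ"] = "CET-1CEST,M3.5.0,M10.5.0/3"  # UTC+1 in February
time.tzset()
try:
    ts = 1675810800  # 2023-02-07 23:00:00 UTC
    local_day = datetime.fromtimestamp(ts).date()  # naive result in system local time
    utc_day = datetime.fromtimestamp(ts, tz=timezone.utc).date()  # explicit UTC
finally:
    # Restore the previous timezone setting.
    if saved_tz is None:
        os.environ.pop("TZ", None)
    else:
        os.environ["TZ"] = saved_tz
    time.tzset()

print(local_day, utc_day)
```

The same instant lands on two different calendar days, which is the kind of one-day shift discussed in this issue; passing an explicit `tz` makes the result independent of the machine running the code.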
Awesome job! I am eager to see if that solves my problem.
Thanks for the fix.
@maread99 @dpguthrie symbol date ... EDIT: All other tickers I'm tracking are on the ASX, which opens at 10:00 and closes at 16:00 AEDT, and for these the history function returns the correct data. From a purely speculative observation, it looks as if the closing time on 24hr tickers is marked as 00:00:00 (which is actually 24hrs prior).
Describe the bug
I am using v2.3.0
I have a script that loads historical data, such as prices and stock splits, once a day.
It has already happened several times that the split data are shifted by one day and then added to my database incorrectly. If I load, e.g., the split data for AMZN, the (correct) result looks like this:
However, it happens (irregularly) that the data are loaded incorrectly, and then the result looks like this:
In my database, the old (correct) values are merged with the new ones, which then looks like this:
The code to determine the splits is as follows:
Typically, this works very well. I could never reproduce the incorrect loading during testing. Strangely, it only happens in production: at some point, wrong split data is suddenly added to the database.
I am not sure whether I have a bug in my code, the historical data from Yahoo is sometimes incorrect, or there is a bug in yahooquery. I am grateful for any advice.
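One way to check a merged database for this pattern is to look for the same split ratio recorded on two consecutive calendar days. A minimal sketch; the function name, the storage schema (ISO date string to ratio) and the sample values are all hypothetical, not the AMZN data from this issue.

```python
from datetime import date, timedelta


def shifted_duplicates(splits):
    """Return (earlier, later) pairs of dates one calendar day apart with equal ratios.

    Hypothetical helper: `splits` maps ISO date strings ("YYYY-MM-DD") to split
    ratios. One entry of each returned pair is likely a copy shifted by a day.
    """
    pairs = []
    for day in sorted(date.fromisoformat(d) for d in splits):
        earlier = day.isoformat()
        later = (day + timedelta(days=1)).isoformat()
        if later in splits and splits[later] == splits[earlier]:
            pairs.append((earlier, later))
    return pairs


# Hypothetical merged data: one split stored twice, a day apart, plus an unrelated split.
merged = {"2020-01-05": 2.0, "2020-01-06": 2.0, "2021-03-01": 4.0}
print(shifted_duplicates(merged))
```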