Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stock splits values sometimes shifted by one day #142

Closed
mickdewald opened this issue Feb 4, 2023 · 23 comments
Closed

Stock splits values sometimes shifted by one day #142

mickdewald opened this issue Feb 4, 2023 · 23 comments

Comments

@mickdewald
Copy link

mickdewald commented Feb 4, 2023

Describe the bug
I am using v2.3.0
I have a script that loads the historical data like prices and also stock splits once a day.

It has already happened several times that these split data are shifted by one day and then added to my database accordingly wrong. If I load e.g.: the split data from AMZN, the (correct) result is like this

{
	"1998-06-02" : 2,
	"1999-01-05" : 3,
	"1999-09-02" : 2,
	"2022-06-06" : 20,
}

However, it happens (irregularly) that the data are loaded incorrectly, and then the result is like this

{
	"1998-06-01" : Double("2"),
	"1999-01-04" : Double("3"),
	"1999-09-01" : Double("2"),
	"2022-06-05" : Double("20")
}

In my database, the old (correct) values are merged with the new ones, which then looks accordingly like this

{
	"1998-06-02" : 2,
	"1999-01-05" : 3,
	"1999-09-02" : 2,
	"2022-06-06" : 20,
	"1998-06-01" : Double("2"),
	"1999-01-04" : Double("3"),
	"1999-09-01" : Double("2"),
	"2022-06-05" : Double("20")
}

The code to determine the splits is as follows:

period, interval = 'max', '1d' # Get complete history

# TODO
# Since v2.3.0 a dataframe will always be returned regardless of bad symbols existing
df: DataFrame | dict = ticker.history(period=period, interval=interval)
assert isinstance(df, DataFrame)
# print(f "df: {df}") # pandas data is easier to read if printed (not logged)

# clean symbol
symbol = symbol.replace("~", "") # e.g. GIG~ -> GIG

# Check if there is data
if symbol in df and 'No data found' in df[symbol]:
    return {}, {}  # {'AABA': 'No data found, symbol may be delisted'}

# Apply 'reset_index' to be able to use df['date']
df = df.reset_index()
# print(f "df.reset_index()")
# print(f "df: {df}")

# convert into json object
# {datetime.date(2021, 12, 15): 179.3000030517578, ...}
# {timestamp('2022-04-19 09:30:00'): 95.7300033569336, ...}
price_closing_tmp = Series(df.close.values, index=df.date).to_dict()
# print(f "price_closing_tmp: {price_closing_tmp}")

# convert closing prices in format (MongoDB only allows strings as keys)
# price_closing = df['close'].tolist()
# When using "interval='1h'", there are lots of data points per day, but this is not a problem
# because only the latest data point per day will be stored in our list
# Previous values for a day will be overwritten with "price_closing[date] = value"
for key, value in price_closing_tmp.items():
    date = key.strftime('%Y-%m-%d') # type:ignore - 2022-03-10 as string
    price_closing[date] = value

# DEBUG
# print_json_dict(price_closing)

# check if the dataframe does contain splits
if hasattr(df, 'splits'):
    splits_tmp = Series(df.splits.values, index=df.date).to_dict()
    # print(f "splits_tmp: {splits_tmp}")

    for key, value in splits_tmp.items():
        date = key.strftime('%Y-%m-%d') # type:ignore - 2022-03-10 as string
        if value == 0.0 or value is None or isnan(value):
            continue # ignore None, NaN or 0.0 values for split
        splits[date] = value

    # DEBUG
    print_json_dict(splits)

Typically, this works very well. I could never provoke it during testing that the data was loaded incorrectly. It happens strangely only in production that at some point wrong split data is suddenly added to the database.

I am not sure now if I have a bug in the code, the historical data from Yahoo is sometimes not correct, or maybe there is a bug in Yahooquery. I am grateful for any advice.

@maread99
Copy link
Contributor

maread99 commented Feb 5, 2023

Hi @micktg, when did you first notice this?

Are the close prices similarly misaligned or is it just the split data?

@mickdewald
Copy link
Author

I first noticed this not so long ago. Maybe a few weeks tops. I read in the changelogs for v2.3.0, that the history method has been refactored, so my guess is, that this might have something to do with that.

The weird thing is, that I could never provoke that behavior (wrong history data respectively wrong split dates) if test it and my code seems to collect the correct data for various other stock symbols as well.

@maread99
Copy link
Contributor

maread99 commented Feb 6, 2023

Could certainly be related to v2.3. I've had a look through the changes although nothing's immediately jumping out at me.

Few more questions:

  1. Are the close prices similarly misaligned or is it just the split data?
  2. On which exchange(s) do the symbols trade? (at least those for which the shifting is occurring).
  3. Do you update the database at the same time every day? If not, could it be that the shifting is happening only when you're updating at a certain time of the day?

The ideal would be if you could offer a concrete minimal example of how price requests for the same symbol at different times has returned differently indexed data.

Thanks for raising this.

@mickdewald
Copy link
Author

mickdewald commented Feb 6, 2023

  1. Now that you mention it, yes, there is also something going on with the prices
    Example AMD in dev

If I run the code (see above) right now, I get the following result

  "2022-09-23": 67.95999908447266,
  "2022-09-26": 66.30000305175781,                        <-- Monday (Sunday missing)
  "2022-09-27": 67.16999816894531,
  "2022-09-28": 68.36000061035156,
  "2022-09-29": 64.13999938964844,
  "2022-09-30": 63.36000061035156,
  "2022-10-03": 66.11000061035156,

This is my databse entry from prd
Example AMD in prd

  "2022-09-23": 67.95999908447266,
  "2022-09-25": 66.30000305175781,                        <-- Sunday
  "2022-09-26": 66.30000305175781,                        <-- Monday
  "2022-09-27": 67.16999816894531,
  "2022-09-28": 68.36000061035156,
  "2022-09-29": 64.13999938964844,
  "2022-09-30": 63.36000061035156,
  "2022-10-02": 66.11000061035156,
  "2022-10-03": 66.11000061035156,

Somehow in prd there are entries for the weekend. This is in and of itself nothing bad, but this indicated to me, that there might be so different behavior either in the yahoo history response or maybe in your history function than it was before.

  1. I noticed the weird split behavior for the following symbols
  • AAPL
  • ADS.DE
  • BRK-B
  • V
  • TSLA
  • SHOP
  • NVDA
  • NKE
  • NFLX
  • MSFT
  • MCD
  • MA
  • MA
  • GOOGL
  • AMZN
  • AMD

So, its NASDAQ, NYSE, and even a German stock exchange. So, this does not seem to be a behavior specific to an exchange.

  1. I don't update the database at the same time. My algorithm checks if the last update is older than today and checks once a day for new history data. This means it could happen at any time a user opens my app.

Here is an excerpt

if (not model.delisted and (not last_update or is_new_day(last_update))):
    close_history, splits_temp = get_price_close_history(ticker, model.symbol)

@maread99
Copy link
Contributor

maread99 commented Feb 6, 2023

I think I've worked out what's happening.

The data being returned by Ticker.history is correct and is being indexed as intended.

How daily data is indexed was changed in v2.3 to fix some bugs and provide for more consistent and meaningful indexing. The 'date' level of the index always has dtype 'object'. Closed sessions are represented by a datatime.date object. If the current session is open, or hasn't long closed, then Yahoo timestamps this with the time of the last trade. I refer to this as a 'live indice'. It's important to be able to distinguish a live indice from a closed session in order to not unwittingly assume that the 'close' is the session close rather than the live price that it actually represents. For this reason a live indice is represented with a datetime.datetime object that gives the time of the last trade and makes clear the timezone of that time. (For more info on the live indices see this comment on the issue fixed in v2.3).

Why am I saying all this? Your code is using DataFrame.index_reset. If there's no live indice then it does the following (which is how it will always have worked prior to v2.3):

import yahooquery as yq
amd = yq.Ticker("AMD")
df = amd.history(interval="1d", start="2023-02-01", end="2023-02-04")
df
open high low close volume adjclose
symbol date
AMD 2023-02-01 78.470001 85.480003 77.879997 84.639999 152548200 84.639999
2023-02-02 84.290001 88.940002 83.199997 88.309998 97762100 88.309998
2023-02-03 86.660004 88.800003 85.830002 86.089996 66525400 86.089996
df.reset_index()
symbol date open high low close volume adjclose
0 AMD 2023-02-01 78.470001 85.480003 77.879997 84.639999 152548200 84.639999
1 AMD 2023-02-02 84.290001 88.940002 83.199997 88.309998 97762100 88.309998
2 AMD 2023-02-03 86.660004 88.800003 85.830002 86.089996 66525400 86.089996

However, if there is a live indice then this is happening...

df = amd.history(interval="1d", start="2023-02-01", end="2023-02-07")
df
open high low close volume adjclose
symbol date
AMD 2023-02-01 78.470001 85.480003 77.879997 84.639999 152548200 84.639999
2023-02-02 84.290001 88.940002 83.199997 88.309998 97762100 88.309998
2023-02-03 86.660004 88.800003 85.830002 86.089996 66525400 86.089996
2023-02-06 12:15:01-05:00 84.629997 86.278999 83.820000 84.174004 29580744 84.174004
df.reset_index()
symbol date open high low close volume adjclose
0 AMD 2023-01-31 19:00:00-05:00 78.470001 85.480003 77.879997 84.639999 152548200 84.639999
1 AMD 2023-02-01 19:00:00-05:00 84.290001 88.940002 83.199997 88.309998 97762100 88.309998
2 AMD 2023-02-02 19:00:00-05:00 86.660004 88.800003 85.830002 86.089996 66525400 86.089996
3 AMD 2023-02-06 12:15:01-05:00 84.629997 86.278999 83.820000 84.174004 29580744 84.174004

It appears that on creating a column from the 'date' level pandas is coercing the dtype from 'object' to a tz-aware dtype based on the live indice, in this case datetime64[ns, America/New_York]. The consequence is that all the date objects are converted to tz-aware values. Your key.strftime('%Y-%m-%d') code is then wrongly assuming the day as the day prior to the day that the indice refers to. (Although this wouldn't explain misalignment of stocks listed on a German exchange, in which case the times would move forward rather than back).

The issue is not with yahooquery but with letting pandas coerce the dtype. Instead, if the dataframe has data for only a single symbol then just drop the 'symbol' level with df.droplevel(0) and then access the remaining index directly. Alternatively the date level of the index can be directly accessed with df.index.levels[1].

Hope that helps.

@mickdewald
Copy link
Author

mickdewald commented Feb 7, 2023

Foremost, thank you for the effort and the detailed explanation.

What I have not quite understood yet is: How can I provoke that a live index is displayed to debug my code?
Do I just have to wait for the exchanges to open?

Furthermore, I will take your advice into account and do without df.reset_index() and test the other method. Calling df.reset_index() has always felt like a hack to me, only at the time it was the only way I found to achieve what I want.

@maread99
Copy link
Contributor

maread99 commented Feb 7, 2023

No worries.

What I have not quite understood yet is: How can I provoke that a live index is displayed to debug my code?
Do I just have to wait for the exchanges to open?

Yup. Although if it's to play around then just use a symbol that trades 24 hours, for example a future or a currency.

@mickdewald
Copy link
Author

Can you give me some more hints how to get a live index during test?
It just does not work for me:

Here some example:
^GDAXI

df:                            open          high           low         close      volume      adjclose
symbol date
^GDAXI 1987-12-30   1005.190002   1005.190002   1005.190002   1005.190002         0.0   1005.190002
       1988-01-04    956.489990    956.489990    956.489990    956.489990         0.0    956.489990
       1988-01-05    996.099976    996.099976    996.099976    996.099976         0.0    996.099976
       1988-01-06   1006.010010   1006.010010   1006.010010   1006.010010         0.0   1006.010010
       1988-01-07   1014.469971   1014.469971   1014.469971   1014.469971         0.0   1014.469971
...                         ...           ...           ...           ...         ...           ...
       2023-02-06  15367.040039  15406.929688  15275.570312  15345.910156  54430400.0  15345.910156
       2023-02-07  15359.589844  15363.410156  15273.599609  15320.879883  56184100.0  15320.879883
       2023-02-08  15450.669922  15486.129883  15371.929688  15412.049805  61344800.0  15412.049805
       2023-02-09  15560.150391  15658.559570  15519.629883  15523.419922  66247500.0  15523.419922
       2023-02-11  15438.940430  15487.980469  15246.389648  15350.530273         0.0  15350.530273

QM=F

symbol date
QM=F   2002-06-17  26.299999  26.750000  26.250000  26.340000   1917.0  26.340000
       2002-06-18  26.450001  26.450001  25.625000  25.650000      0.0  25.650000
       2002-06-19  25.500000  26.174999  25.450001  25.525000   1939.0  25.525000
       2002-06-20  25.750000  25.975000  25.375000  25.924999   2404.0  25.924999
       2002-06-21  25.750000  26.424999  25.225000  25.875000   1804.0  25.875000
...                      ...        ...        ...        ...      ...        ...
       2023-02-06  73.324997  74.525002  72.250000  74.099998  13629.0  74.099998
       2023-02-07  74.500000  77.599998  74.349998  77.150002  15376.0  77.150002
       2023-02-08  77.525002  78.574997  77.074997  78.474998  12726.0  78.474998
       2023-02-09  78.474998  78.849998  76.525002  78.050003  12726.0  78.050003
       2023-02-10  77.599998  80.324997  77.449997  78.599998   8966.0  78.599998

EURUSD=X

symbol   date
EURUSD=X 2003-12-01  1.203398  1.204007  1.194401  1.196501     0.0  1.196501
         2003-12-02  1.196101  1.210903  1.194600  1.208897     0.0  1.208897
         2003-12-03  1.209000  1.213003  1.207700  1.212298     0.0  1.212298
         2003-12-04  1.212004  1.214403  1.204398  1.208094     0.0  1.208094
         2003-12-05  1.207802  1.219096  1.206593  1.218695     0.0  1.218695
...                       ...       ...       ...       ...     ...       ...
         2023-02-06  1.079086  1.080030  1.071834  1.079086     0.0  1.079086
         2023-02-07  1.073076  1.074600  1.066951  1.073076     0.0  1.073076
         2023-02-08  1.072973  1.076021  1.071685  1.072973     0.0  1.072973
         2023-02-09  1.071582  1.079040  1.071157  1.071582     0.0  1.071582
         2023-02-11  1.074345  1.075384  1.068490  1.070320     0.0  1.070320

I don't get the live indices and cannot recreate the behavior with the shifted dates

I am definitely running on yahooquery==2.3.0

@maread99
Copy link
Contributor

? I get a live interval for all these (at the moment at least).

>>> import yahooquery as yq
>>> yq.__version__
'2.3.0'
ticker = yq.Ticker("QM=F")
ticker.history(start="2023-02-08")
open high low close volume adjclose
symbol date
QM=F 2023-02-08 77.525002 78.574997 77.074997 78.474998 12726 78.474998
2023-02-09 78.474998 78.849998 76.525002 78.050003 12726 78.050003
2023-02-10 13:18:31-05:00 77.599998 80.324997 77.449997 79.349998 12263 79.349998

Are you processing the returned data? Can you include the full code that you ran to get those returns.

@mickdewald
Copy link
Author

I just copied and pasted your snippet:

import yahooquery as yq
print("yq.__version__", yq.__version__)
ticker = yq.Ticker("QM=F")
print(ticker.history(start="2023-02-08"))
yq.__version__ 2.3.0
                        open       high        low      close  volume   adjclose
symbol date                                                                     
QM=F   2023-02-07  74.500000  77.599998  74.349998  77.150002   15376  77.150002
       2023-02-08  77.525002  78.574997  77.074997  78.474998   12726  78.474998
       2023-02-09  78.474998  78.849998  76.525002  78.050003   12726  78.050003
       2023-02-10  77.599998  80.324997  77.449997  79.625000   12381  79.625000

Somehow, it does not display the live indices

@maread99
Copy link
Contributor

That is odd. Also, that call should not return data for 2023-02-07.

What versions of pandas and numpy are you using? If they aren't the latest versions then could you upgrade them and try again.

I'm assuming the yahooquery library installation is clean, i.e. no local hacks?

Also, could you let me have your return from pd.Timestamp.now().

@mickdewald
Copy link
Author

mickdewald commented Feb 10, 2023

I created a venv just like in my dev environment like so

# Create virtualenv with desired python version (here v3.11)
virtualenv -p python3.11 venv; source ./venv/bin/activate

# Install all requirements
./venv/bin/python3.11 -m pip install -r requirements.txt
# requirements.txt
# -----------------------------------------------------------------------------
# Financial provider
# -----------------------------------------------------------------------------
# Yahoo Finance API
# https://github.com/dpguthrie/yahooquery
yahooquery==2.3.0

# https://github.com/ranaroussi/yfinance
# yfinance==0.2.3

# -----------------------------------------------------------------------------
# Dev
# -----------------------------------------------------------------------------
# logs
loguru==0.6.0

# formatter
autopep8==2.0.1

# linting
pylint==2.15.9

Here is the result of venv/bin/python -m pip list

Package            Version
------------------ ---------
astroid            2.13.2
autopep8           2.0.1
certifi            2022.12.7
charset-normalizer 2.1.1
dill               0.3.6
idna               3.4
isort              5.11.4
lazy-object-proxy  1.9.0
loguru             0.6.0
lxml               4.9.2
mccabe             0.7.0
numpy              1.24.1
pandas             1.5.2
pip                23.0
platformdirs       2.6.2
pycodestyle        2.10.0
pylint             2.15.9
python-dateutil    2.8.2
pytz               2022.7
requests           2.28.1
requests-futures   1.0.0
setuptools         65.6.3
six                1.16.0
tomlkit            0.11.6
tqdm               4.64.1
typing_extensions  4.4.0
urllib3            1.26.13
wheel              0.38.4
wrapt              1.14.1
yahooquery         2.3.0

These are the versions of the asked libs. I did not install them directly. They are installed as dependencies from some other lib

numpy              1.24.1
pandas             1.5.2

No hacks from my side.

yq.__version__: 2.3.0
pd.__version__: 1.5.2
np.__version__: 1.24.1

pd.Timestamp.now(): 2023-02-10 21:37:39.073704

                        open       high        low      close  volume   adjclose
symbol date                                                                     
QM=F   2023-02-07  74.500000  77.599998  74.349998  77.150002   15376  77.150002
       2023-02-08  77.525002  78.574997  77.074997  78.474998   12726  78.474998
       2023-02-09  78.474998  78.849998  76.525002  78.050003   12726  78.050003
       2023-02-11  77.599998  80.324997  77.449997  79.849998   13089  79.849998

I guess the last value should not be the 11th of February. It was 10th of February when I last posted here.
I am in Germany, by the way, Central European Time (CET), if that might help find what's going on.

@maread99
Copy link
Contributor

maread99 commented Feb 10, 2023

I'm in the same timezone (at least according to the Spanish state), so I don't think it's to do with that.

I've also run it with Python 3.11 and I'm getting back the live indice as before.

Could you run the following and post the print...

import yahooquery as yq
import pandas as pd
symbol = "QM=F"
ticker = yq.Ticker(symbol)
start = yq.utils._convert_to_timestamp("2023-02-08")
end = yq.utils._convert_to_timestamp(None, start=False)
print(f"{start=}, {end=}")

params = {"period1": start, "period2": end, "interval": "1d"}
data = ticker._get_data("chart", params)
index = data[symbol]["timestamp"]
print(f"{index=}")

dti = pd.to_datetime(index, unit="s")
print(f"{dti=}")

@mickdewald
Copy link
Author

mickdewald commented Feb 10, 2023

import yahooquery as yq
import pandas as pd
symbol = "QM=F"
ticker = yq.Ticker(symbol)
start = yq.utils._convert_to_timestamp("2023-02-08")
end = yq.utils._convert_to_timestamp(None, start=False)
print(f"{start=}, {end=}")

params = {"period1": start, "period2": end, "interval": "1d"}
data = ticker._get_data("chart", params)
index = data[symbol]["timestamp"]
print(f"{index=}")

dti = pd.to_datetime(index, unit="s")
print(f"{dti=}")

Here is the result:

start=1675810800, end=1676065639
index=[1675746000, 1675832400, 1675918800, 1676064967]
dti=DatetimeIndex(['2023-02-07 05:00:00', '2023-02-08 05:00:00',
               '2023-02-09 05:00:00', '2023-02-10 21:36:07'],
              dtype='datetime64[ns]', freq=None)

These date time values make sense to me

Here again the output from

import yahooquery as yq
import pandas as pd
import numpy as np

print("yq.__version__:", yq.__version__)
print("pd.__version__:", pd.__version__)
print("np.__version__:", np.__version__)

print("pd.Timestamp.now():", pd.Timestamp.now())

ticker = yq.Ticker("QM=F")
print(ticker.history(start="2023-02-08"))

Output

yq.__version__: 2.3.0
pd.__version__: 1.5.2
np.__version__: 1.24.1
pd.Timestamp.now(): 2023-02-10 22:49:02.618941
                        open       high        low      close  volume   adjclose
symbol date                                                                     
QM=F   2023-02-07  74.500000  77.599998  74.349998  77.150002   15376  77.150002
       2023-02-08  77.525002  78.574997  77.074997  78.474998   12726  78.474998
       2023-02-09  78.474998  78.849998  76.525002  78.050003   12726  78.050003
       2023-02-11  77.599998  80.324997  77.449997  79.824997   13296  79.824997

Just to confirm. that there is some difference in the same environment

@maread99
Copy link
Contributor

At least part of the bug (possibly all of it) is in yq.utils._convert_to_timestamp. It's been raised before (#49). The reason it's working for me does actually come back to the timezone (the system I'm working on is an hour behind you, i.e. GMT, which is making all the difference given the implementation).

I'll add a fix as another commit to the current PR #141.

In the meantime, try changing the yahooquery.utils._convert_to_timestamp function in your local install to:

def _convert_to_timestamp(date=None, start=True):
    if date is not None:
        return int(pd.Timestamp(date).timestamp())
    if start:
        return int(pd.Timestamp("1942-01-01").timestamp())
    return int(pd.Timestamp.now().timestamp())

Let me know how you get on!

@mickdewald
Copy link
Author

I did change the function

# def _convert_to_timestamp(date=None, start=True):
#     if date is None:
#         date = int((-858880800 * start) + (time.time() * (not start)))
#     elif isinstance(date, datetime.datetime):
#         date = int(time.mktime(date.timetuple()))
#     else:
#         date = int(time.mktime(time.strptime(str(date), "%Y-%m-%d")))
#     return date

def _convert_to_timestamp(date=None, start=True):
    print(f"TODO test new '_convert_to_timestamp' function")
    if date is not None:
        return int(pd.Timestamp(date).timestamp())
    if start:
        return int(pd.Timestamp("1942-01-01").timestamp())
    return int(pd.Timestamp.now().timestamp())

and ran the following code:

import yahooquery as yq
import pandas as pd
import numpy as np

symbol = "QM=F"
ticker = yq.Ticker(symbol)
start = yq.utils._convert_to_timestamp("2023-02-08")
end = yq.utils._convert_to_timestamp(None, start=False)
print(f"{start=}, {end=}")

params = {"period1": start, "period2": end, "interval": "1d"}
data = ticker._get_data("chart", params)
index = data[symbol]["timestamp"]
print(f"{index=}")

dti = pd.to_datetime(index, unit="s")
print(f"{dti=}")


print("yq.__version__:", yq.__version__)
print("pd.__version__:", pd.__version__)
print("np.__version__:", np.__version__)

print("pd.Timestamp.now():", pd.Timestamp.now())

ticker = yq.Ticker("QM=F")
print(ticker.history(start="2023-02-08"))

Result:

TODO test new '_convert_to_timestamp' function
TODO test new '_convert_to_timestamp' function
start=1675814400, end=1676074563
index=[1675832400, 1675918800, 1676066385]
dti=DatetimeIndex(['2023-02-08 05:00:00', '2023-02-09 05:00:00',
               '2023-02-10 21:59:45'],
              dtype='datetime64[ns]', freq=None)
yq.__version__: 2.3.0
pd.__version__: 1.5.2
np.__version__: 1.24.1
pd.Timestamp.now(): 2023-02-11 00:16:03.289537
TODO test new '_convert_to_timestamp' function
TODO test new '_convert_to_timestamp' function
                        open       high        low      close  volume   adjclose
symbol date                                                                     
QM=F   2023-02-08  77.525002  78.574997  77.074997  78.474998   12726  78.474998
       2023-02-09  78.474998  78.849998  76.525002  78.050003   12726  78.050003
       2023-02-11  77.599998  80.324997  77.449997  79.824997   13408  79.824997

@mickdewald
Copy link
Author

mickdewald commented Feb 10, 2023

I found a python snippet to simulate another timezone:

import os, time
print(time.strftime('%X %x %Z'))
os.environ['TZ'] = 'Europe/London'
time.tzset()
print(time.strftime('%X %x %Z'))

When I run the code again, I also got the live indices now:

00:26:44 02/11/23 CET
23:26:44 02/10/23 GMT

symbol date                                                                                    
QM=F   2023-02-08                 77.525002  78.574997  77.074997  78.474998   12726  78.474998
       2023-02-09                 78.474998  78.849998  76.525002  78.050003   12726  78.050003
       2023-02-10 16:59:45-05:00  77.599998  80.324997  77.449997  79.824997   13408  79.824997

@maread99
Copy link
Contributor

👍 The data coming back from yahoo is now as required, so the bug seems to be between line 1279 of ticker.py (the call to _historical_data_to_dataframe in ticker.history) and the end of ticker.history. As you've shown, it's appears to be to do with the timezone.

I'll try and have another look over the weekend.

@mickdewald
Copy link
Author

Great! Let me know if I can provide any help.

My guess is, that you should be able to simulate the behavior that I get via setting my timezone like

import os, time
os.environ['TZ'] = 'Europe/Berlin'
time.tzset()

@maread99
Copy link
Contributor

I think I've found it...

last_trade = pd.Timestamp.fromtimestamp(timestamp)
last_trade = last_trade.tz_localize("UTC")

It's a bug in the v2.3 implementation. I wrongly assumed that pd.Timestamp.fromtimestamp converted a timestamp to a pd.Timestamp based on UTC. By default it actually does the conversion based on the system timezone. As my system aligns with UTC I didn't realize the assumption was wrong. My bad (sorry!), I should have looked at the doc!

I'll add another commit to #141 to fix. With that, I'm hopeful that all queries raised in this issue, and bugs that have been brought to light by it are now resolved. Let me know otherwise.

To get it working in the meantime replace the two lines above in your local install with the following single line:
last_trade = pd.Timestamp.fromtimestamp(timestamp, tz="UTC")

Cheers for raising it.

maread99 added a commit to maread99/yahooquery that referenced this issue Feb 11, 2023
Fixes a v2.3 bug due to wrongly assuming that
 `pd.Timestamp.fromtimestamp` converts based on UTC by default
(actually converts based on system time).
@mickdewald
Copy link
Author

mickdewald commented Feb 11, 2023

Awesome job! I am eager to see if that solves my problem.

dpguthrie added a commit that referenced this issue Feb 14, 2023
Fix #49, #140, #142, #144, #145 and index for wk/mo intervals
@mickdewald
Copy link
Author

Thanks for the fix.
Can you foresee when there will be a new version of Yahoo Query?

@benhowell
Copy link

benhowell commented Feb 27, 2023

@maread99 @dpguthrie
I'm not sure this fixes the problem (or indeed manifests a new one).
Implementing the workaround has caused history to look 1 day (I assume whatever period you use) further back.
e.g.
t.history(start='2023-02-22', end=None, interval='1d')

symbol date ...
BTC-USD 2023-02-21 ...

EDIT:
It seems this error only occurs with tickers that daily close/open at midnight.
For me, I've confirmed this with both crypto (BTC-USD) and FX (AUDUSD=X).

All other tickers I'm tracking are on the ASX which opens at 10:00 and closes at 16:00 AEDT, and for these, the history function returns the correct data. From a purely speculative observation, it looks as if close times on 24hr tickers are marked as 00:00:00 for time of closing (which is actually 24hrs prior).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants