fin-ds (financial data sources) is a Python package for retrieving financial OHLCV data from a data source using a common interface and returning a standardized pandas DataFrame. Using the library is as simple as:
from fin_ds import DataSourceFactory

ds = DataSourceFactory("Tiingo")
df = ds.get_eod_data("AAPL")
To see the list of available data sources, call DataSourceFactory.data_sources:
['AlphaVantage', 'EODHD', 'NasdaqDataLink', 'Tiingo', 'YFinance']
Here are the column mappings from each data source into fin-ds:
fin-ds | AlphaVantage | EODHD | NasdaqDataLink | Tiingo | YFinance |
---|---|---|---|---|---|
date | date | date | date | date | Date |
ticker | - | symbol | ticker | - | - |
open | 1. open | open | open | open | Open |
high | 2. high | high | high | high | High |
low | 3. low | low | low | low | Low |
close | 4. close | close | close | close | Close |
volume | 6. volume | volume | volume | volume | Volume |
adj_open | - | - | adj_open | adjOpen | - |
adj_high | - | - | adj_high | adjHigh | - |
adj_low | - | - | adj_low | adjLow | - |
adj_close | 5. adjusted close | adj_close | adj_close | adjClose | Adj Close |
dividend | 7. dividend amount | - | dividend | divCash | - |
split | - | - | split | splitFactor | - |
- | - | interval | - | - | - |
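Each column of the table can be read as a rename table from one source's field names to the fin-ds names. The following is a minimal sketch of that idea using the Tiingo column; the names TIINGO_TO_FIN_DS and standardize_record are hypothetical and do not describe how fin-ds implements the mapping internally.

```python
# Hypothetical illustration; not fin-ds internals.
# Keys are Tiingo field names, values are fin-ds column names (taken from the table).
TIINGO_TO_FIN_DS = {
    "date": "date",
    "open": "open",
    "high": "high",
    "low": "low",
    "close": "close",
    "volume": "volume",
    "adjOpen": "adj_open",
    "adjHigh": "adj_high",
    "adjLow": "adj_low",
    "adjClose": "adj_close",
    "divCash": "dividend",
    "splitFactor": "split",
}


def standardize_record(record, mapping):
    """Rename the keys of one raw record, dropping fields that have no mapping."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Made-up sample values, for demonstration only.
raw = {"date": "2000-01-03", "close": 10.0, "adjClose": 9.5, "divCash": 0.0}
print(standardize_record(raw, TIINGO_TO_FIN_DS))
```

With a pandas DataFrame, the same dictionary could be passed to DataFrame.rename(columns=...) to rename the columns in one step.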
fin-ds is a Python package designed to simplify the process of fetching financial OHLCV (Open, High, Low, Close, Volume) and adjusted data from various data sources through a unified interface. It abstracts away the differences between data source APIs, returning data in a standardized pandas DataFrame format. This makes it an ideal tool for financial analysis, algorithmic trading strategy development, and data science projects focusing on financial markets.
The detailed requirements are spelled out in the requirements.txt file but at a high level, the following packages are required:
- pandas - Because pandas makes data analysis so much easier.
- python-decouple - Used to pull API keys from environment variables.
- Data source clients - Install the client package for each data source you plan to use. Clients for data sources you do not use are not required.
As with many other Python applications, it is recommended to install this package in a virtual environment.
$ pip install git+https://github.com/mattsmith321/fin-ds.git
In order for the references in the examples folder to work, the package must be installed within the application directory.
$ pip install -e .
Currently, there are no exposed configuration options.
fin-ds uses the YFinance library as the default data source. This means that a data source name does not have to be passed to the DataSourceFactory() constructor.
ds = DataSourceFactory()
df = ds.get_eod_data("AAPL")
fin-ds comes with several built-in data sources. To use one, instantiate the DataSourceFactory with the name of the data source you wish to use. Because every data source is mapped back to the same underlying structure, you can switch data sources without changing the code that consumes the output.
ds = DataSourceFactory("AlphaVantage")
df = ds.get_eod_data("AAPL")
The available built-in data sources include:
- AlphaVantage
- EODHD
- NasdaqDataLink
- Tiingo
- YFinance
You can list all available data sources using:
print(DataSourceFactory.get_data_source_names())
To extend the functionality with your own data sources, you can create custom data source classes and register them with the DataSourceFactory. Custom data source classes should subclass BaseDataSource and implement the required methods.
- Subclass BaseDataSource: Your custom data source class should inherit from BaseDataSource.
- Implement Required Methods: At a minimum, your class should implement the get_eod_data method.
from fin_ds.data_sources.base_data_source import BaseDataSource
import pandas as pd


class MyCustomDataSource(BaseDataSource):
    def get_eod_data(self, ticker, interval="daily"):
        # Custom logic to fetch data
        # For demonstration, return an empty DataFrame
        return pd.DataFrame()


# Register the custom data source with the factory
from fin_ds import DataSourceFactory

DataSourceFactory.register_data_source(MyCustomDataSource)
Once registered, you can instantiate your custom data source using the DataSourceFactory just like built-in sources:
# Create an instance of your custom data source
my_custom_ds = DataSourceFactory("MyCustomDataSource")
# Use it to fetch data
data = my_custom_ds.get_eod_data("AAPL")
- Naming: The name used to instantiate the data source via DataSourceFactory is derived from the class name, omitting the "DataSource" suffix if present. Ensure your class names are descriptive and unique.
- API Keys: If your data source requires an API key, make sure to include logic in your class to handle this securely. You might use environment variables or configuration files to manage API keys outside your codebase.
- Error Handling: Implement robust error handling within your custom data source methods to deal with API limitations, network issues, or data inconsistencies.
By following these guidelines, you can seamlessly integrate custom data sources into your application, enhancing its data retrieval capabilities.
Encountering issues is a normal part of working with any software package. This section provides guidance on resolving common problems you may face while using this package.
If you encounter an error indicating that a data source could not be found, consider the following steps:
- Check the Data Source Name: Ensure that the name you're using to instantiate the data source matches one of the available data sources. Remember, the name is case-sensitive.
- List Available Data Sources: Use DataSourceFactory.data_sources to list all registered data sources to verify if your desired data source is available.
- Custom Data Source Registration: If you're trying to use a custom data source, ensure it has been registered correctly with DataSourceFactory.register_data_source() before attempting to use it.
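Because names are case-sensitive, a common slip is passing "tiingo" instead of "Tiingo". The sketch below shows one way to recover the exact registered spelling from a loosely typed name. The helper resolve_source_name is hypothetical and not part of fin-ds, and the list is copied from this README; in practice it could come from DataSourceFactory.data_sources.

```python
# Names as listed in this README; the helper itself is hypothetical.
KNOWN_SOURCES = ["AlphaVantage", "EODHD", "NasdaqDataLink", "Tiingo", "YFinance"]


def resolve_source_name(name, known=KNOWN_SOURCES):
    """Return the registered spelling of a name, ignoring case, or None if unknown."""
    by_lower = {candidate.lower(): candidate for candidate in known}
    return by_lower.get(name.strip().lower())


print(resolve_source_name("tiingo"))  # Tiingo
print(resolve_source_name("quandl"))  # None
```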
These errors can occur when the package tries to dynamically load a data source module but fails. Possible reasons include:
- Incorrect Directory Structure: Verify that your custom data source files are placed in the correct directory if they follow the built-in data source convention.
- Incorrect Module Name: Ensure that the module file name matches the expected naming convention (lowercase version of the data source class name).
- Environment Issues: If you're using a virtual environment, ensure it's activated, and the package is installed within it.
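The environment check in the last item can be done from Python itself. This is a minimal diagnostic sketch using only the standard library; the helper names in_virtualenv and is_importable are hypothetical, and the virtual environment test assumes an environment created with venv (or a recent virtualenv), where sys.prefix differs from sys.base_prefix.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def is_importable(module_name):
    """True when Python can locate the module on the current import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ModuleNotFoundError, ValueError):
        # Raised when a parent package is missing or a module has no usable spec.
        return False


print("virtual environment active:", in_virtualenv())
print("fin_ds importable:", is_importable("fin_ds"))
```

If the second line prints False while the environment is active, the package is most likely installed in a different interpreter.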
If your data source requires an API key and you encounter authentication or access errors:
- Verify API Key: Check that the API key is correct and has the necessary permissions.
- Configuration Check: Ensure that the API key is properly configured, either through environment variables or configuration files as expected by your data source class.
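A quick way to rule out a missing key is to check the environment before creating the data source. fin-ds lists python-decouple for reading keys; the sketch below uses only os.environ from the standard library as a stand-in. Both the helper require_api_key and the variable name EXAMPLE_API_KEY are hypothetical, so substitute the variable your data source class actually reads.

```python
import os


def require_api_key(env_var):
    """Return the key stored in env_var, or raise if it is missing or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set or is empty")
    return key


try:
    require_api_key("EXAMPLE_API_KEY")  # hypothetical variable name
    print("API key found")
except RuntimeError as err:
    print(err)
```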
Problems fetching data can arise due to various reasons:
- Network Issues: Verify your internet connection.
- API Limitations: Some APIs have call rate limits. Ensure you're not exceeding these limits.
- Incorrect Parameters: Verify that the parameters passed to the data fetching methods (e.g., ticker symbols) are correct and supported by the data source.
- Logging: Increase the logging level to DEBUG to get more detailed output that might help identify the issue:

      import logging
      logging.basicConfig(level=logging.DEBUG)

- Interactive Python Shell: Experiment with your data sources in an interactive Python shell (e.g., ipython or python REPL) for quicker feedback and easier troubleshooting.
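For transient network failures and rate limits, retrying with an increasing delay is a common remedy. The sketch below is one possible approach and is not something fin-ds provides; fetch_with_retry is a hypothetical helper, and catching every Exception is a simplification that should be narrowed to the errors your data source client actually raises.

```python
import time


def fetch_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); after a failure wait base_delay * 2**attempt seconds, then retry."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Assuming ds is a data source created as shown earlier, a call could look like fetch_with_retry(lambda: ds.get_eod_data("AAPL")). The sleep parameter exists so the delay can be replaced in tests.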
If you've gone through these steps and still face issues, consider seeking further assistance by:
- Checking the Documentation: Review the package documentation for any additional troubleshooting tips or known issues.
- GitHub Issues: If you suspect a bug or have a feature request, use the GitHub Issues page for the project to search for existing issues or create a new one.
For a more detailed changelog, including the list of all changes for each version, please refer to the Releases page on GitHub.
- 0.3.0 - Added Nasdaq data sources using nasdaq-data-link package.
- 0.3.1 - Fixed how tickers with special characters are handled (at least for BRK-A).
- 1.0.0 - Made changes to class and module names to support dynamic loading.
- 1.1.0 - Added dynamic registration of custom data source classes.
- 2.0.0 - Added ability to backfill tickers to extend data. This required adding the ticker column so any previous cached data files need to be deleted.
- 3.0.0 - Significant refactoring to improve testability. Removed the list of backfill ticker associations and changed to allow submitting a backfill ticker. Removed merging of cached data with new data to avoid issues with splits and other adjustments.
Contributions are welcome! If you'd like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or fix.
- Commit your changes with clear, descriptive messages.
- Push your branch and submit a pull request.
Please ensure your code adheres to the Black code style, and include unit tests for new features or fixes. For more details, check out our CONTRIBUTING.md file.