streamsight.datasets.LastFMDataset
class streamsight.datasets.LastFMDataset(filename: str | None = None, base_path: str | None = None, use_default_filters=False)
Bases: Dataset
Last FM dataset.
The Last FM dataset contains user interactions with artists. The tags in this dataset are not used in this implementation. The file used is user_taggedartists-timestamps.dat, which contains the following columns: [user, artist, tags, timestamp].
The dataset is downloaded from the GroupLens website [CBK11].
__init__(filename: str | None = None, base_path: str | None = None, use_default_filters=False)
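A minimal usage sketch, built only from the signatures documented on this page. It assumes the streamsight package is installed and that DATASET_URL is reachable the first time it runs. The helper name `load_lastfm` is hypothetical and not part of the library.

```python
def load_lastfm(base_path: str = "data", force_download: bool = False):
    """Fetch (if needed) and load the Last FM interactions.

    `load_lastfm` is a hypothetical helper, not part of streamsight.
    """
    # Imported lazily so this module can be imported without streamsight installed.
    from streamsight.datasets import LastFMDataset

    dataset = LastFMDataset(base_path=base_path, use_default_filters=False)
    # Downloads only when the file is missing, unless force=True.
    dataset.fetch_dataset(force=force_download)
    # apply_filters=True also remaps user and item ids (see load()).
    return dataset.load(apply_filters=True)
```

Calling `load_lastfm()` should return an InteractionMatrix, per the documented return type of `load()`.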
Methods
__init__([filename, base_path, ...])
add_filter(filter) – Add a filter to be applied when loading the data.
fetch_dataset([force]) – Check whether the dataset is present; download it if not.
load([apply_filters]) – Loads data into an InteractionMatrix object.
Attributes
DATASET_URL – URL to fetch the dataset from.
DEFAULT_BASE_PATH – Default base path where the dataset will be stored.
DEFAULT_FILENAME – Default filename that will be used if it is not specified by the user.
ITEM_IX – Name of the column in the DataFrame that contains item identifiers.
REMOTE_FILENAME – Name of the file containing user interactions on the GroupLens server.
REMOTE_ZIPNAME – Name of the zip file on the GroupLens server.
TAG_IX – Name of the column in the DataFrame that contains the tag a user gave to the item.
TIMESTAMP_IX – Name of the column in the DataFrame that contains time of interaction in seconds since epoch.
USER_IX – Name of the column in the DataFrame that contains user identifiers.
file_path – File path of the dataset.
name – Name of the object's class.
DATASET_URL = 'https://files.grouplens.org/datasets/hetrec2011'
URL to fetch the dataset from.
DEFAULT_BASE_PATH = 'data'
Default base path where the dataset will be stored.
property DEFAULT_FILENAME: str
Default filename that will be used if it is not specified by the user.
ITEM_IX = 'artistID'
Name of the column in the DataFrame that contains item identifiers.
REMOTE_FILENAME = 'user_taggedartists-timestamps.dat'
Name of the file containing user interactions on the GroupLens server.
REMOTE_ZIPNAME = 'hetrec2011-lastfm-2k'
Name of the zip file on the GroupLens server.
TAG_IX = 'tagID'
Name of the column in the DataFrame that contains the tag a user gave to the item.
TIMESTAMP_IX = 'timestamp'
Name of the column in the DataFrame that contains time of interaction in seconds since epoch.
USER_IX = 'userID'
Name of the column in the DataFrame that contains user identifiers.
_abc_impl = <_abc._abc_data object>
_check_safe()
Check if the directory is safe. If the directory does not exist, create it.
_dataframe_to_matrix(df: DataFrame) → InteractionMatrix
Converts a DataFrame to an InteractionMatrix.
Parameters:
df (pd.DataFrame) – DataFrame to convert
Returns:
InteractionMatrix object
Return type:
InteractionMatrix
property _default_filters: List[Filter]
The default filters for all datasets.
Concrete classes can override this property to add more filters.
Returns:
List of filters to be applied to the dataset
Return type:
List[Filter]
_download_dataset()
Downloads the dataset.
Downloads the zip file and extracts the interaction file to self.file_path.
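The extraction step described above can be sketched with the standard library alone. This is an illustration, not streamsight's own code: the function name `extract_member` is hypothetical, and the location of the interaction file inside the archive is passed in rather than assumed.

```python
import os
import shutil
import zipfile


def extract_member(zip_path: str, member: str, target_path: str) -> str:
    """Copy a single file out of a zip archive to target_path.

    Hypothetical helper; `member` is the file's path inside the archive.
    """
    target_dir = os.path.dirname(target_path)
    if target_dir:
        # Create the destination directory when it does not exist yet.
        os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Stream only the requested member instead of unpacking everything.
        with archive.open(member) as source, open(target_path, "wb") as target:
            shutil.copyfileobj(source, target)
    return target_path
```

Extracting only the interaction file keeps the base path free of the other files shipped in the archive.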
_fetch_remote(url: str, filename: str) → str
Fetch data from a remote URL and save it locally.
Parameters:
url (str) – URL to fetch data from
filename (str) – Path to save the file to
Returns:
The filename where the data was saved
Return type:
str
_load_dataframe() → DataFrame
Load the raw dataset from file, and return it as a pandas DataFrame.
Transform the downloaded dataset to have integer user and item ids. This is needed for representation in the interaction matrix.
Returns:
The interaction data as a DataFrame with a row per interaction.
Return type:
pd.DataFrame
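As a rough illustration of reading the raw file into integer-typed rows, the sketch below uses the standard csv module as a stand-in for the pandas-based loader. It assumes the file is tab-separated with a header row naming the columns userID, artistID, tagID and timestamp; the sample rows are invented for the example, and `read_interactions` is a hypothetical name.

```python
import csv
import io
from typing import Dict, List, TextIO

# Invented sample in the assumed layout: tab-separated with a header row.
SAMPLE = (
    "userID\tartistID\ttagID\ttimestamp\n"
    "2\t52\t13\t1238536800\n"
    "2\t52\t15\t1238536800\n"
    "3\t63\t13\t1241128800\n"
)


def read_interactions(handle: TextIO) -> List[Dict[str, int]]:
    """Parse tab-separated rows and cast every field to int."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{column: int(value) for column, value in row.items()} for row in reader]


rows = read_interactions(io.StringIO(SAMPLE))
```

Each entry of `rows` is one interaction keyed by column name, which mirrors the "row per interaction" shape described above.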
add_filter(filter: Filter)
Add a filter to be applied when loading the data.
Uses the DataFramePreprocessor class to register filters for the dataset. The filter is applied when load() is called and the data is loaded into an InteractionMatrix object.
Parameters:
filter (Filter) – Filter to be applied to the loaded DataFrame when it is processed into an interaction matrix.
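The deferred behaviour described above (filters are registered first and applied only at load time) can be sketched as follows. Every name here is hypothetical: `FilterPipeline` stands in for the preprocessor and plain callables stand in for Filter objects, so this shows the pattern, not the library's classes.

```python
from collections import Counter
from typing import Callable, List, Tuple

Row = Tuple[int, int, int]  # (user, item, timestamp)
RowFilter = Callable[[List[Row]], List[Row]]


class FilterPipeline:
    """Hypothetical stand-in for a preprocessor that stores filters until load time."""

    def __init__(self) -> None:
        self._filters: List[RowFilter] = []

    def add_filter(self, row_filter: RowFilter) -> None:
        # Only registers the filter; nothing is applied yet.
        self._filters.append(row_filter)

    def process(self, rows: List[Row]) -> List[Row]:
        # Filters run in the order in which they were added.
        for row_filter in self._filters:
            rows = row_filter(rows)
        return rows


def min_interactions_per_user(minimum: int) -> RowFilter:
    """Build a filter that keeps users with at least `minimum` interactions."""

    def apply(rows: List[Row]) -> List[Row]:
        counts = Counter(user for user, _, _ in rows)
        return [row for row in rows if counts[row[0]] >= minimum]

    return apply
```

Because filters run in registration order, a filter added later sees only the rows that earlier filters kept.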
fetch_dataset(force=False) → None
Check whether the dataset is present; download it if not.
Parameters:
force (bool, optional) – If True, the dataset will be downloaded even if the file already exists. Defaults to False.
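The check-then-download logic, including the effect of force, amounts to a single condition. The sketch below is an assumption about the control flow rather than the library's code; `fetch_if_missing` and its `download` callback are hypothetical names.

```python
import os
from typing import Callable


def fetch_if_missing(file_path: str, download: Callable[[], None], force: bool = False) -> bool:
    """Call download() when file_path is absent or force is True.

    Returns True if a download was triggered, False otherwise.
    """
    if force or not os.path.exists(file_path):
        download()
        return True
    return False
```

With force left at False, repeated calls are cheap once the file exists, since only the existence check runs.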
property file_path: str
File path of the dataset.
load(apply_filters=True) → InteractionMatrix
Loads data into an InteractionMatrix object.
Data is loaded into a DataFrame using the _load_dataframe() function. The resulting DataFrame is parsed into an InteractionMatrix object. If apply_filters is set to True, the registered filters are applied to the dataset and the user and item ids are remapped. This is advised even if no filter is set, as it ensures that the user and item ids increase in order of time.
Parameters:
apply_filters (bool, optional) – Whether to apply the registered filters and preprocessing. Defaults to True.
Returns:
Resulting interaction matrix
Return type:
InteractionMatrix
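To make "ids increase in order of time" concrete, the sketch below assigns each user and item a new integer id the first time it appears in timestamp order. It is one plausible reading of the remapping, with ids assumed to start at 0; `remap_ids_by_time` is a hypothetical name and the library's actual procedure may differ.

```python
from typing import Dict, Hashable, List, Tuple

Interaction = Tuple[Hashable, Hashable, int]  # (user, item, timestamp)


def remap_ids_by_time(interactions: List[Interaction]) -> List[Tuple[int, int, int]]:
    """Assign consecutive integer ids in order of first appearance over time."""
    user_map: Dict[Hashable, int] = {}
    item_map: Dict[Hashable, int] = {}
    remapped = []
    # sorted() is stable, so interactions with equal timestamps keep input order.
    for user, item, timestamp in sorted(interactions, key=lambda row: row[2]):
        # setdefault returns the existing id, or stores the next free one.
        uid = user_map.setdefault(user, len(user_map))
        iid = item_map.setdefault(item, len(item_map))
        remapped.append((uid, iid, timestamp))
    return remapped


example = remap_ids_by_time([(42, 900, 30), (7, 901, 10), (42, 901, 20)])
# example == [(0, 0, 10), (1, 0, 20), (1, 1, 30)]
```

User 7 interacts first and therefore receives id 0, even though user 42 comes first in the input list.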
property name
Name of the object’s class.