
streamsight.datasets.LastFMDataset

class streamsight.datasets.LastFMDataset(filename: str | None = None, base_path: str | None = None, use_default_filters=False)

Bases: Dataset

Last FM dataset.

The Last FM dataset contains user interactions with artists. The tags in this dataset are not used in this implementation. The file used is user_taggedartists-timestamps.dat, which contains the following columns: [user, artist, tags, timestamp].

The dataset is downloaded from the GroupLens website [CBK11].

__init__(filename: str | None = None, base_path: str | None = None, use_default_filters=False)

Methods


__init__([filename, base_path, ...])

add_filter(filter)

Add a filter to be applied when loading the data.

fetch_dataset([force])

Check whether the dataset is present; download it if not.

load([apply_filters])

Loads data into an InteractionMatrix object.


Attributes


DATASET_URL

URL to fetch the dataset from.

DEFAULT_BASE_PATH

Default base path where the dataset will be stored.

DEFAULT_FILENAME

Default filename that will be used if it is not specified by the user.

ITEM_IX

Name of the column in the DataFrame that contains item identifiers.

REMOTE_FILENAME

Name of the file containing user interactions on the GroupLens server.

REMOTE_ZIPNAME

Name of the zip file on the GroupLens server.

TAG_IX

Name of the column in the DataFrame that contains the tag a user gave to the item.

TIMESTAMP_IX

Name of the column in the DataFrame that contains time of interaction in seconds since epoch.

USER_IX

Name of the column in the DataFrame that contains user identifiers.

file_path

File path of the dataset.

name

Name of the object's class.

DATASET_URL = 'https://files.grouplens.org/datasets/hetrec2011'

URL to fetch the dataset from.

DEFAULT_BASE_PATH = 'data'

Default base path where the dataset will be stored.

property DEFAULT_FILENAME: str

Default filename that will be used if it is not specified by the user.

ITEM_IX = 'artistID'

Name of the column in the DataFrame that contains item identifiers.

REMOTE_FILENAME = 'user_taggedartists-timestamps.dat'

Name of the file containing user interactions on the GroupLens server.

REMOTE_ZIPNAME = 'hetrec2011-lastfm-2k'

Name of the zip file on the GroupLens server.

TAG_IX = 'tagID'

Name of the column in the DataFrame that contains the tag a user gave to the item.

TIMESTAMP_IX = 'timestamp'

Name of the column in the DataFrame that contains time of interaction in seconds since epoch.

USER_IX = 'userID'

Name of the column in the DataFrame that contains user identifiers.

_abc_impl = <_abc._abc_data object>
_check_safe()

Check if the directory is safe. If the directory does not exist, create it.

_dataframe_to_matrix(df: DataFrame) → InteractionMatrix

Converts a DataFrame to an InteractionMatrix.

Parameters:

df (pd.DataFrame) – DataFrame to convert

Returns:

InteractionMatrix object

Return type:

InteractionMatrix

property _default_filters: List[Filter]

The default filters for all datasets


Concrete classes can override this property to add more filters.

Returns:

List of filters to be applied to the dataset

Return type:

List[Filter]

_download_dataset()

Downloads the dataset.


Downloads the zipfile, and extracts the interaction file to self.file_path
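
The extraction step described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the library's implementation: the helper name `extract_member`, the flat archive layout, and the sample file content are all hypothetical, and the archive is built in memory so that nothing is downloaded.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_member(zip_bytes: bytes, member: str, target: Path) -> Path:
    """Write one member of an in-memory zip archive to `target` (hypothetical helper)."""
    target.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        target.write_bytes(archive.read(member))
    return target


# Build a tiny stand-in archive (made-up content, not the real dataset).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("user_taggedartists-timestamps.dat", "userID\tartistID\n1\t10\n")

with tempfile.TemporaryDirectory() as tmp:
    saved = extract_member(
        buffer.getvalue(),
        "user_taggedartists-timestamps.dat",
        Path(tmp) / "data" / "lastfm.dat",
    )
    extracted_text = saved.read_text()
```

Only the single interaction file is written to disk; the rest of the archive is never unpacked.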

_fetch_remote(url: str, filename: str) → str

Fetch data from a remote URL and save it locally.

Parameters:

  • url (str) – URL to fetch data from
  • filename (str) – Path to save the file to

Returns:

The filename where data was saved

Return type:

str

_load_dataframe() → DataFrame

Load the raw dataset from file, and return it as a pandas DataFrame.

Transforms the downloaded dataset to have integer user and item ids. This is needed for representation in the interaction matrix.

Returns:

The interaction data as a DataFrame with a row per interaction.

Return type:

pd.DataFrame
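
As a rough illustration of reading the raw file into one record per interaction with integer fields, the sketch below uses the standard `csv` module and plain dictionaries instead of a pandas DataFrame. The tab separator, the header row, and the sample values are assumptions; the column names are taken from the USER_IX, ITEM_IX, TAG_IX and TIMESTAMP_IX attributes documented on this page, and `read_interactions` is a hypothetical helper.

```python
import csv
import io

# Made-up sample in the assumed layout: tab-separated with a header row.
SAMPLE = (
    "userID\tartistID\ttagID\ttimestamp\n"
    "2\t52\t13\t1238536800\n"
    "2\t63\t15\t1238536900\n"
)


def read_interactions(handle):
    """Return one dict per interaction, with every field cast to int."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{key: int(value) for key, value in row.items()} for row in reader]


rows = read_interactions(io.StringIO(SAMPLE))
```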

add_filter(filter: Filter)

Add a filter to be applied when loading the data.

Uses the DataFramePreprocessor class to add filters to the dataset being loaded. The filter is applied when load() is called and the data is loaded into an InteractionMatrix object.

Parameters:

filter (Filter) – Filter to be applied to the loaded DataFrame before it is processed into an interaction matrix.
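
The deferred behaviour described above (filters are registered first and only run at load time) can be sketched as follows. `TinyDataset`, `min_timestamp`, and the row layout are hypothetical stand-ins for illustration; they do not reproduce the Filter or DataFramePreprocessor classes.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
RowFilter = Callable[[List[Row]], List[Row]]


class TinyDataset:
    """Hypothetical stand-in that stores filters and applies them on load()."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._filters: List[RowFilter] = []

    def add_filter(self, row_filter: RowFilter) -> None:
        # Only register the filter; nothing is computed yet.
        self._filters.append(row_filter)

    def load(self, apply_filters: bool = True) -> List[Row]:
        rows = list(self._rows)
        if apply_filters:
            for row_filter in self._filters:  # applied in registration order
                rows = row_filter(rows)
        return rows


def min_timestamp(threshold: int) -> RowFilter:
    """Build a filter that keeps rows at or after `threshold`."""
    return lambda rows: [r for r in rows if r["timestamp"] >= threshold]


dataset = TinyDataset([{"userID": 1, "timestamp": 5}, {"userID": 2, "timestamp": 50}])
dataset.add_filter(min_timestamp(10))
filtered = dataset.load()
unfiltered = dataset.load(apply_filters=False)
```

Keeping the filters in a list means the raw data stays untouched until load() runs, so the same object can be loaded with or without filtering.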

fetch_dataset(force=False) → None

Check whether the dataset is present; download it if not.

Parameters:

force (bool, optional) – If True, the dataset is downloaded even if the file already exists. Defaults to False.
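
The check-then-download logic, including the effect of `force`, might look roughly like the sketch below. The function `fetch_if_missing` and the injected `download` callable are assumptions made so the example runs without network access; they are not part of the library.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_if_missing(path: Path, download: Callable[[Path], None], force: bool = False) -> bool:
    """Call `download(path)` when the file is absent or `force` is True.

    Returns True if a download happened.
    """
    if force or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        download(path)
        return True
    return False


calls = []


def fake_download(path: Path) -> None:
    # Stand-in for a network download: record the call and write a placeholder.
    calls.append(path.name)
    path.write_text("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "lastfm.dat"
    first = fetch_if_missing(target, fake_download)    # file absent: downloads
    second = fetch_if_missing(target, fake_download)   # file present: skipped
    forced = fetch_if_missing(target, fake_download, force=True)  # re-downloads
```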

property file_path: str

File path of the dataset.

load(apply_filters=True) → InteractionMatrix

Loads data into an InteractionMatrix object.

Data is loaded into a DataFrame using the _load_dataframe() function. The resulting DataFrame is parsed into an InteractionMatrix object. If apply_filters is set to True, the registered filters are applied to the dataset and user and item ids are remapped. This is advised even if no filter is set, as it ensures that user and item ids increase in order of time.

Parameters:

apply_filters (bool, optional) – Whether to apply the registered filters and preprocessing. Defaults to True.

Returns:

Resulting interaction matrix

Return type:

InteractionMatrix
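
One way to read "ids incrementing in the order of time" is that each user and item receives the next free integer id the first time it appears in the time-sorted data. The sketch below illustrates that reading in plain Python; `remap_by_time` and the tuple layout are hypothetical, and the library's actual mapping may differ in detail.

```python
from typing import Dict, List, Tuple

Interaction = Tuple[str, str, int]  # (user, item, timestamp)


def remap_by_time(interactions: List[Interaction]) -> List[Tuple[int, int, int]]:
    """Assign consecutive integer ids in order of first appearance over time."""
    user_ids: Dict[str, int] = {}
    item_ids: Dict[str, int] = {}
    remapped = []
    # sorted() is stable, so ties on timestamp keep their input order.
    for user, item, timestamp in sorted(interactions, key=lambda row: row[2]):
        uid = user_ids.setdefault(user, len(user_ids))
        iid = item_ids.setdefault(item, len(item_ids))
        remapped.append((uid, iid, timestamp))
    return remapped


raw = [("u9", "a7", 30), ("u4", "a7", 10), ("u9", "a2", 20)]
mapped = remap_by_time(raw)
# mapped == [(0, 0, 10), (1, 1, 20), (1, 0, 30)]
```

Here user "u4" interacts first and so gets id 0, even though it appears second in the input.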

property name

Name of the object’s class.
