
[FSTORE-1090] Concepts & Guides for helper columns and on-demand features #333

Merged: 11 commits into logicalclocks:main on Dec 19, 2023

Conversation

davitbzh (Contributor)

No description provided.

@SirOibaf left a comment:

The PR also changed the behaviour around primary keys and event time. We should update the Training Data/Batch Data and Feature Vectors pages to document this (e.g. how to include the primary key or event time in the data returned when getting training data).
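A hedged sketch of what that documentation could show. It assumes the training data and batch data calls accept `primary_keys` and `event_time` flags; the parameter names are an assumption, not taken from this PR:

```python
# Hypothetical sketch: the `primary_keys` and `event_time` kwargs are assumptions.
# `feature_view` is an existing feature view object.

# Retrieve training data and keep the primary key and event time columns in the result.
X_train, X_test, y_train, y_test = feature_view.train_test_split(
    test_size=0.2,
    primary_keys=True,   # include primary key column(s), e.g. cc_num
    event_time=True,     # include the event time column
)

# Same idea when retrieving batch data for inference.
batch_df = feature_view.get_batch_data(primary_keys=True, event_time=True)
```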

@@ -0,0 +1,6 @@
An on-demand feature is a feature that is computed at request-time using application-supplied inputs for an online model.
@SirOibaf:

Can you add a title and the description tag to this page?

@@ -0,0 +1,127 @@
# Helper columns
@SirOibaf:

Can you add the description field here?

docs/concepts/fs/feature_group/on_demand_feature.md (outdated comment, resolved)
@@ -0,0 +1,6 @@
An on-demand feature is a feature that is computed at request-time using application-supplied inputs for an online model.

The image below shows an example of a housing price model that demonstrates how to implement an on-demand feature: a zip code (or post code) that is computed from longitude/latitude parameters. In your online application, longitude and latitude are provided as parameters to the application, and the same Python function used to calculate the zip code in the feature pipeline is used to compute the zip code in the online inference pipeline. This is achieved by implementing the on-demand feature as a Python function in a Python module. Also ensure that the same version of the Python module is installed in both the feature and inference pipelines.
@SirOibaf:

Switch the order here - put the "this is achieved by implementing the on-demand feature as a Python function" part first and then present the example.
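For reference, the pattern described above could look roughly like the following minimal sketch. The module name, the `reverse_geocode` helper, and the surrounding variables are placeholders, not code from this PR:

```python
# zipcode_features.py: hypothetical shared module, installed at the same version
# in both the feature pipeline and the online inference pipeline.

def zipcode_from_coordinates(longitude: float, latitude: float) -> str:
    """Compute the zip (post) code for a pair of coordinates.

    Placeholder logic: a real implementation would call a reverse-geocoding
    library or service (`reverse_geocode` is assumed, not a real API).
    """
    return reverse_geocode(longitude, latitude)


# Feature pipeline: backfill the feature from historical data.
df["zipcode"] = df.apply(
    lambda row: zipcode_from_coordinates(row["longitude"], row["latitude"]), axis=1
)

# Online inference pipeline: compute the same feature at request time
# from the longitude/latitude supplied with the request.
zipcode = zipcode_from_coordinates(request_longitude, request_latitude)
```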

@@ -0,0 +1,127 @@
# Helper columns

HSFS provides functionality to define two types of helper columns `inference_helper_columns` and `training_helper_columns` to [feature views](./overview.md).
@SirOibaf:

Don't mention HSFS - people might be using the feature store through the Hopsworks library and get confused about what hsfs means.

Say something like: when defining a feature view, users can mark certain features as helper columns or training columns...

product to assign different weights during the training time.

## Definition
Both inference and training helper column name(s) must be part of the `Query` object. If helper column name(s) belong to a feature group that is part of a `Join` with a `prefix` defined, then this prefix needs to be prepended to the original column name when defining the helper column list.
@SirOibaf:

Use a !!! note section to call out this prefix thing.

Comment on lines 23 to 27
```python
query = label_fg.select("fraud_label")\
    .join(cc_profile.select("expiry_date"))\
    .join(trans_fg.select(["category", "amount", "days_until_card_expires", "date_of_transaction",
                           "location_delta", "longitude", "latitude"])) \
    .join(window_aggs_fg.select_except(["trans_volume_mstd", "trans_volume_mavg", "trans_freq",
```
@SirOibaf:

Do we need to have so many feature groups here? It's not that people can copy-paste it directly, so I think we should limit the setup code to the bare minimum and focus on the important part, the feature view definition.
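As an illustration of the kind of trimmed-down example suggested here, a minimal sketch. Feature group and column names are placeholders, and it assumes the feature view creation call accepts `inference_helper_columns` and `training_helper_columns` arguments, as the page describes:

```python
# Minimal sketch: names are placeholders, not the PR's full example.
query = label_fg.select("fraud_label").join(
    trans_fg.select(["amount", "longitude", "latitude", "date_of_transaction"])
)

feature_view = fs.get_or_create_feature_view(
    name="fraud_detection",
    version=1,
    query=query,
    labels=["fraud_label"],
    # columns needed only to compute on-demand features at inference time
    inference_helper_columns=["longitude", "latitude", "date_of_transaction"],
    # columns useful only during training/evaluation, e.g. for sample weighting
    training_helper_columns=["amount"],
)
```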


## Retrieval
When replaying a `Query` during model inference, helper columns will be omitted. However, they can be optionally fetched with inference or training data.
@SirOibaf:

You don't "replay a query" - you retrieve data for model inference.
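To make the retrieval behaviour concrete, a hedged sketch; the boolean flags shown here are assumptions about the API this PR documents, not verbatim from it:

```python
# Sketch only: the exact keyword arguments are an assumption.

# Training data without helper columns (the default behaviour described above).
X_train, X_test, y_train, y_test = feature_view.train_test_split(test_size=0.2)

# Training data with training helper columns included.
X_train, X_test, y_train, y_test = feature_view.train_test_split(
    test_size=0.2, training_helper_columns=True
)

# Batch data for inference with inference helper columns included.
batch_df = feature_view.get_batch_data(inference_helper_columns=True)
```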

Comment on lines 66 to 73
```python
# compute location delta
df['loc_delta_t_minus_1'] = df.apply(lambda row: location_delta(row['longitude'],
                                                                row['latitude'],
                                                                row['longitude_prev'],
                                                                row['latitude_prev']), axis=1)

# compute time delta
df['days_until_card_expires'] = df.apply(lambda row: time_delta(row['date_of_transaction'],
                                                                row['expiry_date']), axis=1)
```
@SirOibaf:

Same comment as above: have a single example so that what we care about doesn't get lost in all this code.

Comment on lines 101 to 122
```python
# here cc_num, longitude, latitude and date_of_transaction are provided as parameters to the application
cc_num = ...
longitude = ...
latitude = ...
date_of_transaction = ...

# get previous transaction location of this credit card
inference_helper = feature_view.get_inference_helper({"cc_num": cc_num}, return_type="dict")

# compute location delta
loc_delta_t_minus_1 = location_delta(longitude,
                                     latitude,
                                     inference_helper['longitude'],
                                     inference_helper['latitude'])

# compute time delta
days_until_card_expires = time_delta(date_of_transaction,
                                     inference_helper['expiry_date'])

# Now get the assembled feature vector for prediction
feature_vector = feature_view.get_feature_vector({"cc_num": cc_num},
                                                 passed_features={"loc_delta_t_minus_1": loc_delta_t_minus_1,
                                                                  "days_until_card_expires": days_until_card_expires})
```
@SirOibaf:

Same comment as above about the amount of example code.

docs/user_guides/fs/feature_view/batch-data.md (outdated comment, resolved)
docs/user_guides/fs/feature_view/batch-data.md (outdated comment, resolved)
docs/user_guides/fs/feature_view/training-data.md (outdated comment, resolved)
docs/user_guides/fs/feature_view/training-data.md (outdated comment, resolved)
docs/user_guides/fs/feature_view/helper-columns.md (outdated comment, resolved)
Both inference and training helper column name(s) must be part of the `Query` object. If helper column name(s) belong to a feature group that is part of a `Join` with a `prefix` defined, then this prefix needs to be prepended to the original column name when defining the helper column list.

# Inference Helper columns
@SirOibaf:

This needs to be an h2 title, not h1, i.e., put another # at the beginning.
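A small sketch of the prefix rule quoted above; the `cc_` prefix and the feature group names are placeholders. When a joined feature group carries a prefix, the helper column must be listed with that prefix prepended:

```python
# Sketch only: "cc_" and the feature group/column names are placeholders.
query = label_fg.select("fraud_label").join(
    cc_profile_fg.select(["expiry_date"]), prefix="cc_"
)

feature_view = fs.get_or_create_feature_view(
    name="fraud_detection",
    version=1,
    query=query,
    labels=["fraud_label"],
    # the joined feature group uses the prefix "cc_", so the helper column
    # is referenced as "cc_expiry_date", not "expiry_date"
    inference_helper_columns=["cc_expiry_date"],
)
```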

### Retrieval
When retrieving data for model inference, helper columns will be omitted. However, they can be optionally fetched with inference or training data.

### Batch inference
@SirOibaf:

I think you want #### here.

```python
# drop label, inference helper, and training helper columns before passing the data to the model
df = df[[f.name for f in feature_view.features if not (f.label or f.inference_helper_column or f.training_helper_column)]]
```

### Online inference
@SirOibaf (Dec 18, 2023):

#### here.

@davitbzh requested a review from SirOibaf on December 18, 2023, 23:33
@SirOibaf merged commit e88d96f into logicalclocks:main on Dec 19, 2023
1 check passed
SirOibaf added a commit that referenced this pull request Dec 19, 2023