[FSTORE-1090] Concepts & Guides for helper columns and on-demand features #333
The PR also changed the behaviour around primary keys and event time. We should update the Training Data/Batch Data and Feature Vectors pages to document this (e.g. how to include the pk or event time in the data returned when getting training data).
@@ -0,0 +1,6 @@
An on-demand feature is a feature that is computed at request time using application-supplied inputs for an online model.
Can you add a title and the description tag to this page?
@@ -0,0 +1,127 @@
# Helper columns
Can you add the description field here?
@@ -0,0 +1,6 @@
An on-demand feature is a feature that is computed at request time using application-supplied inputs for an online model.

The image below shows an example of a housing price model that demonstrates how to implement an on-demand feature, a zip code (or post code) that is computed using longitude/latitude parameters. In your online application, longitude and latitude are provided as parameters to the application, and the same Python function used to calculate the zip code in the feature pipeline is used to compute the zip code in the online inference pipeline. This is achieved by implementing the on-demand features as a Python function in a Python module. Also ensure that the same version of the Python module is installed in both the feature and inference pipelines.
Switch the order here: put the "this is achieved by implementing the on-demand feature as a Python function" part first and then present the example.
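For readers of this thread, a minimal sketch of the pattern the hunk describes: one Python function, kept in a shared module, called from both the feature pipeline and the online inference pipeline. The module name, the `zip_code` function, and the bounding-box lookup table are all hypothetical illustration data and are not taken from the PR; a real implementation would use an actual geocoding source.

```python
# Hypothetical shared module, e.g. on_demand_features.py.
# The bounding boxes below are made-up illustration data, not real zip code boundaries.
from typing import Optional

# (min_lat, max_lat, min_lon, max_lon) -> zip code
_ZIP_BOXES = {
    (40.70, 40.80, -74.05, -73.95): "10001",
    (37.70, 37.80, -122.50, -122.40): "94103",
}


def zip_code(latitude: float, longitude: float) -> Optional[str]:
    """Return the zip code whose bounding box contains the point, or None."""
    for (min_lat, max_lat, min_lon, max_lon), code in _ZIP_BOXES.items():
        if min_lat <= latitude <= max_lat and min_lon <= longitude <= max_lon:
            return code
    return None


# Feature pipeline: apply the function to historical rows.
historical_rows = [
    {"latitude": 40.75, "longitude": -74.00},
    {"latitude": 37.75, "longitude": -122.45},
]
training_zip_codes = [zip_code(r["latitude"], r["longitude"]) for r in historical_rows]

# Online inference pipeline: call the same function on the request parameters.
request_zip_code = zip_code(latitude=40.75, longitude=-74.00)
```

Because both pipelines import the same function, pinning the same module version in both environments keeps the training and serving values consistent.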
@@ -0,0 +1,127 @@
# Helper columns

HSFS provides functionality to define two types of helper columns, `inference_helper_columns` and `training_helper_columns`, for [feature views](./overview.md).
Don't mention HSFS: people might be using the feature store through the Hopsworks library, and they would get confused about what hsfs means.
Say something like: when defining a feature view, users can mark certain features as helper columns or training columns....
product to assign different weights during the training time.

## Definition
Both inference and training helper column name(s) must be part of the `Query` object. If helper column name(s) belong to a feature group that is part of a `Join` with `prefix` defined, then this prefix needs to be prepended
Use a `!!! note` section to call out this prefix thing.
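To make the prefix rule concrete, here is a small sketch of how the helper column list could be assembled. The `helper_column_names` function and the `trans_` prefix are hypothetical and only illustrate the naming rule; the sketch assumes the prefix is concatenated directly in front of the original column name.

```python
from typing import List, Optional


def helper_column_names(columns: List[str], prefix: Optional[str] = None) -> List[str]:
    """Prepend the join prefix, if any, to each helper column name."""
    return [f"{prefix}{name}" if prefix else name for name in columns]


# Columns from a feature group joined without a prefix keep their names.
unprefixed = helper_column_names(["expiry_date"])

# Columns from a feature group joined with the hypothetical prefix "trans_".
prefixed = helper_column_names(["longitude", "latitude"], prefix="trans_")

inference_helper_columns = unprefixed + prefixed
```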
query = label_fg.select("fraud_label")\
    .join(cc_profile.select("expiry_date"))\
    .join(trans_fg.select(["category", "amount", "days_until_card_expires", "date_of_transaction",
                           "location_delta", "longitude", "latitude", "category"])) \
    .join(window_aggs_fg.select_except(["trans_volume_mstd", "trans_volume_mavg", "trans_freq",
Do we need to have so many feature groups here? It's not that people can copy-paste it directly, so I think we should limit the setup code to the bare minimum and focus on the important part, the feature view definition.
```

## Retrieval
When replaying a `Query` during model inference, helper columns will be omitted. However, they can be optionally fetched with inference or training data.
You don't "replay a query"; you are retrieving data for model inference.
# compute location delta
df['loc_delta_t_minus_1'] = df.apply(lambda row: location_delta(row['longitude'],
                                                                row['latitude'],
                                                                row['longitude_prev'],
                                                                row['latitude_prev']), axis=1)

# compute time delta
df['days_until_card_expires'] = df.apply(lambda row: time_delta(row['date_of_transaction'],
Same comment as above, have a single example so that what we care about doesn't get lost in all this code.
# here cc_num, longitude, latitude and date_of_transaction are provided as parameters to the application
cc_num = ...
longitude = ...
latitude = ...
date_of_transaction = ...

# get previous transaction location of this credit card
inference_helper = feature_view.get_inference_helper({"cc_num": cc_num}, return_type="dict")

# compute location delta
loc_delta_t_minus_1 = location_delta(longitude,
                                     latitude,
                                     inference_helper['longitude'],
                                     inference_helper['latitude'])

# compute time delta
days_until_card_expires = time_delta(date_of_transaction,
                                     inference_helper['expiry_date'])

# Now get assembled feature vector for prediction
feature_vector = feature_view.get_feature_vector({"cc_num": cc_num},
                                                  passed_features={"loc_delta_t_minus_1": loc_delta_t_minus_1,
Same comment about the number of examples.
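The hunks above call `location_delta` and `time_delta` without showing them. The following is a sketch of what such helpers might look like, assuming a haversine great-circle distance in kilometres and a whole-day difference; these bodies are assumptions made for illustration, not the implementations from the PR. The `inference_helper` dict is a stand-in for the value the hunk retrieves from the feature view.

```python
import math
from datetime import date


def location_delta(lon: float, lat: float, lon_prev: float, lat_prev: float) -> float:
    """Great-circle distance in kilometres between two points (haversine formula)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat_prev), math.radians(lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon - lon_prev)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def time_delta(date_of_transaction: date, expiry_date: date) -> int:
    """Whole days from the transaction date until the card expires."""
    return (expiry_date - date_of_transaction).days


# Stand-in for the dict returned for the previous transaction (made-up values).
inference_helper = {"longitude": 18.0686, "latitude": 59.3293, "expiry_date": date(2025, 12, 31)}

loc_delta_t_minus_1 = location_delta(18.0686, 59.3293,
                                     inference_helper["longitude"],
                                     inference_helper["latitude"])
days_until_card_expires = time_delta(date(2025, 12, 1), inference_helper["expiry_date"])
```

Keeping these helpers in one module lets the batch (`df.apply`) and online code paths share exactly the same computation.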
Co-authored-by: Fabio Buso <[email protected]>
Both inference and training helper column name(s) must be part of the `Query` object. If helper column name(s) belong to a feature group that is part of a `Join` with `prefix` defined, then this prefix needs to be prepended
to the original column name when defining the helper column list.

# Inference Helper columns
This needs to be an h2 title, not an h1, i.e., put another # at the beginning.
### Retrieval
When retrieving data for model inference, helper columns will be omitted. However, they can be optionally fetched with inference or training data.

### Batch inference
I think you want #### here.
df = df[[f.name for f in feature_view.features if not (f.label or f.inference_helper_column or f.training_helper_column)]]
```

### Online inference
#### here.
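The column filter in the batch inference hunk above can be tried in isolation. This sketch uses a hypothetical `Feature` dataclass as a stand-in for the feature view's feature metadata; only the three flag names (`label`, `inference_helper_column`, `training_helper_column`) come from the hunk, and the feature names are examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    """Hypothetical stand-in for a feature view's feature metadata."""
    name: str
    label: bool = False
    inference_helper_column: bool = False
    training_helper_column: bool = False


features: List[Feature] = [
    Feature("category"),
    Feature("amount"),
    Feature("fraud_label", label=True),
    Feature("expiry_date", inference_helper_column=True),
    Feature("longitude", inference_helper_column=True),
]

# Keep only the columns the model expects: no labels and no helper columns.
model_input_columns = [
    f.name
    for f in features
    if not (f.label or f.inference_helper_column or f.training_helper_column)
]
```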
…davitbzh/logicalclocks.github.io into ondemad_features_helper_columns
…ures (#333) Co-authored-by: Fabio Buso <[email protected]> Co-authored-by: davitbzh <[email protected]>