Commit

update for grand prix s4e9
Eden Wu committed Aug 31, 2024
1 parent cf7b4b7 commit 7b57db4
Showing 4 changed files with 1,307 additions and 4 deletions.
2 changes: 1 addition & 1 deletion alpha_automl/resource/base_grammar.bnf
@@ -1,5 +1,5 @@
S -> CLASSIFICATION_TASK | REGRESSION_TASK | CLUSTERING_TASK | TIME_SERIES_FORECAST_TASK | SEMISUPERVISED_TASK | NA_TASK
-CLASSIFICATION_TASK -> IMPUTER ENCODERS FEATURE_GENERATOR FEATURE_SCALER FEATURE_SELECTOR CLASSIFIER CLASSIFICATION_ENSEMBLER
+CLASSIFICATION_TASK -> IMPUTER ENCODERS FEATURE_GENERATOR FEATURE_SCALER FEATURE_SELECTOR CLASSIFIER
REGRESSION_TASK -> IMPUTER ENCODERS FEATURE_GENERATOR FEATURE_SCALER FEATURE_SELECTOR REGRESSOR REGRESSION_ENSEMBLER
CLUSTERING_TASK -> IMPUTER ENCODERS FEATURE_GENERATOR FEATURE_SCALER FEATURE_SELECTOR CLUSTERER
TIME_SERIES_FORECAST_TASK -> IMPUTER TIME_SERIES_FORECASTER | REGRESSION_TASK
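
The one-line grammar change above drops the final step from the classification pipeline production. A minimal sketch of that comparison follows; the two rule strings are copied from the diff, while `parse_rule` is a hypothetical helper written for this illustration and is not how alpha_automl loads its grammar.

```python
# Both productions are copied from the diff; the parsing helper is illustrative only.
OLD_RULE = (
    "CLASSIFICATION_TASK -> IMPUTER ENCODERS FEATURE_GENERATOR "
    "FEATURE_SCALER FEATURE_SELECTOR CLASSIFIER CLASSIFICATION_ENSEMBLER"
)
NEW_RULE = (
    "CLASSIFICATION_TASK -> IMPUTER ENCODERS FEATURE_GENERATOR "
    "FEATURE_SCALER FEATURE_SELECTOR CLASSIFIER"
)


def parse_rule(line):
    """Split 'HEAD -> A B | C D' into (head, [[A, B], [C, D]])."""
    head, body = line.split("->", 1)
    return head.strip(), [alt.split() for alt in body.split("|")]


old_head, old_alts = parse_rule(OLD_RULE)
new_head, new_alts = parse_rule(NEW_RULE)

# Symbols present in the old production but absent from the new one.
removed = [sym for sym in old_alts[0] if sym not in new_alts[0]]
print(removed)  # ['CLASSIFICATION_ENSEMBLER']
```

Note that REGRESSION_TASK still ends in REGRESSION_ENSEMBLER in the lines shown, so only classification pipelines lose the ensembling step.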
8 changes: 5 additions & 3 deletions alpha_automl/wrapper_primitives/llm_feature_engine.py
@@ -11,14 +11,15 @@
logger = logging.getLogger(__name__)

 class LLMFeatureGenerator(BasePrimitive):
-    def __init__(self, extra_system_prompt=None):
+    def __init__(self, description=None, extra_system_prompt=None):
+        self.description = description
         self.extra_system_prompt = extra_system_prompt
         self.prompt = None
         self.code = None
         pass

     def fit(self, X, y=None):
-        self.prompt = build_prompt_from_df(description="", df=X)
+        self.prompt = build_prompt_from_df(description=self.description, df=X)
         self.code = generate_code(self.prompt, self.extra_system_prompt)
         return self

@@ -28,7 +29,7 @@ def transform(self, X, y=None):
         access_scope = {"df": X_cp, "pd": pd, "np": np}
         parsed = ast.parse(self.code)
         exec(compile(parsed, filename="<ast>", mode="exec"), access_scope, loc)
-        return np.array(X_cp)
+        return X_cp

def get_prompt(
df, description, iterative=1, data_description_unparsed=None, samples=None, **kwargs
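
The `transform` hunk above now returns the copied DataFrame rather than `np.array(X_cp)`, so column names created by the generated code reach downstream steps. The sketch below mirrors the lines visible in the diff under a few assumptions: `apply_generated_code` and `GENERATED_CODE` are hypothetical names, `X_cp = X.copy()` and `loc = {}` are guesses at the lines not shown in the hunk, and the final `replace` step is not part of this commit. It only illustrates why the prompt asks the model to avoid `np.inf`.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical stand-in for the LLM-generated code stored in self.code.
GENERATED_CODE = "df['ratio'] = df['a'] / df['b']"


def apply_generated_code(X, code):
    """Run generated feature code on a copy of X and return a DataFrame."""
    X_cp = X.copy()  # assumed: the original input is left untouched
    loc = {}         # assumed: separate dict for names the code defines
    access_scope = {"df": X_cp, "pd": pd, "np": np}
    parsed = ast.parse(code)
    exec(compile(parsed, filename="<ast>", mode="exec"), access_scope, loc)
    return X_cp      # DataFrame, so the new column name is preserved


X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 0.0]})
out = apply_generated_code(X, GENERATED_CODE)

# Division by zero yields inf, which many estimators reject.
has_inf = bool(np.isinf(out["ratio"]).any())

# One possible guard (an assumption, not in the commit): map infinities to NaN
# so that a later imputer can handle them.
cleaned = out.replace([np.inf, -np.inf], np.nan)
```

Because `exec` runs arbitrary generated code, a sketch like this should only be run on code from a trusted source or inside a sandbox.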
@@ -55,6 +56,7 @@ def get_prompt(
This code also drops columns, if these may be redundant and hurt the predictive performance of the downstream classifier (Feature selection). Dropping columns may help as the chance of overfitting is lower, especially if the dataset is small.
The classifier will be trained on the dataset with the generated columns and evaluated on a holdout set. The evaluation metric is accuracy. The best performing code will be selected.
Added columns can be used in other codeblocks, dropped columns are not available anymore.
+Remember do not include np.inf, -np.inf in any generated columns.
Code formatting for each added column:
```python