CausalRandomForestRegressor with causal_mse predicts to inf on data with nuisance #589

Open
winston-zillow opened this issue Dec 16, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@winston-zillow

Describe the bug
After training the CausalRandomForestRegressor with criterion causal_mse on data with nuisance, many of the predicted ITE values are inf.

To Reproduce
I changed the causal trees with synthetic data notebook to use data generated by simulate_nuisance_and_easy_treatment:

# y, X, w, tau, b, e = synthetic_data(mode=5, n=10000, p=20, sigma=5.0)
from causalml.dataset import simulate_nuisance_and_easy_treatment
y, X, w, tau, b, e = simulate_nuisance_and_easy_treatment(n=10000, p=20, sigma=5.0)

After training the CausalRandomForestRegressor with criterion causal_mse, using the same code as the notebook:

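from causalml.inference.tree import CausalRandomForestRegressor

# df_train and feature_names come from the notebook's train/test split
rforest2 = CausalRandomForestRegressor(criterion="causal_mse",
                                       min_samples_leaf=200,
                                       control_name=0,
                                       n_estimators=50,
                                       n_jobs=4)
rforest2.fit(X=df_train[feature_names].values,
             treatment=df_train['treatment'].values,
             y=df_train['outcome'].values)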

many of the predicted ITE values are inf.

rf2_ite_pred = rforest2.predict(df_test[feature_names].values)
rf2_ite_pred[:100]
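
To quantify the problem instead of eyeballing the first 100 values, a helper along these lines can count the non-finite entries. This is only a sketch: summarize_non_finite is a hypothetical name, not part of causalml, and it assumes a flat sequence of numbers (for a NumPy array, pass rf2_ite_pred.ravel()).

```python
import math


def summarize_non_finite(values):
    """Count inf and nan entries in a flat sequence of numbers.

    Hypothetical helper for illustration; not part of causalml.
    """
    values = [float(v) for v in values]
    n_inf = sum(1 for v in values if math.isinf(v))  # counts +inf and -inf
    n_nan = sum(1 for v in values if math.isnan(v))
    total = len(values)
    bad_ratio = (n_inf + n_nan) / total if total else 0.0
    return {"total": total, "inf": n_inf, "nan": n_nan, "bad_ratio": bad_ratio}


# Stand-in predictions; with the real model use rf2_ite_pred.ravel().
example = [0.5, float("inf"), -1.2, float("nan"), float("-inf"), 2.0, 0.1, 0.3]
summary = summarize_non_finite(example)
print(summary)
```

Running the same helper on the predictions from a standard_mse forest gives a like-for-like comparison between the two criteria.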

This is the case even if I change the nuisance to something simpler:

    #b = (
    #       np.sin(np.pi * X[:, 0] * X[:, 1])
    #        + 2 * (X[:, 2] - 0.5) ** 2
    #        + X[:, 3]
    #        + 0.5 * X[:, 4]
    #)
    b = X[:, 3] + 2 * X[:, 4] + 3 * X[:, 1]

Expected behavior
The model should predict finite, valid values.

Environment (please complete the following information):

  • OS: macOS
  • Python Version: 3.9.16
  • Versions of Major Dependencies: pandas==1.5.2, scikit-learn==1.0.2

@winston-zillow added the bug label Dec 16, 2022

@winston-zillow
Author

Note: CausalRandomForestRegressor with standard_mse predicts fine on the same data.

@winston-zillow
Author

More debug info: one of the trained trees seems to be bad:

print(rforest2.estimators_[10].feature_importances_)
=> [ 0. nan  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.]

After these trees are removed, the predictions are no longer inf, but some are still extreme negative values (around -3e+13).
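
The check above can be run over the whole forest rather than one tree at a time. A minimal sketch, assuming each fitted tree exposes feature_importances_ as shown; find_bad_trees is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the snippet runs without a fitted model.

```python
import math
from types import SimpleNamespace


def find_bad_trees(estimators):
    """Return indices of trees whose feature_importances_ contain nan or inf.

    Hypothetical helper for illustration; not part of causalml.
    """
    bad = []
    for index, tree in enumerate(estimators):
        importances = tree.feature_importances_
        if any(not math.isfinite(float(v)) for v in importances):
            bad.append(index)
    return bad


# Stand-in trees; with the real model pass rforest2.estimators_ instead.
trees = [
    SimpleNamespace(feature_importances_=[0.2, 0.8, 0.0]),
    SimpleNamespace(feature_importances_=[0.0, float("nan"), 0.0]),
    SimpleNamespace(feature_importances_=[1.0, 0.0, 0.0]),
]
bad_indices = find_bad_trees(trees)
print(bad_indices)
```

A finite feature_importances_ vector does not guarantee finite predictions, so this only flags the trees that are obviously broken.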

@alexander-pv
Collaborator

Hi, thanks for the report. The issue has been fixed recently in #583.

Please reinstall the package from source.
You can also generate the desired type of synthetic data by changing the mode parameter:

y, X, w, tau, b, e = synthetic_data(mode=1, n=10000, p=20, sigma=5.0)

In causal_trees_with_synthetic_data.ipynb you will get the following result:
[image: synth_data_mode1]

@winston-zillow
Author

Thanks. Reinstalling from source fixes the problem!

@winston-zillow
Author

winston-zillow commented Apr 18, 2023

This still happens with my real-world data. Some predictions come out as nan (rather than inf). Maybe there is still an issue?

@alexander-pv
Collaborator

alexander-pv commented Apr 26, 2023

Hi. Could you please plot each tree from your fitted CausalRandomForestRegressor using plot_causal_tree in causalml.inference.tree.plot and attach the images?
You can also attach a small dataset that reproduces the nan issue.
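
One possible way to start on such a dataset is to pull out the rows whose prediction is not finite. The sketch below is an assumption about workflow, not something causalml provides: failing_rows_to_csv is a hypothetical helper, and the column names x0 and x1 are placeholders.

```python
import csv
import io
import math


def failing_rows_to_csv(rows, predictions, feature_names, limit=50):
    """Return CSV text for the first `limit` rows with a nan or inf prediction.

    Hypothetical helper for illustration; not part of causalml.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["row_index", *feature_names, "prediction"])
    written = 0
    for index, (row, pred) in enumerate(zip(rows, predictions)):
        if written >= limit:
            break
        if not math.isfinite(float(pred)):
            writer.writerow([index, *row, pred])
            written += 1
    return buffer.getvalue()


# Stand-in data; with a real model pass the test features and its predictions.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
predictions = [0.1, float("nan"), float("inf")]
csv_text = failing_rows_to_csv(rows, predictions, ["x0", "x1"])
print(csv_text)
```

The failing test rows alone may not reproduce the problem, since the bad values originate when the forest is fitted; the matching training features, treatment, and outcome columns are probably needed as well.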

@lemonlmn

lemonlmn commented Oct 3, 2024

Hi, I encounter the same nan issue when calling predict on CausalRandomForestRegressor.

With 'causal_mse' the nan ratio is around 10%. 'standard_mse' is better, but still gives around 2% nan.

@lemonlmn

lemonlmn commented Oct 3, 2024

BTW, it seems plot_causal_tree only works for CausalTreeRegressor, not for CausalRandomForestRegressor.

3 participants