ValueError: y data should have at least 1 samples, but found 0 #290

Open
ajayxcel opened this issue Apr 19, 2024 · 3 comments
Labels
analysis-methods Relating to analysis methods.

Comments

ajayxcel commented Apr 19, 2024

Hi, I'm getting an error when running the 'Turbine Ideal Energy' analysis. It occurs whenever I have less than 2 years of SCADA data; even if the data is short by a single day, the error is thrown. Is 2 years or more of data required, or is this a bug? Could you please look into it? I have pasted the full traceback for reference. I also get a similar error with the other analyses, except AEP, when I use less than 2 years of data. Thank you very much for your consideration.

ValueError                                Traceback (most recent call last)
Cell In[15], line 6
      1 # We can choose to save key plots to a file by setting enable_plotting=True and 
      2 # specifying a directory to save the images. For now we turn off this feature. 
      3 # ta.run(reanalysis_subset=['era5', 'merra2'], enable_plotting=False, plot_dir=None,
      4 #        wind_bin_thresh=wind_bin_thresh, max_power_filter=max_power_filter,
      5 #        correction_threshold=correction_threshold)
----> 6 ta.run(reanalysis_products=['era5', 'merra2'])

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\openoa\logging.py:33, in logged_method_call.<locals>._wrapper(self, *args, **kwargs)
     31 logger = logging.getLogger(the_method.__module__)
     32 logger.debug(f"{self.__class__.__name__}#{id(self)}.{the_method.__name__}: {msg}")
---> 33 return the_method(self, *args, **kwargs)

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\openoa\analysis\turbine_long_term_gross_energy.py:255, in TurbineLongTermGrossEnergy.run(self, num_sim, reanalysis_products, uncertainty_scada, wind_bin_threshold, max_power_filter, correction_threshold)
    253     self.filter_sum_impute_scada()  # Setup daily scada data
    254     self.setupturbine_model_dict()  # Setup daily data to be fit using the GAM
--> 255     self.fit_model()  # Fit daily turbine energy to atmospheric data
    256     self.apply_model(i)  # Apply fitting result to long-term reanalysis data
    258 # Log the completion of the run

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\openoa\logging.py:33, in logged_method_call.<locals>._wrapper(self, *args, **kwargs)
     31 logger = logging.getLogger(the_method.__module__)
     32 logger.debug(f"{self.__class__.__name__}#{id(self)}.{the_method.__name__}: {msg}")
---> 33 return the_method(self, *args, **kwargs)

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\openoa\analysis\turbine_long_term_gross_energy.py:519, in TurbineLongTermGrossEnergy.fit_model(self)
    516     df["energy_imputed"] = df["energy_imputed"] * self._run.scada_data_fraction
    518     # Consider wind speed, wind direction, and air density as features
--> 519     mod_results[t] = functions.gam_3param(
    520         windspeed_col="WMETR_HorWdSpd",
    521         wind_direction_col="WMETR_HorWdDir",
    522         air_density_col="WMETR_AirDen",
    523         power_col="energy_imputed",
    524         data=df,
    525     )
    526 self._model_results = mod_results

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\openoa\utils\_converters.py:294, in dataframe_method.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    292     # Update the args and kwargs as need and call the function
    293     args, kwargs = _update_arguments(args, kwargs, arg_ix_list, data_cols, arg_list)
--> 294     return func(*args, **kwargs)
    296 # When no data is provided, then convert the Series arguments, update args and kwargs,
    297 # appropriately, then call the function
    298 df, arg_list = series_to_df(*arg_list, names=data_cols)

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\openoa\utils\power_curve\functions.py:187, in gam_3param(windspeed_col, wind_direction_col, air_density_col, power_col, n_splines, data)
    184 y = data[power_col]
    186 # Fit the model
--> 187 model = LinearGAM(n_splines=n_splines).fit(X, y)
    189 # Wrap the prediction function in a closure to pack input variables
    190 @dataframe_method(data_cols=["windspeed_col", "wind_direction_col", "air_density_col"])
    191 def predict(
    192     windspeed_col: str | pd.Series,
   (...)
    195     data: pd.DataFrame = None,
    196 ):

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\pygam\pygam.py:887, in GAM.fit(self, X, y, weights)
    884 self._validate_params()
    886 # validate data
--> 887 y = check_y(y, self.link, self.distribution, verbose=self.verbose)
    888 X = check_X(X, verbose=self.verbose)
    889 check_X_y(X, y)

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\pygam\utils.py:234, in check_y(y, link, dist, min_samples, verbose)
    212 """
    213 tool to ensure that the targets:
    214 - are in the domain of the link function
   (...)
    230 y : array containing validated y-data
    231 """
    232 y = np.ravel(y)
--> 234 y = check_array(
    235     y,
    236     force_2d=False,
    237     min_samples=min_samples,
    238     ndim=1,
    239     name='y data',
    240     verbose=verbose,
    241 )
    243 with warnings.catch_warnings():
    244     warnings.simplefilter("ignore")

File ~\AppData\Local\anaconda3\envs\openoa-env\lib\site-packages\pygam\utils.py:203, in check_array(array, force_2d, n_feats, ndim, min_samples, name, verbose)
    201 n = array.shape[0]
    202 if n < min_samples:
--> 203     raise ValueError(
    204         '{} should have at least {} samples, '
    205         'but found {}'.format(name, min_samples, n)
    206     )
    208 return array

ValueError: y data should have at least 1 samples, but found 0

EDIT: I put the traceback in a Python code block to make it easier for me to read.
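The last frames of the traceback show pyGAM rejecting a target array with zero samples. As a debugging aid, a guard like the one below could be run on each turbine's DataFrame before the fit to report which one ends up empty. This is only a sketch: `usable_rows` and the `turbine_A` label are hypothetical and not part of OpenOA; the column names are the ones passed to `gam_3param` in the traceback.

```python
import pandas as pd

# Columns passed to gam_3param in the traceback above
FIT_COLUMNS = ["WMETR_HorWdSpd", "WMETR_HorWdDir", "WMETR_AirDen", "energy_imputed"]


def usable_rows(df: pd.DataFrame, columns: list, label: str) -> pd.DataFrame:
    """Hypothetical helper: drop rows with missing values, fail loudly if none remain."""
    cleaned = df.dropna(subset=columns)
    if cleaned.empty:
        raise ValueError(
            f"{label}: no rows left to fit after dropping missing values "
            f"(started with {len(df)} rows)"
        )
    return cleaned


# Made-up data: every row is missing the energy value, so nothing can be fit
nan = float("nan")
df = pd.DataFrame(
    {
        "WMETR_HorWdSpd": [5.1, 7.3],
        "WMETR_HorWdDir": [180.0, 200.0],
        "WMETR_AirDen": [1.20, 1.21],
        "energy_imputed": [nan, nan],
    }
)

try:
    usable_rows(df, FIT_COLUMNS, "turbine_A")
except ValueError as err:
    message = str(err)
    print(message)
```

This does not explain why the data is empty; it only moves the failure earlier and names the turbine involved.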

@RHammond2
Collaborator

Hi @ajayxcel, would you be able to share the first timestamp, last timestamp, and the frequency of your data? I'd like to be able to recreate this to a certain extent to understand the nature of the issue.
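For anyone gathering the same details, here is a minimal sketch of reading the first timestamp, last timestamp, and typical time step from a pandas `DatetimeIndex`. The helper name `summarize_time_index` and the sample timestamps are made up for illustration; the step is taken as the most common difference between consecutive timestamps so that gaps in the data do not hide it.

```python
import pandas as pd


def summarize_time_index(index: pd.DatetimeIndex) -> dict:
    """Hypothetical helper: first/last timestamp and the most common time step."""
    index = index.sort_values()
    # Differences between consecutive timestamps; the mode tolerates gaps
    steps = index.to_series().diff().dropna()
    step = steps.mode().iloc[0] if not steps.empty else None
    return {"first": index[0], "last": index[-1], "step": step}


# Made-up 10-minute index with one missing record
idx = pd.date_range("2014-01-02 00:00", periods=6, freq="10min").delete(3)
summary = summarize_time_index(idx)
print(summary)
```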

RHammond2 added the analysis-methods label Apr 19, 2024
@ajayxcel
Author

Hi @RHammond2, I used the la_haute_borne data between the timestamps 2014-01-02T00:00:00+01:00 and 2016-01-01T00:50:00+01:00. That is, I deleted the data for the day 2014-01-01 for all 4 turbines and ran the code. Thanks for looking into this, I appreciate it.
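As a quick check on those two timestamps, the span they cover can be computed with the standard library. This is just arithmetic on the values quoted above; whether the analysis actually requires two full years is not established in this thread.

```python
from datetime import datetime

# Timestamps quoted in the comment above
first = datetime.fromisoformat("2014-01-02T00:00:00+01:00")
last = datetime.fromisoformat("2016-01-01T00:50:00+01:00")

span = last - first
# Two calendar years counted from the first timestamp
two_years = datetime.fromisoformat("2016-01-02T00:00:00+01:00") - first

print(span)              # 729 days, 0:50:00
print(span < two_years)  # True
```

So the trimmed data falls just under a day short of two calendar years.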

@RHammond2
Collaborator

Thanks for sharing, @ajayxcel, I should be able to take a look at this next week to see what's going on.
