docs: added docs entries for amos components #121

Merged 3 commits on Jan 15, 2025
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.data_quality.data_manipulation.spark.dimensionality_reduction
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.data_quality.data_manipulation.spark.interval_filtering
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.data_quality.data_manipulation.spark.k_sigma_anomaly_detection
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.data_quality.data_manipulation.spark.normalization.denormalization
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.data_quality.data_manipulation.spark.normalization.normalization
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.data_quality.data_manipulation.spark.normalization.normalization_mean
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.data_quality.data_manipulation.spark.normalization.normalization_minmax
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.data_quality.data_manipulation.spark.normalization.normalization_zscore
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.machine_learning.spark.data_binning
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.machine_learning.spark.linear_regression
@@ -16,6 +16,7 @@
from pyspark.ml.evaluation import RegressionEvaluator
from ..interfaces import MachineLearningInterface
from ..._pipeline_utils.models import Libraries, SystemType
+from typing import Optional


class LinearRegression(MachineLearningInterface):
@@ -61,15 +62,15 @@ def libraries():
def settings() -> dict:
return {}

-def split_data(self, train_ratio: float = 0.8):
+def split_data(self, train_ratio: float = 0.8) -> tuple[DataFrame, DataFrame]:
"""
Splits the dataset into training and testing sets.

Args:
train_ratio (float): The ratio of the data to be used for training. Default is 0.8 (80% for training).

Returns:
-DataFrame: Returns the training and testing datasets.
+tuple[DataFrame, DataFrame]: Returns the training and testing datasets.
"""
train_df, test_df = self.df.randomSplit([train_ratio, 1 - train_ratio], seed=42)
return train_df, test_df
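
For readers unfamiliar with the split semantics, here is a rough plain-Python analogy of a seeded random split that returns a pair, mirroring the new `tuple[DataFrame, DataFrame]` annotation. `split_rows` and its list-of-dict rows are hypothetical stand-ins, not part of the SDK or of Spark. Like `randomSplit`, the sketch assigns each row independently, so the split sizes only approximate `train_ratio`. The `tuple[...]` subscript form is valid at runtime only on Python 3.9+; the sketch uses `from __future__ import annotations` so it also loads on older interpreters.

```python
from __future__ import annotations

import random


def split_rows(
    rows: list[dict], train_ratio: float = 0.8, seed: int = 42
) -> tuple[list[dict], list[dict]]:
    """Hypothetical helper: assign each row to train or test with a seeded draw."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for row in rows:
        # Each row lands in the training set with probability train_ratio.
        (train if rng.random() < train_ratio else test).append(row)
    return train, test


rows = [{"id": i} for i in range(1000)]
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Callers can unpack the result directly, as in `train_rows, test_rows = split_rows(rows)`, which is the usage the tuple return annotation documents.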
@@ -96,18 +97,17 @@ def predict(self, prediction_df: DataFrame):
prediction_df,
)

-def evaluate(self, test_df: DataFrame):
+def evaluate(self, test_df: DataFrame) -> Optional[float]:
"""
Evaluates the trained model using RMSE.

Args:
test_df (DataFrame): The testing dataset to evaluate the model.

Returns:
-float: The Root Mean Squared Error (RMSE) of the model.
+Optional[float]: The Root Mean Squared Error (RMSE) of the model, or None if the prediction column doesn't exist.
"""
# Check the columns of the test DataFrame
print(f"Columns in test_df: {test_df.columns}")
test_df.show(5)

if self.prediction_col not in test_df.columns:
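
The `Optional[float]` contract documented in this hunk can be illustrated with a minimal plain-Python sketch. It is an assumption-laden analogy, not the SDK's implementation: `rmse_or_none`, the list-of-dict rows, and the `label` / `prediction` column names are all hypothetical. It shows why callers need to handle `None` before using the score.

```python
from __future__ import annotations

import math
from typing import Optional


def rmse_or_none(
    rows: list[dict], label_col: str = "label", prediction_col: str = "prediction"
) -> Optional[float]:
    """Hypothetical helper: RMSE over rows, or None when predictions are absent."""
    # No rows, or any row lacking the prediction column: nothing to evaluate.
    if not rows or any(prediction_col not in row for row in rows):
        return None
    squared_errors = [(row[label_col] - row[prediction_col]) ** 2 for row in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


scored = [{"label": 3.0, "prediction": 1.0}, {"label": 2.0, "prediction": 4.0}]
unscored = [{"label": 3.0}]

for name, data in (("scored", scored), ("unscored", unscored)):
    result = rmse_or_none(data)
    if result is None:
        print(f"{name}: no prediction column, nothing to evaluate")
    else:
        print(f"{name}: RMSE = {result}")
```

Returning `None` rather than raising keeps the call site simple, at the cost of requiring an explicit `is None` check; the type annotation makes that requirement visible to type checkers.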