Added MEA and accuracy to the GitHub workflow #34
```diff
@@ -0,0 +1,2 @@
+accuracy_score: 0.75
+MAE: 0.3333333333333333
```
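For context on where numbers like these come from, here is a toy sketch (hypothetical data, not the PR's actual storybook labels) in which nine of twelve integer reading levels are predicted exactly and three are off by 1, 1, and 2 levels, which happens to reproduce the same metric values:

```python
from sklearn.metrics import accuracy_score, mean_absolute_error

# Hypothetical reading-level labels and predictions (illustrative only):
# the last three predictions are wrong, with absolute errors 1, 1, and 2.
y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
y_pred = [1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 3, 1]

print(accuracy_score(y_true, y_pred))       # 9/12 = 0.75
print(mean_absolute_error(y_true, y_pred))  # 4/12 = 0.3333...
```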
@@ -0,0 +1,22 @@

```python
from sklearn.metrics import accuracy_score, mean_absolute_error
import pandas
from os.path import basename

# Load the preprocessed test data CSV into a DataFrame
storybooks_csv_path = '../step1_prepare/step1_3_storybooks_test.csv'
storybooks_dataframe = pandas.read_csv(storybooks_csv_path)
val_y = storybooks_dataframe['reading_level']

# Load Predicted values from step3_2_predictions.csv
val_predictions = pandas.read_csv('../step3_predict/step3_2_predictions.csv')
```
Comment on lines +5 to +11

🛠️ Refactor suggestion

Enhance robustness of data loading process. While the data loading process is straightforward, consider the following improvements: construct the file paths relative to the script location instead of hard-coding them, handle missing files, and validate the structure of the loaded data.
Here's a suggested improvement:

```python
import os
from pathlib import Path

# Use environment variables or construct paths relative to the script location
BASE_DIR = Path(__file__).resolve().parent.parent
storybooks_csv_path = os.path.join(BASE_DIR, 'step1_prepare', 'step1_3_storybooks_test.csv')
predictions_csv_path = os.path.join(BASE_DIR, 'step3_predict', 'step3_2_predictions.csv')

try:
    storybooks_dataframe = pandas.read_csv(storybooks_csv_path)
    val_y = storybooks_dataframe['reading_level']
    val_predictions = pandas.read_csv(predictions_csv_path)

    # Validate dataframe structure
    assert 'reading_level' in storybooks_dataframe.columns, "Missing 'reading_level' column in test data"
    assert val_predictions.shape[1] == 1, "Predictions should be a single column"
except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the CSV files exist in the correct location.")
    exit(1)
except AssertionError as e:
    print(f"Error: {e}. The structure of the CSV files is not as expected.")
    exit(1)
```

This modification improves the script's robustness and provides clearer error messages if issues arise.
```python
accuracy = accuracy_score(val_y, val_predictions)
print(basename(__file__), f'accuracy_score: {accuracy}')

mae = mean_absolute_error(val_y, val_predictions)
print(basename(__file__), f'accuracy_score: {mae}')
```
Comment on lines +16 to +17

Fix typo in MAE print statement. The MAE calculation is correct, but there's a typo in the print statement. Please apply the following fix:

```diff
- print(basename(__file__), f'accuracy_score: {mae}')
+ print(basename(__file__), f'MAE: {mae}')
```

This change ensures that the output correctly identifies the metric as MAE instead of accuracy_score.
```python
# Save the results to a file for the GitHub workflow to read
with open('Metrics_output.txt', 'w') as f:
    f.write(f'accuracy_score: {accuracy}\n')
    f.write(f'MAE: {mae}\n')
```

Comment on lines +19 to +22
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Enhance error handling and file path management for results saving. While the use of a context manager for file operations is good practice, consider the following improvements:
Here's a suggested improvement:

```python
import os
from datetime import datetime

# Use an environment variable or a config file to set the output directory
output_dir = os.environ.get('METRICS_OUTPUT_DIR', '.')
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_file = os.path.join(output_dir, f'Metrics_output_{timestamp}.txt')

try:
    with open(output_file, 'w') as f:
        f.write(f'accuracy_score: {accuracy}\n')
        f.write(f'MAE: {mae}\n')
    print(f"Metrics successfully written to {output_file}")
except IOError as e:
    print(f"Error writing to file: {e}")
```

This modification improves error handling, uses a more robust file path, and includes a timestamp in the filename to preserve historical results.
Update deprecated `set-output` command and add error handling. The step correctly extracts the metrics, but there are a few improvements we can make:

1. The `::set-output` syntax is deprecated. Use the `$GITHUB_OUTPUT` environment file instead.
2. Use `awk` instead of `grep` for more robust parsing.

The suggested change uses `awk` for more robust parsing and the `$GITHUB_OUTPUT` syntax for setting outputs.
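Since the committable suggestion itself is not shown above, the pattern this comment describes can be sketched roughly as follows. This is a hypothetical step body, assuming the workflow parses the `Metrics_output.txt` file written by the evaluation script; the sample values and the local `github_output.txt` stand-in are illustrative only:

```shell
# Create a sample metrics file like the one the evaluation script writes
# (illustrative values copied from the diff above).
printf 'accuracy_score: 0.75\nMAE: 0.3333333333333333\n' > Metrics_output.txt

# In a real workflow the runner provides $GITHUB_OUTPUT; a local file
# stands in for it here so the sketch can run outside GitHub Actions.
GITHUB_OUTPUT="github_output.txt"
: > "$GITHUB_OUTPUT"

# Parse each metric with awk (more robust than grep for "key: value" lines)...
accuracy=$(awk -F': ' '/^accuracy_score/ {print $2}' Metrics_output.txt)
mae=$(awk -F': ' '/^MAE/ {print $2}' Metrics_output.txt)

# ...and append to the $GITHUB_OUTPUT file instead of using ::set-output.
echo "accuracy=$accuracy" >> "$GITHUB_OUTPUT"
echo "mae=$mae" >> "$GITHUB_OUTPUT"
```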