Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBLIC-2591: App code extraction, test improvement #16

Merged
merged 3 commits into from
Aug 2, 2024

Conversation

ialarmedalien
Copy link
Collaborator

@ialarmedalien ialarmedalien commented Aug 2, 2024

  • moved fasttree code out into a separate file
  • used vcrpy to fake workspace/report/shock responses
  • added some tests for the output of the fasttree run

I have the following PRs planned:

  • update Dockerfile to python 3.12
  • switch tests over to pytest, add in a couple more tests
  • fix bug where UPAs are not rendered correctly
  • tidy up, add any documentation necessary

@ialarmedalien ialarmedalien self-assigned this Aug 2, 2024
@ialarmedalien ialarmedalien changed the base branch from update_dockerfile to main August 2, 2024 03:36
@ialarmedalien ialarmedalien changed the base branch from main to update_dockerfile August 2, 2024 03:38
if hasattr(cls, "ws_name"):
try:
cls.ws_client.delete_workspace({"workspace": cls.ws_name})
except:

Check notice

Code scanning / CodeQL

Except block handles 'BaseException' Note test

Except block directly handles BaseException.
Comment on lines +120 to +121
# for field in msa_in.keys():
# log(console, "MSA key: '"+field+"'")

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +134 to +135
# for row_id in row_order:
# log(console, "row_id: '"+row_id+"' default_row_label: '"+default_row_labels[row_id]+"'")

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +207 to +210
# with open(input_msa_file_path, 'r') as input_msa_file_handle:
# for line in input_msa_file_handle:
# #log(console,"MSA_LINE: '"+line+"'") # too big for console
# log(invalid_msgs,"MSA_LINE: '"+line+"'")

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
# tree_attributes will always be None.
# goodness knows what this was supposed to check for
if tree_attributes is not None:
output_tree["tree_attributes"] = tree_attributes

Check warning

Code scanning / CodeQL

Unreachable code Warning

This statement is unreachable.
@@ -2,9 +2,6 @@ name: KBase SDK Tests

on:
pull_request:
branches:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run the tests on every PR

pip install --upgrade pip && \
# install the python requirements
# Note: You must use PyQt5==5.11.3 on debian
pip install ete3==3.1.2 PyQt5==5.11.3 numpy==1.23.1 pytest coverage pytest-cov vcrpy
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved up the python installation to prior to copying over the repo

@@ -1,23 +1,4 @@
[kb_fasttree]
kbase-endpoint = {{ kbase_endpoint }}
job-service-url = {{ job_service_url }}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of the rest of these are used - just the workspace URL and the scratch dir

@@ -0,0 +1,411 @@
"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

standard kbase app module - adding it here as I plan to update the base image in the next PR

@@ -0,0 +1,738 @@
"""Implementation of the fasttree app."""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The majority of this is a big ol' cut and paste from the impl file. I separated out a couple of functions to make it easier to mock certain things for testing. There's a whole load of cruft still in the code that I will deal with in an upcoming PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll comment anything that I did change

Comment on lines +21 to +27
def generate_run_id() -> str:
"""Generate an ID for this fasttree run, based on the current date and time.

:return: temporally unique run ID
:rtype: str
"""
return datetime.now(tz=timezone.utc).strftime("%Y%m%d_%H%M%S")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new: previously this app produced files with either a UUID or a timestamp as part of the filename. Instead of doing that, the app now creates a directory (in the scratch dir) with the timestamp as its name, and adds the files sans UUID/timestamp to that directory. It means that app generates a predictable set of files, which makes testing easier, and if for some reason you run the app numerous times in a row, each run will have its output in a separate directory.

Comment on lines +81 to +84
# create a directory for all files related to this run to live in
run_id = generate_run_id()
run_dir = Path(config["scratch"]) / run_id
run_dir.mkdir(parents=True, exist_ok=True)
Copy link
Collaborator Author

@ialarmedalien ialarmedalien Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all new files go into this directory

I switched over to using Path from pathlib for os-related stuff whilst I was updating file names and adding the new run directory

if intree_type_name != "Tree":
raise ValueError("Cannot yet handle intree type of: " + intree_type_name)

intree_newick_file_path = run_dir / f"{intree_name}.newick"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplified this file name

Comment on lines +557 to +558
# timestamp = int((datetime.utcnow() - datetime.utcfromtimestamp(0)).total_seconds() * 1000)
# html_output_dir = run_dir / f"output_html.{timestamp}"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed in upcoming PR

Comment on lines -59 to -70
def now_ISO(self):
now_timestamp = datetime.now()
now_secs_from_epoch = (now_timestamp - datetime(1970,1,1)).total_seconds()
now_timestamp_in_iso = datetime.fromtimestamp(int(now_secs_from_epoch)).strftime('%Y-%m-%d_%T')
return now_timestamp_in_iso

def log(self, target, message):
message = '['+self.now_ISO()+'] '+message
if target is not None:
target.append(message)
print(message)
sys.stdout.flush()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both these functions have been transferred over to fasttree.py

I simplified now_ISO to just record a timestamp for now()

print(message)
sys.stdout.flush()

def get_single_end_read_library(self, ws_data, ws_info, forward):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surprisingly none of the following four functions are used

# fasttree_cmd.append('-out')
# fasttree_cmd.append(output_newick_file_path)
# BEGIN run_FastTree
config = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplified how the args are passed to the fasttree.py file


self.log(console,"run_FastTree DONE")
#END run_FastTree
returnVal = fasttree.run_fasttree(config, ctx, params)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice diff work, GitHub!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:octocat:

@@ -0,0 +1 @@
(Desulfotomaculum gibsoniae DSM 7213 - DESGI RS22175:0.12153,Candidatus Desulforudis audaxviator MP104C - DAUD RS11160:0.15587,(((Desulfovibrio vulgaris DP4 - DVUL RS12960:0.0,Desulfovibrio vulgaris str. Hildenborough - DVU0402:0.0,Desulfovibrio vulgaris RCH1 - DEVAL RS01880:0.0):0.04805,Desulfovibrio vulgaris str. 'Miyazaki F' - DVMF RS07895:0.05673)1.000:0.14274,((Candidatus Desulforudis audaxviator MP104C - DAUD RS11155:0.30486,(Desulfotomaculum nigrificans DSM 574 - DESNIDRAFT RS0211670:0.11266,Desulfotomaculum alkaliphilum DSM 12257 - BR02 RS0103515:0.08482)0.953:0.08154)1.000:1.05608,(Desulfotomaculum nigrificans DSM 574 - DESNIDRAFT RS0211675:0.13918,Desulfotomaculum alkaliphilum DSM 12257 - BR02 RS0103520:0.12923)0.971:0.07526)0.995:0.11388)1.000:0.08045);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected output from the sample file in test/data

@@ -0,0 +1 @@
(21855%2f4%2f1.f%3aDESGI_RS22175:0.12153,21855%2f2%2f1.f%3aDAUD_RS11160:0.15587,(((21855%2f8%2f1.f%3aDVUL_RS12960:0.0,21855%2f6%2f1.f%3aDVU0402:0.0,21855%2f7%2f1.f%3aDEVAL_RS01880:0.0):0.04805,21855%2f9%2f1.f%3aDVMF_RS07895:0.05673)1.000:0.14274,((21855%2f2%2f1.f%3aDAUD_RS11155:0.30486,(21855%2f3%2f1.f%3aDESNIDRAFT_RS0211670:0.11266,21855%2f5%2f1.f%3aBR02_RS0103515:0.08482)0.953:0.08154)1.000:1.05608,(21855%2f3%2f1.f%3aDESNIDRAFT_RS0211675:0.13918,21855%2f5%2f1.f%3aBR02_RS0103520:0.12923)0.971:0.07526)0.995:0.11388)1.000:0.08045);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be UPAs -- this will be fixed in an upcoming PR

@@ -0,0 +1,4 @@
[kb_fasttree]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config for running tests

else:
print("Test workspace was deleted")

def get_ws_client(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turned all these function names and variables into the proper pythonic versions

def get_context(self):
return self.ctx

def get_msa_ref(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was part of the test itself but I moved it to be a separate function to make the test a bit clearer and to allow more tests to be run using the same msa_ref

self.assertEqual(created_obj_0_info[NAME_I], obj_out_name)
self.assertEqual(created_obj_0_info[TYPE_I].split("-")[0], obj_out_type)
# patch generate_run_id to return self.run_id
with patch("kb_fasttree.fasttree.generate_run_id", return_value=self.run_id):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensures that the test output is in a predictable location

assert created_obj_0_info[1] == obj_out_name
assert created_obj_0_info[2].split("-")[0] == obj_out_type

# check that the expected files are in the expected places
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test assertions below -- these check that the expected files are produced with the expected content. This allows the fasttree app code to be edited with at least a little confidence that any changes in output are deliberate.

try:
cls.ws_client.delete_workspace({"workspace": cls.ws_name})
except:
print("Workspace already deleted")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this in to handle the case where the server responses are being played back by vcrpy and thus the call to delete the workspace is not needed

for sub_dir_name in file_list:
for f in file_list[sub_dir_name]:
check_non_zero_file_size(run_dir / sub_dir_name / f)
if not f.endswith("pdf") and not f.endswith("png"):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't check binary files as the compare_files function only looks at text files

@ialarmedalien ialarmedalien changed the title separate fasttree code out into a different file. Attempt to make the… App code extraction, test improvement Aug 2, 2024
@ialarmedalien ialarmedalien changed the title App code extraction, test improvement PUBLIC-2591: App code extraction, test improvement Aug 2, 2024
Copy link

@briehl briehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Didn't really review the actual fasttree code, I'm assuming 95+% was copy-pasted. The other changes seem good. 👍

pip install --upgrade pip && \
# install the python requirements
# Note: You must use PyQt5==5.11.3 on debian
pip install ete3==3.1.2 PyQt5==5.11.3 numpy==1.23.1 pytest coverage pytest-cov vcrpy
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No requirements file?
And do the dev requirements need versioning? Probably not, realistically, but I figured I should mention.


self.log(console,"run_FastTree DONE")
#END run_FastTree
returnVal = fasttree.run_fasttree(config, ctx, params)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:octocat:

raise ValueError('Method run_FastTree return value ' +
'returnVal is not type dict as required.')
raise ValueError(
"Method run_FastTree " + "return value returnVal " + "is not type dict as required."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could maybe get squashed together. Not sure if that's generated code at this point, though, just post-auto-formatted.

@ialarmedalien ialarmedalien merged commit ebe34b8 into update_dockerfile Aug 2, 2024
1 check passed
@ialarmedalien ialarmedalien deleted the marginally_better_tests branch August 2, 2024 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants