-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
creating function main #4
base: main
Are you sure you want to change the base?
Conversation
There is an error in io.py that I corrected. I also feel that there is a much better way to handle user input for the package in my main file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main issue is to use argparse
for the cli inputs and not input
.
Take a look at pdfmorph/talk to Andrew to see how it is done.
@@ -102,7 +104,7 @@ def load_input_signals(file_path=None): | |||
for item in directory_path.iterdir(): | |||
if item.is_file(): | |||
data = loadData(item.resolve()) | |||
if current_grid and current_grid != data[:, 0]: | |||
if len(current_grid) != 0 and (current_grid != data[:, 0]).any(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a comment line would help here with the intent. I.e., why do we ignore and not handle if the data are on a different grid. If I remember correctly t here was a physics reason (mathematically it should be straightforward). Just recording for future people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a comment line to explain this
diffpy/snmf/stretchednmfapp.py
Outdated
if not directory_path: | ||
directory_path = None | ||
|
||
data_type = input("Specify the data type ('xrd' or 'pdf'): ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can often be obtained from file header.
diffpy/snmf/stretchednmfapp.py
Outdated
if data_type != 'xrd' and data_type != 'pdf': | ||
raise ValueError("The data type must be 'xrd' or 'pdf'") | ||
|
||
component_amount = input("\nEnter the amount of components to obtain:") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably be "number" of components. "amount" sounds like a quantity of a continuous quantity, like amount of sand or amount of flour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change the naming.
diffpy/snmf/stretchednmfapp.py
Outdated
try: | ||
component_amount = int(component_amount) | ||
except TypeError: | ||
raise TypeError("Please enter an integer greater than 0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this in the input instructions rather than have the poor user have to enter quantities and have the program fail to find out they were supposed to put in an integer...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I will do this
I understand. For argparse should I create a separate function in io.py or put it in main |
I would look in PDFmorph for inspiration there.....there is something
called pdfmorphapp.py I think where the "app" part of it (i.e., the cli in
that case) is held.
S
…On Wed, Jul 19, 2023 at 2:50 PM Adeolu Ajayi ***@***.***> wrote:
I understand. For argparse should I create a separate function in io.py or
put it in main
—
Reply to this email directly, view it on GitHub
<#4 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABAOWUO3HXCHDNBCI5TYOTLXQ7JZJANCNFSM6AAAAAAZOXODRU>
.
You are receiving this because you commented.Message ID: <diffpy/diffpy.
***@***.***>
--
Simon Billinge
Professor, Columbia University
|
btw, your current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 you got it.....
…raw data is lifted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls see comments
diffpy/snmf/stretchednmfapp.py
Outdated
) | ||
parser.add_argument('-v', '--version', action='version', help='Print the software version number') | ||
parser.add_argument('-d', '--directory', type=str, | ||
help="Directory containing experimental data. Ensure it is in quotations or apostrophes.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does it have to be in quotes? Call it maybe input-directory
Give it a default value of None
then later if args.data_directory == None
set it to the cwd. Mention this in the help text.
I think it will end up being a good idea to add an argument output-directory
as an optional argument. If it is not specified, maybe have the default behavior to either dump the results into input-directory or create a directory snmf_results
that hangs off the input-directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by default, the default value of an optional argument is None (so it would be redundant to explicitly set that). The load_input_signal function handles the case where it is None as you described.
When I was using it, the cli wouldn't work when I specified a directory and it wasn't in quotes. I can see if I can fix this in the code.
I will rename the variables and add an argument as you described. Should I make a new function in io.py to handle dumping the results in a new directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, sounds good.
The most important thing is that the users know what is happening. They don't need to know "the default value is None and this is handled in the code" they need to know the arg is optional and the default behavior is that it uses the current working directory.
It is always a good idea to make your intentions clearer for other devs who come later, so I could definitely argue in favor of explicitly defining default to be None. It just reduces cognitive overload on the person coming later....they don't have to scratch their head and think, "what is going on?" and having to remember what the default behavior of argparse is....
diffpy/snmf/stretchednmfapp.py
Outdated
parser.add_argument('-d', '--directory', type=str, | ||
help="Directory containing experimental data. Ensure it is in quotations or apostrophes.") | ||
|
||
parser.add_argument('component_number', type=int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest maybe components
to make it easier for the users. Later in the code change it number_of_components
to make it clearer for future developers. I think component_number
has a different meaning, for example, component-1 component-2 etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I will rename it here and in the rest of the code.
diffpy/snmf/stretchednmfapp.py
Outdated
parser.add_argument('component_number', type=int, | ||
help="The number of component signals to obtain from experimental " | ||
"data. Must be an integer greater than 0.") | ||
parser.add_argument('data_type', type=str, choices=['xrd', 'pdf'], help="The type of the experimental data.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the cli, always use a -
rather than a _
. Later argparse will replace it with _
for the variable name. So data-type
would be here, but we would access it as args.data_type
later. Like capitalization...if we always do it the same way it prevents aggravation later.
For this it is often present in the file header, so maybe better to make this optional, but raise an exception and ask the user to rerun specifying it if you can't find it in the file header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. I will make it an optional argument and rename it as you described.
…onality for getting data type from file extension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please see comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking great! Good work!
diffpy/snmf/io.py
Outdated
@@ -44,14 +44,16 @@ def initialize_variables(data_input, component_amount, data_type, sparsity=1, sm | |||
|
|||
component_matrix_guess = np.random.rand(signal_length, component_amount) | |||
weight_matrix_guess = np.random.rand(component_amount, moment_amount) | |||
stretching_matrix_guess = np.ones(component_amount, moment_amount) + np.random.randn(component_amount, | |||
stretching_matrix_guess = np.ones((component_amount, moment_amount)) + np.random.randn(component_amount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should probably not be in io.py. I would suggest to make functions somewhere in subroutines that initialize specific arrays then here write something like:
stretching_matrix_guess = initialize_arrays(number_of_components, number_of_moments)
or sthg like that.
In the function in subroutines.py, put into docstring what the arrays are and how the decision were made on how to do it. Currently the code is a bit too hard to read as it is....I think that by component_amount
, based on earlier conversations, you mean number_of_components
. I have no idea what moment_amount
means, nor why you initialize it with ones and then add some kind of random component, so a docstring could be veyr helpful here (but in subroutines, not in io.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create that function on a new branch. I will also rename the variables to make it more clear.
diffpy/snmf/io.py
Outdated
@@ -113,4 +117,10 @@ def load_input_signals(file_path=None): | |||
grid_array = np.column_stack(grid_list) | |||
grid_vector = np.unique(grid_array, axis=1) | |||
values_array = np.column_stack(values_list) | |||
return grid_vector, values_array | |||
if file_extension in {'.gr','.chi'}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't hard-code this here. make globals at the top near the imports, something like PDF_CONTAINING_FILE_EXTENSION
and DIFFRACTION_CONTAINING_FILE_EXTENSIONS
. To make it easier to read use ()
or []
rather than {}
so it is explicityly a tuple/list.
Always try and use this structure, it makes the code easier to maintain (if I come up with a new file extension I want to add, I just add it once at the top!)
.chi
contains diffraction data, not PDF
I think using file extensions for this is probably a bad idea. For example, iq just means I(Q) but it could be x-ray, electron or neutron data in principle. xye
is a file format and could contain anything.
for our files (.iq, sq, gr, cgr) to the extent possible we use a header and metadata about the data is in the header. This is where you try and get default values from. Long is already doing this in the pdfitc code, so maybe try and steal from there. I am not sure if you have access, I can give it to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. I will look at the pdfitc code to see if there's a better way of doing this. I will let you know if I am not able to access it.
diffpy/snmf/stretchednmfapp.py
Outdated
prog="stretched_nmf", | ||
description="Stretched Nonnegative Matrix Factorization" | ||
) | ||
parser.add_argument('-v', '--version', action='version', help='Print the software version number') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this at the bottom, it will be rarely used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do that.
diffpy/snmf/stretchednmfapp.py
Outdated
) | ||
parser.add_argument('-v', '--version', action='version', help='Print the software version number') | ||
parser.add_argument('-i', '--input-directory', type=str, default=None, | ||
help="Directory containing experimental data. Default before will cause the program to use the current working directory as the input directory.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about:
"Directory containing experimental data. Defaults to current working directory."
Were you able to resolve the issue with the quotes? It is ok ot put that back if needed, but is a bit ugly in a user interface if it can be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not. I will see if I can find a solution.
diffpy/snmf/stretchednmfapp.py
Outdated
parser.add_argument('-i', '--input-directory', type=str, default=None, | ||
help="Directory containing experimental data. Default before will cause the program to use the current working directory as the input directory.") | ||
parser.add_argument('-o', '--output-directory', type=str, | ||
help="The directory where the results will be dumped. Default behavior will create a new directory named 'smnf_results' inside the input directory.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dumped
is a developer work. Maybe written
for the users?
For consistency maybe adopt uniform language (lowers cognitive overload of the user), so if we used Defaults to current working directory.
above, use Defaults to '<input_directory>/snmf_results'"
.
I am not sure how you implemented it, but don't dump it off cwd
but off input_directory
. The code could be run from anywhere but the results should logically be associated with the inputs.
There was a typo in your help string (smnf
), double check this typo isn't anywhere else in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the help string and will check for typos
def main(): | ||
args = create_parser() | ||
|
||
grid, data_input, data_type = load_input_signals(args.input_directory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is maybe clearer to put the default logic here that turns input_directory from None to cwd. Or else, make it an optional argument in load_input_data.
Again, it is about code readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do this.
diffpy/snmf/stretchednmfapp.py
Outdated
args = create_parser() | ||
|
||
grid, data_input, data_type = load_input_signals(args.input_directory) | ||
if args.data_type is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for readability I would separate the action of handling the default behavior and calling the function. These are just general comments to make your code more readable, I hope you are taking them on board and reusing later...
What I mean is:
grid, data_input, data_type = load_input_signals(args.input_directory)
if args.data_type:
data_type = args.data_type
variables = initialize_variables(data_input, args.components, data_type)
if not args.data_type:
try:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. For example, it's not good practice to have the function call inside the if statement if that's not needed.
diffpy/snmf/stretchednmfapp.py
Outdated
|
||
weights_matrix = variables["weight_matrix_guess"] | ||
component_matrix = variables["component_matrix_guess"] | ||
stretching_factor_matrix = variables["stretching_matrix_guess"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you did this for half the variables and not the others. I would just not do this and get the values from the dict below.
If you lint the code with black (whatever you do, do that on a separate branch! it makes the code review impossible to mix linting edits with functional edits!) it will arrange this in a nice readable way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, I will remove this.
I will look into setting up black to make my code more readable and better formatted.
… changed help string of output-directory default behavior for input_directory handled in main,
…d "moment_amount" to "number_of_moments" in io.py
…ariables in initialize_variables result,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments. Please read them. We can discuss how to do the refactor from the nice chart you made and shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please see my comment
@@ -18,6 +18,8 @@ def create_parser(): | |||
help="The directory where the results will be written. Defaults to '<input_directory>/snmf_results'.") | |||
parser.add_argument('-t', '--data-type', type=str, choices=['powder_diffraction', 'pdf'], | |||
help="The type of the experimental data.") | |||
parser.add_argument('-l', '--lift', type=float, default=1, | |||
help="The factor that determines how much the data is lifted. By default, the data will be vertically translated to make the minimum value 0.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about "The lifting factor. Data will be lifted by lifted data = data - min(data)*lift. Default is 1"
btw, it occurs to me that if the min(data) is positive, the "lift" will "lower" the data to zero if lift = 1 and will lower it to below zero if lift > 1. This is probably undesirable behavior. Make sure you have tests for this eventuality that give the behavior you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make this changes.
The same thought occurred to me. If the min(data) is positive, it's not apparent to me that the data would need to be lifted. My idea is that if the minimum is positive, then do nothing to the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is reasonable, though why not allow the user to lift the data if she wants to?
Maybe make sure the lift is a lift, so it adds the absolute value of min(data)*lift to the data, something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me know what to do here. There is really too much going on for me to merge this PR, so it would be better to split things onto more different PRs, one for ach different bits of functionality.
On thing we don't want to forget is to refactor number_of_moments
to number_of_inputs
or something. I know that "moments" will confuse me (and others) in the future.
If you want I can merge this and you can make PRs fixing things. I don'r really mind doing that for ugly "main" functions that work but are ugly and you are gradually going through them clenaing things one thing at a time but want them to keep working "ugly" as you go.
I understand. I think it will be better for me to make a bunch of new PRs. Do I make these PRs one at a time or all at once? Also, if we go with the approach where I have a ComponentSignal object, some of the functions have to change. |
No description provided.