creating function main #4

aajayi-21 · 2023-06-21T13:10:44Z

No description provided.

aajayi-21 · 2023-07-18T16:49:10Z

There is an error in io.py that I corrected. I also feel that there is a much better way to handle user input for the package in my main file.

sbillinge

the main issue is to use argparse for the cli inputs rather than input().

Take a look at pdfmorph/talk to Andrew to see how it is done.

sbillinge · 2023-07-19T04:56:02Z

diffpy/snmf/io.py

@@ -102,7 +104,7 @@ def load_input_signals(file_path=None):
    for item in directory_path.iterdir():
        if item.is_file():
            data = loadData(item.resolve())
-            if current_grid and current_grid != data[:, 0]:
+            if len(current_grid) != 0 and (current_grid != data[:, 0]).any():


I think a comment line would help here with the intent, i.e., why we ignore the data rather than handle them if they are on a different grid. If I remember correctly there was a physics reason (mathematically it should be straightforward). Just recording for future people.

I will add a comment line to explain this
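For readers following the grid check: comparing two NumPy arrays with `!=` gives an element-wise boolean array, which cannot be used directly as an `if` condition, hence the `.any()` in the new line. Below is a minimal sketch of the same check. The helper name `grids_differ` is hypothetical and not part of diffpy.snmf; `np.array_equal` is swapped in here because it also copes with grids of different lengths.

```python
import numpy as np


def grids_differ(current_grid, new_grid):
    """Return True when a grid was already recorded and new_grid does not match it.

    Hypothetical helper for illustration; not taken from diffpy.snmf.
    """
    current_grid = np.asarray(current_grid)
    new_grid = np.asarray(new_grid)
    if current_grid.size == 0:
        # No grid has been recorded yet, so there is nothing to compare against.
        return False
    # array_equal returns False for mismatched shapes instead of raising.
    return not np.array_equal(current_grid, new_grid)
```

A caller could then skip (or flag) any file for which `grids_differ(current_grid, data[:, 0])` is true, with a comment recording why such files are not regridded.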

sbillinge · 2023-07-19T04:57:27Z

diffpy/snmf/stretchednmfapp.py

+    if not directory_path:
+        directory_path = None
+
+    data_type = input("Specify the data type ('xrd' or 'pdf'): ")


this can often be obtained from file header.

sbillinge · 2023-07-19T04:58:29Z

diffpy/snmf/stretchednmfapp.py

+    if data_type != 'xrd' and data_type != 'pdf':
+        raise ValueError("The data type must be 'xrd' or 'pdf'")
+
+    component_amount = input("\nEnter the amount of components to obtain:")


should probably be "number" of components. "amount" sounds like a measure of a continuous quantity, like amount of sand or amount of flour.

I will change the naming.

sbillinge · 2023-07-19T04:59:57Z

diffpy/snmf/stretchednmfapp.py

+    try:
+        component_amount = int(component_amount)
+    except TypeError:
+        raise TypeError("Please enter an integer greater than 0")


put this in the input instructions rather than have the poor user have to enter quantities and have the program fail to find out they were supposed to put in an integer...

Ok. I will do this
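One way to act on this, once the inputs move to argparse, is to let the parser validate the value and report the requirement in its help text. Note that `int("abc")` raises `ValueError`, not `TypeError`, so the `except TypeError` in the diff would not catch a non-numeric string returned by `input()`. The sketch below is an assumption about how this could look: `positive_int` and `build_parser` are hypothetical names, and the positional argument name `components` is illustrative.

```python
import argparse


def positive_int(text):
    """argparse type callable (hypothetical helper): accept only integers > 0."""
    try:
        value = int(text)
    except ValueError:
        # int() raises ValueError for strings such as "abc" or "1.5".
        raise argparse.ArgumentTypeError(f"{text!r} is not an integer") from None
    if value <= 0:
        raise argparse.ArgumentTypeError("must be an integer greater than 0")
    return value


def build_parser():
    """Build a small parser showing the validated positional argument."""
    parser = argparse.ArgumentParser(prog="stretched_nmf")
    parser.add_argument(
        "components",
        type=positive_int,
        help="Number of component signals to obtain. Must be an integer greater than 0.",
    )
    return parser
```

With this, a bad value makes argparse print the usage message and the reason before any computation starts, and `--help` tells the user up front what is expected.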

aajayi-21 · 2023-07-19T12:50:18Z

I understand. For argparse should I create a separate function in io.py or put it in main

sbillinge · 2023-07-19T14:07:32Z

I would look in PDFmorph for inspiration there. There is something called pdfmorphapp.py, I think, where the "app" part of it (i.e., the cli in that case) is held.

sbillinge · 2023-07-19T16:16:26Z

btw, your current main.py is acting as the "app" so stick with that I would say.

sbillinge

👍 you got it.....

sbillinge

pls see comments

sbillinge · 2023-07-20T08:07:13Z

diffpy/snmf/stretchednmfapp.py

+    )
+    parser.add_argument('-v', '--version', action='version', help='Print the software version number')
+    parser.add_argument('-d', '--directory', type=str,
+                        help="Directory containing experimental data. Ensure it is in quotations or apostrophes.")


why does it have to be in quotes? Maybe call it input-directory. Give it a default value of None, then later if args.input_directory is None set it to the cwd. Mention this in the help text.

I think it will end up being a good idea to add an argument output-directory as an optional argument. If it is not specified, maybe have the default behavior to either dump the results into input-directory or create a directory snmf_results that hangs off the input-directory?

By default, the default value of an optional argument is None (so it would be redundant to explicitly set that). The load_input_signals function handles the case where it is None as you described.

When I was using it, the cli wouldn't work when I specified a directory and it wasn't in quotes. I can see if I can fix this in the code.

I will rename the variables and add an argument as you described. Should I make a new function in io.py to handle dumping the results in a new directory?

ok, sounds good.

The most important thing is that the users know what is happening. They don't need to know "the default value is None and this is handled in the code" they need to know the arg is optional and the default behavior is that it uses the current working directory.

It is always a good idea to make your intentions clearer for other devs who come later, so I could definitely argue in favor of explicitly defining default to be None. It just reduces cognitive overload on the person coming later....they don't have to scratch their head and think, "what is going on?" and having to remember what the default behavior of argparse is....
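A minimal sketch of the option as discussed here, with the default spelled out and the help text describing the behavior the user sees rather than the implementation. The function name `build_input_parser` is hypothetical. On the quoting question: the shell splits the command line before argparse sees it, so quotes are normally only needed when the path contains spaces or other shell-special characters.

```python
import argparse


def build_input_parser():
    """Parser fragment showing an explicit default and a user-facing help string."""
    parser = argparse.ArgumentParser(
        prog="stretched_nmf",
        description="Stretched Nonnegative Matrix Factorization",
    )
    parser.add_argument(
        "-i",
        "--input-directory",
        type=str,
        default=None,  # explicit, even though argparse would use None anyway
        help="Directory containing experimental data. Defaults to current working directory.",
    )
    return parser
```

Here `parse_args([])` leaves `args.input_directory` as `None`, which the app can then turn into the current working directory in one visible place.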

sbillinge · 2023-07-20T08:09:48Z

diffpy/snmf/stretchednmfapp.py

+    parser.add_argument('-d', '--directory', type=str,
+                        help="Directory containing experimental data. Ensure it is in quotations or apostrophes.")
+
+    parser.add_argument('component_number', type=int,


I suggest maybe components to make it easier for the users. Later in the code change it to number_of_components to make it clearer for future developers. I think component_number has a different meaning, for example, component-1, component-2, etc.

Ok. I will rename it here and in the rest of the code.

sbillinge · 2023-07-20T08:12:42Z

diffpy/snmf/stretchednmfapp.py

+    parser.add_argument('component_number', type=int,
+                        help="The number of component signals to obtain from experimental "
+                             "data. Must be an integer greater than 0.")
+    parser.add_argument('data_type', type=str, choices=['xrd', 'pdf'], help="The type of the experimental data.")


in the cli, always use a - rather than a _. Later argparse will replace it with _ for the variable name. So data-type would be here, but we would access it as args.data_type later. Like capitalization...if we always do it the same way it prevents aggravation later.

For this it is often present in the file header, so maybe better to make this optional, but raise an exception and ask the user to rerun specifying it if you can't find it in the file header.

I understand. I will make it an optional argument and rename it as you described.
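The two points above can be sketched together: argparse converts the dash in `--data-type` to an underscore for the attribute name, and an optional value can fall back to whatever the file header provides. The names `build_type_parser` and `resolve_data_type`, and the idea of passing the header value in as a plain argument, are assumptions made for illustration only.

```python
import argparse


def build_type_parser():
    """Parser fragment with a dashed optional argument."""
    parser = argparse.ArgumentParser(prog="stretched_nmf")
    parser.add_argument(
        "-t",
        "--data-type",
        type=str,
        choices=["xrd", "pdf"],
        default=None,
        help="The type of the experimental data. Read from the file header if omitted.",
    )
    return parser


def resolve_data_type(cli_value, header_value):
    """Prefer the CLI value, then the header value; otherwise ask the user to rerun."""
    if cli_value is not None:
        return cli_value
    if header_value is not None:
        return header_value
    raise ValueError(
        "Could not determine the data type from the file header; "
        "please rerun and specify --data-type."
    )
```

After parsing, the value is read as `args.data_type` even though the flag is typed as `--data-type`.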

sbillinge

please see comment.

sbillinge

looking great! Good work!

sbillinge · 2023-07-22T06:31:23Z

diffpy/snmf/io.py

@@ -44,14 +44,16 @@ def initialize_variables(data_input, component_amount, data_type, sparsity=1, sm

    component_matrix_guess = np.random.rand(signal_length, component_amount)
    weight_matrix_guess = np.random.rand(component_amount, moment_amount)
-    stretching_matrix_guess = np.ones(component_amount, moment_amount) + np.random.randn(component_amount,
+    stretching_matrix_guess = np.ones((component_amount, moment_amount)) + np.random.randn(component_amount,


these should probably not be in io.py. I would suggest to make functions somewhere in subroutines that initialize specific arrays then here write something like:

stretching_matrix_guess = initialize_arrays(number_of_components, number_of_moments)

or something like that.

In the function in subroutines.py, put into the docstring what the arrays are and how the decisions were made on how to do it. Currently the code is a bit too hard to read as it is....I think that by component_amount, based on earlier conversations, you mean number_of_components. I have no idea what moment_amount means, nor why you initialize it with ones and then add some kind of random component, so a docstring could be very helpful here (but in subroutines, not in io).

I will create that function on a new branch. I will also rename the variables to make it more clear.
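A hedged sketch of what such a function might look like. The name `initialize_stretching_matrix`, the `spread` parameter and its default, and the `rng` parameter are all assumptions: the diff line above is truncated, so the actual scale of the random term is not known here. The fix in the diff itself is visible, though: `np.ones` takes the shape as a single tuple.

```python
import numpy as np


def initialize_stretching_matrix(number_of_components, number_of_moments, spread=1e-3, rng=None):
    """Build a starting guess for the stretching factors.

    The guess is a matrix of ones (no stretching) plus a small random
    perturbation, so that the entries do not all start out identical.

    Parameters
    ----------
    number_of_components : int
        Number of component signals (rows of the matrix).
    number_of_moments : int
        Second dimension of the matrix, called moment_amount in the diff.
    spread : float
        Hypothetical scale of the random perturbation (assumed value).
    rng : numpy.random.Generator, optional
        Random generator, so that tests can pass a seeded one.
    """
    rng = np.random.default_rng() if rng is None else rng
    shape = (number_of_components, number_of_moments)
    # np.ones needs the shape as one tuple, which is the bug fixed in the diff.
    return np.ones(shape) + spread * rng.standard_normal(shape)
```

The call site in io.py would then reduce to a single readable line, along the lines of the suggestion above.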

sbillinge · 2023-07-22T06:40:16Z

diffpy/snmf/io.py

@@ -113,4 +117,10 @@ def load_input_signals(file_path=None):
    grid_array = np.column_stack(grid_list)
    grid_vector = np.unique(grid_array, axis=1)
    values_array = np.column_stack(values_list)
-    return grid_vector, values_array
+    if file_extension in {'.gr','.chi'}:


don't hard-code this here. Make globals at the top near the imports, something like PDF_CONTAINING_FILE_EXTENSIONS and DIFFRACTION_CONTAINING_FILE_EXTENSIONS. To make it easier to read use () or [] rather than {} so it is explicitly a tuple/list.

Always try and use this structure, it makes the code easier to maintain (if I come up with a new file extension I want to add, I just add it once at the top!)

.chi contains diffraction data, not PDF

I think using file extensions for this is probably a bad idea. For example, iq just means I(Q) but it could be x-ray, electron or neutron data in principle. xye is a file format and could contain anything.

for our files (.iq, .sq, .gr, .cgr) to the extent possible we use a header and metadata about the data is in the header. This is where you try and get default values from. Long is already doing this in the pdfitc code, so maybe try and steal from there. I am not sure if you have access, I can give it to you.

I understand. I will look at the pdfitc code to see if there's a better way of doing this. I will let you know if I am not able to access it.
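To illustrate only the "define it once at the top" structure, here is a minimal sketch. The tuple contents are illustrative, not a statement of which extensions the project supports, and `guess_data_type` is a hypothetical helper. As noted in the review, an extension is a weak signal at best, so the function returns `None` for anything it does not recognize and the caller is expected to fall back on header metadata or the command-line option.

```python
from pathlib import Path

# Module-level constants near the imports: add a new extension in one place only.
# The values below are illustrative assumptions.
PDF_CONTAINING_FILE_EXTENSIONS = (".gr",)
DIFFRACTION_CONTAINING_FILE_EXTENSIONS = (".chi",)


def guess_data_type(file_path):
    """Return a data-type label from the file extension, or None if unknown."""
    suffix = Path(file_path).suffix.lower()
    if suffix in PDF_CONTAINING_FILE_EXTENSIONS:
        return "pdf"
    if suffix in DIFFRACTION_CONTAINING_FILE_EXTENSIONS:
        # Label adopted later in this thread for diffraction data.
        return "powder_diffraction"
    return None
```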

sbillinge · 2023-07-22T06:44:00Z

diffpy/snmf/stretchednmfapp.py

+        prog="stretched_nmf",
+        description="Stretched Nonnegative Matrix Factorization"
+    )
+    parser.add_argument('-v', '--version', action='version', help='Print the software version number')


put this at the bottom, it will be rarely used.

I will do that.

sbillinge · 2023-07-22T06:45:27Z

diffpy/snmf/stretchednmfapp.py

+    )
+    parser.add_argument('-v', '--version', action='version', help='Print the software version number')
+    parser.add_argument('-i', '--input-directory', type=str, default=None,
+                        help="Directory containing experimental data. Default before will cause the program to use the current working directory as the input directory.")


how about:
"Directory containing experimental data. Defaults to current working directory."
Were you able to resolve the issue with the quotes? It is ok to put that back if needed, but it is a bit ugly in a user interface if it can be avoided.

I have not. I will see if I can find a solution.

sbillinge · 2023-07-22T06:49:42Z

diffpy/snmf/stretchednmfapp.py

+    parser.add_argument('-i', '--input-directory', type=str, default=None,
+                        help="Directory containing experimental data. Default before will cause the program to use the current working directory as the input directory.")
+    parser.add_argument('-o', '--output-directory', type=str,
+                        help="The directory where the results will be dumped. Default behavior will create a new directory named 'smnf_results' inside the input directory.")


dumped is a developer word. Maybe written for the users?

For consistency maybe adopt uniform language (lowers cognitive overload of the user), so if we used "Defaults to current working directory." above, use "Defaults to '<input_directory>/snmf_results'." here.

I am not sure how you implemented it, but don't dump it off cwd but off input_directory. The code could be run from anywhere but the results should logically be associated with the inputs.

There was a typo in your help string (smnf), double check this typo isn't anywhere else in the code.

I have changed the help string and will check for typos
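The default behavior described above can be sketched with `pathlib`. The function name `resolve_directories` is hypothetical; the point is that the output location hangs off the input directory, not off wherever the program happens to be run from.

```python
from pathlib import Path


def resolve_directories(input_directory=None, output_directory=None):
    """Apply the documented defaults for the input and output directories.

    input_directory defaults to the current working directory;
    output_directory defaults to '<input_directory>/snmf_results'.
    """
    input_path = Path.cwd() if input_directory is None else Path(input_directory)
    if output_directory is None:
        output_path = input_path / "snmf_results"
    else:
        output_path = Path(output_directory)
    return input_path, output_path
```

The sketch only computes the paths; the app could create the output directory later with `output_path.mkdir(parents=True, exist_ok=True)` just before writing results.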

sbillinge · 2023-07-22T06:58:24Z

diffpy/snmf/stretchednmfapp.py

+def main():
+    args = create_parser()
+
+    grid, data_input, data_type = load_input_signals(args.input_directory)


I think it is maybe clearer to put the default logic here that turns input_directory from None to cwd. Or else, make it an optional argument in load_input_signals.

Again, it is about code readability.

I will do this.

sbillinge · 2023-07-22T07:03:47Z

diffpy/snmf/stretchednmfapp.py

+    args = create_parser()
+
+    grid, data_input, data_type = load_input_signals(args.input_directory)
+    if args.data_type is not None:


for readability I would separate the action of handling the default behavior and calling the function. These are just general comments to make your code more readable, I hope you are taking them on board and reusing later...

What I mean is:

    grid, data_input, data_type = load_input_signals(args.input_directory)
    if args.data_type:
        data_type = args.data_type
    variables = initialize_variables(data_input, args.components, data_type)

if not args.data_type:
try:

I see what you mean. For example, it's not good practice to have the function call inside the if statement if that's not needed.

sbillinge · 2023-07-22T07:07:47Z

diffpy/snmf/stretchednmfapp.py

+
+    weights_matrix = variables["weight_matrix_guess"]
+    component_matrix = variables["component_matrix_guess"]
+    stretching_factor_matrix = variables["stretching_matrix_guess"]


you did this for half the variables and not the others. I would just not do this and get the values from the dict below.

If you lint the code with black (whatever you do, do that on a separate branch! it makes the code review impossible to mix linting edits with functional edits!) it will arrange this in a nice readable way.

I understand, I will remove this.

I will look into setting up black to make my code more readable and better formatted.

sbillinge

small comments. Please read them. We can discuss how to do the refactor from the nice chart you made and shared.

sbillinge

please see my comment

sbillinge · 2023-07-26T19:46:15Z

diffpy/snmf/stretchednmfapp.py

@@ -18,6 +18,8 @@ def create_parser():
                        help="The directory where the results will be written. Defaults to '<input_directory>/snmf_results'.")
    parser.add_argument('-t', '--data-type', type=str, choices=['powder_diffraction', 'pdf'],
                        help="The type of the experimental data.")
+    parser.add_argument('-l', '--lift', type=float, default=1,
+                        help="The factor that determines how much the data is lifted. By default, the data will be vertically translated to make the minimum value 0.")


how about "The lifting factor. Data will be lifted by lifted data = data - min(data)*lift. Default is 1"

btw, it occurs to me that if the min(data) is positive, the "lift" will "lower" the data to zero if lift = 1 and will lower it to below zero if lift > 1. This is probably undesirable behavior. Make sure you have tests for this eventuality that give the behavior you want.

I will make these changes.

The same thought occurred to me. If the min(data) is positive, it's not apparent to me that the data would need to be lifted. My idea is that if the minimum is positive, then do nothing to the data.

That is reasonable, though why not allow the user to lift the data if she wants to?

Maybe make sure the lift is a lift, so it adds the absolute value of min(data)*lift to the data, something like that.
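A minimal sketch of that last suggestion, under the assumption that "lift" should always move the data upward. The function name `lift_data` is hypothetical. With the formula from the help string, `data - min(data)*lift`, data whose minimum is positive get lowered; adding the absolute value avoids that.

```python
import numpy as np


def lift_data(data, lift=1.0):
    """Shift data upward by |min(data) * lift| so the shift is never negative."""
    data = np.asarray(data, dtype=float)
    return data + np.abs(np.min(data) * lift)
```

With `lift=1`, data whose minimum is negative end up with a minimum of zero, as in the original behavior, while data whose minimum is already positive are raised further and never pushed below zero. Whether positive data should be raised at all, or left untouched, is the open design question in the exchange above, and tests for both cases would pin down the choice.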

sbillinge

let me know what to do here. There is really too much going on for me to merge this PR, so it would be better to split things onto more PRs, one for each different bit of functionality.

One thing we don't want to forget is to refactor number_of_moments to number_of_inputs or something. I know that "moments" will confuse me (and others) in the future.

If you want I can merge this and you can make PRs fixing things. I don't really mind doing that for "main" functions that work but are ugly, where you are gradually going through them cleaning things up one at a time but want them to keep working as you go.

aajayi-21 · 2023-07-31T15:04:00Z

I understand. I think it will be better for me to make a bunch of new PRs. Do I make these PRs one at a time or all at once?

Also, if we go with the approach where I have a ComponentSignal object, some of the functions have to change.

aajayi-21 added 5 commits June 21, 2023 08:09

Initial commit creating main

36fce4d

Merge branch 'main' of github.com:diffpy/diffpy.snmf into create_main

2ffed0c

reformatted

42dac41

setup library behavior

dfa4910

updated main, fixed io

b982382

sbillinge reviewed Jul 19, 2023

added create_parser function to handle application input

6d8be90

sbillinge reviewed Jul 19, 2023

added the maximum iterations variable and the condition for when the raw data is lifted

6fb2879

sbillinge reviewed Jul 20, 2023

aajayi-21 added 3 commits July 20, 2023 13:58

renamed parameters and added output directory parameter. Added functionality for getting data type from file extension

07cd3ec

Merge branch 'main' into create_main

1be46c2

added import statement, synced main with branch

32bf269

sbillinge reviewed Jul 21, 2023

aajayi-21 added 3 commits July 21, 2023 16:03

added functionality to test cli app

e527a19

added explicit default value for --input-directory and modified help string

865251e

changed import statement to explicitly import functions

8f6b94e

sbillinge reviewed Jul 22, 2023

aajayi-21 added 4 commits July 22, 2023 11:16

reformatted io.py, moved version argument to bottom of create_parser, changed help string of output-directory default behavior for input_directory handled in main,

a174475

changed "component_amount" to "number_of_components" in io.py. changed "moment_amount" to "number_of_moments" in io.py

8b26f4f

fixed typo out-directory help string

a024bb5

changed 'xrd' to 'power_diffraction' choices for data_type, renamed variables in initialize_variables result,

1331805

sbillinge reviewed Jul 26, 2023

added a lifting factor optional argument

7d4c97f

sbillinge reviewed Jul 26, 2023

sbillinge reviewed Jul 31, 2023
