-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
creating function main #4
base: main
Are you sure you want to change the base?
Changes from 10 commits
36fce4d
2ffed0c
42dac41
dfa4910
b982382
6d8be90
6fb2879
07cd3ec
1be46c2
32bf269
e527a19
865251e
8f6b94e
a174475
8b26f4f
a024bb5
1331805
7d4c97f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -44,14 +44,16 @@ def initialize_variables(data_input, component_amount, data_type, sparsity=1, sm | |
|
||
component_matrix_guess = np.random.rand(signal_length, component_amount) | ||
weight_matrix_guess = np.random.rand(component_amount, moment_amount) | ||
stretching_matrix_guess = np.ones(component_amount, moment_amount) + np.random.randn(component_amount, | ||
stretching_matrix_guess = np.ones((component_amount, moment_amount)) + np.random.randn(component_amount, | ||
moment_amount) * 1e-3 | ||
|
||
diagonals = [np.ones(moment_amount - 2), -2 * np.ones(moment_amount - 2), np.ones(moment_amount - 2)] | ||
smoothness_term = .25 * scipy.sparse.diags(diagonals, [0, 1, 2], shape=(moment_amount - 2, moment_amount)) | ||
|
||
hessian_helper_matrix = scipy.sparse.block_diag([smoothness_term.T @ smoothness_term] * component_amount) | ||
sequence = np.arange(moment_amount * component_amount).reshape(component_amount, moment_amount).T.flatten() | ||
|
||
hessian_helper_matrix = hessian_helper_matrix.tocsr() | ||
hessian_helper_matrix = hessian_helper_matrix[sequence, :][:, sequence] | ||
|
||
return { | ||
|
@@ -99,10 +101,12 @@ def load_input_signals(file_path=None): | |
values_list = [] | ||
grid_list = [] | ||
current_grid = [] | ||
file_extension = "" | ||
for item in directory_path.iterdir(): | ||
if item.is_file(): | ||
file_extension = Path(item).suffix | ||
data = loadData(item.resolve()) | ||
if current_grid and current_grid != data[:, 0]: | ||
if len(current_grid) != 0 and (current_grid != data[:, 0]).any(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a comment line would help here with the intent. I.e., why do we ignore and not handle if the data are on a different grid. If I remember correctly t here was a physics reason (mathematically it should be straightforward). Just recording for future people. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will add a comment line to explain this |
||
print(f"{item.name} was ignored as it is not on a compatible grid.") | ||
continue | ||
else: | ||
|
@@ -113,4 +117,10 @@ def load_input_signals(file_path=None): | |
grid_array = np.column_stack(grid_list) | ||
grid_vector = np.unique(grid_array, axis=1) | ||
values_array = np.column_stack(values_list) | ||
return grid_vector, values_array | ||
if file_extension in {'.gr','.chi'}: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't hard-code this here. make globals at the top near the imports, something like Always try and use this structure, it makes the code easier to maintain (if I come up with a new file extension I want to add, I just add it once at the top!)
I think using file extensions for this is probably a bad idea. For example, iq just means I(Q) but it could be x-ray, electron or neutron data in principle. for our files (.iq, sq, gr, cgr) to the extent possible we use a header and metadata about the data is in the header. This is where you try and get default values from. Long is already doing this in the pdfitc code, so maybe try and steal from there. I am not sure if you have access, I can give it to you. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand. I will look at the pdfitc code to see if there's a better way of doing this. I will let you know if I am not able to access it. |
||
data_type = 'pdf' | ||
elif file_extension in {'iq','.xy','.xye','.xrd'}: | ||
data_type = 'xrd' | ||
else: | ||
data_type = None | ||
return grid_vector, values_array,data_type |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
import numpy as np | ||
import argparse | ||
|
||
from diffpy.snmf.io import load_input_signals, initialize_variables | ||
from diffpy.snmf.subroutines import * | ||
aajayi-21 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
def create_parser(): | ||
parser = argparse.ArgumentParser( | ||
prog="stretched_nmf", | ||
description="Stretched Nonnegative Matrix Factorization" | ||
) | ||
parser.add_argument('-v', '--version', action='version', help='Print the software version number') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. put this at the bottom, it will be rarely used. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will do that. |
||
parser.add_argument('-i', '--input-directory', type=str, | ||
help="Directory containing experimental data. Has a default value of None which sets the input as your current working directory.") | ||
parser.add_argument('-o', '--output-directory', type=str, | ||
help="The directory where the results will be dumped. Default behavior will create a new directory named 'smnf_results' inside the input directory.") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For consistency maybe adopt uniform language (lowers cognitive overload of the user), so if we used I am not sure how you implemented it, but don't dump it off There was a typo in your help string ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have changed the help string and will check for typos |
||
parser.add_argument('-t', '--data-type', type=str, choices=['xrd', 'pdf'], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think for our purposes, the choices should be Out of interest, how is this info used? If we want this snmf to be useful more broadly (e.g., for Raman or NMR data or X-ray fluoresence or something) we may want to figure out in a more abstract way what these inputs are for and describe them that way. we can use the diffraction case as an example. e.g., if it is used later to determine if the data are sparse or not we may want to mention that here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will change the choices and explain what PDF means. The information is used later on. It is used to determine if the data is sparse or not. It also changes the way the algorithm runs. For example, power_diffraction data is not lifted and when the data is powder diffraction the algorithm will run stretched_nmf first and then run sparse_stretched_nmf. |
||
help="The type of the experimental data.") | ||
parser.add_argument('components', type=int, | ||
sbillinge marked this conversation as resolved.
Show resolved
Hide resolved
|
||
help="The number of component signals to obtain from experimental " | ||
"data. Must be an integer greater than 0.") | ||
args = parser.parse_args() | ||
return args | ||
|
||
|
||
def main(): | ||
args = create_parser() | ||
|
||
grid, data_input, data_type = load_input_signals(args.input_directory) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is maybe clearer to put the default logic here that turns input_directory from None to cwd. Or else, make it an optional argument in load_input_data. Again, it is about code readability. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will do this. |
||
if args.data_type is not None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for readability I would separate the action of handling the default behavior and calling the function. These are just general comments to make your code more readable, I hope you are taking them on board and reusing later... What I mean is:
if not args.data_type: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see what you mean. For example, it's not good practice to have the function call inside the if statement if that's not needed. |
||
variables = initialize_variables(data_input, args.components, args.data_type) | ||
else: | ||
variables = initialize_variables(data_input, args.components, data_type) | ||
|
||
if variables["data_type"] == 'pdf': | ||
sbillinge marked this conversation as resolved.
Show resolved
Hide resolved
|
||
lifted_data = data_input - np.ndarray.min(data_input[:]) | ||
maxiter = 300 | ||
return lifted_data | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should probably not be in io.py. I would suggest to make functions somewhere in subroutines that initialize specific arrays then here write something like:
or sthg like that.
In the function in subroutines.py, put into docstring what the arrays are and how the decision were made on how to do it. Currently the code is a bit too hard to read as it is....I think that by
component_amount
, based on earlier conversations, you meannumber_of_components
. I have no idea whatmoment_amount
means, nor why you initialize it with ones and then add some kind of random component, so a docstring could be veyr helpful here (but in subroutines, not in io.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create that function on a new branch. I will also rename the variables to make it more clear.