corrections for small samples #37
Comments
Hi, have you looked into the pseudocount parameter of Logomaker's alignment_to_matrix() function, which is applied when the matrix is created? Please take a look and let me know if that's the type of correction you meant. See logomaker/logomaker/src/matrix.py, line 467 in 76aae02.
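To make the effect of a pseudocount concrete, here is a minimal pure-Python sketch of additive smoothing on a single alignment column. It is not Logomaker's code: the helper `column_probabilities` is hypothetical, and the assumption that the pseudocount is added to every character's count before normalizing is a common convention that should be checked against matrix.py.

```python
from collections import Counter


def column_probabilities(column, alphabet, pseudocount=1.0):
    """Smoothed probabilities for one alignment column (hypothetical helper).

    Characters outside `alphabet` (for example gaps) are ignored.
    Assumption: the pseudocount is added to each character's count
    before the counts are normalized.
    """
    counts = Counter(c for c in column if c in alphabet)
    total = sum(counts.values()) + pseudocount * len(alphabet)
    if total == 0:
        # Column with no countable characters and no pseudocount.
        return {c: 0.0 for c in alphabet}
    return {c: (counts[c] + pseudocount) / total for c in alphabet}


alphabet = "ACGT"
column = "AAAC"  # one column taken from four aligned sequences

smoothed = column_probabilities(column, alphabet, pseudocount=1.0)
raw = column_probabilities(column, alphabet, pseudocount=0.0)

print(smoothed)  # unobserved G and T receive non-zero probability
print(raw)       # unobserved G and T stay at zero
```

With only four sequences, a pseudocount of 1 doubles the effective sample size, which is why unobserved characters become visible in logos built from small alignments.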
Hi atareen, thanks for your reply. Setting pseudocount = 0 partly solves my problem. It forces the probability calculation to use only the characters present in each column, rather than including additional (possibly random?) characters. This works well when the columns have few gaps. However, if a column in the alignment has many gaps, the characters in that column get an extremely high probability. Setting pseudocount = 0.1 or higher reduces the probability, but it also brings in other characters, as the default does. Is there a way to calculate the probability of each character in a column by including the gaps, but without adding pseudocounts? I looked at the parameters of logomaker.alignment_to_matrix but could not find a solution. The WebLogo tool does not have this problem.
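One way to read this request is to count gaps in the denominator of each column, so that a character seen once among many gaps gets a low probability instead of 1.0. The sketch below does that in plain Python, independently of Logomaker. The helper `gap_aware_probabilities` and the toy alignment are hypothetical and only illustrate the idea.

```python
from collections import Counter


def gap_aware_probabilities(sequences, alphabet):
    """Per-column probabilities with gaps counted in the denominator.

    Hypothetical helper: every sequence contributes to the column total,
    whether it has a character or a gap at that position, and no
    pseudocount is added.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    rows = []
    for pos in range(length):
        column = [s[pos] for s in sequences]
        counts = Counter(c for c in column if c in alphabet)
        n_total = len(column)  # includes gaps and any other characters
        rows.append({c: counts[c] / n_total for c in alphabet})
    return rows


# Toy alignment: the second column has one C and three gaps.
sequences = ["A-", "A-", "AC", "G-"]
rows = gap_aware_probabilities(sequences, "ACGT")

print(rows[0])  # A and G share the column, no gaps
print(rows[1])  # C is scaled down by the three gaps
```

Rows for gapped columns sum to less than 1, so this is an occupancy-weighted frequency rather than a normalized probability. The rows could be loaded into a pandas DataFrame (positions as rows, characters as columns) and passed to logomaker.Logo, although whether Logomaker's own matrix transformations accept such rows has not been checked here.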
Hi, I think I'll need to see an example of what you're asking for, with some code and synthetic/artificial data, to be able to help. Can you provide an example or a notebook?
Hi atareen, please find the attached files. There are six files: python code.txt contains the code for generating a logo. aa_WebLogo.png was generated by WebLogo (https://weblogo.berkeley.edu/logo.cgi) using the same sequences as above. Thank you.
When handling a small number of sequences, Logomaker takes into account all characters from all columns, which generates less meaningful outputs.
Is there a way to add corrections for this?
For example, Weblogo (https://weblogo.berkeley.edu/logo.cgi) has an option for Small Sample Correction.
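For reference, the small-sample correction usually described in the sequence-logo literature subtracts an approximate term e(n) = (s - 1) / (2 ln 2 · n) from the information content of a column, where s is the alphabet size and n is the number of non-gap characters in the column. The sketch below applies that approximation in plain Python. It is not a Logomaker feature, the helper `corrected_information` is hypothetical, and whether WebLogo's option computes exactly this term is an assumption.

```python
import math
from collections import Counter


def corrected_information(column, alphabet):
    """Information content (bits) of one column with the e(n) correction.

    Hypothetical helper. Characters outside `alphabet` (for example gaps)
    are ignored. The result is clamped at zero, which is a choice made
    for this sketch rather than part of the formula.
    """
    symbols = [c for c in column if c in alphabet]
    n = len(symbols)
    if n == 0:
        return 0.0
    s = len(alphabet)
    counts = Counter(symbols)
    # Shannon entropy of the observed frequencies, in bits.
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    # Approximate small-sample correction term.
    e_n = (s - 1) / (2 * math.log(2) * n)
    return max(0.0, math.log2(s) - (entropy + e_n))


few = corrected_information("AAAA", "ACGT")      # 4 sequences
many = corrected_information("A" * 40, "ACGT")   # 40 sequences

print(few, many)  # the same conserved column scores lower with fewer sequences
```

The correction shrinks as n grows, so a fully conserved column approaches the uncorrected maximum of log2(s) bits only when the alignment is deep enough.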