-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
25 detect sensitive fields #134
Conversation
Oh, didn't expect it would publish the package on a PR branch, especially w/ failing tests. I used 0.3.0a2 as the version, so doubt anyone will grab this. Not sure what the status of 0.3.0 is... Seems like we're in some sort of alpha still? Code in docx_wrangling & pdf_wrangling is causing the mypy failures. Didn't make any changes to those files. Seems like the changes are from the pdf_context_extract merge... Any ideas if those are crucial and need fixing? I'm not familiar w/ mypy but I can dig into it if it's a priority. |
That GH action is a bit misleading, nothing is published until you make a GH tag. The action still runs to make sure it can package correctly, but it doesn't publish the new version (you can see that here). You can check https://pypi.org/project/formfyxer/#history to see that there's not a new version there. IMO 0.3.0a2 is fine, it still need more testing to make sure it can be used as a dependency without issue (i.e. the problems we had when redeploying RateMyPDF were all issues with FormFyxer). It'd be nice if you could look into the mypy issues, but if you can't, I can fix them this weekend. Those issues pop up because our version of mypy increments, and it gets better at finding mismatching types that could cause issues. |
Realizing the mypy issues might be because it looks like you merged https://github.com/SuffolkLITLab/FormFyxer/tree/pdf_context_extract into your branch, which wasn't in main before. Idk how well reviewed that code is (even though I wrote it, lol). Might be worth getting it, but I'd want to make a separate PR and review for that, which should also fix up some of the mypy issues. I can also do that this weekend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things I think you should change, unless @nonprofittechy has a better vision for the feature that I don't know about yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I agree with Bryce's feedback within the limits of what we have time to do. Also we should take him up on the offer to finish getting tests passing this weekend. You can ping me again if useful or @BryceStevenWilley, you should also be able to add the approving review when you've finished the tests.
So mypy on main and #137 is fixed. Looking a bit closer, I don't think this PR needs #137 though? @codestronger, can you confirm that and if so, drop or revert 2c01ae0 and 9bfea33 from this PR, i.e. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History? |
Sounds good. I can rebase this PR on the current main. @nonprofittechy merged the pdf_context_extract branch back in Feb and it's been live as far as I'm aware. Others might have pulled it down too. Was it safe to rewrite it out of the history? At minimum, we should merge the new version of the pdf_context_extract before deploying my changes, so that we don't lose any functionality from a deploy. |
b5eb3b3
to
7ec8290
Compare
I don't remember why I merged this branch in February, unfortunately. |
@BryceStevenWilley I'll proceed with the plan that assumes pdf_context_extract was an accidental merge. We discussed the various downstream users of FormFyxer and as far as we can tell, no one is depending on the features in that branch. I also plan to officially take the 0.3.0 version number for this branch. Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed. Now that we've separated out pdf_context_extract, maybe we don't need all of those yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diff looks much better to me! +1 for changing to 0.3.01, and +1 for merging.
Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed
I think those are still necessary, those dependencies started being used in an earlier commit I think.
Footnotes
-
We were using alphas because this package has been notoriously difficult to use as a dependency, and without proper testing it can cause issues. I think these changes will be tested enough at this point. ↩
… dictionary of sensitive data types, with a list of the matching field names.
Ah, okay! I'll leave the dependencies alone then. In any case, we'll need them in the future. |
Updates for detecting sensitive fields: SuffolkLITLab/RateMyPDF#25