25 detect sensitive fields #134

codestronger · 2024-06-13T04:59:39Z

Updates for detecting sensitive fields: SuffolkLITLab/RateMyPDF#25

codestronger · 2024-06-13T05:38:44Z

Oh, didn't expect it would publish the package on a PR branch, especially w/ failing tests. I used 0.3.0a2 as the version, so doubt anyone will grab this. Not sure what the status of 0.3.0 is... Seems like we're in some sort of alpha still?

Code in docx_wrangling & pdf_wrangling is causing the mypy failures. Didn't make any changes to those files. Seems like the changes are from the pdf_context_extract merge... Any ideas if those are crucial and need fixing? I'm not familiar w/ mypy but I can dig into it if it's a priority.

BryceStevenWilley · 2024-06-13T12:19:50Z

That GH action is a bit misleading: nothing is published until you make a GH tag. The action still runs to make sure it can package correctly, but it doesn't publish the new version (you can see that here). You can check https://pypi.org/project/formfyxer/#history to see that there's not a new version there. IMO 0.3.0a2 is fine; it still needs more testing to make sure it can be used as a dependency without issue (i.e. the problems we had when redeploying RateMyPDF were all issues with FormFyxer).

It'd be nice if you could look into the mypy issues, but if you can't, I can fix them this weekend. Those issues pop up because our version of mypy increments, and it gets better at finding mismatching types that could cause issues.
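For anyone unfamiliar with mypy, here is a minimal sketch of the kind of type mismatch it reports. The function name `field_label` is hypothetical and not taken from docx_wrangling or pdf_wrangling; the actual failures in this PR may look different.

```python
from typing import Optional


def field_label(name: Optional[str]) -> str:
    """Return a cleaned-up label for a (hypothetical) form field name."""
    # Writing `return name.strip()` directly would make mypy complain,
    # because `name` may be None and None has no `.strip()` method.
    if name is None:
        return ""
    # After the None check, mypy narrows `name` to `str`, so this is accepted.
    return name.strip()
```

Narrowing the `Optional` with an explicit `None` check is usually the least invasive fix, since it changes no behavior for valid inputs.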

BryceStevenWilley · 2024-06-13T12:41:03Z

Realizing the mypy issues might be because it looks like you merged https://github.com/SuffolkLITLab/FormFyxer/tree/pdf_context_extract into your branch, which wasn't in main before. Idk how well reviewed that code is (even though I wrote it, lol). Might be worth getting it, but I'd want to make a separate PR and review for that, which should also fix up some of the mypy issues. I can also do that this weekend.

BryceStevenWilley

A few things I think you should change, unless @nonprofittechy has a better vision for the feature that I don't know about yet.

CHANGELOG.md

formfyxer/lit_explorer.py

nonprofittechy

LGTM, but I agree with Bryce's feedback within the limits of what we have time to do. Also, we should take him up on the offer to finish getting tests passing this weekend. You can ping me again if useful. @BryceStevenWilley, you should also be able to add the approving review when you've finished the tests.

BryceStevenWilley · 2024-06-17T02:36:19Z

So mypy on main and #137 is fixed. Looking a bit closer, I don't think this PR needs #137 though? @codestronger, can you confirm that and if so, drop or revert 2c01ae0 and 9bfea33 from this PR, i.e. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History?

codestronger · 2024-06-17T04:47:43Z

> So mypy on main and #137 is fixed. Looking a bit closer, I don't think this PR needs #137 though? @codestronger, can you confirm that and if so, drop or revert 2c01ae0 and 9bfea33 from this PR, i.e. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History?

Sounds good. I can rebase this PR on the current main. @nonprofittechy merged the pdf_context_extract branch back in Feb and it's been live as far as I'm aware. Others might have pulled it down too. Was it safe to rewrite it out of the history? At minimum, we should merge the new version of the pdf_context_extract before deploying my changes, so that we don't lose any functionality from a deploy.

nonprofittechy · 2024-06-17T15:04:54Z

I don't remember why I merged this branch in February, unfortunately.

codestronger · 2024-06-17T21:20:01Z

> I don't remember why I merged this branch in February, unfortunately.

@BryceStevenWilley I'll proceed with the plan that assumes pdf_context_extract was an accidental merge. We discussed the various downstream users of FormFyxer and as far as we can tell, no one is depending on the features in that branch.

I also plan to officially take the 0.3.0 version number for this branch. Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed. Now that we've separated out pdf_context_extract, maybe we don't need all of those yet.

BryceStevenWilley

Diff looks much better to me! +1 for changing to 0.3.0¹, and +1 for merging.

> Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed

I think those are still necessary, those dependencies started being used in an earlier commit I think.

¹ We were using alphas because this package has been notoriously difficult to use as a dependency, and without proper testing it can cause issues. I think these changes will be tested enough at this point.

codestronger · 2024-06-18T05:20:33Z

> Diff looks much better to me! +1 for changing to 0.3.0¹, and +1 for merging.
>
> > Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed
>
> I think those are still necessary, those dependencies started being used in an earlier commit I think.
>
> ¹ We were using alphas because this package has been notoriously difficult to use as a dependency, and without proper testing it can cause issues. I think these changes will be tested enough at this point.

Ah, okay! I'll leave the dependencies alone then. In any case, we'll need them in the future.

codestronger requested review from colarusso, BryceStevenWilley and nonprofittechy June 13, 2024 04:59

BryceStevenWilley reviewed Jun 13, 2024

nonprofittechy reviewed Jun 14, 2024

BryceStevenWilley mentioned this pull request Jun 17, 2024

Fix mypy typing issues #136

Merged

codestronger added 2 commits June 16, 2024 22:00

Detect sensitive fields.

8a9f073

Rename to "Driver's License Number" for consistency w/ the original ticket.

7ec8290

codestronger force-pushed the 25-detect-sensitive-fields branch from b5eb3b3 to 7ec8290 Compare June 17, 2024 05:01

BryceStevenWilley approved these changes Jun 18, 2024

Change sensitive fields to sensitive data types. We will now return a dictionary of sensitive data types, with a list of the matching field names.

9205985
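The return shape described in this commit message (sensitive data type mapped to a list of matching field names) might look roughly like the sketch below. Everything here is an assumption for illustration: the function name `group_sensitive_fields`, the `SENSITIVE_PATTERNS` table, the regexes, and the "Social Security Number" label are hypothetical and are not the code merged into formfyxer/lit_explorer.py. Only the "Driver's License Number" label comes from this PR.

```python
import re
from typing import Dict, List

# Hypothetical patterns for illustration only; not the ones FormFyxer ships.
SENSITIVE_PATTERNS: Dict[str, re.Pattern] = {
    "Social Security Number": re.compile(
        r"\bssn\b|social[\s_]*security", re.IGNORECASE
    ),
    "Driver's License Number": re.compile(
        r"driver'?s?[\s_]*licen[sc]e", re.IGNORECASE
    ),
}


def group_sensitive_fields(field_names: List[str]) -> Dict[str, List[str]]:
    """Map each sensitive data type to the field names that match it."""
    matches: Dict[str, List[str]] = {}
    for name in field_names:
        for data_type, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(name):
                # Only data types with at least one match appear as keys.
                matches.setdefault(data_type, []).append(name)
    return matches
```

With this shape, a caller can check whether a form contains a given data type by testing for the key, and can still list the individual field names that triggered it.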

codestronger merged commit c50a8a0 into main Jun 19, 2024
3 checks passed

codestronger deleted the 25-detect-sensitive-fields branch June 19, 2024 18:48
