Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-12270: Add new provenance event: UPLOAD #8094

Closed
wants to merge 5 commits into from

Conversation

Lehel44
Copy link
Contributor

@Lehel44 Lehel44 commented Nov 30, 2023

Summary

NIFI-12270
See discussion

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@mattyb149
Copy link
Contributor

For uploading local files isn't that what the SEND is for? Or is this to upload from an external location into an external system without ever storing locally? If that's the use case please update the Jira so it's clearer what the new event type is for, thanks!

@joewitt
Copy link
Contributor

joewitt commented Dec 21, 2023

Similar to Matt's comment why does a SEND event not cover this case sufficiently?

In this case it is that NiFi is effectively being used to take a given external reference and send that data to another system all without having stored locally (within nifi) the bytes. Never the less NiFi is the system that sends the data it seems and that is the purpose of the SEND event. A send without having had a CREATE/RECEIVE is fair game now given this paradigm... that is ok.

I think we should avoid adding an event type unless the need/case is really strong.

@pvillard31
Copy link
Contributor

This PR is following @markap14 's recommendation in this email thread:
https://lists.apache.org/thread/hwt87hz4679vbmtv9yg78dw38jrwc861

@joewitt
Copy link
Contributor

joewitt commented Dec 21, 2023

Ahh that discussion context is helpful - would make sense to have that already linked in either the JIRA itself or this PR. If it was there and I missed it sorry about that. Otherwise, we need to put links to such good/useful discussions on these to help others who might review later.

To the issue at hand, I do not share the view we should create a different event type here. The meaning of 'SEND' is that NiFi is responsible for having taken the bits of a thing and sent it to some external system. That is precisely what is happening here. Now there is value in knowing as well then that NiFi itself never actually stored the bits. This is known though already from the provenance trail of this object. Presumably it never had a CREATE/RECEIVE and the bytes will of course have never been populated other than perhaps a link/reference.

Given there was already a decent discussion and some apparent consensus and clearly this was a lot of work I am not saying I am a -1 on this but it does add more code/more maintenance/more semantics and I'm not clear what the value add will be in that sense. If the issue of size is being worked around here then ideally we'd have a different/simpler answer.

If we continue with this as-is the description of a SEND should ensure devs use that when sending content from within NiFi to an external system and an upload only when sending external content to an external system. Because as written in teh docs/guide right now UPLOAD looks like a subcase of SEND which it is not.

Copy link
Contributor

@markap14 markap14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, @Lehel44! Overall, I think it's reasonable to add the new UPLOAD event but see inline comments about implementation details.

@Lehel44
Copy link
Contributor Author

Lehel44 commented Dec 21, 2023

@joewitt I added the discussion link to the Jira, thanks for pointing out. The main issue here was that the SEND event used the size of the incoming FlowFile but instead of a copy of this FlowFile an external file was uploaded and sent to the external system, therefore its size should have been represented instead of the FlowFile's size. From the documentation: "SEND: Emits a Provenance Event of type that indicates that a copy of the given FlowFile was sent to an external destination.". The UPLOAD event would describe an external resource being uploaded to an external destination. While the SEND event automatically uses the size of the FlowFile argument when creating the event, the UPLOAD takes a FlowFile for necessary event attributes (flowfile UUID etc.), but also takes a FileResource object which represents the external resource and uses its attributes when constructing the event: for instance the size and the path are will be displayed in the provenance event.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @Lehel44. I noticed that FileResource was changed from an interface to a class, but this is a breaking change at the nifi-api level, so it should be reverted. I also recommend only adding a minimal number of new methods to ProvenanceReporter, instead of all possible versions.

@Lehel44
Copy link
Contributor Author

Lehel44 commented Jan 24, 2024

Thanks for the review @markap14 @exceptionfactory. Is this one good to go in?

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through the feedback and scoping down the number of new methods @Lehel44, the latest version looks good. +1 merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants