Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FieldMask utilities to Message types #1505

Merged
merged 33 commits into from
Aug 16, 2024

Conversation

pouyayarandi
Copy link
Contributor

@pouyayarandi pouyayarandi commented Nov 18, 2023

Summary

This PR adds a bunch of utilities that enable message types to modify some of their fields (by Google well-known type, FieldMask). Here are added APIs:

extension Message {

  /// Checks whether the given path is valid for Message type.
  public static func isPathValid(_ path: String) -> Bool

  /// Merges fields specified in a FieldMask into another message.
  public mutating func merge(with source: Self, fieldMask: Google_Protobuf_FieldMask) throws

  /// Removes from 'message' any field that is not represented in the given FieldMask.
  public mutating func trim(fieldMask: Google_Protobuf_FieldMask) -> Bool
}

extension Google_Protobuf_FieldMask {

  /// Initiates a field mask with all fields of the message type.
  public init<M>(allFieldsOf messageType: M.Type)

  /// Initiates a field mask from some particular field numbers of a message.
  public init<M>(fieldNumbers: [Int], of messageType: M.Type) throws

  /// Adds a path to FieldMask after checking whether the given path is valid.
  public mutating func addPath<M>(_ path: String, of messageType: M.Type) throws

  /// Converts a FieldMask to the canonical form.
  public var canonical: Google_Protobuf_FieldMask { get }

  /// Creates a union of two FieldMasks.
  public func union(_ mask: Google_Protobuf_FieldMask) -> Google_Protobuf_FieldMask

  /// Creates an intersection of two FieldMasks.
  public func intersect(_ mask: Google_Protobuf_FieldMask) -> Google_Protobuf_FieldMask

  /// Creates a FieldMasks with paths of the original FieldMask that does not included in mask.
  public func subtract(_ mask: Google_Protobuf_FieldMask) -> Google_Protobuf_FieldMask

  /// Returns true if the given path is covered by the FieldMask.
  public func contains(_ path: String) -> Bool

  /// Checks whether the given FieldMask is valid for type M.
  public func isValid<M>(for messageType: M.Type) -> Bool
}

Implementation

This feature is empowered by _ProtoNameProviding to map proto field numbers to names. This enables us to have the aforementioned inits of FieldMask. Also, for masking a message two Decoders have been implemented: GetPathDecoder and SetPathDecoder. The first one gets the value of a field by its path, the latter sets a value to a field of the message by the path. In override function, value will be captured by GetPathDecoder and then, it will be set by the SetPathDecoder.

Use case

Assuming we have a caching system which prevent some fields of a response from recalculation. Backend gives us a proto message and tells that some of its fields should be cached (i.e. field number 1). Next time, when we are going to make a request, we evaluate our client-side's cache and send the cached field numbers in the header of the request (i.e. we send back 1 as a field number that its value is already cached). As a result, backend response with leaving those fields empty (or filled with default values). And in client-side, we need to modify those field numbers with cached values.

For this purpose, first of all, we can store the original backend response somewhere, and in the rest of requests we can override the responses with the cached proto message:

// Defines a mask field which includes particular paths based on field numbers
let mask = Google_Protobuf_FieldMask(fieldNumbers: cachedFieldNumbers, of: MyMessage.self)

// Get response by telling backend not to recalculate those fields and backend leave those fields unset
var response: MyMessage = await network.getResponse(exclude: cachedFieldNumbers)

// Get cached response from storage
let cachedResponse: MyMessage = cache.getCache()

// Merge those fields of response with the cached one.
try response.merge(to: cachedResponse, fieldMask: mask)

Testing

Unit tests have been added for the new feature, and all tests have passed.

Considerations

  • Merge options and trim options have not been implemented as they require some generated code.
  • Dot pattern is supported in paths. Path a.b refers to field b of the field a in the message.
  • Merges will occur in order of FieldMask's paths.

@thomasvl
Copy link
Collaborator

At quick glance you seem to have two distinct proposals in here (they don't seem connected in implementation), if that's the case, I'd suggest splitting this into two PRs so there can be easier discussion on them independently. It likely would help to also provide some examples of what usecases you are trying to enable.

@pouyayarandi pouyayarandi changed the title Add proto reflection Add applying methods generator Nov 20, 2023
@pouyayarandi
Copy link
Contributor Author

Hi @thomasvl, the protobuf mirroring feature has been deleted from this PR, so the current one is a proposal just for applying. Also the use case section is added to the description.

@thomasvl
Copy link
Collaborator

Handing off to Apple folks.

Idle thought - Since Decoder is public, could this entirely implemented via a custom decoder that feeds field numbers and then the values via the decoding apis? Then there wouldn't have to be any more codegen to support it? The library code itself would likely be bigger. But if that is possible, this could be entirely done as a layer on top of SwiftProtobuf and not actually need library additions.

Copy link
Member

@FranzBusch FranzBusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far we have not exposed any of the actual underlying proto definition in our generated code such as proto message name, field numbers or similar AFAIK. This would be diverting from this. Doesn't mean that I am against that but @thomasvl and @tbkka you did most of the work on SwiftProtobuf and are probably better at judging this.

I am also wondering if this is something that other languages do provide.

@pouyayarandi pouyayarandi changed the title Add applying methods generator Add applying methods Nov 27, 2023
@tbkka
Copy link
Collaborator

tbkka commented Nov 27, 2023

I just saw the new implementation as a standalone decoder. That has the big advantage of not requiring any additional code generation, so it won't increase generated code size and can therefore be available by default everywhere. This is the right way to implement this functionality.

The real question, then, is whether this new functionality is right for this particular library. As a rule, we do try to stick closely to the protobuf specification and precedents set by other implementations. So I have a couple of questions to help better understand how this proposal relates to other prior art:

  • How would you do this in the C++ protobuf implementation? Java? Python? Go? C#?

  • Can the FieldMask concept that's in the protobuf specification be used for this kind of application? (We have never fully implemented FieldMask in SwiftProtobuf; if that could provide this support, then it might be worth implementing that instead.)

Related: Your ProtobufMirror idea should be implementable as a new encoder, again without requiring any new code generation.

@pouyayarandi
Copy link
Contributor Author

@tbkka Thanks for your response.

  • How would you do this in the C++ protobuf implementation? Java? Python? Go? C#?

The concept is the same in all the languages you've mentioned but when it comes to implementation, we would see different APIs based on the nature of each language. As I found in most of them, they use Descriptor and Reflection. Here are some examples:

  • Can the FieldMask concept that's in the protobuf specification be used for this kind of application? (We have never fully implemented FieldMask in SwiftProtobuf; if that could provide this support, then it might be worth implementing that instead.)

Maybe, but AFAIK, it just helps us to tell the backend which fields should be excluded, when the client receives the response we have to replace somehow a bunch of values with those ones we've cached before. Here we need an API to set values by field numbers.

Related: Your ProtobufMirror idea should be implementable as a new encoder, again without requiring any new code generation.

The ProtobufMirror feature had been implemented by visitors and it had no generated code.

@thomasvl
Copy link
Collaborator

I believe FieldMask usually also has helpers that work with copying things from one message to another message - "Copy just these fields". And by having it within the library it would have access to the field names data that currently isn't exposed.

To really make these Apply methods useful would likely be figuring out how to include the field numbers in the generated files for folks. My initial thought would be an enum doesn't really help because you want Int values and you don't want to keep having to do rawValue. So likely just Int constants scoped in some way. But unlike a C based languages where an enum of just using #define is zero runtime cost, I'm not sure what would work for Swift to make the field number constant not add to binary size since would be public and thus if things get linked into a framework they would all become exported symbols.

@tbkka
Copy link
Collaborator

tbkka commented Nov 28, 2023

Here we need an API to set values by field numbers.

There are a number of inter-related features here that I have in mind. Since many of these can be implemented in terms of the others, it's worth thinking about them together to figure out how we want to provide them:

  • Descriptors. SwiftProtobuf does not currently embed descriptors into generated messages, mostly due to concerns about binary size.
  • Setting values by field number and/or name. You've shown how we could implement each of these directly with the current decoder APIs.
  • FieldMask utilities. We currently only have support for constructing FieldMask messages. We also want someday to have utilities that use a FieldMask to: remove certain fields from a message; keep only certain fields; copy certain fields from one message to another. (This would be another approach for implementing some of the field caching you describe.). We'd also like utilities that can construct and/or modify FieldMasks: build a FieldMask with every field in a particular message, build a FieldMask from a list of field numbers, intersect or union two FieldMasks.

If we had APIs to query, set, and clear fields by name and by number, those could be used to implement the FieldMask utilities. They could also be used as building blocks for Descriptor-based utilities. I think those APIs could all be built using the techniques you've been exploring.

(And coincidentally, in #1504, Max is proposing a different way to organize the traversal logic that might make the above more efficient.)

@pouyayarandi
Copy link
Contributor Author

pouyayarandi commented Nov 28, 2023

To sum up, we'd better implement some utilities as extensions on the FieldMask and Messages. Here are some of those utils I found helpful:

extension Message {

  /// Returns a field mask with all fields of the message type.
  static var fieldMask: Google_Protobuf_FieldMask

  /// Builds a field mask from some particular field numebers of a message
  /// - Parameter fieldNumbers: Field numbers of paths
  /// - Returns: Field mask that include paths of corresponding field numbers
  static func fieldMask(fieldNumbers: [Int]) -> Google_Protobuf_FieldMask

  /// Builds a field mask by excluding some paths
  /// - Parameter paths: Paths to be excluded
  /// - Returns: Field mask that does not include the paths
  static func fieldMask(excludedPaths paths: [String]) -> Google_Protobuf_FieldMask

  /// Builds a field mask by reversing another mask
  /// - Parameter mask: Mask to be reversed
  /// - Returns: Field mask that does not include the `mask` paths
  static func fieldMask(reverse mask: Google_Protobuf_FieldMask) -> Google_Protobuf_FieldMask

  /// Clears masked fields and keep the other fields unchanged
  /// - Parameter mask: Field mask which determines what fields shoud be cleared
  mutating func mask(by mask: Google_Protobuf_FieldMask)

  /// Clears all fields of a message except particular fields refered in mask
  /// - Parameter mask: Fields should be ignored in clearance
  mutating func mask(except mask: Google_Protobuf_FieldMask)

  /// Overrides value of masked fields in original message with the input message
  /// - Parameters:
  ///   - message: Message which overrides some fields of the original message
  ///   - mask: Field mask which determines what fields shoud be overriden
  mutating func override(with message: Self, by mask: Google_Protobuf_FieldMask)
 
  func masked(by mask: Google_Protobuf_FieldMask) -> Self
  func masked(except mask: Google_Protobuf_FieldMask) -> Self
  func overriden(with message: Self, by mask: Google_Protobuf_FieldMask) -> Self
}

extension Google_Protobuf_FieldMask {
  func union(_ mask: Google_Protobuf_FieldMask) -> Self
  func intersection(_ mask: Google_Protobuf_FieldMask) -> Self
}

If you agree with these APIs, I could implement it. Please let me know if you have any other idea. @tbkka @thomasvl

@thomasvl
Copy link
Collaborator

I haven't had a chance to really think about these or compare to other apis; but one general question that does come to mind:

Would the static functions make more sense to be in an extension on Google_Protobuf_FieldMask and have them take a Message.Type (instead of being on Message)?

I'm not sure which makes for a more swifty api, but maybe something to think about. They may actually make more sense to developers as convenience initializers?

@tbkka
Copy link
Collaborator

tbkka commented Nov 29, 2023

Can these all be implemented with the information we generate today? It seems like the name map might have everything you need for this? Or does this require something like #1504 first, in order to expose the field information in a more easily-digestible form?

@pouyayarandi
Copy link
Contributor Author

pouyayarandi commented Nov 29, 2023

Can these all be implemented with the information we generate today?

Yes it could be implemented already by _ProtoNameProviding. But obviously something like #1504 makes it much easier

@pouyayarandi
Copy link
Contributor Author

@tbkka @thomasvl New implementation is added for some field mask utils. Note that some other utils like (intersecting, union, etc.) would better to get implemented separately.

@pouyayarandi pouyayarandi changed the title Add applying methods Add FieldMask utilities Dec 3, 2023
Copy link
Collaborator

@thomasvl thomasvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read back through everything to try and catch back up on the code.

Sources/SwiftProtobuf/Message+FieldMask.swift Show resolved Hide resolved
Sources/SwiftProtobuf/Message+FieldMask.swift Outdated Show resolved Hide resolved
Sources/SwiftProtobuf/NameMap.swift Show resolved Hide resolved
Sources/SwiftProtobuf/PathDecoder.swift Outdated Show resolved Hide resolved
Sources/SwiftProtobuf/PathDecoder.swift Outdated Show resolved Hide resolved
Sources/SwiftProtobuf/PathVisitor.swift Outdated Show resolved Hide resolved
@pouyayarandi
Copy link
Contributor Author

Comments are fixed

Copy link
Collaborator

@thomasvl thomasvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, kicked off the CI on the current state of things.

@tbkka can you take a look again to provide final decision since it is a new addition to the library.

@thomasvl
Copy link
Collaborator

@pouyayarandi looks like atleast one unittest test is failing.

@pouyayarandi
Copy link
Contributor Author

I changed my code a little bit, but I couldn't reproduce the failing test on my local machine and all tests passed. So it may fail again. Can you please re-run pipeline? @thomasvl

@pouyayarandi
Copy link
Contributor Author

pouyayarandi commented May 16, 2024

It failed again 😢
Do you have any idea why it's happening? I run make test on my machine and everything seems okay. In addition, in some checks the failing test passes
Is there any way to run the exact check on my machine to see the error?

@thomasvl
Copy link
Collaborator

It failed again 😢 Do you have any idea why it's happening? I run make test on my machine and everything seems okay. In addition, in some checks the failing test passes Is there any way to run the exact check on my machine to see the error?

So it's only failing on linux?

You could try using Docker to use do a local linux build (can't say I have any experience with that).

The failure seems to be around group merging? You could try making a temporary test case out of it to try and zero in on things. i.e. - make a test that just merges two messages with the fields in question, but doesn't use the FieldMask support, does that work as intended? Ensure the calculated mask has what you expect, etc.

You can also tweak .github/workflows/build.yml to have it run on your branch name (and disable all but one of the checks needed), and enable CI on your own repo, then, then you can simply upload and let github test for you. If you do this, maybe clone this branch into another one while you test things out, so each upload doesn't also update this PR, and go back to this branch when you've been able to figure it out.

@pouyayarandi
Copy link
Contributor Author

Bug fixed @thomasvl

Problem was about singularGroup field of the message. Since this field name is .unique which means its proto name and json name differs, it caused the bug. I made some changes which just allow using proto names for merging purposes. Please re-run the tests they must fully passed now same as my forked repo.

@thomasvl
Copy link
Collaborator

Bug fixed @thomasvl

Problem was about singularGroup field of the message. Since this field name is .unique which means its proto name and json name differs, it caused the bug. I made some changes which just allow using proto names for merging purposes. Please re-run the tests they must fully passed now same as my forked repo.

Make sure you are synced up to HEAD on main, there as a change around groups as part of the Protobuf Editions work a few weeks ago, so more forms are accepted in TextFormat. So that could impact things.

@pouyayarandi
Copy link
Contributor Author

Make sure you are synced up to HEAD on main

I fixed the bug on another branch which I rebased with synced fork of your main and then, cherry-picked the fix commit here. So I guess it should be synced with the latest changes

@pouyayarandi
Copy link
Contributor Author

Hi @thomasvl, just for follow-up; do you have any updates?

@thomasvl
Copy link
Collaborator

Hi @thomasvl, just for follow-up; do you have any updates?

I think we're still waiting on the Apple folks (@tbkka most likely) to look since this is a new addition to the library.

Copy link
Collaborator

@tbkka tbkka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! The code is well-organized and easy to read. There are a few places where performance could be improved, but we can iterate on that later if that proves to be a problem for people.

I have just a few small suggestions:

  • Inline a couple of trivial helpers
  • A couple of small naming adjustments to clarify the public API.

It would also be good to update Documentation/API.md with information about this, but I think that should just be a separate PR.

Sources/SwiftProtobuf/Message+FieldMask.swift Outdated Show resolved Hide resolved
Sources/SwiftProtobuf/Message+FieldMask.swift Outdated Show resolved Hide resolved
Sources/SwiftProtobuf/Message+FieldMask.swift Outdated Show resolved Hide resolved
@pouyayarandi
Copy link
Contributor Author

Thank you @tbkka, all comments were fixed

@tbkka tbkka merged commit 3a4c97d into apple:main Aug 16, 2024
10 checks passed
@tbkka
Copy link
Collaborator

tbkka commented Aug 16, 2024

Merged! I'm looking forward to the next PR that updates Documentation/API.md with information about how people can use this. ;-)

@thomasvl
Copy link
Collaborator

@pouyayarandi thanks again for all the work here!

@thomasvl
Copy link
Collaborator

thomasvl commented Aug 20, 2024

@pouyayarandi & @tbkka – Not sure why I didn't think of it before, but this morning I got wondering if with the new info, should any of the old initializers get revisited? Specifically –

  1. Should the proto name based ones get deprecated to provide failable ones that do validation?
  2. The JSON one has a call out around validation, can that now be done?

@tbkka
Copy link
Collaborator

tbkka commented Aug 20, 2024

There are two kinds of validation we might do and a couple of different places to do it:

  • We could validate that the paths are syntactically well-formed. That is, reject paths = ["😀"]
  • We could validate that the paths are valid for a particular message type

We can do any of the above validation in a few different places:

  • We could assert in initializers
  • We could have validating initializers that throw when they detect a problem
  • We could validate when the FieldMask is serialized or deserialized
  • We could validate when the FieldMask is used to mask a particular message

I think that asserting/validating initializers are probably the least important here, though they might simplify some use cases. It's more important to validate when (de)serializing or masking.

@thomasvl
Copy link
Collaborator

I think that asserting/validating initializers are probably the least important here, though they might simplify some use cases. It's more important to validate when (de)serializing or masking.

I was about to say I don't think we can do this one, but it just dawned on me, I guess the initializers can't do it either. The problem is we don't know the Message the proto names/JSON names are supposed to match, we'd need all new initializers that also take a Message.Type, the current free string forms can't really do anything I think sort of the paths = ["😀"] type issues. 😦

@tbkka
Copy link
Collaborator

tbkka commented Aug 20, 2024

People need to be able to build FieldMask protos without validating against a message type. (Because there are always going to be cases where the message type that's going to be masked isn't actually on the local system.)

We must validate against a message type when we use the FieldMask against an actual message. I think in most cases, that's actually the right place to do that validation.

@thomasvl
Copy link
Collaborator

Yea so I guess this work doesn't change anything. But what we could do is validated all the strings to ensure they are valid field names according to the proto language spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants