Orleans client fails to load contract dll randomly #9214

Open
mastoj opened this issue Nov 2, 2024 · 11 comments
Labels
area-silos (category for all the silos related issues)

Comments

@mastoj

mastoj commented Nov 2, 2024

I have a small example/POC I'm working on where I want to demonstrate how Orleans can make life easier and also more robust.

If you want to go straight to the code it is here: https://github.com/mastoj/monostore/tree/random-error

The setup is that I have an API acting as an Orleans client, and two different workers acting as silos, one for cart and one for product. I also have an API project each for cart and product, which the main API project references, to keep the cart API definitions close to the rest of the cart implementation. The cart/product API projects then reference their own contract folder, which defines the contracts for the API and grains.

So basically I have the below for cart (same for product):

   API (Client) -> Cart API -> Contract
   Cart Worker (Silo) -> Contract

When I make a call to the API to create a cart (https://github.com/mastoj/monostore/blob/a9abe53968a5ee17acc929c86a36ea29e8fc7cfd/src/cart/requests/requests.http#L5), it fails randomly with this exception:

System.TypeLoadException: Unable to load MonoStore.Cart.Contracts.Grains.ICartGrain,MonoStore.Cart.Contracts from assembly MonoStore.Cart.Contracts
---> System.IO.FileNotFoundException: Could not load file or assembly 'MonoStore.Cart.Contracts, Culture=neutral, PublicKeyToken=null'. The system cannot find the file specified.

The exception is thrown in the API project, so the request never reaches the silo.

Everything is set up with Aspire, but I don't think that should impact how dlls are loaded.

@ReubenBond
Member

ReubenBond commented Nov 2, 2024

I can reproduce this. Thank you for putting it together. This is currently a limitation of heterogeneous clusters. The workaround is to add all contract assemblies to all silos (that is, all gateways, which is every silo in this case).

The limitation is at the RPC layer. I have a branch to fix this, but it's not in a mergeable state just yet.
After adding the Cart contract reference to the Product service, the request completes successfully:
[Image: screenshot of the successful request]

The reason that it works sometimes even without this is that the client might send the request to a compatible gateway. cc @benjaminpetit: we could change client routing to pick compatible gateways while we prepare the true fix.
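
As a rough illustration of the workaround, the Product silo's project file might gain a reference to the Cart contracts project along the lines below. This is a sketch, not the repository's actual layout: the relative paths and the `MonoStore.Product.Contracts` project name are assumptions made for illustration, and only the `MonoStore.Cart.Contracts` assembly name comes from the exception above. The workaround concerns contract assemblies, so the sketch references only the contracts project.

```xml
<!-- Product silo project file (hypothetical location and name) -->
<ItemGroup>
  <!-- Assumed existing reference to the silo's own contracts -->
  <ProjectReference Include="..\MonoStore.Product.Contracts\MonoStore.Product.Contracts.csproj" />
  <!-- Workaround: also reference the Cart contracts so this gateway can load
       MonoStore.Cart.Contracts.Grains.ICartGrain (path is an assumption) -->
  <ProjectReference Include="..\..\cart\MonoStore.Cart.Contracts\MonoStore.Cart.Contracts.csproj" />
</ItemGroup>
```

The same kind of reference would presumably be mirrored in the Cart silo for the Product contracts, so that every gateway can load every contract assembly.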

ReubenBond added the area-silos label Nov 2, 2024
ReubenBond added this to the .NET 10 Planning milestone Nov 2, 2024

@mastoj
Author

mastoj commented Nov 2, 2024

Thanks for the quick response.

To clarify, I only need to reference the contract, not the grain implementation?

@ReubenBond
Member

Yes, that's correct

@mastoj
Author

mastoj commented Nov 2, 2024

I had some issues with the OrleansDashboard; my guess is that they are related.

@mastoj
Author

mastoj commented Nov 2, 2024

I actually still have the issue from time to time, even after adding the references. I have added a reference to all contract projects in all my silos and the API. Sometimes it works, then all of a sudden it fails. The changes can be seen in this PR: https://github.com/mastoj/monostore/pull/1/files

@mastoj
Author

mastoj commented Nov 5, 2024

@ReubenBond, do you know why I still see it? I find it very confusing: after starting up the cluster it can fail for a couple of requests, but after changing the ID of the cart I try to create a couple of times, it starts working.

@ledjon-behluli
Contributor

@ReubenBond I just hit this too, and I have a homogeneous environment.

@mastoj
Author

mastoj commented Dec 19, 2024

My issue is not solved reliably. If I wait a little bit, it starts working most of the time.

@ledjon-behluli
Contributor

My bad, actually my issue relates to #8200

@mastoj
Author

mastoj commented Dec 31, 2024

@ReubenBond, is it expected that the workaround only works after a while? I still get the error, but after a couple of seconds the implementation is found.

@miguelhasse

@ReubenBond I have a working sample for placement filters where this scenario also happens in a homogeneous environment.
My test code is not in a repo, but I can provide it to you if you wish.
