Replies: 18 comments
-
Originally, yes: I thought we could replace the keys with ULIDs, but that was incorrect. However, now that I understand it better, I still think the keying system can be made more efficient through content hashing. Is my explanation of the asset cache incorrect or incomplete?
-
(Sorry, I removed my comment because I thought this was an old issue :) @VinhTruongAmbient's answer was to that comment.)
-
How would the hashing trait work for `f32`? Additionally, how would we guarantee sufficient entropy for the hashed `u128`? Since it is not purely random, there is a high chance that collisions occur.
-
@VinhTruongAmbient Hm, so honestly this just seems "different" rather than strictly "better" to me. But the bigger problem is that I don't see what real-world problem we're addressing here. You mentioned performance (CPU and memory), but I don't see any data to back that up. "Optimizing performance" without measuring is not something we should indulge in. There are a million different ways to do things, so I'm hesitant to change things unless it's very clear that the change actually makes something better.
-
We might need to create our own hash trait to cover types like `f32`. As for entropy, 128 bits is a lot, so collisions should be exceedingly rare, but if we wanted to be really paranoid, 256 bits would definitely be enough. At my previous company, we had databases with billions of entries keyed by SHA-256.
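For context on the `f32` question: `f32` does not implement `std::hash::Hash` (it only implements `PartialEq`, since NaN != NaN), which is why a project-specific trait would be needed. A minimal sketch, assuming a hypothetical `KeyHash` trait and a hypothetical `ScaledTexture` key, hashes the float's IEEE-754 bit pattern via `f32::to_bits`. It uses std's 64-bit `DefaultHasher` only to stay short; the proposal itself talks about a `u128`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical project-specific hashing trait. Unlike `std::hash::Hash`,
/// we are free to implement it for `f32`.
pub trait KeyHash {
    fn key_hash<H: Hasher>(&self, state: &mut H);
}

impl KeyHash for f32 {
    fn key_hash<H: Hasher>(&self, state: &mut H) {
        // -0.0 == 0.0, so normalize to +0.0 to keep "equal implies same hash".
        let v = if *self == 0.0 { 0.0f32 } else { *self };
        // Hash the raw bit pattern. NaN keys remain awkward: NaN != NaN,
        // and NaNs with different payloads produce different bits.
        state.write_u32(v.to_bits());
    }
}

impl KeyHash for str {
    fn key_hash<H: Hasher>(&self, state: &mut H) {
        state.write(self.as_bytes());
        // 0xff never occurs in UTF-8, so it separates adjacent strings.
        state.write_u8(0xff);
    }
}

/// Hypothetical key type that contains a float.
pub struct ScaledTexture {
    pub url: String,
    pub scale: f32,
}

impl KeyHash for ScaledTexture {
    fn key_hash<H: Hasher>(&self, state: &mut H) {
        self.url.as_str().key_hash(state);
        self.scale.key_hash(state);
    }
}

/// Hashes any `KeyHash` value with std's SipHash-based `DefaultHasher`.
pub fn hash_key<K: KeyHash + ?Sized>(key: &K) -> u64 {
    let mut h = DefaultHasher::new();
    key.key_hash(&mut h);
    h.finish()
}
```

Two `ScaledTexture` values with the same URL and scale hash identically, and `0.0` and `-0.0` do too; NaN is the case such a trait would still need a policy for.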
-
We're also doing things like loading a model, which points to a material, which points to a texture, all of which are loaded through the asset cache right now and de-duplicated (you can load the same model, material, or texture from multiple places in code and it still resolves to the same load). I don't think content addressing could solve that at all (it would need a separate system, which would bring us back to square one).
-
I do think this improves how the developer interacts with the asset cache, for example not needing to go through … As for performance, I think we have different views on what it means to "optimize performance". I wouldn't count this as actual optimization; rather, it's about not doing work that gives us no benefit, unless I'm missing something. As for providing data, I can set up a benchmark that really stresses the asset cache, but this feels like a pointless exercise, because the costs of string formatting and allocation are real, given what we know about how the processor and memory work. I grant that the title's "more efficient" perhaps focuses too much on performance, but the points in the longer post still hold.
-
I think this scenario should behave the same, right? Because the …
-
My question was more about the entropy of the hash functions, since hash functions (especially if we write our own) tend to sacrifice some entropy for speed. Reusing the existing hasher would be very problematic, since its hashing algorithm, its stability, and its trade-off between quality and speed are all subject to change. Using random `u128` values is fine, but not string hashing or similar.
-
Ah, I wasn't suggesting we write our own hash function, just the trait, which would use some other hash implementation to do the hashing. There should be plenty of options out there we can experiment with. SHA-256 definitely works, since that's what I have used before for gigantic datasets, but its performance is not great compared to non-cryptographic hashes.
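To make "a trait that uses some other hash implementation" concrete, here is a minimal sketch of a 128-bit `std::hash::Hasher`. FNV-1a is used purely as a stand-in because it fits in a few lines; it is non-cryptographic, not collision-resistant against crafted input, and nothing in this thread settles on it. `Fnv1a128` and `hash128` are hypothetical names.

```rust
use std::hash::{Hash, Hasher};

// Standard FNV-1a parameters for a 128-bit state.
const FNV128_OFFSET_BASIS: u128 = 0x6c62272e07bb014262b821756295c58d;
const FNV128_PRIME: u128 = 0x0000000001000000000000000000013b;

/// FNV-1a with a 128-bit state (stand-in for whichever hash is chosen).
pub struct Fnv1a128 {
    state: u128,
}

impl Fnv1a128 {
    pub fn new() -> Self {
        Self { state: FNV128_OFFSET_BASIS }
    }

    /// Full 128-bit result; `Hasher::finish` can only return a `u64`.
    pub fn finish128(&self) -> u128 {
        self.state
    }
}

impl Hasher for Fnv1a128 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // FNV-1a: xor the byte in first, then multiply by the prime.
            self.state ^= b as u128;
            self.state = self.state.wrapping_mul(FNV128_PRIME);
        }
    }

    fn finish(&self) -> u64 {
        // Truncated; callers wanting 128 bits use `finish128`.
        self.state as u64
    }
}

/// Feeds any `Hash` value through the 128-bit hasher.
pub fn hash128<T: Hash + ?Sized>(value: &T) -> u128 {
    let mut h = Fnv1a128::new();
    value.hash(&mut h);
    h.finish128()
}
```

Pinning the algorithm ourselves addresses the concern that std's default hasher may change, but std documents that the bytes `Hash` impls feed to a hasher are not portable across platforms (endianness, `usize` width) or guaranteed stable across compiler versions. So a sketch like this only suits in-memory keys, not persisted ones.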
-
Should we perhaps convert this to a discussion?
-
Ah, OK, I see; I misunderstood the proposal then. So basically this is just about moving from `Debug` to hashing for asset keys, and caching the result of that hashing?
-
Yeah, that's right. Sorry for the confusion; I've updated my original post to clarify.
-
Hm, but if that's the case, isn't this just about performance optimization then? The dev UX right now is just that you need to implement `Debug`; with the proposal, you'd need to implement `Hash` (and probably `PartialEq`?), so it doesn't seem that different?
-
Yeah, I didn't intend this to be a hugely different API that forces significant rewrites. I think the change can be implemented pretty easily, because it's essentially the same idea as Debug-formatting your struct and using the formatted `String` as the key. I do think the fact that the proposed `u128` hash type is … Also, I proposed that you should be able to keep this `u128` hash key when you insert the asset for the first time. Any subsequent accesses to the asset cache can then reuse the same key and skip the hashing routine. Yes, it improves performance, but why do the same work again when you don't need to? 🙂
-
Alright, so in that case there are two things to consider here.

First: I'm not 100% sure, but I think you might get collisions if you just use the hash. I suspect you would still need to be able to fall back to a full `PartialEq` comparison, but then you'd have to carry the key around with you (so you wouldn't get `Copy` anyway).

Second: on the performance part, I want to make sure we don't get into the habit of assuming things when it comes to performance. Calling this a "More efficient AssetCache keying system", or saying the current approach is "Clearly inefficient", is premature; we really don't know that at this point. In my mind, this suggestion is a hypothesis, an idea that might improve performance, and we should talk about it in those terms, i.e. "Here's something I think we should test to see if we can improve performance". Until we've measured, I don't think we should ever say that something is actually better.
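The collision point can be illustrated with a small sketch. `std::collections::HashMap` already stores full keys and compares them with `Eq` when hashes collide; the hypothetical `CheckedCache` below makes that explicit with a precomputed hash, to show that reusing the hash skips re-hashing but does not remove the need to keep and compare the full key. It uses std's 64-bit `DefaultHasher` only for brevity, and the hash function is a parameter so that collisions can be forced on purpose.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical cache keyed by a precomputed hash that still stores the
/// full key, so two different keys with the same hash are told apart.
pub struct CheckedCache<K, V> {
    hash_fn: fn(&K) -> u64,
    buckets: HashMap<u64, Vec<(K, V)>>,
}

impl<K: Eq, V> CheckedCache<K, V> {
    pub fn new(hash_fn: fn(&K) -> u64) -> Self {
        Self { hash_fn, buckets: HashMap::new() }
    }

    /// Inserts (or replaces) a value and returns the hash for later reuse.
    pub fn insert(&mut self, key: K, value: V) -> u64 {
        let h = (self.hash_fn)(&key);
        let bucket = self.buckets.entry(h).or_default();
        // Fall back to full equality within the bucket.
        match bucket.iter().position(|(k, _)| *k == key) {
            Some(i) => bucket[i].1 = value,
            None => bucket.push((key, value)),
        }
        h
    }

    /// Skips re-hashing, but still needs the key for the equality check.
    pub fn get_with_hash(&self, h: u64, key: &K) -> Option<&V> {
        self.buckets.get(&h)?.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }
}

/// Ordinary hash function to plug into `CheckedCache::new`.
pub fn default_hash<K: Hash>(key: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}
```

With a deliberately constant hash function, two different keys land in the same bucket and are still retrieved correctly, which is exactly what a bare `u128` key without the original key could not do.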
-
Side topic: we are currently assuming that a specific key always maps to a specific value, but this assumption might be false if, for example, the file under the URL has changed. I don't know if the plan is to implement change detection (for hot reloading) into …
-
AssetCache introduction

`AssetCache` is one of the core storage abstractions into which a piece of data can be persisted. Note that the word "Asset" has no relation to the "Asset Pipeline": this asset cache can store anything, including the handle to the GPU instance, an HTTP client, and so on. The cache is accessed frequently enough that it is worth considering more efficient solutions.

To load something into the cache, one needs to create a type that implements the `SyncAssetKey` or `AsyncAssetKey` trait. This trait defines the interface to the cache. To form the actual key, these traits assume that the type implementing the `*AssetKey` trait also implements the `Debug` trait. The asset cache then calls `format!("{self:?}")` to create the key. The key itself is a dynamically allocated `String` whose length matches the `Debug` formatting output. Finally, the `String` key is wrapped in an `Arc`, yielding yet another dynamic allocation, presumably to make it more convenient to use in async contexts.

An alternative
An alternative way to do keying is to use content hashing instead. We can keep all the `*AssetKey` types mostly intact, but make them implement a hashing trait that produces, for example, a `u128`. This `u128` should be returned to the user so that any future accesses to the cache don't have to re-hash the key. We could also make it strongly typed by new-typing `u128` into something like:

This fixes all three issues listed above.
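A minimal sketch of the newtype idea, assuming hypothetical names `AssetKeyHash` and `HashedAssetKey` (nothing in this thread shows them in the codebase). The 128-bit FNV-1a hasher is only a placeholder for whichever hash implementation is eventually chosen. For comparison, `debug_key` approximates the current `Debug`-based keying as described above; it is not the actual `AssetCache` code.

```rust
use std::fmt::Debug;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Strongly typed key hash: cheap to copy, compare, and pass around.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct AssetKeyHash(pub u128);

/// Placeholder 128-bit hasher (FNV-1a); the real choice is still open.
struct Fnv128(u128);

impl Hasher for Fnv128 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ b as u128).wrapping_mul(0x0000000001000000000000000000013b);
        }
    }
    fn finish(&self) -> u64 {
        self.0 as u64
    }
}

/// Hypothetical trait: any `Hash` key can produce a 128-bit key hash.
pub trait HashedAssetKey: Hash {
    fn key_hash(&self) -> AssetKeyHash {
        let mut h = Fnv128(0x6c62272e07bb014262b821756295c58d);
        self.hash(&mut h);
        AssetKeyHash(h.0)
    }
}

impl<T: Hash> HashedAssetKey for T {}

/// Roughly the current approach as described above: Debug-format the key
/// into a `String`, then wrap it in an `Arc` (two heap allocations).
pub fn debug_key<K: Debug>(key: &K) -> Arc<String> {
    Arc::new(format!("{key:?}"))
}
```

The caller would keep the `Copy` hash returned on first insert and pass it on later accesses instead of re-hashing; whether the cache must also keep the full key to guard against collisions is the open question raised in the replies.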
Note that we are not hashing the value the key points to; we are hashing the key itself. For example:
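A minimal stand-in for such an example, assuming a key type `TextureFromUrl` with a single hypothetical `url` field: only the key's fields feed the hash, and nothing is downloaded or read. Std's 64-bit `DefaultHasher` is used here just to keep it short.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical key type: the URL is the whole key.
#[derive(Hash, PartialEq, Eq, Debug)]
pub struct TextureFromUrl {
    pub url: String,
}

/// Hashes the key itself. No network request or file read happens here,
/// so the hash says nothing about the bytes currently behind the URL.
pub fn key_hash(key: &TextureFromUrl) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}
```

Two `TextureFromUrl` keys with the same URL always produce the same hash, even if the file behind that URL changes between loads, which is the situation the hot-reloading side topic in the replies is concerned with.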
In this scenario, the `TextureFromUrl` itself is hashed, and this hash is then used to interact with the asset cache.

Singleton resources
This presents an interesting issue for "singleton assets" like the GPU itself. The current solution is to Debug-format a unit struct like `pub struct GpuKey;`, which returns the string `"GpuKey"`. We could work around it by doing the same thing, hashing the string `"GpuKey"`, and everything continues to work. It does, however, feel a bit silly when these singleton resources should be directly accessible (without a hashmap lookup), but that's a separate issue.