-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MutableMap
and similar collections are not guaranteeing the type of the collection
#502
Comments
Fixing this would mean a huge performance penalty, and a BC break that requires a major release. I personally do not use collections often, so i'm not really sure. @veewee wdyt? |
cc @tkquizlet |
From a static analysis POV is not a BC but a bugfix. It even breaks when you do I can try a couple of test implementations and benchmark them 😊 |
I must agree that this behaviour has bitten me in production as well on regular arrays and if we can fix it, that would be nice. However I do agree that this is a BC break because people might get unexpected exceptions on code that worked just fine previously in a minor version. Also the performance consideration might be a blocker for people heavily using the maps: we should avoid that this basic building block becomes slow and therefore unusable. Is there some smart way we could enforce this without too much of an overhead of actually parsing the types? |
I was not thinking about parsing types at all, but to maintain two inner lists, one for the key and another for the value. I think it should not be such a traumatic overhead |
That would mean you have to iterate internally to find the value for a key, which is not very performant and not what you'dd expect from a map. |
The alternative would be to use a non-cryptographic hashing algorithm and do the hash table in userland. I'm very aware that it will cause performance degradation. Still, if performance to that point were my issue, I would probably not rely on this library but on the DS extension or something similar. |
I was thinking about iterating the collection with a foreach, as the key exposed would be casted anyway, but it seems like it wouldn't through generators: https://3v4l.org/ucV2I The problem here is that calling the Fixing some of this things would be breaking changes, but there are some issues that could be fixed while being compatible! |
I was thinking: we could always prefix the key of the internal array with a specific string and store the actual (key, value) as a tuple instead. That would fix the issue and could limit the performance impact as well. However. I don't think we can really fix |
This is quite easy to implement as mentioned via a hash, example: $keys: array<string, K>;
$values: array<string, V>;
set:
$hash = compute_hash($key);
$this->keys[$hash] = $key;
$this->values[$hash] = $value;
get:
$hash = compute_hash($key);
return $this->values[$hash];
contains:
$hash = compute_hash($key);
return isset($this->values[$hash]);
remove:
$hash = compute_hash($key);
unset($this->keys[$hash], $this->values[$hash]);
to_iterator:
foreach($this->keys as $hash => $key) {
yield $key => $this->values[$hash];
}
to_array:
iterator_to_array($this->toIterator());
we could also have the hasher be configureable, so one could do: however, this is going to introduce a "big" performance overhead. another option: implement this in C or Rust and use FFI 😅 ( this is a semi-joke, because i do want to use FFI in Psl, but making it a required extension would be bad for a lot of people. ) |
I had the same hashing idea but I fear the performance too, and the tuple is a great idea indeed! ~~There's a downside: if you declare a $vector->add(1, "foo");
$vector->add("1", "bar"); I was thinking: how slow can be to serialize an int or a string, and use the serialized string as the key?~~ Forget it, I was heading out of the hospital and I missed out some parts of the post. |
I can do a benchmark and a test implementation 😊 |
Describe the bug
Suppose you define a map that only allows
string
as key and set a string that PHP would cast to an integer as an array key (any integer-ish string that doesn't start with zero). In that case, it will return aninteger
instead of the promised type by the static analysis.To Reproduce
Expected behavior
I would expect this output:
I wonder if you think that this should be implemented differently to avoid such bugs as I do. I prefer a slightly slower/less memory-efficient but correct, runtime-safe implementation.
The text was updated successfully, but these errors were encountered: