Sim: Faster type coercion in `toID` #10619

larry-the-table-guy · 2024-10-14T20:42:10Z

Derivative of #10606

Stats

4*1024*1024 calls per row.

# uniq ... is a running total.
All others are per batch.

Stats during npm run full-test

id === ''	id === text	not new text	# uniq text	# uniq id
0.05%	64.66%	94.40%	8758	4789
0.01%	60.48%	97.89%	8834	4847
0.00%	40.75%	100.00%	9011	5009
0.58%	42.14%	100.00%	9069	5058
0.14%	47.23%	100.00%	9069	5058
0.00%	50.29%	100.00%	9069	5058
0.00%	39.74%	100.00%	9069	5058
0.00%	43.55%	100.00%	9070	5058
0.00%	35.38%	100.00%	9070	5058
0.00%	15.77%	100.00%	9070	5058
0.00%	52.12%	100.00%	9070	5058
0.00%	52.12%	100.00%	9070	5058
0.00%	52.07%	100.00%	9070	5058
0.00%	45.55%	100.00%	9070	5058
0.00%	30.35%	100.00%	9071	5058
0.00%	32.78%	100.00%	9072	5058
0.00%	55.63%	100.00%	9072	5058
0.00%	42.64%	100.00%	9072	5058
9.35%	40.16%	100.00%	9077	5061
1.14%	59.77%	100.00%	9087	5070
0.00%	47.81%	100.00%	9087	5070
0.00%	47.68%	100.00%	9087	5070
0.00%	48.33%	100.00%	9087	5070
0.00%	48.32%	100.00%	9087	5070
0.00%	48.25%	100.00%	9087	5070
0.00%	49.06%	100.00%	9087	5070
0.00%	49.30%	100.00%	9087	5070

About 100 million total calls.
Pretty high proportion of id === text, which means IDs are being passed to toID a lot.

profiling code

let empty_count = 0;
let id_same_as_text_count = 0;
const uniq_text = new Set();
let dup_text_count = 0;
const uniq_id = new Set();
let call_count = 0;
/** snip */
export function toID(text: any): ID {
        call_count++;
        if (typeof text !== 'string') {
                if (text) text = text.id || text.userid || text.roomid || text;
                if (typeof text === 'number') text = '' + text;
                else if (typeof text !== 'string') return '';
        }
        const id = text.toLowerCase().replace(/[^a-z0-9]+/g, '') as ID;

        if (text === '') ++empty_count;
        if (id === text) ++id_same_as_text_count;
        if (uniq_text.has(text)) ++dup_text_count;
        uniq_text.add(text);
        uniq_id.add(id);
        if (call_count === (1024 * 1024 * 4)) {
                console.log(); // the extra newline and letters are for grepping
                console.log("a", empty_count / call_count, id_same_as_text_count / call_count);
                console.log("b", dup_text_count / call_count, uniq_text.size, uniq_id.size);
                id_same_as_text_count = dup_text_count = call_count = empty_count = 0;
        }

        return id;
}

larry-the-table-guy · 2024-10-14T20:53:23Z

Ok, technically, this breaks for boolean and others. But there's no good reason to be passing those in the first place imo. Simple fix is to do '' + text more broadly, but I'll just put this on the backburner.

Edit: aaand now I remember the original code doesn't handle bool anyway. Sigh. any is not a good type for a parameter...

Slayer95 · 2024-10-14T23:16:27Z

The reason why any is the parameter type, is that toID is often used as a handy sanitizer. So, if an untrusted peer sends, say, a JSON object where a contained value is expected to be a non-empty string, then by using toId the whole validation can be simplified a lot. Supporting numbers is a nice extension to round up the concept of identifier.

So, the fact that toID is often called with arbitrary data as its parameter brings about my comment yesterday regarding the huge complexity that changing callsites not to pass IDs to toID would entail.

larry-the-table-guy · 2024-10-14T23:20:53Z

I understand that's the intent but based on #10549 (comment)
The vast majority of the callsites already know they have a string.
And some know they have an ID, yet call methods like DexSpecies.get.

The function's contract is just unnecessarily weak.
IMO type coercion should have been separate from the lower+replace logic.
But it's not worth changing now.

Slayer95 · 2024-10-14T23:30:19Z

The vast majority of the callsites already know they have a string.

That's true.

And some know they have an ID, yet call methods like DexSpecies.get
But it's not worth changing now.

Yea. Changing that means that each of these functions would need a strict and a lax variant, thus doubling mental overhead from function names. And god forbid refactoring bugs, and having to think about Hidden Power *.

larry-the-table-guy · 2024-10-14T23:51:24Z

mental overhead

To me, the more pressing source of that is how many states the system can be in.
More possible types -> more states.
It's also why I think the lazy loading in Dex is meh. It adds another state to consider. I mean, at this very moment, there's a bug in DexStats - it reads dex.gen before that's been written.

I think a simple system comes from predictable data flow. Lazy loading and any types just don't help with that, IMO.
/endrant

Anyway, the fact that I can't meaningfully profile this ATM indicates I should be working on other problems.

Zarel · 2024-10-29T12:20:21Z

For the record, I do admit it was probably a mistake to use this function in hot code. I don't mind a separate nameToID function for known strings. toID probably shouldn't be used in inner loops at all, although that might be a lost cause at this point. Blame... uh, the lack of any way to enforce type safety when I first wrote Showdown, I guess.

Sim: Faster type coercion in toID

b4a6d37

larry-the-table-guy marked this pull request as draft October 14, 2024 20:52

larry-the-table-guy mentioned this pull request Oct 27, 2024

Test: Correct several format IDs #10633

Merged

Return early on numbers in toID

f462716

larry-the-table-guy force-pushed the sim/toID-faster-type-coercion branch from 51f29d3 to f462716 Compare November 1, 2024 01:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sim: Faster type coercion in `toID` #10619

Sim: Faster type coercion in `toID` #10619

larry-the-table-guy commented Oct 14, 2024

larry-the-table-guy commented Oct 14, 2024 •

edited

Loading

Slayer95 commented Oct 14, 2024 •

edited

Loading

larry-the-table-guy commented Oct 14, 2024

Slayer95 commented Oct 14, 2024 •

edited

Loading

larry-the-table-guy commented Oct 14, 2024 •

edited

Loading

Zarel commented Oct 29, 2024

Sim: Faster type coercion in toID #10619

Are you sure you want to change the base?

Sim: Faster type coercion in toID #10619

Conversation

larry-the-table-guy commented Oct 14, 2024

Stats

larry-the-table-guy commented Oct 14, 2024 • edited Loading

Slayer95 commented Oct 14, 2024 • edited Loading

larry-the-table-guy commented Oct 14, 2024

Slayer95 commented Oct 14, 2024 • edited Loading

larry-the-table-guy commented Oct 14, 2024 • edited Loading

Zarel commented Oct 29, 2024

Sim: Faster type coercion in `toID` #10619

Sim: Faster type coercion in `toID` #10619

larry-the-table-guy commented Oct 14, 2024 •

edited

Loading

Slayer95 commented Oct 14, 2024 •

edited

Loading

Slayer95 commented Oct 14, 2024 •

edited

Loading

larry-the-table-guy commented Oct 14, 2024 •

edited

Loading