Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH spread MySQL ORM bind parameters #11434

Closed
wants to merge 1 commit into from

Conversation

NightJar
Copy link
Contributor

@NightJar NightJar commented Oct 21, 2024

Removing the rest of the "bit of hackery" as it is now unnecessary.

The ORM rewrite supporting parameterised queries was merged in July 2014 and released in October 2015. The official PHP support at the time was 5.3.2+ which excluded the spread operator from being used (introduced in PHP 5.6), meaning all this variable shuffling had to happen due to the PHP docs note about using call_user_func_array as this code did (but has since been made reudnant by use of the spread operator in 437e53f).

Ten years later none of this is relevant anymore, so we can save some useless data shuffling from every parameterised query by just directly using the spread operator.

Because the rest of the "bit of hackery" is now unnecessary

The ORM rewrite supporting parameterised queries  was merged in July 2014
d8e9af8

and relased in October 2015.
https://github.com/silverstripe/silverstripe-framework/releases/tag/3.2.0

The official PHP support at the time was 5.3.2+
https://web.archive.org/web/20140616033159/http://www.silverstripe.org/system-requirements

Which excluded the spread operator introduce in PHP 5.6 from being used.
https://www.php.net/manual/en/migration56.new-features.php#migration56.new-features.splat

Ten years later none of this is relevant anymore, so we can save some
useless data shuffling from every parameterised query by just directly
using the spread operator.
@emteknetnz
Copy link
Member

In the PR description you've said Which excluded the spread operator introduce in PHP 5.6 from being used

However the spread operator was already being used? $statement->bind_param(...$boundNames);

There's an inline comment that says the hackery is required // Because mysqli_stmt::bind_param arguments must be passed by reference i.e. the leading &

So seems like this PR is removing something that's required

@emteknetnz emteknetnz closed this Oct 22, 2024
@NightJar
Copy link
Contributor Author

NightJar commented Oct 22, 2024

So seems like this PR is removing something that's required

Then why did all the tests pass?

I've adjusted the PR to make use of markdown, hopefully it can be easier to understand. Please reconsider, @emteknetnz

@emteknetnz
Copy link
Member

Unit tests won't cover every edge case.

There's a note on the PHP docs saying Care must be taken when using mysqli_stmt_bind_param() in conjunction with [call_user_func_array()](https://www.php.net/manual/en/function.call-user-func-array.php). Note that mysqli_stmt_bind_param() requires parameters to be passed by reference, whereas [call_user_func_array()](https://www.php.net/manual/en/function.call-user-func-array.php) can accept as a parameter a list of variables that can represent references or values.

Does this matter? I don't have certainty about that. This PR may be probably works fine, though it time it's introducing a risk of regression for seemingly no real benefit.

@NightJar
Copy link
Contributor Author

NightJar commented Oct 23, 2024

I don't know why you're brining this up. The code does not use call_user_func_array - neither before nor after the change.

This PR removes unnecessary code from a low level querying class. If it broke querying (e.g. by triggering a fatal error), I expect nearly every test that uses the database to fail as parameterised queries are preferred by the ORM.

@emteknetnz
Copy link
Member

emteknetnz commented Oct 23, 2024

I'm honestly not sure what the consequences of changing this are. I've done some local testing and this PR does change the behaviour of the method with edge case use. It's outside the normal use of this method however it's public API so a project somewhere may be doing something like this and I don't want to accidentally break someones site

// app/_config.php

use SilverStripe\ORM\DB;
use SilverStripe\ORM\Connect\MySQLDatabase;
use SilverStripe\ORM\Connect\MySQLiConnector;

/**@var MySQLDatabase $database */
$database = DB::get_conn();
/** @var MySQLiConnector $connector */
$connector = $database->getConnector();
/** @var mysqli_stmt $statement */
$statement = $connector->prepareStatement('SELECT * FROM SiteTree WHERE ID = ?', $success);
$id = 1;
$connector->bindParameters($statement, ['i', &$id]);
$id = 2;
$statement->execute();
var_dump($statement->get_result()->fetch_assoc()['Title']);
die;

// without PR = string(4) "Home"
// with PR = string(8) "About Us"

While it's a low-level, there would only be a very small performance benefit from merging this PR, so I'd prefer to leave this as is as presumably when this code was initially written there was some level of understanding and thought put into respecting the variable references

@NightJar
Copy link
Contributor Author

I'm honestly not sure what the consequences of changing this are.

None.

presumably when this code was initially written there was some level of understanding

When this code was written it used call_user_func_array.

@NightJar
Copy link
Contributor Author

I updated the PR message again. Is it clearer now?

@emteknetnz
Copy link
Member

It does, that makes a bit more sense, though there's no functional difference between $statement->bind_param(...$boundNames); and call_user_func_array(array($statement, 'bind_param'), $boundNames);, with the test code above it will still output string(4) "Home", meaning the code that this PR is removing is still doing the thing that was original intended when call_user_func_array() was being used

I don't want to remove code that's doing what's it's supposed to do for so little performance benefit, for instance assuming 1,000 parameterized queries are run on a web request

// app/_config.php

$start = microtime(true);
$parameters = ['abc', 'def', 'xyz'];
for ($j = 0; $j < 1000; $j++) {
    $boundNames = [];
    $parametersCount = count($parameters ?? []);
    for ($i = 0; $i < $parametersCount; $i++) {
        $boundName = "param$i";
        $$boundName = $parameters[$i];
        $boundNames[] = &$$boundName;
    }
}
$end = microtime(true);
$diff = $end - $start;
var_dump($diff);
die;

// float(0.00021719932556152344) // or 0.2 milliseconds

@NightJar
Copy link
Contributor Author

NightJar commented Oct 24, 2024

With the test code above the author is explicitly manipulating the value before excecution, otherwise there is no reason to pass by reference.
I don't think that's a use case to be worried about.

I would argue that it should change the output, and file it as a bug that it doesn't.

And Silverstripe runs on average 50 million* queries per request.
(*Hyperbole - lets just say it's very carefree with database trips)

Change the target branch to 6 if you're that worried 🤷

@NightJar
Copy link
Contributor Author

So this is just a hard "no" then?

@emteknetnz
Copy link
Member

I don't think it's worth targeting at 6. While this PR is likely safe, the potential measured upside is minuscule. Websites making ~1K database queries per page view is probably considered quite bad? I can't imagine it being something like 100K queries which would bump the time saving up to 20ms which would be noticeable. If that were the case then the webpage would probably be taking about 30 seconds to render at that point.

I don't think we should be removing code that appears to be doing something which presumably whoever wrote it understands its purpose better than I do, unless there's a good reason to.

@NightJar
Copy link
Contributor Author

🤷‍♂️ OK.

@andrewandante
Copy link
Contributor

I don't understand why this can't be pointed at 6 - feels like there's enough time to battle-test it before it gets released. The code comment literally describes this as hackery; it feels like the risk here is being vastly over-inflated.

To your other point, there are plenty of unoptimised websites in the world, on plenty of cheaply-rented servers. There are dev environments, there are setups so complex we couldn't even comprehend. Surely any improvement, at such a low risk, should be embraced here, rather than rebuffed?

I'm all for safety and caution, but this feels like overkill, and has the potential to discourage open-source contributions, when something so seemingly innocuous - with a considerable lead time - is immediately closed.

@emteknetnz
Copy link
Member

The code comment literally describes this as hackery

The code comment is

 // Because mysqli_stmt::bind_param arguments must be passed by reference
 // we need to do a bit of hackery

The code comment is saying we NEED to do the hackery because the arguments MUST be passed a certain way.

Look it's obviously ugly looking code, though it's apparently doing something so I'd rather just leave it there. This PR isn't fixing a bug and there's no real performance benefit, so it seems like purpose of this PR was just to remove some ugly looking code.

@andrewandante
Copy link
Contributor

andrewandante commented Oct 30, 2024

The comment, and code, is from 10 years ago. The "need" has the context of the time, the "hackery" is still true 😄 I'd note that @NightJar has done all the hard work in figuring out the impact and implications of this change already.

Is "cleaning up code and a minor performance improvement" not a high enough bar for contribution? I'm not sure what your argument is there. Would this be accepted if there was a test case that alleviated your fear that it is still "doing something"?

@emteknetnz
Copy link
Member

Is "cleaning up code and a minor performance improvement" not a high enough bar for contribution?

That's a good reason to contribute.

In this case though I'm really not sure about that what's been submitted, it seems like it's possibly regressive.

As mentioned in my testing above, the behaviour of the method has changed as a result of this PR, meaning the "redundant code" that's been removed isn't redundant, it's actually doing something which it's supposed to do (at least it intends to do something it's supposed to).

I did validate in my testing that it didn't matter if call_user_func_array() or if the spread operator was used, so the hackery appears to still be doing what it originally intended to do 10 years ago. So I can't simply say this code is redundant.

Now, if instead that hackery has been doing the WRONG thing this whole time, then this PR is fixing a bug and should be merged into 5.2. However, I'm not seeing what the bug here is. For that we really need a submitted issue with replication steps first, and the submitted pull-request should have a unit-test to validate the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants