Experiment: Refine ASCII column definitions #857

Draft: wants to merge 1 commit into base: develop

@theory (Collaborator) commented Jan 5, 2025

Columns that store SHA hash hex strings or URIs can be ASCII or use "C" collation. Update each engine to use the appropriate encoding or collation for such columns.
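
As a quick illustration of why such columns are candidates, the sketch below checks that hex digests are pure ASCII and that byte-wise ordering (what a "C" collation does) matches ordinary string ordering for them. SHA-1 and the input values are assumptions chosen only for the example.

```python
import hashlib

# Hypothetical inputs; any bytes would do.
digests = [hashlib.sha1(data).hexdigest() for data in (b"users", b"widgets", b"roles")]

# Each digest is 40 lowercase hex characters, all of them ASCII.
all_ascii_hex = all(
    len(d) == 40 and d.isascii() and set(d) <= set("0123456789abcdef")
    for d in digests
)

# Byte-wise ordering matches Python's code point ordering for
# ASCII-only strings, so no locale-aware collation is needed.
same_order = sorted(digests) == sorted(digests, key=lambda d: d.encode("ascii"))
```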

Issues that remain:

MySQL

MySQL has trouble comparing strings with different character sets or collations. On 5-5-5.6 it complains:

Illegal mix of collations (utf8mb4_unicode_ci,COERCIBLE) and (ascii_general_ci,IMPLICIT) for operation '='
DBD::MariaDB::db do failed: Illegal mix of collations (utf8mb4_unicode_ci,COERCIBLE) and (ascii_general_ci,IMPLICIT) for operation '='
Trace begun at /home/runner/work/sqitch/sqitch/lib/App/Sqitch/Role/DBIEngine.pm line 712

Cockroach also has this problem, complaining:

psql:/home/runner/work/sqitch/sqitch/lib/App/Sqitch/Engine/cockroach.sql:101: ERROR:  unsupported comparison operator: <collatedstring{en-US-u-va-posix}> = <string>
"psql" unexpectedly returned exit value 3

Oracle

Oracle requires MAX_STRING_SIZE=STANDARD, but changing that setting seems to require restarting the server, and I don't know how to do that for the Docker image.

Vertica

Has no obvious method to support column-based collation or encoding.

SQLite

SQLite supports a COLLATE keyword, but it defaults to BINARY, which is what we'd want for SHA hash hex strings and URIs anyway. We could use NOCASE for SHA sums, but Sqitch always lowercases them anyway.
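
A minimal sketch of the difference, using Python's built-in sqlite3 module. The table and column names are made up for the example and are not Sqitch's registry schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one column with the default (BINARY) collation,
# one explicitly declared NOCASE.
conn.execute("CREATE TABLE t (id_default TEXT, id_nocase TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO t VALUES ('abc123', 'abc123')")

# Default BINARY collation: the comparison is case-sensitive.
binary_match = conn.execute(
    "SELECT count(*) FROM t WHERE id_default = 'ABC123'"
).fetchone()[0]

# NOCASE collation: ASCII letters compare case-insensitively.
nocase_match = conn.execute(
    "SELECT count(*) FROM t WHERE id_nocase = 'ABC123'"
).fetchone()[0]

conn.close()
```

Since stored hashes are always lowercase, the default column already behaves as wanted and NOCASE would add nothing.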

Future Work

Overall it might be better to provide a way for users to select a collation or encoding to use and then apply it.
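
One possible shape for that, sketched in Python rather than Sqitch's own Perl: a hypothetical helper that appends a user-chosen COLLATE clause to a column definition. The function name, engine keys, and column names are all assumptions for illustration, not existing Sqitch code.

```python
from typing import Optional


def column_ddl(engine: str, name: str, base_type: str,
               collation: Optional[str] = None) -> str:
    """Return a column definition, adding a COLLATE clause if one was chosen."""
    ddl = f"{name} {base_type}"
    if collation is None:
        # No preference given: keep the engine's default collation.
        return ddl
    if engine == "pg":
        # PostgreSQL collation names are identifiers, written double-quoted.
        return f'{ddl} COLLATE "{collation}"'
    if engine in ("mysql", "sqlite"):
        return f"{ddl} COLLATE {collation}"
    # Engines with no known per-column collation syntax are rejected.
    raise ValueError(f"no per-column collation support known for {engine!r}")
```

For example, `column_ddl("pg", "change_id", "TEXT", "C")` would produce `change_id TEXT COLLATE "C"`.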

But in the end I'm not sure it's worth the effort.
