blobs: prevent NULL contentType in db #1355

Open: wants to merge 17 commits into base: master

alxndrsn
Contributor

@alxndrsn alxndrsn commented Jan 13, 2025

Set blob."contentType" values to application/octet-stream if no mime type is supplied on upload.

Related: #1351
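
For illustration, the upload-side fallback described above could look roughly like this. It is a minimal sketch, not code from this PR: `contentTypeOrDefault` and `DEFAULT_CONTENT_TYPE` are hypothetical names, and treating an empty string as "no mime type supplied" is an assumption.

```javascript
// Hypothetical helper: not taken from the central-backend source.
const DEFAULT_CONTENT_TYPE = 'application/octet-stream';

// Returns the supplied MIME type, or the default when none was supplied.
// Assumption: null, undefined, non-strings and blank strings all count as "not supplied".
const contentTypeOrDefault = (contentType) =>
  (typeof contentType === 'string' && contentType.trim() !== ''
    ? contentType
    : DEFAULT_CONTENT_TYPE);

console.log(contentTypeOrDefault('image/jpeg'));
console.log(contentTypeOrDefault(null));
console.log(contentTypeOrDefault(undefined));
```

With a fallback like this applied before the insert, no new row would reach the database with a NULL "contentType".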

What has been done to verify that this works as intended?

CI.

Why is this the best possible solution? Were any other approaches considered?

Discussion on #1351.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

This will change Content-Type headers for files uploaded without one from the string null to the string application/octet-stream.

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Yes.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources is properly credited in comments or that everything is internally sourced

alxndrsn added 2 commits January 13, 2025 06:20
Set blob."contentType" values to application/octet-stream if no mime type is supplied on upload.

Related: getodk#1351
@alxndrsn alxndrsn marked this pull request as draft January 13, 2025 06:26
@alxndrsn alxndrsn marked this pull request as ready for review January 13, 2025 07:27
@alxndrsn alxndrsn requested review from ktuite and lognaturel January 13, 2025 07:50
@alxndrsn alxndrsn changed the title from "blobs: prevent default contentType column" to "blobs: set default contentType value in db" Jan 13, 2025
@alxndrsn alxndrsn changed the title from "blobs: set default contentType value in db" to "blobs: set default contentType in db" Jan 13, 2025
@alxndrsn alxndrsn changed the title from "blobs: set default contentType in db" to "blobs: prevent NULL contentType in db" Jan 13, 2025
.eslintrc.json (outdated review comment, resolved)
test/db-migrations/utils.js (outdated review comment, resolved)
test/db-migrations/utils.js (outdated review comment, resolved)
@lognaturel
Member

The outcome here looks as I would expect it. Would the migration possibly be time-consuming if an install has a lot of blobs? Or would it only be the case if there are lots of blobs with null content types?

@lognaturel lognaturel removed their request for review January 21, 2025 22:24
@alxndrsn
Contributor Author

> The outcome here looks as I would expect it. Would the migration possibly be time-consuming if an install has a lot of blobs? Or would it only be the case if there are lots of blobs with null content types?

What would be realistic numbers?

I've run the migration on a fresh db, and got the following numbers:

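| total row count | null row count | migration duration / ms |
|---|---|---|
| 10,000 | 2 | 822 |
| 10,000 | 2 | 812 |
| 10,000 | 2 | 785 |
| 10,000 | 1 | 760 |
| 10,000 | 1 | 738 |
| 10,000 | 0 | 760 |
| 10,000 | 0 | 708 |
| 10,000 | 0 | 702 |
| 10,000 | 0 | 698 |
| 10,000 | 0 | 695 |
| 100,000 | 5 | 1,457 |
| 100,000 | 16 | 1,456 |
| 100,000 | 14 | 1,461 |
| 100,000 | 12 | 1,409 |
| 100,000 | 10 | 1,531 |
| 100,000 | 7 | 1,473 |
| 1,000,000 | 97 | 31,096 |
| 1,000,000 | 93 | 45,222 |
| 1,000,000 | 108 | 11,535 |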

@lognaturel
Member

> What would be realistic numbers?

There's a very big range, up to several million blobs, so we would want to document this as possibly time-consuming in the release notes. It feels to me like the way the default values are set is really inefficient, but I'm not sure whether there's an alternative - https://github.com/getodk/central-backend/pull/1355/files#r1925735636

@alxndrsn
Contributor Author

> It feels to me like the way to set the default values is really inefficient

How about this?

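  -- Backfill existing NULLs first so that the NOT NULL constraint can be applied.
  UPDATE blobs SET "contentType"='application/octet-stream' WHERE "contentType" IS NULL;
  -- Then set the default and the constraint in a single ALTER TABLE.
  ALTER TABLE blobs
    ALTER COLUMN "contentType" SET DEFAULT 'application/octet-stream',
    ALTER COLUMN "contentType" SET NOT NULL;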
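
As a rough sketch, the SQL above could be wrapped in a migration module along these lines. This assumes the migration receives a `db` object with a promise-returning `raw(sql)` method (as a knex instance provides) and that the driver accepts several statements in one parameterless query. The `down` step is purely an assumption added for illustration and is not part of this PR.

```javascript
// Sketch only: assumes `db.raw(sql)` returns a promise, as knex does.
const up = (db) => db.raw(`
  UPDATE blobs SET "contentType"='application/octet-stream' WHERE "contentType" IS NULL;
  ALTER TABLE blobs
    ALTER COLUMN "contentType" SET DEFAULT 'application/octet-stream',
    ALTER COLUMN "contentType" SET NOT NULL;
`);

// Hypothetical rollback: relaxes the column again, but cannot restore
// the original NULLs because the backfill does not record which rows had them.
const down = (db) => db.raw(`
  ALTER TABLE blobs
    ALTER COLUMN "contentType" DROP NOT NULL,
    ALTER COLUMN "contentType" DROP DEFAULT;
`);

module.exports = { up, down };
```

The UPDATE only touches rows where "contentType" IS NULL, so the amount of data rewritten depends on the number of NULL rows rather than on the total row count.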

New results:

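| total row count | null row count | migration duration / ms |
|---|---|---|
| 10,000 | 0 | 594 |
| 10,000 | 0 | 618 |
| 10,000 | 1 | 585 |
| 10,000 | 1 | 590 |
| 10,000 | 1 | 594 |
| 10,000 | 1 | 595 |
| 10,000 | 2 | 598 |
| 10,000 | 3 | 582 |
| 10,000 | 3 | 605 |
| 10,000 | 4 | 589 |
| 100,000 | 11 | 685 |
| 100,000 | 12 | 617 |
| 100,000 | 13 | 621 |
| 100,000 | 6 | 608 |
| 100,000 | 6 | 612 |
| 100,000 | 7 | 646 |
| 1,000,000 | 105 | 1,013 |
| 1,000,000 | 105 | 989 |
| 1,000,000 | 114 | 1,270 |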
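
To make the two sets of timings easier to compare, here is a small sketch that averages the durations per row count. The numbers are copied from the two tables in this thread; the variable names are illustrative only, and with so few runs per size (and a wide spread in the original 1,000,000-row runs) the averages are indicative at best.

```javascript
// Durations in ms, copied from the two timing tables in this thread.
const before = {
  10000: [822, 812, 785, 760, 738, 760, 708, 702, 698, 695],
  100000: [1457, 1456, 1461, 1409, 1531, 1473],
  1000000: [31096, 45222, 11535],
};
const after = {
  10000: [594, 618, 585, 590, 594, 595, 598, 582, 605, 589],
  100000: [685, 617, 621, 608, 612, 646],
  1000000: [1013, 989, 1270],
};

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// One summary entry per row count, in ascending order of row count.
const summary = Object.keys(before).map((rows) => {
  const oldMean = mean(before[rows]);
  const newMean = mean(after[rows]);
  return { rows: Number(rows), oldMean, newMean, speedup: oldMean / newMean };
});

for (const { rows, oldMean, newMean, speedup } of summary) {
  console.log(`${rows} rows: ${oldMean.toFixed(0)} ms -> ${newMean.toFixed(0)} ms (${speedup.toFixed(1)}x)`);
}
```

On these figures the mean drops from 748 ms to 595 ms at 10,000 rows, from about 1,465 ms to about 632 ms at 100,000 rows, and from about 29,284 ms to about 1,091 ms at 1,000,000 rows.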

@alxndrsn
Contributor Author

@lognaturel if it's helpful, we could include an additional test for the speed of the migration, e.g. this checks that 1,000,000 rows can be migrated in less than 5s:

https://github.com/alxndrsn/odk-central-backend/compare/application-octet-instead-of-null-content-type...alxndrsn:odk-central-backend:application-octet-instead-of-null-content-type-timing-test?expand=1
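
The linked branch is not reproduced here. As a generic illustration of what such a timing check might involve, the following is a minimal sketch: `assertCompletesWithin` is a hypothetical helper, and `runMigration` in the usage comment is a stand-in for whatever actually runs the migration.

```javascript
// Hypothetical helper: runs an async function and rejects if it exceeds limitMs.
const assertCompletesWithin = async (limitMs, fn) => {
  const start = process.hrtime.bigint();
  await fn();
  // hrtime.bigint() is in nanoseconds; convert the difference to milliseconds.
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  if (elapsedMs > limitMs) {
    throw new Error(`took ${elapsedMs.toFixed(0)} ms, limit is ${limitMs} ms`);
  }
  return elapsedMs;
};

// Example use inside an async test, with a stand-in migration runner:
//   await assertCompletesWithin(5000, () => runMigration(db));
```

A check like this measures wall-clock time, so its result depends on the machine running it; a generous limit helps avoid spurious failures on slow CI hosts.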

@lognaturel
Member

Sweet, thanks! I don't feel like we need to have a test since we don't change migrations and I can't imagine we'll layer more onto this one. With these new timings I don't think we need to document anything about the migration.

3 participants