Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection and resource ID handling altered to be case-insensitive for issue #49. #51

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

pfyvie
Copy link

@pfyvie pfyvie commented Oct 26, 2016

Collection ID case insensitivity appears straightforward, requiring that the regular expression in setupRoutes.js have case insensitivity enabled.

However, resource ID case insensitivity requires a little opinionation as to how IDs are stored internally. A possible solution, proposed here, is to opt to store them lowercase, which requires a little adjustment of IDs in the test objects.

New method tests have been added to confirm case insensitivity of collections alone, resources within collections, and collections when resources are being accessed within.

@pfyvie pfyvie closed this Oct 26, 2016
@pfyvie
Copy link
Author

pfyvie commented Oct 26, 2016

Pardon; unexpected differences between Travis' output and my own output have revealed some additional problems. I'll re-open this PR when they're resolved. :)

@pfyvie pfyvie reopened this Oct 26, 2016
@coveralls
Copy link

coveralls commented Oct 26, 2016

Coverage Status

Coverage decreased (-0.4%) to 99.469% when pulling 684b0de on pfyvie:case-insensitivity into 7e54018 on Wizcorp:master.

@coveralls
Copy link

coveralls commented Oct 26, 2016

Coverage Status

Coverage decreased (-0.4%) to 99.469% when pulling 684b0de on pfyvie:case-insensitivity into 7e54018 on Wizcorp:master.

@ronkorving
Copy link

Thanks for your contribution. I will give you some feedback in-line.

One overarching missing part though (which will hurt existing users) is that this is supposed to be based on how Express has been configured (case sensitive routing vs case insensitive routing). See the original issue for that requirement.

@@ -108,7 +108,7 @@ class ResponseContext {
url += '/' + encodeURIComponent(id);
}

this.addHeader('location', url);
this.addHeader('location', url.toLowerCase());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the incoming URL is case insensitive, why lower case this?


const reCollection = new RegExp('^' + path + '(\\.[a-zA-Z]+)?/?$');
const reCollection = new RegExp('^' + path + '(\\.[a-zA-Z]+)?/?$', 'i');

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're making this regexp case insensitive, the optional file extension is now double-insensitive.

@@ -21,7 +21,7 @@ class Beer {
}

createId() {
this.id = this.name;
this.id = this.name.toLowerCase();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would indicate that every user of this library would always have to make sure that the IDs they generate are lower case, or the system stops working. That doesn't seem very user friendly.

@pfyvie
Copy link
Author

pfyvie commented Nov 2, 2016

I seem to have misunderstood the original issue discussion. Following the current case sensitivity set in Express would indeed be a friendlier approach, one which I will re-work the contribution to fit.

The requirement that users follow a lowercase resource ID scheme within their own resource classes was unintended and will be corrected. This does raise the related issue of what should be treated as a identifier conflict and how such conflict should be prevented/handled. If a resource class's developer is unsure at implementation time as to whether their class will be used in case-sensitive or case-insensitive mode, they must ensure that the IDs their class generates do not depend on case sensitivity. For example, identifier & Identifier would not conflict in case-sensitive mode, but would in case-insensitive mode. This has potential to produce unintended, unwarned behaviour if a user should wish to run in a different case sensitivity later. (Whether a user should rely on case to distinguish IDs in the first place may not be a matter for this library, only what behaviour such approaches should produce.)

Two main approaches I see are:

  • Accept identifiers of any case scheme in either mode, and use generated IDs as map keys per normal, without alteration. Then, decide on what parameter -> key transform to use in case-insensitive mode (i.e. POSTing a resource that is assigned id:CapitalResource to a case-insensitive instance would persist the element with id:capitalresource) and accept the consequences of any misses that may result if modes are switched. This approach may produce a situation where an item is persisted into a case-sensitive store in case-sensitive mode, but can no longer be accessed if later run in case-insensitive mode.
  • Accept identifiers of any case scheme in either mode, but always transform collection keys to a defined case within Collection.js so that e.g. Identifier and identifier would access the same collection element. This produces identical behaviour in both modes, but prevents a user from using identifier case to distinguish between collection members if for some reason they want to.

The second approach trades away flexibility to buy friendly protection, but removes the point of making case sensitivity configurable at all. The first maintains flexibility but produces a situation where a user may shoot themselves in the foot because Least Surprise suggests case insensitivity in URLs. This is ameliorated somewhat by Express having case insensitivity as the default.

I favour the first option, with proper notes in the README. What do you think, @ronkorving?

@ronkorving
Copy link

I think you're right that our options are limited, and each has its drawbacks. I'm with you, in that option 1 seems preferable. If I understand your approach correctly:

  • If Express is set to be case-insensitive, we lowercase all map keys, though IDs in objects remain untouched.
  • A lookup will be lower cased in order to pull the object out of the collection map.
  • If Express is configured as case sensitive, we don't do any transformations.

The drawback being you can create conflicting objects (during a first load, we could throw an error on that, insisting that users must fix the files manually) if you change config from case sensitive to insensitive. I'm okay with that drawback.

@pfyvie
Copy link
Author

pfyvie commented Nov 13, 2016

Yes, that's a correct summary of the approach. It's turning out much cleaner and friendlier.

Since there's going to be some winding back in the next few commits, how about I close this PR, squash my aggregate changes down to a neater set, and then open a new PR so the final history graph is cleaner?

@pfyvie
Copy link
Author

pfyvie commented Nov 13, 2016

I discovered a behaviour that seems worth clarifying, as it affects ID integrity and will affect what behaviour tests are written for: During a collection-level PUT, resources that don't already exist are created before being added to the collection map. put.js currently calls the resource's constructor here and passes the collection key as the id parameter, meaning that the existing id property of the PUT object(s) is/are overwritten.

For example, given a collection PUT of an item that doesn't already exist:
{ demolen: { id: 'DeMolenOtherId', name: 'De Molen - Hop en Liefde', rating: 5 } }

The object added to the collection map will instead be:
demolen: { id: 'demolen', name: 'De Molen - Hop en Liefde', rating: 5 }

It seems to me that an ID property within a PUT resource should be respected over the collection key that comes with it. (A client should pass a collection where resource IDs and collection keys match, but they might not.) In this case, a PUT of our example:
{ demolen: { id: 'DeMolenOtherId', name: 'De Molen - Hop en Liefde', rating: 5 } }
would instead be committed to the collection map (in case-insensitive mode) as
demolenotherid: { id: 'DeMolenOtherId', name: 'De Molen - Hop en Liefde', rating: 5 }
or (in case-sensitive mode) as
DeMolenOtherId: { id: 'DeMolenOtherId', name: 'De Molen - Hop en Liefde', rating: 5 }

Do you think the current behaviour (ID of created resource during PUT = collection key) should be kept, or changed to prioritise IDs already in submitted resources?

@coveralls
Copy link

coveralls commented Nov 14, 2016

Coverage Status

Coverage decreased (-0.1%) to 99.703% when pulling 50e18ec on pfyvie:case-insensitivity into 284b19b on Wizcorp:master.

@coveralls
Copy link

coveralls commented Nov 14, 2016

Coverage Status

Coverage decreased (-0.1%) to 99.702% when pulling 7a660bf on pfyvie:case-insensitivity into 284b19b on Wizcorp:master.

@ronkorving
Copy link

Rather than closing the PR and opening a new one, feel free to just squash the commits in this PR. I prefer to have the comment history sticking around for other contributors and users so they can understand why decisions have been made the way they have.

Regarding IDs, express-rested does not insist that objects have an id property. Implementations may generate hashes of their data as an ID, or do anything they please. The only 2 behaviors that are set in stone are:

  • to support POST, you need to implement a createId method.
  • your constructor must be able to accept (and probably store as a property) a previously created id.

express-rested has no business looking at a resource.id property (although our unit tests of course may do anything they please).

@pfyvie
Copy link
Author

pfyvie commented Nov 15, 2016

Will do; a squashed version will be force-pushed to this PR's branch. The final should be 4-5 commits long rather than a single squash, as the changes within are easier to read as multiple commits.

If I understand the principle of express-rested not insisting upon an id property being present, does this mean the current collection PUT behaviour is incorrect? That is, when
{ littlecreatures: { name: 'Little Creatures Pale Ale', rating: 5 } }
does not already exist and is collection-PUT, it currently persists as
littlecreatures: { id: 'littlecreatures', name: 'Little Creatures Pale Ale', rating: 5 }
but should persist as
littlecreatures: { name: 'Little Creatures Pale Ale', rating: 5 }?

I've currently changed the PUT behaviour to have an in-object id property (if included) take priority over the key. If the above paragraph's behaviour is what's desired instead of this or the current master branch's behaviour (id written to match key), I can remove the id property being set upon persistence at all.

@ronkorving
Copy link

The old (pre-PR) behavior of PUT should be correct. This PR introduces a read on the id property, which is incorrect (or at least, something I would like to avoid at all cost).

…for ID case sensitivity to base Collection class, updated test cases to reflect forced lowercase in resource key names.
…ion, GET, HEAD and some POST cases (including a noSubRouter option for testing use with bare Express app). Pulled case sensitivity tests out of method test file. Fixed key names of method & indexes test objects, fixed a method test’s item comparisons to be correct on the basis of the items themselves, regardless of key case sensitivity.
@coveralls
Copy link

coveralls commented Nov 16, 2016

Coverage Status

Coverage increased (+0.007%) to 99.851% when pulling c41efd9 on pfyvie:case-insensitivity into 284b19b on Wizcorp:master.

@pfyvie
Copy link
Author

pfyvie commented Nov 16, 2016

I've reverted the behaviour of PUT to pre-PR and adjusted the case sensitivity tests accordingly. The commits have been squashed to a pleasingly compact set.

Of interest: In chasing the very last uncovered line (line 31 of setupRoutes.js), I found that it may never actually be executed. The error event of req in the getBodyStr(req, res, cb) function of setupRoutes.js looks like it should trigger where an invalid chunk of body data is received as part of the request. (When the whole request is plainly invalid e.g. invalid content-type header or missing body, getBodyStr is never reached. When the body is syntactically valid but semantically invalid e.g. invalid JSON, getBodyStr completes OK and parsing errors are thrown elsewhere.)

I created a new request function in HttpClient to perform a POST that would be valid until characters of a different encoding were suddenly injected:

postAbort(t, path, data, cb) {
    const req = request(this.url('POST', path), function (res) {
        respond(t, res, cb);
    });
    req.on('error', function (err) {
        log('HttpClient - Request error:', err);
    });
    req.write(serialize(data), function () {
        log('HttpClient.postAbort - Wrote good data OK.');
        req.write(serialize(data), 'utf16le');
        log('HttpClient.postAbort - Wrote bad data OK.');
        req.end();
    });
}

I also used an alternative version that hijacked the request socket directly and injected latin1 (binary) data mid-way through the UTF-8 request, and other methods such as aborting the request midway through the body.

Express grabs the malformed request error (e.g. HPE_INVALID_CHUNK_SIZE for bad encoding, HPE_INVALID_EOF_STATE for client abort) and hangs up the socket before letting the error bubble further. The error isn't put through req.on('error' because it goes through req.socket.on('error' instead. Even if we listened to req.socket.on, the 400 would never reach the client because Express hangs up the socket before the 400 is written through:

RESTED 26021: HttpClient.postAbort - Wrote good data OK.
RESTED 26021: HttpClient.postAbort - Writing bad data.
RESTED 26021: HttpClient.postAbort - Wrote bad data OK.
RESTED 26021: getBodyStr - Received data: {"example":"this data is good"}
RESTED 26021: { [Error: Parse Error] bytesParsed: 168, code: 'HPE_INVALID_CHUNK_SIZE' }
RESTED 26021: getBodyStr - Error. Responding with 400.
RESTED 26021: HttpClient - Request error: { [Error: socket hang up] code: 'ECONNRESET' }

tl;dr I think it'd be safe to remove these lines in getBodyStr:

req.on('error', function () {
    res.sendStatus(400); // Bad request
});

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants