Addition of `UPPER`, `LOWER`, `STRLEN`, `SUBSTR`, ... #10

gmantele · 2018-07-05T08:25:25Z

I've noticed that ADQL-2.1 offers a function lower(VARCHAR)->VARCHAR.
Why not also having the function upper(VARCHAR)->VARCHAR?

Both functions are supported in all DBMS that I have checked (Postgres, Oracle, MySQL, MS-SQLServer, SQLite, H2, ...).

If, yes, why not also supporting other very common string functions like length (or strlen), trim, replace, substr, instr, ...?

I am asking because now that my ADQL Library (v1.4, but still supporting only ADQL-2.0) really supports reserved keywords, a lot of users and implementers are complaining about the fact that it is not any more possible to have UDF named lower or upper (which are reserved SQL keywords).
In some way, it forces implementers to use a different name which will then make all TAP implementations have a different name for the same function. I do not think it is a good idea to encourage such thing.

Anyway, I am now speaking about these common string functions but I am quite sure that other so-called reserved keywords of ADQL-2.x forbid TAP implementers to have some common useful functions. Maybe it would worth checking these names and see if functions associated with these names exist in most/all DBMS and if yes, integrate them in ADQL-2.1 (or 2.2 if already too late for 2.1).

The text was updated successfully, but these errors were encountered:

gmantele · 2018-07-05T08:33:24Z

@Zarquan answered:

Because lower() works (mostly) with Unicode, but upper() does not work so well for Unicode ?
http://unicode.org/faq/casemap_charprop.html#10

Both are implemented in all DBMS that I have checked. After a quick research with our friend Google, it seems there are issues with unicode characters in most of the DBMS when using upper(...) and even sometimes also with lower(...). From what I have seen, each time it can be fixed by setting the appropriate collate and/or character set in the used database. I think that as long as databases are created in UTF-8, there should not be any important issue.

Besides, it seems it is not a MUST in the ADQL-2.1 draft to be able to support nicely unicode characters (see 4.4.1. LOWER). Already for lower(...), there is a recommendation - and not a requirement - for the algorithm to use when unicode characters are encountered (probably because all DBMS surely have a different - usually unknown - way to deal with such characters):

Since case folding is a nontrivial operation in a multi-encoding world, ADQL requires standard behaviour for the ASCII characters,
and recommends following algorithm R2 described in Section 3.13, "Default Case Algorithms" of The Unicode Consortium (2012)
for characters outside the ASCII set.

So, I am not sure it is the role of ADQL to define how to deal with such special characters. Although I agree, having these functions returning exactly the same string for the same input would be what we want. Then, it would be interested to try with the same string containing several unicode and non unicode characters on all DBMS we want to be supported by ADQL.

There is a science case for having one of the transforms, either upper() or lower(), to transform two strings into the same case before comparing them. There isn't such a strong case for having both upper() and lower().

I agree, but it is still weird to have one but not the other while both are always supported by DBMS. Users will ask at some point why ADQL defines only one of them....and I am not sure they will be very comprehensive if we say it is because some DBMS may not support correctly Unicode characters.

Anyway, if ADQL-2.1 definitely defines only lower(...), I would suggest to add in the IVOA document a comment explaining why only lower(...) and not upper(...)

Maybe, other IVOA members have an opinion on this. I don't know what you planned to speak about at the interop, but maybe you can squeeze this small discussion topic (maybe along with the other possible common functions).

However, as long as someone does the work to prove they are available in some form on all the target platforms then I have no objection to them being included.

I'm sure there will be a few similar name related issues, but in general, yep. For example, I'd prefer to use strlen() rather than length(), to avoid future confusion between string length, array length and byte length.

I agree. strlen seems much more appropriate for language evolutions.

They can be proposed for 2.1, but if they have issues I don't want to hold up 2.1 to include them, so they might get bounced to 2.2 or 2.3.

I completely agree on that. I just ask all of that now because of the delay we generally have before a new version of an IVOA standard ^^ I just wanted to raise this point before 2.1 is released just in case it is a minor thing to add (especially for adding a comment explaining why not upper(...) as well).

msdemlei · 2018-07-05T14:57:09Z

Hi,

On Thu, Jul 05, 2018 at 01:25:25AM -0700, Grégory Mantelet wrote: If, yes, why not also supporting other very common string functions like `length` (or `strlen`), `trim`, `replace`, `substr`, `instr`, ...?

Huh -- these are trivial when the DB supports them, hard otherwise, so I'm not sure I'm a big fan of having them, but...

I am asking because now that my ADQL Library (v1.4, but still supporting only ADQL-2.0) really supports reserved keywords, a lot of users and implementers are complaining about the fact that it is not any more possible to have UDF named `lower` or `upper` (which are reserved SQL keywords).

...if we reserve the names for fairly common operations I think it stands to reason that we should define what we think the semantics should be. Unicode issues aside (and we already have appropriate disclaimers for these), describing the ones Grégory mentions should be relative trivial (I suppose getting inspiration from the postgres documentation would almost do the trick). Having still more optional features is of course ugly -- but for these I don't forsee extra complications for validators, syntax highlighters, or whatever, in particular when names are reserved already.

…

-- Markus

gmantele added the enhancement label Jul 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Addition of `UPPER`, `LOWER`, `STRLEN`, `SUBSTR`, ... #10

Addition of `UPPER`, `LOWER`, `STRLEN`, `SUBSTR`, ... #10

gmantele commented Jul 5, 2018

gmantele commented Jul 5, 2018

msdemlei commented Jul 5, 2018 via email

Addition of UPPER, LOWER, STRLEN, SUBSTR, ... #10

Addition of UPPER, LOWER, STRLEN, SUBSTR, ... #10

Comments

gmantele commented Jul 5, 2018

gmantele commented Jul 5, 2018

msdemlei commented Jul 5, 2018 via email

Addition of `UPPER`, `LOWER`, `STRLEN`, `SUBSTR`, ... #10

Addition of `UPPER`, `LOWER`, `STRLEN`, `SUBSTR`, ... #10