implement ExtPos for fixed expressions; is CCONJ possible for "as opposed to", "rather than", "instead of"? #530

Closed
nschneid opened this issue Jun 3, 2024 · 28 comments

Comments

@nschneid
Contributor

nschneid commented Jun 3, 2024

The Core Group decided it would be a good idea for treebanks to specify how each fixed expression functions externally via ExtPos in the MISC column.

This is already implemented for a few expressions in EWT. We might as well expand to all of them. If the external deprel is correct, it can be used to infer the ExtPos (which is one of ADV, ADP, CCONJ, SCONJ).
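
A minimal sketch of that inference, assuming a one-to-one table from the external deprel of the fixed expression's head to its ExtPos. The table and the name `infer_extpos` are illustrative, not part of any UD tool, and the result is only as reliable as the deprel it starts from:

```python
from typing import Optional

# Assumed mapping from the external deprel of a fixed expression's head
# to its ExtPos. Illustrative only; not an official UD resource.
DEPREL_TO_EXTPOS = {
    "advmod": "ADV",
    "case": "ADP",
    "cc": "CCONJ",
    "mark": "SCONJ",
}


def infer_extpos(deprel: str) -> Optional[str]:
    """Return the ExtPos implied by a deprel, or None if no rule applies."""
    base = deprel.split(":", 1)[0]  # drop subtypes such as advmod:emph
    return DEPREL_TO_EXTPOS.get(base)
```

Here `infer_extpos("mark")` returns `"SCONJ"`, and any deprel outside the table returns `None`, which would flag the expression for manual review.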

@AngledLuffa
Contributor

Can you give a bit more explanation of what ExtPos means in this case, or how the external deprel will be represented?

@nschneid
Contributor Author

nschneid commented Jun 3, 2024

External POS: https://universaldependencies.org/en/feat/ExtPos.html

For example, "instead" is individually an ADV, but where it attaches as mark, that is because the fixed expression "instead of" acts as SCONJ. So in those cases it would receive ExtPos=SCONJ.

@amir-zeldes
Contributor

(BTW this has also been implemented in GUM)

@bguil

bguil commented Jun 3, 2024

BTW2: In v2.14, most of the treebanks that use ExtPos put the ExtPos feature in the FEATS column.
This includes the SUD native corpora, English-EWT, UD_Portuguese-Bosque and UD_Portuguese-GSD.

For consistency, it would be nice to have the same policy in others such as English-GUM.

@nschneid
Contributor Author

nschneid commented Jun 3, 2024

For EWT I've just moved it to MISC following @dan-zeman's statement that FEATS should be reserved for properties of individual words, not larger units.

@amir-zeldes
Contributor

Yes, it's in MISC in GUM for the same reason.
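
For treebanks that still keep ExtPos in FEATS, the move to MISC could be scripted along these lines. This is a sketch that assumes standard 10-column, tab-separated CoNLL-U token lines; `move_extpos_to_misc` is a hypothetical helper, not the script actually used for EWT or GUM:

```python
def move_extpos_to_misc(line: str) -> str:
    """Move an ExtPos=... entry from FEATS (column 6) to MISC (column 10)."""
    line = line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) != 10:
        return line  # comment, blank line, or malformed row: leave untouched
    feats = [] if cols[5] == "_" else cols[5].split("|")
    extpos = [f for f in feats if f.startswith("ExtPos=")]
    if not extpos:
        return line  # nothing to move
    remaining = [f for f in feats if not f.startswith("ExtPos=")]
    misc = [] if cols[9] == "_" else cols[9].split("|")
    cols[5] = "|".join(remaining) or "_"  # "_" marks an empty column
    cols[9] = "|".join(misc + extpos)     # append to any existing MISC entries
    return "\t".join(cols)
```

Rows without ExtPos in FEATS, comment lines, and blank lines pass through unchanged, so the function can be mapped over a whole file.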

@nschneid
Contributor Author

nschneid commented Jun 3, 2024

New issue about standardizing ExtPos at the universal level: UniversalDependencies/docs#1037

@nschneid
Contributor Author

nschneid commented Jun 4, 2024

Implemented in the above commit.

I've made some small updates to the English fixed docs: see #317.

One question:

  • He couldn't tell when things were becoming more unstable as opposed to less.
    • What's the correct ExtPos here? The docs give an example of "as opposed to" as case, but I'm wondering if this is CCONJ-like, cf. "rather than".

@nschneid
Contributor Author

nschneid commented Jun 4, 2024

I think "as opposed to" is like "rather than"—its coordination vs. prepositional function depends on context.

  • We were eating (as opposed to drinking): cc (directly connects contrasted elements of like categories)
  • We were eating, as opposed to Sam, who was drinking: case
  • ?As opposed to eating, we were drinking.

@nschneid
Contributor Author

@amir-zeldes OK to add CCONJ as a possibility for "as opposed to"?

@amir-zeldes
Contributor

I'm not totally sure, but why do we need it to be cc syntactically? I mean, how is it different from like/unlike:

  • Eating, like drinking, is not allowed
  • Eating, unlike drinking, is not allowed
  • Eating and drinking is not allowed

But we don't think "like/unlike" is a cc, right?

@nschneid
Contributor Author

My intuition is that "as opposed to" readily connects a wider range of phrase types, and does not always require a prosodic break:

  • AdjPs: The wall is now pale green as opposed to bright yellow.
  • Advs: He is now speaking boldly as opposed to timidly.
  • PPs: Leave the box inside the door as opposed to on the porch.

"Rather than" can be substituted. Having trouble coming up with "like"/"unlike" examples that are good parallels.

@amir-zeldes
Contributor

Mm, maybe it's headed that way, but I think it's still not quite a CC yet, since it can be pre-poned. OntoNotes example:

  • As opposed to the $ 1.4 million deficit of the 1987 - 88 season , the 1988 - 89 year concluded with a $ 200,000 surplus

@nschneid
Contributor Author

I'm not arguing it's exclusively CCONJ. We concluded that "rather than" is sometimes CCONJ, sometimes ADP/SCONJ.

@nschneid
Contributor Author

Here's another piece of evidence—coordination of transitive VBGs in the progressive construction:

  • He is ripping rather than/as opposed to carefully slicing the bread.
  • *He is ripping unlike carefully slicing the bread.

@amir-zeldes
Contributor

> I'm not arguing it's exclusively CCONJ

Right, I'm just thinking if we allow it to be CCONJ, then this case would probably be tagged that way:

  • The 1988 - 89 year concluded with a $ 200,000 surplus as opposed to the $ 1.4 million deficit of the 1987 - 88 season

But as the original example shows, this case is actually invertible, which suggests it isn't CCONJ. Because it's hard to tell if a case is invertible, and because I think underlyingly this is syntactically still case, I would rather play it safe and just tag it uniformly as case/mark, and not leave room for inter-annotator inconsistencies which would come in if we allow two analyses.

> *He is ripping unlike carefully slicing the bread.

Yeah, I'm not saying unlike has the same distribution, just that I'm not clear on when "as opposed to" is frontable, which, at least when it is, suggests it's not CCONJ. In that respect it's similar to "unlike", which is also often frontable.

@nschneid
Contributor Author

@amir-zeldes
Contributor

TBH probably only in that it was put on the list during SD already. If someone had asked me I would have probably raised the same objection about it ("rather than" is very often frontable):

  • Rather than a cat, I decided to get a dog

That shouldn't be CCONJ either IMO.

@nschneid
Contributor Author

The rule we decided in #182 was to use case/mark only for the fronted ones, and cc for the others.
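
As a rough sketch, that rule can be written as a small decision function. It assumes that "fronted" can be approximated by the introduced phrase preceding the word it attaches to, and that case vs. mark follows the usual nominal vs. clausal split; the function name and parameters are hypothetical:

```python
def rather_than_deprel(phrase_head_id: int, governor_id: int, clausal: bool) -> str:
    """Pick the marker's deprel for a "rather than"-type expression.

    phrase_head_id: token ID of the word the expression introduces
    governor_id:    token ID of that word's syntactic head
    clausal:        True if the introduced phrase is a clause
    """
    # Assumption: a phrase that precedes its governor counts as fronted.
    fronted = phrase_head_id < governor_id
    if fronted:
        return "mark" if clausal else "case"
    return "cc"
```

For "Rather than a cat, I decided to get a dog", with "cat" as token 4 attaching to a later token such as "decided" (token 7), this gives `case`; for "few rather than many artists", where "many" (token 4) attaches back to "few" (token 1), it gives `cc`.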

@amir-zeldes
Contributor

Right - since it was already on the fixed list, I thought it was better to at least have the fronted cases as case/mark, and I still think that's better than some weird inverted conj or something. But in an ideal world, I think neither of these should be conj, so I think the fewer of these things we have the better.

@nschneid
Contributor Author

nschneid commented Jul 11, 2024

I just worry that saying it is never CCONJ will paint us into a corner and produce some strange structures.

Here's one I found in COCA:

  • the mother is almost always the caregiver, she rather than the father usually teaches the child what lessons it learns at this stage

Hard for me to avoid reading that with "she rather than the father" as a constituent, and nmod seems wrong as personal pronouns resist most types of PP modification. It could be phrased as a parenthetical (so parataxis I guess), or maybe "rather than the father" can be moved to the clause level with the speaker counting on the pragmatics to be clear, but neither of those fits the word order and punctuation given.

Another one:

  • Regardless of setting, self-talk was employed most frequently during as opposed to prior to or post practice and competition.

"During" is a (transitive) preposition. Prepositions can't take PP modifiers, right? Looks to me like the equivalent of [during and not [before or after]] practice.

Also, in terms of paraphrasing, the "as opposed to Y" part can be moved to the end of the clause, but it relies on heavy inference that "during" is in focus:

  • self-talk was employed most frequently during practice and competition, as opposed to prior to or post.

Fronting just sounds like a bizarre misuse of ellipsis:

  • *As opposed to prior to or post, self-talk was employed most frequently during practice and competition.

One more, with adjectives:

  • Election cycles tend to emphasize short-term as opposed to long-term performance.
  • *As opposed to long-term, election cycles tend to emphasize short-term performance.

(UPDATE) OK I lied, one more with determiners/quantifying-adjectives:

  • with a focus on few rather than many artists

@amir-zeldes
Contributor

I think these arguments apply to things like "instead of" too -

  • she, instead of the father, usually teaches the child
  • self-talk was employed most frequently during instead of prior to or post practice
  • Election cycles tend to emphasize short-term instead of long-term performance

Would you want "instead of" to also be a CC? I think these are just nmod/obl + case as appropriate - the simpler, more consistent the analysis the better IMO. Just saying "instead of" ->case, "as opposed to" -> case makes it easier to teach and validate.

As for fronting, I think the reason it sounds weird in "As opposed to prior to or post, self-talk was employed.." is that the elided repeated argument of prior does not occur on first mention. This would be much better as:

  • As opposed to prior to practice, self-talk was employed post-practice

@nschneid
Contributor Author

nschneid commented Jul 11, 2024

Yeah good point, "instead of" patterns similarly to "rather than" and "as opposed to" (there may be slight differences though, I'm not sure).

You really think "during instead of before practice" would be nmod(during, before)? I don't know that we have precedent for an nmod between two prepositions. They are not nominals but rather markers-of-nominals.

@nschneid changed the title from "implement ExtPos for fixed expressions" to "implement ExtPos for fixed expressions; is CCONJ possible for "as opposed to", "rather than", "instead of"?" on Jul 11, 2024

@amir-zeldes
Contributor

No, I'd say obl since it's not a nominal. You could also see it as a kind of promotion I guess...

@nschneid
Contributor Author

But we wouldn't use a promotion analysis for "during or before practice". Apart from coordination and a few other exceptions, the universal guidelines speak against function words having dependents. The "promotion by head elision" examples given there are pretty different because they do not involve a dependency between two function words.

Anyway for now I'll stick with the existing trees when assigning ExtPos, but I do think the fixed guidelines need to be more consistent for these expressions, and I'm not convinced that "if it starts with a preposition and can introduce an adjunct it can never be CCONJ" is the right way to go. See also "as well as".

@AngledLuffa
Contributor

To resurrect this whole thing - I hadn't finished implementing it in PUD. Going back to look at that, I found this example for "such as":

# text = The witching hour starts up at least by the time the scary organ is played, such as in the 60s hit Monster Mash.
17      such    such    ADJ     JJ      Degree=Pos      22      case    22:case _
18      as      as      ADP     IN      _       17      fixed   17:fixed        _
19      in      in      ADP     IN      _       22      case    22:case _
20      the     the     DET     DT      Definite=Def|PronType=Art       22      det     22:det  _
21      60s     60      NOUN    NNS     Number=Plur     22      nmod:unmarked   22:nmod:unmarked        TemporalNPAdjunct=Yes
22      hit     hit     NOUN    VBD     Mood=Ind|Tense=Past|VerbForm=Fin        4       obl     4:obl:in        _

Most instances of "such as", I believe, are ExtPos=ADP. Would this one be as well?
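
When finishing an implementation like this, one way to list the fixed expressions that still lack ExtPos is to scan each sentence for tokens that head a `fixed` relation. This is a sketch assuming tab-separated, 10-column CoNLL-U token lines; `fixed_heads_missing_extpos` is a hypothetical helper, and it accepts ExtPos in either FEATS or MISC:

```python
def fixed_heads_missing_extpos(sentence_lines):
    """Return IDs of tokens that head a `fixed` relation but carry no ExtPos."""
    rows = [line.rstrip("\n").split("\t") for line in sentence_lines
            if line.strip() and not line.startswith("#")]
    # Keep ordinary tokens only (skips ranges like 1-2 and empty nodes like 1.1).
    rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
    heads = {r[6] for r in rows if r[7] == "fixed"}
    missing = []
    for r in rows:
        has_extpos = "ExtPos=" in r[5] or "ExtPos=" in r[9]
        if r[0] in heads and not has_extpos:
            missing.append(r[0])
    return missing
```

Run on the "such as" rows quoted above (with real tabs), this would report token 17, since "such" heads a `fixed` relation and has no ExtPos yet.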

@AngledLuffa
Contributor

Looking at the "up to" in EWT, there seem to be some which are not labeled:

Some labeled examples:

# text = There is an e-mail by Moussaoui, however, dated July 31, 2001 indicating that he sought to take a crop dusting course that was to last up to 6 months.
29      up      up      ADP     IN      ExtPos=ADV      32      advmod  32:advmod       _
30      to      to      ADP     IN      _       29      fixed   29:fixed        _
31      6       6       NUM     CD      NumForm=Digit|NumType=Card      32      nummod  32:nummod       _
32      months  month   NOUN    NNS     Number=Plur     28      obl     28:obl  SpaceAfter=No

# sent_id = email-enronsent43_01-0122
# text = Questar may be able to purchase material, but some of the items can have up to a 60 day delivery.
16      up      up      ADP     IN      ExtPos=ADV      19      advmod  19:advmod       _
17      to      to      ADP     IN      _       16      fixed   16:fixed        _
18      a       a       DET     DT      Definite=Ind|PronType=Art       21      det     21:det  _
19      60      60      NUM     CD      NumForm=Digit|NumType=Card      20      nummod  20:nummod       _
20      day     day     NOUN    NN      Number=Sing     21      compound        21:compound     _
21      delivery        delivery        NOUN    NN      Number=Sing     15      obj     15:obj  SpaceAfter=No

# sent_id = newsgroup-groups.google.com_alt.animals.cat_01ff709c4bf2c60c_ENG_20040418_040100-0036
# text = Civet cats are trapped and placed in small cages inside darkened sheds, where the temperature is kept up to 110 F by fires.
19      up      up      ADP     IN      ExtPos=ADV      21      advmod  21:advmod       _
20      to      to      ADP     IN      _       19      fixed   19:fixed        _
21      110     110     NUM     CD      NumForm=Digit|NumType=Card      22      nummod  22:nummod       _
22      F       F       PROPN   NNP     Number=Sing     18      obj     18:obj  _

# sent_id = newsgroup-groups.google.com_alt.animals_0084bdc731bfc8d8_ENG_20040905_212000-0136
# text = (Participants pledged to raise at least $100,000 by bundling together cheques of up to $1,000 from friends and family.
15      up      up      ADP     IN      ExtPos=ADV      18      advmod  18:advmod       _
16      to      to      ADP     IN      _       15      fixed   15:fixed        _
17      $       $       SYM     $       _       13      nmod    13:nmod:of      SpaceAfter=No
18      1,000   1000    NUM     CD      NumForm=Digit|NumType=Card      17      nummod  17:nummod       _

Here are some unlabeled examples that are similar to the ones above.

This one looks very similar to "up to $1,000":

# sent_id = weblog-blogspot.com_dakbangla_20050311135387_ENG_20050311_135387-0178
# text = (There initially was an outstanding $2 million reward -- under the rewards for justice program, the reward now is up to $5 million.).
23      up      up      ADV     RB      _       25      advmod  25:advmod       _
24      to      to      ADP     IN      _       25      case    25:case _
25      $       $       SYM     $       _       4       parataxis       4:parataxis     SpaceAfter=No
26      5       5       NUM     CD      NumForm=Digit|NumType=Card      27      compound        27:compound     _
27      million million NUM     CD      NumForm=Word|NumType=Card       25      nummod  25:nummod       SpaceAfter=No

This one is not labeled, presumably because it was analyzed as "heat up" rather than as fixed "up to"? But above, "kept up to" was treated as "kept" + "up to" rather than "kept up":

# sent_id = newsgroup-groups.google.com_alt.animals.cat_01ff709c4bf2c60c_ENG_20040418_040100-0004
# text = Recently, I read an email that reported the horrific nightmare Civet Cats go through by sick losers who put them in sheds and heat the inside of the sheds up to 110 degrees with fires
31      up      up      ADP     RP      _       25      compound:prt    25:compound:prt _
32      to      to      ADP     IN      _       34      case    34:case _
33      110     110     NUM     CD      NumForm=Digit|NumType=Card      34      nummod  34:nummod       _
34      degrees degree  NOUN    NNS     Number=Plur     25      obl     25:obl:to       _
35      with    with    ADP     IN      _       36      case    36:case _
36      fires   fire    NOUN    NNS     Number=Plur     25      obl     25:obl:with     SpaceAfter=No

These aren't labeled fixed. Many other instances of "up to" were not labeled and clearly shouldn't be, but here I wonder whether these are actually fixed:

# sent_id = reviews-369087-0007
# text = The advisor kept me up to date and informed on the progress of my vehicle.
1       The     the     DET     DT      Definite=Def|PronType=Art       2       det     2:det   _
2       advisor advisor NOUN    NN      Number=Sing     3       nsubj   3:nsubj _
3       kept    keep    VERB    VBD     Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin   0       root    0:root  _
4       me      I       PRON    PRP     Case=Acc|Number=Sing|Person=1|PronType=Prs      3       obj     3:obj   _
5       up      up      ADP     IN      _       7       case    7:case  _
6       to      to      ADP     IN      _       7       case    7:case  _
7       date    date    NOUN    NN      Number=Sing     3       obl     3:obl:up_to     _

# sent_id = reviews-122882-0003
# text = The atmosphere alone deserves 4 stars but, the food was not up to par with the price tag and the reputation the restaurant carries.
12      not     not     PART    RB      Polarity=Neg    15      advmod  15:advmod       _
13      up      up      ADP     IN      _       15      case    15:case _
14      to      to      ADP     IN      _       15      case    15:case _
15      par     par     NOUN    NN      Number=Sing     4       conj    4:conj:but      _

@nschneid
Contributor Author

nschneid commented Oct 8, 2024

  • up to: According to https://universaldependencies.org/en/dep/fixed.html, quantity "up to" is fixed. Agreed that "up to $5 million" fits. Other "up to" expressions (up to date, up to par) are idiomatic in meaning but probably not structurally weird enough to warrant fixed.
  • such as: "such as in the 60s" is interesting because it goes with "in the 60s hit" which is itself a PP. Some other prepositions ("like", "except", ...) can also do this. I think ExtPos=ADP is probably our best bet.
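
To find quantity "up to" instances that are not yet annotated as fixed, a simple heuristic is to look for adjacent "up" + "to" followed directly by a NUM or SYM token. This is only a sketch: `unfixed_up_to_quantities` is a hypothetical helper, it assumes tab-separated CoNLL-U token lines, and it returns candidates for manual review (it would also flag particle readings like "heat ... up to 110 degrees" and would miss quantities that start with a determiner, like "up to a 60 day delivery"):

```python
def unfixed_up_to_quantities(sentence_lines):
    """Return IDs of "up" tokens in "up to <quantity>" not annotated as fixed."""
    rows = [line.rstrip("\n").split("\t") for line in sentence_lines
            if line.strip() and not line.startswith("#")]
    # Keep ordinary tokens only (skips ranges like 1-2 and empty nodes like 1.1).
    rows = [r for r in rows if len(r) == 10 and r[0].isdigit()]
    hits = []
    for up, to, nxt in zip(rows, rows[1:], rows[2:]):
        if up[1].lower() != "up" or to[1].lower() != "to":
            continue
        is_fixed = to[7] == "fixed" and to[6] == up[0]
        quantity_next = nxt[3] in {"NUM", "SYM"}  # UPOS of the following token
        if quantity_next and not is_fixed:
            hits.append(up[0])
    return hits
```

Idiomatic uses such as "up to date" and "up to par" are skipped because the following token is a NOUN.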
