Exploring Semantic Spaces - Fundamentals #32
My question is non-technical. According to the text, human languages use a wide variety of features to convey meaning. And as the paper included this week shows, Caliskan and colleagues found that semantics derived automatically from language corpora contain human-like biases. So I'm curious whether we have techniques to eliminate social biases when constructing our logical representations of sentence meaning. I worry that the social biases embedded in human expression will aggravate social imbalances as automated techniques come into wide use in the near future.
Is there an established way to run word2vec or another embedding algorithm with attention weights, i.e., where the context words are given unequal weights depending on their relevance? I see Sonkar et al. (2020) attempt this, but I don't know how viable their approach is, whether it's coded up so we can use it (e.g., in Gensim), or whether it really makes a difference in the semantic space we build.
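For concreteness, here is a minimal sketch of the standard (unweighted) baseline using Gensim's Word2Vec; as far as I know, attention-style per-word context weighting is not built in, and nearer context words are only favored implicitly through window subsampling. The toy sentences are invented.

```python
# Standard skip-gram word2vec in Gensim: every sampled context word
# contributes equally; there is no learned per-word attention weight.
from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "realm"],
             ["the", "queen", "rules", "the", "realm"],
             ["the", "fool", "mocks", "the", "king"]]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.most_similar("king", topn=3))
```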
Can multilayer perceptron networks be used in unsupervised learning?
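As one concrete case of what I mean by "unsupervised": an autoencoder is just an MLP trained to reconstruct its own input, so no labels are needed. A rough sketch with scikit-learn on random stand-in data (sizes are arbitrary):

```python
# An MLP used without labels: train it to reproduce its input, then use
# the hidden layer as a learned low-dimensional representation.
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(500, 20)                       # unlabeled data
ae = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000)
ae.fit(X, X)                                      # target = input, no labels

# Hidden activations (relu is MLPRegressor's default activation).
codes = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])
print(codes.shape)                                # (500, 5)
```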
I'd like to ask your opinion on semantic parsing. Many consider it a bulky approach, much less efficient than neural networks, but it is intuitive from a human-reading perspective. Do you think it's an approach worth investigating?
How can we choose the best loss functions? If I understand your first lecture this week correctly, you mentioned that some are good for understanding texts and others are good for prediction. Is there a rule of thumb for choosing loss functions?
Has any CSS research used dynamic contextual embeddings (from an attention-based transformer model) recently? Given the superior performance of BERT and other transformer-based models on 'meaning'-related benchmarks, I'm wondering what exciting new CSS research paradigms these dynamic embeddings can power.
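In case it helps frame the question, here is a minimal sketch of extracting dynamic token embeddings with the Hugging Face transformers library (the model name and sentence are just placeholders):

```python
# Each token gets a vector that depends on its sentence context, unlike
# the single static vector per word type that word2vec produces.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tok("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

print(out.last_hidden_state.shape)  # (1, num_tokens, hidden_size)
```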
I understand the foundations and sentence-level functionality of semantic parsing as explained in this piece, but I was hoping to hear a bit more about the types of texts in which it works best and is most often implemented. Could you elaborate on some examples where semantic parsing is most often used and most useful (i.e., what differences might we see when texts are more or less correlated, or more linguistically consistent)?
Since the Zhuangzi quote is originally in Chinese, I am wondering whether the embedding models are applicable to Chinese, given that the concepts introduced in Chapter 6 do not straightforwardly carry over: Chinese characters can express meanings alone or in combination, and there are usually no spaces in a sentence.
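One preprocessing step I have seen for the no-whitespace issue is to segment sentences into words first (for example with the jieba library) and then train the embedding model on the resulting token lists; a rough sketch:

```python
# Word segmentation turns an unspaced Chinese sentence into a token list
# that word-level embedding models (e.g. Gensim's Word2Vec) can consume.
import jieba

sentence = "庄子与惠子游于濠梁之上"   # no whitespace between words
tokens = jieba.lcut(sentence)         # list of word-level tokens
print(tokens)
```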
I'm wondering whether tense logic would be easier to deal with in other languages. In particular, "temporal expressions in English are frequently expressed in spatial terms, as is illustrated by the various uses of at, in, somewhere, and near in these examples".
The problem of bias in embeddings appears again; it is an important topic in one of the exemplary readings. What practical problems does it create in analysis, and what techniques can reduce the bias when using embeddings?
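One technique I have seen mentioned is projection-based ("hard") debiasing in the spirit of Bolukbasi et al. (2016), where the component of a word vector along an estimated bias direction is removed. A rough numpy sketch of that single step, with random stand-in vectors:

```python
# Neutralize step: remove a word vector's component along an estimated
# bias direction (e.g. a normalized "he - she" difference vector).
import numpy as np

rng = np.random.default_rng(0)
v_word = rng.normal(size=50)                 # embedding of a target word (stand-in)
bias_dir = rng.normal(size=50)               # stand-in for an estimated bias axis
bias_dir /= np.linalg.norm(bias_dir)

v_debiased = v_word - (v_word @ bias_dir) * bias_dir   # project out the bias axis
print(v_debiased @ bias_dir)                 # ~0: no component left on that axis
```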
In “Vector Semantics”, why can we assume that "battle", "good", "fool", and "wit" are orthogonal to each other and assign each its own dimension, as in Figure 6.2?
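To make my question concrete, here is a toy version of what the figure seems to assume (the counts are invented): each word is its own axis of a term-document count space, so the word axes are orthogonal by construction rather than because the words' meanings are unrelated.

```python
# Documents as vectors in a space whose four axes are the four words;
# orthogonality is a property of the coordinate system, not of meaning.
import numpy as np

axes = ["battle", "good", "fool", "wit"]
doc_a = np.array([7, 62, 1, 2])     # invented word counts for one play
doc_b = np.array([1, 89, 58, 20])   # invented word counts for another

cosine = doc_a @ doc_b / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))
print(cosine)   # similarity of the two documents in this word space
```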
Is there any literature that examines the qualitative differences between embeddings generated by, say, LSA with 300 SVD components and a learned feedforward neural network representation layer?
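For reference, a sketch of the LSA side of that comparison using scikit-learn; 300 components is the conventional choice, though the toy corpus below only supports a couple.

```python
# Classic LSA: truncated SVD over a TF-IDF term-document matrix yields
# dense document and term embeddings without any neural training.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the battle was fierce and good",
        "the fool spoke with wit",
        "wit and folly in a battle of words"]

X = TfidfVectorizer().fit_transform(docs)    # documents x terms
lsa = TruncatedSVD(n_components=2).fit(X)    # ~300 in realistic settings

doc_vecs = lsa.transform(X)        # dense document embeddings
term_vecs = lsa.components_.T      # dense term embeddings
print(doc_vecs.shape, term_vecs.shape)
```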
The word2vec technique is awesome! The co-occurrence information brings in much richer second-order information about texts. I have a question about the assessment of word embeddings: what are some common metrics for evaluating the effectiveness of the learned embeddings?
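As a starting point, the intrinsic checks I am aware of are word-similarity correlation and analogy accuracy, both of which Gensim can compute against benchmark files it ships with (file and model names below are as I understand recent Gensim versions to provide them):

```python
# Two common intrinsic metrics: correlation with human similarity
# judgments (WordSim-353) and analogy-task accuracy.
import gensim.downloader as api
from gensim.test.utils import datapath

wv = api.load("glove-wiki-gigaword-50")   # small pretrained vectors (downloads once)

pearson, spearman, oov_ratio = wv.evaluate_word_pairs(datapath("wordsim353.tsv"))
analogy_score, sections = wv.evaluate_word_analogies(datapath("questions-words.txt"))
print(spearman, analogy_score)
```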
Extending @ming-cui's question, I wonder whether the usefulness of embedding models is stable across other kinds of languages.
Chapter 6 discusses attempts at debiasing, but notes that bias cannot be eliminated. Are there particular forms of bias that are especially difficult to reduce? What kinds of consequences can this have in applications of these methods?
If possible, could we go more in depth into how word embeddings are trained? Say we want to compare how two actors use the same term: should we build a corpus for each, train a word embedding model on each, and then find the closest words to our term of interest? Also, word embeddings are more memory-efficient than tf-idf or count vectors because, unlike the other two, embeddings are dense, yet they seem more computationally expensive because they require learning complex weights through stochastic gradient descent. How should we weigh these trade-offs when deciding which word representations to use?
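To make the first part concrete, here is roughly the workflow I am imagining, with invented toy corpora standing in for each actor's tokenized sentences:

```python
# One embedding model per actor's corpus, then compare the nearest
# neighbors of the shared term of interest in each semantic space.
from gensim.models import Word2Vec

corpus_a = [["climate", "change", "is", "a", "hoax"],
            ["climate", "policy", "hurts", "jobs"]]
corpus_b = [["climate", "change", "is", "an", "emergency"],
            ["climate", "policy", "saves", "lives"]]

model_a = Word2Vec(corpus_a, vector_size=50, window=3, min_count=1)
model_b = Word2Vec(corpus_b, vector_size=50, window=3, min_count=1)

# With real corpora, these neighbor lists show how usage differs by actor.
print(model_a.wv.most_similar("climate", topn=3))
print(model_b.wv.most_similar("climate", topn=3))
```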
For this week’s reading, I noticed that it is necessary to generate training data for transition-based dependency parsing. What is an “appropriate” training-set size for obtaining a reliable model? Would the algorithm be robust if we are not able to provide enough data?
I would like to hear more about unsupervised vs. supervised neural nets and their different uses in text analysis. Also, for a classification problem, is there a reason besides interpretability why a researcher would choose logistic regression over a neural network? And it would be great if you could talk more about the word2vec representation of words and how its output is processed by neural networks.
Given the extra computational effort required, in what contexts do word embeddings provide insight above and beyond more straightforward analyses of co-occurrences, such as looking at k-grams? Are there cases where it actually makes more sense not to use a computationally intensive method such as word2vec?
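For comparison, the "straightforward" co-occurrence analysis I have in mind is just counting within a fixed window, which needs no training at all; a quick sketch:

```python
# Raw co-occurrence counts within a +/- 2 word window; nothing is learned,
# unlike word2vec's gradient-trained dense vectors.
from collections import Counter

sentences = [["the", "king", "rules", "the", "realm"],
             ["the", "queen", "rules", "the", "realm"]]
window = 2

cooc = Counter()
for sent in sentences:
    for i, word in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[(word, sent[j])] += 1

print(cooc[("king", "rules")])
```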
Post questions here for one or more of our fundamentals readings:
Jurafsky, Daniel and James H. Martin. 2015. Speech and Language Processing. Chapters 15-16 (“Vector Semantics”, “Semantics with Dense Vectors”)