
Exploring Semantic Spaces - Orientation #28

HyunkuKwon opened this issue Jan 12, 2021 · 23 comments

Comments

@HyunkuKwon
Collaborator

Post questions about the following orienting reading:

Kozlowski, Austin C., Matt Taddy, and James A. Evans. 2019. "The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings." American Sociological Review 84(5):905-949.

@Raychanan

According to the paper, word embedding techniques can greatly help us understand various cultural dimensions. Since we just learned about clustering last week, I'm wondering whether word embeddings can also be fed into clustering algorithms to help us summarize cultural patterns.

I have one more question. We could certainly load text data from the last 1,000 years with every feature available, but that would take an enormous amount of time. The usual practice is to draw on social theory to select a subset of features. And here is the problem: if we need social theory to select our features, then a technique meant to summarize social patterns from the data itself becomes less meaningful, since it is supposed to help us discover those patterns in the first place. Could you share your thoughts on this problem?
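On the clustering question, here is a minimal sketch of what combining the two might look like, assuming pretrained GloVe vectors loaded through gensim's downloader (the word list and the choice of two clusters are illustrative assumptions, not anything from the paper):

```python
# A minimal sketch (my own illustration, not the paper's method): cluster
# pretrained word vectors with k-means and inspect whether the clusters
# track intuitive cultural groupings.
import gensim.downloader as api
import numpy as np
from sklearn.cluster import KMeans

model = api.load("glove-wiki-gigaword-100")  # any pretrained embedding works

words = ["opera", "ballet", "golf", "champagne", "yacht",
         "jazz", "boxing", "wrestling", "beer", "truck"]
X = np.array([model[w] for w in words])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for label, word in sorted(zip(kmeans.labels_, words)):
    print(label, word)
```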

@yushiouwillylin

This paper is insightful as it provides many interesting patterns while fully recognizing their caveats.

I have a question about the word "mediate," which is used repeatedly throughout the paper. Using word embeddings, we find that the "affluence" and "status" dimensions are not entirely orthogonal to the other cultural dimensions, which appears to be what the authors mean when they call class a "mediator." However, I don't quite understand the choice of "mediate" to describe this result. Did the authors have a particular sociological theoretical model in mind, or is the term simply meant to emphasize that these two class dimensions are not entirely unrelated to the other dimensions?
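For concreteness, a hedged illustration of the non-orthogonality being asked about, building dimensions from antonym-pair differences as the paper does, but using pretrained GloVe vectors and a couple of illustrative word pairs rather than the paper's Google Ngrams embeddings and full pair lists:

```python
# Build two cultural dimensions from antonym pairs and measure the angle
# between them; a cosine near zero would mean the dimensions are orthogonal.
import gensim.downloader as api
import numpy as np

model = api.load("glove-wiki-gigaword-100")

def dimension(pairs):
    """Average antonym-pair difference vectors into one unit-length direction."""
    v = np.mean([model[a] - model[b] for a, b in pairs], axis=0)
    return v / np.linalg.norm(v)

affluence = dimension([("rich", "poor"), ("wealthy", "impoverished")])
education = dimension([("educated", "uneducated"), ("literate", "illiterate")])

print("cos(affluence, education) =", round(float(affluence @ education), 3))
```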

@jacyanthis

When I've heard Austin present this research, almost all the critiques ran along the same thread: why is this research useful if it largely just confirms prior findings from non-computational methods? This seems like a frequent criticism of computational text analysis. How would you respond to it, and in general, how should we respond (e.g., generating new questions, increasing confidence in seminal findings, methodological preparation for future research)?

@xxicheng

My question is about sampling corpora before employing the word embedding method. As mentioned in the paper, word embeddings need a large corpus containing several million words and are not very flexible across different corpora. So, as @jacyanthis has mentioned, what other research can the method be used for besides finding long-term patterns on a specific topic?

Also, what suggestions and tips would you give for sampling texts for word embedding?

@jcvotava

Wow, what a cool paper. I've enjoyed the papers we've read so far in this class, but to be entirely honest, this was the one that really made me think there's something very promising going on here.

My question is about word contexts as sites of contestation over meaning. For instance, suppose a given word (say "bicycle") has certain connotations for one group of the population (e.g., urban dwellers) but very different meanings for another group (e.g., rural dwellers), such that the two populations would place the word at very different locations on axes like rich-poor, black-white, or male-female. Would a semantic-context approach that simply averages over a large corpus represent such a word as having middling values along the relevant dimensions, even though actual opinions lie at either extreme? Are such problems likely to affect analysis in practice?
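One way to probe this empirically (my own sketch, with toy placeholder sentences so it runs; real tokenized subcorpora would replace them) is to train a separate model per subpopulation and compare where each places the same word on a dimension built inside its own space:

```python
import numpy as np
from gensim.models import Word2Vec

def rich_poor_projection(sentences, word):
    """Train a model on one subcorpus and project `word` onto its rich-poor axis."""
    model = Word2Vec(sentences, vector_size=100, window=8, min_count=1,
                     workers=1, seed=0)
    wv = model.wv
    dim = wv["rich"] - wv["poor"]
    dim /= np.linalg.norm(dim)
    return float(wv[word] @ dim / np.linalg.norm(wv[word]))

# Toy stand-ins for tokenized texts written by the two groups.
urban_sentences = [["the", "bicycle", "is", "a", "rich", "commuter", "toy"]] * 200
rural_sentences = [["the", "bicycle", "is", "what", "poor", "folks", "ride"]] * 200

print("urban:", rich_poor_projection(urban_sentences, "bicycle"))
print("rural:", rich_poor_projection(rural_sentences, "bicycle"))
```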

@k-partha

In a context where the class demographics of text producers are changing over time, do you think some of the changes in stereotypical or biased attitudes captured through word embeddings reflect only changes occurring within certain strata of society? Some additional context: our society is more polarised and 'echo-chambered' than ever, and some important axes that capture this phenomenon are progressive/backward attitudes and education.

To what extent do you see this as a problem that affects the generalizability of research already published and research to come?

@romanticmonkey

Similar to the first question from @k-partha: if we extend this study further back in time, say 500 years, will our corpus be heavily biased toward the upper classes? And as a result, might we end up studying the cultural changes of the upper classes rather than those of society as a whole?

Also, if we go further back in time, how can we make sure the vector space correctly represents the language use of that era? In this study, the 1950s survey data can be used to verify representativeness, but when we want to go back 500 years, how can we make the vector space as representative as possible?

Lastly, will the discrepancy between oral and written language cause problems? The paper mentions in its limitations that some cultural associations might be off-target. I think this mismatch is especially likely between casual and formal uses of language, because books tend toward formal language no matter who writes them. How would you suggest we deal with this problem? For studying 21st-century language, do you think social media corpora might help, since they more closely resemble everyday conversation?

@jinfei1125

This is a very interesting paper with a lot of interesting results, and I enjoyed reading it. But I am a little confused about the relationship in Figure 6, perhaps because I am not familiar with the difference between cultivation and education. Does cultivation mean knowledge itself, while education emphasizes degrees such as a bachelor's, master's, or Ph.D.? Thanks!
[Attached: screenshot of Figure 6 from the paper]

@ming-cui

This paper is fantastic! Traditionally, researchers have been more likely to use surveys or publicly available datasets, rather than text data, to study relationships between variables, whereas word embedding models seem to focus on corpora per se. I am wondering whether word embedding models can also be used to study the relationships between culture and other constructs (e.g., cooperation) on a large scale. Thank you.

@Rui-echo-Pan

It's interesting to see how we can measure the cultural and semantic change of specific words by placing them in a vector space and examining relations along its dimensions. There seem to be some similarities between topic modeling and word embedding, in that both rely on vectors to represent words or contexts for analysis. Could you explain the differences a bit more?
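For a concrete (toy) contrast: a topic model like LDA represents each document as a distribution over topics inferred from word co-occurrence within documents, while word2vec learns a dense vector for each word from its local context windows. The corpus below is a placeholder just to show the two kinds of output:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

docs = [["opera", "ballet", "champagne", "wealth"],
        ["jazz", "beer", "wrestling", "truck"],
        ["opera", "wealth", "education", "college"]]

# Topic model: documents become mixtures of topics.
dictionary = Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)
print(lda.get_document_topics(bow[0]))   # e.g. a mixture like [(0, 0.9), (1, 0.1)]

# Word embedding: each word becomes a point in a continuous space.
w2v = Word2Vec(docs, vector_size=50, window=2, min_count=1, seed=0)
print(w2v.wv["opera"][:5])               # first few coordinates of the vector
```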

@zshibing1

While most readers would agree that word embedding has the potential to be a truly disruptive method for fields such as the sociology of culture, it is not entirely clear, at least from this paper, that the method yields meaningful results. For example, what is so meaningful about the persistent close relations among the dimensions of cultivation, morality, and education? Does this finding help resolve any important debate?

@MOTOKU666

This is a charming paper that got me thinking about more efficient ways of understanding cultural patterns. As the paper describes, "opera" is found to be more affluent than "jazz" by projecting the word vectors for "opera" and "jazz" onto the dimension of the space corresponding to affluence. Similarly, a researcher can determine whether "jazz" is more masculine or feminine than "opera" by projecting these words onto the gender dimension. The word embedding model is clearly productive; I'm wondering how large a corpus needs to be to produce meaningful results.
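A minimal sketch of the projection operation described above, using pretrained GloVe vectors from gensim's downloader rather than the paper's Google Ngrams embeddings; the antonym pairs are illustrative stand-ins for the paper's longer curated lists:

```python
import gensim.downloader as api
import numpy as np

model = api.load("glove-wiki-gigaword-100")

def make_dimension(pairs):
    """Average antonym-pair differences into a unit-length cultural dimension."""
    v = np.mean([model[a] - model[b] for a, b in pairs], axis=0)
    return v / np.linalg.norm(v)

def project(word, dim):
    """Cosine of the word vector with the dimension: positive leans toward the
    first pole (rich, male), negative toward the second (poor, female)."""
    v = model[word]
    return float(v @ dim / np.linalg.norm(v))

affluence = make_dimension([("rich", "poor"), ("wealthy", "impoverished")])
gender = make_dimension([("man", "woman"), ("he", "she")])

for w in ("opera", "jazz"):
    print(w, "affluence:", round(project(w, affluence), 3),
          "gender:", round(project(w, gender), 3))
```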

@Bin-ary-Li

One thing I always worry about is the stability of high-dimensional embeddings, since they can be very data-dependent. I wonder what objective methods could be used to address this?
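One rough way to probe this, offered as a suggestion rather than the paper's procedure: bootstrap whatever ingredients go into a result and see how much the result moves. The toy version below only resamples the antonym pairs that define a dimension (using pretrained GloVe vectors); a fuller check would also retrain the embedding on subsampled corpora or different random seeds.

```python
import gensim.downloader as api
import numpy as np

model = api.load("glove-wiki-gigaword-100")
pairs = [("rich", "poor"), ("wealthy", "impoverished"),
         ("affluent", "destitute"), ("luxury", "cheap"),
         ("expensive", "inexpensive")]

rng = np.random.default_rng(0)
scores = []
for _ in range(200):
    sample = rng.choice(len(pairs), size=len(pairs), replace=True)  # bootstrap pairs
    dim = np.mean([model[pairs[i][0]] - model[pairs[i][1]] for i in sample], axis=0)
    dim /= np.linalg.norm(dim)
    v = model["opera"]
    scores.append(float(v @ dim / np.linalg.norm(v)))

print("projection of 'opera' onto affluence:",
      round(np.mean(scores), 3), "+/-", round(np.std(scores), 3))
```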

@william-wei-zhu

Does word embedding have the potential to unearth latent trends and predict future events?

@hesongrun

hesongrun commented Feb 26, 2021

Thanks for the great paper! This is very insightful. If we consider that word embeddings shift dynamically over time, how do we align the embedding spaces across different time slices so that a word has relatively stable coordinates over time? Thanks!
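One common approach in the literature (a general technique, not necessarily what this paper does): train a separate model per time slice, then rotate each later space onto a base space with an orthogonal Procrustes transform computed over the shared vocabulary. The matrices below are synthetic placeholders for real per-decade embedding matrices.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def align(base, other):
    """Rotate `other` onto `base`; rows must correspond to the same
    shared-vocabulary words, in the same order, in both matrices."""
    R, _ = orthogonal_procrustes(other, base)
    return other @ R

rng = np.random.default_rng(0)
X_1950 = rng.normal(size=(5000, 100))                 # stand-in embedding matrix
Q = np.linalg.qr(rng.normal(size=(100, 100)))[0]      # a random rotation
X_1990 = X_1950 @ Q                                   # "drifted" later space

X_1990_aligned = align(X_1950, X_1990)
print(np.allclose(X_1990_aligned, X_1950, atol=1e-6)) # True in this toy case
```

That said, if one only compares projections onto dimensions constructed within each time slice, as I understand this paper largely to do, explicit cross-slice alignment may not be necessary.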

@sabinahartnett

While word embeddings can reveal cultural trends, is there an iterative way to interact with them and inform their cultural (not only linguistic) similarities?

@theoevans1

This paper notes the importance for cultural analysis of analyzing biases rather than aiming to “debias” models (914). In what contexts do Bolukbasi et al. suggest debiasing? Are there examples of instances within cultural analysis in which such a strategy would be useful?
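For reference, a hedged sketch of the kind of debiasing Bolukbasi et al. propose for downstream applications (their full method also equalizes certain word pairs; this shows only the "neutralize" step, on pretrained GloVe vectors):

```python
import gensim.downloader as api
import numpy as np

model = api.load("glove-wiki-gigaword-100")

gender = model["he"] - model["she"]
gender /= np.linalg.norm(gender)

def neutralize(word):
    """Remove the component of the word vector that lies along the gender axis."""
    v = model[word]
    return v - (v @ gender) * gender

for w in ("engineer", "nurse"):
    v0, v1 = model[w], neutralize(w)
    before = float(v0 @ gender / np.linalg.norm(v0))
    after = float(v1 @ gender / np.linalg.norm(v1))
    print(w, "gender projection:", round(before, 3), "->", round(after, 3))
```

Kozlowski et al.'s point, as I read it, is that for cultural analysis this bias is the signal, so one would measure it rather than remove it.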

@toecn

toecn commented Feb 26, 2021

What do different geometries of the word space represent in society? How can we think about changes in those geometries in relation to social change? Beyond mapping associations between words, can we think of other geometrical expressions of social life? For instance, what geometrical expressions could serve to identify tensions in social life (ethnic, racial, or class tensions, for example), or to capture the diversity of ideas in a social space?

@dtmlinh

dtmlinh commented Feb 26, 2021

The paper mentions that Google Ngrams is not good at picking up racial associations. Are there other sources that would be better at picking up these associations?

@mingtao-gao

To apply word embeddings to extract context from a corpus, researchers need to choose a window size k to quantify 'context'; the article notes that previous studies find windows of roughly eight words produce the most consistent results. Should we choose a different k for different types of corpora? What is a good method for finding the optimal k?
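An illustrative (not prescriptive) way to compare window sizes is to train one model per k on the same corpus and see how the neighborhoods or downstream projections change. The snippet uses gensim's tiny built-in toy corpus just so it runs, so the neighbors themselves are meaningless here:

```python
from gensim.models import Word2Vec
from gensim.test.utils import common_texts as sentences  # tiny built-in toy corpus

for k in (2, 5, 8):
    model = Word2Vec(sentences, vector_size=50, window=k, min_count=1,
                     workers=1, seed=0)
    print(f"window={k}:", model.wv.most_similar("computer", topn=3))
```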

@egemenpamukcu

This was a really interesting paper. I am wondering what some ways of validating the semantic associations revealed by word embeddings would be. Should we rely on our own judgment to validate these intermediary results, or is there a way to get an 'objective' measure of accuracy?
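One hedged validation strategy, echoing the paper's own use of survey data: correlate the embedding projections with external human ratings of the same words. The ratings below are made-up numbers purely for illustration; the paper uses real survey measures.

```python
import gensim.downloader as api
import numpy as np
from scipy.stats import spearmanr

model = api.load("glove-wiki-gigaword-100")

affluence = model["rich"] - model["poor"]
affluence /= np.linalg.norm(affluence)

# Hypothetical survey ratings of how "affluent" each word feels (-1 to 1).
survey = {"opera": 0.8, "golf": 0.7, "jazz": 0.2, "wrestling": -0.4, "truck": -0.3}

words = list(survey)
proj = [float(model[w] @ affluence / np.linalg.norm(model[w])) for w in words]
rho, _ = spearmanr(proj, [survey[w] for w in words])
print("Spearman correlation with survey ratings:", round(rho, 3))
```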

@RobertoBarrosoLuque

Is there any work using word embeddings to understand cultural differences across countries? If so, could you point us to some of the literature?

@lilygrier

I found this reading to be a fascinating application of word embeddings to explore semantic change over time. I noticed that the researchers said they started by looking at gender, race, and class, found that only gender and class yielded results, and so chose to explore those two. Is trying a lot of different features and seeing what "sticks" standard protocol for this type of exploration, or is that, in some contexts, considered p-hacking (trying to force results where they don't exist)?
