Images, Art & Video - Fundamentals #43

Open
HyunkuKwon opened this issue Jan 12, 2021 · 20 comments

Comments

@HyunkuKwon
Collaborator

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Chapter 9, "Convolutional Networks." MIT Press: 326-366.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Chapter 12.2, "Applications: Computer Vision." MIT Press: 447-453.

@Raychanan

Goodfellow and colleagues discuss global and local normalization in this chapter. I'm wondering whether the paper "Deep Neural Networks Are More Accurate Than Humans at Detecting Sexual Orientation From Facial Images" by Michal Kosinski and Yilun Wang is an application of this concept of normalization.

In the paper by Kosinski and Wang, the Euclidean distances between facial landmarks were normalized to account for the differing sizes of the faces in the images.

Is the normalization used by Kosinski and Wang the same thing Goodfellow et al. describe? Also, I'm not sure whether the purpose of normalization, as Goodfellow et al. present it, is to make different metrics/units comparable. Could you explain this in more detail?
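
For concreteness, here is a minimal sketch (my own reconstruction, not the authors' code) of the kind of size normalization I mean; the choice of reference scale here is just an assumption:

```python
# Toy landmark-distance normalization: divide all pairwise landmark
# distances by a reference scale so faces of different sizes compare.
import numpy as np

def normalized_landmark_distances(landmarks):
    """landmarks: (n_points, 2) array of (x, y) facial landmark coordinates."""
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # pairwise Euclidean distances
    scale = dists.max()  # reference scale; inter-ocular distance is another option
    return dists / scale

# Two "faces" differing only in size yield identical normalized distances.
face = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
print(np.allclose(normalized_landmark_distances(face),
                  normalized_landmark_distances(face * 3)))  # True
```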

@ming-cui

When we train CNNs to classify images, how seriously should we consider stride and padding?
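
For example, here is a minimal sketch (assuming PyTorch) of how the two choices change a convolutional layer's output shape:

```python
# How stride and padding affect output size for a 32x32 input.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)

conv_same    = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)  # keeps size
conv_valid   = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=0)  # shrinks edges
conv_strided = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)  # downsamples

print(conv_same(x).shape)     # torch.Size([1, 16, 32, 32])
print(conv_valid(x).shape)    # torch.Size([1, 16, 30, 30])
print(conv_strided(x).shape)  # torch.Size([1, 16, 16, 16])
```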

@jacyanthis

Why aren't convolutions used much (yet) in language models like BERT, GPT-2, or GPT-3? SqueezeBERT seems to be a rare exception.

@RobertoBarrosoLuque

The first couple of sections of the chapter mainly discuss hardware implementations for training and deploying deep neural networks. Are there any resources you could share with us to learn more about parallel computation and efficient algorithm design for optimizing our models?

@lilygrier

How similar are the methods employed in computer vision to those employed in audio analysis? It seems to me that the two would be quite different, as sound waves do not seem analogous to pixels. I'd be interested to hear how these two applications of deep neural networks have co-evolved over time.

@xxicheng

Could you please give us some examples of applying these methods to inequality topics?

@jinfei1125

When reading the chapter, I kept thinking that dealing with images is similar to dealing with matrices: the rows and columns of the matrix become pixels, and each entry becomes a set of numbers such as RGB values, contrast, or other computer-vision-specific quantities. Is this right? It seems that all models in content analysis ultimately use numbers to represent everything, and images and videos are among the more complicated cases. Does this mean that working with images and videos needs more computing power and is more computationally expensive?
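
To make my mental model concrete, here is a tiny NumPy sketch of what I mean:

```python
# A color image as a height x width grid whose cells hold three numbers (RGB).
import numpy as np

img = np.zeros((4, 4, 3), dtype=np.uint8)  # 4x4 image with 3 channels
img[0, 0] = [255, 0, 0]                    # top-left pixel is pure red

print(img.shape)  # (4, 4, 3)
print(img[0, 0])  # [255   0   0]

# A 1-minute 1080p video at 30 fps is ~1800 arrays of shape (1080, 1920, 3),
# which hints at why video is so much more computationally expensive.
print(1800 * 1080 * 1920 * 3)  # about 11.2 billion numbers
```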

@k-partha

Are there methods that produce contextual embeddings for images, similar to those in language models like BERT?
E.g., embedding an image in the context of a series of images, or even embedding a particular object in an image with respect to the other objects in it? This seems like it could be highly useful for content analysis, where relationships between entities are often very important.
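
To illustrate the kind of thing I have in mind, here is a toy sketch (my own, assuming PyTorch) of self-attention over patch embeddings, analogous to BERT's contextual token embeddings:

```python
# Split an image into patches, embed each patch, and let self-attention
# produce context-dependent patch embeddings.
import torch
import torch.nn as nn

img = torch.randn(1, 3, 32, 32)                # one 32x32 RGB image
patches = img.unfold(2, 8, 8).unfold(3, 8, 8)  # sixteen 8x8 patches
patches = patches.reshape(1, 3, 16, 64).permute(0, 2, 1, 3).reshape(1, 16, 192)

embed = nn.Linear(192, 64)                     # patch -> embedding
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

tokens = embed(patches)                        # (1, 16, 64)
contextual, _ = attn(tokens, tokens, tokens)   # each patch attends to
print(contextual.shape)                        # all others: (1, 16, 64)
```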

@romanticmonkey

@k-partha There's this model called PiCANet, which learns to map the pixels of salient objects in an image (they call it pixel-wise contextual attention). I think this might be related to your second idea. Hope it sounds interesting to you!

My question: I'm very interested in the applications of transfer learning in images. What are some fun image projects that make use of pre-trained models (like VGG16)? I know that there are artist identification tasks (for paintings). Are there social science related ones?
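
For reference, the fine-tuning pattern I have in mind looks roughly like this (a sketch assuming torchvision's pretrained VGG16; the 5-class head is hypothetical):

```python
# Load a pretrained VGG16, freeze its features, and swap the classifier head.
import torch.nn as nn
from torchvision import models

model = models.vgg16(pretrained=True)  # ImageNet-pretrained weights

for param in model.features.parameters():
    param.requires_grad = False  # freeze the convolutional feature extractor

# The final classifier layer maps 4096 features to 1000 ImageNet classes;
# replace it with a head for a new (hypothetical) 5-class task.
model.classifier[6] = nn.Linear(4096, 5)
# ...then train only the new head on the task-specific dataset.
```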

@MOTOKU666

I'm also interested in how audio and images might be combined for a more comprehensive analysis of video resources. Would this be computationally demanding? Is there any mature approach to dealing with videos?

@Rui-echo-Pan

I am also curious whether visual analysis can take context into account, and whether it can analyze smaller elements of an image. In text analysis we can break documents into sentences and sentences into words; is there a similar decomposition in visual analysis?

@sabinahartnett

To what degree are the NLP techniques we've discussed in class implemented in audio analysis (where audio is transcribed and analyzed as 'written text')?

@hesongrun


To what degree can transformers help with computer vision? They have already revolutionized NLP. I am wondering whether introducing the attention mechanism can better capture the fundamental distributions or ideas behind images.

@Bin-ary-Li

To me, the most exciting part of convolutional neural networks is their connection with vision and neuroscience. I wonder whether there will be more NN models that can reverse-engineer the various sensory/cognitive systems we know so much about.

@william-wei-zhu

Like methods for identifying the context of words in texts, I wonder whether in image recognition we can also detect the context of an object from its surrounding environment.

@egemenpamukcu

Echoing Partha's question, I would also like to hear more about efforts to embed images, audio, and even video. Would it be possible, for example, to create an embedding for a movie and then find similar movies? Or songs?

@theoevans1

To what extent is computer vision through deep learning a black box? When using these techniques, in what ways are we able to understand the reasons for image classifications?

@zshibing1

Some images, e.g., facial images, are more "structured" than others. How useful are computational methods for analyzing the less structured ones?

@jcvotava

How computationally expensive are audio and image processing with neural nets, compared with NLP using neural nets?

@mingtao-gao

For user-generated images on social media, are researchers free to scrape them? And will people be less willing to post images of themselves, or silly ones, if they realize they are being monitored and analyzed?
