Images, Art & Video - Fundamentals #43

Open
HyunkuKwon opened this issue Jan 12, 2021 · 20 comments

Comments

@HyunkuKwon
Collaborator

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Chapter 9, "Convolutional Networks." MIT Press: 326-366.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Chapter 12.2, "Applications: Computer Vision." MIT Press: 447-453.

@Raychanan

Goodfellow and colleagues discuss global and local normalization in this chapter. I'm wondering whether the paper "Deep Neural Networks Are More Accurate Than Humans at Detecting Sexual Orientation From Facial Images" by Michal Kosinski and Yilun Wang is an application of this concept of normalization.

In the paper by Kosinski and Wang, the Euclidean distances between facial landmarks were normalized to account for the differing sizes of the faces in the images.

Is the normalization used by Kosinski and Wang the same thing Goodfellow et al. describe? Also, I'm not sure whether the purpose of normalization, as Goodfellow et al. present it, is to make different metrics/units comparable. Could you explain this in more detail?
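
For concreteness, here is a minimal sketch (my own reconstruction, not the authors' code) of the kind of size normalization I mean; the choice of reference scale here is just an assumption:

```python
# Toy landmark-distance normalization: divide all pairwise landmark
# distances by a reference scale so faces of different sizes compare.
import numpy as np

def normalized_landmark_distances(landmarks):
    """landmarks: (n_points, 2) array of (x, y) facial landmark coordinates."""
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # pairwise Euclidean distances
    scale = dists.max()  # reference scale; inter-ocular distance is another option
    return dists / scale

# Two "faces" differing only in size yield identical normalized distances.
face = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
print(np.allclose(normalized_landmark_distances(face),
                  normalized_landmark_distances(face * 3)))  # True
```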

@ming-cui

When we train CNNs to classify images, how seriously should we consider stride and padding?
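
For example, here is a minimal sketch (assuming PyTorch) of how the two choices change a convolutional layer's output shape:

```python
# How stride and padding affect output size for a 32x32 input.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)

conv_same    = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)  # keeps size
conv_valid   = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=0)  # shrinks edges
conv_strided = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)  # downsamples

print(conv_same(x).shape)     # torch.Size([1, 16, 32, 32])
print(conv_valid(x).shape)    # torch.Size([1, 16, 30, 30])
print(conv_strided(x).shape)  # torch.Size([1, 16, 16, 16])
```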

@jacyanthis

Why aren't convolutions used much (yet) in language models like BERT, GPT-2, or GPT-3? SqueezeBERT seems to be a rare exception.

@RobertoBarrosoLuque

The first couple of sections of the chapter mainly discuss hardware implementations for training and deploying deep neural networks. Are there any resources you could share with us to learn more about parallel computation and efficient algorithm design for optimizing our models?

@lilygrier

How similar are the methods employed in computer vision to those employed in audio analysis? It seems to me that the two would be quite different, as sound waves do not seem analogous to pixels. I'd be interested to hear how these two applications of deep neural networks have co-evolved over time.

@xxicheng

Could you please give us some examples of applying these methods to inequality topics?

@jinfei1125

When reading the chapter, I kept thinking that dealing with images is similar to dealing with matrices: the rows and columns of the matrix become pixels, and each entry becomes a set of numbers such as RGB values, contrast, or other computer-vision-specific quantities. Is this right? It seems that all models in content analysis ultimately use numbers to represent everything, and images and videos are among the more complicated cases. Does this mean that working with images and videos needs more computing power and is more computationally expensive?
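
To make my mental model concrete, here is a tiny NumPy sketch of what I mean:

```python
# A color image as a height x width grid whose cells hold three numbers (RGB).
import numpy as np

img = np.zeros((4, 4, 3), dtype=np.uint8)  # 4x4 image with 3 channels
img[0, 0] = [255, 0, 0]                    # top-left pixel is pure red

print(img.shape)  # (4, 4, 3)
print(img[0, 0])  # [255   0   0]

# A 1-minute 1080p video at 30 fps is ~1800 arrays of shape (1080, 1920, 3),
# which hints at why video is so much more computationally expensive.
print(1800 * 1080 * 1920 * 3)  # about 11.2 billion numbers
```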

@k-partha

Are there methods that produce contextual embeddings for images, similar to those in language models like BERT?
E.g., embedding an image in the context of a series of images, or even embedding a particular object in an image with respect to the other objects in it? This seems like it could be highly useful for content analysis, where relationships between entities are often very important.
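
To illustrate the kind of thing I have in mind, here is a toy sketch (my own, assuming PyTorch) of self-attention over patch embeddings, analogous to BERT's contextual token embeddings:

```python
# Split an image into patches, embed each patch, and let self-attention
# produce context-dependent patch embeddings.
import torch
import torch.nn as nn

img = torch.randn(1, 3, 32, 32)                # one 32x32 RGB image
patches = img.unfold(2, 8, 8).unfold(3, 8, 8)  # sixteen 8x8 patches
patches = patches.reshape(1, 3, 16, 64).permute(0, 2, 1, 3).reshape(1, 16, 192)

embed = nn.Linear(192, 64)                     # patch -> embedding
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

tokens = embed(patches)                        # (1, 16, 64)
contextual, _ = attn(tokens, tokens, tokens)   # each patch attends to
print(contextual.shape)                        # all others: (1, 16, 64)
```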

@romanticmonkey

@k-partha There's this model called PiCANet, which learns to map the pixels of salient objects in an image (they call it pixel-wise contextual attention). I think this might be related to your second idea. Hope it sounds interesting to you!

My question: I'm very interested in the applications of transfer learning in images. What are some fun image projects that make use of pre-trained models (like VGG16)? I know that there are artist identification tasks (for paintings). Are there social science related ones?
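
For reference, the fine-tuning pattern I have in mind looks roughly like this (a sketch assuming torchvision's pretrained VGG16; the 5-class head is hypothetical):

```python
# Load a pretrained VGG16, freeze its features, and swap the classifier head.
import torch.nn as nn
from torchvision import models

model = models.vgg16(pretrained=True)  # ImageNet-pretrained weights

for param in model.features.parameters():
    param.requires_grad = False  # freeze the convolutional feature extractor

# The final classifier layer maps 4096 features to 1000 ImageNet classes;
# replace it with a head for a new (hypothetical) 5-class task.
model.classifier[6] = nn.Linear(4096, 5)
# ...then train only the new head on the task-specific dataset.
```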

@MOTOKU666

I'm also interested in how audio and images might be combined for a more comprehensive analysis of video resources. Would this be computationally demanding? Is there any mature approach to dealing with videos?

@Rui-echo-Pan

I am also curious whether visual analysis can take context into account, and whether it can analyze smaller elements of an image. In text analysis we can break documents into sentences and sentences into words; is there a similar decomposition in visual analysis?

@sabinahartnett

To what degree are the NLP techniques we've discussed in class implemented in audio analysis (where audio is transcribed and analyzed as 'written text')?

@hesongrun


To what degree can transformers help with computer vision? They have already revolutionized NLP. I am wondering whether introducing the attention mechanism can better capture the fundamental distributions or ideas behind images.

@Bin-ary-Li

To me, the most exciting part of convolutional neural networks is their connection with vision and neuroscience. I wonder whether there will be more NN models that can reverse-engineer the various sensory/cognitive systems we know so much about.

@william-wei-zhu

Like methods for identifying the context of words in texts, I wonder whether in image recognition we can also detect the context of an object from its surrounding environment.

@egemenpamukcu

Echoing Partha's question, I would also like to hear more about efforts to embed images, audio, and even video. Would it be possible, for example, to create an embedding for a movie and then find similar movies? Or songs?

@theoevans1

To what extent is computer vision through deep learning a black box? When using these techniques, in what ways are we able to understand the reasons for image classifications?

@zshibing1

Some images, e.g., facial images, are more "structured" than others. How useful are computational methods for analyzing the less structured ones?

@jcvotava

How computationally expensive are audio and image processing with neural nets, compared with NLP using neural nets?

@mingtao-gao

For user-generated images on social media, are researchers free to scrape them? And will people be less willing to post images of themselves, or silly ones, if they realize they are being monitored and analyzed?
