In the last few years, there have been several breakthroughs in the methodologies used in Natural Language Processing (NLP) as well as Computer Vision (CV). Beyond these improvements to single-modality models, large-scale multi-modal approaches have become a very active area of research.
In this seminar, we reviewed these approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. We then discuss modeling frameworks in which one modality is transformed into the other, as well as models in which one modality is used to enhance representation learning for the other. To conclude the second part, we introduce architectures that handle both modalities simultaneously. Finally, we also cover other modalities as well as general-purpose multi-modal models, which can handle different tasks on different modalities within one unified architecture. An interesting application, Generative Art, caps off this booklet.
This book is the result of a student seminar for the Master's programs in Statistics and Data Science at LMU Munich in the summer semester 2022. Each student wrote a specific chapter of the book to pass the seminar.
Step 0: Prerequisites
Make sure you have git and R up and running on your computer.
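If you want to verify this from within R, an optional quick check could look like the following (the required R version is not specified here, so this only confirms that R and git are available at all):

# optional sanity check: print the installed R version and locate the git executable
getRversion()      # installed R version
Sys.which("git")   # path to git; an empty string means git was not found on the PATH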
Step 1: Clone the repository to your machine
With RStudio: https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN
With command-line:
git clone git@github.com:slds-lmu/seminar_multimodal_dl.git
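If no SSH key is set up for GitHub, the repository can presumably also be cloned over HTTPS using the URL https://github.com/slds-lmu/seminar_multimodal_dl.git (assuming public access).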
Step 2: Install dependencies
Start R in the project folder:
install.packages("devtools")
devtools::install_dev_deps()
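devtools::install_dev_deps() installs the packages declared as dependencies in the project's DESCRIPTION file, so the remaining R packages needed to build the book should be installed automatically.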
Step 3: Render the book (R commands)
# HTML
bookdown::render_book('./', 'bookdown::gitbook')
# PDF
bookdown::render_book('./', 'bookdown::pdf_book')
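Note that rendering the PDF version requires a LaTeX distribution, which the steps above do not cover. If none is installed, one common option (an assumption on our part, not part of the original instructions) is tinytex:

# optional: install a minimal LaTeX distribution for the PDF output
# (assumes tinytex is an acceptable choice for this project)
install.packages("tinytex")
tinytex::install_tinytex()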