<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="utf-8">
  <title>HUMOR: a Crowd-Annotated Spanish Corpus for Humor Analysis</title>
  <meta name="description"
        content="Crowd-annotated corpus of 27k Spanish tweets, labeled for humor and rated for funniness (1 to 5), created for Humor Analysis and Natural Language Processing.">
  <meta name="author" content="Santiago Castro, Luis Chiruzzo, Aiala Rosá, Diego Garat, Guillermo Moncecchi"/>
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "Dataset",
      "name": "HUMOR",
      "description": "Crowd-annotated corpus of 27k Spanish tweets, labeled for humor and rated for funniness (1 to 5), created for Humor Analysis and Natural Language Processing.",
      "creator": "Santiago Castro, Luis Chiruzzo, Aiala Rosá, Diego Garat, and Guillermo Moncecchi",
      "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "CSV",
        "contentUrl": "https://pln-fing-udelar.github.io/humor/annotations_by_tweet.csv"
      },
      "datePublished": "2018-07-20"
    }
  </script>

  <meta property="og:type" content="website"/>
  <meta property="og:site_name" content="HUMOR: a Crowd-Annotated Spanish Corpus for Humor Analysis"/>
  <meta property="og:image" content="https://pln-fing-udelar.github.io/humor/og.png"/>
  <meta property="og:image:height" content="630"/>
  <meta property="og:image:width" content="1200"/>
  <meta property="og:title" content="HUMOR: a Crowd-Annotated Spanish Corpus for Humor Analysis"/>
  <meta property="og:description"
        content="Crowd-annotated corpus of 27k Spanish tweets, labeled for humor and rated for funniness (1 to 5), created for Humor Analysis and Natural Language Processing."/>
  <meta property="og:url" content="https://pln-fing-udelar.github.io/humor/"/>
  <meta property="fb:app_id" content="1887710507982042"/>
  <meta name="twitter:card" content="summary"/>
  <meta name="twitter:site" content="@PLN_UdelaR"/>
  <meta name="twitter:creator" content="@PLN_UdelaR"/>

  <link href="index.css" rel="stylesheet">

  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-34392230-8"></script>
  <script>
    window.dataLayer = window.dataLayer || [];

    function gtag() {
      dataLayer.push(arguments);
    }

    gtag('js', new Date());

    gtag('config', 'UA-34392230-8');
  </script>

</head>

<body>

<header>
  <h1>HUMOR</h1>
  <p id="subtitle">A Crowd-Annotated Spanish Corpus for Humor Analysis</p>
</header>

<div id="authors-affiliations">
  <p>
    <a href="https://github.com/bryant1410">Santiago Castro</a>, Luis Chiruzzo, Aiala Rosá, Diego Garat, and
    <a href="https://www.fing.edu.uy/~gmonce/">Guillermo Moncecchi</a>
  </p>
  <p>
    <a href="https://www.fing.edu.uy/inco/grupos/pln/">Grupo de Procesamiento de Lenguaje Natural</a> (NLP Group),
    <a href="https://udelar.edu.uy/">Universidad de la República</a> — Uruguay
  </p>
</div>

<p>Crowd-annotated corpus of 27k Spanish tweets, labeled for humor and rated for funniness (1 to 5), created for
  Humor Analysis and Natural Language Processing.</p>

<a href="https://github.com/pln-fing-udelar/humor" target="_blank" class="github-corner"
   aria-label="View source on GitHub">
  <svg width="80" height="80" viewBox="0 0 250 250"
       style="fill:#151513; color:#fff; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true">
    <path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path>
    <path
        d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2"
        fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path>
    <path
        d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z"
        fill="currentColor" class="octo-body">
    </path>
  </svg>
</a>

<p><a href="http://www.aclweb.org/anthology/W18-3502">Paper</a></p>

<h2>Downloads</h2>

<p>The dataset consists of the following two files:</p>

<ul>
  <li><a href="annotations.csv">Annotations CSV</a></li>
  <li><a href="tweets.csv">Tweets CSV</a></li>
</ul>

<p>Aggregated version (one row per tweet with the sum of annotations for each category): <a
    href="annotations_by_tweet_all.csv">All Annotations by Tweet CSV</a></p>

<p>Aggregated version, excluding annotations from people who failed the test tweets: <a
    href="annotations_by_tweet.csv"><b>Annotations by Tweet CSV</b></a>. This version was used by the <a
    href="https://www.fing.edu.uy/inco/grupos/pln/haha/">HAHA task on Humor Recognition and Funniness Detection</a>.
</p>

<h2>Citation</h2>

<p>If you publish work that uses this dataset, please cite it as follows:</p>

<pre><code>@inproceedings{castro2018,
  title={A Crowd-Annotated Spanish Corpus for Humor Analysis},
  author={Castro, Santiago and Chiruzzo, Luis and Ros{\'a}, Aiala and Garat, Diego and Moncecchi, Guillermo},
  booktitle={Proceedings of SocialNLP 2018, The 6th International Workshop on Natural Language Processing for Social Media},
  year={2018}
}</code></pre>

<h2>Slides</h2>

<p><a href="slides.pdf">SocialNLP 2018 @ ACL slides</a></p>

</body>

</html>