documentation/recommendations.neo4j-browser-guide

<style type="text/css" media="screen">
/*
.nodes-image {
	margin:-100;
}
*/	
@import url("//maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css");

.imageblock .content img, .image img {max-width: 100%;}
.deck h3, .deck h4 {display: block !important;margin-bottom:8px;margin-top:5px;}
.listingblock {margin:8px;}
.pull-bottom {position:relative;bottom:1em;}
.admonitionblock td.icon [class^="fa icon-"]{font-size:2.5em;text-shadow:1px 1px 2px rgba(0,0,0,.5);cursor:default}
.admonitionblock td.icon .icon-note:before{content:"\f05a";color:#19407c}
.admonitionblock td.icon .icon-tip:before{content:"\f0eb";text-shadow:1px 1px 2px rgba(155,155,0,.8);color:#111}
.admonitionblock td.icon .icon-warning:before{content:"\f071";color:#bf6900}
.admonitionblock td.icon .icon-caution:before{content:"\f06d";color:#bf3400}
.admonitionblock td.icon .icon-important:before{content:"\f06a";color:#bf0000}
.admonitionblock.note.speaker { display:none; }
</style>
<style type="text/css" media="screen">
/* #editor.maximize-editor .CodeMirror-code { font-size:24px; line-height:26px; } */
</style>
<article class="guide" ng-controller="AdLibDataController">
  <carousel class="deck container-fluid">
    <!--slide class="row-fluid">
      <div class="col-sm-3">
        <h3>Recommendations</h3>
        <p class="lead">Information</p>
			<!dl>
				
				
			</dl>
		</div>
      <div class="col-sm-9">
        <figure>
          <img style="width:300px" src=""/>
        </figure>
      </div>
    </slide-->
    

   <h4>Recommendations</h4>
   <style type="text/css">
* {
  margin-bottom: 0.5em;
}
</style>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Personalized Product Recommendations with Neo4j</h3>
    <br/>
    <div>
      <div class="imageblock" style="float: right;">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/openmoviegraph.png" alt="openmoviegraph">
</div>
</div>


   <h4>Recommendations</h4>
   <div class="paragraph">
<p>Personalized product recommendations can increase conversions, improve sales rates and provide a better experice for users. In this Neo4j Browser guide, we&#8217;ll take a look at how you can generate graph-based real-time personalized product recommendations using a dataset of movies and movie ratings, but these techniques can be applied to many different types of products or content.</p>
</div>


   <h4>Graph-Based Recommendations</h4>
   <div class="paragraph">
<p>Generating <strong>personalized recommendations</strong> is one of the most common use cases for a graph database. Some of the main benefits of using graphs to generate recommendations include:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><strong>Performance</strong>. Index-free adjacency allows for <strong>calculating recommendations in real time</strong>, ensuring the recommendation is always relevant and reflecting up-to-date information.</p>
</li>
<li>
<p><strong>Data model</strong>. The labeled property graph model allows for easily combining datasets from multiple sources, allowing enterprises to <strong>unlock value from previously separated data silos.</strong></p>
</li>
</ol>
</div>
<div class="imageblock">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/title1.png" alt="title1" width="100%">
</div>
</div>
<div class="sidebarblock">
<div class="content">
<div class="paragraph">
<p>Data sources:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://www.omdbapi.com/">Open Movie Database</a></p>
</li>
<li>
<p><a href="https://grouplens.org/datasets/movielens/">MovieLens</a></p>
</li>
</ul>
</div>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>The Open Movie Graph Data Model</h3>
    <br/>
    <div>
      

   <h4>The Property Graph Model</h4>
   <div class="paragraph">
<p>The data model of graph databases is called the labeled property graph model.</p>
</div>
<div class="paragraph">
<p><strong>Nodes</strong>: The entities in the data.</p>
</div>
<div class="paragraph">
<p><strong>Labels</strong>: Each node can have one or more <strong>label</strong> that specifies the type of the node.</p>
</div>
<div class="paragraph">
<p><strong>Relationships</strong>: Connect two nodes. They have a single direction and type.</p>
</div>
<div class="paragraph">
<p><strong>Properties</strong>: Key-value pair properties can be stored on both nodes and relationships.</p>
</div>


   <h4>Eliminate Data Silos</h4>
   <div class="paragraph">
<p>In this use case, we are using graphs to combine data from multiple sources.</p>
</div>
<div class="paragraph">
<p><strong>Product Catalog</strong>: Data describing movies comes from the product catalog silo.</p>
</div>
<div class="paragraph">
<p><strong>User Purchases / Reviews</strong>: Data on user purchases and reviews comes from the user or transaction source.</p>
</div>
<div class="paragraph">
<p>By combining these two in the graph, we are able to query across datasets to generate personalized product recommendations.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/datamodel.png" alt="datamodel" width="100%">
</div>
</div>


   <h4>Nodes</h4>
   <div class="paragraph">
<p><code>Movie</code>, <code>Actor</code>, <code>Director</code>, <code>User</code>, <code>Genre</code> are the labels used in this example.</p>
</div>


   <h4>Relationships</h4>
   <div class="paragraph">
<p><code>ACTED_IN</code>, <code>IN_GENRE</code>, <code>DIRECTED</code>, <code>RATED</code> are the relationships used in this example.</p>
</div>


   <h4>Properties</h4>
   <div class="paragraph">
<p><code>title</code>, <code>name</code>, <code>year</code>, <code>rating</code> are some of the properties used in this example.</p>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Intro To Cypher</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>In order to work with our labeled property graph, we need a query language for graphs.</p>
</div>


   <h4>Graph Patterns</h4>
   <div class="paragraph">
<p>Cypher is the query language for graphs and is centered around <strong>graph patterns</strong>. Graph patterns are expressed in Cypher using ASCII-art like syntax.</p>
</div>
<div class="paragraph">
<p><strong>Nodes</strong></p>
</div>
<div class="paragraph">
<p>Nodes are defined within parentheses <code>()</code>. Optionally, we can specify node label(s): <code>(:Movie)</code></p>
</div>
<div class="paragraph">
<p><strong>Relationships</strong></p>
</div>
<div class="paragraph">
<p>Relationships are defined within square brackets <code>[]</code>. Optionally we can specify type and direction:</p>
</div>
<div class="paragraph">
<p><code>(:Movie)<strong><-[:RATED]-</strong>(:User)</code></p>
</div>
<div class="paragraph">
<p><strong>Variables</strong></p>
</div>
<div class="paragraph">
<p>Graph elements can be bound to variables that can be referred to later in the query:</p>
</div>
<div class="paragraph">
<p><code>(<strong>m</strong>:Movie)<-[<strong>r</strong>:RATED]-(<strong>u</strong>:User)</code></p>
</div>


   <h4>Predicates</h4>
   <div class="paragraph">
<p>Filters can be applied to these graph patterns to limit the matching paths. Boolean logic operators, regular expressions and string comparison operators can be used here within the <code>WHERE</code> clause, e.g. <code>WHERE m.title CONTAINS 'Matrix'</code></p>
</div>


   <h4>Aggregations</h4>
   <div class="paragraph">
<p>There is an implicit group of all non-aggregated fields when using aggregation functions such as <code>count</code>.</p>
</div>
<div class="paragraph">
<p>Take the <a href="https://graphacademy.neo4j.com/courses/cypher-fundamentals/" target="_blank">Cypher Graphacademy courses</a> to learn more.
Use the <a href="https://neo4j.com/docs/cypher-refcard/current/?ref=browser-guide" target="_blank">Cypher Refcard</a> as a syntax reference.</p>
</div>


   <h4>Dissecting a Cypher Statement</h4>
   <div class="paragraph">
<p>Let&#8217;s look at a Cypher query that answers the question "How many reviews does each Matrix movie have?". Don&#8217;t worry if this seems complex, we&#8217;ll build up our understanding of Cypher as we move along.</p>
</div>
<div class="listingblock">
<div class="title">How many reviews does each Matrix movie have? Click on the block to put the query in the top-most window on the query editor. Hit the triangular <span class="icon"><i class="icon-play-circle"></i></span> button or press <kbd class="keyseq"><kbd>Ctrl</kbd>+<kbd>Enter</kbd></kbd> to run it and see the resulting visualization.</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (m:Movie)&lt;-[:RATED]-(u:User)
WHERE m.title CONTAINS 'Matrix'
WITH m, count(*) AS reviews
RETURN m.title AS movie, reviews
ORDER BY reviews DESC LIMIT 5;<!--/code--></pre>
</div>
</div>
<table class="table tableblock frame-all grid-all" style="width: 100%;">
<colgroup>
<col style="width: 16.6666%;">
<col style="width: 50%;">
<col style="width: 33.3334%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">find</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>MATCH (m:Movie)&lt;-[:RATED]-(u:User)</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Search for an existing graph pattern</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">filter</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>WHERE m.title CONTAINS "Matrix"</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Filter matching paths to only those matching a predicate</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">aggregate</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>WITH m, count(*) AS reviews</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Count number of paths matched for each movie</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">return</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>RETURN m.title as movie, reviews</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Specify columns to be returned by the statement</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">order</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>ORDER BY reviews DESC</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Order by number of reviews, in descending order</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">limit</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>LIMIT 5;</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Only return first five records</p></td>
</tr>
</tbody>
</table>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Personalized Recommendations</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>Now let&#8217;s start generating some recommendations. There are two basic approaches to recommendation algorithms.</p>
</div>


   <h4>Content-Based Filtering</h4>
   <div class="paragraph">
<p>Recommend items that are similar to those that a user is viewing, rated highly or purchased previously.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/content1.png" alt="content1">
</div>
</div>
<div class="listingblock">
<div class="title">"Items similar to the item you&#8217;re looking at now"</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH p=(m:Movie {title: 'Net, The'})
       -[:ACTED_IN|IN_GENRE|DIRECTED*2]-()
RETURN p LIMIT 25<!--/code--></pre>
</div>
</div>


   <h4>Collaborative Filtering</h4>
   <div class="paragraph">
<p>Use the preferences, ratings and actions of other users in the network to find items to recommend.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/cf1.png" alt="cf1">
</div>
</div>
<div class="listingblock">
<div class="title">"Users who got this item, also got that other item."</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (m:Movie {title: 'Crimson Tide'})<-[:RATED]-
      (u:User)-[:RATED]->(rec:Movie)
WITH rec, COUNT(*) AS usersWhoAlsoWatched
ORDER BY usersWhoAlsoWatched DESC LIMIT 25
RETURN rec.title AS recommendation, usersWhoAlsoWatched<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Content-Based Filtering</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>The goal of content-based filtering is to find similar items, using attributes (or traits) of the item. Using our movie data, one way we could define similarlity is movies that have common genres.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/genres.png" alt="genres" width="100%">
</div>
</div>


   <h4>Similarity Based on Common Genres</h4>
   <div class="listingblock">
<div class="title">Find movies most similar to Inception based on shared genres</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Find similar movies by common genres
MATCH (m:Movie)-[:IN_GENRE]-&gt;(g:Genre)
              &lt;-[:IN_GENRE]-(rec:Movie)
WHERE m.title = 'Inception'
WITH rec, collect(g.name) AS genres, count(*) AS commonGenres
RETURN rec.title, genres, commonGenres
ORDER BY commonGenres DESC LIMIT 10;<!--/code--></pre>
</div>
</div>


   <h4>Personalized Recommendations Based on Genres</h4>
   <div class="paragraph">
<p>If we know what movies a user has watched, we can use this information to recommend similar movies:</p>
</div>
<div class="listingblock">
<div class="title">Recommend movies similar to those the user has already watched</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Content recommendation by overlapping genres
MATCH (u:User {name: 'Angelica Rodriguez'})-[r:RATED]-&gt;(m:Movie),
      (m)-[:IN_GENRE]-&gt;(g:Genre)&lt;-[:IN_GENRE]-(rec:Movie)
WHERE NOT EXISTS{ (u)-[:RATED]-&gt;(rec) }
WITH rec, g.name as genre, count(*) AS count
WITH rec, collect([genre, count]) AS scoreComponents
RETURN rec.title AS recommendation, rec.year AS year, scoreComponents,
       reduce(s=0,x in scoreComponents | s+x[1]) AS score
ORDER BY score DESC LIMIT 10<!--/code--></pre>
</div>
</div>


   <h4>Weighted Content Algorithm</h4>
   <div class="paragraph">
<p>Of course there are many more traits in addition to just genre that we can consider to compute similarity, such as actors and directors.
Let&#8217;s use a weighted sum to score the recommendations based on the number of actors (3x), genres (5x) and directors (4x) they have in common to boost the score:</p>
</div>
<div class="listingblock">
<div class="title">Compute a weighted sum based on the number and types of overlapping traits</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Find similar movies by common genres
MATCH (m:Movie) WHERE m.title = 'Wizard of Oz, The'
MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)

WITH m, rec, count(*) AS gs

OPTIONAL MATCH (m)<-[:ACTED_IN]-(a)-[:ACTED_IN]->(rec)
WITH m, rec, gs, count(a) AS as

OPTIONAL MATCH (m)<-[:DIRECTED]-(d)-[:DIRECTED]->(rec)
WITH m, rec, gs, as, count(d) AS ds

RETURN rec.title AS recommendation,
       (5*gs)+(3*as)+(4*ds) AS score
ORDER BY score DESC LIMIT 25<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Content-Based Similarity Metrics</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>So far we&#8217;ve used the number of common traits as a way to score the relevance of our recommendations.
Let&#8217;s now consider a more robust way to quantify similarity, using a similarity metric.
Similarity metrics are an important component used in generating personalized recommendations that allow us to quantify how similar two items (or as we&#8217;ll see later, how similar two users preferences) are.</p>
</div>


   <h4>Jaccard Index</h4>
   <div class="paragraph">
<p>The Jaccard index is a number between 0 and 1 that indicates how similar two sets are.
The Jaccard index of two identical sets is 1.
If two sets do not have a common element, then the Jaccard index is 0.
The Jaccard is calculated by dividing the size of the intersection of two sets by the union of the two sets.</p>
</div>
<div class="paragraph">
<p>We can calculate the Jaccard index for sets of movie genres to determine how similar two movies are.</p>
</div>
<div class="listingblock">
<div class="title">What movies are most similar to Inception based on Jaccard similarity of genres?</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (m:Movie {title:'Inception'})-[:IN_GENRE]-&gt;
      (g:Genre)&lt;-[:IN_GENRE]-(other:Movie)
WITH m, other, count(g) AS intersection, collect(g.name) as common

WITH m,other, intersection, common,
     [(m)-[:IN_GENRE]-&gt;(mg) | mg.name] AS set1,
     [(other)-[:IN_GENRE]-&gt;(og) | og.name] AS set2

WITH m,other,intersection, common, set1, set2,
     set1+[x IN set2 WHERE NOT x IN set1] AS union

RETURN m.title, other.title, common, set1,set2,
       ((1.0*intersection)/size(union)) AS jaccard

ORDER BY jaccard DESC LIMIT 25<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>We can apply this same approach to all "traits" of the movie (genre, actors, directors, etc.):</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (m:Movie {title: 'Inception'})-[:IN_GENRE|ACTED_IN|DIRECTED]-
                   (t)<-[:IN_GENRE|ACTED_IN|DIRECTED]-(other:Movie)
WITH m, other, count(t) AS intersection, collect(t.name) AS common,
     [(m)-[:IN_GENRE|ACTED_IN|DIRECTED]-(mt) | mt.name] AS set1,
     [(other)-[:IN_GENRE|ACTED_IN|DIRECTED]-(ot) | ot.name] AS set2

WITH m,other,intersection, common, set1, set2,
     set1 + [x IN set2 WHERE NOT x IN set1] AS union

RETURN m.title, other.title, common, set1,set2,
       ((1.0*intersection)/size(union)) AS jaccard
ORDER BY jaccard DESC LIMIT 25<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Collaborative Filtering – Leveraging Movie Ratings</h3>
    <br/>
    <div>
      <div class="imageblock">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/cf2.png" alt="cf2" width="100%">
</div>
</div>
<div class="paragraph">
<p>Notice that we have user-movie ratings in our graph.
The collaborative filtering approach is going to make use of this information to find relevant recommendations.</p>
</div>
<div class="paragraph">
<p>Steps:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Find similar users in the network (our peer group).</p>
</li>
<li>
<p>Assuming that similar users have similar preferences, what are the movies those similar users like?</p>
</li>
</ol>
</div>
<div class="listingblock">
<div class="title">Show all ratings by Misty Williams</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Show all ratings by Misty Williams
MATCH (u:User {name: 'Misty Williams'})
MATCH (u)-[r:RATED]-&gt;(m:Movie)
RETURN *
LIMIT 100;<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="title">Find Misty&#8217;s average rating</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Show average ratings by Misty Williams
MATCH (u:User {name: 'Misty Williams'})
MATCH (u)-[r:RATED]-&gt;(m:Movie)
RETURN avg(r.rating) AS average;<!--/code--></pre>
</div>
</div>
<div class="listingblock">
<div class="title">What are the movies that Misty liked more than average?</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// What are the movies that Misty liked more than average?
MATCH (u:User {name: 'Misty Williams'})
MATCH (u)-[r:RATED]-&gt;(m:Movie)
WITH u, avg(r.rating) AS average
MATCH (u)-[r:RATED]-&gt;(m:Movie)
WHERE r.rating &gt; average
RETURN *
LIMIT 100;<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Collaborative Filtering – The Wisdom of Crowds</h3>
    <br/>
    <div>
      

   <h4>Simple Collaborative Filtering</h4>
   <div class="paragraph">
<p>Here we just use the fact that someone has rated a movie, not their actual rating to demonstrate the structure of finding the peers.
Then we look at what else the peers rated, that the user has not rated themselves yet.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (u:User {name: 'Cynthia Freeman'})-[:RATED]->
      (:Movie)<-[:RATED]-(peer:User)
MATCH (peer)-[:RATED]->(rec:Movie)
WHERE NOT EXISTS { (u)-[:RATED]->(rec) }
RETURN rec.title, rec.year, rec.plot
LIMIT 25<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>Of course this is just a simple appraoch, there are many problems with this query, such as not normalizing based on popularity or not taking ratings into consideration.
We&#8217;ll do that next, looking at movies being rated similarly, and then picking highly rated movies and using their rating and frequency to sort the results.</p>
</div>
<div class="listingblock">
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (u:User {name: 'Cynthia Freeman'})-[r1:RATED]->
      (:Movie)<-[r2:RATED]-(peer:User)
WHERE abs(r1.rating-r2.rating) < 2 // similarly rated
WITH distinct u, peer
MATCH (peer)-[r3:RATED]->(rec:Movie)
WHERE r3.rating > 3
  AND NOT EXISTS { (u)-[:RATED]->(rec) }
WITH rec, count(*) as freq, avg(r3.rating) as rating
RETURN rec.title, rec.year, rating, freq, rec.plot
ORDER BY rating DESC, freq DESC
LIMIT 25<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>In the next section, we will see how we can improve this approach using the <strong>kNN method</strong>.</p>
</div>


   <h4>Only Consider Genres Liked by the User</h4>
   <div class="paragraph">
<p>Many recommender systems are a blend of collaborative filtering and content-based approaches:</p>
</div>
<div class="listingblock">
<div class="title">For a particular user, what genres have a higher-than-average rating? Use this to score similar movies</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// compute mean rating
MATCH (u:User {name: 'Andrew Freeman'})-[r:RATED]->(m:Movie)
WITH u, avg(r.rating) AS mean

// find genres with higher than average rating and their number of rated movies
MATCH (u)-[r:RATED]->(m:Movie)
       -[:IN_GENRE]->(g:Genre)
WHERE r.rating > mean

WITH u, g, count(*) AS score

// find movies in those genres, that have not been watched yet
MATCH (g)<-[:IN_GENRE]-(rec:Movie)
WHERE NOT EXISTS { (u)-[:RATED]->(rec) }

// order by sum of scores
RETURN rec.title AS recommendation, rec.year AS year,
       sum(score) AS sscore,
       collect(DISTINCT g.name) AS genres
ORDER BY sscore DESC LIMIT 10<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Collaborative Filtering – Similarity Metrics</h3>
    <br/>
    <div>
      <div class="paragraph">
<p>We use similarity metrics to quantify how similar two users or two items are.
We&#8217;ve already seen Jaccard similarity used in the context of content-based filtering.
Now, we&#8217;ll see how similarity metrics are used with collaborative filtering.</p>
</div>


   <h4>Cosine Distance</h4>
   <div class="paragraph">
<p>Jaccard similarity was useful for comparing movies and is essentially comparing two sets (groups of genres, actors, directors, etc.).
However, with movie ratings each relationship has a <strong>weight</strong> that we can consider as well.</p>
</div>


   <h4>Cosine Similarity</h4>
   <div class="imageblock">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/cosine.png" alt="cosine" width="400px">
</div>
</div>
<div class="paragraph">
<p>The cosine similarity of two users will tell us how similar two users' preferences for movies are.
Users with a high cosine similarity will have similar preferences.</p>
</div>
<div class="listingblock">
<div class="title">Find the users with the most similar preferences to Cynthia Freeman, according to cosine similarity</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->// Most similar users using Cosine similarity
MATCH (p1:User {name: "Cynthia Freeman"})-[x:RATED]->
      (m:Movie)<-[y:RATED]-(p2:User)
WITH p1, p2, count(m) AS numbermovies,
     sum(x.rating * y.rating) AS xyDotProduct,
     collect(x.rating) as xRatings, collect(y.rating) as yRatings
WHERE numbermovies > 10
WITH p1, p2, xyDotProduct,
sqrt(reduce(xDot = 0.0, a IN xRatings | xDot + a^2)) AS xLength,
sqrt(reduce(yDot = 0.0, b IN yRatings | yDot + b^2)) AS yLength
RETURN p1.name, p2.name, xyDotProduct / (xLength * yLength) AS sim
ORDER BY sim DESC
LIMIT 100;<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>We can also compute this measure using the <a href="https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/cosine/" target="_blank">Cosine Similarity algorithm</a> in the Neo4j Graph Data Science Library.</p>
</div>
<div class="listingblock">
<div class="title">Find the users with the most similar preferences to Cynthia Freeman, according to cosine similarity function</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (p1:User {name: 'Cynthia Freeman'})-[x:RATED]-&gt;(movie)&lt;-[x2:RATED]-(p2:User)
WHERE p2 &lt;&gt; p1
WITH p1, p2, collect(x.rating) AS p1Ratings, collect(x2.rating) AS p2Ratings
WHERE size(p1Ratings) &gt; 10
RETURN p1.name AS from,
       p2.name AS to,
       gds.similarity.cosine(p1Ratings, p2Ratings) AS similarity
ORDER BY similarity DESC<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Collaborative Filtering – Similarity Metrics</h3>
    <br/>
    <div>
      

   <h4>Pearson Similarity</h4>
   <div class="paragraph">
<p>Pearson similarity, or Pearson correlation, is another similarity metric we can use.
This is particularly well-suited for product recommendations because it takes into account the fact that different users will have different <strong>mean ratings</strong>: on average some users will tend to give higher ratings than others.
Since Pearson similarity considers differences about the mean, this metric will account for these discrepancies.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="https://guides.neo4j.com/sandbox/recommendations/img/pearson.png" alt="pearson" width="400px">
</div>
</div>
<div class="listingblock">
<div class="title">Find users most similar to Cynthia Freeman, according to Pearson similarity</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (u1:User {name:"Cynthia Freeman"})-[r:RATED]-&gt;(m:Movie)
WITH u1, avg(r.rating) AS u1_mean

MATCH (u1)-[r1:RATED]-&gt;(m:Movie)&lt;-[r2:RATED]-(u2)
WITH u1, u1_mean, u2, collect({r1: r1, r2: r2}) AS ratings
WHERE size(ratings) &gt; 10

MATCH (u2)-[r:RATED]-&gt;(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings

UNWIND ratings AS r

WITH sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom,
     sqrt( sum( (r.r1.rating - u1_mean)^2) * sum( (r.r2.rating - u2_mean) ^2)) AS denom,
     u1, u2 WHERE denom &lt;&gt; 0

RETURN u1.name, u2.name, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 100<!--/code--></pre>
</div>
</div>
<div class="paragraph">
<p>We can also compute this measure using the <a href="https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/pearson/" target="_blank">Pearson Similarity algorithm</a> in the Neo4j Graph Data Science Library.</p>
</div>
<div class="listingblock">
<div class="title">Find users most similar to Cynthia Freeman, according to the Pearson similarity function</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (p1:User {name: 'Cynthia Freeman'})-[x:RATED]-&gt;(movie)&lt;-[x2:RATED]-(p2:User)
WHERE p2 &lt;&gt; p1
WITH p1, p2, collect(x.rating) AS p1Ratings, collect(x2.rating) AS p2Ratings
WHERE size(p1Ratings) &gt; 10
RETURN p1.name AS from,
       p2.name AS to,
       gds.similarity.pearson(p1Ratings, p2Ratings) AS similarity
ORDER BY similarity DESC<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Collaborative Filtering – Neighborhood-Based Recommendations</h3>
    <br/>
    <div>
      

   <h4>kNN – K-Nearest Neighbors</h4>
   <div class="paragraph">
<p>Now that we have a method for finding similar users based on preferences, the next step is to allow each of the <strong>k</strong> most similar users to vote for what items should be recommended.</p>
</div>
<div class="paragraph">
<p>Essentially:</p>
</div>
<div class="paragraph">
<p>"Who are the 10 users with tastes in movies most similar to mine? What movies have they rated highly that I haven&#8217;t seen yet?"</p>
</div>
<div class="listingblock">
<div class="title">kNN movie recommendation using Pearson similarity</div>
<div class="content">
<pre mode="cypher"  class="highlight pre-scrollable programlisting cm-s-neo code runnable standalone-example ng-binding" data-lang="cypher" lang="cypher"><!--code class="cypher language-cypher"-->MATCH (u1:User {name:"Cynthia Freeman"})-[r:RATED]->(m:Movie)
WITH u1, avg(r.rating) AS u1_mean

MATCH (u1)-[r1:RATED]->(m:Movie)<-[r2:RATED]-(u2)
WITH u1, u1_mean, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 10

MATCH (u2)-[r:RATED]->(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings

UNWIND ratings AS r

WITH sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom,
     sqrt( sum( (r.r1.rating - u1_mean)^2) * sum( (r.r2.rating - u2_mean) ^2)) AS denom,
     u1, u2 WHERE denom <> 0

WITH u1, u2, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 10

MATCH (u2)-[r:RATED]->(m:Movie) WHERE NOT EXISTS( (u1)-[:RATED]->(m) )

RETURN m.title, SUM( pearson * r.rating) AS score
ORDER BY score DESC LIMIT 25<!--/code--></pre>
</div>
</div>
	</div>
  </div>
</slide>


<slide class="row-fluid">
  <div class="col-sm-12">
    <h3>Further Work</h3>
    <br/>
    <div>
      

   <h4>Resources</h4>
   <div class="ulist">
<ul>
<li>
<p>Web <a href="https://neo4j.com/docs/cypher-refcard/current/?ref=browser-guide" target="_blank">Cypher Refcard</a></p>
</li>
<li>
<p>Web <a href="https://neo4j.com/docs/?ref=browser-guide" target="_blank">Neo4j Documentation</a></p>
</li>
<li>
<p>Blog Post <a href="https://neo4j.com/blog/collaborative-filtering-creating-teams/?ref=browser-guide" target="_blank">Collaborative Filtering: Creating the Best Teams Ever</a></p>
</li>
<li>
<p>Video <a href="https://www.youtube.com/watch?v=60E2WV4iwIg" target="_blank">Data Science and Recommendations</a></p>
</li>
<li>
<p>Web <a href="https://neo4j.com/use-cases/real-time-recommendation-engine/?ref=browser-guide" target="_blank">Use-Case: Real-Time Recommendation Engines</a></p>
</li>
<li>
<p>Article: <a href="https://neo4j.com/developer-blog/exploring-practical-recommendation-systems-in-neo4j/" target="_blank">Exploring Practical Recommendation Systems In Neo4j</a></p>
</li>
<li>
<p>Book (free download) <a href="https://neo4j.com/graph-data-science-for-dummies/" target="_blank">Graph Data Science For Dummies</a></p>
</li>
</ul>
</div>


   <h4>Exercises</h4>
   <div class="paragraph">
<p>Extend these queries:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">Temporal component</dt>
<dd>
<p>Preferences change over time, use the rating timestamp to consider how more recent ratings might be used to find more relevant recommendations.</p>
</dd>
<dt class="hdlist1">Keyword extraction</dt>
<dd>
<p>Enhance the traits available using the plot description.<br>
How would you model extracted keywords for movies?</p>
</dd>
<dt class="hdlist1">Image recognition using posters</dt>
<dd>
<p>There are several libraries and APIs that offer image recognition and tagging.<br>
Since we have movie poster images for each movie, how could we use these to enhance our recomendations?</p>
</dd>
</dl>
</div>
	</div>
  </div>
</slide>
  </carousel>
</article>