GitHub - mikejholly/pillage: Extract content and metadata from a web page.

Pillage is a Node.js library for extracting content and metadata from web pages. It uses a scoring algorithm✝ to identify the main content region of a page. Once we have the content region we can parse out text, images, videos and other media. We also threw in a lot of the easy stuff, like Open Graph (OG) tags, for your convenience.

✝ It basically searches for every text node, then recursively climbs the parent tree, assigning a weighted "score" based on text length to each parent. The value drops off rapidly as we move up the tree. This is done for all text nodes, so the weights accumulate and identify the most probable shared parent. Once we have that wrapper we can make assumptions and easily parse out the body content.

Install

npm install pillage

Usage

var pillage = require('pillage');

// Fetch a URL and process
pillage(url, function(err, result) {
  console.log(result);
});

// or, process HTML directly
var result = pillage(html);
console.log(result);

 // Here's the object structure that it will return
 return {
   title: extractTitle(html),
   description: extractDescription(html),
   text: extractText(html),
   images: extractImages(html),
   videos: extractVideos(html),
   twitterTags: extractTwitterTags(html),
   openGraphTags: extractOpenGraphTags(html),
   articleTags: extractArticleTags(html),
   oEmbed: extractOEmbed(html),
 };
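The README does not say which of these fields are always populated, so a caller may want to check a result before relying on it. The helper below is a small sketch of that check; `missingKeys` and `EXPECTED_KEYS` are hypothetical names that are not part of Pillage, and the key list is copied from the structure above.

```javascript
// Keys listed in the result structure documented above.
var EXPECTED_KEYS = [
  'title', 'description', 'text', 'images', 'videos',
  'twitterTags', 'openGraphTags', 'articleTags', 'oEmbed'
];

// Return the documented keys that are absent (or undefined) on a result.
function missingKeys(result) {
  return EXPECTED_KEYS.filter(function (key) {
    return result == null || result[key] === undefined;
  });
}

// Hand-written sample object standing in for a real Pillage result.
var sample = { title: 'Example', text: 'Body text', images: [] };
console.log(missingKeys(sample));

// Possible use with the callback form shown earlier:
// pillage(url, function (err, result) {
//   if (err) { console.error(err); return; }
//   console.log(missingKeys(result));
// });
```

Checking for `undefined` rather than falsy values means an empty string or an empty array still counts as present.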

License

MIT

Author

Mike Holly
