A port of willdurand/EmailReplyParser (itself a port of github/email_reply_parser) to Node.js.
willdurand's port of the GitHub library supports multiple spoken languages and has some enhanced parsing, so it was used as the template for this library.
This package can be found on npm:
npm install node-email-reply-parser
Note: This package is ES6 compatible and does not transpile automatically.
var replyParser = require("node-email-reply-parser");
var email = replyParser(emailContent);
email has two primary methods:
- getFragments(): returns all the fragments of the email
- getVisibleText(): returns the text that is considered 'visible'
getVisibleText() accepts an optional options object:
getVisibleText({aggressive: true})
Setting aggressive to true makes the parser assume that any fragment which is not hidden, but which is both preceded and followed by a hidden fragment, should not be visible. This works around the issue of clients breaking quoted text into multiple lines (e.g. Gmail).
Using aggressive mode runs the risk of losing visible lines which are interspersed with quoted lines, but is useful when parsing e.g. emails from a 'reply by email' feature which contain a large block of quoted text.
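The aggressive rule described above can be sketched without the library. This is a minimal illustration, not the package's actual implementation: the plain objects and their content and hidden fields are hypothetical stand-ins for real fragments.

```javascript
// Illustrative only: plain objects stand in for the parser's fragments.
// `content` and `hidden` are hypothetical field names, not the library's API.
function visibleText(fragments, options) {
  const aggressive = Boolean(options && options.aggressive);
  const kept = fragments.filter((fragment, i) => {
    if (fragment.hidden) return false;
    if (!aggressive) return true;
    const prev = fragments[i - 1];
    const next = fragments[i + 1];
    // Drop a visible fragment that sits between two hidden fragments.
    const sandwiched = Boolean(prev && prev.hidden && next && next.hidden);
    return !sandwiched;
  });
  return kept.map((fragment) => fragment.content).join("\n");
}

const fragments = [
  { content: "Thanks, that works for me.", hidden: false },
  { content: "> On Monday, Bob wrote:", hidden: true },
  { content: "wrapped tail of a quoted line", hidden: false },
  { content: "> more quoted text", hidden: true },
];

console.log(visibleText(fragments));
console.log(visibleText(fragments, { aggressive: true }));
```

In this sketch the third fragment survives the default mode but is dropped in aggressive mode, which is the trade-off noted above: a genuine inline reply in that position would be lost as well.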
A fragment has the following functions:
- getContent(): returns the content of the fragment
- isSignature(): returns whether or not the fragment is likely a signature
- isQuoted(): returns whether or not the fragment is likely a quote
- isHidden(): returns whether or not the text is considered 'hidden'
- isEmpty(): returns whether or not the fragment has any content
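As a rough sketch of how the fragment functions might be used together, the helper below summarises each fragment of a parsed email. describeFragments, fakeFragment and fakeEmail are hypothetical names introduced here; the stubs exist only so the example runs without the package installed.

```javascript
// Summarise each fragment using only the accessor functions listed above.
function describeFragments(email) {
  return email.getFragments().map((fragment) => ({
    content: fragment.getContent(),
    signature: fragment.isSignature(),
    quoted: fragment.isQuoted(),
    hidden: fragment.isHidden(),
    empty: fragment.isEmpty(),
  }));
}

// Hypothetical stand-in for a real fragment, for illustration only.
function fakeFragment(content, flags) {
  return {
    getContent: () => content,
    isSignature: () => Boolean(flags.signature),
    isQuoted: () => Boolean(flags.quoted),
    isHidden: () => Boolean(flags.signature || flags.quoted),
    isEmpty: () => content.trim() === "",
  };
}

// Hypothetical stand-in for the object returned by replyParser(emailContent).
const fakeEmail = {
  getFragments: () => [
    fakeFragment("Sounds good.", {}),
    fakeFragment("--\nRick", { signature: true }),
  ],
};

console.log(describeFragments(fakeEmail));
```

With the package installed, the value returned by replyParser(emailContent) would take the place of fakeEmail.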
Passing true as the second argument to replyParser makes it return just the visible text:
var visibleText = replyParser(emailContent, true);
The parser is not able to pick up some of the edge cases. They are outlined below.
Quoted headers aren't picked up if there's an extra line break:
On <date>, <author> wrote:

> blah
They also aren't picked up if the email client breaks the header across multiple lines, as Gmail does with its automatic 80-column limit:
On <date>, <author>
wrote:
> blah
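To see why a wrapped header is hard to catch, consider a matcher that tests one line at a time. The pattern below is a hypothetical simplification, not the expression this library uses, and the sample dates and names are made up.

```javascript
// Hypothetical single-line pattern, not the library's actual expression.
const QUOTE_HEADER = /^On .+ wrote:$/;

// Returns the index of the first line that looks like a quote header, or -1.
function findQuoteHeader(text) {
  return text.split("\n").findIndex((line) => QUOTE_HEADER.test(line.trim()));
}

const singleLine = "Hi\nOn Mon, Mar 14, 2011, Bob wrote:\n> blah";
const wrapped = "Hi\nOn Mon, Mar 14, 2011, Bob\nwrote:\n> blah";

console.log(findQuoteHeader(singleLine));
console.log(findQuoteHeader(wrapped));
```

Neither half of the wrapped header matches on its own, so a line-based check would need to join adjacent lines before testing.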
Lines starting with - or _ sometimes mark the beginning of signatures:
Hello
--
Rick
Not everyone follows this convention:
Hello
Mr Rick Olson
Galactic President Superstar Mc Awesomeville
GitHub
**********************DISCLAIMER***********************************
* Note: blah blah blah *
**********************DISCLAIMER***********************************
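The delimiter convention can be sketched as a simple heuristic. splitSignature and SIGNATURE_START are hypothetical names, and this is an assumption about how such a check could look rather than the parser's real logic.

```javascript
// Hypothetical heuristic: a line starting with "-" or "_" opens the signature.
const SIGNATURE_START = /^[-_]/;

function splitSignature(text) {
  const lines = text.split("\n");
  const index = lines.findIndex((line) => SIGNATURE_START.test(line));
  if (index === -1) {
    // No delimiter line: everything is treated as body text.
    return { body: text, signature: "" };
  }
  return {
    body: lines.slice(0, index).join("\n"),
    signature: lines.slice(index).join("\n"),
  };
}

const conventional = "Hello\n--\nRick";
const unconventional =
  "Hello\nMr Rick Olson\nGalactic President Superstar Mc Awesomeville\nGitHub";

console.log(splitSignature(conventional));
console.log(splitSignature(unconventional));
```

The second message has no delimiter line, so the heuristic leaves the name, title and company in the body.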
Apparently, prefixing lines with > isn't universal either:
Hello
--
Rick
________________________________________
From: Bob [[email protected]]
Sent: Monday, March 14, 2011 6:16 PM
To: Rick
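A check that relies only on the > prefix finds nothing to hide in a message like this one. The sketch below is illustrative: quotedLines is a hypothetical helper, and the address in the From line is a placeholder.

```javascript
// Naive check: treat only lines starting with ">" as quoted.
function quotedLines(text) {
  return text.split("\n").filter((line) => line.trimStart().startsWith(">"));
}

// Placeholder address; the rest mirrors the example above.
const outlookStyle = [
  "Hello",
  "--",
  "Rick",
  "________________________________________",
  "From: Bob [bob@example.com]",
  "Sent: Monday, March 14, 2011 6:16 PM",
  "To: Rick",
].join("\n");

console.log(quotedLines(outlookStyle).length);
console.log(quotedLines("Hi\n> blah").length);
```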
Install the required dependencies:
npm install --dev
Run the tests:
npm test
Please see the CONTRIBUTING.md file.
- GitHub - Amazing Ruby gem for parsing emails
- William Durand - Extensive PHP version of GitHub's library
node-email-reply-parser is released under the MIT License. Please see the bundled LICENSE file for details.