dmitriid/pegjs

An implementation of PEG.js grammar for Erlang

This is a rather straightforward port/implementation of the grammar defined for PEG.js.

Current status

  • As far as I can tell, implements everything from the PEG.js grammar

  • Generates complete, usable parsers

  • The project is bootstrapped (see priv/pegjs_parse.pegjs). Original grammar for Neotoma is also available in priv/pegjs_parse.peg

  • It's based on an earlier definition of the grammar (probably this) than the one that currently exists for PEG.js.

    A current-ish version of the grammar has been ported to priv/parser.pegjs, but it causes the VM to quit with an out-of-memory exception on sufficiently large grammars (including its own). See the How to contribute section for more info

  • Implements support for the @append extension (see, e.g. core-pegjs in the for-GET project)

Further work

  • Dialyze, create dialyzer-friendly parsers

How to use

> pegjs:file("extra/csv_pegjs.peg").
ok
> c("extra/csv_pegjs.erl").
{ok, csv_pegjs}
> csv_pegjs:parse("a,b,c").
[{<<"head">>,
  [{<<"head">>,[[[[],[],<<"a">>]]]},
   {<<"tail">>,
    [[[<<",">>],[[[[],[],<<"b">>]]]],
     [[<<",">>],[[[[],[],<<"c">>]]]]]}]},
 {<<"tail">>,[]}]
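
The raw result is a nested list of labelled tuples, empty lists and binaries. As a minimal sketch (csv_fields is a hypothetical helper module, not part of pegjs, and it assumes the tree has exactly the shape shown above), the field values can be pulled out by collecting every binary leaf and dropping the separators:

```erlang
-module(csv_fields).
-export([fields/1]).

%% Hypothetical helper: assumes the tree shape printed above for "a,b,c".
%% Collect every binary leaf, left to right, and drop the "," separators.
fields(Tree) ->
    [Leaf || Leaf <- leaves(Tree), Leaf =/= <<",">>].

%% Labelled node: ignore the label, descend into the value.
leaves({_Label, Value}) -> leaves(Value);
%% List node (possibly empty): concatenate the leaves of all children.
leaves(Nodes) when is_list(Nodes) -> lists:flatmap(fun leaves/1, Nodes);
%% Binary leaf: matched text.
leaves(Bin) when is_binary(Bin) -> [Bin].
```

For the tree shown above, csv_fields:fields/1 returns [<<"a">>,<<"b">>,<<"c">>]. Whether longer fields arrive as a single binary or as several fragments depends on the grammar, so treat this as a starting point only.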

There are several options you can pass along to pegjs:file(File, Options::options()):

-type options() :: [option()].

%% options for pegjs

-type option()  :: {output, Dir::string() | binary()} %% where to put the generated file
                                                      %% Default: directory of the input file
                 | {module, string() | binary()}      %% to change the module name
                                                      %% Default: name of the input file
                 | pegjs_analyze:option().

%% options for pegjs_analyze

-type option()  :: {ignore_unused, boolean()}        %% ignore unused rules. Default: true
                 | {ignore_duplicates, boolean()}    %% ignore duplicate rules. Default: false
                 | {ignore_unparsed, boolean()}      %% ignore incomplete parses. Default: false
                 | {ignore_missing_rules, boolean()} %% Default: false
                 | {ignore_invalid_code, boolean()}  %% Default: false
                 | {parser, atom()}                  %% use a different module to parse grammars. 
                                                     %% Default: pegjs_parse
                 | {root, Dir::string() | binary()}. %% root directory for @append instructions. 
                                                     %% Default: undefined
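
As an illustration, several of these options can be combined in one call. This is only a sketch: the module name csv_parser is made up for the example, and the exact effect of each ignore_* flag should be checked against pegjs_analyze.

```erlang
%% Sketch only: "csv_parser" is a hypothetical module name.
Options = [{output, "src"},             %% write the generated file into src/
           {module, "csv_parser"},      %% name the generated module csv_parser
           {ignore_duplicates, true},   %% override the default (false)
           {ignore_unused, false}],     %% override the default (true)
pegjs:file("extra/csv_pegjs.peg", Options).
```

Options that are left out keep the defaults listed in the type definitions above.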

How to contribute/develop

Suggestions and improvements are more than welcome!

The grammar in priv/pegjs_parse.peg is written for Neotoma, so you need Neotoma to work with that version of the grammar.

The pegjs_analyze module is inspired by neotoma_analyze from the 2.0-refactor branch of neotoma.

Non-generated parser combinators can be found in priv/pegjs.template.

A safe working parser is always available at src/pegjs_parse.erl.safe.

pegjs grammar

The current grammar from which the project is now bootstrapped lives in priv/pegjs_parse.pegjs. When you've tweaked it and want to try your changes, generate a different module and tell pegjs to use your new module instead:

> pegjs:file("priv/pegjs_parse.pegjs", [{output, "src"}, {module, modified_parser}]).
ok
> c(modified_parser).
{ok, modified_parser}
> pegjs:file("extra/json.pegjs", [{parser, modified_parser}]).
ok
... etc. ...
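
If you repeat this cycle often, the three steps can be wrapped in a small function. The following is a sketch under assumptions, not part of the project: the module name try_grammar is hypothetical, and it assumes that pegjs:file/2 returns ok on success (as in the session above) and that the generated source lands in src/modified_parser.erl.

```erlang
-module(try_grammar).
-export([run/1]).

%% Hypothetical convenience wrapper around the shell session above.
%% TestGrammar is any grammar file to parse with the modified parser,
%% e.g. "extra/json.pegjs".
run(TestGrammar) ->
    %% 1. Regenerate the grammar parser under a different module name.
    ok = pegjs:file("priv/pegjs_parse.pegjs",
                    [{output, "src"}, {module, modified_parser}]),
    %% 2. Compile and load it (c:c/1 is the function behind the shell's c/1).
    {ok, modified_parser} = c:c("src/modified_parser.erl"),
    %% 3. Use the freshly built parser on a test grammar.
    pegjs:file(TestGrammar, [{parser, modified_parser}]).
```

A failed match in step 1 or 2 stops the run early, so a broken grammar never reaches step 3.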

Once you're satisfied with your changes, overwrite pegjs_parse (which is used by default):

> pegjs:file("priv/pegjs_parse.pegjs", [{output, "src"}]).
ok
> c(pegjs_parse).
{ok, pegjs_parse}
> pegjs:file("extra/json.pegjs").
ok
... etc. ...

Up-to-date grammar

A port of a current-ish version of the PEG.js grammar can be found in priv/parser.pegjs. src/pegjs.erl, src/pegjs.hrl and src/pegjs_analyze.erl have all been updated to work with this grammar and will generate a parser for you. Note, however, that priv/pegjs.template doesn't contain code for the action combinator.

To generate a parser from this grammar:

> pegjs:file("priv/parser.pegjs", [{output, "src/"}]).
ok
> c("src/parser.erl").
{ok, parser}
> pegjs:file("extra/csv.pegjs", [{parser, parser}]).
ok
... etc ...

However, the generated parser causes the VM to fail with an out-of-memory exception for sufficiently large grammars (including parser.pegjs). YMMV. The culprit is the escape/1,2 function (see the initializer section of the grammar). I haven't figured out what to do about this yet.

Original neotoma

The original parser for pegjs was derived from a grammar defined for Neotoma. You can also start your work from there:

> neotoma:file("priv/pegjs_parse.peg", [{output, "src/"}]).
ok
> pegjs:file(.... etc ... )

However, the original grammar will get increasingly outdated as time goes on, so it's there for reference only.