mdast utility that turns markdown into a syntax tree
- What is this?
- When should I use this?
- Install
- Use
- API
- List of extensions
- Syntax
- Syntax tree
- Security
- Related
- Types
- Contribute
This package is a utility that takes markdown input and turns it into a markdown abstract syntax tree.
This utility uses micromark, which turns markdown into tokens, and then turns those tokens into nodes.
If you want to handle syntax trees manually, use this.
When you just want to turn markdown into HTML, use micromark instead.
For an easier time processing content, use the remark ecosystem instead.
This package is ESM only.
In Node.js (version 18+) with yarn:
yarn add @flex-development/mdast-util-from-markdown
See Git - Protocols | Yarn for details regarding installing from Git.
In Deno with esm.sh:
import { fromMarkdown } from 'https://esm.sh/@flex-development/mdast-util-from-markdown'
In browsers with esm.sh:
<script type="module">
import { fromMarkdown } from 'https://esm.sh/@flex-development/mdast-util-from-markdown'
</script>
Say we have the following markdown file example.md:
## Hello, *World*!
…and our module example.mjs looks as follows:
import { fromMarkdown } from '@flex-development/mdast-util-from-markdown'
import { inspect } from '@flex-development/unist-util-inspect'
import { read } from 'to-vfile'
const file = await read('example.md')
const tree = fromMarkdown(String(file))
console.log(inspect(tree))
…now running node example.mjs yields:
root[1] (1:1-2:1, 0-19)
└─0 heading[3] (1:1-1:19, 0-18)
    │ depth: 2
    ├─0 text "Hello, " (1:4-1:11, 3-10)
    ├─1 emphasis[1] (1:11-1:18, 10-17)
    │   └─0 text "World" (1:12-1:17, 11-16)
    └─2 text "!" (1:18-1:19, 17-18)
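To make the shape of that tree concrete, here is a minimal sketch that walks it by hand. The tree literal is typed out from the output above (positions omitted) rather than produced by fromMarkdown, and collectText is a hypothetical helper, not part of this package.

```javascript
// Hand-written copy of the tree printed above (positions omitted).
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 2,
      children: [
        { type: 'text', value: 'Hello, ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'World' }] },
        { type: 'text', value: '!' }
      ]
    }
  ]
}

// Hypothetical helper: concatenate the values of all literal nodes, depth first.
function collectText(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(collectText).join('')
}

console.log(collectText(tree)) // prints "Hello, World!"
```

Parent nodes carry children and literal nodes carry value, so one recursive function is enough to visit everything.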
Turn markdown into a syntax tree.
(value: Value | null | undefined, encoding?: Encoding | null | undefined, options?: Options) => Root
(value: Value | null | undefined, options?: Options | null | undefined) => Root
- value (Value | null | undefined): markdown to parse
- encoding (Encoding | null | undefined, optional): character encoding for when value is Uint8Array
  - default: 'utf8'
- options (Options | null | undefined, optional): configuration
Returns mdast (Root).
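The two call signatures differ only in whether the second argument is an encoding or the options object. Below is a minimal sketch of how such arguments could be told apart, and how a Uint8Array value could be decoded. normalizeArguments and toText are hypothetical helpers written for illustration; the package's internal handling may differ.

```javascript
// Hypothetical helper: map both call signatures onto one shape.
function normalizeArguments(value, encoding, options) {
  // A non-null object in second position is assumed to be the options.
  if (encoding !== null && typeof encoding === 'object') {
    options = encoding
    encoding = undefined
  }
  return { value, encoding: encoding || undefined, options: options || {} }
}

// Hypothetical helper: decode binary input, defaulting to UTF-8.
function toText(value, encoding) {
  if (value === null || value === undefined) return ''
  if (typeof value === 'string') return value
  return new TextDecoder(encoding || 'utf-8').decode(value)
}

const bytes = new TextEncoder().encode('## Hello')
const args = normalizeArguments(bytes, { mdastExtensions: [] })
console.log(args.encoding) // prints "undefined"
console.log(toText(args.value, args.encoding)) // prints "## Hello"
```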
Create an mdast compiler.
👉 The compiler only understands complete buffering, not streaming.
- options (Options | null | undefined, optional): configuration
Returns an mdast compiler (Compiler).
Token types mapped to default token handlers (Handles).
👉 Default handlers are also exported by name. See src/handles.ts for more info.
CompileContext: mdast compiler context (TypeScript type).
- buffer ((this: CompileContext) => undefined): capture some of the output data
- config (Config): configuration
- data (CompileData): info passed around; key/value store
- enter ((this: CompileContext, node: Nodes, token: Token, onError?: OnEnterError) => undefined): enter a node
- exit ((this: CompileContext, token: Token, onError?: OnExitError) => undefined): exit a node
- resume ((this: CompileContext) => string): stop capturing and access the output data
- sliceSerialize (TokenizeContext['sliceSerialize']): get the string value of a token
- stack (StackedNode[]): stack of nodes
- tokenStack (TokenTuple[]): stack of tokens
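The interplay of enter, exit, stack, and tokenStack can be sketched with a miniature stand-in. Everything below is hypothetical and written for illustration: createContext is not exported by this package, positions, buffer, resume, and the error callbacks are left out, and the token objects are hand-written.

```javascript
// Hypothetical miniature of a compile context, for illustration only.
function createContext() {
  const root = { type: 'root', children: [] }
  return {
    stack: [root],
    tokenStack: [],
    // Attach `node` to the current parent, then make it the current parent.
    enter(node, token) {
      const parent = this.stack[this.stack.length - 1]
      parent.children.push(node)
      this.stack.push(node)
      this.tokenStack.push([token, undefined])
    },
    // Close the current node; the closing token must match the open one.
    exit(token) {
      const open = this.tokenStack.pop()
      if (!open || open[0].type !== token.type) {
        throw new Error('Cannot close `' + token.type + '`: it is not open')
      }
      this.stack.pop()
    }
  }
}

const context = createContext()
const heading = { type: 'atxHeading' } // hand-written tokens
const text = { type: 'data' }
context.enter({ type: 'heading', depth: 2, children: [] }, heading)
context.enter({ type: 'text', value: 'Hello' }, text)
context.exit(text)
context.exit(heading)
console.log(JSON.stringify(context.stack[0]))
```

After the last exit only the root remains on the stack, and it holds the heading with its text child.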
Interface of tracked data (TypeScript interface).
interface CompileData {/* see code */}
When developing extensions that use more data, augment CompileData to register custom fields:
declare module 'mdast-util-from-markdown' {
  interface CompileData {
    mathFlowInside?: boolean | undefined
  }
}
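At runtime, such a field is an ordinary property on the context's data store. Here is a minimal sketch, assuming a pair of hypothetical handlers (enterMathFlow, exitMathFlow) and a stand-in context object that models only the data property:

```javascript
// Stand-in for the compile context; only `data` is modelled here.
const context = { data: {} }

// Hypothetical enter handler: remember that we are inside flow math.
function enterMathFlow() {
  this.data.mathFlowInside = true
}

// Hypothetical exit handler: clear the flag again.
function exitMathFlow() {
  this.data.mathFlowInside = undefined
}

enterMathFlow.call(context)
console.log(context.data.mathFlowInside) // prints "true"
exitMathFlow.call(context)
console.log(context.data.mathFlowInside) // prints "undefined"
```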
Compiler: turn micromark events into a syntax tree (TypeScript type).
- events (Event[]): list of events
Returns mdast (Root).
Config: configuration (TypeScript type).
- canContainEols (string[]): token types where line endings are used
- enter (Handles): opening handles
- exit (Handles): closing handles
- transforms (Transform[]): tree transforms
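Since an extension is a partial configuration, combining extensions amounts to folding each one into a full configuration. The sketch below assumes that list fields are appended to and that handle maps are merged key by key; mergeExtension is a hypothetical helper and the strikethrough handlers are placeholders.

```javascript
// Hypothetical helper: fold one extension (a partial config) into a config.
function mergeExtension(config, extension) {
  for (const key of ['canContainEols', 'transforms']) {
    // List fields are appended to.
    if (extension[key]) config[key].push(...extension[key])
  }
  for (const key of ['enter', 'exit']) {
    // Handle maps are merged; a later extension wins per token type.
    if (extension[key]) Object.assign(config[key], extension[key])
  }
  return config
}

const config = { canContainEols: ['emphasis'], enter: {}, exit: {}, transforms: [] }
const extension = {
  canContainEols: ['delete'],
  enter: { strikethrough() {} }, // placeholder handlers
  exit: { strikethrough() {} }
}
mergeExtension(config, extension)
console.log(config.canContainEols) // prints [ 'emphasis', 'delete' ]
console.log(Object.keys(config.enter)) // prints [ 'strikethrough' ]
```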
Encodings supported by TextEncoder (TypeScript type).
See micromark-util-types for more info.
type Encoding =
  | 'utf-8' // always supported in node
  | 'utf-16le' // always supported in node
  | 'utf-16be' // not supported when ICU is disabled
  | (string & {}) // everything else (depends on browser, or full ICU data)
The start or end of a token amongst other events (TypeScript type).
See micromark-util-types for more info.
type Event = ['enter' | 'exit', Token, TokenizeContext]
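Because every token is entered once and exited once, a list of events describes a tree. The sketch below renders hand-written events as an indented outline and checks that they nest properly. outline is a hypothetical helper, the events are typed out by hand rather than produced by micromark, and it assumes that the enter and exit events of one token share the same token object.

```javascript
// Hypothetical helper: render events as an indented outline and check nesting.
function outline(events) {
  const lines = []
  const open = []
  for (const [kind, token] of events) {
    if (kind === 'enter') {
      lines.push('  '.repeat(open.length) + token.type)
      open.push(token)
    } else {
      const last = open.pop()
      if (last !== token) throw new Error('Unbalanced exit: ' + token.type)
    }
  }
  if (open.length > 0) throw new Error('Unclosed token: ' + open[0].type)
  return lines.join('\n')
}

// Hand-written events; the third item stands in for a TokenizeContext.
const heading = { type: 'atxHeading' }
const text = { type: 'atxHeadingText' }
const context = {}
const events = [
  ['enter', heading, context],
  ['enter', text, context],
  ['exit', text, context],
  ['exit', heading, context]
]
console.log(outline(events))
// prints:
// atxHeading
//   atxHeadingText
```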
Change how tokens are turned into nodes (TypeScript type).
See Config for more info.
type Extension = Partial<Config>
Temporary node (TypeScript type).
type Fragment = Omit<mdast.Parent, 'children' | 'type'> & {
  children: mdast.PhrasingContent[]
  type: 'fragment'
}
- children (mdast.PhrasingContent[]): list of children
- type ('fragment'): node type
Handle: handle a token (TypeScript type).
- this (CompileContext): compiler context
- token (Token): token to handle
Returns nothing (undefined | void).
Token types mapped to handles (TypeScript type).
type Handles = Record<string, Handle>
OnEnterError: handle the case where the right token is open, but is closed by the left token, or because end of file was reached (TypeScript type).
- this (Omit<CompileContext, 'sliceSerialize'>): compiler context
- left (Token | undefined): left token
- right (Token): open token
Returns nothing (undefined).
OnExitError: handle the case where the right token is open, but is closed by exiting the left token (TypeScript type).
- this (Omit<CompileContext, 'sliceSerialize'>): compiler context
- left (Token): left token
- right (Token): open token
Returns nothing (undefined).
Options: configuration options (TypeScript type).
- extensions? (micromark.Extension[] | null | undefined): micromark extensions to change how markdown is parsed
- from? (StartPoint | null | undefined): point before first character in markdown value; node positions will be relative to this point
- mdastExtensions? ((Extension | Extension[])[] | null | undefined): extensions for this utility to change how tokens are turned into nodes
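To illustrate what positions relative to a start point could look like, here is a minimal sketch. It assumes that positions on the first line are shifted by the start column and that later lines begin at column 1 again. shiftPoint is a hypothetical helper, not part of this package, and the sample numbers are made up.

```javascript
// Hypothetical helper: express a point measured from 1:1 (offset 0)
// relative to a start point `from`.
function shiftPoint(point, from) {
  return {
    line: point.line + from.line - 1,
    // Only the first line is pushed right; later lines start at column 1 again.
    column: point.line === 1 ? point.column + from.column - 1 : point.column,
    offset: point.offset + from.offset
  }
}

const from = { line: 3, column: 5, offset: 40 }
console.log(shiftPoint({ line: 1, column: 4, offset: 3 }, from))
// prints { line: 3, column: 8, offset: 43 }
console.log(shiftPoint({ line: 2, column: 1, offset: 19 }, from))
// prints { line: 4, column: 1, offset: 59 }
```

This is the situation when a markdown fragment is embedded in a larger document and the resulting nodes should point back into that document.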
Point: a location in the source document and chunk (TypeScript type).
See micromark-util-types for more info.
A node on the compiler context stack (TypeScript type).
type StackedNode = Fragment | mdast.Nodes
Point before first character in a markdown value (TypeScript type).
type StartPoint = Omit<Point, '_bufferIndex' | '_index'>
List containing an open token on the stack, and an optional error handler to use if the token isn't closed properly (TypeScript type).
type TokenTuple = [token: Token, handler: OnEnterError | undefined]
Token: a span of chunks (TypeScript interface).
See micromark-util-types for more info.
TokenizeContext: a context object that helps with tokenizing markdown constructs (TypeScript interface).
See micromark-util-types for more info.
Transform: extra transform, to change the AST afterwards (TypeScript type).
- tree (Root): tree to transform
Returns a new tree or nothing, in which case the current tree is used (Root | null | undefined | void).
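The "new tree or nothing" contract can be sketched with a small runner that keeps the current tree whenever a transform returns nothing. applyTransforms, upperCaseText, and wrapInParagraph are hypothetical names for illustration, and the input tree is written by hand.

```javascript
// Hypothetical runner: apply transforms in order, keeping the current tree
// whenever a transform returns nothing.
function applyTransforms(tree, transforms) {
  for (const transform of transforms) {
    tree = transform(tree) || tree
  }
  return tree
}

// Mutates in place and returns nothing.
function upperCaseText(tree) {
  for (const child of tree.children) {
    if (child.type === 'text') child.value = child.value.toUpperCase()
  }
}

// Returns a new tree.
function wrapInParagraph(tree) {
  return {
    type: 'root',
    children: [{ type: 'paragraph', children: tree.children }]
  }
}

const input = { type: 'root', children: [{ type: 'text', value: 'hi' }] }
const output = applyTransforms(input, [upperCaseText, wrapInParagraph])
console.log(output.children[0].children[0].value) // prints "HI"
```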
Contents of a file.
See micromark-util-types for more info.
type Value = Uint8Array | string
- mdast-util-directive: directives
- mdast-util-frontmatter: frontmatter (YAML, TOML, more)
- mdast-util-gfm: GFM
- mdast-util-gfm-autolink-literal: GFM autolink literals
- mdast-util-gfm-footnote: GFM footnotes
- mdast-util-gfm-strikethrough: GFM strikethrough
- mdast-util-gfm-table: GFM tables
- mdast-util-gfm-task-list-item: GFM task list items
- syntax-tree/mdast-util-math: math
- syntax-tree/mdast-util-mdx: MDX
- syntax-tree/mdast-util-mdx-expression: MDX expressions
- syntax-tree/mdast-util-mdx-jsx: MDX JSX
- syntax-tree/mdast-util-mdxjs-esm: MDX ESM
Markdown is parsed according to CommonMark. Extensions can add support for other syntax. If you’re interested in extending markdown, more information is available in micromark’s readme.
The syntax tree is mdast.
This package is fully typed with TypeScript.
As markdown is sometimes used for HTML, and improper use of HTML can open you up to a cross-site scripting (XSS) attack, use of mdast-util-from-markdown can also be unsafe.
When going to HTML, use this utility in combination with hast-util-sanitize to make the tree safe.
- mdast-util-to-markdown: serialize mdast as markdown
- micromark: parse markdown
- remark: process markdown
See CONTRIBUTING.md.