lingo: literate programming with Go + Markdown

lingo is a simple tool for literate programming with Go and Markdown. lingo is heavily inspired by tango, a similar tool designed for literate programming with Rust and Markdown.

When run, lingo will extract Go source code from fenced code blocks in each Markdown file in the current directory. Markdown files must use the .md extension, and code will only be extracted from fenced code blocks with the language go. Each Markdown file some-file.md that contains Go code will be converted into a file some-file.go.

To author a program with lingo, write your program as fenced code blocks in Markdown files, then add a .go file in the same directory with a //go:generate lingo directive preceding its package clause.

This file is the source for lingo itself; let's break it down!

Preamble

As usual, we start our program with a package clause followed by our import declarations. Because we're going to be working with Markdown, our only imports outside the standard library are from a fork of the Goldmark Markdown parser. We'll be using that package to parse Markdown into an AST that we'll then use as a basis for source code extraction.

package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"sort"

	"github.com/pgavlin/goldmark"
	"github.com/pgavlin/goldmark/ast"
	"github.com/pgavlin/goldmark/extension"
	goldmark_parser "github.com/pgavlin/goldmark/parser"
	"github.com/pgavlin/goldmark/text"
	"github.com/pgavlin/goldmark/util"
)

Source Position Mapping

Because we're essentially generating source code, we'd like the extracted source code to retain its original source positions. This allows downstream tools to reference positions in the Markdown rather than positions in the extracted code. Go gives us the ability to propagate this information through the use of line directives.
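As a rough illustration of how a downstream tool sees these directives, the standalone sketch below parses a small generated file with the standard go/parser package and asks the file set where a function is declared. The file names example.go and example.md, the function hello, and the helper declPosition are all made up for this example; none of this is part of lingo itself.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// generated imitates the kind of file lingo writes: a line directive
// followed by code copied out of a Markdown file (hypothetical content).
const generated = `//line example.md:12
package main

func hello() {}
`

// declPosition parses src and returns the directive-adjusted position of
// the function called name.
func declPosition(src, name string) (token.Position, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return token.Position{}, err
	}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok && fn.Name.Name == name {
			// Position applies //line directives to the raw position.
			return fset.Position(fn.Pos()), nil
		}
	}
	return token.Position{}, fmt.Errorf("function %q not found", name)
}

func main() {
	pos, err := declPosition(generated, "hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The directive says the line after it is line 12 of example.md,
	// so hello, two lines further down, is reported on line 14.
	fmt.Printf("%s:%d\n", pos.Filename, pos.Line)
}
```

The reported file name is the Markdown file rather than example.go, which is the effect lingo relies on.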

The only position information we need is the line number itself, as we'll be emitting directives of the form //line filename:line. Unfortunately, Goldmark does not track line information in its AST! It does, however, track the byte offset of each block of text, including the contents of code blocks. We can determine the line number of a code block ourselves by first building an index from byte offsets to line numbers from the Markdown source. This index is a simple list of integers in which entry E_i is the byte offset of the newline that ends line i (counting from zero). With this structure, we can determine the number of the line that contains a particular offset o by searching for the smallest index i where E_i > o; the 1-indexed line number containing o is then i + 1.

type lineIndex []int

func (index lineIndex) lineNumber(offset int) int {
	i := sort.Search(len(index), func(i int) bool {
		return index[i] > offset
	})
	return i + 1
}

func indexLines(f []byte) lineIndex {
	var index lineIndex
	for offset, b := range f {
		if b == '\n' {
			index = append(index, offset)
		}
	}
	return index
}
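To make the lookup concrete, here is a small standalone sketch that runs the index over a three-line input. It repeats the two definitions above so that it compiles separately from lingo; the sample input and the chosen offsets are only illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

type lineIndex []int

// lineNumber returns the 1-indexed line that contains offset.
func (index lineIndex) lineNumber(offset int) int {
	i := sort.Search(len(index), func(i int) bool {
		return index[i] > offset
	})
	return i + 1
}

// indexLines records the byte offset of every newline in f.
func indexLines(f []byte) lineIndex {
	var index lineIndex
	for offset, b := range f {
		if b == '\n' {
			index = append(index, offset)
		}
	}
	return index
}

func main() {
	// Offsets: 'a'=0, '\n'=1, 'b'=2, 'c'=3, '\n'=4, 'd'=5.
	src := []byte("a\nbc\nd")
	index := indexLines(src)
	fmt.Println(index) // [1 4]

	for _, offset := range []int{0, 2, 3, 5} {
		fmt.Printf("offset %d is on line %d\n", offset, index.lineNumber(offset))
	}
	// offset 0 is on line 1
	// offset 2 is on line 2
	// offset 3 is on line 2
	// offset 5 is on line 3
}
```

Because the comparison is strict, an offset that lands exactly on a newline byte (offset 1 above) is attributed to the following line rather than the line the newline ends.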

Extracting Go Source from Markdown

With our line index implemented, converting each file is straightforward. First, we read in the source code and build our line index:

func convertFile(name string) error {
	contents, err := os.ReadFile(name)
	if err != nil {
		return err
	}

	index := indexLines(contents)

Next, we parse the Markdown:

	parser := goldmark.DefaultParser()
	parser.AddOptions(goldmark_parser.WithParagraphTransformers(
		util.Prioritized(extension.NewTableParagraphTransformer(), 200),
	))
	document := parser.Parse(text.NewReader(contents))

Then, we walk the parsed AST, looking for fenced code blocks with the language go:

	var source bytes.Buffer
	ast.Walk(document, func(n ast.Node, enter bool) (ast.WalkStatus, error) {
		code, ok := n.(*ast.FencedCodeBlock)
		if !ok || !enter || string(code.Language(contents)) != "go" {
			return ast.WalkContinue, nil
		}

		lines := code.Lines()
		if lines.Len() == 0 {
			return ast.WalkContinue, nil
		}

When we find a suitable code block, we determine its line number, then emit a line directive followed by the contents of the code block into our output:

		lineNumber := index.lineNumber(lines.At(0).Start)
		fmt.Fprintf(&source, "//line %v:%v\n", name, lineNumber)

		for i := 0; i < lines.Len(); i++ {
			line := lines.At(i)
			source.Write(line.Value(contents))
		}

		return ast.WalkContinue, nil
	})

Finally, we emit the collected source into an output file and return. If the walk did not extract any source code, we do not emit an output file.

	if source.Len() == 0 {
		return nil
	}
	return os.WriteFile(name[:len(name)-3]+".go", source.Bytes(), 0600)
}
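To see the overall shape of the transformation without pulling in Goldmark, the standalone sketch below approximates it with a naive line scanner. This is deliberately not how convertFile works: it ignores indentation rules, tilde fences, nested containers, and everything else a real Markdown parser handles. The names extractGo and fence, and the sample document, are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// fence is three backticks, written with escapes so that this example can
// itself live inside a fenced code block.
const fence = "\x60\x60\x60"

// extractGo copies the contents of every fenced block whose info string is
// "go" into the result, preceding each block with a //line directive that
// names the 1-indexed Markdown line of the block's first content line.
// Blocks with no content lines produce no output.
func extractGo(name, markdown string) string {
	var out strings.Builder
	inFence, isGo, first := false, false, false
	for i, line := range strings.SplitAfter(markdown, "\n") {
		if line == "" {
			continue // trailing empty piece after the final newline
		}
		trimmed := strings.TrimSpace(line)
		switch {
		case !inFence && strings.HasPrefix(trimmed, fence):
			// Opening fence: remember whether the block is tagged go.
			inFence, first = true, true
			isGo = strings.TrimSpace(strings.TrimPrefix(trimmed, fence)) == "go"
		case inFence && trimmed == fence:
			// Closing fence.
			inFence = false
		case inFence && isGo:
			if first {
				fmt.Fprintf(&out, "//line %s:%d\n", name, i+1)
				first = false
			}
			out.WriteString(line)
		}
	}
	return out.String()
}

func main() {
	doc := "# Title\n\n" + fence + "go\npackage main\n" + fence + "\n\n" +
		fence + "sh\nls\n" + fence + "\n\n" +
		fence + "go\nfunc main() {}\n" + fence + "\n"
	fmt.Print(extractGo("doc.md", doc))
	// //line doc.md:4
	// package main
	// //line doc.md:12
	// func main() {}
}
```

The sh block is skipped entirely, and each go block gets its own directive, so positions stay accurate even when prose of any length sits between blocks.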

The Entry Point

The only thing left to do now is to implement lingo's entry point. The entry point is responsible for finding the Markdown files the tool will convert and driving their conversion using convertFile.

lingo operates on Markdown files in the current directory, so we begin by fetching the path of the current directory and listing its contents:

func main() {
	wd, err := os.Getwd()
	if err != nil {
		log.Fatalf("could not determine current directory: %v", err)
	}

	entries, err := os.ReadDir(wd)
	if err != nil {
		log.Fatalf("could not read current directory: %v", err)
	}

Then, we iterate over the directory's contents and attempt to convert each .md file to a .go file:

	for _, entry := range entries {
		name := entry.Name()
		ext := filepath.Ext(name)
		if ext != ".md" {
			continue
		}
		if err = convertFile(name); err != nil {
			log.Fatalf("could not convert file '%v': %v", name, err)
		}
	}
}

And we're done!

Hacking on lingo

Hacking on lingo is a little bit different from working with a more traditional Go code base. In order to make changes to lingo itself, you'll need to edit this file, then run go generate. You should commit the changes to this file and the changes to README.go:

$ touch README.md
$ go generate
$ git add README.{md,go}

Building and Installing

To build or install lingo, just run go install from the root of the repository.

Testing

Before testing lingo, first build it using the instructions above. Once you've built lingo, you can run the tests by invoking go test. The test data for lingo lives in the directories under testdata. Each directory contains a single test, with the inputs in the directory itself and the expected outputs in the expected subdirectory.

Tests are driven by the code in main_test.go. Each test runs lingo in a particular test directory, then compares the contents of the files in the directory with the contents of the expected subdirectory.
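The contents of main_test.go are not reproduced here, so the following is only a guess at the shape of that comparison, not the actual test code. It assumes that every regular file under expected must have a byte-identical counterpart in the test directory; the helper names compareWithExpected and runSketch and the file doc.go are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// compareWithExpected checks that each regular file in dir/expected has an
// identically named file in dir with the same contents.
func compareWithExpected(dir string) error {
	expectedDir := filepath.Join(dir, "expected")
	entries, err := os.ReadDir(expectedDir)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		if entry.IsDir() {
			continue
		}
		want, err := os.ReadFile(filepath.Join(expectedDir, entry.Name()))
		if err != nil {
			return err
		}
		got, err := os.ReadFile(filepath.Join(dir, entry.Name()))
		if err != nil {
			return err
		}
		if !bytes.Equal(got, want) {
			return fmt.Errorf("%s: contents differ from expected", entry.Name())
		}
	}
	return nil
}

// runSketch builds a throwaway test directory whose generated file holds
// got, then compares it against a fixed expected output.
func runSketch(got string) error {
	dir, err := os.MkdirTemp("", "lingo-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	if err := os.Mkdir(filepath.Join(dir, "expected"), 0o700); err != nil {
		return err
	}
	want := []byte("//line doc.md:4\npackage main\n")
	if err := os.WriteFile(filepath.Join(dir, "expected", "doc.go"), want, 0o600); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(dir, "doc.go"), []byte(got), 0o600); err != nil {
		return err
	}
	return compareWithExpected(dir)
}

func main() {
	if err := runSketch("//line doc.md:4\npackage main\n"); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("outputs match")
}
```

A real test would run lingo in the directory before comparing; the sketch writes the generated file by hand to stay self-contained.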