Revisiting the idea of DOCX partials #97

stadelmanma · 2018-03-12T19:25:15Z

@senny I threw this idea at you a while back, before I was familiar with how Sablon actually worked and how to manage the internals of a docx file. Ultimately you decided against it because of the complexity and the fact that merging two Word docs requires successfully transferring a lot of content.
However, I would like to circle back to it because, for my company's use case, it offers a much less error-prone method of generating the complex MS Word reports we need, and others might benefit from it as well.

Our Use Case/Need

Currently, our report generation system is a large number of ERB templates that build on each other to create the report structure section by section (in some cases a single section is complex enough to warrant more than one HTML partial). This results in 2-3 very long HTML text blobs that are stored in the context object and processed.

While it works, it is also a bear to maintain. If any of the HTML partials ends up producing invalid HTML, an error is thrown that cannot be easily traced. Instead of being able to go back to a specific partial, I only know that a given context key made up of many partials is invalid. Thankfully, because of version control and CI testing, catching errors is relatively straightforward now as long as the diffs are small.

However, being able to use "partials" that are saved as DOCX files resolves this issue. Instead of having to convert HTML to MS Word, I'm already working in MS Word and I can use simple context keys again. It also makes maintenance easier for our team, since I could remove the files from our VCS and let people edit them as they would any regular MS Word document.

Implementation Overview

Importing part of another MS Word doc is nontrivial, but given the new DOM features added in the latest releases it becomes much easier. In my mind we only need to successfully transfer a handful of components from document "B" into document "A" to support a reasonable feature set for our users:

- All media items in use (images, etc.) in the transferred portion of document "B"
- All content types present in document "B" but missing from "A"
- Anything specified inside a *.rels file (i.e. hyperlinks) in document "B"
  - This would be handled on a file-by-file basis, and only the rIds needed would get copied over
- All footnotes and endnotes in use in document "B"
- All lists in use (determined by their w:abstractNumId) in document "B"
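
To make the "only the rIds needed" idea concrete, here is a minimal sketch of how the relationship copy could work. It is not code from Sablon or from any branch: `Relationship` and `import_relationships` are hypothetical names, the XML is treated as a plain string rather than a parsed DOM, and the relationship types are abbreviated (real ones are full schema URIs).

```ruby
# Hypothetical sketch, not part of Sablon's API.
# A relationship entry as found in a *.rels file (type abbreviated here).
Relationship = Struct.new(:id, :type, :target)

# Copies only the relationships referenced by +fragment_xml+ from document "B"
# (+source_rels+) into document "A" (+target_rels+, mutated in place).
# Returns the rewritten fragment and the old-id => new-id map.
def import_relationships(fragment_xml, source_rels, target_rels)
  # Start numbering after the highest rId already present in document "A".
  next_num = target_rels.map { |rel| rel.id[/\d+/].to_i }.max.to_i + 1
  id_map = {}

  # Collect the rIds actually used by the transferred content.
  used_ids = fragment_xml.scan(/r:(?:id|embed)="(rId\d+)"/).flatten.uniq
  used_ids.each do |old_id|
    rel = source_rels.find { |candidate| candidate.id == old_id }
    next if rel.nil? # dangling reference: leave it untouched

    new_id = "rId#{next_num}"
    next_num += 1
    target_rels << Relationship.new(new_id, rel.type, rel.target)
    id_map[old_id] = new_id
  end

  # Rewrite every reference in a single pass so remapped ids cannot chain.
  rewritten = fragment_xml.gsub(/(r:(?:id|embed)=")(rId\d+)(")/) do
    "#{$1}#{id_map.fetch($2, $2)}#{$3}"
  end
  [rewritten, id_map]
end

source = [Relationship.new("rId1", "image", "media/image1.png"),
          Relationship.new("rId2", "hyperlink", "https://example.com")]
target = [Relationship.new("rId1", "styles", "styles.xml"),
          Relationship.new("rId4", "image", "media/logo.png")]

xml, map = import_relationships('<w:hyperlink r:id="rId2"><w:r/></w:hyperlink>',
                                source, target)
puts xml         # <w:hyperlink r:id="rId5"><w:r/></w:hyperlink>
puts map.inspect
```

The unused image relationship (`rId1` in "B") is never copied, which is the point of scanning the transferred content first. A real implementation would also need to copy the media file or part that each relationship targets.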

Styles would not get transferred over. I think any styles desired in the final document should get added directly to the "primary" template. This is similar to how we currently deal with styles, since Sablon does not actually create any "default" styles on its own. The same goes for themes and fonts: they need to already be present in the primary template.

My thinking is that hitting all of the high points initially (i.e. the 99% use case) would be a good start, and then I can add any minor things that were overlooked as users report them.

@senny (and anyone else interested) let me know your thoughts when you have time to think it over.


stadelmanma · 2019-06-07T15:38:50Z

Began work on this at https://github.com/stadelmanma/sablon/tree/support-docx-partials; the code is functional. The content class is a little long and complex for the subset of features I have chosen to support (what I think is around the 90% use case). I think this might be better off as a separate gem (i.e. sablon-partials), and I'll just port some of the architecture improvements over to the mainline.

This is especially true if people want to expand the supported range or if my initial implementation is too naive.

Current Features:

- Lists are copied over correctly
- Content that uses an "rId" value from the document.rels file is copied over
- Footnotes and endnotes are copied
- Content in the media folder is copied over
- Bookmarks are shifted to avoid duplicate IDs
- Missing content types are added
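
As an illustration of the bookmark shifting, here is a minimal sketch of one way it could be done. It is an assumption about the approach, not the code on the branch: `shift_bookmark_ids` is a hypothetical helper that works on raw XML strings, and it only adjusts the numeric `w:id` values, not bookmark names.

```ruby
# Hypothetical sketch, not part of Sablon's API.
# Matches the numeric w:id on <w:bookmarkStart> and <w:bookmarkEnd> elements.
BOOKMARK_ID = /(<w:bookmark(?:Start|End)\b[^>]*?\bw:id=")(\d+)(")/

# Shifts every bookmark id in +partial_xml+ past the highest id already
# used in +main_xml+, so start/end pairs stay matched but no id collides.
def shift_bookmark_ids(partial_xml, main_xml)
  highest = main_xml.scan(BOOKMARK_ID).map { |groups| groups[1].to_i }.max
  offset = highest.nil? ? 0 : highest + 1
  partial_xml.gsub(BOOKMARK_ID) { "#{$1}#{$2.to_i + offset}#{$3}" }
end

main = '<w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>' \
       '<w:bookmarkStart w:id="3" w:name="summary"/><w:bookmarkEnd w:id="3"/>'
partial = '<w:bookmarkStart w:id="0" w:name="intro"/><w:r/><w:bookmarkEnd w:id="0"/>'

puts shift_bookmark_ids(partial, main)
# <w:bookmarkStart w:id="4" w:name="intro"/><w:r/><w:bookmarkEnd w:id="4"/>
```

Because the same offset is applied to both the start and end elements, each pair in the partial still refers to the same bookmark after the shift.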

Current Limitations:

- Images are duplicated if the partial is used more than once
- rIds that appear twice in the document will get duplicated in most cases
- Endnotes are duplicated
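
One possible way to address the image duplication, sketched here purely as an assumption and not as anything implemented on the branch, is to key copied media by a digest of the file contents so that a second use of the same partial reuses the earlier target. `MediaRegistry` and the `media/partial-imageN` naming are hypothetical.

```ruby
require 'digest'

# Hypothetical sketch, not part of Sablon's API.
# Remembers which media payloads were already copied into the main document.
class MediaRegistry
  def initialize
    @targets_by_digest = {}
  end

  # Returns the target path for +bytes+. Identical content added earlier
  # yields the same path, so the file only needs to be written once.
  def add(bytes, extension)
    digest = Digest::SHA256.hexdigest(bytes)
    @targets_by_digest[digest] ||=
      "media/partial-image#{@targets_by_digest.size + 1}.#{extension}"
  end

  # Number of distinct media files registered so far.
  def size
    @targets_by_digest.size
  end
end

registry = MediaRegistry.new
first  = registry.add("fake png bytes", "png")
second = registry.add("fake png bytes", "png")   # same partial used again
other  = registry.add("different bytes", "png")

puts first           # media/partial-image1.png
puts first == second # true
puts other           # media/partial-image2.png
```

The same lookup could in principle be reused when deciding whether a relationship needs a new rId, though that would need care for relationships that are not backed by a file, such as hyperlinks.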

stadelmanma added new feature proposed labels Mar 12, 2018

stadelmanma mentioned this issue Dec 10, 2018

Add ability to generate one file for few contexts #123

Closed
