Revisiting the idea of DOCX partials #97

Open
stadelmanma opened this issue Mar 12, 2018 · 1 comment

Comments

@stadelmanma
Collaborator

@senny I threw this idea at you a while back, before I was familiar with how Sablon actually worked and how to manage the internals of a docx file. Ultimately you decided against it due to the complexity and the fact that merging two Word docs requires successfully transferring a lot of content.
However, I would like to circle back around to it because, for my company's use case, it offers a much less error-prone method of generating the complex MS Word reports we need, and others might benefit from it as well.

Our Use Case/Need

Currently, our report generation system is a large number of ERB templates that build on each other to create the report structure section by section (in some cases a single section is complex enough to warrant more than one HTML partial). This results in 2-3 very long HTML text blobs that are stored in the context object and processed.

While it works, it is also a maintenance bear. If any of the HTML partials ends up producing invalid HTML, an error is thrown that cannot be easily traced. Instead of being able to go back to a specific partial, I only know that a given context key made up of many partials is invalid. Thankfully, because of version control and CI testing, catching errors is relatively straightforward now, as long as the diffs are small.

However, being able to use "partials" that are saved as DOCX files resolves this issue. Instead of having to convert HTML to MS Word I'm already working in MS Word and I can use simple context keys again. It also makes maintenance for our team easier since I could remove the files from our VCS and allow people to edit them as they would any regular MS Word document.

Implementation Overview

Importing part of another MS Word doc is nontrivial, but given the new DOM features added in the latest releases it becomes much easier. In my mind we only need to successfully transfer a handful of components from document "B" into document "A" to support a reasonable feature set for our users.

  • All media items in use (images, etc.) in the transferred portion of document "B"
  • All content types inside document "B" but not in "A"
  • Anything specified inside a *.rels file (i.e. hyperlinks) in document "B"
    • This would be handled on a file by file basis and only the rIds needed would get copied over
  • All footnotes and endnotes in use in document "B"
  • All lists in use (determined by their w:abstractNumId) in document "B"
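
As a rough illustration of the "only the rIds needed" point above, here is a minimal Ruby sketch. Everything in it is hypothetical: the method names, the regex-based scan, and the assumption that relationship ids look like `rIdN` are for illustration only and are not part of Sablon. It only computes an id mapping and rewrites the fragment; copying the actual `Relationship` entries and their targets is left out.

```ruby
# Hypothetical sketch, not Sablon code. Assumes relationship ids have the
# form "rId<number>" and are referenced through r:id, r:embed or r:link.
REL_ATTR = /\b(r:(?:id|embed|link))="(rId\d+)"/

# Relationship ids referenced by the fragment copied out of document "B".
def referenced_rids(fragment_xml)
  fragment_xml.scan(REL_ATTR).map { |_attr, rid| rid }.uniq
end

# First numeric suffix that is not yet used by document "A".
def next_rid_number(existing_rids)
  existing_rids.map { |rid| rid[/\d+/].to_i }.max.to_i + 1
end

# Map every needed id from "B" to a fresh id that is free in "A".
def build_rid_map(fragment_xml, existing_rids)
  start = next_rid_number(existing_rids)
  referenced_rids(fragment_xml).each_with_index.to_h do |rid, index|
    [rid, "rId#{start + index}"]
  end
end

# Rewrite the fragment so it points at the new ids.
def remap_rids(fragment_xml, rid_map)
  fragment_xml.gsub(REL_ATTR) do
    attr = Regexp.last_match(1)
    rid = Regexp.last_match(2)
    "#{attr}=\"#{rid_map.fetch(rid)}\""
  end
end

fragment = '<w:hyperlink r:id="rId3"/><a:blip r:embed="rId7"/>'
rid_map = build_rid_map(fragment, %w[rId1 rId2 rId3 rId4])
puts remap_rids(fragment, rid_map)
# prints: <w:hyperlink r:id="rId5"/><a:blip r:embed="rId6"/>
```

A real implementation would most likely walk the parsed DOM rather than scan strings, and would apply the same mapping per `*.rels` file.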

Styles would not get transferred over. I think any styles desired in the final document should get added directly to the "primary" template. This is similar to how we currently deal with styles, since Sablon does not actually create any "default" styles on its own. The same goes for themes and fonts: they need to already be present in the primary template.

My thinking is that hitting all of the high points initially would be a good start (i.e. the 99% use case), and then I can add any minor things that were overlooked as users report them.

@senny (and anyone else interested) let me know your thoughts when you have time to think it over.

@stadelmanma
Collaborator Author

stadelmanma commented Jun 7, 2019

Began work on this at https://github.com/stadelmanma/sablon/tree/support-docx-partials; the code is functional. The content class is a little long and complex for the subset of features I have chosen to support (what I think covers around 90% of use cases). I think this might be better off as a separate gem (i.e. sablon-partials), and I'll just port some of the architecture improvements over to the mainline.

This is especially true if people want to expand the supported range or if my initial implementation is too naive.

Current Features:

  • Lists are copied over correctly
  • Content that uses an "rId" value from the document.rels file is copied over
  • Footnotes and Endnotes are copied
  • Content in the media folder is copied over
  • Bookmarks are shifted to avoid duplicate IDs
  • Missing content types are added
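
For readers unfamiliar with the bookmark issue, the following is a minimal sketch of what "shifting" could look like. It is not taken from the branch above: the method names and the regex-based approach are assumptions made for illustration, and it only touches `w:id` on `w:bookmarkStart`/`w:bookmarkEnd` elements (bookmark names are not handled).

```ruby
# Hypothetical sketch, not the code from the support-docx-partials branch.
# Matches the numeric w:id attribute on bookmark start/end elements.
BOOKMARK_ID = /(<w:bookmark(?:Start|End)\b[^>]*?\bw:id=")(\d+)(")/

# Largest bookmark id in the given XML, or nil if there are no bookmarks.
def max_bookmark_id(xml)
  xml.scan(BOOKMARK_ID).map { |_prefix, id, _suffix| id.to_i }.max
end

# Shift every bookmark id in the partial past the ids used by the main document.
def shift_bookmark_ids(partial_xml, main_xml)
  max_id = max_bookmark_id(main_xml)
  offset = max_id.nil? ? 0 : max_id + 1
  partial_xml.gsub(BOOKMARK_ID) do
    prefix = Regexp.last_match(1)
    id = Regexp.last_match(2).to_i
    suffix = Regexp.last_match(3)
    "#{prefix}#{id + offset}#{suffix}"
  end
end

main = '<w:bookmarkStart w:id="0" w:name="intro"/><w:bookmarkEnd w:id="0"/>'
partial = '<w:bookmarkStart w:id="0" w:name="part"/><w:bookmarkEnd w:id="0"/>'
puts shift_bookmark_ids(partial, main)
# prints: <w:bookmarkStart w:id="1" w:name="part"/><w:bookmarkEnd w:id="1"/>
```

Because start and end elements are shifted by the same offset, each pair still matches after the merge.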

Current Limitations:

  • Images are duplicated if the partial is used more than once
  • rIds that appear twice in the document will get duplicated in most cases
  • Endnotes are duplicated
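
One possible way to address the duplicated-image limitation would be to key copied media by a content digest, so a second use of the same partial reuses the file that is already in the main document. The sketch below is only an assumption about how that could work; `MediaRegistry` and its methods are hypothetical names and do not exist in Sablon or in the branch.

```ruby
require 'digest'

# Hypothetical sketch: remembers which media files were already copied into
# the main document, keyed by a SHA-256 digest of their bytes.
class MediaRegistry
  def initialize
    @names_by_digest = {}
  end

  # Returns [name_to_reference, newly_added]. When the same bytes were
  # registered before, the earlier name is returned and nothing is added.
  def register(name, bytes)
    digest = Digest::SHA256.hexdigest(bytes)
    if @names_by_digest.key?(digest)
      [@names_by_digest[digest], false]
    else
      @names_by_digest[digest] = name
      [name, true]
    end
  end

  # Number of distinct media files registered so far.
  def size
    @names_by_digest.size
  end
end

registry = MediaRegistry.new
registry.register('media/image1.png', 'fake image bytes')
name, added = registry.register('media/image5.png', 'fake image bytes')
puts "#{name} #{added}"
# prints: media/image1.png false
```

The same idea might extend to duplicated rIds, by caching the new id assigned to each relationship target, though that is untested here.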
