
A supercharged version of paperless: scan, index and archive all your physical documents

samotelf/paperless-ng

 
 



Paperless-ng

Paperless is an application by Daniel Quinn and others that indexes your scanned documents and allows you to easily search for documents and store metadata alongside your documents.

Paperless-ng is a fork of the original project, adding a new interface and many other changes under the hood. For a detailed list of changes, see below.

This project is still in development and some things may not work as expected.

How it Works

Paperless does not control your scanner; it only helps you deal with what your scanner produces.

  1. Buy a document scanner that can write to a place on your network. If you need some inspiration, have a look at the scanner recommendations page.
  2. Set it up to "scan to FTP" or something similar. It should be able to push scanned images to a server without you having to do anything. Of course, if your scanner doesn't know how to automatically upload the file somewhere, you can always do that manually. Paperless doesn't care how the documents get into its local consumption directory.
  3. Have the target server run the Paperless consumption script to OCR the file and index it into a local database.
  4. Use the web frontend to sift through the database and find what you want.
  5. Download the PDF you need/want via the web interface and do whatever you like with it. You can even print it and send it as if it's the original. In most cases, no one will care or notice.
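
For the manual route mentioned in step 2, here is a minimal sketch of dropping a scanned file into the consumption directory. The function name and both directory paths are hypothetical, not part of Paperless. The sketch copies the file into a staging directory first and then renames it into place, so the consumer never sees a half-written file. This assumes both directories live on the same filesystem.

```python
import shutil
import tempfile
from pathlib import Path


def drop_into_consume_dir(scan: Path, staging_dir: Path, consume_dir: Path) -> Path:
    """Copy a scan into staging_dir, then move it into consume_dir in one step.

    Both directories are placeholders chosen by the caller; they must be on
    the same filesystem for the final rename to be a single operation.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    consume_dir.mkdir(parents=True, exist_ok=True)

    staged = staging_dir / scan.name
    shutil.copyfile(scan, staged)      # the slow part happens outside consume_dir

    target = consume_dir / scan.name
    staged.replace(target)             # rename into place once the copy is complete
    return target


if __name__ == "__main__":
    # Self-contained demo using a throwaway directory instead of real paths.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scan = root / "invoice.pdf"
        scan.write_bytes(b"%PDF-1.4 demo")
        print(drop_into_consume_dir(scan, root / "staging", root / "consume"))
```

A scanner that uploads over FTP achieves the same result without any script; the sketch only illustrates that any process able to place a finished file in the directory will do.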

Here's what you get:

[Screenshot: Dashboard]

Why Paperless-ng?

I wanted to make big changes to the project that will greatly affect how people use it. Among those who currently run paperless in production, there are probably many who don't want these changes right away. I also wanted more control over what goes into the code and what does not. Therefore, paperless-ng was created. NG stands for both Angular (the framework used for the front end) and next-gen. Publishing this project under a different name also avoids confusion between paperless and paperless-ng.

The gist of the changes is the following:

  • New front end. This will eventually be mobile friendly as well.
  • New full text search.
  • New email processing.
  • Machine learning powered document matching.
  • A task processor that processes documents in parallel and also tells you when something goes wrong.
  • Code cleanup in many, MANY areas. Some of the code was just overly complicated.
  • More tests, more stability.
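
The idea behind the task processor in the list above (process documents in parallel and report failures instead of hiding them) can be pictured with a short standard-library sketch. This is not the project's actual task processor; `process_all` and `fake_ocr` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(documents, handler, workers=4):
    """Run handler on every document in parallel.

    Returns (done, failed): a list of documents that succeeded and a dict
    mapping each failed document to its error message.
    """
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(handler, doc): doc for doc in documents}
        for future in as_completed(futures):
            doc = futures[future]
            try:
                future.result()
                done.append(doc)
            except Exception as exc:
                # One bad document is recorded and does not stop the others.
                failed[doc] = str(exc)
    return done, failed


def fake_ocr(name):
    # Hypothetical stand-in for real document processing.
    if name.endswith(".broken"):
        raise ValueError("unreadable file")
    return name
```

Calling `process_all(["a.pdf", "b.broken", "c.pdf"], fake_ocr)` processes all three files and returns the failure for `b.broken` alongside the two successes, which is the kind of information a user interface could then surface.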

If you want to see some screenshots of paperless-ng in action, some are available in the documentation.

For a complete list of changes, check out the changelog.

Roadmap for 1.0

  • Test coverage at 90%.
  • Store archived documents with an embedded OCR text layer, while keeping originals available. Making good progress in the feature-ocrmypdf branch.
  • Fix whatever bugs you and I find.

Roadmap for versions beyond 1.0

  • More search. The search backend is incredibly versatile and customizable. Searching is the most important feature of this project and thus, I want to implement things like:
    • Group and limit search results by correspondent, show “more from this” links in the results.
    • Ability to search for “Similar documents” in the search results
    • Provide corrections for misspelled queries.
  • An interactive consumer that shows its progress for documents it processes on the web page.
    • With live updates and websockets. This already works on a dev branch, but requires a lot of new dependencies, which I'm not particularly happy about.
    • Notifications when a document was added with buttons to open the new document right away.
  • Arbitrary tag colors. Allow the selection of any color with a color picker.
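
As an illustration of the "corrections for misspelled queries" item above, here is a minimal sketch using Python's standard `difflib` module. It is not how the search backend works or will work; the function name `suggest_query` and the idea of passing in a plain word list as the vocabulary are assumptions made for the example.

```python
import difflib


def suggest_query(query, vocabulary, cutoff=0.75):
    """Replace each unknown word in the query with its closest known term.

    vocabulary is a list of lowercase terms, e.g. words taken from indexed
    documents. Words with no sufficiently close match are left unchanged.
    """
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)
```

A front end could show the returned string as a "did you mean" suggestion whenever it differs from what the user typed.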

On the chopping block

  • GnuPG encryption. Here's a note about encryption in paperless. The gist of it is that I don't see which attacks this implementation protects against. It gives a false sense of security to users who don't care about how it works.

Getting started

The recommended way to deploy paperless is docker-compose. Don't clone the repository; grab the latest release instead. The dockerfiles archive contains just the Docker files, which will pull the image from Docker Hub. The source archive contains everything you need to build the Docker image yourself (e.g. if you want to run on a Raspberry Pi).

Read the documentation on how to get started.

Alternatively, you can install the dependencies and set up Apache and a database server yourself. The documentation has information about the individual components of paperless that you need to take care of.

Migrating to paperless-ng

Read the section about migration in the documentation. It's also entirely possible to go back to paperless by reverting the database migrations.

Documentation

The documentation for Paperless-ng is available on ReadTheDocs.

Suggestions? Questions? Something not working?

Please open an issue and start a discussion about it!

Feel like helping out?

There are still lots of things to be done; just have a look at the issue log. If you feel like contributing to the project, please do! Bug fixes and improvements to the front end (I just can't seem to get some of these CSS things right) are always welcome.

If you want to implement something big: Please start a discussion about that in the issues! Maybe I've already had something similar in mind and we can make it happen together. However, keep in mind that the general roadmap is to make the existing features stable and get them tested. See the roadmap above.

Affiliated Projects

Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:

Compatibility with Paperless-ng is unknown.

Important Note

Document scanners are typically used to scan sensitive documents: things like your social insurance number, tax records, invoices, etc. Everything is stored in the clear without encryption by default (it needs to be searchable, so if someone has ideas on how to do that on encrypted data, I'm all ears). This means that Paperless should never be run on an untrusted host. Instead, I recommend that if you do want to use it, you run it locally on a server in your own home.
