Compare commits


2 Commits

Author          SHA1        Message      Date
Trenton Holmes  7d8721c53c  (delete me)  2022-04-02 07:14:14 -07:00
Trenton Holmes  aa67c851e3  (delete me)  2022-04-02 07:12:56 -07:00
437 changed files with 57602 additions and 105426 deletions

View File

@@ -1,9 +0,0 @@
{
"qpdf": {
"version": "11.1.1"
},
"jbig2enc": {
"version": "0.29",
"git_tag": "0.29"
}
}

View File

@@ -27,14 +27,11 @@ indent_style = space
[*.md]
indent_style = space
[Pipfile.lock]
indent_style = space
# Tests don't get a line width restriction. It's still a good idea to follow
# the 79 character rule, but in the interests of clarity, tests often need to
# violate it.
[**/test_*.py]
max_line_length = off
[Dockerfile*]
[Dockerfile]
indent_style = space

View File

@@ -1,97 +0,0 @@
name: Bug report
description: Something is not working
title: "[BUG] Concise description of the issue"
labels: ["bug", "unconfirmed"]
body:
- type: markdown
attributes:
value: |
Have a question? 👉 [Start a new discussion](https://github.com/paperless-ngx/paperless-ngx/discussions/new) or [ask in chat](https://matrix.to/#/#paperless:adnidor.de).
Before opening an issue, please double check:
- [The troubleshooting documentation](https://paperless-ngx.readthedocs.io/en/latest/troubleshooting.html).
- [The installation instructions](https://paperless-ngx.readthedocs.io/en/latest/setup.html#installation).
- [Existing issues and discussions](https://github.com/paperless-ngx/paperless-ngx/search?q=&type=issues).
- Disable any custom container initialization scripts, if using any
If you encounter issues while installing or configuring Paperless-ngx, please post in the ["Support" section of the discussions](https://github.com/paperless-ngx/paperless-ngx/discussions/new?category=support).
- type: textarea
id: description
attributes:
label: Description
description: A clear and concise description of what the bug is. If applicable, add screenshots to help explain your problem.
placeholder: |
Currently Paperless does not work when...
[Screenshot if applicable]
validations:
required: true
- type: textarea
id: reproduction
attributes:
label: Steps to reproduce
description: Steps to reproduce the behavior.
placeholder: |
1. Go to '...'
2. Click on '....'
3. See error
validations:
required: true
- type: textarea
id: logs
attributes:
label: Webserver logs
description: Logs from the web server related to your issue.
render: bash
validations:
required: true
- type: textarea
id: logs_browser
attributes:
label: Browser logs
description: Logs from the web browser related to your issue, if needed
render: bash
- type: input
id: version
attributes:
label: Paperless-ngx version
placeholder: e.g. 1.6.0
validations:
required: true
- type: input
id: host-os
attributes:
label: Host OS
description: Host OS of the machine running paperless-ngx. Please add the architecture (uname -m) if applicable.
placeholder: e.g. Archlinux / Ubuntu 20.04 / Raspberry Pi `arm64`
validations:
required: true
- type: dropdown
id: install-method
attributes:
label: Installation method
options:
- Docker - official image
- Docker - linuxserver.io image
- Bare metal
- Other (please describe above)
description: Note there are significant differences between the official image and the linuxserver.io image; please check whether your issue is specific to the third-party image.
validations:
required: true
- type: input
id: browser
attributes:
label: Browser
description: Which browser you are using, if relevant.
placeholder: e.g. Chrome, Safari
- type: input
id: config-changes
attributes:
label: Configuration changes
description: Any configuration changes you made in `docker-compose.yml`, `docker-compose.env` or `paperless.conf`.
- type: input
id: other
attributes:
label: Other
description: Any other relevant details.

.github/ISSUE_TEMPLATE/bug_report.md
View File

@@ -0,0 +1,50 @@
---
name: Bug report
about: Something is not working
title: '[BUG] Concise description of the issue'
labels: ''
assignees: ''
---
<!---
=> Before opening an issue, please check the documentation and see if it helps you resolve your issue: https://paperless-ngx.readthedocs.io/en/latest/troubleshooting.html
=> Please also make sure that you followed the installation instructions.
=> Please search the issues and look for similar issues before opening a bug report.
=> If you would like to submit a feature request please submit one under https://github.com/paperless-ngx/paperless-ngx/discussions/categories/feature-requests
=> If you encounter issues while installing or configuring Paperless-ngx, please post that in the "Support" section of the discussions. Remember that Paperless successfully runs on a variety of different systems. If paperless does not start, it's probably an issue with your system, and not an issue with Paperless.
=> Don't remove the [BUG] prefix from the title.
-->
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Webserver logs**
```
If available, post any logs from the web server related to your issue.
```
**Relevant information**
- Host OS of the machine running paperless: [e.g. Archlinux / Ubuntu 20.04]
- Browser [e.g. chrome, safari]
- Version [e.g. 1.0.0]
- Installation method: [docker / bare metal]
- Any configuration changes you made in `docker-compose.yml`, `docker-compose.env` or `paperless.conf`.

View File

@@ -1,11 +0,0 @@
blank_issues_enabled: false
contact_links:
- name: 🤔 Questions and Help
url: https://github.com/paperless-ngx/paperless-ngx/discussions
about: This issue tracker is not for support questions. Please refer to our Discussions.
- name: 💬 Chat
url: https://matrix.to/#/#paperless:adnidor.de
about: Want to discuss Paperless-ngx with others? Check out our chat.
- name: 🚀 Feature Request
url: https://github.com/paperless-ngx/paperless-ngx/discussions/new?category=feature-requests
about: Remember to search for existing feature requests and "up-vote" any you like

.github/ISSUE_TEMPLATE/other.md
View File

@@ -0,0 +1,19 @@
---
name: Other
about: Anything that is not a feature request or bug.
title: '[Other] Title of your issue'
labels: ''
assignees: ''
---
<!--
=> Discussions, Feedback and other suggestions belong in the "Discussion" section and not on the issue tracker.
=> If you would like to submit a feature request please submit one under https://github.com/paperless-ngx/paperless-ngx/discussions/categories/feature-requests
=> If you encounter issues while installing or configuring Paperless-ngx, please post that in the "Support" section of the discussions. Remember that Paperless successfully runs on a variety of different systems. If paperless does not start, it's probably an issue with your system, and not an issue with Paperless.
=> Don't remove the [Other] prefix from the title.
-->

View File

@@ -1,15 +1,23 @@
<!--
<!--
Note: All PRs with code changes should be targeted to the `dev` branch, pure documentation changes can target `main`
-->
## Proposed change
<!--
Please include a summary of the change and which issue is fixed (if any) and any relevant motivation / context. List any dependencies that are required for this change. If appropriate, please include an explanation of how your proposed change can be tested. Screenshots and / or videos can also be helpful if appropriate.
Please include a summary of the change and which issue is fixed (if any) and any relevant motivation / context. List any dependencies that are required for this change. If appropriate, please include an explanation of how your poposed change can be tested. Screenshots and / or videos can also be helpful if appropriate.
-->
Fixes # (issue)
<!--
Please also tag the relevant team to help with review. You can tag any of the following:
@paperless-ngx/backend (Python / django, database, etc.)
@paperless-ngx/frontend (JavaScript/Typescript, HTML, CSS, etc.)
@paperless-ngx/ci-cd (GitHub Actions, deployment)
@paperless-ngx/test (General testing for larger PRs)
-->
## Type of change
<!--

View File

@@ -6,14 +6,11 @@ updates:
# Enable version updates for npm
- package-ecosystem: "npm"
target-branch: "dev"
# Look for `package.json` and `lock` files in the `/src-ui` directory
# Look for `package.json` and `lock` files in the `root` directory
directory: "/src-ui"
# Check the npm registry for updates every month
schedule:
interval: "monthly"
labels:
- "frontend"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/frontend"
@@ -29,13 +26,9 @@ updates:
labels:
- "backend"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/backend"
# Enable updates for Github Actions
- package-ecosystem: "github-actions"
target-branch: "dev"
directory: "/"
schedule:
# Check for updates to GitHub Actions every month
@@ -45,4 +38,4 @@ updates:
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/ci-cd"
- "paperless-ngx/backend"

View File

@@ -1,55 +0,0 @@
autolabeler:
- label: "bug"
branch:
- '/^fix/'
title:
- "/^fix/i"
- label: "enhancement"
branch:
- '/^feature/'
title:
- "/^feature/i"
categories:
- title: 'Breaking Changes'
labels:
- 'breaking-change'
- title: 'Features'
labels:
- 'enhancement'
- title: 'Bug Fixes'
labels:
- 'bug'
- title: 'Documentation'
label: 'documentation'
- title: 'Maintenance'
labels:
- 'chore'
- 'deployment'
- 'translation'
- 'ci-cd'
- title: 'Dependencies'
collapse-after: 3
label: 'dependencies'
- title: 'All App Changes'
labels:
- 'frontend'
- 'backend'
collapse-after: 0
include-labels:
- 'enhancement'
- 'bug'
- 'chore'
- 'deployment'
- 'translation'
- 'dependencies'
- 'documentation'
- 'frontend'
- 'backend'
- 'ci-cd'
category-template: '### $TITLE'
change-template: '- $TITLE @$AUTHOR ([#$NUMBER]($URL))'
change-title-escapes: '\<*_&#@'
template: |
## paperless-ngx $RESOLVED_VERSION
$CHANGES

View File

@@ -1,402 +0,0 @@
#!/usr/bin/env python3
import json
import logging
import os
import shutil
import subprocess
from argparse import ArgumentParser
from typing import Dict
from typing import Final
from typing import List
from typing import Optional
from common import get_log_level
from github import ContainerPackage
from github import GithubBranchApi
from github import GithubContainerRegistryApi
logger = logging.getLogger("cleanup-tags")
class DockerManifest2:
"""
Data class wrapping the Docker Image Manifest Version 2.
See https://docs.docker.com/registry/spec/manifest-v2-2/
"""
def __init__(self, data: Dict) -> None:
self._data = data
# This is the sha256: digest string. Corresponds to GitHub API name
# if the package is an untagged package
self.digest = self._data["digest"]
platform_data_os = self._data["platform"]["os"]
platform_arch = self._data["platform"]["architecture"]
platform_variant = self._data["platform"].get(
"variant",
"",
)
self.platform = f"{platform_data_os}/{platform_arch}{platform_variant}"
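# Illustrative only: a hypothetical manifest-list entry, shaped per the
# Manifest V2 spec linked above, and the platform string this class derives:
#   DockerManifest2({
#       "digest": "sha256:<64 hex chars>",
#       "platform": {"os": "linux", "architecture": "arm", "variant": "v7"},
#   }).platform  ->  "linux/armv7"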
class RegistryTagsCleaner:
"""
This is the base class for the image registry cleaning. Given a package
name, it will keep all images which are tagged and all untagged images
referred to by a manifest. This results in only images which have been untagged
and cannot be referenced except by their SHA in being removed. None of these
images should be referenced, so it is fine to delete them.
"""
def __init__(
self,
package_name: str,
repo_owner: str,
repo_name: str,
package_api: GithubContainerRegistryApi,
branch_api: Optional[GithubBranchApi],
):
self.actually_delete = False
self.package_api = package_api
self.branch_api = branch_api
self.package_name = package_name
self.repo_owner = repo_owner
self.repo_name = repo_name
self.tags_to_delete: List[str] = []
self.tags_to_keep: List[str] = []
# Get the information about all versions of the given package
# These are active, not deleted, the default returned from the API
self.all_package_versions = self.package_api.get_active_package_versions(
self.package_name,
)
# Get a mapping from a tag like "1.7.0" or "feature-xyz" to the ContainerPackage
# tagged with it. It makes certain lookups easy
self.all_pkgs_tags_to_version: Dict[str, ContainerPackage] = {}
for pkg in self.all_package_versions:
for tag in pkg.tags:
self.all_pkgs_tags_to_version[tag] = pkg
logger.info(
f"Located {len(self.all_package_versions)} versions of package {self.package_name}",
)
self.decide_what_tags_to_keep()
def clean(self):
"""
This method will delete image versions, based on the selected tags to delete
"""
for tag_to_delete in self.tags_to_delete:
package_version_info = self.all_pkgs_tags_to_version[tag_to_delete]
if self.actually_delete:
logger.info(
f"Deleting {tag_to_delete} (id {package_version_info.id})",
)
self.package_api.delete_package_version(
package_version_info,
)
else:
logger.info(
f"Would delete {tag_to_delete} (id {package_version_info.id})",
)
else:
logger.info("No tags to delete")
def clean_untagged(self, is_manifest_image: bool):
"""
This method will delete untagged images, that is, those which have no name. It
handles the case where the image tag is actually a manifest, which points to images
that otherwise look untagged.
"""
def _clean_untagged_manifest():
"""
Handles the deletion of untagged images where the package is a manifest, i.e. a
multi-arch image, which means some "untagged" images still need to exist.
Ok, bear with me, these are annoying.
Our images are multi-arch, so the manifest is more like a pointer to a sha256 digest.
These images are untagged, but pointed to, and so should not be removed (or every pull fails).
So for each image being kept, parse the manifest to find the digest(s) it points to. Then
remove those from the list of untagged images. The final result is the set of untagged,
not-pointed-to versions, which should be safe to remove.
Example:
Tag: ghcr.io/paperless-ngx/paperless-ngx:1.7.1 refers to
amd64: sha256:b9ed4f8753bbf5146547671052d7e91f68cdfc9ef049d06690b2bc866fec2690
armv7: sha256:81605222df4ba4605a2ba4893276e5d08c511231ead1d5da061410e1bbec05c3
arm64: sha256:374cd68db40734b844705bfc38faae84cc4182371de4bebd533a9a365d5e8f3b
each of which appears as an untagged image, but isn't really.
So from the list of untagged packages, remove those digests. Once all tags which
are being kept are checked, the remaining untagged packages are truly untagged,
with no manifest referring to them.
"""
# Simplify the untagged data, mapping name (which is a digest) to the version
# At the moment, these are the images which APPEAR untagged.
untagged_versions = {}
for x in self.all_package_versions:
if x.untagged:
untagged_versions[x.name] = x
skips = 0
# Parse manifests to locate digests pointed to
for tag in sorted(self.tags_to_keep):
full_name = f"ghcr.io/{self.repo_owner}/{self.package_name}:{tag}"
logger.info(f"Checking manifest for {full_name}")
try:
proc = subprocess.run(
[
shutil.which("docker"),
"manifest",
"inspect",
full_name,
],
capture_output=True,
)
manifest_list = json.loads(proc.stdout)
for manifest_data in manifest_list["manifests"]:
manifest = DockerManifest2(manifest_data)
if manifest.digest in untagged_versions:
logger.info(
f"Skipping deletion of {manifest.digest},"
f" referred to by {full_name}"
f" for {manifest.platform}",
)
del untagged_versions[manifest.digest]
skips += 1
except Exception as err:
self.actually_delete = False
logger.exception(err)
return
logger.info(
f"Skipping deletion of {skips} packages referred to by a manifest",
)
# Delete the untagged and not pointed at packages
logger.info(f"Deleting untagged packages of {self.package_name}")
for to_delete_name in untagged_versions:
to_delete_version = untagged_versions[to_delete_name]
if self.actually_delete:
logger.info(
f"Deleting id {to_delete_version.id} named {to_delete_version.name}",
)
self.package_api.delete_package_version(
to_delete_version,
)
else:
logger.info(
f"Would delete {to_delete_name} (id {to_delete_version.id})",
)
def _clean_untagged_non_manifest():
"""
If the package is not a multi-arch manifest, images without tags are safe to delete.
"""
for package in self.all_package_versions:
if package.untagged:
if self.actually_delete:
logger.info(
f"Deleting id {package.id} named {package.name}",
)
self.package_api.delete_package_version(
package,
)
else:
logger.info(
f"Would delete {package.name} (id {package.id})",
)
else:
logger.info(
f"Not deleting tag {package.tags[0]} of package {self.package_name}",
)
logger.info("Beginning untagged image cleaning")
if is_manifest_image:
_clean_untagged_manifest()
else:
_clean_untagged_non_manifest()
def decide_what_tags_to_keep(self):
"""
This method holds the logic to decide what tags to keep and therefore
what tags to delete.
By default, any image with at least 1 tag will be kept
"""
# By default, keep anything which is tagged
self.tags_to_keep = list(set(self.all_pkgs_tags_to_version.keys()))
class MainImageTagsCleaner(RegistryTagsCleaner):
def decide_what_tags_to_keep(self):
"""
Overrides the default logic for deciding what images to keep. Images tagged as "feature-"
will be removed, if the corresponding branch no longer exists.
"""
# Default to everything gets kept still
super().decide_what_tags_to_keep()
# Locate the feature branches
feature_branches = {}
for branch in self.branch_api.get_branches(
repo=self.repo_name,
):
if branch.name.startswith("feature-"):
logger.debug(f"Found feature branch {branch.name}")
feature_branches[branch.name] = branch
logger.info(f"Located {len(feature_branches)} feature branches")
if not len(feature_branches):
# Our work here is done, delete nothing
return
# Filter to packages which are tagged with feature-*
packages_tagged_feature: List[ContainerPackage] = []
for package in self.all_package_versions:
if package.tag_matches("feature-"):
packages_tagged_feature.append(package)
# Map tags like "feature-xyz" to a ContainerPackage
feature_pkgs_tags_to_versions: Dict[str, ContainerPackage] = {}
for pkg in packages_tagged_feature:
for tag in pkg.tags:
feature_pkgs_tags_to_versions[tag] = pkg
logger.info(
f'Located {len(feature_pkgs_tags_to_versions)} versions of package {self.package_name} tagged "feature-"',
)
# All the feature tags minus all the feature branches leaves us with the feature
# tags that have no corresponding branch
self.tags_to_delete = list(
set(feature_pkgs_tags_to_versions.keys()) - set(feature_branches.keys()),
)
# All the tags minus the set of tags going to be deleted leaves us with the
# tags which will be kept around
self.tags_to_keep = list(
set(self.all_pkgs_tags_to_version.keys()) - set(self.tags_to_delete),
)
logger.info(
f"Located {len(self.tags_to_delete)} versions of package {self.package_name} to delete",
)
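# Worked example with hypothetical names:
#   feature tags on packages: {"feature-a", "feature-b"}
#   live feature branches:    {"feature-a"}
#   tags_to_delete = {"feature-a", "feature-b"} - {"feature-a"} = {"feature-b"}
#   tags_to_keep   = (all tags) - tags_to_delete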
class LibraryTagsCleaner(RegistryTagsCleaner):
"""
Exists for the off chance that someday the installer library images
will need their own logic
"""
pass
def _main():
parser = ArgumentParser(
description="Using the GitHub API locate and optionally delete container"
" tags which no longer have an associated feature branch",
)
# Requires an affirmative command to actually do a delete
parser.add_argument(
"--delete",
action="store_true",
default=False,
help="If provided, actually delete the container tags",
)
# When a tagged image is updated, the previous version remains, but it is no longer tagged
# Add this option to remove them as well
parser.add_argument(
"--untagged",
action="store_true",
default=False,
help="If provided, delete untagged containers as well",
)
# If given, the package is assumed to be a multi-arch manifest. Cache packages are
# not multi-arch; all other types are
parser.add_argument(
"--is-manifest",
action="store_true",
default=False,
help="If provided, the package is assumed to be a multi-arch manifest following schema v2",
)
# Allows configuration of log level for debugging
parser.add_argument(
"--loglevel",
default="info",
help="Configures the logging level",
)
# Get the name of the package being processed this round
parser.add_argument(
"package",
help="The package to process",
)
args = parser.parse_args()
logging.basicConfig(
level=get_log_level(args),
datefmt="%Y-%m-%d %H:%M:%S",
format="%(asctime)s %(levelname)-8s %(message)s",
)
# Must be provided in the environment
repo_owner: Final[str] = os.environ["GITHUB_REPOSITORY_OWNER"]
repo: Final[str] = os.environ["GITHUB_REPOSITORY"]
gh_token: Final[str] = os.environ["TOKEN"]
# Find all branches named feature-*
# Note: Only relevant to the main application, but simpler to
# leave in for all packages
with GithubBranchApi(gh_token) as branch_api:
with GithubContainerRegistryApi(gh_token, repo_owner) as container_api:
if args.package in {"paperless-ngx", "paperless-ngx/builder/cache/app"}:
cleaner = MainImageTagsCleaner(
args.package,
repo_owner,
repo,
container_api,
branch_api,
)
else:
cleaner = LibraryTagsCleaner(
args.package,
repo_owner,
repo,
container_api,
None,
)
# Set if actually doing a delete vs dry run
cleaner.actually_delete = args.delete
# Clean images with tags
cleaner.clean()
# Clean images which are untagged
cleaner.clean_untagged(args.is_manifest)
if __name__ == "__main__":
_main()

View File

@@ -1,48 +0,0 @@
#!/usr/bin/env python3
import logging
def get_image_tag(
repo_name: str,
pkg_name: str,
pkg_version: str,
) -> str:
"""
Returns a string representing the normal image for a given package
"""
return f"ghcr.io/{repo_name.lower()}/builder/{pkg_name}:{pkg_version}"
def get_cache_image_tag(
repo_name: str,
pkg_name: str,
pkg_version: str,
branch_name: str,
) -> str:
"""
Returns a string representing the expected image cache tag for a given package.
Registry-type caching is utilized for the builder images to allow fast
rebuilds, generally almost instant for the same version
"""
return f"ghcr.io/{repo_name.lower()}/builder/cache/{pkg_name}:{pkg_version}"
def get_log_level(args) -> int:
"""
Returns a logging level based on the given command line arguments
:param args: the parsed arguments, expected to have a loglevel attribute
:return: the matching logging level constant, defaulting to logging.INFO
"""
levels = {
"critical": logging.CRITICAL,
"error": logging.ERROR,
"warn": logging.WARNING,
"warning": logging.WARNING,
"info": logging.INFO,
"debug": logging.DEBUG,
}
level = levels.get(args.loglevel.lower())
if level is None:
level = logging.INFO
return level

View File

@@ -1,92 +0,0 @@
#!/usr/bin/env python3
"""
This is a helper script for the multi-stage Docker image builder.
It provides a single point of configuration for package version control.
The output JSON object is used by the CI workflow to determine what versions
to build and pull into the final Docker image.
Python package information is obtained from the Pipfile.lock. As this is
kept updated by dependabot, it usually will need no further configuration.
The sole exception currently is pikepdf, which has a dependency on qpdf,
and is configured here to use the latest version of qpdf built by the workflow.
Other package version information is configured directly below, generally by
setting the version and Git information, if any.
"""
import argparse
import json
import os
from pathlib import Path
from typing import Final
from common import get_cache_image_tag
from common import get_image_tag
def _main():
parser = argparse.ArgumentParser(
description="Generate a JSON object of information required to build the given package, based on the Pipfile.lock",
)
parser.add_argument(
"package",
help="The name of the package to generate JSON for",
)
PIPFILE_LOCK_PATH: Final[Path] = Path("Pipfile.lock")
BUILD_CONFIG_PATH: Final[Path] = Path(".build-config.json")
# Read the main config file
build_json: Final = json.loads(BUILD_CONFIG_PATH.read_text())
# Read Pipfile.lock file
pipfile_data: Final = json.loads(PIPFILE_LOCK_PATH.read_text())
args: Final = parser.parse_args()
# Read from environment variables set by GitHub Actions
repo_name: Final[str] = os.environ["GITHUB_REPOSITORY"]
branch_name: Final[str] = os.environ["GITHUB_REF_NAME"]
# Default output values
version = None
extra_config = {}
if args.package in pipfile_data["default"]:
# Read the version from Pipfile.lock
pkg_data = pipfile_data["default"][args.package]
pkg_version = pkg_data["version"].split("==")[-1]
version = pkg_version
# Any extra/special values needed
if args.package == "pikepdf":
extra_config["qpdf_version"] = build_json["qpdf"]["version"]
elif args.package in build_json:
version = build_json[args.package]["version"]
else:
raise NotImplementedError(args.package)
# The JSON object we'll output
output = {
"name": args.package,
"version": version,
"image_tag": get_image_tag(repo_name, args.package, version),
"cache_tag": get_cache_image_tag(
repo_name,
args.package,
version,
branch_name,
),
}
# Add anything special a package may need
output.update(extra_config)
# Output the JSON info to stdout
print(json.dumps(output))
if __name__ == "__main__":
_main()

View File

@@ -1,274 +0,0 @@
#!/usr/bin/env python3
"""
This module contains some useful classes for interacting with the Github API.
The full documentation for the API can be found here: https://docs.github.com/en/rest
Mostly, this focusses on two areas, repo branches and repo packages, as the use case
is cleaning up container images which are no longer referred to.
"""
import functools
import logging
import re
import urllib.parse
from typing import Dict
from typing import List
from typing import Optional
import httpx
logger = logging.getLogger("github-api")
class _GithubApiBase:
"""
A base class for interacting with the Github API. It
will handle the session and setting authorization headers.
"""
def __init__(self, token: str) -> None:
self._token = token
self._client: Optional[httpx.Client] = None
def __enter__(self) -> "_GithubApiBase":
"""
Sets up the required headers for auth and response
type from the API
"""
self._client = httpx.Client()
self._client.headers.update(
{
"Accept": "application/vnd.github.v3+json",
"Authorization": f"token {self._token}",
},
)
return self
def __exit__(self, exc_type, exc_val, exc_tb):
"""
Ensures the authorization token is cleaned up no matter
the reason for the exit
"""
if "Accept" in self._client.headers:
del self._client.headers["Accept"]
if "Authorization" in self._client.headers:
del self._client.headers["Authorization"]
# Close the session as well
self._client.close()
self._client = None
def _read_all_pages(self, endpoint):
"""
Helper function to read all pages of an endpoint, utilizing the
next.url until exhausted. Assumes the endpoint returns a list
"""
internal_data = []
while True:
resp = self._client.get(endpoint)
if resp.status_code == 200:
internal_data += resp.json()
if "next" in resp.links:
endpoint = resp.links["next"]["url"]
else:
logger.debug("Exiting pagination loop")
break
else:
logger.warning(f"Request to {endpoint} return HTTP {resp.status_code}")
resp.raise_for_status()
return internal_data
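# For reference, httpx parses the RFC 5988 `Link` response header into
# `resp.links`, keyed by rel; a paginated response looks roughly like
# (hypothetical URL):
#   {"next": {"url": "https://api.github.com/...?page=2", "rel": "next"}}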
class _EndpointResponse:
"""
For all endpoint JSON responses, store the full
response data, for ease of extending later, if need be.
"""
def __init__(self, data: Dict) -> None:
self._data = data
class GithubBranch(_EndpointResponse):
"""
Simple wrapper for a repository branch, only extracts name information
for now.
"""
def __init__(self, data: Dict) -> None:
super().__init__(data)
self.name = self._data["name"]
class GithubBranchApi(_GithubApiBase):
"""
Wrapper around branch API.
See https://docs.github.com/en/rest/branches/branches
"""
def __init__(self, token: str) -> None:
super().__init__(token)
self._ENDPOINT = "https://api.github.com/repos/{REPO}/branches"
def get_branches(self, repo: str) -> List[GithubBranch]:
"""
Returns all current branches of the given repository owned by the given
owner or organization.
"""
# The environment GITHUB_REPOSITORY already contains the owner in the correct location
endpoint = self._ENDPOINT.format(REPO=repo)
internal_data = self._read_all_pages(endpoint)
return [GithubBranch(branch) for branch in internal_data]
class ContainerPackage(_EndpointResponse):
"""
Data class wrapping the JSON response from the package related
endpoints
"""
def __init__(self, data: Dict):
super().__init__(data)
# This is a numerical ID, required for interactions with this
# specific package, including deletion of it or restoration
self.id: int = self._data["id"]
# A string name. This might be an actual name or it could be a
# digest string like "sha256:"
self.name: str = self._data["name"]
# URL to the package, including its ID, can be used for deletion
# or restoration without needing to build up a URL ourselves
self.url: str = self._data["url"]
# The list of tags applied to this image. May be an empty list
self.tags: List[str] = self._data["metadata"]["container"]["tags"]
@functools.cached_property
def untagged(self) -> bool:
"""
Returns True if the image has no tags applied to it, False otherwise
"""
return len(self.tags) == 0
@functools.cache
def tag_matches(self, pattern: str) -> bool:
"""
Returns True if the image has at least one tag which matches the given regex,
False otherwise
"""
for tag in self.tags:
if re.match(pattern, tag) is not None:
return True
return False
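# Note: re.match anchors at the start of the tag, so tag_matches("feature-")
# is True for a tag like "feature-xyz" but not for "my-feature-xyz".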
def __repr__(self):
return f"Package {self.name}"
class GithubContainerRegistryApi(_GithubApiBase):
"""
Class wrapper to deal with the Github packages API. This class only deals with
container type packages, the only type published by paperless-ngx.
"""
def __init__(self, token: str, owner_or_org: str) -> None:
super().__init__(token)
self._owner_or_org = owner_or_org
if self._owner_or_org == "paperless-ngx":
# https://docs.github.com/en/rest/packages#get-all-package-versions-for-a-package-owned-by-an-organization
self._PACKAGES_VERSIONS_ENDPOINT = "https://api.github.com/orgs/{ORG}/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions"
# https://docs.github.com/en/rest/packages#delete-package-version-for-an-organization
self._PACKAGE_VERSION_DELETE_ENDPOINT = "https://api.github.com/orgs/{ORG}/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions/{PACKAGE_VERSION_ID}"
else:
# https://docs.github.com/en/rest/packages#get-all-package-versions-for-a-package-owned-by-the-authenticated-user
self._PACKAGES_VERSIONS_ENDPOINT = "https://api.github.com/user/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions"
# https://docs.github.com/en/rest/packages#delete-a-package-version-for-the-authenticated-user
self._PACKAGE_VERSION_DELETE_ENDPOINT = "https://api.github.com/user/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions/{PACKAGE_VERSION_ID}"
self._PACKAGE_VERSION_RESTORE_ENDPOINT = (
f"{self._PACKAGE_VERSION_DELETE_ENDPOINT}/restore"
)
def get_active_package_versions(
self,
package_name: str,
) -> List[ContainerPackage]:
"""
Returns all the versions of a given package (container images) from
the API
"""
package_type: str = "container"
# Need to quote this for slashes in the name
package_name = urllib.parse.quote(package_name, safe="")
endpoint = self._PACKAGES_VERSIONS_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
)
pkgs = []
for data in self._read_all_pages(endpoint):
pkgs.append(ContainerPackage(data))
return pkgs
def get_deleted_package_versions(
self,
package_name: str,
) -> List[ContainerPackage]:
package_type: str = "container"
# Need to quote this for slashes in the name
package_name = urllib.parse.quote(package_name, safe="")
endpoint = (
self._PACKAGES_VERSIONS_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
)
+ "?state=deleted"
)
pkgs = []
for data in self._read_all_pages(endpoint):
pkgs.append(ContainerPackage(data))
return pkgs
def delete_package_version(self, package_data: ContainerPackage):
"""
Deletes the given package version from the GHCR
"""
resp = self._client.delete(package_data.url)
if resp.status_code != 204:
logger.warning(
f"Request to delete {package_data.url} returned HTTP {resp.status_code}",
)
def restore_package_version(
self,
package_name: str,
package_data: ContainerPackage,
):
package_type: str = "container"
endpoint = self._PACKAGE_VERSION_RESTORE_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
PACKAGE_VERSION_ID=package_data.id,
)
resp = self._client.post(endpoint)
if resp.status_code != 204:
logger.warning(
f"Request to delete {endpoint} returned HTTP {resp.status_code}",
)
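# Illustrative usage of the wrappers above (hypothetical token), mirroring
# how cleanup-tags.py drives them:
#   with GithubBranchApi("<token>") as branch_api:
#       branches = branch_api.get_branches("paperless-ngx/paperless-ngx")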

.github/stale.yml
View File

@@ -1,23 +1,18 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 30
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
# Only issues or pull requests with all of these labels are checked for staleness. Defaults to `[]` (disabled)
onlyLabels: [cant-reproduce]
# Issues with these labels will never be considered stale
exemptLabels:
- pinned
- security
- fixpending
# Label to use when marking an issue as stale
staleLabel: stale
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false
# See https://github.com/marketplace/stale for more info on the app
# and https://github.com/probot/stale for the configuration docs

View File

@@ -3,10 +3,8 @@ name: ci
on:
push:
tags:
# https://semver.org/#spec-item-2
- 'v[0-9]+.[0-9]+.[0-9]+'
# https://semver.org/#spec-item-9
- 'v[0-9]+.[0-9]+.[0-9]+-beta.rc[0-9]+'
- ngx-*
- beta-*
branches-ignore:
- 'translations**'
pull_request:
@@ -14,40 +12,19 @@ on:
- 'translations**'
jobs:
pre-commit:
name: Linting Checks
runs-on: ubuntu-latest
steps:
-
name: Checkout repository
uses: actions/checkout@v3
-
name: Install tools
uses: actions/setup-python@v4
with:
python-version: "3.9"
-
name: Check files
uses: pre-commit/action@v3.0.0
documentation:
name: "Build Documentation"
runs-on: ubuntu-20.04
needs:
- pre-commit
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Install pipenv
run: |
pipx install pipenv==2022.10.12
run: pipx install pipenv
-
name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v3
with:
python-version: 3.9
cache: "pipenv"
@@ -56,10 +33,6 @@ jobs:
name: Install dependencies
run: |
pipenv sync --dev
-
name: List installed Python dependencies
run: |
pipenv run pip list
-
name: Make documentation
run: |
@@ -72,44 +45,85 @@ jobs:
name: documentation
path: docs/_build/html/
tests-backend:
name: "Tests (${{ matrix.python-version }})"
code-checks-backend:
name: "Backend Code Checks"
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Install checkers
run: |
pipx install reorder-python-imports
pipx install yesqa
pipx install add-trailing-comma
pipx install flake8
-
name: Run reorder-python-imports
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs reorder-python-imports
-
name: Run yesqa
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs yesqa
-
name: Run add-trailing-comma
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs add-trailing-comma
# black is placed after add-trailing-comma because it may format differently
# if a trailing comma is added
-
name: Run black
uses: psf/black@stable
with:
options: "--check --diff"
version: "22.3.0"
-
name: Run flake8 checks
run: |
cd src/
flake8 --max-line-length=88 --ignore=E203,W503
code-checks-frontend:
name: "Frontend Code Checks"
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: '16'
-
name: Install prettier
run: |
npm install prettier
-
name: Run prettier
run:
npx prettier --check --ignore-path Pipfile.lock **/*.js **/*.ts *.md **/*.md
tests-backend:
needs: [code-checks-backend]
name: "Backend Tests (${{ matrix.python-version }})"
runs-on: ubuntu-20.04
needs:
- pre-commit
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10']
python-version: ['3.8', '3.9']
fail-fast: false
services:
tika:
image: ghcr.io/paperless-ngx/tika:latest
ports:
- "9998:9998/tcp"
gotenberg:
image: docker.io/gotenberg/gotenberg:7.6
ports:
- "3000:3000/tcp"
env:
# Enable Tika end to end testing
TIKA_LIVE: 1
# Enable paperless_mail testing against real server
PAPERLESS_MAIL_TEST_HOST: ${{ secrets.TEST_MAIL_HOST }}
PAPERLESS_MAIL_TEST_USER: ${{ secrets.TEST_MAIL_USER }}
PAPERLESS_MAIL_TEST_PASSWD: ${{ secrets.TEST_MAIL_PASSWD }}
steps:
-
name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
fetch-depth: 2
-
name: Install pipenv
run: |
pipx install pipenv==2022.10.12
run: pipx install pipenv
-
name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v3
with:
python-version: "${{ matrix.python-version }}"
cache: "pipenv"
@@ -118,24 +132,20 @@ jobs:
name: Install system dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends unpaper tesseract-ocr imagemagick ghostscript libzbar0 poppler-utils
sudo apt-get install -qq --no-install-recommends unpaper tesseract-ocr imagemagick ghostscript optipng
-
name: Install Python dependencies
run: |
pipenv sync --dev
-
name: List installed Python dependencies
run: |
pipenv run pip list
-
name: Tests
run: |
cd src/
pipenv run pytest -rfEp
pipenv run pytest
-
name: Get changed files
id: changed-files-specific
uses: tj-actions/changed-files@v34
uses: tj-actions/changed-files@v18.1
with:
files: |
src/**
@@ -156,17 +166,15 @@ jobs:
pipenv run coveralls --service=github
tests-frontend:
name: "Tests Frontend"
needs: [code-checks-frontend]
name: "Frontend Tests"
runs-on: ubuntu-20.04
needs:
- pre-commit
strategy:
matrix:
node-version: [16.x]
steps:
- uses: actions/checkout@v3
-
name: Use Node.js ${{ matrix.node-version }}
- name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
@@ -174,152 +182,42 @@ jobs:
- run: cd src-ui && npm run test
- run: cd src-ui && npm run e2e:ci
prepare-docker-build:
name: Prepare Docker Pipeline Data
if: github.event_name == 'push' && (startsWith(github.ref, 'refs/heads/feature-') || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/beta' || contains(github.ref, 'beta.rc') || startsWith(github.ref, 'refs/tags/v'))
runs-on: ubuntu-20.04
# If the push triggered the installer library workflow, wait for it to
# complete here. This ensures the required versions for the final
# image have been built, while not waiting at all if the versions haven't changed
concurrency:
group: build-installer-library
cancel-in-progress: false
needs:
- documentation
- tests-backend
- tests-frontend
steps:
-
name: Set ghcr repository name
id: set-ghcr-repository
run: |
ghcr_name=$(echo "${GITHUB_REPOSITORY}" | awk '{ print tolower($0) }')
echo "repository=${ghcr_name}" >> $GITHUB_OUTPUT
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
-
name: Setup qpdf image
id: qpdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py qpdf)
echo ${build_json}
echo "qpdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup psycopg2 image
id: psycopg2-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py psycopg2)
echo ${build_json}
echo "psycopg2-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup pikepdf image
id: pikepdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py pikepdf)
echo ${build_json}
echo "pikepdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup jbig2enc image
id: jbig2enc-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py jbig2enc)
echo ${build_json}
echo "jbig2enc-json=${build_json}" >> $GITHUB_OUTPUT
outputs:
ghcr-repository: ${{ steps.set-ghcr-repository.outputs.repository }}
qpdf-json: ${{ steps.qpdf-setup.outputs.qpdf-json }}
pikepdf-json: ${{ steps.pikepdf-setup.outputs.pikepdf-json }}
psycopg2-json: ${{ steps.psycopg2-setup.outputs.psycopg2-json }}
jbig2enc-json: ${{ steps.jbig2enc-setup.outputs.jbig2enc-json}}
# build and push image to docker hub.
build-docker-image:
if: github.event_name == 'push' && (startsWith(github.ref, 'refs/heads/feature-') || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/beta' || startsWith(github.ref, 'refs/tags/ngx-') || startsWith(github.ref, 'refs/tags/beta-'))
runs-on: ubuntu-20.04
concurrency:
group: ${{ github.workflow }}-build-docker-image-${{ github.ref_name }}
cancel-in-progress: true
needs:
- prepare-docker-build
needs: [tests-backend, tests-frontend]
steps:
-
name: Check pushing to Docker Hub
id: docker-hub
# Only push to Dockerhub from the main repo AND the ref is either:
# main
# dev
# beta
# a tag
# Otherwise forks would require a Docker Hub account and secrets setup
run: |
if [[ ${{ needs.prepare-docker-build.outputs.ghcr-repository }} == "paperless-ngx/paperless-ngx" && ( ${{ github.ref_name }} == "main" || ${{ github.ref_name }} == "dev" || ${{ github.ref_name }} == "beta" || ${{ startsWith(github.ref, 'refs/tags/v') }} == "true" ) ]] ; then
echo "Enabling DockerHub image push"
echo "enable=true" >> $GITHUB_OUTPUT
else
echo "Not pushing to DockerHub"
echo "enable=false" >> $GITHUB_OUTPUT
fi
-
name: Gather Docker metadata
id: docker-meta
uses: docker/metadata-action@v4
uses: docker/metadata-action@v3
with:
images: |
ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}
name=paperlessngx/paperless-ngx,enable=${{ steps.docker-hub.outputs.enable }}
images: ghcr.io/${{ github.repository }}
tags: |
# Tag branches with branch name
type=match,pattern=ngx-(\d.\d.\d),group=1
type=semver,pattern=ngx-{{version}}
type=semver,pattern=ngx-{{major}}.{{minor}}
type=ref,event=branch
# Process semver tags
# For a tag x.y.z or vX.Y.Z, output an x.y.z and x.y image tag
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v1
-
name: Set up QEMU
uses: docker/setup-qemu-action@v2
uses: docker/setup-qemu-action@v1
-
name: Login to Github Container Registry
uses: docker/login-action@v2
uses: docker/login-action@v1
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Login to Docker Hub
uses: docker/login-action@v2
# Don't attempt to log in if not pushing to Docker Hub
if: steps.docker-hub.outputs.enable == 'true'
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Build and push
uses: docker/build-push-action@v3
uses: docker/build-push-action@v2
with:
context: .
file: ./Dockerfile
@@ -327,19 +225,8 @@ jobs:
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.docker-meta.outputs.tags }}
labels: ${{ steps.docker-meta.outputs.labels }}
build-args: |
JBIG2ENC_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
PIKEPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
PSYCOPG2_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
# Get cache layers from this branch, then dev, then main
# This allows new branches to get at least some cache benefits, generally from dev
cache-from: |
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:${{ github.ref_name }}
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:dev
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:main
cache-to: |
type=registry,mode=max,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:${{ github.ref_name }}
cache-from: type=gha
cache-to: type=gha,mode=max
-
name: Inspect image
run: |
@@ -357,34 +244,24 @@ jobs:
path: src/documents/static/frontend/
build-release:
needs:
- build-docker-image
needs: [build-docker-image, documentation]
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Install pipenv
run: |
pip3 install --upgrade pip setuptools wheel pipx
pipx install pipenv
-
name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v3
with:
python-version: 3.9
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install Python dependencies
run: |
pipenv sync --dev
-
name: Install system dependencies
name: Install dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends gettext liblept5
pip3 install --upgrade pip setuptools wheel
pip3 install -r requirements.txt
-
name: Download frontend artifact
uses: actions/download-artifact@v3
@@ -397,38 +274,34 @@ jobs:
with:
name: documentation
path: docs/_build/html/
-
name: Generate requirements file
run: |
pipenv requirements > requirements.txt
-
name: Compile messages
run: |
cd src/
pipenv run python3 manage.py compilemessages
-
name: Collect static files
run: |
cd src/
pipenv run python3 manage.py collectstatic --no-input
-
name: Move files
run: |
mkdir dist
mkdir dist/paperless-ngx
mkdir dist/paperless-ngx/scripts
cp .dockerignore .env Dockerfile Pipfile Pipfile.lock requirements.txt LICENSE README.md dist/paperless-ngx/
cp .dockerignore .env Dockerfile Pipfile Pipfile.lock LICENSE README.md requirements.txt dist/paperless-ngx/
cp paperless.conf.example dist/paperless-ngx/paperless.conf
cp gunicorn.conf.py dist/paperless-ngx/gunicorn.conf.py
cp -r docker/ dist/paperless-ngx/docker
cp docker/ dist/paperless-ngx/docker -r
cp scripts/*.service scripts/*.sh dist/paperless-ngx/scripts/
cp -r src/ dist/paperless-ngx/src
cp -r docs/_build/html/ dist/paperless-ngx/docs
mv static dist/paperless-ngx
cp src/ dist/paperless-ngx/src -r
cp docs/_build/html/ dist/paperless-ngx/docs -r
-
name: Compile messages
run: |
cd dist/paperless-ngx/src
python3 manage.py compilemessages
-
name: Collect static files
run: |
cd dist/paperless-ngx/src
python3 manage.py collectstatic --no-input
-
name: Make release package
run: |
cd dist
find . -name __pycache__ | xargs rm -r
tar -cJf paperless-ngx.tar.xz paperless-ngx/
-
name: Upload release artifact
@@ -439,13 +312,8 @@ jobs:
publish-release:
runs-on: ubuntu-20.04
outputs:
prerelease: ${{ steps.get_version.outputs.prerelease }}
changelog: ${{ steps.create-release.outputs.body }}
version: ${{ steps.get_version.outputs.version }}
needs:
- build-release
if: github.ref_type == 'tag' && (startsWith(github.ref_name, 'v') || contains(github.ref_name, '-beta.rc'))
needs: build-release
if: contains(github.ref, 'refs/tags/ngx-') || contains(github.ref, 'refs/tags/beta-')
steps:
-
name: Download release artifact
@@ -457,24 +325,27 @@ jobs:
name: Get version
id: get_version
run: |
echo "version=${{ github.ref_name }}" >> $GITHUB_OUTPUT
if [[ ${{ contains(github.ref_name, '-beta.rc') }} == 'true' ]]; then
echo "prerelease=true" >> $GITHUB_OUTPUT
else
echo "prerelease=false" >> $GITHUB_OUTPUT
if [[ $GITHUB_REF == refs/tags/ngx-* ]]; then
echo ::set-output name=version::${GITHUB_REF#refs/tags/ngx-}
echo ::set-output name=prerelease::false
echo ::set-output name=body::"For a complete list of changes, see the changelog at https://paperless-ngx.readthedocs.io/en/latest/changelog.html"
elif [[ $GITHUB_REF == refs/tags/beta-* ]]; then
echo ::set-output name=version::${GITHUB_REF#refs/tags/beta-}
echo ::set-output name=prerelease::true
echo ::set-output name=body::"For a complete list of changes, see the changelog at https://github.com/paperless-ngx/paperless-ngx/blob/beta/docs/changelog.rst"
fi
-
name: Create Release and Changelog
id: create-release
uses: paperless-ngx/release-drafter@master
with:
name: Paperless-ngx ${{ steps.get_version.outputs.version }}
tag: ${{ steps.get_version.outputs.version }}
version: ${{ steps.get_version.outputs.version }}
prerelease: ${{ steps.get_version.outputs.prerelease }}
publish: true # ensures release is not marked as draft
name: Create release
id: create_release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tag_name: ngx-${{ steps.get_version.outputs.version }}
release_name: Paperless-ngx ${{ steps.get_version.outputs.version }}
draft: false
prerelease: ${{ steps.get_version.outputs.prerelease }}
body: ${{ steps.get_version.outputs.body }}
-
name: Upload release archive
id: upload-release-asset
@@ -482,69 +353,7 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ steps.create-release.outputs.upload_url }}
upload_url: ${{ steps.create_release.outputs.upload_url }} # This pulls from the CREATE RELEASE step above, referencing its ID to get its outputs object, which includes an `upload_url`. See this blog post for more info: https://jasonet.co/posts/new-features-of-github-actions/#passing-data-to-future-steps
asset_path: ./paperless-ngx.tar.xz
asset_name: paperless-ngx-${{ steps.get_version.outputs.version }}.tar.xz
asset_content_type: application/x-xz
append-changelog:
runs-on: ubuntu-20.04
needs:
- publish-release
if: needs.publish-release.outputs.prerelease == 'false'
steps:
-
name: Checkout
uses: actions/checkout@v3
with:
ref: main
-
name: Install pipenv
run: |
pip3 install --upgrade pip setuptools wheel pipx
pipx install pipenv
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.9
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Append Changelog to docs
id: append-Changelog
working-directory: docs
run: |
git branch ${{ needs.publish-release.outputs.version }}-changelog
git checkout ${{ needs.publish-release.outputs.version }}-changelog
echo -e "# Changelog\n\n${{ needs.publish-release.outputs.changelog }}\n" > changelog-new.md
echo "Manually linking usernames"
sed -i -r 's|@(.+?) \(\[#|[@\1](https://github.com/\1) ([#|ig' changelog-new.md
CURRENT_CHANGELOG=`tail --lines +2 changelog.md`
echo -e "$CURRENT_CHANGELOG" >> changelog-new.md
mv changelog-new.md changelog.md
pipenv run pre-commit run --files changelog.md
git config --global user.name "github-actions"
git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
git commit -am "Changelog ${{ needs.publish-release.outputs.version }} - GHA"
git push origin ${{ needs.publish-release.outputs.version }}-changelog
-
name: Create Pull Request
uses: actions/github-script@v6
with:
script: |
const { repo, owner } = context.repo;
const result = await github.rest.pulls.create({
title: '[Documentation] Add ${{ needs.publish-release.outputs.version }} changelog',
owner,
repo,
head: '${{ needs.publish-release.outputs.version }}-changelog',
base: 'main',
body: 'This PR is auto-generated by CI.'
});
github.rest.issues.addLabels({
owner,
repo,
issue_number: result.data.number,
labels: ['documentation']
});

View File

@@ -1,95 +0,0 @@
# This workflow runs on certain conditions to check for and potentially
# delete container images from the GHCR which no longer have an associated
# code branch.
# Requires a PAT with the correct scope set in the secrets
name: Cleanup Image Tags
on:
schedule:
- cron: '0 0 * * SAT'
delete:
pull_request:
types:
- closed
push:
paths:
- ".github/workflows/cleanup-tags.yml"
- ".github/scripts/cleanup-tags.py"
- ".github/scripts/github.py"
- ".github/scripts/common.py"
concurrency:
group: registry-tags-cleanup
cancel-in-progress: false
jobs:
cleanup-images:
name: Cleanup Image Tags for ${{ matrix.primary-name }}
runs-on: ubuntu-latest
strategy:
matrix:
include:
- primary-name: "paperless-ngx"
cache-name: "paperless-ngx/builder/cache/app"
- primary-name: "paperless-ngx/builder/qpdf"
cache-name: "paperless-ngx/builder/cache/qpdf"
- primary-name: "paperless-ngx/builder/pikepdf"
cache-name: "paperless-ngx/builder/cache/pikepdf"
- primary-name: "paperless-ngx/builder/jbig2enc"
cache-name: "paperless-ngx/builder/cache/jbig2enc"
- primary-name: "paperless-ngx/builder/psycopg2"
cache-name: "paperless-ngx/builder/cache/psycopg2"
env:
# Requires a personal access token with the OAuth scope delete:packages
TOKEN: ${{ secrets.GHA_CONTAINER_DELETE_TOKEN }}
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Login to Github Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
-
name: Install httpx
run: |
python -m pip install httpx
#
# Clean up primary package
#
-
name: Cleanup for package "${{ matrix.primary-name }}"
if: "${{ env.TOKEN != '' }}"
run: |
python ${GITHUB_WORKSPACE}/.github/scripts/cleanup-tags.py --untagged --is-manifest --delete "${{ matrix.primary-name }}"
#
# Clean up registry cache package
#
-
name: Cleanup for package "${{ matrix.cache-name }}"
if: "${{ env.TOKEN != '' }}"
run: |
python ${GITHUB_WORKSPACE}/.github/scripts/cleanup-tags.py --untagged --delete "${{ matrix.cache-name }}"
#
# Verify tags which are left still pull
#
-
name: Check all tags still pull
run: |
ghcr_name=$(echo "ghcr.io/${GITHUB_REPOSITORY_OWNER}/${{ matrix.primary-name }}" | awk '{ print tolower($0) }')
echo "Pulling all tags of ${ghcr_name}"
docker pull --quiet --all-tags ${ghcr_name}
docker image list
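
For reference, the cleanup script can also be dry-run locally. A minimal sketch, assuming the environment variables the script reads (GITHUB_REPOSITORY_OWNER, GITHUB_REPOSITORY, TOKEN) and deliberately omitting --delete so nothing is actually removed:

import os
import subprocess

# Hypothetical values; in CI the workflow supplies these automatically.
os.environ["GITHUB_REPOSITORY_OWNER"] = "paperless-ngx"
os.environ["GITHUB_REPOSITORY"] = "paperless-ngx/paperless-ngx"
os.environ["TOKEN"] = "<PAT with delete:packages scope>"

# Without --delete, the script only logs what it would delete.
subprocess.run(
    ["python3", ".github/scripts/cleanup-tags.py", "--untagged", "--is-manifest", "paperless-ngx"],
    check=True,
)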

View File

@@ -38,11 +38,11 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v2
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v1
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -51,4 +51,4 @@ jobs:
# queries: ./path/to/local/query, your-org/your-repo/queries@main
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v1

View File

@@ -1,170 +0,0 @@
# This workflow will run to update the installer library of
# Docker images. These are the images which provide updated wheels,
# .deb installation packages, or maybe just some compiled library
name: Build Image Library
on:
push:
# Must match one of these branches AND one of the paths
# to be triggered
branches:
- "main"
- "dev"
- "library-*"
- "feature-*"
paths:
# Trigger the workflow if a Dockerfile changed
- "docker-builders/**"
# Trigger if a package was updated
- ".build-config.json"
- "Pipfile.lock"
# Also trigger on workflow changes related to the library
- ".github/workflows/installer-library.yml"
- ".github/workflows/reusable-workflow-builder.yml"
- ".github/scripts/**"
# Set a workflow level concurrency group so primary workflow
# can wait for this to complete if needed
# DO NOT CHANGE without updating main workflow group
concurrency:
group: build-installer-library
cancel-in-progress: false
jobs:
prepare-docker-build:
name: Prepare Docker Image Version Data
runs-on: ubuntu-20.04
steps:
-
name: Set ghcr repository name
id: set-ghcr-repository
run: |
ghcr_name=$(echo "${GITHUB_REPOSITORY}" | awk '{ print tolower($0) }')
echo "repository=${ghcr_name}" >> $GITHUB_OUTPUT
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
-
name: Install jq
run: |
sudo apt-get update
sudo apt-get install jq
-
name: Setup qpdf image
id: qpdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py qpdf)
echo ${build_json}
echo "qpdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup psycopg2 image
id: psycopg2-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py psycopg2)
echo ${build_json}
echo "psycopg2-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup pikepdf image
id: pikepdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py pikepdf)
echo ${build_json}
echo "pikepdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup jbig2enc image
id: jbig2enc-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py jbig2enc)
echo ${build_json}
echo "jbig2enc-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup other versions
id: cache-bust-setup
run: |
pillow_version=$(jq ".default.pillow.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
lxml_version=$(jq ".default.lxml.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
echo "Pillow is ${pillow_version}"
echo "lxml is ${lxml_version}"
echo "pillow-version=${pillow_version}" >> $GITHUB_OUTPUT
echo "lxml-version=${lxml_version}" >> $GITHUB_OUTPUT
outputs:
ghcr-repository: ${{ steps.set-ghcr-repository.outputs.repository }}
qpdf-json: ${{ steps.qpdf-setup.outputs.qpdf-json }}
pikepdf-json: ${{ steps.pikepdf-setup.outputs.pikepdf-json }}
psycopg2-json: ${{ steps.psycopg2-setup.outputs.psycopg2-json }}
jbig2enc-json: ${{ steps.jbig2enc-setup.outputs.jbig2enc-json }}
pillow-version: ${{ steps.cache-bust-setup.outputs.pillow-version }}
lxml-version: ${{ steps.cache-bust-setup.outputs.lxml-version }}
build-qpdf-debs:
name: qpdf
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.qpdf
build-json: ${{ needs.prepare-docker-build.outputs.qpdf-json }}
build-args: |
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
build-jbig2enc:
name: jbig2enc
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.jbig2enc
build-json: ${{ needs.prepare-docker-build.outputs.jbig2enc-json }}
build-args: |
JBIG2ENC_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
build-psycopg2-wheel:
name: psycopg2
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.psycopg2
build-json: ${{ needs.prepare-docker-build.outputs.psycopg2-json }}
build-args: |
PSYCOPG2_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
build-pikepdf-wheel:
name: pikepdf
needs:
- prepare-docker-build
- build-qpdf-debs
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.pikepdf
build-json: ${{ needs.prepare-docker-build.outputs.pikepdf-json }}
build-args: |
REPO=${{ needs.prepare-docker-build.outputs.ghcr-repository }}
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
PIKEPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
PILLOW_VERSION=${{ needs.prepare-docker-build.outputs.pillow-version }}
LXML_VERSION=${{ needs.prepare-docker-build.outputs.lxml-version }}
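For reference, the version lookups the prepare job performs can be reproduced locally. A minimal sketch, assuming the same jq paths the workflow uses (the real .github/scripts/get-build-json.py may emit more fields):
# Sketch: reproduce the workflow's version extraction locally
pillow_version=$(jq -r '.default.pillow.version' Pipfile.lock | tr -d '=')
qpdf_version=$(jq -r '.qpdf.version' .build-config.json)
echo "Pillow is ${pillow_version}, qpdf is ${qpdf_version}"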

View File

@@ -13,9 +13,6 @@ on:
- main
- dev
permissions:
contents: read
env:
todo: Todo
done: Done
@@ -27,8 +24,8 @@ jobs:
runs-on: ubuntu-latest
if: github.event_name == 'issues' && (github.event.action == 'opened' || github.event.action == 'reopened')
steps:
- name: Add issue to project and set status to ${{ env.todo }}
uses: leonsteinhaeuser/project-beta-automations@v2.0.1
- name: Set issue status to ${{ env.todo }}
uses: leonsteinhaeuser/project-beta-automations@v1.2.1
with:
gh_token: ${{ secrets.GH_TOKEN }}
organization: paperless-ngx
@@ -38,20 +35,13 @@ jobs:
pr_opened_or_reopened:
name: pr_opened_or_reopened
runs-on: ubuntu-latest
permissions:
# write permission is required for autolabeler
pull-requests: write
if: github.event_name == 'pull_request_target' && (github.event.action == 'opened' || github.event.action == 'reopened') && github.event.pull_request.user.login != 'dependabot'
if: github.event_name == 'pull_request_target' && (github.event.action == 'opened' || github.event.action == 'reopened')
steps:
- name: Add PR to project and set status to "Needs Review"
uses: leonsteinhaeuser/project-beta-automations@v2.0.1
- name: Set PR status to ${{ env.in_progress }}
uses: leonsteinhaeuser/project-beta-automations@v1.2.1
with:
gh_token: ${{ secrets.GH_TOKEN }}
organization: paperless-ngx
project_id: 2
resource_node_id: ${{ github.event.pull_request.node_id }}
status_value: "Needs Review" # Target status
- name: Label PR with release-drafter
uses: release-drafter/release-drafter@v5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
status_value: ${{ env.in_progress }} # Target status

View File

@@ -1,53 +0,0 @@
name: Reusable Image Builder
on:
workflow_call:
inputs:
dockerfile:
required: true
type: string
build-json:
required: true
type: string
build-args:
required: false
default: ""
type: string
concurrency:
group: ${{ github.workflow }}-${{ fromJSON(inputs.build-json).name }}-${{ fromJSON(inputs.build-json).version }}
cancel-in-progress: false
jobs:
build-image:
name: Build ${{ fromJSON(inputs.build-json).name }} @ ${{ fromJSON(inputs.build-json).version }}
runs-on: ubuntu-latest
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Login to Github Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
-
name: Set up QEMU
uses: docker/setup-qemu-action@v2
-
name: Build ${{ fromJSON(inputs.build-json).name }}
uses: docker/build-push-action@v3
with:
context: .
file: ${{ inputs.dockerfile }}
tags: ${{ fromJSON(inputs.build-json).image_tag }}
platforms: linux/amd64,linux/arm64,linux/arm/v7
build-args: ${{ inputs.build-args }}
push: true
cache-from: type=registry,ref=${{ fromJSON(inputs.build-json).cache_tag }}
cache-to: type=registry,mode=max,ref=${{ fromJSON(inputs.build-json).cache_tag }}
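The build-json input is an opaque JSON string; the keys this workflow reads via fromJSON() are name, version, image_tag and cache_tag. A hypothetical payload (values are illustrative; the real one comes from .github/scripts/get-build-json.py):
# Hypothetical build-json payload; only the key names are grounded in the workflow above
build_json='{"name": "qpdf", "version": "10.6.3", "image_tag": "ghcr.io/paperless-ngx/paperless-ngx/builder/qpdf:10.6.3", "cache_tag": "ghcr.io/paperless-ngx/paperless-ngx/builder/cache/qpdf"}'
echo "${build_json}" | jq -r '.image_tag'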

11
.gitignore vendored
View File

@@ -63,14 +63,11 @@ target/
# VS Code
.vscode
/src-ui/.vscode
/docs/.vscode
# Other stuff that doesn't belong
.virtualenv
virtualenv
/venv
.venv/
/docker-compose.env
/docker-compose.yml
@@ -87,12 +84,8 @@ scripts/nuke
/paperless.conf
/consume/
/export/
/src-ui/.vscode
# this is where the compiled frontend is moved to.
/src/documents/static/frontend/
# mac os
.DS_Store
# celery schedule file
celerybeat-schedule*
/docs/.vscode/settings.json

View File

@@ -1,8 +0,0 @@
failure-threshold: warning
ignored:
# https://github.com/hadolint/hadolint/wiki/DL3008
- DL3008
# https://github.com/hadolint/hadolint/wiki/DL3013
- DL3013
# https://github.com/hadolint/hadolint/wiki/DL3003
- DL3003
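These rule codes (DL3008, DL3013, DL3003) are hadolint checks, matching the hadolint pre-commit hook introduced below. A minimal local invocation, assuming the file shown is the repository's .hadolint.yaml:
# Sketch: lint a Dockerfile with the config above (config file name assumed)
hadolint --config .hadolint.yaml Dockerfile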

View File

@@ -5,7 +5,7 @@
repos:
# General hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
rev: v4.1.0
hooks:
- id: check-docstring-first
- id: check-json
@@ -27,7 +27,7 @@ repos:
- id: check-case-conflict
- id: detect-private-key
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v2.7.1"
rev: "v2.6.1"
hooks:
- id: prettier
types_or:
@@ -37,43 +37,36 @@ repos:
exclude: "(^Pipfile\\.lock$)"
# Python hooks
- repo: https://github.com/asottile/reorder_python_imports
rev: v3.9.0
rev: v3.0.1
hooks:
- id: reorder-python-imports
exclude: "(migrations)"
- repo: https://github.com/asottile/yesqa
rev: "v1.4.0"
rev: "v1.3.0"
hooks:
- id: yesqa
exclude: "(migrations)"
- repo: https://github.com/asottile/add-trailing-comma
rev: "v2.3.0"
rev: "v2.2.1"
hooks:
- id: add-trailing-comma
exclude: "(migrations)"
- repo: https://github.com/PyCQA/flake8
rev: 5.0.4
- repo: https://gitlab.com/pycqa/flake8
rev: 3.9.2
hooks:
- id: flake8
files: ^src/
args:
- "--config=./src/setup.cfg"
- repo: https://github.com/psf/black
rev: 22.10.0
rev: 22.3.0
hooks:
- id: black
- repo: https://github.com/asottile/pyupgrade
rev: v3.2.2
hooks:
- id: pyupgrade
exclude: "(migrations)"
args:
- "--py38-plus"
# Dockerfile hooks
- repo: https://github.com/AleksaC/hadolint-py
rev: v2.10.0
- repo: https://github.com/pryorda/dockerfilelint-precommit-hooks
rev: "v0.1.0"
hooks:
- id: hadolint
- id: dockerfilelint
# Shell script hooks
- repo: https://github.com/lovesegfault/beautysh
rev: v6.2.1
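For local development, the hook set above is driven by the standard pre-commit CLI; a minimal sketch:
# Install the hooks into the local checkout, then run them across the repo
pre-commit install
pre-commit run --all-files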

View File

@@ -1,9 +0,0 @@
/.github/workflows/ @paperless-ngx/ci-cd
/docker/ @paperless-ngx/ci-cd
/scripts/ @paperless-ngx/ci-cd
/src-ui/ @paperless-ngx/frontend
/src/ @paperless-ngx/backend
Pipfile* @paperless-ngx/backend
*.py @paperless-ngx/backend

View File

@@ -1,55 +1,12 @@
# syntax=docker/dockerfile:1.4
FROM node:16 AS compile-frontend
# Pull the installer images from the library
# These are all built previously
# They provide either a .deb or .whl
ARG JBIG2ENC_VERSION
ARG QPDF_VERSION
ARG PIKEPDF_VERSION
ARG PSYCOPG2_VERSION
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/jbig2enc:${JBIG2ENC_VERSION} as jbig2enc-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/qpdf:${QPDF_VERSION} as qpdf-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/pikepdf:${PIKEPDF_VERSION} as pikepdf-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/psycopg2:${PSYCOPG2_VERSION} as psycopg2-builder
FROM --platform=$BUILDPLATFORM node:16-bullseye-slim AS compile-frontend
# This stage compiles the frontend
# This stage runs once for the native platform, as the outputs are not
# dependent on target arch
# Inputs: None
COPY ./src-ui /src/src-ui
COPY . /src
WORKDIR /src/src-ui
RUN set -eux \
&& npm update npm -g \
&& npm ci --omit=optional
RUN set -eux \
&& ./node_modules/.bin/ng build --configuration production
RUN npm update npm -g && npm ci --no-optional
RUN ./node_modules/.bin/ng build --configuration production
FROM --platform=$BUILDPLATFORM python:3.9-slim-bullseye as pipenv-base
# This stage generates the requirements.txt file using pipenv
# This stage runs once for the native platform, as the outputs are not
# dependent on target arch
# This way, pipenv dependencies are not left in the final image
# nor can pipenv mess up the final image somehow
# Inputs: None
WORKDIR /usr/src/pipenv
COPY Pipfile* ./
RUN set -eux \
&& echo "Installing pipenv" \
&& python3 -m pip install --no-cache-dir --upgrade pipenv \
&& echo "Generating requirement.txt" \
&& pipenv requirements > requirements.txt
FROM python:3.9-slim-bullseye as main-app
FROM ghcr.io/paperless-ngx/builder/ngx-base:1.0 as main-app
LABEL org.opencontainers.image.authors="paperless-ngx team <hello@paperless-ngx.com>"
LABEL org.opencontainers.image.documentation="https://paperless-ngx.readthedocs.io/en/latest/"
@@ -57,187 +14,45 @@ LABEL org.opencontainers.image.source="https://github.com/paperless-ngx/paperles
LABEL org.opencontainers.image.url="https://github.com/paperless-ngx/paperless-ngx"
LABEL org.opencontainers.image.licenses="GPL-3.0-only"
ARG DEBIAN_FRONTEND=noninteractive
#
# Begin installation and configuration
# Order the steps below from least often changed to most
#
# copy jbig2enc
# Basically will never change again
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/.libs/libjbig2enc* /usr/local/lib/
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/jbig2 /usr/local/bin/
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/*.h /usr/local/include/
# Packages needed for running
ARG RUNTIME_PACKAGES="\
curl \
file \
# fonts for text file thumbnail generation
fonts-liberation \
gettext \
ghostscript \
gnupg \
gosu \
icc-profiles-free \
imagemagick \
media-types \
liblept5 \
libpq5 \
libxml2 \
liblcms2-2 \
libtiff5 \
libxslt1.1 \
libfreetype6 \
libwebp6 \
libopenjp2-7 \
libimagequant0 \
libraqm0 \
libgnutls30 \
libjpeg62-turbo \
python3 \
python3-pip \
python3-setuptools \
postgresql-client \
mariadb-client \
# For Numpy
libatlas3-base \
# OCRmyPDF dependencies
tesseract-ocr \
tesseract-ocr-eng \
tesseract-ocr-deu \
tesseract-ocr-fra \
tesseract-ocr-ita \
tesseract-ocr-spa \
# Suggested for OCRmyPDF
pngquant \
# Suggested for pikepdf
jbig2dec \
tzdata \
unpaper \
# Mime type detection
zlib1g \
# Barcode splitter
libzbar0 \
poppler-utils"
# Install basic runtime packages.
# These change very infrequently
RUN set -eux \
echo "Installing system packages" \
&& apt-get update \
&& apt-get install --yes --quiet --no-install-recommends ${RUNTIME_PACKAGES} \
&& rm -rf /var/lib/apt/lists/* \
&& echo "Installing supervisor" \
&& python3 -m pip install --default-timeout=1000 --upgrade --no-cache-dir supervisor==4.2.4
# Copy gunicorn config
# Changes very infrequently
WORKDIR /usr/src/paperless/
COPY gunicorn.conf.py .
# setup docker-specific things
# Use mounts to avoid copying installer files into the image
# These change sometimes, but rarely
WORKDIR /usr/src/paperless/src/docker/
COPY [ \
"docker/imagemagick-policy.xml", \
"docker/supervisord.conf", \
"docker/docker-entrypoint.sh", \
"docker/docker-prepare.sh", \
"docker/paperless_cmd.sh", \
"docker/wait-for-redis.py", \
"docker/management_script.sh", \
"docker/flower-conditional.sh", \
"docker/install_management_commands.sh", \
"/usr/src/paperless/src/docker/" \
]
RUN set -eux \
&& echo "Configuring ImageMagick" \
&& mv imagemagick-policy.xml /etc/ImageMagick-6/policy.xml \
&& echo "Configuring supervisord" \
&& mkdir /var/log/supervisord /var/run/supervisord \
&& mv supervisord.conf /etc/supervisord.conf \
&& echo "Setting up Docker scripts" \
&& mv docker-entrypoint.sh /sbin/docker-entrypoint.sh \
&& chmod 755 /sbin/docker-entrypoint.sh \
&& mv docker-prepare.sh /sbin/docker-prepare.sh \
&& chmod 755 /sbin/docker-prepare.sh \
&& mv wait-for-redis.py /sbin/wait-for-redis.py \
&& chmod 755 /sbin/wait-for-redis.py \
&& mv paperless_cmd.sh /usr/local/bin/paperless_cmd.sh \
&& chmod 755 /usr/local/bin/paperless_cmd.sh \
&& mv flower-conditional.sh /usr/local/bin/flower-conditional.sh \
&& chmod 755 /usr/local/bin/flower-conditional.sh \
&& echo "Installing managment commands" \
&& chmod +x install_management_commands.sh \
&& ./install_management_commands.sh
# Install the built packages from the installer library images
# Use mounts to avoid copying installer files into the image
# These change sometimes
RUN --mount=type=bind,from=qpdf-builder,target=/qpdf \
--mount=type=bind,from=psycopg2-builder,target=/psycopg2 \
--mount=type=bind,from=pikepdf-builder,target=/pikepdf \
set -eux \
&& echo "Installing qpdf" \
&& apt-get install --yes --no-install-recommends /qpdf/usr/src/qpdf/libqpdf29_*.deb \
&& apt-get install --yes --no-install-recommends /qpdf/usr/src/qpdf/qpdf_*.deb \
&& echo "Installing pikepdf and dependencies" \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/pyparsing*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/packaging*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/lxml*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/Pillow*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/wheels/pikepdf*.whl \
&& python3 -m pip list \
&& echo "Installing psycopg2" \
&& python3 -m pip install --no-cache-dir /psycopg2/usr/src/wheels/psycopg2*.whl \
&& python3 -m pip list
WORKDIR /usr/src/paperless/src/
COPY requirements.txt ../
# Python dependencies
# Change pretty frequently
COPY --from=pipenv-base /usr/src/pipenv/requirements.txt ./
RUN apt-get update \
# python-Levenshtein still needs to be compiled here
&& apt-get -y --no-install-recommends install \
build-essential \
&& python3 -m pip install --upgrade --no-cache-dir pip wheel \
&& python3 -m pip install --default-timeout=1000 --upgrade --no-cache-dir supervisor \
&& python3 -m pip install --default-timeout=1000 --no-cache-dir -r ../requirements.txt \
&& apt-get -y purge build-essential \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*
# Packages needed only for building a few quick Python
# dependencies
ARG BUILD_PACKAGES="\
build-essential \
git \
default-libmysqlclient-dev \
python3-dev"
# setup docker-specific things
COPY docker/ ./docker/
RUN set -eux \
&& echo "Installing build system packages" \
&& apt-get update \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& python3 -m pip install --no-cache-dir --upgrade wheel \
&& echo "Installing Python requirements" \
&& python3 -m pip install --default-timeout=1000 --no-cache-dir --requirement requirements.txt \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& apt-get clean --yes \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /tmp/* \
&& rm -rf /var/tmp/* \
&& rm -rf /var/cache/apt/archives/* \
&& truncate -s 0 /var/log/*log
RUN cd docker \
&& cp imagemagick-policy.xml /etc/ImageMagick-6/policy.xml \
&& mkdir /var/log/supervisord /var/run/supervisord \
&& cp supervisord.conf /etc/supervisord.conf \
&& cp docker-entrypoint.sh /sbin/docker-entrypoint.sh \
&& chmod 755 /sbin/docker-entrypoint.sh \
&& cp docker-prepare.sh /sbin/docker-prepare.sh \
&& chmod 755 /sbin/docker-prepare.sh \
&& chmod +x install_management_commands.sh \
&& ./install_management_commands.sh \
&& cd .. \
&& rm -rf docker/
# copy backend
COPY ./src ./
COPY gunicorn.conf.py ../
# copy frontend
COPY --from=compile-frontend /src/src/documents/static/frontend/ ./documents/static/frontend/
# copy app
COPY --from=compile-frontend /src/src/ ./
# add users, setup scripts
RUN set -eux \
&& addgroup --gid 1000 paperless \
RUN addgroup --gid 1000 paperless \
&& useradd --uid 1000 --gid paperless --home-dir /usr/src/paperless paperless \
&& chown -R paperless:paperless ../ \
&& gosu paperless python3 manage.py collectstatic --clear --no-input \
@@ -252,4 +67,4 @@ ENTRYPOINT ["/sbin/docker-entrypoint.sh"]
EXPOSE 8000
CMD ["/usr/local/bin/paperless_cmd.sh"]
CMD ["/usr/local/bin/supervisord", "-c", "/etc/supervisord.conf"]

47
Pipfile
View File

@@ -10,56 +10,47 @@ name = "piwheels"
[packages]
dateparser = "~=1.1"
django = "~=4.1"
django = "~=4.0"
django-cors-headers = "*"
django-extensions = "*"
django-filter = "~=22.1"
djangorestframework = "~=3.14"
django-filter = "~=21.1"
django-q = "~=1.3"
djangorestframework = "~=3.13"
filelock = "*"
fuzzywuzzy = {extras = ["speedup"], version = "*"}
gunicorn = "*"
imap-tools = "*"
langdetect = "*"
pathvalidate = "*"
pillow = "~=9.3"
pikepdf = "*"
pillow = "~=9.0"
# Any version update to pikepdf requires a base image update
pikepdf = "~=5.1"
python-gnupg = "*"
python-dotenv = "*"
python-dateutil = "*"
python-magic = "*"
# Any version update to psycopg2 requires a base image update
psycopg2 = "*"
rapidfuzz = "*"
redis = {extras = ["hiredis"], version = "*"}
scikit-learn = "~=1.1"
# Pin this until piwheels is building 1.9 (see https://www.piwheels.org/project/scipy/)
scipy = "==1.8.1"
numpy = "*"
whitenoise = "~=6.2"
watchdog = "~=2.1"
whoosh="~=2.7"
redis = "*"
# Pinned because aarch64 wheels and updates cause warnings when loading the classifier model.
scikit-learn="==1.0.2"
whitenoise = "~=6.0.0"
watchdog = "~=2.1.0"
whoosh="~=2.7.4"
inotifyrecursive = "~=0.3"
ocrmypdf = "~=14.0"
ocrmypdf = "~=13.4"
tqdm = "*"
tika = "*"
# TODO: This will sadly also install daphne+dependencies,
# which is an ASGI server we don't need. Adds about 15MB to the image size.
channels = "~=3.0"
# Locked version until https://github.com/django/channels_redis/issues/332
# is resolved
channels-redis = "==3.4.1"
channels-redis = "*"
uvicorn = {extras = ["standard"], version = "*"}
concurrent-log-handler = "*"
"pdfminer.six" = "*"
"backports.zoneinfo" = {version = "*", markers = "python_version < '3.9'"}
"importlib-resources" = {version = "*", markers = "python_version < '3.9'"}
zipp = {version = "*", markers = "python_version < '3.9'"}
pyzbar = "*"
mysqlclient = "*"
celery = {extras = ["redis"], version = "*"}
django-celery-results = "*"
setproctitle = "*"
nltk = "*"
pdf2image = "*"
flower = "*"
[dev-packages]
coveralls = "*"
@@ -71,10 +62,8 @@ pytest-django = "*"
pytest-env = "*"
pytest-sugar = "*"
pytest-xdist = "*"
sphinx = "~=5.3"
sphinx = "~=4.4.0"
sphinx_rtd_theme = "*"
tox = "*"
black = "*"
pre-commit = "*"
sphinx-autobuild = "*"
myst-parser = "*"

2480
Pipfile.lock generated

File diff suppressed because it is too large

View File

@@ -2,7 +2,7 @@
[![Crowdin](https://badges.crowdin.net/paperless-ngx/localized.svg)](https://crowdin.com/project/paperless-ngx)
[![Documentation Status](https://readthedocs.org/projects/paperless-ngx/badge/?version=latest)](https://paperless-ngx.readthedocs.io/en/latest/?badge=latest)
[![Coverage Status](https://coveralls.io/repos/github/paperless-ngx/paperless-ngx/badge.svg?branch=master)](https://coveralls.io/github/paperless-ngx/paperless-ngx?branch=master)
[![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/%23paperlessngx%3Amatrix.org)
[![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/#paperless:adnidor.de)
<p align="center">
<img src="https://github.com/paperless-ngx/paperless-ngx/raw/main/resources/logo/web/png/Black%20logo%20-%20no%20background.png#gh-light-mode-only" width="50%" />
@@ -102,10 +102,9 @@ For bugs please [open an issue](https://github.com/paperless-ngx/paperless-ngx/i
Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:
- [Paperless App](https://github.com/bauerj/paperless_app): An Android/iOS app for Paperless-ngx. Also works with the original Paperless and Paperless-ng.
- [Paperless App](https://github.com/bauerj/paperless_app): An Android/iOS app for Paperless-ngx. Also works with the original Paperless and Paperless-ngx.
- [Paperless Share](https://github.com/qcasey/paperless_share). Share any files from your Android application with paperless. Very simple, but works with all of the mobile scanning apps out there that allow you to share scanned documents.
- [Scan to Paperless](https://github.com/sbrunner/scan-to-paperless): Scan and prepare (crop, deskew, OCR, ...) your documents for Paperless.
- [Paperless Mobile](https://github.com/astubenbord/paperless-mobile): A modern, feature rich mobile application for Paperless.
These projects also exist, but their status and compatibility with paperless-ngx is unknown.

View File

@@ -1,47 +0,0 @@
#!/usr/bin/env bash
# Helper script for building the Docker image locally.
# Parses and provides the necessary versions of other images to Docker
# before passing in the rest of the script args.
# First Argument: The Dockerfile to build
# Other Arguments: Additional arguments to docker build
# Example Usage:
# ./build-docker-image.sh Dockerfile -t paperless-ngx:my-awesome-feature
set -eux
if ! command -v jq; then
echo "jq required"
exit 1
elif [ ! -f "$1" ]; then
echo "$1 is not a file, please provide the Dockerfile"
exit 1
fi
# Parse what we can from Pipfile.lock
pikepdf_version=$(jq ".default.pikepdf.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
psycopg2_version=$(jq ".default.psycopg2.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
pillow_version=$(jq ".default.pillow.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
lxml_version=$(jq ".default.lxml.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
# Read this from the other config file
qpdf_version=$(jq ".qpdf.version" .build-config.json | sed 's/"//g')
jbig2enc_version=$(jq ".jbig2enc.version" .build-config.json | sed 's/"//g')
# Get the branch name (used for caching)
branch_name=$(git rev-parse --abbrev-ref HEAD)
# https://docs.docker.com/develop/develop-images/build_enhancements/
# Required to use cache-from
export DOCKER_BUILDKIT=1
docker build --file "$1" \
--progress=plain \
--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/app:"${branch_name}" \
--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/app:dev \
--build-arg JBIG2ENC_VERSION="${jbig2enc_version}" \
--build-arg QPDF_VERSION="${qpdf_version}" \
--build-arg PIKEPDF_VERSION="${pikepdf_version}" \
--build-arg PILLOW_VERSION="${pillow_version}" \
--build-arg LXML_VERSION="${lxml_version}" \
--build-arg PSYCOPG2_VERSION="${psycopg2_version}" "${@:2}" .

View File

@@ -1,35 +0,0 @@
# This Dockerfile compiles the jbig2enc library
# Inputs:
# - JBIG2ENC_VERSION - the Git tag to checkout and build
FROM debian:bullseye-slim as main
LABEL org.opencontainers.image.description="A intermediate image with jbig2enc built"
ARG DEBIAN_FRONTEND=noninteractive
ARG JBIG2ENC_VERSION
ARG BUILD_PACKAGES="\
build-essential \
automake \
libtool \
libleptonica-dev \
zlib1g-dev \
git \
ca-certificates"
WORKDIR /usr/src/jbig2enc
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Building jbig2enc" \
&& git clone --quiet --branch $JBIG2ENC_VERSION https://github.com/agl/jbig2enc . \
&& ./autogen.sh \
&& ./configure \
&& make \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -1,92 +0,0 @@
# This Dockerfile builds the pikepdf wheel
# Inputs:
# - REPO - Docker repository to pull qpdf from
# - QPDF_VERSION - The image qpdf version to copy .deb files from
# - PIKEPDF_VERSION - Version of pikepdf to build wheel for
# Default to pulling from the main repo registry when manually building
ARG REPO="paperless-ngx/paperless-ngx"
ARG QPDF_VERSION
FROM ghcr.io/${REPO}/builder/qpdf:${QPDF_VERSION} as qpdf-builder
# This does nothing except provide a name for the COPY below
FROM python:3.9-slim-bullseye as main
LABEL org.opencontainers.image.description="A intermediate image with pikepdf wheel built"
ARG DEBIAN_FRONTEND=noninteractive
ARG PIKEPDF_VERSION
# These are not used, but will still bust the cache if one changes
# Otherwise, the main image will try to build things (and fail)
ARG PILLOW_VERSION
ARG LXML_VERSION
ARG BUILD_PACKAGES="\
build-essential \
python3-dev \
python3-pip \
# qpdf requirement - https://github.com/qpdf/qpdf#crypto-providers
libgnutls28-dev \
# lxml requirements - https://lxml.de/installation.html
libxml2-dev \
libxslt1-dev \
# Pillow requirements - https://pillow.readthedocs.io/en/stable/installation.html#external-libraries
# JPEG functionality
libjpeg62-turbo-dev \
# compressed PNG
zlib1g-dev \
# compressed TIFF
libtiff-dev \
# type related services
libfreetype-dev \
# color management
liblcms2-dev \
# WebP format
libwebp-dev \
# JPEG 2000
libopenjp2-7-dev \
# improved color quantization
libimagequant-dev \
# complex text layout support
libraqm-dev"
WORKDIR /usr/src
COPY --from=qpdf-builder /usr/src/qpdf/*.deb ./
# As this is a base image for a multi-stage final image,
# the added size of the install is basically irrelevant
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Installing qpdf" \
&& dpkg --install libqpdf29_*.deb \
&& dpkg --install libqpdf-dev_*.deb \
&& echo "Installing Python tools" \
&& python3 -m pip install --no-cache-dir --upgrade \
pip \
wheel \
# https://pikepdf.readthedocs.io/en/latest/installation.html#requirements
pybind11 \
&& echo "Building pikepdf wheel ${PIKEPDF_VERSION}" \
&& mkdir wheels \
&& python3 -m pip wheel \
# Build the package at the required version
pikepdf==${PIKEPDF_VERSION} \
# Output the *.whl into this directory
--wheel-dir wheels \
# Do not use a binary package for the package being built
--no-binary=pikepdf \
# Do use binary packages for dependencies
--prefer-binary \
# Don't cache build files
--no-cache-dir \
&& ls -ahl wheels \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -1,48 +0,0 @@
# This Dockerfile builds the psycopg2 wheel
# Inputs:
# - PSYCOPG2_VERSION - Version to build
FROM python:3.9-slim-bullseye as main
LABEL org.opencontainers.image.description="A intermediate image with psycopg2 wheel built"
ARG PSYCOPG2_VERSION
ARG DEBIAN_FRONTEND=noninteractive
ARG BUILD_PACKAGES="\
build-essential \
python3-dev \
python3-pip \
# https://www.psycopg.org/docs/install.html#prerequisites
libpq-dev"
WORKDIR /usr/src
# As this is a base image for a multi-stage final image,
# the added size of the install is basically irrelevant
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Installing Python tools" \
&& python3 -m pip install --no-cache-dir --upgrade pip wheel \
&& echo "Building psycopg2 wheel ${PSYCOPG2_VERSION}" \
&& cd /usr/src \
&& mkdir wheels \
&& python3 -m pip wheel \
# Build the package at the required version
psycopg2==${PSYCOPG2_VERSION} \
# Output the *.whl into this directory
--wheel-dir wheels \
# Do not use a binary package for the package being built
--no-binary=psycopg2 \
# Do use binary packages for dependencies
--prefer-binary \
# Don't cache build files
--no-cache-dir \
&& ls -ahl wheels/ \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*

View File

@@ -1,48 +0,0 @@
# This Dockerfile builds qpdf as a set of .deb packages
# Inputs:
# - QPDF_VERSION - the version of qpdf to build a .deb.
# Must be present as a deb-src in bookworm
FROM debian:bullseye-slim as main
LABEL org.opencontainers.image.description="A intermediate image with qpdf built"
ARG DEBIAN_FRONTEND=noninteractive
# This must match to pikepdf's minimum at least
ARG QPDF_VERSION
ARG BUILD_PACKAGES="\
build-essential \
debhelper \
debian-keyring \
devscripts \
equivs \
libtool \
# https://qpdf.readthedocs.io/en/stable/installation.html#system-requirements
libjpeg62-turbo-dev \
libgnutls28-dev \
packaging-dev \
cmake \
zlib1g-dev"
WORKDIR /usr/src
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends $BUILD_PACKAGES \
&& echo "Getting qpdf src" \
&& echo "deb-src http://deb.debian.org/debian/ bookworm main" > /etc/apt/sources.list.d/bookworm-src.list \
&& apt-get update \
&& mkdir qpdf \
&& cd qpdf \
&& apt-get source --yes --quiet qpdf=${QPDF_VERSION}-1/bookworm \
&& echo "Building qpdf" \
&& cd qpdf-$QPDF_VERSION \
&& export DEB_BUILD_OPTIONS="terse nocheck nodoc parallel=2" \
&& dpkg-buildpackage --build=binary --unsigned-source --unsigned-changes --post-clean \
&& ls -ahl ../*.deb \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*
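This stage can also be built on its own for testing; a sketch, with an illustrative version number (the pinned value actually lives in .build-config.json):
# Sketch: build the qpdf builder image locally (version is illustrative)
docker build \
    --file docker-builders/Dockerfile.qpdf \
    --build-arg QPDF_VERSION=10.6.3 \
    --tag paperless-ngx/builder/qpdf:local .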

View File

@@ -22,10 +22,6 @@
# Docker setup does not use the configuration file.
# A few commonly adjusted settings are provided below.
# This is required if you will be exposing Paperless-ngx on a public domain
# (if doing so, please consider security measures such as a reverse proxy)
#PAPERLESS_URL=https://paperless.example.com
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY=change-me
@@ -36,7 +32,3 @@
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
#PAPERLESS_OCR_LANGUAGE=eng
# Set if accessing paperless via a domain subpath e.g. https://domain.com/PATHPREFIX and using a reverse-proxy like traefik or nginx
#PAPERLESS_FORCE_SCRIPT_NAME=/PATHPREFIX
#PAPERLESS_STATIC_URL=/PATHPREFIX/static/ # trailing slash required

View File

@@ -1,83 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), MariaDB is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/mariadb:10
restart: unless-stopped
volumes:
- dbdata:/var/lib/mysql
environment:
MARIADB_HOST: paperless
MARIADB_DATABASE: paperless
MARIADB_USER: paperless
MARIADB_PASSWORD: paperless
MARIADB_ROOT_PASSWORD: paperless
ports:
- "3306:3306"
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBENGINE: mariadb
PAPERLESS_DBHOST: db
PAPERLESS_DBUSER: paperless # only needed if non-default username
PAPERLESS_DBPASS: paperless # only needed if non-default password
PAPERLESS_DBPORT: 3306
volumes:
data:
media:
dbdata:
redisdata:
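A quick way to verify the db service is reachable with the credentials above; a sketch, assuming the compose file is running as-is:
# Connectivity check against the MariaDB service (credentials from the compose file)
docker-compose exec db mariadb -upaperless -ppaperless paperless -e 'SELECT 1;'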

View File

@@ -31,13 +31,13 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
@@ -55,7 +55,7 @@ services:
ports:
- 8010:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5

View File

@@ -1,6 +1,7 @@
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
# Paperless supports amd64, arm and arm64 hardware. The apache/tika image
# does not support arm or arm64, however.
#
# All compose files of paperless configure paperless in the following way:
#
@@ -33,13 +34,13 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
@@ -59,7 +60,7 @@ services:
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
@@ -77,14 +78,14 @@ services:
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.6
image: gotenberg/gotenberg:7
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
image: apache/tika
restart: unless-stopped
volumes:

View File

@@ -29,13 +29,13 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
@@ -53,7 +53,7 @@ services:
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5

View File

@@ -1,4 +1,4 @@
# docker-compose file for running paperless from the Docker Hub.
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
@@ -10,10 +10,14 @@
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# SQLite is used as the database. The SQLite file is stored in the data volume.
#
# The iwishiwasaneagle/apache-tika-arm docker image is used to enable the arm64 arch,
# which apache/tika does not currently support.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), MariaDB is used as the database server.
# - Apache Tika and Gotenberg servers are started with paperless and paperless
# is configured to use these services. These provide support for consuming
# Office documents (Word, Excel, Power Point and their LibreOffice counter-
@@ -33,30 +37,15 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/mariadb:10
restart: unless-stopped
volumes:
- dbdata:/var/lib/mysql
environment:
MARIADB_HOST: paperless
MARIADB_DATABASE: paperless
MARIADB_USER: paperless
MARIADB_PASSWORD: paperless
MARIADB_ROOT_PASSWORD: paperless
ports:
- "3306:3306"
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
- gotenberg
- tika
@@ -75,28 +64,21 @@ services:
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBENGINE: mariadb
PAPERLESS_DBHOST: db
PAPERLESS_DBUSER: paperless # only needed if non-default username
PAPERLESS_DBPASS: paperless # only needed if non-default password
PAPERLESS_DBPORT: 3306
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.6
image: thecodingmachine/gotenberg
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
environment:
DISABLE_GOOGLE_CHROME: 1
tika:
image: ghcr.io/paperless-ngx/tika:latest
image: iwishiwasaneagle/apache-tika-arm@sha256:a78c25ffe57ecb1a194b2859d42a61af46e9e845191512b8f1a4bf90578ffdfd
restart: unless-stopped
volumes:
data:
media:
dbdata:
redisdata:

View File

@@ -1,6 +1,8 @@
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
# Paperless supports amd64, arm and arm64 hardware. The apache/tika image
# does not support arm or arm64, however.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
@@ -33,7 +35,7 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
@@ -48,7 +50,7 @@ services:
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
@@ -65,14 +67,14 @@ services:
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.6
image: gotenberg/gotenberg:7
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
image: apache/tika
restart: unless-stopped
volumes:

View File

@@ -26,7 +26,7 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
@@ -39,7 +39,7 @@ services:
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5

View File

@@ -2,142 +2,46 @@
set -e
# Adapted from:
# https://github.com/docker-library/postgres/blob/master/docker-entrypoint.sh
# usage: file_env VAR
# ie: file_env 'XYZ_DB_PASSWORD' will allow for "$XYZ_DB_PASSWORD_FILE" to
# fill in the value of "$XYZ_DB_PASSWORD" from a file, especially for Docker's
# secrets feature
file_env() {
local -r var="$1"
local -r fileVar="${var}_FILE"
# Basic validation
if [ "${!var:-}" ] && [ "${!fileVar:-}" ]; then
echo >&2 "error: both $var and $fileVar are set (but are exclusive)"
exit 1
fi
# Only export var if the _FILE exists
if [ "${!fileVar:-}" ]; then
# And the file exists
if [[ -f ${!fileVar} ]]; then
echo "Setting ${var} from file"
val="$(< "${!fileVar}")"
export "$var"="$val"
else
echo "File ${!fileVar} doesn't exist"
exit 1
fi
fi
}
# Source: https://github.com/sameersbn/docker-gitlab/
map_uidgid() {
local -r usermap_original_uid=$(id -u paperless)
local -r usermap_original_gid=$(id -g paperless)
local -r usermap_new_uid=${USERMAP_UID:-$usermap_original_uid}
local -r usermap_new_gid=${USERMAP_GID:-${usermap_original_gid:-$usermap_new_uid}}
if [[ ${usermap_new_uid} != "${usermap_original_uid}" || ${usermap_new_gid} != "${usermap_original_gid}" ]]; then
echo "Mapping UID and GID for paperless:paperless to $usermap_new_uid:$usermap_new_gid"
usermod -o -u "${usermap_new_uid}" paperless
groupmod -o -g "${usermap_new_gid}" paperless
USERMAP_ORIG_UID=$(id -u paperless)
USERMAP_ORIG_GID=$(id -g paperless)
USERMAP_NEW_UID=${USERMAP_UID:-$USERMAP_ORIG_UID}
USERMAP_NEW_GID=${USERMAP_GID:-${USERMAP_ORIG_GID:-$USERMAP_NEW_UID}}
if [[ ${USERMAP_NEW_UID} != "${USERMAP_ORIG_UID}" || ${USERMAP_NEW_GID} != "${USERMAP_ORIG_GID}" ]]; then
echo "Mapping UID and GID for paperless:paperless to $USERMAP_NEW_UID:$USERMAP_NEW_GID"
usermod -o -u "${USERMAP_NEW_UID}" paperless
groupmod -o -g "${USERMAP_NEW_GID}" paperless
fi
}
map_folders() {
# Export these so they can be used in docker-prepare.sh
export DATA_DIR="${PAPERLESS_DATA_DIR:-/usr/src/paperless/data}"
export MEDIA_ROOT_DIR="${PAPERLESS_MEDIA_ROOT:-/usr/src/paperless/media}"
export CONSUME_DIR="${PAPERLESS_CONSUMPTION_DIR:-/usr/src/paperless/consume}"
}
nltk_data () {
# Store the NLTK data outside the Docker container
local -r nltk_data_dir="${DATA_DIR}/nltk"
local -r truthy_things=("yes y 1 t true")
# If not set, or it looks truthy
if [[ -z "${PAPERLESS_ENABLE_NLTK}" ]] || [[ "${truthy_things[*]}" =~ ${PAPERLESS_ENABLE_NLTK,} ]]; then
# Download or update the snowball stemmer data
python3 -W ignore::RuntimeWarning -m nltk.downloader -d "${nltk_data_dir}" snowball_data
# Download or update the stopwords corpus
python3 -W ignore::RuntimeWarning -m nltk.downloader -d "${nltk_data_dir}" stopwords
# Download or update the punkt tokenizer data
python3 -W ignore::RuntimeWarning -m nltk.downloader -d "${nltk_data_dir}" punkt
else
echo "Skipping NLTK data download"
fi
}
initialize() {
# Setup environment from secrets before anything else
for env_var in \
PAPERLESS_DBUSER \
PAPERLESS_DBPASS \
PAPERLESS_SECRET_KEY \
PAPERLESS_AUTO_LOGIN_USERNAME \
PAPERLESS_ADMIN_USER \
PAPERLESS_ADMIN_MAIL \
PAPERLESS_ADMIN_PASSWORD \
PAPERLESS_REDIS; do
# Check for a version of this var with _FILE appended
# and convert the contents to the env var value
file_env ${env_var}
done
# Change the user and group IDs if needed
map_uidgid
# Check for overrides of certain folders
map_folders
local -r export_dir="/usr/src/paperless/export"
for dir in \
"${export_dir}" \
"${DATA_DIR}" "${DATA_DIR}/index" \
"${MEDIA_ROOT_DIR}" "${MEDIA_ROOT_DIR}/documents" "${MEDIA_ROOT_DIR}/documents/originals" "${MEDIA_ROOT_DIR}/documents/thumbnails" \
"${CONSUME_DIR}"; do
if [[ ! -d "${dir}" ]]; then
echo "Creating directory ${dir}"
mkdir "${dir}"
for dir in export data data/index media media/documents media/documents/originals media/documents/thumbnails; do
if [[ ! -d "../$dir" ]]; then
echo "Creating directory ../$dir"
mkdir ../$dir
fi
done
local -r tmp_dir="/tmp/paperless"
echo "Creating directory ${tmp_dir}"
mkdir -p "${tmp_dir}"
nltk_data
echo "Creating directory /tmp/paperless"
mkdir -p /tmp/paperless
set +e
echo "Adjusting permissions of paperless files. This may take a while."
chown -R paperless:paperless ${tmp_dir}
for dir in \
"${export_dir}" \
"${DATA_DIR}" \
"${MEDIA_ROOT_DIR}" \
"${CONSUME_DIR}"; do
find "${dir}" -not \( -user paperless -and -group paperless \) -exec chown paperless:paperless {} +
done
chown -R paperless:paperless /tmp/paperless
find .. -not \( -user paperless -and -group paperless \) -exec chown paperless:paperless {} +
set -e
"${gosu_cmd[@]}" /sbin/docker-prepare.sh
gosu paperless /sbin/docker-prepare.sh
}
install_languages() {
echo "Installing languages..."
read -ra langs <<<"$1"
local langs="$1"
read -ra langs <<<"$langs"
# Check that it is not empty
if [ ${#langs[@]} -eq 0 ]; then
@@ -172,11 +76,6 @@ install_languages() {
echo "Paperless-ngx docker container starting..."
gosu_cmd=(gosu paperless)
if [ "$(id -u)" == "$(id -u paperless)" ]; then
gosu_cmd=()
fi
# Install additional languages if specified
if [[ -n "$PAPERLESS_OCR_LANGUAGES" ]]; then
install_languages "$PAPERLESS_OCR_LANGUAGES"
@@ -186,7 +85,7 @@ initialize
if [[ "$1" != "/"* ]]; then
echo Executing management command "$@"
exec "${gosu_cmd[@]}" python3 manage.py "$@"
exec gosu paperless python3 manage.py "$@"
else
echo Executing "$@"
exec "$@"

View File

@@ -1,19 +1,16 @@
#!/usr/bin/env bash
set -e
wait_for_postgres() {
local attempt_num=1
local -r max_attempts=5
attempt_num=1
max_attempts=5
echo "Waiting for PostgreSQL to start..."
local -r host="${PAPERLESS_DBHOST:-localhost}"
local -r port="${PAPERLESS_DBPORT:-5432}"
host="${PAPERLESS_DBHOST:=localhost}"
port="${PAPERLESS_DBPORT:=5342}"
# Disable warning, host and port can't have spaces
# shellcheck disable=SC2086
while [ ! "$(pg_isready -h ${host} -p ${port})" ]; do
while [ ! "$(pg_isready -h $host -p $port)" ]; do
if [ $attempt_num -eq $max_attempts ]; then
echo "Unable to connect to database."
@@ -28,38 +25,6 @@ wait_for_postgres() {
done
}
wait_for_mariadb() {
echo "Waiting for MariaDB to start..."
local -r host="${PAPERLESS_DBHOST:=localhost}"
local -r port="${PAPERLESS_DBPORT:=3306}"
local attempt_num=1
local -r max_attempts=5
while ! true > /dev/tcp/$host/$port; do
if [ $attempt_num -eq $max_attempts ]; then
echo "Unable to connect to database."
exit 1
else
echo "Attempt $attempt_num failed! Trying again in 5 seconds..."
fi
attempt_num=$(("$attempt_num" + 1))
sleep 5
done
}
wait_for_redis() {
# We use a Python script to send the Redis ping
# instead of installing redis-tools just for 1 thing
if ! python3 /sbin/wait-for-redis.py; then
exit 1
fi
}
migrations() {
(
# flock is in place to prevent multiple containers from doing migrations
@@ -68,18 +33,17 @@ migrations() {
flock 200
echo "Apply database migrations..."
python3 manage.py migrate
) 200>"${DATA_DIR}/migration_lock"
) 200>/usr/src/paperless/data/migration_lock
}
search_index() {
index_version=1
index_version_file=/usr/src/paperless/data/.index_version
local -r index_version=1
local -r index_version_file=${DATA_DIR}/.index_version
if [[ (! -f "${index_version_file}") || $(<"${index_version_file}") != "$index_version" ]]; then
if [[ (! -f "$index_version_file") || $(<$index_version_file) != "$index_version" ]]; then
echo "Search index out of date. Updating..."
python3 manage.py document_index reindex --no-progress-bar
echo ${index_version} | tee "${index_version_file}" >/dev/null
python3 manage.py document_index reindex
echo $index_version | tee $index_version_file >/dev/null
fi
}
@@ -89,64 +53,17 @@ superuser() {
fi
}
custom_container_init() {
# Mostly borrowed from the LinuxServer.io base image
# https://github.com/linuxserver/docker-baseimage-ubuntu/tree/bionic/root/etc/cont-init.d
local -r custom_script_dir="/custom-cont-init.d"
# Tamper checking.
# Don't run files which are owned by anyone except root
# Don't run files which are writeable by others
if [ -d "${custom_script_dir}" ]; then
if [ -n "$(/usr/bin/find "${custom_script_dir}" -maxdepth 1 ! -user root)" ]; then
echo "**** Potential tampering with custom scripts detected ****"
echo "**** The folder '${custom_script_dir}' must be owned by root ****"
return 0
fi
if [ -n "$(/usr/bin/find "${custom_script_dir}" -maxdepth 1 -perm -o+w)" ]; then
echo "**** The folder '${custom_script_dir}' or some of contents have write permissions for others, which is a security risk. ****"
echo "**** Please review the permissions and their contents to make sure they are owned by root, and can only be modified by root. ****"
return 0
fi
# Make sure custom init directory has files in it
if [ -n "$(/bin/ls -A "${custom_script_dir}" 2>/dev/null)" ]; then
echo "[custom-init] files found in ${custom_script_dir} executing"
# Loop over files in the directory
for SCRIPT in "${custom_script_dir}"/*; do
NAME="$(basename "${SCRIPT}")"
if [ -f "${SCRIPT}" ]; then
echo "[custom-init] ${NAME}: executing..."
/bin/bash "${SCRIPT}"
echo "[custom-init] ${NAME}: exited $?"
elif [ ! -f "${SCRIPT}" ]; then
echo "[custom-init] ${NAME}: is not a file"
fi
done
else
echo "[custom-init] no custom files found exiting..."
fi
fi
}
do_work() {
if [[ "${PAPERLESS_DBENGINE}" == "mariadb" ]]; then
wait_for_mariadb
elif [[ -n "${PAPERLESS_DBHOST}" ]]; then
if [[ -n "${PAPERLESS_DBHOST}" ]]; then
wait_for_postgres
fi
wait_for_redis
migrations
search_index
superuser
# Leave this last thing
custom_container_init
}
do_work
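The flock construct in migrations() is the key concurrency guard here; reduced to a standalone sketch:
# Only one process at a time enters the block; others wait on the lock file
(
    flock 200  # blocks until the lock on fd 200 is free
    echo "exactly one container applies migrations"
) 200>"/usr/src/paperless/data/migration_lock"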

View File

@@ -1,7 +0,0 @@
#!/usr/bin/env bash
echo "Checking if we should start flower..."
if [[ -n "${PAPERLESS_ENABLE_FLOWER}" ]]; then
celery --app paperless flower
fi

View File

@@ -1,19 +1,6 @@
#!/usr/bin/env bash
set -eu
for command in decrypt_documents \
document_archiver \
document_exporter \
document_importer \
mail_fetcher \
document_create_classifier \
document_index \
document_renamer \
document_retagger \
document_thumbnails \
document_sanity_checker \
manage_superuser;
for command in document_archiver document_exporter document_importer mail_fetcher document_create_classifier document_index document_renamer document_retagger document_thumbnails document_sanity_checker manage_superuser;
do
echo "installing $command..."
sed "s/management_command/$command/g" management_script.sh > /usr/local/bin/$command

View File

@@ -1,15 +0,0 @@
#!/usr/bin/env bash
rootless_args=()
if [ "$(id -u)" == "$(id -u paperless)" ]; then
rootless_args=(
--user
paperless
--logfile
supervisord.log
--pidfile
supervisord.pid
)
fi
exec /usr/local/bin/supervisord -c /etc/supervisord.conf "${rootless_args[@]}"

View File

@@ -10,7 +10,7 @@ user=root
[program:gunicorn]
command=gunicorn -c /usr/src/paperless/gunicorn.conf.py paperless.asgi:application
user=paperless
priority = 1
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
@@ -19,41 +19,16 @@ stderr_logfile_maxbytes=0
[program:consumer]
command=python3 manage.py document_consumer
user=paperless
stopsignal=INT
priority = 20
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery]
command = celery --app paperless worker --loglevel INFO
[program:scheduler]
command=python3 manage.py qcluster
user=paperless
stopasgroup = true
stopwaitsecs = 60
priority = 5
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery-beat]
command = celery --app paperless beat --loglevel INFO
user=paperless
stopasgroup = true
priority = 10
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery-flower]
command = /usr/local/bin/flower-conditional.sh
user = paperless
startsecs = 0
priority = 40
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr

View File

@@ -1,44 +0,0 @@
#!/usr/bin/env python3
"""
Simple script which attempts to ping the Redis broker, as set in the environment,
a certain number of times, waiting a little bit in between tries
"""
import os
import sys
import time
from typing import Final
from redis import Redis
if __name__ == "__main__":
MAX_RETRY_COUNT: Final[int] = 5
RETRY_SLEEP_SECONDS: Final[int] = 5
REDIS_URL: Final[str] = os.getenv("PAPERLESS_REDIS", "redis://localhost:6379")
print(f"Waiting for Redis...", flush=True)
attempt = 0
with Redis.from_url(url=REDIS_URL) as client:
while attempt < MAX_RETRY_COUNT:
try:
client.ping()
break
except Exception as e:
print(
f"Redis ping #{attempt} failed.\n"
f"Error: {str(e)}.\n"
f"Waiting {RETRY_SLEEP_SECONDS}s",
flush=True,
)
time.sleep(RETRY_SLEEP_SECONDS)
attempt += 1
if attempt >= MAX_RETRY_COUNT:
print(f"Failed to connect to redis using environment variable PAPERLESS_REDIS.")
sys.exit(os.EX_UNAVAILABLE)
else:
print(f"Connected to Redis broker.")
sys.exit(os.EX_OK)
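This is the script the entrypoint's wait_for_redis() shells out to; a sketch of an equivalent manual invocation:
# Ping the broker before starting anything that needs it
PAPERLESS_REDIS=redis://broker:6379 python3 /sbin/wait-for-redis.py || exit 1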

17
docs/Dockerfile Normal file
View File

@@ -0,0 +1,17 @@
FROM python:3.5.1
# Install Sphinx and Pygments
RUN pip install Sphinx Pygments
# Setup directories, copy data
RUN mkdir /build
COPY . /build
WORKDIR /build/docs
# Build documentation
RUN make html
# Start webserver
WORKDIR /build/docs/_build/html
EXPOSE 8000/tcp
CMD ["python3", "-m", "http.server"]

View File

@@ -24,7 +24,6 @@ I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " livehtml to preview changes with live reload in your browser"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@@ -55,9 +54,6 @@ html:
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
livehtml:
sphinx-autobuild "./" "$(BUILDDIR)" $(O)
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo

View File

@@ -1,605 +0,0 @@
/* Variables */
:root {
--color-text-body: #5c5962;
--color-text-body-light: #fcfcfc;
--color-text-anchor: #7253ed;
--color-text-alt: rgba(0, 0, 0, 0.3);
--color-text-title: #27262b;
--color-text-code-inline: #e74c3c;
--color-text-code-nt: #062873;
--color-text-selection: #b19eff;
--color-bg-body: #fcfcfc;
--color-bg-body-alt: #f3f6f6;
--color-bg-side-nav: #f5f6fa;
--color-bg-side-nav-hover: #ebedf5;
--color-bg-code-block: var(--color-bg-side-nav);
--color-border: #eeebee;
--color-btn-neutral-bg: #f3f6f6;
--color-btn-neutral-bg-hover: #e5ebeb;
--color-success-title: #1abc9c;
--color-success-body: #dbfaf4;
--color-warning-title: #f0b37e;
--color-warning-body: #ffedcc;
--color-danger-title: #f29f97;
--color-danger-body: #fdf3f2;
--color-info-title: #6ab0de;
--color-info-body: #e7f2fa;
}
.dark-mode {
--color-text-body: #abb2bf;
--color-text-body-light: #9499a2;
--color-text-alt: rgba(255, 255, 255, 0.5);
--color-text-title: var(--color-text-anchor);
--color-text-code-inline: #abb2bf;
--color-text-code-nt: #2063f3;
--color-text-selection: #030303;
--color-bg-body: #1d1d20 !important;
--color-bg-body-alt: #131315;
--color-bg-side-nav: #18181a;
--color-bg-side-nav-hover: #101216;
--color-bg-code-block: #101216;
--color-border: #47494f;
--color-btn-neutral-bg: #242529;
--color-btn-neutral-bg-hover: #101216;
--color-success-title: #02120f;
--color-success-body: #041b17;
--color-warning-title: #1b0e03;
--color-warning-body: #371d06;
--color-danger-title: #120902;
--color-danger-body: #1b0503;
--color-info-title: #020608;
--color-info-body: #06141e;
}
* {
transition: background-color 0.3s ease, border-color 0.3s ease;
}
/* Typography */
body {
font-family: system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif;
font-size: inherit;
line-height: 1.4;
color: var(--color-text-body);
}
.rst-content p {
word-break: break-word;
}
h1, h2, h3, h4, h5, h6 {
font-family: inherit;
}
.rst-content .toctree-wrapper>p.caption, .rst-content h1, .rst-content h2, .rst-content h3, .rst-content h4, .rst-content h5, .rst-content h6 {
padding-top: .5em;
}
p, .main-content-wrap, .rst-content .section ul, .rst-content .toctree-wrapper ul, .rst-content section ul, .wy-plain-list-disc, article ul {
line-height: 1.6;
}
pre, .code, .rst-content .linenodiv pre, .rst-content div[class^=highlight] pre, .rst-content pre.literal-block {
font-family: "SFMono-Regular", Menlo,Consolas, Monospace;
font-size: 0.75em;
line-height: 1.8;
}
.wy-menu-vertical li.toctree-l3,.wy-menu-vertical li.toctree-l4 {
font-size: 1rem
}
.rst-versions {
font-family: inherit;
line-height: 1;
}
footer, footer p {
font-size: .8rem;
}
footer .rst-footer-buttons {
font-size: 1rem;
}
@media (max-width: 400px) {
/* break code lines on mobile */
pre, code {
word-break: break-word;
}
}
/* Layout */
.wy-side-nav-search, .wy-menu-vertical {
width: auto;
}
.wy-nav-side {
z-index: 0;
display: flex;
flex-wrap: wrap;
background-color: var(--color-bg-side-nav);
}
.wy-side-scroll {
width: 100%;
overflow-y: auto;
}
@media (min-width: 66.5rem) {
.wy-side-scroll {
width:264px
}
}
@media (min-width: 50rem) {
.wy-nav-side {
flex-wrap: nowrap;
position: fixed;
width: 248px;
height: 100%;
flex-direction: column;
border-right: 1px solid var(--color-border);
align-items:flex-end
}
}
@media (min-width: 66.5rem) {
.wy-nav-side {
width: calc((100% - 1064px) / 2 + 264px);
min-width:264px
}
}
@media (min-width: 50rem) {
.wy-nav-content-wrap {
position: relative;
max-width: 800px;
margin-left:248px
}
}
@media (min-width: 66.5rem) {
.wy-nav-content-wrap {
margin-left:calc((100% - 1064px) / 2 + 264px)
}
}
/* Colors */
body.wy-body-for-nav,
.wy-nav-content {
background: var(--color-bg-body);
}
.wy-nav-side {
border-right: 1px solid var(--color-border);
}
.wy-side-nav-search, .wy-nav-top {
background: var(--color-bg-side-nav);
border-bottom: 1px solid var(--color-border);
}
.wy-nav-content-wrap {
background: inherit;
}
.wy-side-nav-search > a, .wy-nav-top a, .wy-nav-top i {
color: var(--color-text-title);
}
.wy-side-nav-search > a:hover, .wy-nav-top a:hover {
background: transparent;
}
.wy-side-nav-search > div.version {
color: var(--color-text-alt)
}
.wy-side-nav-search > div[role="search"] {
border-top: 1px solid var(--color-border);
}
.wy-menu-vertical li.toctree-l2.current>a, .wy-menu-vertical li.toctree-l2.current li.toctree-l3>a,
.wy-menu-vertical li.toctree-l3.current>a, .wy-menu-vertical li.toctree-l3.current li.toctree-l4>a {
background: var(--color-bg-side-nav);
}
.rst-content .highlighted {
background: #eedd85;
box-shadow: 0 0 0 2px #eedd85;
font-weight: 600;
}
.wy-side-nav-search input[type=text],
html.writer-html5 .rst-content table.docutils th {
color: var(--color-text-body);
}
.rst-content table.docutils:not(.field-list) tr:nth-child(2n-1) td,
.wy-table-backed,
.wy-table-odd td,
.wy-table-striped tr:nth-child(2n-1) td {
background-color: var(--color-bg-body-alt);
}
.rst-content table.docutils,
.wy-table-bordered-all,
html.writer-html5 .rst-content table.docutils th,
.rst-content table.docutils td,
.wy-table-bordered-all td,
hr {
border-color: var(--color-border) !important;
}
::selection {
background: var(--color-text-selection);
}
/* Ridiculous rules are taken from sphinx_rtd */
.rst-content .admonition-title,
.wy-alert-title {
color: var(--color-text-body-light);
}
.rst-content .hint,
.rst-content .important,
.rst-content .tip,
.rst-content .wy-alert-success,
.wy-alert.wy-alert-success {
background: var(--color-success-body);
}
.rst-content .hint .admonition-title,
.rst-content .hint .wy-alert-title,
.rst-content .important .admonition-title,
.rst-content .important .wy-alert-title,
.rst-content .tip .admonition-title,
.rst-content .tip .wy-alert-title,
.rst-content .wy-alert-success .admonition-title,
.rst-content .wy-alert-success .wy-alert-title,
.wy-alert.wy-alert-success .rst-content .admonition-title,
.wy-alert.wy-alert-success .wy-alert-title {
background-color: var(--color-success-title);
}
.rst-content .admonition-todo,
.rst-content .attention,
.rst-content .caution,
.rst-content .warning,
.rst-content .wy-alert-warning,
.wy-alert.wy-alert-warning {
background: var(--color-warning-body);
}
.rst-content .admonition-todo .admonition-title,
.rst-content .admonition-todo .wy-alert-title,
.rst-content .attention .admonition-title,
.rst-content .attention .wy-alert-title,
.rst-content .caution .admonition-title,
.rst-content .caution .wy-alert-title,
.rst-content .warning .admonition-title,
.rst-content .warning .wy-alert-title,
.rst-content .wy-alert-warning .admonition-title,
.rst-content .wy-alert-warning .wy-alert-title,
.rst-content .wy-alert.wy-alert-warning .admonition-title,
.wy-alert.wy-alert-warning .rst-content .admonition-title,
.wy-alert.wy-alert-warning .wy-alert-title {
background: var(--color-warning-title);
}
.rst-content .danger,
.rst-content .error,
.rst-content .wy-alert-danger,
.wy-alert.wy-alert-danger {
background: var(--color-danger-body);
}
.rst-content .danger .admonition-title,
.rst-content .danger .wy-alert-title,
.rst-content .error .admonition-title,
.rst-content .error .wy-alert-title,
.rst-content .wy-alert-danger .admonition-title,
.rst-content .wy-alert-danger .wy-alert-title,
.wy-alert.wy-alert-danger .rst-content .admonition-title,
.wy-alert.wy-alert-danger .wy-alert-title {
background: var(--color-danger-title);
}
.rst-content .note,
.rst-content .seealso,
.rst-content .wy-alert-info,
.wy-alert.wy-alert-info {
background: var(--color-info-body);
}
.rst-content .note .admonition-title,
.rst-content .note .wy-alert-title,
.rst-content .seealso .admonition-title,
.rst-content .seealso .wy-alert-title,
.rst-content .wy-alert-info .admonition-title,
.rst-content .wy-alert-info .wy-alert-title,
.wy-alert.wy-alert-info .rst-content .admonition-title,
.wy-alert.wy-alert-info .wy-alert-title {
background: var(--color-info-title);
}
/* Links */
a, a:visited,
.wy-menu-vertical a,
a.icon.icon-home,
.wy-menu-vertical li.toctree-l1.current > a.current {
color: var(--color-text-anchor);
text-decoration: none;
}
a:hover, .wy-breadcrumbs-aside a {
color: var(--color-text-anchor); /* reset */
}
.rst-versions a, .rst-versions .rst-current-version {
color: var(--color-text-anchor);
}
.wy-nav-content a.reference, .wy-nav-content a:not([class]) {
background-image: linear-gradient(var(--color-border) 0%, var(--color-border) 100%);
background-repeat: repeat-x;
background-position: 0 100%;
background-size: 1px 1px;
}
.wy-nav-content a.reference:hover, .wy-nav-content a:not([class]):hover {
background-image: linear-gradient(rgba(114,83,237,0.45) 0%, rgba(114,83,237,0.45) 100%);
background-size: 1px 1px;
}
.wy-menu-vertical a:hover,
.wy-menu-vertical li.current a:hover,
.wy-menu-vertical a:active {
background: var(--color-bg-side-nav-hover) !important;
color: var(--color-text-body);
}
.wy-menu-vertical li.toctree-l1.current>a,
.wy-menu-vertical li.current>a,
.wy-menu-vertical li.on a {
background-color: var(--color-bg-side-nav-hover);
border: none;
font-weight: normal;
}
.wy-menu-vertical li.current {
background-color: inherit;
}
.wy-menu-vertical li.current a {
border-right: none;
}
.wy-menu-vertical li.toctree-l2 a,
.wy-menu-vertical li.toctree-l3 a,
.wy-menu-vertical li.toctree-l4 a,
.wy-menu-vertical li.toctree-l5 a,
.wy-menu-vertical li.toctree-l6 a,
.wy-menu-vertical li.toctree-l7 a,
.wy-menu-vertical li.toctree-l8 a,
.wy-menu-vertical li.toctree-l9 a,
.wy-menu-vertical li.toctree-l10 a {
color: var(--color-text-body);
}
a.image-reference, a.image-reference:hover {
background: none !important;
}
a.image-reference img {
cursor: zoom-in;
}
/* Code blocks */
.rst-content code, .rst-content tt, code {
padding: 0.25em;
font-weight: 400;
background-color: var(--color-bg-code-block);
border: 1px solid var(--color-border);
border-radius: 4px;
}
.rst-content div[class^=highlight], .rst-content pre.literal-block {
padding: 0.7rem;
margin-top: 0;
margin-bottom: 0.75rem;
overflow-x: auto;
background-color: var(--color-bg-side-nav);
border-color: var(--color-border);
border-radius: 4px;
box-shadow: none;
}
.rst-content .admonition-title,
.rst-content div.admonition,
.wy-alert-title {
padding: 10px 12px;
border-top-left-radius: 4px;
border-top-right-radius: 4px;
}
.highlight .go {
color: inherit;
}
.highlight .nt {
color: var(--color-text-code-nt);
}
.rst-content code.literal,
.rst-content tt.literal,
html.writer-html5 .rst-content dl.footnote code {
border-color: var(--color-border);
background-color: var(--color-border);
color: var(--color-text-code-inline)
}
/* Search */
.wy-side-nav-search input[type=text] {
border: none;
border-radius: 0;
background-color: transparent;
font-family: inherit;
font-size: .85rem;
box-shadow: none;
padding: .7rem 1rem .7rem 2.8rem;
margin: 0;
}
#rtd-search-form {
position: relative;
}
#rtd-search-form:before {
font: normal normal normal 14px/1 FontAwesome;
font-size: inherit;
text-rendering: auto;
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
content: "\f002";
color: var(--color-text-alt);
position: absolute;
left: 1.5rem;
top: .7rem;
}
/* Side nav */
.wy-side-nav-search {
padding: 1rem 0 0 0;
}
.wy-menu-vertical li a button.toctree-expand {
float: right;
margin-right: -1.5em;
padding: 0 .5em;
}
.wy-menu-vertical a,
.wy-menu-vertical li.current>a,
.wy-menu-vertical li.current li>a {
padding-right: 1.5em !important;
}
.wy-menu-vertical li.current li>a.current {
font-weight: 600;
}
/* Misc spacing */
.rst-content .admonition-title, .wy-alert-title {
padding: 10px 12px;
}
/* Buttons */
.btn {
display: inline-block;
box-sizing: border-box;
padding: 0.3em 1em;
margin: 0;
font-family: inherit;
font-size: inherit;
font-weight: 500;
line-height: 1.5;
color: var(--color-text-anchor);
text-decoration: none;
vertical-align: baseline;
background-color: #f7f7f7;
border-width: 0;
border-radius: 4px;
box-shadow: 0 1px 2px rgba(0,0,0,0.12),0 3px 10px rgba(0,0,0,0.08);
appearance: none;
}
.btn:active {
padding: 0.3em 1em;
}
.rst-content .btn:focus {
outline: 1px solid #ccc;
}
.rst-content .btn-neutral, .rst-content .btn span.fa {
color: var(--color-text-body) !important;
}
.btn-neutral {
background-color: var(--color-btn-neutral-bg) !important;
color: var(--color-btn-neutral-text) !important;
border: 1px solid var(--color-btn-neutral-bg);
}
.btn:hover, .btn-neutral:hover {
background-color: var(--color-btn-neutral-bg-hover) !important;
}
/* Icon overrides */
.wy-side-nav-search a.icon-home:before {
display: none;
}
.fa-minus-square-o:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before {
content: "\f106"; /* fa-angle-up */
}
.fa-plus-square-o:before, .wy-menu-vertical li button.toctree-expand:before {
content: "\f107"; /* fa-angle-down */
}
/* Misc */
.wy-nav-top {
line-height: 36px;
}
.wy-nav-top > i {
font-size: 24px;
padding: 8px 0 0 2px;
color: var(--color-text-anchor);
}
.rst-content table.docutils td,
.rst-content table.docutils th,
.rst-content table.field-list td,
.rst-content table.field-list th,
.wy-table td,
.wy-table th {
padding: 8px 14px;
}
.dark-mode-toggle {
position: absolute;
top: 14px;
right: 12px;
height: 20px;
width: 24px;
z-index: 10;
border: none;
background-color: transparent;
color: inherit;
opacity: 0.7;
}
.wy-nav-content-wrap {
z-index: 20;
}
.rst-content .toctree-wrapper {
display: none;
}
.redirect-notice {
font-size: 2.5rem;
}

14
docs/_static/custom.css vendored Normal file
View File

@@ -0,0 +1,14 @@
/* override table width restrictions */
@media screen and (min-width: 767px) {
.wy-table-responsive table td {
/* !important prevents the common CSS stylesheets from
overriding this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
}

View File

@@ -1,47 +0,0 @@
let toggleButton
let icon
function load() {
'use strict'
toggleButton = document.createElement('button')
toggleButton.setAttribute('title', 'Toggle dark mode')
toggleButton.classList.add('dark-mode-toggle')
icon = document.createElement('i')
icon.classList.add('fa', darkModeState ? 'fa-sun-o' : 'fa-moon-o')
toggleButton.appendChild(icon)
document.body.prepend(toggleButton)
// Listen for changes in the OS settings
// addListener is used because older versions of Safari don't support addEventListener
// prefersDarkQuery set in <head>
if (prefersDarkQuery) {
prefersDarkQuery.addListener(function (evt) {
toggleDarkMode(evt.matches)
})
}
// Initial setting depending on the prefers-color-mode or localstorage
// darkModeState should be set in the document <head> to prevent flash
if (darkModeState == undefined) darkModeState = false
toggleDarkMode(darkModeState)
// Toggles the "dark-mode" class on click and sets localStorage state
toggleButton.addEventListener('click', () => {
darkModeState = !darkModeState
toggleDarkMode(darkModeState)
localStorage.setItem('dark-mode', darkModeState)
})
}
function toggleDarkMode(state) {
document.documentElement.classList.toggle('dark-mode', state)
document.documentElement.classList.toggle('light-mode', !state)
icon.classList.remove('fa-sun-o')
icon.classList.remove('fa-moon-o')
icon.classList.add(state ? 'fa-sun-o' : 'fa-moon-o')
darkModeState = state
}
document.addEventListener('DOMContentLoaded', load)

BIN
docs/_static/screenshot.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 445 KiB

[14 further binary image files changed; only before/after sizes were shown, so the diffs are omitted here.]

View File

@@ -1,38 +0,0 @@
{% extends "!layout.html" %}
{% block extrahead %}
<script>
// MediaQueryList object
const prefersDarkQuery = window.matchMedia("(prefers-color-scheme: dark)");
const lsDark = localStorage.getItem("dark-mode");
let darkModeState = lsDark !== null ? lsDark == "true" : prefersDarkQuery.matches;
document.documentElement.classList.toggle("dark-mode", darkModeState);
document.documentElement.classList.toggle("light-mode", !darkModeState);
const RTD_TO_MKD = {
"index.html": "",
"setup.html": "setup",
"usage_overview.html": "usage",
"advanced_usage.html": "advanced_usage",
"administration.html": "administration",
"configuration.html": "configuration",
"api.html": "api",
"faq.html": "faq",
"troubleshooting.html": "troubleshooting",
"extending.html": "development",
"scanners.html": "",
"screenshots.html": "",
"changelog.html": "changelog",
}
const path = (RTD_TO_MKD[window.location.pathname.substring(window.location.pathname.lastIndexOf("/") + 1)] ?? "") + "/";
const hash = window.location.hash;
const redirectURL = new URL(path + hash, "https://docs.paperless-ngx.com/");
console.log(`Redirecting to ${redirectURL} in 3 seconds...`);
setTimeout(() => {
window.location.replace(redirectURL);
}, 3000);
</script>
{{ super() }}
{% endblock %}

View File

@@ -1,11 +1,499 @@
.. _administration:
**************
Administration
**************
.. _administration-backup:
Making backups
##############
Multiple options exist for making backups of your paperless instance,
depending on how you installed paperless.
Before making backups, make sure that paperless is not running.
Options available to any installation of paperless:
* Use the :ref:`document exporter <utilities-exporter>`.
The document exporter exports all your documents, thumbnails and
metadata to a specific folder. You may import your documents into a
fresh instance of paperless again or store your documents in another
DMS with this export.
* The document exporter is also able to update an already existing export.
Therefore, incremental backups with ``rsync`` are entirely possible.
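For example, a nightly backup might combine the exporter with ``rsync``; this is a sketch, and the compose directory and backup target path are assumptions:
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose exec -T webserver document_exporter ../export
$ rsync -a export/ /mnt/backup/paperless-export/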
.. caution::
You cannot import the export generated with one version of paperless in a
different version of paperless. The export contains an exact image of the
database, and migrations may change the database layout.
Options available to docker installations:
* Back up the docker volumes. These usually reside within
``/var/lib/docker/volumes`` on the host and you need to be root in order
to access them.
Paperless uses 3 volumes:
* ``paperless_media``: This is where your documents are stored.
* ``paperless_data``: This is where auxiliary data is stored. This
folder also contains the SQLite database, if you use it.
* ``paperless_pgdata``: Exists only if you use PostgreSQL and contains
the database.
Options available to bare-metal and non-docker installations:
* Back up the entire paperless folder. This ensures that if your paperless instance
crashes at some point or your disk fails, you can simply copy the folder back
into place and it works.
When using PostgreSQL, you'll also have to back up the database.
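For PostgreSQL, ``pg_dump`` is one way to do this; a sketch, assuming the default database name and user ``paperless``:
.. code:: shell-session
$ pg_dump -U paperless -Fc paperless > /path/to/backup/paperless.pgdump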
.. _migrating-restoring:
Restoring
=========
.. _administration-updating:
Updating Paperless
##################
Docker Route
============
If a new release of paperless-ngx is available, upgrading depends on how you
installed paperless-ngx in the first place. The releases are available at the
`release page <https://github.com/paperless-ngx/paperless-ngx/releases>`_.
First of all, ensure that paperless is stopped.
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose down
After that, :ref:`make a backup <administration-backup>`.
A. If you pull the image from the docker hub, all you need to do is:
.. code:: shell-session
$ docker-compose pull
$ docker-compose up
The docker-compose files refer to the ``latest`` version, which is always the latest
stable release.
B. If you built the image yourself, do the following:
.. code:: shell-session
$ git pull
$ docker-compose build
$ docker-compose up
Running ``docker-compose up`` will also apply any new database migrations.
If you see everything working, press CTRL+C once to gracefully stop paperless.
Then you can start paperless-ngx with ``-d`` to have it run in the background.
.. note::
In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the
docker-compose files specified exact versions and pull won't automatically
update to newer versions. In order to enable updates as described above, either
get the new ``docker-compose.yml`` file from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_
or edit the ``docker-compose.yml`` file, find the line that says
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x
and replace the version with ``latest``:
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:latest
Bare Metal Route
================
After grabbing the new release and unpacking the contents, do the following:
1. Update dependencies. A new paperless version may require additional
dependencies. The dependencies required are listed in the section about
:ref:`bare metal installations <setup-bare_metal>`.
2. Update python requirements. Keep in mind to activate your virtual environment
before that, if you use one.
.. code:: shell-session
$ pip install -r requirements.txt
3. Migrate the database.
.. code:: shell-session
$ cd src
$ python3 manage.py migrate
This might not actually do anything. Not every new paperless version comes with new
database migrations.
Downgrading Paperless
#####################
Downgrades are possible. However, some updates also contain database migrations (these change the layout of the database and may move data).
In order to move back from a version that applied database migrations, you'll have to revert the database migration *before* downgrading,
and then downgrade paperless.
This table lists the compatible versions for each database migration number.
+------------------+-----------------+
| Migration number | Version range |
+------------------+-----------------+
| 1011 | 1.0.0 |
+------------------+-----------------+
| 1012 | 1.1.0 - 1.2.1 |
+------------------+-----------------+
| 1014 | 1.3.0 - 1.3.1 |
+------------------+-----------------+
| 1016 | 1.3.2 - current |
+------------------+-----------------+
Execute the following management command to migrate your database:
.. code:: shell-session
$ python3 manage.py migrate documents <migration number>
.. note::
Some migrations cannot be undone. The command will issue errors if that happens.
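For example, using the table above, to downgrade to a release in the 1.1.0 - 1.2.1 range, migrate to 1012 before installing the older version:
.. code:: shell-session
$ python3 manage.py migrate documents 1012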
.. _utilities-management-commands:
Management utilities
####################
Paperless comes with some management commands that perform various maintenance
tasks on your paperless instance. You can invoke these commands in the following way:
With docker-compose, while paperless is running:
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose exec webserver <command> <arguments>
With docker, while paperless is running:
.. code:: shell-session
$ docker exec -it <container-name> <command> <arguments>
Bare metal:
.. code:: shell-session
$ cd /path/to/paperless/src
$ python3 manage.py <command> <arguments>
All commands have built-in help, which can be accessed by executing them with
the argument ``--help``.
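For example, with docker-compose:
.. code:: shell-session
$ docker-compose exec webserver document_exporter --help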
.. _utilities-exporter:
Document exporter
=================
The document exporter exports all your data from paperless into a folder for
backup or migration to another DMS.
If you use the document exporter within a cron job to back up your data, you might use the ``-T`` flag after ``exec`` to suppress "The input device is not a TTY" errors. For example: ``docker-compose exec -T webserver document_exporter ../export``
.. code::
document_exporter target [-c] [-f] [-d]
optional arguments:
-c, --compare-checksums
-f, --use-filename-format
-d, --delete
``target`` is a folder to which the data gets written. This includes documents,
thumbnails and a ``manifest.json`` file. The manifest contains all metadata from
the database (correspondents, tags, etc).
When you use the provided docker compose script, specify ``../export`` as the
target. This path inside the container is automatically mounted on your host on
the folder ``export``.
If the target directory already exists and contains files, paperless will assume
that the contents of the export directory are a previous export and will attempt
to update the previous export. Paperless will only export changed and added files.
Paperless determines whether a file has changed by inspecting the file attributes
"date/time modified" and "size". If that does not work out for you, specify
``--compare-checksums`` and paperless will attempt to compare file checksums instead.
This is slower.
Paperless will not remove any existing files in the export directory. If you want
paperless to also remove files that do not belong to the current export such as files
from deleted documents, specify ``--delete``. Be careful when pointing paperless to
a directory that already contains other files.
The filenames generated by this command follow the format
``[date created] [correspondent] [title].[extension]``.
If you want paperless to use ``PAPERLESS_FILENAME_FORMAT`` for exported filenames
instead, specify ``--use-filename-format``.
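Putting these options together, an invocation that updates an existing export, compares checksums and prunes stale files might look like this on bare metal (the target path is illustrative):
.. code:: shell-session
$ python3 manage.py document_exporter ../export --compare-checksums --delete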
.. _utilities-importer:
Document importer
=================
The document importer takes the export produced by the `Document exporter`_ and
imports it into paperless.
The importer works just like the exporter. You point it at a directory, and
the script does the rest of the work:
.. code::
document_importer source
When you use the provided docker compose script, put the export inside the
``export`` folder in your paperless source directory. Specify ``../export``
as the ``source``.
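For example, with the docker compose setup:
.. code:: shell-session
$ docker-compose exec webserver document_importer ../export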
.. _utilities-retagger:
Document retagger
=================
Say you've imported a few hundred documents and now want to introduce
a tag or set up a new correspondent, and apply its matching to all of
the currently-imported docs. This problem is common enough that
there are tools for it.
.. code::
document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f]
optional arguments:
-c, --correspondent
-T, --tags
-t, --document_type
-i, --inbox-only
--use-first
-f, --overwrite
Run this after changing or adding matching rules. It'll loop over all
of the documents in your database and attempt to match documents
according to the new rules.
Specify any combination of ``-c``, ``-T`` and ``-t`` to have the
retagger perform matching of the specified metadata type. If you don't
specify any of these options, the document retagger won't do anything.
Specify ``-i`` to have the document retagger work on documents tagged
with inbox tags only. This is useful when you don't want to mess with
your already processed documents.
When multiple document types or correspondents match a single document,
the retagger won't assign these to the document. Specify ``--use-first``
to override this behavior and just use the first correspondent or type
it finds. This option does not apply to tags, since any number of tags
can be applied to a document.
Finally, ``-f`` specifies that you wish to overwrite already assigned
correspondents, types and/or tags. The default behavior is to not
assign correspondents and types to documents that have this data already
assigned. ``-f`` works differently for tags: By default, only additional tags get
added to documents, no tags will be removed. With ``-f``, tags that don't
match a document anymore get removed as well.
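For example, to re-run matching for correspondents, tags and document types on inbox documents only, using the docker compose setup:
.. code:: shell-session
$ docker-compose exec webserver document_retagger -c -T -t -i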
Managing the Automatic matching algorithm
=========================================
The *Auto* matching algorithm requires a trained neural network to work.
This network needs to be updated whenever something in your data
changes. The docker image takes care of that automatically with the task
scheduler. You can manually renew the classifier by invoking the following
management command:
.. code::
document_create_classifier
This command takes no arguments.
.. _`administration-index`:
Managing the document search index
==================================
The document search index is responsible for delivering search results for the
website. The document index is automatically updated whenever documents get
added to, changed, or removed from paperless. However, if the search yields
non-existing documents or won't find anything, you may need to recreate the
index manually.
.. code::
document_index {reindex,optimize}
Specify ``reindex`` to have the index created from scratch. This may take some
time.
Specify ``optimize`` to optimize the index. This updates certain aspects of
the index and usually makes queries faster and also ensures that the
autocompletion works properly. This command is regularly invoked by the task
scheduler.
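For example, to recreate the index from scratch in the docker compose setup:
.. code:: shell-session
$ docker-compose exec webserver document_index reindex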
.. _utilities-renamer:
Managing filenames
==================
If you use paperless' feature to
:ref:`assign custom filenames to your documents <advanced-file_name_handling>`,
you can use this command to move all your files after changing
the naming scheme.
.. warning::
Since this command moves your documents around a lot, it is advised to do
a backup beforehand. The renaming logic is robust and will never overwrite
or delete a file, but you can never be careful enough.
.. code::
document_renamer
The command takes no arguments and processes all your documents at once.
.. _utilities-sanity-checker:
Sanity checker
==============
Paperless has a built-in sanity checker that inspects your document collection for issues.
The issues detected by the sanity checker are as follows:
* Missing original files.
* Missing archive files.
* Inaccessible original files due to improper permissions.
* Inaccessible archive files due to improper permissions.
* Corrupted original documents by comparing their checksum against what is stored in the database.
* Corrupted archive documents by comparing their checksum against what is stored in the database.
* Missing thumbnails.
* Inaccessible thumbnails due to improper permissions.
* Documents without any content (warning).
* Orphaned files in the media directory (warning). These are files that are not referenced by any document in paperless.
.. code::
document_sanity_checker
The command takes no arguments. Depending on the size of your document archive, this may take some time.
Fetching e-mail
===============
Paperless automatically fetches your e-mail every 10 minutes by default. If
you want to invoke the email consumer manually, call the following management
command:
.. code::
mail_fetcher
The command takes no arguments and processes all your mail accounts and rules.
.. _utilities-archiver:
Creating archived documents
===========================
Paperless stores archived PDF/A documents alongside your original documents.
These archived documents will also contain selectable text for image-only
originals.
These documents are derived from the originals, which are always stored
unmodified. If coming from an earlier version of paperless, your documents
won't have archived versions.
This command creates PDF/A documents for your documents.
.. code::
document_archiver --overwrite --document <id>
This command will only attempt to create archived documents when no archived
document exists yet, unless ``--overwrite`` is specified. If ``--document <id>``
is specified, the archiver will only process that document.
.. note::
This command essentially performs OCR on all your documents again,
according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``,
it will potentially run for a very long time. You can cancel the command
at any time, since this command will skip already archived versions the next time
it is run.
.. note::
Some documents will cause errors and cannot be converted into PDF/A documents,
such as encrypted PDF documents. The archiver will skip over these documents
each time it sees them.
.. _utilities-encyption:
Managing encryption
===================
Documents can be stored in Paperless using GnuPG encryption.
.. danger::
Encryption is deprecated since paperless-ngx 0.9 and doesn't really provide any
additional security, since you have to store the passphrase in a configuration
file on the same system as the encrypted documents for paperless to work.
Furthermore, the entire text content of the documents is stored in plain text in the
database, even if your documents are encrypted. Filenames are not encrypted
either.
Also, the web server provides transparent access to your encrypted documents.
Consider running paperless on an encrypted filesystem instead, which will then
at least provide security against physical hardware theft.
Enabling encryption
-------------------
Enabling encryption is no longer supported.
Disabling encryption
--------------------
Basic usage to disable encryption of your document store:
(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
.. code::
decrypt_documents [--passphrase SECR3TP4SSPHRA$E]

View File

@@ -1,11 +1,292 @@
.. _advanced_usage:
***************
Advanced topics
***************
Paperless offers a couple features that automate certain tasks and make your life
easier.
.. _advanced-matching:
Matching tags, correspondents and document types
################################################
Paperless will compare the matching algorithms defined by every tag and
correspondent already set in your database to see if they apply to the text in
a document. In other words, if you defined a tag called ``Home Utility``
that had a ``match`` property of ``bc hydro`` and a ``matching_algorithm`` of
``literal``, Paperless will automatically tag your newly-consumed document with
your ``Home Utility`` tag so long as the text ``bc hydro`` appears in the body
of the document somewhere.
The matching logic is quite powerful. It supports searching the text of your
document with different algorithms, and as such, some experimentation may be
necessary to get things right.
In order to have a tag, correspondent, or type assigned automatically to newly
consumed documents, assign a match and matching algorithm using the web
interface. These settings define when to assign correspondents, tags, and types
to documents.
The following algorithms are available:
* **Any:** Looks for any occurrence of any word provided in match in the PDF.
If you define the match as ``Bank1 Bank2``, it will match documents containing
either of these terms.
* **All:** Requires that every word provided appears in the PDF, albeit not in the
order provided.
* **Literal:** Matches only if the match appears exactly as provided (i.e. preserve ordering) in the PDF.
* **Regular expression:** Parses the match as a regular expression and tries to
find a match within the document.
* **Fuzzy match:** I don't know. Look at the source.
* **Auto:** Tries to automatically match new documents. This does not require you
to set a match. See the notes below.
When using the *any* or *all* matching algorithms, you can search for terms
that consist of multiple words by enclosing them in double quotes. For example,
defining a match text of ``"Bank of America" BofA`` using the *any* algorithm,
will match documents that contain either "Bank of America" or "BofA", but will
not match documents containing "Bank of South America".
Then just save your tag/correspondent and run another document through the
consumer. Once complete, you should see the newly-created document,
automatically tagged with the appropriate data.
.. _advanced-automatic_matching:
Automatic matching
==================
Paperless-ngx comes with a new matching algorithm called *Auto*. This matching
algorithm tries to assign tags, correspondents, and document types to your
documents based on how you have already assigned these on existing documents. It
uses a neural network under the hood.
If, for example, all your bank statements of your account 123 at the Bank of
America are tagged with the tag "bofa_123" and the matching algorithm of this
tag is set to *Auto*, this neural network will examine your documents and
automatically learn when to assign this tag.
Paperless tries to hide much of the involved complexity with this approach.
However, there are a couple caveats you need to keep in mind when using this
feature:
* Changes to your documents are not immediately reflected by the matching
algorithm. The neural network needs to be *trained* on your documents after
changes. Paperless periodically (default: once each hour) checks for changes
and does this automatically for you.
* The Auto matching algorithm only takes documents into account which are NOT
placed in your inbox (i.e. have no inbox tags assigned to them). This ensures
that the neural network only learns from documents which you have correctly
tagged before.
* The matching algorithm can only work if there is a correlation between the
tag, correspondent, or document type and the document itself. Your bank
statements usually contain your bank account number and the name of the bank,
so this works reasonably well. However, tags such as "TODO" cannot be
automatically assigned.
* The matching algorithm needs a reasonable number of documents to identify when
to assign tags, correspondents, and types. If one out of a thousand documents
has the correspondent "Very obscure web shop I bought something five years
ago", it will probably not assign this correspondent automatically if you buy
something from them again. The more documents, the better.
* Paperless also needs a reasonable amount of negative examples to decide when
not to assign a certain tag, correspondent or type. This will usually be the
case as you start filling up paperless with documents. Example: If all your
documents are either from "Webshop" and "Bank", paperless will assign one of
these correspondents to ANY new document, if both are set to automatic matching.
Hooking into the consumption process
####################################
Sometimes you may want to do something arbitrary whenever a document is
consumed. Rather than try to predict what you may want to do, Paperless lets
you execute scripts of your own choosing just before or after a document is
consumed using a couple simple hooks.
Just write a script, put it somewhere that Paperless can read & execute, and
then put the path to that script in ``paperless.conf`` or ``docker-compose.env`` with the variable name
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
``PAPERLESS_POST_CONSUME_SCRIPT``.
.. important::
These scripts are executed in a **blocking** process, which means that if
a script takes a long time to run, it can significantly slow down your
document consumption flow. If you want things to run asynchronously,
you'll have to fork the process in your script and exit.
Pre-consumption script
======================
Executed after the consumer sees a new document in the consumption folder, but
before any processing of the document is performed. This script receives exactly
one argument:
* Document file name
A simple but common example for this would be creating a simple script like
this:
``/usr/local/bin/ocr-pdf``
.. code:: bash
#!/usr/bin/env bash
pdf2pdfocr.py -i ${1}
``/etc/paperless.conf``
.. code:: bash
...
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
...
This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
which will in turn call `pdf2pdfocr.py`_ on your document, which will then
overwrite the file with an OCR'd version of the file and exit. At which point,
the consumption process will begin with the newly modified file.
.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
.. _advanced-post_consume_script:
Post-consumption script
=======================
Executed after the consumer has successfully processed a document and has moved it
into paperless. It receives the following arguments:
* Document id
* Generated file name
* Source path
* Thumbnail path
* Download URL
* Thumbnail URL
* Correspondent
* Tags
The script can be in any language, but for a simple shell script
example, you can take a look at `post-consumption-example.sh`_ in this project.
The post consumption script cannot cancel the consumption process.
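As a minimal sketch, a post-consumption script could log the first two arguments; the argument order follows the list above, and the log path is an assumption:
.. code:: bash
#!/usr/bin/env bash
# $1 = document id, $2 = generated file name
echo "consumed document ${1} as ${2}" >> /var/log/paperless-post-consume.log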
Docker
------
Assume you have ``/home/foo/paperless-ngx/scripts/post-consumption-example.sh``.
You can pass that script into the consumer container via a host mount in your ``docker-compose.yml``.
.. code:: bash
...
consumer:
...
volumes:
...
- /home/paperless-ngx/scripts:/path/in/container/scripts/
...
Example (docker-compose.yml): ``- /home/foo/paperless-ngx/scripts:/usr/src/paperless/scripts``
which in turn requires the variable ``PAPERLESS_POST_CONSUME_SCRIPT`` in ``docker-compose.env`` to point to ``/path/in/container/scripts/post-consumption-example.sh``.
Example (docker-compose.env): ``PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/post-consumption-example.sh``
Troubleshooting:
- Monitor the docker-compose log ``cd ~/paperless-ngx; docker-compose logs -f``
- Check your script's permission e.g. in case of permission error ``sudo chmod 755 post-consumption-example.sh``
- Pipe your script's output to a log file e.g. ``echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log``
.. _post-consumption-example.sh: https://github.com/jonaswinkler/paperless-ngx/blob/master/scripts/post-consumption-example.sh
.. _advanced-file_name_handling:
File name handling
##################
By default, paperless stores your documents in the media directory and renames them
using the identifier which it has assigned to each document. You will end up getting
files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad
thing, because you normally don't have to access these files manually. However, if
you wish to name your files differently, you can do that by adjusting the
``PAPERLESS_FILENAME_FORMAT`` configuration option.
This variable allows you to configure the filename (folders are allowed) using
placeholders. For example, configuring this to
.. code:: bash
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
will create a directory structure as follows:
.. code::
2019/
My bank/
Statement January.pdf
Statement February.pdf
2020/
My bank/
Statement January.pdf
Letter.pdf
Letter_01.pdf
Shoe store/
My new shoes.pdf
.. danger::
Do not manually move your files in the media folder. Paperless remembers the
last filename a document was stored as. If you do rename a file, paperless will
report your files as missing and won't be able to find them.
Paperless provides the following placeholders within filenames:
* ``{asn}``: The archive serial number of the document, or "none".
* ``{correspondent}``: The name of the correspondent, or "none".
* ``{document_type}``: The name of the document type, or "none".
* ``{tag_list}``: A comma separated list of all tags assigned to the document.
* ``{title}``: The title of the document.
* ``{created}``: The full date (ISO format) the document was created.
* ``{created_year}``: Year created only.
* ``{created_month}``: Month created only (number 01-12).
* ``{created_day}``: Day created only (number 01-31).
* ``{added}``: The full date (ISO format) the document was added to paperless.
* ``{added_year}``: Year added only.
* ``{added_month}``: Month added only (number 01-12).
* ``{added_day}``: Day added only (number 01-31).
Paperless will try to conserve the information from your database as much as possible.
However, some characters that you can use in document titles and correspondent names (such
as ``: \ /`` and a couple more) are not allowed in filenames and will be replaced with dashes.
If paperless detects that two documents share the same filename, paperless will automatically
append ``_01``, ``_02``, etc to the filename. This happens if all the placeholders in a filename
evaluate to the same value.
.. hint::
Paperless checks the filename of a document whenever it is saved. Therefore,
you need to update the filenames of your documents and move them after altering
this setting by invoking the :ref:`document renamer <utilities-renamer>`.
.. warning::
Make absolutely sure you get the spelling of the placeholders right, or else
paperless will use the default naming scheme instead.
.. caution::
As of now, you could totally tell paperless to store your files anywhere outside
the media directory by setting
.. code::
PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
However, keep in mind that inside docker, if files get stored outside of the
predefined volumes, they will be lost after a restart of paperless.

View File

@@ -1,12 +1,300 @@
.. _api:
************
The REST API
************
Paperless makes use of the `Django REST Framework`_ standard API interface.
It provides a browsable API for most of its endpoints, which you can inspect
at ``http://<paperless-host>:<port>/api/``. This also documents most of the
available filters and ordering fields.
.. _Django REST Framework: http://django-rest-framework.org/
The API provides 5 main endpoints:
* ``/api/documents/``: Full CRUD support, except POSTing new documents. See below.
* ``/api/correspondents/``: Full CRUD support.
* ``/api/document_types/``: Full CRUD support.
* ``/api/logs/``: Read-Only.
* ``/api/tags/``: Full CRUD support.
All of these endpoints except for the logging endpoint
allow you to fetch, edit and delete individual objects
by appending their primary key to the path, for example ``/api/documents/454/``.
The objects served by the document endpoint contain the following fields:
* ``id``: ID of the document. Read-only.
* ``title``: Title of the document.
* ``content``: Plain text content of the document.
* ``tags``: List of IDs of tags assigned to this document, or empty list.
* ``document_type``: Document type of this document, or null.
* ``correspondent``: Correspondent of this document or null.
* ``created``: The date at which this document was created.
* ``modified``: The date at which this document was last edited in paperless. Read-only.
* ``added``: The date at which this document was added to paperless. Read-only.
* ``archive_serial_number``: The identifier of this document in a physical document archive.
* ``original_file_name``: Verbose filename of the original document. Read-only.
* ``archived_file_name``: Verbose filename of the archived document. Read-only. Null if no archived document is available.
Downloading documents
#####################
In addition to that, the document endpoint offers these additional actions on
individual documents:
* ``/api/documents/<pk>/download/``: Download the document.
* ``/api/documents/<pk>/preview/``: Display the document inline,
without downloading it.
* ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document.
Paperless generates archived PDF/A documents from consumed files and stores both
the original files as well as the archived files. By default, the endpoints
for previews and downloads serve the archived file, if it is available.
Otherwise, the original file is served.
Some documents cannot be archived.
The endpoints correctly serve the response header fields ``Content-Disposition``
and ``Content-Type`` to indicate the filename for download and the type of content of
the document.
In order to download or preview the original document when an archived document is available,
supply the query parameter ``original=true``.
.. hint::
Paperless used to provide this functionality at ``/fetch/<pk>/preview``,
``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs
are in place. However, if you use these old URLs to access documents, you
should update your app or script to use the new URLs.
Getting document metadata
#########################
The API also has an endpoint to retrieve read-only metadata about specific documents. This
information is not served along with the document objects, since it requires reading
files and would therefore slow down document lists considerably.
Access the metadata of a document with an ID ``id`` at ``/api/documents/<id>/metadata/``.
The endpoint reports the following data:
* ``original_checksum``: MD5 checksum of the original document.
* ``original_size``: Size of the original document, in bytes.
* ``original_mime_type``: Mime type of the original document.
* ``media_filename``: Current filename of the document, under which it is stored inside the media directory.
* ``has_archive_version``: True, if this document is archived, false otherwise.
* ``original_metadata``: A list of metadata associated with the original document. See below.
* ``archive_checksum``: MD5 checksum of the archived document, or null.
* ``archive_size``: Size of the archived document in bytes, or null.
* ``archive_metadata``: Metadata associated with the archived document, or null. See below.
File metadata is reported as a list of objects in the following form:
.. code:: json
[
{
"namespace": "http://ns.adobe.com/pdf/1.3/",
"prefix": "pdf",
"key": "Producer",
"value": "SparklePDF, Fancy edition"
},
]
``namespace`` and ``prefix`` can be null. The actual metadata reported depends on the file type and the metadata
available in that specific document. Paperless only reports PDF metadata at this point.
Authorization
#############
The REST API provides three different forms of authentication; usage examples follow the list below.
1. Basic authentication
Authorize by providing an HTTP header in the form
.. code::
Authorization: Basic <credentials>
where ``credentials`` is a base64-encoded string of ``<username>:<password>``
2. Session authentication
When you're logged into paperless in your browser, you're automatically
logged into the API as well and don't need to provide any authorization
headers.
3. Token authentication
Paperless also offers an endpoint to acquire authentication tokens.
POST a username and password as a form or json string to ``/api/token/``
and paperless will respond with a token, if the login data is correct.
This token can be used to authenticate other requests with the
following HTTP header:
.. code::
Authorization: Token <token>
Tokens can be managed and revoked in the paperless admin.
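As a sketch (host, user name, password and token are placeholders), these forms could look like this with ``curl``:
.. code:: shell-session
$ # 1. basic authentication
$ curl -u user:pass http://localhost:8000/api/documents/
$ # 3. acquire a token, then use it for subsequent requests
$ curl -d "username=user" -d "password=pass" http://localhost:8000/api/token/
$ curl -H "Authorization: Token <token>" http://localhost:8000/api/documents/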
Searching for documents
#######################
Full text searching is available on the ``/api/documents/`` endpoint. Two specific
query parameters cause the API to return full text search results:
* ``/api/documents/?query=your%20search%20query``: Search for a document using a full text query.
For details on the syntax, see :ref:`basic-usage_searching`.
* ``/api/documents/?more_like=1234``: Search for documents similar to the document with id 1234.
Pagination works exactly the same as it does for normal requests on this endpoint.
Certain limitations apply to full text queries:
* Results are always sorted by search score. The results matching the query best will show up first.
* Only a small subset of filtering parameters are supported.
Furthermore, each returned document has an additional ``__search_hit__`` attribute with various information
about the search results:
.. code::
{
"count": 31,
"next": "http://localhost:8000/api/documents/?page=2&query=test",
"previous": null,
"results": [
...
{
"id": 123,
"title": "title",
"content": "content",
...
"__search_hit__": {
"score": 0.343,
"highlights": "text <span class=\"match\">Test</span> text",
"rank": 23
}
},
...
]
}
* ``score`` is an indication how well this document matches the query relative to the other search results.
* ``highlights`` is an excerpt from the document content and highlights the search terms with ``<span>`` tags as shown above.
* ``rank`` is the index of the search results. The first result will have rank 0.
``/api/search/autocomplete/``
=============================
Get auto completions for a partial search term.
Query parameters:
* ``term``: The incomplete term.
* ``limit``: Amount of results. Defaults to 10.
Results returned by the endpoint are ordered by importance of the term in the
document index. The first result is the term that has the highest Tf/Idf score
in the index.
.. code:: json
[
"term1",
"term3",
"term6",
"term4"
]
.. _api-file_uploads:
POSTing documents
#################
The API provides a special endpoint for file uploads:
``/api/documents/post_document/``
POST a multipart form to this endpoint, where the form field ``document`` contains
the document that you want to upload to paperless. The filename is sanitized and
then used to store the document in a temporary directory, and the consumer will
be instructed to consume the document from there.
The endpoint supports the following optional form fields:
* ``title``: Specify a title that the consumer should use for the document.
* ``correspondent``: Specify the ID of a correspondent that the consumer should use for the document.
* ``document_type``: Similar to correspondent.
* ``tags``: Similar to correspondent. Specify this multiple times to have multiple tags added
to the document.
The endpoint will immediately return "OK" if the document consumption process
was started successfully. No additional status information about the consumption
process itself is available, since that happens in a different process.
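A sketch of such an upload with ``curl`` (host, token and file name are placeholders):
.. code:: shell-session
$ curl -H "Authorization: Token <token>" \
-F document=@scan.pdf -F title="My document" \
http://localhost:8000/api/documents/post_document/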
.. _api-versioning:
API Versioning
##############
The REST API is versioned since Paperless-ngx 1.3.0.
* Versioning ensures that changes to the API don't break older clients.
* Clients specify the specific version of the API they wish to use with every request and Paperless will handle the request using the specified API version.
* Even if the underlying data model changes, older API versions will always serve compatible data.
* If no version is specified, Paperless will serve version 1 to ensure compatibility with older clients that do not request a specific API version.
API versions are specified by submitting an additional HTTP ``Accept`` header with every request:
.. code::
Accept: application/json; version=6
If an invalid version is specified, Paperless 1.3.0 will respond with "406 Not Acceptable" and an error message in the body.
Earlier versions of Paperless will serve API version 1 regardless of whether a version is specified via the ``Accept`` header.
If a client wishes to verify whether it is compatible with any given server, the following procedure should be performed:
1. Perform an *authenticated* request against any API endpoint. If the server is on version 1.3.0 or newer, the server will
add two custom headers to the response:
.. code::
X-Api-Version: 2
X-Version: 1.3.0
2. Determine whether the client is compatible with this server based on the presence/absence of these headers and their values if present.
API Changelog
=============
Version 1
---------
Initial API version.
Version 2
---------
* Added field ``Tag.color``. This read/write string field contains a hex color such as ``#a6cee3``.
* Added read-only field ``Tag.text_color``. This field contains the text color to use for a specific tag, which is either black or white depending on the brightness of ``Tag.color``.
* Removed field ``Tag.colour``.

File diff suppressed because it is too large

View File

@@ -2,8 +2,6 @@ import sphinx_rtd_theme
__version__ = None
__full_version_str__ = None
__major_minor_version_str__ = None
exec(open("../src/paperless/version.py").read())
@@ -14,17 +12,13 @@ extensions = [
"sphinx.ext.imgmath",
"sphinx.ext.viewcode",
"sphinx_rtd_theme",
"myst_parser",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = {
".rst": "restructuredtext",
".md": "markdown",
}
source_suffix = ".rst"
# The encoding of source files.
# source_encoding = 'utf-8-sig'
@@ -47,9 +41,9 @@ copyright = "2015-2022, Daniel Quinn, Jonas Winkler, and the paperless-ngx team"
#
# The short X.Y version.
version = __major_minor_version_str__
version = ".".join([str(_) for _ in __version__[:2]])
# The full version, including alpha/beta/rc tags.
release = __full_version_str__
release = ".".join([str(_) for _ in __version__[:3]])
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -125,16 +119,6 @@ html_theme_path = []
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
# These paths are either relative to html_static_path
# or fully qualified paths (eg. https://...)
html_css_files = [
"css/custom.css",
]
html_js_files = [
"js/darkmode.js",
]
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.

View File

@@ -4,9 +4,763 @@
Configuration
*************
Paperless provides a wide range of customizations.
Depending on how you run paperless, these settings have to be defined in different
places.
* If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure
paperless by copying necessary options to ``docker-compose.env``.
* If you are running paperless on anything else, paperless will search for the
configuration file in these locations and use the first one it finds:
.. code::
/path/to/paperless/paperless.conf
/etc/paperless.conf
/usr/local/etc/paperless.conf
Required services
#################
PAPERLESS_REDIS=<url>
This is required for processing scheduled tasks such as email fetching, index
optimization and for training the automatic document matcher.
Defaults to redis://localhost:6379.
PAPERLESS_DBHOST=<hostname>
By default, sqlite is used as the database backend. This can be changed here.
Set PAPERLESS_DBHOST and PostgreSQL will be used instead of SQLite.
PAPERLESS_DBPORT=<port>
Adjust port if necessary.
Default is 5432.
PAPERLESS_DBNAME=<name>
Database name in PostgreSQL.
Defaults to "paperless".
PAPERLESS_DBUSER=<name>
Database user in PostgreSQL.
Defaults to "paperless".
PAPERLESS_DBPASS=<password>
Database password for PostgreSQL.
Defaults to "paperless".
PAPERLESS_DBSSLMODE=<mode>
SSL mode to use when connecting to PostgreSQL.
See `the official documentation about sslmode <https://www.postgresql.org/docs/current/libpq-ssl.html>`_.
Default is ``prefer``.
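Put together, the required-services section of a ``docker-compose.env`` might look like this; a sketch, where the host names ``broker`` and ``db`` assume the service names from the provided compose files:
.. code:: bash
PAPERLESS_REDIS=redis://broker:6379
PAPERLESS_DBHOST=db
PAPERLESS_DBNAME=paperless
PAPERLESS_DBUSER=paperless
PAPERLESS_DBPASS=paperless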
Paths and folders
#################
PAPERLESS_CONSUMPTION_DIR=<path>
This is where your documents should go to be consumed. Make sure that it exists
and that the user running the paperless service can read/write its contents
before you start Paperless.
Don't change this when using docker, as it only changes the path within the
container. Change the local consumption directory in the docker-compose.yml
file instead.
Defaults to "../consume/", relative to the "src" directory.
PAPERLESS_DATA_DIR=<path>
This is where paperless stores all its data (search index, SQLite database,
classification model, etc).
Defaults to "../data/", relative to the "src" directory.
PAPERLESS_TRASH_DIR=<path>
Instead of removing deleted documents, they are moved to this directory.
This must be writeable by the user running paperless. When running inside
docker, ensure that this path is within a permanent volume (such as
"../media/trash") so it won't get lost on upgrades.
Defaults to empty (i.e. really delete documents).
PAPERLESS_MEDIA_ROOT=<path>
This is where your documents and thumbnails are stored.
You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless
store all its data within the same volume.
Defaults to "../media/", relative to the "src" directory.
PAPERLESS_STATICDIR=<path>
Override the default STATIC_ROOT here. This is where all static files
created using "collectstatic" manager command are stored.
Unless you're doing something fancy, there is no need to override this.
Defaults to "../static/", relative to the "src" directory.
PAPERLESS_FILENAME_FORMAT=<format>
Changes the filenames paperless uses to store documents in the media directory.
See :ref:`advanced-file_name_handling` for details.
Default is none, which disables this feature.
PAPERLESS_LOGGING_DIR=<path>
This is where paperless will store log files.
Defaults to "``PAPERLESS_DATA_DIR``/log/".
Logging
#######
PAPERLESS_LOGROTATE_MAX_SIZE=<num>
Maximum file size for log files before they are rotated, in bytes.
Defaults to 1 MiB.
PAPERLESS_LOGROTATE_MAX_BACKUPS=<num>
Number of rotated log files to keep.
Defaults to 20.
Hosting & Security
##################
PAPERLESS_SECRET_KEY=<key>
Paperless uses this to make session tokens. If you expose paperless on the
internet, you need to change this, since the default secret is well known.
Use any sequence of characters. The more, the better. You don't need to
remember this. Just face-roll your keyboard.
Default is listed in the file ``src/paperless/settings.py``.
PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
If you're planning on putting Paperless on the open internet, then you
really should set this value to the domain name you're using. Failing to do
so leaves you open to HTTP host header attacks:
https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation
Just remember that this is a comma-separated list, so "example.com" is fine,
as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
Defaults to "*", which is all hosts.
PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
You need to add your servers to the list of allowed hosts that can do CORS
calls. Set this to your public domain name.
Defaults to "http://localhost:8000".
PAPERLESS_FORCE_SCRIPT_NAME=<path>
To host paperless under a subpath url like example.com/paperless you set
this value to /paperless. No trailing slash!
Defaults to none, which hosts paperless at "/".
PAPERLESS_STATIC_URL=<path>
Override the STATIC_URL here. Unless you're hosting Paperless under a
subpath like /paperless/, you probably don't need to change this.
Defaults to "/static/".
PAPERLESS_AUTO_LOGIN_USERNAME=<username>
Specify a username here so that paperless will automatically perform login
with the selected user.
.. danger::
Do not use this when exposing paperless on the internet. There are no
checks in place that would prevent you from doing this.
Defaults to none, which disables this feature.
PAPERLESS_ADMIN_USER=<username>
If this environment variable is specified, Paperless automatically creates
a superuser with the provided username at start. This is useful in cases
where you cannot run the `createsuperuser` command separately, such as Kubernetes
or AWS ECS.
Requires `PAPERLESS_ADMIN_PASSWORD` to be set.
.. note::
This will not change an existing [super]user's password, nor will
it recreate a user that already exists. You can leave this throughout
the lifecycle of the containers.
PAPERLESS_ADMIN_MAIL=<email>
(Optional) Specify superuser email address. Only used when
`PAPERLESS_ADMIN_USER` is set.
Defaults to ``root@localhost``.
PAPERLESS_ADMIN_PASSWORD=<password>
Only used when `PAPERLESS_ADMIN_USER` is set.
This will be the password of the automatically created superuser.
PAPERLESS_COOKIE_PREFIX=<str>
Specify a prefix that is added to the cookies used by paperless to identify
the currently logged in user. This is useful when you're running two
instances of paperless on the same host.
After changing this, you will have to login again.
Defaults to ``""``, which does not alter the cookie names.
PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool>
Allows authentication via HTTP_REMOTE_USER which is used by some SSO
applications.
.. warning::
This will allow authentication by simply adding a ``Remote-User: <username>`` header
to a request. Use with care! You especially *must* ensure that any such header is not
passed from your proxy server to paperless.
If you're exposing paperless to the internet directly, do not use this.
Also see the warning `in the official documentation <https://docs.djangoproject.com/en/3.1/howto/auth-remote-user/#configuration>`_.
Defaults to `false` which disables this feature.
PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str>
If `PAPERLESS_ENABLE_HTTP_REMOTE_USER` is enabled, this property allows you to
customize the name of the HTTP header from which the authenticated username
is extracted. Values are in terms of
`HttpRequest.META <https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META>`_.
Thus, the configured value must start with `HTTP_` followed by the
normalized actual header name.
Defaults to `HTTP_REMOTE_USER`.
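For example, if your SSO proxy were to pass the username in a hypothetical
``X-Forwarded-User`` header, the normalized form would be:
.. code:: bash
PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=HTTP_X_FORWARDED_USER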
PAPERLESS_LOGOUT_REDIRECT_URL=<str>
URL to redirect the user to after a logout. This can be used together with
`PAPERLESS_ENABLE_HTTP_REMOTE_USER` to redirect the user back to the SSO
application's logout page.
Defaults to None, which disables this feature.
.. _configuration-ocr:
OCR settings
############
Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for
performing OCR on documents and images. Paperless uses sensible defaults for
most settings, but all of them can be configured to your needs.
PAPERLESS_OCR_LANGUAGE=<lang>
Customize the language that paperless will attempt to use when
parsing documents.
It should be a 3-letter language code consistent with ISO
639: https://www.loc.gov/standards/iso639-2/php/code_list.php
Set this to the language most of your documents are written in.
This can be a combination of multiple languages such as ``deu+eng``,
in which case tesseract will use whatever language matches best.
Keep in mind that tesseract uses much more cpu time with multiple
languages enabled.
Defaults to "eng".
Note: if your language code contains a '-', such as chi-sim, you must use chi_sim instead.
PAPERLESS_OCR_MODE=<mode>
Tells paperless when and how to perform OCR on your documents. Four modes
are available:
* ``skip``: Paperless skips all pages and will perform ocr only on pages
where no text is present. This is the safest option.
* ``skip_noarchive``: In addition to skip, paperless won't create an
archived version of your documents when it finds any text in them.
This is useful if you don't want to have two almost-identical versions
of your digital documents in the media folder. This is the fastest option.
* ``redo``: Paperless will OCR all pages of your documents and attempt to
replace any existing text layers with new text. This will be useful for
documents from scanners that already performed OCR with insufficient
results. It will also perform OCR on purely digital documents.
This option may fail on some documents that have features that cannot
be removed, such as forms. In this case, the text from the document is
used instead.
* ``force``: Paperless rasterizes your documents, converting any text
into images and puts the OCRed text on top. This works for all documents,
however, the resulting document may be significantly larger and text
won't appear as sharp when zoomed in.
The default is ``skip``, which only performs OCR when necessary and always
creates archived documents.
Read more about this in the `OCRmyPDF documentation <https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped>`_.
PAPERLESS_OCR_CLEAN=<mode>
Tells paperless to use ``unpaper`` to clean any input document before
sending it to tesseract. This uses more resources, but generally results
in better OCR results. The following modes are available:
* ``clean``: Apply unpaper.
* ``clean-final``: Apply unpaper, and use the cleaned images to build the
output file instead of the original images.
* ``none``: Do not apply unpaper.
Defaults to ``clean``.
.. note::
``clean-final`` is incompatible with ocr mode ``redo``. When both
``clean-final`` and the ocr mode ``redo`` are configured, ``clean``
is used instead.
PAPERLESS_OCR_DESKEW=<bool>
Tells paperless to correct skewing (slight rotation of input images, mainly
due to improper scanning).
Defaults to ``true``, which enables this feature.
.. note::
Deskewing is incompatible with ocr mode ``redo``. Deskewing will get
disabled automatically if ``redo`` is used as the ocr mode.
PAPERLESS_OCR_ROTATE_PAGES=<bool>
Tells paperless to correct page rotation (90°, 180° and 270° rotation).
If you notice that paperless is not rotating incorrectly rotated
pages (or vice versa), try adjusting the threshold up or down (see below).
Defaults to ``true``, which enables this feature.
PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num>
Adjust the threshold for automatic page rotation by ``PAPERLESS_OCR_ROTATE_PAGES``.
This is an arbitrary value reported by tesseract. "15" is a very conservative value,
whereas "2" is a very aggressive option and will often result in correctly rotated pages
being rotated as well.
Defaults to "12".
PAPERLESS_OCR_OUTPUT_TYPE=<type>
Specify the type of PDF documents that paperless should produce.
* ``pdf``: Modify the PDF document as little as possible.
* ``pdfa``: Convert PDF documents into PDF/A-2b documents, which is a
subset of the entire PDF specification and meant for storing
documents long term.
* ``pdfa-1``, ``pdfa-2``, ``pdfa-3`` to specify the exact version of
PDF/A you wish to use.
If not specified, ``pdfa`` is used. Remember that paperless keeps
the original input file as well as the archived version.
PAPERLESS_OCR_PAGES=<num>
Tells paperless to use only the specified number of pages for OCR. Documents
with fewer than the specified number of pages get OCR'ed completely.
Specifying 1 here will only use the first page.
When combined with ``PAPERLESS_OCR_MODE=redo`` or ``PAPERLESS_OCR_MODE=force``,
paperless will not modify any text it finds on excluded pages and will copy it
verbatim.
Defaults to 0, which disables this feature and always uses all pages.
PAPERLESS_OCR_IMAGE_DPI=<num>
Paperless will OCR any images you put into the system and convert them
into PDF documents. This is useful if your scanner produces images.
In order to do so, paperless needs to know the DPI of the image.
Most images from scanners will have this information embedded and
paperless will detect and use that information. In case this fails, it
uses this value as a fallback.
Set this to the DPI your scanner produces images at.
Default is none, which will automatically calculate image DPI so that
the produced PDF documents are A4 sized.
PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num>
Paperless will not OCR images that have more pixels than this limit.
This is intended to prevent decompression bombs from overloading paperless.
Increasing this limit is desirable if you face a DecompressionBombError despite
the file in question not being malicious; this can be caused, for example, by
incorrectly recognized metadata.
If you have enough resources or if you are certain that your uploaded files
are not malicious you can increase this value to your needs.
The default value is 256000000; an image with more pixels than that will not be parsed.
PAPERLESS_OCR_USER_ARGS=<json>
OCRmyPDF offers many more options. Use this parameter to specify any
additional arguments you wish to pass to OCRmyPDF. Since Paperless uses
the API of OCRmyPDF, you have to specify these in a format that can be
passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_
for valid parameters. All command line options are supported, but they
use underscores instead of dashes.
.. caution::
Paperless has been tested to work with the OCR options provided
above. There are many options that are incompatible with each other,
so specifying invalid options may prevent paperless from consuming
any documents.
Specify arguments as a JSON dictionary. Keep note of lower case booleans
and double quoted parameter names and strings. Example:
.. code:: json
{"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}
.. _configuration-tika:
Tika settings
#############
Paperless can make use of `Tika <https://tika.apache.org/>`_ and
`Gotenberg <https://gotenberg.dev/>`_ for parsing and
converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
wish to use this, you must provide a Tika server and a Gotenberg server,
configure their endpoints, and enable the feature.
PAPERLESS_TIKA_ENABLED=<bool>
Enable (or disable) the Tika parser.
Defaults to false.
PAPERLESS_TIKA_ENDPOINT=<url>
Set the endpoint URL where Paperless can reach your Tika server.
Defaults to "http://localhost:9998".
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url>
Set the endpoint URL where Paperless can reach your Gotenberg server.
Defaults to "http://localhost:3000".
If you run paperless on docker, you can add those services to the docker-compose
file (see the provided ``docker-compose.tika.yml`` file for reference). The
required changes are as follows:
.. code:: yaml
services:
# ...
webserver:
# ...
environment:
# ...
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
# ...
gotenberg:
image: gotenberg/gotenberg:7
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: apache/tika
restart: unless-stopped
Add the configuration variables to the environment of the webserver (alternatively
put the configuration in the ``docker-compose.env`` file) and add the additional
services below the webserver service. Watch out for indentation.
Make sure to use the correct format `PAPERLESS_TIKA_ENABLED = 1` so python_dotenv can parse the statement correctly.
Software tweaks
###############
PAPERLESS_TASK_WORKERS=<num>
Paperless does multiple things in the background: Maintain the search index,
maintain the automatic matching algorithm, check emails, consume documents,
etc. This variable specifies how many things it will do in parallel.
PAPERLESS_THREADS_PER_WORKER=<num>
Furthermore, paperless uses multiple threads when consuming documents to
speed up OCR. This variable specifies how many pages paperless will process
in parallel on a single document.
.. caution::
Ensure that the product
PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER
does not exceed your CPU core count or else paperless will be extremely slow.
If you want paperless to process many documents in parallel, choose a high
worker count. If you want paperless to process very large documents faster,
use a higher thread per worker count.
The default is a balance between the two, according to your CPU core count,
with a slight favor towards threads per worker:
+----------------+---------+---------+
| CPU core count | Workers | Threads |
+----------------+---------+---------+
| 1 | 1 | 1 |
+----------------+---------+---------+
| 2 | 2 | 1 |
+----------------+---------+---------+
| 4 | 2 | 2 |
+----------------+---------+---------+
| 6 | 2 | 3 |
+----------------+---------+---------+
| 8 | 2 | 4 |
+----------------+---------+---------+
| 12 | 3 | 4 |
+----------------+---------+---------+
| 16 | 4 | 4 |
+----------------+---------+---------+
If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
PAPERLESS_THREADS_PER_WORKER automatically.
PAPERLESS_WORKER_TIMEOUT=<num>
Machines with few or weak CPU cores might not be able to finish OCR on
large documents within the default 1800 seconds, so extending this timeout
may prove useful on weak hardware setups.
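For example, to allow up to an hour per task on slow hardware (the value is in
seconds):
.. code:: bash
PAPERLESS_WORKER_TIMEOUT=3600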
PAPERLESS_TIME_ZONE=<timezone>
Set the time zone here.
See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE
for details on how to set it.
Defaults to UTC.
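For example (any time zone name Django accepts works here):
.. code:: bash
PAPERLESS_TIME_ZONE=Europe/Berlin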
.. _configuration-polling:
PAPERLESS_CONSUMER_POLLING=<num>
If paperless does not find documents added to your consume folder, it might
not be able to automatically detect filesystem changes. In that case,
specify a polling interval in seconds here, which will then cause paperless
to periodically check your consumption directory for changes. This will also
disable listening for file system changes with ``inotify``.
Defaults to 0, which disables polling and uses filesystem notifications.
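For example, to check the consumption directory every 10 seconds instead of
relying on filesystem notifications:
.. code:: bash
PAPERLESS_CONSUMER_POLLING=10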
PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
When the consumer detects a duplicate document, it will not touch the
original document. This default behavior can be changed here.
Defaults to false.
PAPERLESS_CONSUMER_RECURSIVE=<bool>
Enable recursive watching of the consumption directory. Paperless will
then pick up files from subdirectories within your consumption directory
as well.
Defaults to false.
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool>
Set the names of subdirectories as tags for consumed files.
E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to
the consumed file. Paperless will create any tags that don't exist yet.
This is useful for sorting documents with certain tags such as ``car`` or
``todo`` prior to consumption. These folders won't be deleted.
PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.
Defaults to false.
PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
On smaller systems, or even in the case of Very Large Documents, the consumer
may explode, complaining about how it's "unable to extend pixel cache". In
such cases, try setting this to a reasonably low value, like 32. The
default is to use whatever is necessary to do everything without writing to
disk, and units are in megabytes.
For more information on how to use this value, you should search
the web for "MAGICK_MEMORY_LIMIT".
Defaults to 0, which disables the limit.
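For example, to cap ImageMagick at the 32 megabytes suggested above:
.. code:: bash
PAPERLESS_CONVERT_MEMORY_LIMIT=32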
PAPERLESS_CONVERT_TMPDIR=<path>
Similar to the memory limit, if you've got a small system and your OS mounts
/tmp as tmpfs, you should set this to a path that's on a physical disk, like
/home/your_user/tmp or something. ImageMagick will use this as scratch space
when crunching through very large documents.
For more information on how to use this value, you should search
the web for "MAGICK_TMPDIR".
Default is none, which disables the temporary directory.
PAPERLESS_OPTIMIZE_THUMBNAILS=<bool>
Use optipng to optimize thumbnails. This usually reduces the size of
thumbnails by about 20%, but uses considerable compute time during
consumption.
Defaults to true.
PAPERLESS_POST_CONSUME_SCRIPT=<filename>
After a document is consumed, Paperless can trigger an arbitrary script if
you like. This script will be passed a number of arguments for you to work
with. For more information, take a look at :ref:`advanced-post_consume_script`.
The default is blank, which means nothing will be executed.
PAPERLESS_FILENAME_DATE_ORDER=<format>
Paperless will check the document text for document date information.
Use this setting to enable checking the document filename for date
information. The date order can be set to any option as specified in
https://dateparser.readthedocs.io/en/latest/settings.html#date-order.
The filename will be checked first, and if nothing is found, the document
text will be checked as normal.
Defaults to none, which disables this feature.
PAPERLESS_THUMBNAIL_FONT_NAME=<filename>
Paperless creates thumbnails for plain text files by rendering the content
of the file onto an image, using a predefined font. This
font can be changed here.
Note that this won't have any effect on already generated thumbnails.
Defaults to ``/usr/share/fonts/liberation/LiberationSerif-Regular.ttf``.
PAPERLESS_IGNORE_DATES=<string>
Paperless parses a document's creation date from filename and file content.
You may specify a comma separated list of dates that should be ignored during
this process. This is useful for special dates (like a date of birth) that appear
in documents regularly but are very unlikely to be the document's creation date.
You may specify dates in a multitude of formats supported by dateparser (see
https://dateparser.readthedocs.io/en/latest/#popular-formats) but as the dates
need to be comma separated, the options are limited.
Example: "2020-12-02,22.04.1999"
Defaults to an empty string to not ignore any dates.
PAPERLESS_DATE_ORDER=<format>
Paperless will try to determine the document creation date from its contents.
Specify the date format Paperless should expect to see within your documents.
This option defaults to DMY which translates to day first, month second, and year
last order. Characters D, M, or Y can be shuffled to meet the required order.
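For example, if most of your documents print dates such as ``2021-12-31``
(year first), you would use:
.. code:: bash
PAPERLESS_DATE_ORDER=YMD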
PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json>
By default, paperless ignores certain files and folders in the consumption
directory, such as system files created by macOS.
This can be adjusted by configuring a custom json array with patterns to exclude.
Defaults to ``[".DS_STORE/*", "._*", ".stfolder/*"]``.
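For example, to keep the defaults and additionally ignore temporary files (the
``*.tmp`` pattern is illustrative):
.. code:: bash
PAPERLESS_CONSUMER_IGNORE_PATTERNS=[".DS_STORE/*", "._*", ".stfolder/*", "*.tmp"]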
Binaries
########
There are a few external software packages that Paperless expects to find on
your system when it starts up. Unless you've done something creative with
their installation, you probably won't need to edit any of these. However,
if you've installed these programs somewhere where simply typing the name of
the program doesn't automatically execute it (i.e. the program isn't in your
$PATH), then you'll need to specify the literal path for that program.
PAPERLESS_CONVERT_BINARY=<path>
Defaults to "/usr/bin/convert".
PAPERLESS_GS_BINARY=<path>
Defaults to "/usr/bin/gs".
PAPERLESS_OPTIPNG_BINARY=<path>
Defaults to "/usr/bin/optipng".
.. _configuration-docker:
Docker-specific options
#######################
These options don't have any effect in ``paperless.conf``. These options adjust
the behavior of the docker container. Configure these in `docker-compose.env`.
PAPERLESS_WEBSERVER_WORKERS=<num>
The number of worker processes the webserver should spawn. More worker processes
usually allow the front end to load data much quicker. However, each worker process
also loads the entire application into memory separately, so increasing this value
will increase RAM usage.
Consider configuring this to 1 on low power devices with a limited amount of RAM.
Defaults to 2.
PAPERLESS_PORT=<port>
The port number the webserver will listen on inside the container. There are
special setups where you may need this to avoid collisions with other
services (like using podman with multiple containers in one pod).
Don't change this when using Docker. To change the port on which the webserver
is reachable outside of the container, refer to the "ports" key in
``docker-compose.yml`` instead.
Defaults to 8000.
USERMAP_UID=<uid>
The ID of the paperless user in the container. Set this to your actual user ID on the
host system, which you can get by executing
.. code:: shell-session
$ id -u
Paperless will change ownership on its folders to this user, so you need to get this right
in order to be able to write to the consumption directory.
Defaults to 1000.
USERMAP_GID=<gid>
The ID of the paperless group in the container. Set this to your actual group ID on the
host system, which you can get by executing
.. code:: shell-session
$ id -g
Paperless will change ownership on its folders to this group, so you need to get this right
in order to be able to write to the consumption directory.
Defaults to 1000.
PAPERLESS_OCR_LANGUAGES=<list>
Additional OCR languages to install. By default, paperless comes with
English, German, Italian, Spanish and French. If your language is not in this list, install
additional languages with this configuration option:
.. code:: bash
PAPERLESS_OCR_LANGUAGES=tur ces
To actually use these languages, also set the default OCR language of paperless:
.. code:: bash
PAPERLESS_OCR_LANGUAGE=tur
Defaults to none, which does not install any additional languages.
View File
@@ -1,12 +1,425 @@
.. _extending:
*************************
Paperless-ngx Development
*************************
This section describes the steps you need to take to start development on paperless-ngx.
Check out the source from github. The repository is organized in the following way:
* ``main`` always represents the latest release and will only see changes
when a new release is made.
* ``dev`` contains the code that will be in the next release.
* ``feature-X`` branches contain bigger changes that will be in some release, but not
necessarily the next one.
When making functional changes to paperless, *always* make your changes on the ``dev`` branch.
Apart from that, the folder structure is as follows:
* ``docs/`` - Documentation.
* ``src-ui/`` - Code of the front end.
* ``src/`` - Code of the back end.
* ``scripts/`` - Various scripts that help with different parts of development.
* ``docker/`` - Files required to build the docker image.
Contributing to Paperless
=========================
Maybe you've been using Paperless for a while and want to add a feature or two,
or maybe you've come across a bug that you have some ideas how to solve. The
beauty of open source software is that you can see what's wrong and help to get
it fixed for everyone!
Before contributing please review our `code of conduct`_ and other important
information in the `contributing guidelines`_.
.. _code-formatting-with-pre-commit-hooks:
Code formatting with pre-commit Hooks
=====================================
To ensure a consistent style and formatting across the project source, the project
utilizes a Git `pre-commit` hook to perform some formatting and linting before a
commit is allowed. That way, everyone uses the same style and some common issues
can be caught early on. See below for installation instructions.
Once installed, hooks will run when you commit. If the formatting isn't quite right
or a linter catches something, the commit will be rejected. You'll need to look at the
output and fix the issue. Some hooks, such as the Python formatting tool `black`,
will format failing files, so all you need to do is `git add` those files again and
retry your commit.
Initial setup and first start
=============================
After you have forked and cloned the code from github, you need to perform a first-time setup.
To do so, perform the steps from the following chapters in this order:
1. Install prerequisites + pipenv as mentioned in :ref:`Bare metal route <setup-bare_metal>`
2. Copy ``paperless.conf.example`` to ``paperless.conf`` and enable debug mode.
3. Install the Angular CLI interface:
.. code:: shell-session
$ npm install -g @angular/cli
4. Install pre-commit
.. code:: shell-session
pre-commit install
5. Create ``consume`` and ``media`` folders in the cloned root folder.
.. code:: shell-session
mkdir -p consume media
6. You can now either ...
* install redis or
* use the included scripts/start-services.sh to use docker to fire up a redis instance (and some other services such as tika, gotenberg and a postgresql server) or
* spin up a bare redis container
.. code:: shell-session
docker run -d -p 6379:6379 --restart unless-stopped redis:latest
7. Install the python dependencies by running the following in the src/ directory.
.. code:: shell-session
pipenv install --dev
* Make sure you're using python 3.9.x or lower. Otherwise you might get issues with building dependencies. You can use `pyenv <https://github.com/pyenv/pyenv>`_ to install a specific python version.
8. Generate the static UI so you can perform a login to get a session that is required for frontend development (this only needs to be done once). From the src-ui directory:
.. code:: shell-session
npm install .
./node_modules/.bin/ng build --configuration production
9. Apply migrations and create a superuser for your dev instance:
.. code:: shell-session
python3 manage.py migrate
python3 manage.py createsuperuser
10. Now spin up the dev backend. Depending on which part of paperless you're developing for, you need to have some or all of them running.
.. code:: shell-session
python3 manage.py runserver & python3 manage.py document_consumer & python3 manage.py qcluster
11. Login with the superuser credentials created in step 9 at ``http://localhost:8000`` to create a session that enables you to use the backend.
The backend development environment is now ready. To start frontend development, go to ``/src-ui`` and run ``ng serve``. From there you can use ``http://localhost:4200`` for a preview.
Back end development
====================
The backend is a django application. PyCharm works well for development, but you can use whatever
you want.
Configure the IDE to use the src/ folder as the base source folder. Configure the following
launch configurations in your IDE:
* python3 manage.py runserver
* python3 manage.py qcluster
* python3 manage.py document_consumer
To start them all:
.. code:: shell-session
python3 manage.py runserver & python3 manage.py document_consumer & python3 manage.py qcluster
Testing and code style:
* Run ``pytest`` in the src/ directory to execute all tests. This also generates an HTML coverage
report (see the example after this list for running a subset of tests). When running tests,
paperless.conf is loaded as well. However, the tests rely on the default configuration, which
is not ideal; for now, make sure no settings except for DEBUG are overridden when testing.
* Coding style is enforced by the Git pre-commit hooks. These will ensure your code is formatted and do some
linting when you do a `git commit`.
* You can also run ``black`` manually to format your code
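For example, to run just a subset of the tests rather than the whole suite (the
test path is illustrative; pick any test module under src/):
.. code:: shell-session
$ cd src
$ pytest documents/tests/test_consumer.py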
.. note::
The line length rule E501 is generally useful for getting multiple source files
next to each other on the screen. However, in some cases, it's just not possible
to make some lines fit, especially complicated ``if`` cases. Append ``# NOQA: E501``
to disable this check for certain lines.
Front end development
=====================
The front end is built using Angular. In order to get started, you need ``npm``.
Install the Angular CLI interface with
.. code:: shell-session
$ npm install -g @angular/cli
and make sure that it's on your path. Next, in the src-ui/ directory, install the
required dependencies of the project.
.. code:: shell-session
$ npm install
You can launch a development server by running
.. code:: shell-session
$ ng serve
This will automatically update whenever you save. However, in-place compilation might fail
on syntax errors, in which case you need to restart it.
By default, the development server is available on ``http://localhost:4200/`` and is configured
to access the API at ``http://localhost:8000/api/``, which is the default of the backend.
If you enabled DEBUG on the back end, several security overrides for allowed hosts, CORS and
X-Frame-Options are in place so that the front end behaves exactly as in production. This also
relies on you being logged into the back end. Without a valid session, the front end will simply
not work.
Testing and code style:
* The frontend code (.ts, .html, .scss) uses ``prettier`` for code formatting via the Git
``pre-commit`` hooks which run automatically on commit. See
:ref:`above <code-formatting-with-pre-commit-hooks>` for installation. You can also run this
via the CLI with a command such as
.. code:: shell-session
$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files
* Frontend testing uses jest and cypress. There is currently a need for significantly more
frontend tests. Unit tests and e2e tests, respectively, can be run non-interactively with:
.. code:: shell-session
$ ng test
$ npm run e2e:ci
Cypress also includes a UI which can be run from within the ``src-ui`` directory with
.. code:: shell-session
$ ./node_modules/.bin/cypress open
In order to build the front end and serve it as part of django, execute
.. code:: shell-session
$ ng build --prod
This will build the front end and put it in a location from which the Django server will serve
it as static content. This way, you can verify that authentication is working.
Localization
============
Paperless is available in many different languages. Since paperless consists both of a django
application and an Angular front end, both these parts have to be translated separately.
Front end localization
----------------------
* The Angular front end does localization according to the `Angular documentation <https://angular.io/guide/i18n>`_.
* The source language of the project is "en_US".
* The source strings end up in the file "src-ui/messages.xlf".
* The translated strings need to be placed in the "src-ui/src/locale/" folder.
* In order to extract added or changed strings from the source files, call ``ng xi18n --ivy``.
Adding new languages requires adding the translated files in the "src-ui/src/locale/" folder and adjusting a couple files.
1. Adjust "src-ui/angular.json":
.. code:: json
"i18n": {
"sourceLocale": "en-US",
"locales": {
"de": "src/locale/messages.de.xlf",
"nl-NL": "src/locale/messages.nl_NL.xlf",
"fr": "src/locale/messages.fr.xlf",
"en-GB": "src/locale/messages.en_GB.xlf",
"pt-BR": "src/locale/messages.pt_BR.xlf",
"language-code": "language-file"
}
}
2. Add the language to the available options in "src-ui/src/app/services/settings.service.ts":
.. code:: typescript
getLanguageOptions(): LanguageOption[] {
return [
{code: "en-us", name: $localize`English (US)`, englishName: "English (US)", dateInputFormat: "mm/dd/yyyy"},
{code: "en-gb", name: $localize`English (GB)`, englishName: "English (GB)", dateInputFormat: "dd/mm/yyyy"},
{code: "de", name: $localize`German`, englishName: "German", dateInputFormat: "dd.mm.yyyy"},
{code: "nl", name: $localize`Dutch`, englishName: "Dutch", dateInputFormat: "dd-mm-yyyy"},
{code: "fr", name: $localize`French`, englishName: "French", dateInputFormat: "dd/mm/yyyy"},
{code: "pt-br", name: $localize`Portuguese (Brazil)`, englishName: "Portuguese (Brazil)", dateInputFormat: "dd/mm/yyyy"}
// Add your new language here
]
}
``dateInputFormat`` is a special string that defines the behavior of the date input fields and absolutely needs to contain "dd", "mm" and "yyyy".
3. Import and register the Angular data for this locale in "src-ui/src/app/app.module.ts":
.. code:: typescript
import localeDe from '@angular/common/locales/de';
registerLocaleData(localeDe)
Back end localization
---------------------
A majority of the strings that appear in the back end appear only when the admin is used. However,
some of these are still shown on the front end (such as error messages).
* The django application does localization according to the `django documentation <https://docs.djangoproject.com/en/3.1/topics/i18n/translation/>`_.
* The source language of the project is "en_US".
* Localization files end up in the folder "src/locale/".
* In order to extract strings from the application, call ``python3 manage.py makemessages -l en_US``. This is important after making changes to translatable strings.
* The message files need to be compiled for them to show up in the application. Call ``python3 manage.py compilemessages`` to do this. The generated files don't get
committed into git, since these are derived artifacts. The build pipeline takes care of executing this command.
Adding new languages requires adding the translated files in the "src/locale/" folder and adjusting the file "src/paperless/settings.py" to include the new language:
.. code:: python
LANGUAGES = [
("en-us", _("English (US)")),
("en-gb", _("English (GB)")),
("de", _("German")),
("nl-nl", _("Dutch")),
("fr", _("French")),
("pt-br", _("Portuguese (Brazil)")),
# Add language here.
]
Building the documentation
==========================
The documentation is built using sphinx. I've configured ReadTheDocs to automatically build
the documentation when changes are pushed. If you want to build the documentation locally,
this is how you do it:
1. Install python dependencies.
.. code:: shell-session
$ cd /path/to/paperless
$ pipenv install --dev
2. Build the documentation
.. code:: shell-session
$ cd /path/to/paperless/docs
$ pipenv run make clean html
This will build the HTML documentation, and put the resulting files in the ``_build/html``
directory.
Building the Docker image
=========================
Building the docker image from source:
.. code:: shell-session
docker build . -t <your-tag>
Extending Paperless
===================
Paperless does not have any fancy plugin systems and probably never will. However,
some parts of the application have been designed to allow easy integration of additional
features without any modification to the base code.
Making custom parsers
---------------------
Paperless uses parsers to add documents to paperless. A parser is responsible for:
* Retrieving the content from the original
* Creating a thumbnail
* Optional: Retrieving a created date from the original
* Optional: Creating an archived document from the original
Custom parsers can be added to paperless to support more file types. In order to do that,
you need to write the parser itself and announce its existence to paperless.
The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the
methods ``parse`` and ``get_thumbnail``. You can provide your own implementation to
``get_date`` if you don't want to rely on paperless' default date guessing mechanisms.
.. code:: python
class MyCustomParser(DocumentParser):
def parse(self, document_path, mime_type):
# This method does not return anything. Rather, you should assign
# whatever you got from the document to the following fields:
# The content of the document.
self.text = "content"
# Optional: path to a PDF document that you created from the original.
self.archive_path = os.path.join(self.tempdir, "archived.pdf")
# Optional: "created" date of the document.
self.date = get_created_from_metadata(document_path)
def get_thumbnail(self, document_path, mime_type):
# This should return the path to a thumbnail you created for this
# document.
return os.path.join(self.tempdir, "thumb.png")
If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``.
The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty
and removed after consumption has finished. You can use that directory to store any
intermediate files and also use it to store the thumbnail / archived document.
After that, you need to announce your parser to paperless. You need to connect a
handler to the ``document_consumer_declaration`` signal. Have a look in the file
``src/paperless_tesseract/apps.py`` on how that's done. The handler is a method
that returns information about your parser:
.. code:: python
def myparser_consumer_declaration(sender, **kwargs):
return {
"parser": MyCustomParser,
"weight": 0,
"mime_types": {
"application/pdf": ".pdf",
"image/jpeg": ".jpg",
}
}
* ``parser`` is a reference to a class that extends ``DocumentParser``.
* ``weight`` is used whenever two or more parsers are able to parse a file: The parser with
the higher weight wins. This can be used to override the parsers provided by
paperless.
* ``mime_types`` is a dictionary. The keys are the mime types your parser supports and the value
is the default file extension that paperless should use when storing files and serving them for
download. We could guess that from the file extensions, but some mime types have many extensions
associated with them and the python methods responsible for guessing the extension do not always
return the same value.
.. _code of conduct: https://github.com/paperless-ngx/paperless-ngx/blob/main/CODE_OF_CONDUCT.md
.. _contributing guidelines: https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md
View File
@@ -1,12 +1,117 @@
.. _faq:
**************************
Frequently asked questions
**************************
**Q:** *What's the general plan for Paperless-ngx?*
**A:** While Paperless-ngx is already considered largely "feature-complete", it is a community-driven
project and development will be guided in this way. New features can be submitted via
GitHub discussions and "up-voted" by the community, but this is not a guarantee that the feature
will be implemented. This project will always be open to collaboration in the form of PRs,
ideas etc.
**Q:** *I'm using docker. Where are my documents?*
**A:** Your documents are stored inside the docker volume ``paperless_media``.
Docker manages this volume automatically for you. It is persistent storage
and will persist as long as you don't explicitly delete it. The actual location
depends on your host operating system. On Linux, chances are high that this location
is
.. code::
/var/lib/docker/volumes/paperless_media/_data
.. caution::
Do not mess with this folder. Don't change permissions and don't move
files around manually. This folder is meant to be entirely managed by docker
and paperless.
**Q:** *Let's say I want to switch tools in a year. Can I easily move to other systems?*
**A:** Your documents are stored as plain files inside the media folder. You can always drag those files
out of that folder to use them elsewhere. Here are a couple notes about that.
* Paperless-ngx never modifies your original documents. It keeps checksums of all documents and uses a
scheduled sanity checker to check that they remain the same.
* By default, paperless uses the internal ID of each document as its filename. This might not be very
convenient for export. However, you can adjust the way files are stored in paperless by
:ref:`configuring the filename format <advanced-file_name_handling>`.
* :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names.
**Q:** *What file types does paperless-ngx support?*
**A:** Currently, the following files are supported:
* PDF documents, PNG images, JPEG images, TIFF images and GIF images are processed with OCR and converted into PDF documents.
* Plain text documents are supported as well and are added verbatim
to paperless.
* With the optional Tika integration enabled (see :ref:`Configuration <configuration-tika>`), Paperless also supports various
Office documents (.docx, .doc, .odt, .ppt, .pptx, .odp, .xls, .xlsx, .ods).
Paperless-ngx determines the type of a file by inspecting its content. The
file extensions do not matter.
**Q:** *Will paperless-ngx run on Raspberry Pi?*
**A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B.
The long answer is that certain parts of
Paperless will run very slowly, such as the OCR. On Raspberry Pi,
try to OCR documents before feeding them into paperless so that paperless can
reuse the text. The web interface is a lot snappier, since it runs
in your browser and paperless has to do much less work to serve the data.
.. note::
You can adjust some of the settings so that paperless uses less processing
power. See :ref:`setup-less_powerful_devices` for details.
**Q:** *How do I install paperless-ngx on Raspberry Pi?*
**A:** Docker images are available for arm and arm64 hardware, so just follow
the docker-compose instructions. Apart from more required disk space compared to
a bare metal installation, docker comes with close to zero overhead, even on
Raspberry Pi.
If you decide to go with the bare metal route, be aware that some of the
python requirements do not have precompiled packages for ARM / ARM64. Installation
of these will require additional development libraries and compilation will take
a long time.
**Q:** *How do I run this on Unraid?*
**A:** Paperless-ngx is available as `community app <https://unraid.net/community/apps?q=paperless-ngx>`_
in Unraid. `Uli Fahrer <https://github.com/Tooa>`_ created a container template for that.
**Q:** *How do I run this on my toaster?*
**A:** I honestly don't know! As for all other devices that might be able
to run paperless, you're a bit on your own. If you can't run the docker image,
the documentation has instructions for bare metal installs. I'm running
paperless on an i3 processor from 2015 or so. This is also what I use to test
new releases with. Apart from that, I also have a Raspberry Pi, which I
occasionally build the image on and see if it works.
**Q:** *How do I proxy this with NGINX?*
**A:** See :ref:`here <setup-nginx>`.
.. _faq-mod_wsgi:
**Q:** *How do I get WebSocket support with Apache mod_wsgi?*
**A:** ``mod_wsgi`` by itself does not support ASGI. Paperless will continue
to work with WSGI, but certain features such as status notifications about
document consumption won't be available.
If you want to continue using ``mod_wsgi``, you will have to run an ASGI-enabled
web server as well that processes WebSocket connections, and configure Apache to
redirect WebSocket connections to this server. Multiple options for ASGI servers
exist:
* ``gunicorn`` with ``uvicorn`` as the worker implementation (the default of paperless)
* ``daphne`` as a standalone server, which is the reference implementation for ASGI.
* ``uvicorn`` as a standalone server
View File
@@ -2,24 +2,74 @@
Paperless
*********
Paperless is a simple Django application running in two parts:
a *Consumer* (the thing that does the indexing) and
the *Web server* (the part that lets you search &
download already-indexed documents). If you want to learn more about its
functions, keep on reading after the installation section.
Why This Exists
===============
Paper is a nightmare. Environmental issues aside, there's no excuse for it in
the 21st century. It takes up space, collects dust, doesn't support any form
of a search feature, indexing is tedious, it's heavy and prone to damage &
loss.
I wrote this to make "going paperless" easier. I do not have to worry about
finding stuff again. I feed documents right from the post box into the scanner
and then shred them. Perhaps you might find it useful too.
Paperless-ngx
=============
Paperless-ngx is a document management system that transforms your physical
documents into a searchable online archive so you can keep, well, *less paper*.
Paperless-ngx forked from paperless-ng to continue the great work and
distribute responsibility of supporting and advancing the project among a team
of people.
NG stands for both Angular (the framework used for the
Frontend) and next-gen. Publishing this project under a different name also
avoids confusion between paperless and paperless-ngx.
If you want to learn about what's different in paperless-ngx from Paperless, check out these
resources in the documentation:
* :ref:`Some screenshots <screenshots>` of the new UI are available.
* Read :ref:`this section <advanced-automatic_matching>` if you want to
learn about how paperless automates all tagging using machine learning.
* Paperless now comes with a :ref:`proper email consumer <usage-email>`
that's fully tested and production ready.
* Paperless creates searchable PDF/A documents from whatever you put into
the consumption directory. This means that you can select text in
image-only documents coming from your scanner.
* See :ref:`this note <utilities-encyption>` about GnuPG encryption in
paperless-ngx.
* Paperless is now integrated with a
:ref:`task processing queue <setup-task_processor>` that tells you
at a glance when and why something is not working.
* The :ref:`changelog <paperless_changelog>` contains a detailed list of all changes
in paperless-ngx.
Contents
========
.. toctree::
:maxdepth: 1
setup
usage_overview
advanced_usage
administration
configuration
api
faq
troubleshooting
extending
scanners
screenshots
changelog
View File
@@ -1 +0,0 @@
myst-parser==0.18.1
View File
@@ -1,12 +1,134 @@
.. _scanners:
***********************
Scanner recommendations
***********************
Paperless operates by watching a folder for new files, so it doesn't care what
scanner you use; but sometimes finding a scanner that will write to an FTP,
NFS, or SMB server can be difficult. This page is here to help you find one
that works right for you based on recommendations from other Paperless users.
Physical scanners
=================
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brand | Model | Supports | Recommended By |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| | | FTP | NFS | SMB | SMTP | API [1]_ | |
+=========+================+=====+=====+=====+======+==========+================+
| Brother | `ADS-1700W`_ | yes | | yes | yes | |`holzhannes`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `ADS-1600W`_ | yes | | yes | yes | |`holzhannes`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `ADS-1500W`_ | yes | | yes | yes | |`danielquinn`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `ADS-1100W`_ | yes | | | | |`ytzelf`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `ADS-2800W`_ | yes | yes | | yes | yes |`philpagel`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-J6930DW`_ | yes | | | | |`ayounggun`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-L5850DW`_ | yes | | | yes | |`holzhannes`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-L2750DW`_ | yes | | yes | yes | |`muued`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-J5910DW`_ | yes | | | | |`bmsleight`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-8950DW`_ | yes | | | yes | yes |`philpagel`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-9142CDN`_ | yes | | yes | | |`REOLDEV`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Fujitsu | `ix500`_ | yes | | yes | | |`eonist`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Epson | `ES-580W`_ | yes | | yes | yes | |`fignew`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Epson | `WF-7710DWF`_ | yes | | yes | | |`Skylinar`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Fujitsu | `S1300i`_ | yes | | yes | | |`jonaswinkler`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Doxie | `Q2`_ | | | | | yes |`Unkn0wnCat`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
.. _MFC-L5850DW: https://www.brother-usa.com/products/mfcl5850dw
.. _MFC-L2750DW: https://www.brother.de/drucker/laserdrucker/mfc-l2750dw
.. _ADS-1700W: https://www.brother-usa.com/products/ads1700w
.. _ADS-1600W: https://www.brother-usa.com/products/ads1600w
.. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
.. _ADS-1100W: https://support.brother.com/g/b/downloadtop.aspx?c=fr&lang=fr&prod=ads1100w_eu_as_cn
.. _ADS-2800W: https://www.brother-usa.com/products/ads2800w
.. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
.. _MFC-J5910DW: https://www.brother.co.uk/printers/inkjet-printers/mfcj5910dw
.. _MFC-8950DW: https://www.brother-usa.com/products/mfc8950dw
.. _MFC-9142CDN: https://www.brother.co.uk/printers/laser-printers/mfc9140cdn
.. _ES-580W: https://epson.com/Support/Scanners/ES-Series/Epson-WorkForce-ES-580W/s/SPT_B11B258201
.. _WF-7710DWF: https://www.epson.de/en/products/printers/inkjet-printers/for-home/workforce-wf-7710dwf
.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
.. _S1300i: https://www.fujitsu.com/global/products/computing/peripheral/scanners/soho/s1300i/
.. _Q2: https://www.getdoxie.com/product/doxie-q/
.. _ayounggun: https://github.com/ayounggun
.. _bmsleight: https://github.com/bmsleight
.. _danielquinn: https://github.com/danielquinn
.. _eonist: https://github.com/eonist
.. _fignew: https://github.com/fignew
.. _holzhannes: https://github.com/holzhannes
.. _jonaswinkler: https://github.com/jonaswinkler
.. _REOLDEV: https://github.com/REOLDEV
.. _Skylinar: https://github.com/Skylinar
.. _ytzelf: https://github.com/ytzelf
.. _Unkn0wnCat: https://github.com/Unkn0wnCat
.. _muued: https://github.com/muued
.. _philpagel: https://github.com/philpagel
.. [1] Scanners with API integration can push scanned documents directly to the :ref:`Paperless API <api-file_uploads>`; this is sometimes referred to as Webhook or Document POST.
Mobile phone software
=====================
You can use your phone to "scan" documents. The regular camera app will work, but may produce images with too low contrast for OCR to work well. Apps made specifically for scanning are recommended.
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| Name | OS | Supports | Recommended By |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| | | FTP | NFS | SMB | Email | WebDAV | |
+=============================+================+=====+=====+=====+=======+========+==================+
| `Office Lens`_ | Android | ? | ? | ? | ? | ? | `jonaswinkler`_ |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| `Genius Scan`_ | Android | yes | no | yes | yes | yes | `hannahswain`_ |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| `OpenScan`_ | Android | no | no | no | no | no | `benjaminfrank`_ |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| `OCR Scanner - QuickScan`_ | iOS | no | no | no | no | yes | `holzhannes`_ |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
On Android, you can use these applications in combination with one of the :ref:`Paperless-ngx compatible apps <usage-mobile_upload>` to "Share" the documents produced by these scanner apps with paperless. On iOS, you can share the scanned documents via iOS-Sharing to other mail, WebDav or FTP apps.
.. _Office Lens: https://play.google.com/store/apps/details?id=com.microsoft.office.officelens
.. _Genius Scan: https://play.google.com/store/apps/details?id=com.thegrizzlylabs.geniusscan.free
.. _OCR Scanner - QuickScan: https://apps.apple.com/us/app/quickscan-scanner-text-ocr/id1513790291
.. _OpenScan: https://github.com/Ethereal-Developers-Inc/OpenScan
.. _hannahswain: https://github.com/hannahswain
.. _benjaminfrank: https://github.com/benjaminfrank
API Scanning Setup
==================
This section contains information on how to set up scanners to post directly to the :ref:`Paperless API <api-file_uploads>`.
Doxie Q2
--------
This part assumes your Doxie is connected to WiFi and you know its IP.
1. Open your Doxie web UI by navigating to its IP address
2. Navigate to Options -> Webhook
3. Set the *URL* to ``https://[your-paperless-ngx-instance]/api/documents/post_document/``
4. Set the *File Parameter Name* to ``document``
5. Add the username and password to the respective fields (Consider creating a user just for your Doxie)
6. Click *Submit* at the bottom of the page
Congrats, you can now scan directly from your Doxie to your Paperless-ngx instance!
View File
@@ -4,9 +4,41 @@
Screenshots
***********
This is what paperless-ngx looks like. You shouldn't use paperless to index
research papers though; it's a horrible tool for that job.
The dashboard shows customizable views of your documents and allows document uploads:
.. image:: _static/screenshots/dashboard.png
The document list provides three different styles to scroll through your documents:
.. image:: _static/screenshots/documents-table.png
.. image:: _static/screenshots/documents-smallcards.png
.. image:: _static/screenshots/documents-largecards.png
Extensive filtering mechanisms:
.. image:: _static/screenshots/documents-filter.png
Side-by-side editing of documents. Optimized for 1080p.
.. image:: _static/screenshots/editing.png
Tag editing. This looks about the same for correspondents and document types.
.. image:: _static/screenshots/new-tag.png
Searching provides auto complete and highlights the results.
.. image:: _static/screenshots/search-preview.png
.. image:: _static/screenshots/search-results.png
Fancy mail filters!
.. image:: _static/screenshots/mail-rules-edited.png
Mobile support in the future? This kinda works; however, some layouts are still
too wide.
.. image:: _static/screenshots/mobile.png
View File
@@ -1,12 +1,787 @@
.. _setup:
*****
Setup
*****
Overview of Paperless-ngx
#########################
Compared to paperless, paperless-ngx works a little differently under the hood and has
more moving parts that work together. While this increases the complexity of
the system, it also brings many benefits.
Paperless consists of the following components:
* **The webserver:** This is pretty much the same as in paperless. It serves
the administration pages, the API, and the new frontend. This is the main
tool you'll be using to interact with paperless. You may start the webserver
with
.. code:: shell-session
$ cd /path/to/paperless/src/
$ gunicorn -c ../gunicorn.conf.py paperless.wsgi
or by any other means such as Apache ``mod_wsgi``.
* **The consumer:** This is what watches your consumption folder for documents.
However, the consumer itself does not really consume your documents.
Instead, it notifies a task processor that a new file is ready for consumption.
I suppose it should be named differently.
This was also used to check your emails, but that's now done elsewhere as well.
Start the consumer with the management command ``document_consumer``:
.. code:: shell-session
$ cd /path/to/paperless/src/
$ python3 manage.py document_consumer
.. _setup-task_processor:
* **The task processor:** Paperless relies on `Django Q <https://django-q.readthedocs.io/en/latest/>`_
for doing most of the heavy lifting. This is a task queue that accepts tasks from
multiple sources and processes these in parallel. It also comes with a scheduler that executes
certain commands periodically.
This task processor is responsible for:
* Consuming documents. When the consumer finds new documents, it notifies the task processor to
start a consumption task.
* The task processor also performs the consumption of any documents you upload through
the web interface.
* Consuming emails. It periodically checks your configured accounts for new emails and
notifies the task processor to consume the attachment of an email.
* Maintaining the search index and the automatic matching algorithm. These are things that paperless
needs to do from time to time in order to operate properly.
This allows paperless to process multiple documents from your consumption folder in parallel! On
a modern multi core system, this makes the consumption process with full OCR blazingly fast.
The task processor comes with a built-in admin interface that you can use to check whether any of the
tasks failed and to inspect the errors (i.e., wrong email credentials, errors during consuming a specific
file, etc).
You may start the task processor by executing:
.. code:: shell-session
$ cd /path/to/paperless/src/
$ python3 manage.py qcluster
* A `redis <https://redis.io/>`_ message broker: This is a really lightweight service that is responsible
for getting the tasks from the webserver and the consumer to the task scheduler. These run in a different
process (maybe even on different machines!), and therefore, this is necessary.
* Optional: A database server. Paperless supports both PostgreSQL and SQLite for storing its data.
Installation
############
You can go multiple routes to set up and run Paperless:
* :ref:`Use the easy install docker script <setup-docker_script>`
* :ref:`Pull the image from Docker Hub <setup-docker_hub>`
* :ref:`Build the Docker image yourself <setup-docker_build>`
* :ref:`Install Paperless directly on your system manually (bare metal) <setup-bare_metal>`
The Docker routes are quick & easy and are the recommended routes. They configure all of the
components above automatically so that everything just works, with sensible defaults for all configuration options.
You can find a cheat sheet for Docker beginners here: `CLI Basics <https://www.sehn.tech/refs/devops-with-docker/>`_
The bare metal route is more complicated to set up, but makes it easier
should you want to contribute some code back. You need to configure and
run the above-mentioned components yourself.
.. _CLI Basics: https://www.sehn.tech/refs/devops-with-docker/
.. _setup-docker_script:
Install Paperless from Docker Hub using the installation script
===============================================================
Paperless provides an interactive installation script. This script will ask you
for a couple of configuration options, download and create the necessary configuration files, pull the docker image, start paperless and create your user account. It essentially
performs all the steps described in :ref:`setup-docker_hub` automatically.
1. Make sure that docker and docker-compose are installed.
2. Download and run the installation script:
.. code:: shell-session
$ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
.. _setup-docker_hub:
Install Paperless from Docker Hub
=================================
1. Log in with your user and create a folder in your home directory, e.g. ``mkdir -v ~/paperless-ngx``, to have a place for your configuration files and consumption directory.
2. Go to the `/docker/compose directory on the project page <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_
and download one of the `docker-compose.*.yml` files, depending on which database backend you
want to use. Rename this file to `docker-compose.yml`.
If you want to enable optional support for Office documents, download a file with `-tika` in the file name.
Download the ``docker-compose.env`` file and the ``.env`` file as well and store them
in the same directory.
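For example, fetching the PostgreSQL variant from the command line might look
like this (a sketch; pick a different file name if you chose another backend
or a ``-tika`` variant):

.. code:: shell-session

   $ cd ~/paperless-ngx
   $ wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.postgres.yml -O docker-compose.yml
   $ wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.env
   $ wget https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/.env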
.. hint::
For new installations, it is recommended to use PostgreSQL as the database
backend.
3. Install `Docker`_ and `docker-compose`_.
.. caution::
If you want to use the included ``docker-compose.*.yml`` file, you
need to have at least Docker version **17.09.0** and docker-compose
version **1.17.0**.
To check, run ``docker-compose -v`` and ``docker -v``.
See the `Docker installation guide`_ on how to install the current
version of Docker for your operating system or Linux distribution of
choice. To get the latest version of docker-compose, follow the
`docker-compose installation guide`_ if your package repository doesn't
include it.
.. _Docker installation guide: https://docs.docker.com/engine/installation/
.. _docker-compose installation guide: https://docs.docker.com/compose/install/
4. Modify ``docker-compose.yml`` to your preferences. You may want to change the path
to the consumption directory. Find the line that specifies where
to mount the consumption directory:
.. code::
- ./consume:/usr/src/paperless/consume
Replace the part BEFORE the colon with a local directory of your choice:
.. code::
- /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume
Don't change the part after the colon or paperless won't find your documents.
You may also need to change the default port that the webserver will use
from the default (8000):
.. code::
ports:
- 8000:8000
Replace the part BEFORE the colon with a port of your choice:
.. code::
ports:
- 8010:8000
Don't change the part after the colon or edit other lines that refer to
port 8000. Modifying the part before the colon will map requests on another
port to the webserver running on the default port.
5. Modify ``docker-compose.env``, following the comments in the file. The
most important change is to set ``USERMAP_UID`` and ``USERMAP_GID``
to the uid and gid of your user on the host system. Use ``id -u`` and
``id -g`` to get these.
This ensures that
both the docker container and you on the host machine have write access
to the consumption directory. If your UID and GID on the host system are
1000 (the default for the first normal user on most systems), it will
work out of the box without any modifications. Run ``id "username"`` to check.
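For example, on a typical single-user system (the values shown are illustrative):

.. code:: shell-session

   $ id -u
   1000
   $ id -g
   1000

With this output, ``USERMAP_UID=1000`` and ``USERMAP_GID=1000`` would be the
correct settings (and, being the defaults, could even be omitted).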
.. note::
You can copy any setting from the file ``paperless.conf.example`` and paste it here.
Have a look at :ref:`configuration` to see what's available.
.. caution::
Some file systems such as NFS network shares don't support file system
notifications with ``inotify``. When storing the consumption directory
on such a file system, paperless will not pick up new files
with the default configuration. You will need to use ``PAPERLESS_CONSUMER_POLLING``,
which will disable inotify. See :ref:`here <configuration-polling>`.
6. Run ``docker-compose pull``, followed by ``docker-compose up -d``.
This will pull the image, create and start the necessary containers.
7. To be able to log in, you will need a super user. To create it, execute the
following command:
.. code-block:: shell-session
$ docker-compose run --rm webserver createsuperuser
This will prompt you to set a username, an optional e-mail address and
finally a password (at least 8 characters).
8. The default ``docker-compose.yml`` exports the webserver on your local port
8000. If you did not change this, you should now be able to visit your
Paperless instance at ``http://127.0.0.1:8000`` or at port 8000 on your server's IP address.
Use the login credentials you have created with the previous step.
.. _Docker: https://www.docker.com/
.. _docker-compose: https://docs.docker.com/compose/install/
.. _setup-docker_build:
Build the Docker image yourself
===============================
1. Clone the entire repository of paperless:
.. code:: shell-session
git clone https://github.com/paperless-ngx/paperless-ngx
The master branch always reflects the latest stable version.
2. Copy one of the ``docker/compose/docker-compose.*.yml`` to ``docker-compose.yml`` in the root folder,
depending on which database backend you want to use. Copy
``docker-compose.env`` into the project root as well.
3. In the ``docker-compose.yml`` file, find the line that instructs docker-compose to pull the paperless image from Docker Hub:
.. code:: yaml
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
and replace it with a line that instructs docker-compose to build the image from the current working directory instead:
.. code:: yaml
webserver:
build: .
4. Follow steps 3 to 8 of :ref:`setup-docker_hub`. When asked to run
``docker-compose pull`` to pull the image, do
.. code:: shell-session
$ docker-compose build
instead to build the image.
.. _setup-bare_metal:
Bare Metal Route
================
Paperless runs on Linux only. The following procedure has been tested on a minimal
installation of Debian/Buster, which is the current stable release at the time of
writing. Windows is not and will never be supported.
1. Install dependencies. Paperless requires the following packages.
* ``python3`` 3.8, 3.9
* ``python3-pip``
* ``python3-dev``
* ``fonts-liberation`` for generating thumbnails for plain text files
* ``imagemagick`` >= 6 for PDF conversion
* ``optipng`` for optimizing thumbnails
* ``gnupg`` for handling encrypted documents
* ``libpq-dev`` for PostgreSQL
* ``libmagic-dev`` for mime type detection
* ``mime-support`` for mime type detection
Use this list for your preferred package management:
.. code::
python3 python3-pip python3-dev imagemagick fonts-liberation optipng gnupg libpq-dev libmagic-dev mime-support
These dependencies are required for OCRmyPDF, which is used for text recognition.
* ``unpaper``
* ``ghostscript``
* ``icc-profiles-free``
* ``qpdf``
* ``liblept5``
* ``libxml2``
* ``pngquant``
* ``zlib1g``
* ``tesseract-ocr`` >= 4.0.0 for OCR
* ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc)
Use this list for your preferred package management:
.. code::
unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr
On Raspberry Pi, these libraries are required as well:
* ``libatlas-base-dev``
* ``libxslt1-dev``
You will also need ``build-essential``, ``python3-setuptools`` and ``python3-wheel``
for installing some of the python dependencies.
2. Install ``redis`` >= 5.0 and configure it to start automatically.
3. Optional. Install ``postgresql`` and configure a database, user and password for paperless. If you do not wish
to use PostgreSQL, SQLite is available as well.
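On a Debian-based system, steps 2 and 3 might look like the following sketch;
the database and user names are examples, not requirements:

.. code:: shell-session

   $ sudo apt-get install -y redis-server postgresql
   $ sudo systemctl enable --now redis-server postgresql
   $ sudo -u postgres createuser --pwprompt paperless
   $ sudo -u postgres createdb --owner paperless paperless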
4. Get the release archive from `<https://github.com/paperless-ngx/paperless-ngx/releases>`_.
If you clone the git repo as-is, you also have to compile the front end yourself.
Extract the archive to a place from where you wish to execute it, such as ``/opt/paperless``.
5. Configure paperless. See :ref:`configuration` for details. Edit the included ``paperless.conf`` and adjust the
settings to your needs. Required settings for getting paperless running are:
* ``PAPERLESS_REDIS`` should point to your redis server, such as ``redis://localhost:6379``.
* ``PAPERLESS_DBHOST`` should be the hostname on which your PostgreSQL server is running. Leave this unset
  to use SQLite instead. Also configure port, database name, user and password as necessary.
* ``PAPERLESS_CONSUMPTION_DIR`` should point to a folder which paperless should watch for documents. You might
want to have this somewhere else. Likewise, ``PAPERLESS_DATA_DIR`` and ``PAPERLESS_MEDIA_ROOT`` define where
paperless stores its data. If you like, you can point both to the same directory.
* ``PAPERLESS_SECRET_KEY`` should be a random sequence of characters. It's used for authentication. Failing
  to set a unique, random value allows third parties to forge authentication credentials.
Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended
for everyone:
* Set ``PAPERLESS_OCR_LANGUAGE`` to the language most of your documents are written in.
* Set ``PAPERLESS_TIME_ZONE`` to your local time zone.
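Putting it together, a minimal ``paperless.conf`` for a PostgreSQL setup might
contain lines like these (all values are examples; omit ``PAPERLESS_DBHOST``
to use SQLite):

.. code::

   PAPERLESS_REDIS=redis://localhost:6379
   PAPERLESS_DBHOST=localhost
   PAPERLESS_CONSUMPTION_DIR=/opt/paperless/consume
   PAPERLESS_DATA_DIR=/opt/paperless/data
   PAPERLESS_MEDIA_ROOT=/opt/paperless/media
   PAPERLESS_SECRET_KEY=change-me-to-a-long-random-string
   PAPERLESS_OCR_LANGUAGE=eng
   PAPERLESS_TIME_ZONE=Europe/Berlin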
6. Create a system user under which you wish to run paperless.
.. code:: shell-session
adduser paperless --system --home /opt/paperless --group
7. Ensure that these directories exist
and that the paperless user has write permissions to the following directories:
* ``/opt/paperless/media``
* ``/opt/paperless/data``
* ``/opt/paperless/consume``
Adjust as necessary if you configured different folders.
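For example, with the folders from the previous steps:

.. code:: shell-session

   $ sudo mkdir -p /opt/paperless/media /opt/paperless/data /opt/paperless/consume
   $ sudo chown -R paperless:paperless /opt/paperless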
8. Install python requirements from the ``requirements.txt`` file.
It is up to you whether you wish to use a virtual environment or not. First, update pip so that it installs the current versions of all packages.
.. code:: shell-session
sudo -Hu paperless pip3 install --upgrade pip
.. code:: shell-session
sudo -Hu paperless pip3 install -r requirements.txt
This will install all python dependencies in the home directory of
the new paperless user.
9. Go to ``/opt/paperless/src``, and execute the following commands:
.. code:: bash
# This creates the database schema.
sudo -Hu paperless python3 manage.py migrate
# This creates your first paperless user
sudo -Hu paperless python3 manage.py createsuperuser
10. Optional: Test that paperless is working by executing
.. code:: bash
# This collects static files from paperless and django.
sudo -Hu paperless python3 manage.py runserver
and pointing your browser to http://localhost:8000/.
.. warning::
This is a development server which should not be used in
production. It is not audited for security, and its performance
is inferior to production-ready web servers.
.. hint::
This will not start the consumer. Paperless does this in a
separate process.
11. Setup systemd services to run paperless automatically. You may
use the service definition files included in the ``scripts`` folder
as a starting point.
Paperless needs the ``webserver`` script to run the webserver, the
``consumer`` script to watch the input folder, and the ``scheduler``
script to run tasks such as email checking and document consumption.
The ``socket`` script enables ``gunicorn`` to run on port 80 without
root privileges. For this, you need to uncomment the ``Require=paperless-webserver.socket``
line in the ``webserver`` script and configure ``gunicorn`` to listen on port 80 (see ``paperless/gunicorn.conf.py``).
You may need to adjust the path to the ``gunicorn`` executable. This
will be installed as part of the python dependencies, and is either located
in the ``bin`` folder of your virtual environment, or in ``~/.local/bin/`` if
no virtual environment is used.
These services rely on redis and optionally the database server, but
don't need to be started in any particular order. The example files
depend on redis being started. If you use a database server, you should
add additional dependencies.
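As an illustration, a minimal consumer unit might look like the sketch below.
The service files shipped in the ``scripts`` folder are the authoritative
reference; the paths here assume the layout from the previous steps:

.. code::

   [Unit]
   Description=Paperless consumer
   Requires=redis.service

   [Service]
   User=paperless
   Group=paperless
   WorkingDirectory=/opt/paperless/src
   ExecStart=/usr/bin/python3 manage.py document_consumer

   [Install]
   WantedBy=multi-user.target

Enable and start it with ``sudo systemctl enable --now paperless-consumer``,
assuming you saved it as ``/etc/systemd/system/paperless-consumer.service``.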
.. caution::
The included scripts run a ``gunicorn`` standalone server,
which is fine for running paperless. It does support SSL,
however, the Gunicorn documentation states that you should
use a proxy server in front of gunicorn instead.
For instructions on how to use nginx for that,
:ref:`see the instructions below <setup-nginx>`.
12. Optional: Install a samba server and make the consumption folder
available as a network share.
13. Configure ImageMagick to allow processing of PDF documents. Most distributions have
this disabled by default, since PDF documents can contain malware. If
you don't do this, paperless will fall back to ghostscript for certain steps
such as thumbnail generation.
Edit ``/etc/ImageMagick-6/policy.xml`` and adjust
.. code::
<policy domain="coder" rights="none" pattern="PDF" />
to
.. code::
<policy domain="coder" rights="read|write" pattern="PDF" />
14. Optional: Install the `jbig2enc <https://ocrmypdf.readthedocs.io/en/latest/jbig2.html>`_
encoder. This will reduce the size of generated PDF documents. You'll most likely need
to compile this yourself, because this software was patented until around 2017 and
binary packages are not available for most distributions.
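A build from source typically looks roughly like this (a sketch; you will also
need ``automake``, ``libtool`` and the leptonica development headers installed):

.. code:: shell-session

   $ git clone https://github.com/agl/jbig2enc
   $ cd jbig2enc
   $ ./autogen.sh
   $ ./configure && make
   $ sudo make install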
Migrating to Paperless-ngx
##########################
Migration is possible both from Paperless-ng or directly from the 'original' Paperless.
Migrating from Paperless-ng
===========================
Paperless-ngx is meant to be a drop-in replacement for Paperless-ng and thus upgrading should be
trivial for most users, especially when using docker. However, as with any major change, it is
recommended to take a full backup first. Once you are ready, simply change the docker image to
point to the new source. E.g. if using Docker Compose, edit ``docker-compose.yml`` and change:
.. code::
image: jonaswinkler/paperless-ng:latest
to
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:latest
and then run ``docker-compose up -d``, which will pull the new image and recreate the container.
That's it!
Users who installed with the bare-metal route should also update their Git clone to point to
``https://github.com/paperless-ngx/paperless-ngx``, e.g. using the command
``git remote set-url origin https://github.com/paperless-ngx/paperless-ngx`` and then pull the
latest version.
Migrating from Paperless
========================
At its core, paperless-ngx is still paperless and fully compatible. However, some
things have changed under the hood, so you need to adapt your setup depending on
how you installed paperless.
This setup describes how to update an existing paperless Docker installation.
The important things to keep in mind are as follows:
* Read the :ref:`changelog <paperless_changelog>` and take note of breaking changes.
* You should decide if you want to stick with SQLite or want to migrate your database
to PostgreSQL. See :ref:`setup-sqlite_to_psql` for details on how to move your data from
SQLite to PostgreSQL. Both work fine with paperless. However, if you already have a
database server running for other services, you might as well use it for paperless as well.
* The task scheduler of paperless, which is used to execute periodic tasks
such as email checking and maintenance, requires a `redis`_ message broker
instance. The docker-compose route takes care of that.
* The layout of the folder structure for your documents and data remains the
same, so you can just plug your old docker volumes into paperless-ngx and
expect it to find everything where it should be.
Migration to paperless-ngx is then performed in a few simple steps:
1. Stop paperless.
.. code:: bash
$ cd /path/to/current/paperless
$ docker-compose down
2. Do a backup for two purposes: First, if something goes wrong, you still have your
   data. Second, if you don't like paperless-ngx, you can switch back to
   paperless.
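One way to take such a backup is the built-in exporter, which writes all
documents together with their metadata into a plain folder (a sketch, assuming
the default ``export`` mount of the compose files):

.. code:: shell-session

   $ docker-compose run --rm webserver document_exporter ../export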
3. Download the latest release of paperless-ngx. You can either go with the
docker-compose files from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`__
or clone the repository to build the image yourself (see :ref:`above <setup-docker_build>`).
You can either replace your current paperless folder or put paperless-ngx
in a different location.
.. caution::
Paperless-ngx includes a ``.env`` file. This will set the
project name for docker compose to ``paperless``, which will also define the name
of the volumes used by paperless-ngx. However, if you find that paperless-ngx
is not using your old paperless volumes, verify the names of your volumes with
.. code:: shell-session
$ docker volume ls | grep _data
and adjust the project name in the ``.env`` file so that it matches the name
of the volumes before the ``_data`` part.
4. Download the ``docker-compose.sqlite.yml`` file to ``docker-compose.yml``.
If you want to switch to PostgreSQL, do that after you migrated your existing
SQLite database.
5. Adjust ``docker-compose.yml`` and ``docker-compose.env`` to your needs.
See :ref:`setup-docker_hub` for details on which edits are advised.
6. :ref:`Update paperless. <administration-updating>`
7. In order to find your existing documents with the new search feature, you need
to invoke a one-time operation that will create the search index:
.. code:: shell-session
$ docker-compose run --rm webserver document_index reindex
This will migrate your database and create the search index. After that,
paperless will take care of maintaining the index by itself.
8. Start paperless-ngx.
.. code:: bash
$ docker-compose up -d
This will run paperless in the background and automatically start it on system boot.
9. Paperless installed a permanent redirect to ``admin/`` in your browser. This
redirect is still in place and prevents access to the new UI. Clear your
browsing cache in order to fix this.
10. Optionally, follow the instructions below to migrate your existing data to PostgreSQL.
.. _setup-sqlite_to_psql:
Moving data from SQLite to PostgreSQL
=====================================
Moving your data from SQLite to PostgreSQL is done by executing a series of django
management commands as shown below.
.. caution::
Make sure that your SQLite database is migrated to the latest version.
Starting paperless will make sure that this is the case. If you try to
load data from an old database schema in SQLite into a newer database
schema in PostgreSQL, you will run into trouble.
.. warning::
On some database fields, PostgreSQL enforces predefined limits on maximum
length, whereas SQLite does not. The fields in question are the title of documents
(128 characters), names of document types, tags and correspondents (128 characters),
and filenames (1024 characters). If you have data in these fields that surpasses these
limits, migration to PostgreSQL is not possible and will fail with an error.
1. Stop paperless, if it is running.
2. Tell paperless to use PostgreSQL:
a) With docker, copy the provided ``docker-compose.postgres.yml`` file to
``docker-compose.yml``. Remember to adjust the consumption directory,
if necessary.
b) Without docker, configure the database in your ``paperless.conf`` file.
See :ref:`configuration` for details.
3. Open a shell and initialize the database:
a) With docker, run the following command to open a shell within the paperless
container:
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose run --rm webserver /bin/bash
This will launch the container and initialize the PostgreSQL database.
b) Without docker, remember to activate any virtual environment, switch to
the ``src`` directory and create the database schema:
.. code:: shell-session
$ cd /path/to/paperless/src
$ python3 manage.py migrate
This will not copy any data yet.
4. Dump your data from SQLite:
.. code:: shell-session
$ python3 manage.py dumpdata --database=sqlite --exclude=contenttypes --exclude=auth.Permission > data.json
5. Load your data into PostgreSQL:
.. code:: shell-session
$ python3 manage.py loaddata data.json
6. If operating inside Docker, you may exit the shell now.
.. code:: shell-session
$ exit
7. Start paperless.
Moving back to Paperless
========================
Let's say you migrated to Paperless-ngx and used it for a while, but decided that
you don't like it and want to move back (if you do, send me a mail about what
part you didn't like!). You can totally do that with a few simple steps.
Paperless-ngx modified the database schema slightly; however, these changes can
be reverted while keeping your current data, so that it remains
compatible with the original Paperless.
Execute this:
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose run --rm webserver migrate documents 0023
Or without docker:
.. code:: shell-session
$ cd /path/to/paperless/src
$ python3 manage.py migrate documents 0023
After that, you need to clear your cookies (Paperless-ngx comes with updated
dependencies that do cookie-processing differently) and probably your cache
as well.
.. _setup-less_powerful_devices:
Considerations for less powerful devices
########################################
Paperless runs on Raspberry Pi. However, some things are rather slow on the Pi and
configuring some options in paperless can help improve performance immensely:
* Stick with SQLite to save some resources.
* Consider setting ``PAPERLESS_OCR_PAGES`` to 1, so that paperless will only OCR
the first page of your documents. In most cases, this page contains enough
information to be able to find it.
* ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER`` are configured
to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that
paperless will use 2 workers and 2 threads per worker. This may result in
sluggish response times during consumption, so you might want to lower these
settings (example: 2 workers and 1 thread to always have some computing power
left for other tasks; see the sketch below this list).
* Keep ``PAPERLESS_OCR_MODE`` at its default value ``skip`` and consider OCR'ing
your documents before feeding them into paperless. Some scanners are able to
do this! You might even want to specify ``skip_noarchive`` to skip archive
file generation for already OCR'ed documents entirely.
* If you want to perform OCR on the device, consider using ``PAPERLESS_OCR_CLEAN=none``.
This will speed up OCR times and use less memory at the expense of slightly worse
OCR results.
* Set ``PAPERLESS_OPTIMIZE_THUMBNAILS`` to 'false' if you want faster consumption
times. Thumbnails will be about 20% larger.
* If using docker, consider setting ``PAPERLESS_WEBSERVER_WORKERS`` to
1. This will save some memory.
* Use the ARM-compatible docker-compose file if you want to use Tika on something like
  a Raspberry Pi. The official apache/tika image does not support the ARM architecture.
For details, refer to :ref:`configuration`.
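For instance, the worker example from the list above would go into
``docker-compose.env`` (or ``paperless.conf`` on bare metal) like this:

.. code::

   PAPERLESS_TASK_WORKERS=2
   PAPERLESS_THREADS_PER_WORKER=1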
.. note::
Updating the :ref:`automatic matching algorithm <advanced-automatic_matching>`
takes quite a bit of time. However, the update mechanism checks if your
data has changed before doing the heavy lifting. If you experience the
algorithm taking too much CPU time, consider changing the schedule in the
admin interface to daily. You can also manually invoke the task
by changing the date and time of the next run to today/now.
The actual matching of the algorithm is fast and works on Raspberry Pi as
well as on any other device.
.. _redis: https://redis.io/
.. _setup-nginx:
Using nginx as a reverse proxy
##############################
If you want to expose paperless to the internet, you should hide it behind a
reverse proxy with SSL enabled.
In addition to the usual configuration for SSL,
the following configuration is required for paperless to operate:
.. code:: nginx
http {
# Adjust as required. This is the maximum size for file uploads.
# The default value 1M might be a little too small.
client_max_body_size 10M;
server {
location / {
# Adjust host and port as required.
proxy_pass http://localhost:8000/;
# These configuration options are required for WebSockets to work.
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Host $server_name;
}
}
}
Also read `this <https://channels.readthedocs.io/en/stable/deploying.html#nginx-supervisor-ubuntu>`__, towards the end of the section.
@@ -1,12 +1,237 @@
.. _troubleshooting:
***************
Troubleshooting
***************
No files are added by the consumer
##################################
Check for the following issues:
* Ensure that the directory you're putting your documents in is the folder
paperless is watching. With docker, this setting is performed in the
``docker-compose.yml`` file. Without docker, look at the ``CONSUMPTION_DIR``
setting. Don't adjust this setting if you're using docker.
* Ensure that redis is up and running. Paperless does its task processing
asynchronously, and for documents to arrive at the task processor, it needs
redis to run.
* Ensure that the task processor is running. Docker does this automatically.
Manually invoke the task processor by executing
.. code:: shell-session
$ python3 manage.py qcluster
* Look at the output of paperless and inspect it for any errors.
* Go to the admin interface, and check if there are failed tasks. If so, the
tasks will contain an error message.
Consumer warns ``OCR for XX failed``
####################################
If you find the OCR accuracy to be too low, and/or the document consumer warns
that ``OCR for XX failed, but we're going to stick with what we've got since
FORGIVING_OCR is enabled``, then you might need to install the
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
matching your documents' languages.
As an example, if you are running Paperless-ngx from any Ubuntu or Debian
box, and your documents are written in Spanish, you may need to run::
apt-get install -y tesseract-ocr-spa
Consumer fails to pick up any new files
#######################################
If you notice that the consumer will only pick up files in the consumption
directory at startup, but won't find any other files added later, you will need to
enable filesystem polling with the configuration option
``PAPERLESS_CONSUMER_POLLING``, see :ref:`here <configuration-polling>`.
This will disable listening to filesystem changes with inotify and paperless will
manually check the consumption directory for changes instead.
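For example, to have paperless check the consumption directory every 10
seconds, set the following in ``docker-compose.env`` or ``paperless.conf``
(the interval is just an example; the value is in seconds):

.. code::

   PAPERLESS_CONSUMER_POLLING=10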
Paperless always redirects to /admin
####################################
You probably had the old paperless installed at some point. Paperless installed
a permanent redirect to /admin in your browser, and you need to clear your
browsing data / cache to fix that.
Operation not permitted
#######################
You might see errors such as:
.. code:: shell-session
chown: changing ownership of '../export': Operation not permitted
The container tries to set file ownership on the listed directories. This is
required so that the user running paperless inside docker has write permissions
to these folders. This happens when pointing these directories to NFS shares,
for example.
Ensure that ``chown`` is possible on these directories.
Classifier error: No training data available
############################################
This indicates that the Auto matching algorithm found no documents to learn from.
This may have two reasons:
* You don't use the Auto matching algorithm: The error can be safely ignored in this case.
* You are using the Auto matching algorithm: The classifier explicitly excludes documents
with Inbox tags. Verify that there are documents in your archive without inbox tags.
The algorithm will only learn from documents not in your inbox.
UserWarning in sklearn on every single document
###############################################
You may encounter warnings like this:
.. code::
/usr/local/lib/python3.7/site-packages/sklearn/base.py:315:
UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0.
This might lead to breaking code or invalid results. Use at your own risk.
This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are
updated. After updating these, your current training data *might* not be compatible anymore. This can be ignored
in most cases. This warning will disappear automatically when paperless updates the training data.
If you want to get rid of the warning or actually experience issues with automatic matching, delete
the file ``classification_model.pickle`` in the data directory and let paperless recreate it.
504 Server Error: Gateway Timeout when adding Office documents
##############################################################
You may experience these errors when using the optional TIKA integration:
.. code::
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert
Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds.
When conversion takes longer, Gotenberg raises this error.
You can increase the timeout by configuring a command flag for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__).
If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file:
.. code:: yaml
gotenberg:
image: gotenberg/gotenberg:7
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
- "--api-timeout=60"
Permission denied errors in the consumption directory
#####################################################
You might encounter errors such as:
.. code:: shell-session
The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf'
This happens when paperless does not have permission to delete files inside the consumption directory.
Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group id you use on the host operating system, if these are
different from ``1000``. See :ref:`setup-docker_hub`.
Also ensure that you are able to read and write to the consumption directory on the host.
OSError: [Errno 19] No such device when consuming files
#######################################################
If you experience errors such as:
.. code:: shell-session
File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file
return CompoundStorage(dbfile, use_mmap=storage.supports_mmap)
File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__
self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
OSError: [Errno 19] No such device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
res = f(*task["args"], **task["kwargs"])
File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file
override_tag_ids=override_tag_ids)
File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file
raise ConsumerError(e)
Paperless uses a search index to provide better and faster full text searching. This search index is stored inside
the ``data`` folder. The search index uses memory-mapped files (mmap). The above error indicates that paperless
was unable to create and open these files.
This happens when you're trying to store the data directory on certain file systems (mostly network shares)
that don't support memory-mapped files.
Web-UI stuck at "Loading..."
############################
This might have multiple reasons.
1. If you built the docker image yourself or deployed using the bare metal route,
make sure that there are files in ``<paperless-root>/static/frontend/<lang-code>/``.
If there are no files, make sure that you executed ``collectstatic`` successfully, either
manually or as part of the docker image build.
If the front end is still missing, make sure that the front end is compiled (files present in
``src/documents/static/frontend``). If it is not, you need to compile the front end yourself
or download the release archive instead of cloning the repository.
2. Check the output of the web server. You might see errors like this:
.. code::
[2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request
util.reraise(*sys.exc_info())
File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
resp.write_file(respiter)
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file
if not self.sendfile(respiter):
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile
sent += os.sendfile(sockno, fileno, offset + sent, count)
OSError: [Errno 22] Invalid argument
To fix this issue, add
.. code::
SENDFILE=0
to your `docker-compose.env` file.
Error while reading metadata
############################
You might find messages like these in your log files:
.. code::
[WARNING] [paperless.parsing.tesseract] Error while reading metadata
This indicates that paperless failed to read PDF metadata from one of your documents. This happens when you
open the affected documents in paperless for editing. Paperless will continue to work, and will simply not
show the invalid metadata.
@@ -1,12 +1,406 @@
.. _usage_overview:
**************
Usage Overview
**************
Paperless is an application that manages your personal documents. With
the help of a document scanner (see :ref:`scanners`), paperless transforms
your unwieldy physical document binders into a searchable archive and
provides many utilities for finding and managing your documents.
Terms and definitions
#####################
Paperless essentially consists of two different parts for managing your
documents:
* The *consumer* watches a specified folder and adds all documents in that
folder to paperless.
* The *web server* provides a UI that you use to manage and search for your
scanned documents.
Each document has a couple of fields that you can assign to it:
* A *Document* is a piece of paper that sometimes contains valuable
information.
* The *correspondent* of a document is the person, institution or company that
a document either originates from, or is sent to.
* A *tag* is a label that you can assign to documents. Think of labels as more
powerful folders: Multiple documents can be grouped together with a single
tag, however, a single document can also have multiple tags. This is not
possible with folders. The reason folders are not implemented in paperless
is simply that tags are much more versatile than folders.
* A *document type* is used to demarcate the type of a document such as letter,
bank statement, invoice, contract, etc. It is used to identify what a document
is about.
* The *date added* of a document is the date the document was scanned into
paperless. You cannot and should not change this date.
* The *date created* of a document is the date the document was initially issued.
This can be the date you bought a product, the date you signed a contract, or
the date a letter was sent to you.
* The *archive serial number* (short: ASN) of a document is the identifier of
the document in your physical document binders. See
:ref:`usage-recommended_workflow` below.
* The *content* of a document is the text that was OCR'ed from the document.
This text is fed into the search engine and is used for matching tags,
correspondents and document types.
Frontend overview
#################
.. warning::
TBD. Add some fancy screenshots!
Adding documents to paperless
#############################
Once you've got Paperless set up, you need to start feeding documents into it.
When you add documents to paperless, it will perform the following operations on
them:
1. OCR the document, if it has no text. Digital documents usually have text,
and this step will be skipped for those documents.
2. Paperless will create an archivable PDF/A document from your document.
If this document is coming from your scanner, it will have embedded selectable text.
3. Paperless performs automatic matching of tags, correspondents and types on the
document before storing it in the database.
.. hint::
This process can be configured to fit your needs. If you don't want paperless
to create archived versions for digital documents, you can configure that by
configuring ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the
:ref:`relevant section in the documentation <configuration-ocr>`.
.. note::
No matter which options you choose, Paperless will always store the original
document that it found in the consumption directory or in the mail and
will never overwrite that document. Archived versions are stored alongside the
original versions.
The consumption directory
=========================
The primary method of getting documents into your database is by putting them in
the consumption directory. The consumer runs in an infinite loop, looking for new
additions to this directory. When it finds them, the consumer goes about the process
of parsing them with the OCR, indexing what it finds, and storing it in the media directory.
Getting stuff into this directory is up to you. If you're running Paperless
on your local computer, you might just want to drag and drop files there, but if
you're running this on a server and want your scanner to automatically push
files to this directory, you'll need to set up some sort of service to accept the
files from the scanner. Typically, you're looking at an FTP server like
`Proftpd`_ or a Windows folder share with `Samba`_.
.. _Proftpd: http://www.proftpd.org/
.. _Samba: http://www.samba.org/
.. TODO: hyperref to configuration of the location of this magic folder.
Dashboard upload
================
The dashboard has a file drop field to upload documents to paperless. Simply drag a file
onto this field or select a file with the file dialog. Multiple files are supported.
.. _usage-mobile_upload:
Mobile upload
=============
The mobile app over at `<https://github.com/qcasey/paperless_share>`_ allows Android users
to share any documents with paperless. This can be combined with any of the mobile
scanning apps out there, such as Office Lens.
Furthermore, there is the `Paperless App <https://github.com/bauerj/paperless_app>`_ as well,
which not only has document upload, but also document browsing and download features.
.. _usage-email:
IMAP (Email)
============
You can tell paperless-ngx to consume documents from your email accounts.
This is a very flexible and powerful feature if you regularly receive documents
via mail that you need to archive. The mail consumer can be configured by using the
admin interface in the following manner:
1. Define e-mail accounts.
2. Define mail rules for your account.
These rules perform the following:
1. Connect to the mail server.
2. Fetch all matching mails (as defined by folder, maximum age and the filters)
3. Check if there are any consumable attachments.
4. If so, instruct paperless to consume the attachments and optionally
use the metadata provided in the rule for the new document.
5. If documents were consumed from a mail, the rule action is performed
on that mail.
Paperless will completely ignore mails that do not match your filters. It will also
only perform the action on mails that it has consumed documents from.
The actions all ensure that the same mail is not consumed twice by different means.
These are as follows:
* **Delete:** Immediately deletes mail that paperless has consumed documents from.
Use with caution.
* **Mark as read:** Mark consumed mail as read. Paperless will not consume documents
from already read mails. If you read a mail before paperless sees it, it will be
ignored.
* **Flag:** Sets the 'important' flag on mails with consumed documents. Paperless
will not consume flagged mails.
* **Move to folder:** Moves consumed mails out of the way so that paperless won't
consume them again.
.. caution::
The mail consumer will perform these actions on all mails it has consumed
documents from. Keep in mind that the actual consumption process may fail
for some reason, leaving you with missing documents in paperless.
.. note::
With the correct set of rules, you can completely automate your email documents.
Create rules for every correspondent you receive digital documents from and
paperless will read them automatically. The default action "mark as read" is
pretty tame and will not cause any damage or data loss whatsoever.
You can also set up a special folder in your mail account for paperless and use
your favorite mail client to move mails to be consumed into that folder,
automatically or manually, and tell paperless to move them to yet another folder
after consumption. It's up to you.
.. note::
Paperless will process the rules in the order defined in the admin page.
You can define catch-all rules and have them executed last to consume
any documents not matched by previous rules. Such a rule may assign an "Unknown
mail document" tag to consumed documents so you can inspect them further.
Paperless is set up to check your mails every 10 minutes. This can be configured on the
'Scheduled tasks' page in the admin.
REST API
========
You can also submit a document using the REST API, see :ref:`api-file_uploads` for details.
.. _basic-searching:
Best practices
##############
Paperless offers a couple tools that help you organize your document collection. However,
it is up to you to use them in a way that helps you organize documents and find specific
documents when you need them. This section offers a couple ideas for managing your collection.
Document types allow you to classify documents according to what they are. You can define
types such as "Receipt", "Invoice", or "Contract". If you used to collect all your receipts
in a single binder, you can recreate that system in paperless by defining a document type,
assigning documents to that type and then filtering by that type to only see all receipts.
Not all documents need document types. Sometimes it's hard to determine what the type of a
document is, or it is hard to justify creating a document type that you only need once or twice.
This is okay. As long as the types you define help you organize your collection in the way
you want, paperless is doing its job.
Tags can be used in many different ways. Think of tags as more versatile folders or binders.
If you have a binder for documents related to university / your car or health care, you can
create these binders in paperless by creating tags and assigning them to relevant documents.
Just as with documents, you can filter the document list by tags and only see documents of
a certain topic.
With physical documents, you'll often need to decide which folder the document belongs to.
The advantage of tags over folders and binders is that a single document can have multiple
tags. A physical document cannot magically appear in two different folders, but with tags,
this is entirely possible.
.. hint::
This can be used in many different ways. One example: Imagine you're working on a particular
task, such as signing up for university. Usually you'll need to collect a bunch of different
documents that are already sorted into various folders. With the tag system of paperless,
you can create a new group of documents that are relevant to this task without destroying
the already existing organization. When you're done with the task, you could delete the
tag again, which would be equal to sorting documents back into the folders they belong to.
Or keep the tag; it's up to you.
All of the logic above applies to correspondents as well. Attach them to documents if you
feel that they help you organize your collection.
When you've started organizing your documents, create a couple saved views for document collections
you regularly access. This is equal to having labeled physical binders on your desk, except
that these saved views are dynamic and simply update themselves as you add documents to the system.
Here are a couple examples of tags and types that you could use in your collection.
* An ``inbox`` tag for newly added documents that you haven't manually edited yet.
* A tag ``car`` for everything car related (repairs, registration, insurance, etc)
* A tag ``todo`` for documents that you still need to do something with, such as reply, or
perform some task online.
* A tag ``bank account x`` for all bank statement related to that account.
* A tag ``mail`` for anything that you added to paperless via its mail processing capabilities.
* A tag ``missing_metadata`` when you still need to add some metadata to a document, but can't
or don't want to do this right now.
.. _basic-usage_searching:
Searching
#########
Paperless offers an extensive searching mechanism that is designed to allow you to quickly
find a document you're looking for (for example, that thing that just broke and you bought
a couple months ago, that contract you signed 8 years ago).
When you search paperless for a document, it tries to match this query against your documents.
Paperless will look for matching documents by inspecting their content, title, correspondent,
type and tags. Paperless returns a scored list of results, so that documents matching your query
better will appear further up in the search results.
By default, paperless returns only documents which contain all words typed in the search bar.
However, paperless also offers advanced search syntax if you want to drill down the results
further.
Matching documents with logical expressions:
.. code::
shopname AND (product1 OR product2)
Matching specific tags, correspondents or types:
.. code::
type:invoice tag:unpaid
correspondent:university certificate
Matching dates:
.. code::
created:[2005 to 2009]
added:yesterday
modified:today
Matching inexact words:
.. code::
produ*name
.. note::
Inexact terms are hard for search indexes. These queries might take a while to execute. That's why paperless offers
auto complete and query correction.
All of these constructs can be combined as you see fit.
Paperless uses Whoosh's default query language. If you want to learn more about it,
head over to `Whoosh query language <https://whoosh.readthedocs.io/en/latest/querylang.html>`_.
For details on what date parsing utilities are available, see
`Date parsing <https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries>`_.
.. _usage-recommended_workflow:
The recommended workflow
########################
Once you have familiarized yourself with paperless and are ready to use it
for all your documents, the recommended workflow for managing your documents
is as follows. This workflow also takes into account that some documents
have to be kept in physical form, but still ensures that you get all the
advantages for these documents as well.
The following diagram shows how easy it is to manage your documents.
.. image:: _static/recommended_workflow.png
Preparations in paperless
=========================
* Create an inbox tag that gets assigned to all new documents.
* Create a TODO tag.
Processing of the physical documents
====================================
Keep a physical inbox. Whenever you receive a document that you need to
archive, put it into your inbox. Regularly, do the following for all documents
in your inbox:
1. For each document, decide if you need to keep the document in physical
form. This applies to certain important documents, such as contracts and
certificates.
2. If you need to keep the document, write a running number on the document
before scanning, starting at one and counting upwards. This is the archive
serial number, or ASN in short.
3. Scan the document.
4. If the document has an ASN assigned, store it in a *single* binder, sorted
by ASN. Don't order this binder in any other way.
5. If the document has no ASN, throw it away. Yay!
Over time, you will notice that your physical binder will fill up. If it is
full, label the binder with the range of ASNs in this binder (i.e., "Documents
1 to 343"), store the binder in your cellar or elsewhere, and start a new
binder.
The idea behind this process is that you will never have to use the physical
binders to find a document. If you need a specific physical document, you
may find this document by:
1. Searching in paperless for the document.
2. Identify the ASN of the document, since it appears on the scan.
3. Grab the relevant document binder and get the document. This is easy since
they are sorted by ASN.
Processing of documents in paperless
====================================
Once you have scanned in a document, proceed in paperless as follows.
1. If the document has an ASN, assign the ASN to the document.
2. Assign a correspondent to the document (e.g., your employer, bank, etc.).
This isn't strictly necessary but helps in finding a document when you need
it.
3. Assign a document type (e.g., invoice, bank statement, etc.) to the document.
This isn't strictly necessary but helps in finding a document when you need
it.
4. Assign a proper title to the document (the name of an item you bought, the
subject of the letter, etc)
5. Check that the date of the document is correct. Paperless tries to read
the date from the content of the document, but this fails sometimes if the
OCR is bad or multiple dates appear on the document.
6. Remove inbox tags from the documents.
.. hint::
You can set up manual matching rules for your correspondents and tags and
paperless will assign them automatically. After consuming a couple documents,
you can even ask paperless to *learn* when to assign tags and correspondents
by itself. For details on this feature, see :ref:`advanced-matching`.
Task management
===============
Some documents require attention and require you to act on them. You
may take two different approaches to handle these documents based on how
regularly you intend to scan documents and use paperless.
* If you scan and process your documents in paperless regularly, assign a
TODO tag to all scanned documents that you need to process. Create a saved
view on the dashboard that shows all documents with this tag.
* If you do not scan documents regularly and use paperless solely for archiving,
create a physical todo box next to your physical inbox and put documents you
need to process in the TODO box. Once you have performed the task associated with
the document, move it to the inbox.
@@ -1,17 +1,9 @@
import os
# See https://docs.gunicorn.org/en/stable/settings.html for
# explanations of settings
bind = f'{os.getenv("PAPERLESS_BIND_ADDR", "[::]")}:{os.getenv("PAPERLESS_PORT", 8000)}'
workers = int(os.getenv("PAPERLESS_WEBSERVER_WORKERS", 1))
bind = f'0.0.0.0:{os.getenv("PAPERLESS_PORT", 8000)}'
workers = int(os.getenv("PAPERLESS_WEBSERVER_WORKERS", 2))
worker_class = "paperless.workers.ConfigurableWorker"
timeout = 120
preload_app = True
# https://docs.gunicorn.org/en/stable/faq.html#blocking-os-fchmod
worker_tmp_dir = "/dev/shm"
def pre_fork(server, worker):
@@ -32,7 +24,7 @@ def worker_int(worker):
## get traceback info
import threading, sys, traceback
id2name = {th.ident: th.name for th in threading.enumerate()}
id2name = dict([(th.ident, th.name) for th in threading.enumerate()])
code = []
for threadId, stack in sys._current_frames().items():
code.append("\n# Thread: %s(%d)" % (id2name.get(threadId, ""), threadId))
@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/bin/bash
ask() {
while true ; do
@@ -47,29 +47,24 @@ if [[ $(id -u) == "0" ]] ; then
exit 1
fi
if ! command -v wget &> /dev/null ; then
if [[ -z $(which wget) ]] ; then
echo "wget executable not found. Is wget installed?"
exit 1
fi
if ! command -v docker &> /dev/null ; then
if [[ -z $(which docker) ]] ; then
echo "docker executable not found. Is docker installed?"
exit 1
fi
DOCKER_COMPOSE_CMD="docker-compose"
if ! command -v ${DOCKER_COMPOSE_CMD} ; then
if docker compose version &> /dev/null ; then
DOCKER_COMPOSE_CMD="docker compose"
else
echo "docker-compose executable not found. Is docker-compose installed?"
exit 1
fi
if [[ -z $(which docker-compose) ]] ; then
echo "docker-compose executable not found. Is docker-compose installed?"
exit 1
fi
# Check if user has permissions to run Docker by trying to get the status of Docker (docker status).
# If this fails, the user probably does not have permissions for Docker.
if ! docker stats --no-stream &> /dev/null ; then
if [ ! "$(docker stats --no-stream 2>/dev/null 1>&2)" ] ; then
echo ""
echo "WARN: It look like the current user does not have Docker permissions."
echo "WARN: Use 'sudo usermod -aG docker $USER' to assign Docker permissions to the user."
@@ -92,14 +87,6 @@ echo ""
echo "1. Application configuration"
echo "============================"
echo ""
echo "The URL paperless will be available at. This is required if the"
echo "installation will be accessible via the web, otherwise can be left blank."
echo ""
ask "URL" ""
URL=$ask_result
echo ""
echo "The port on which the paperless webserver will listen for incoming"
echo "connections."
@@ -118,12 +105,12 @@ ask "Current time zone" "$default_time_zone"
TIME_ZONE=$ask_result
echo ""
echo "Database backend: PostgreSQL, MariaDB, and SQLite are available. Use PostgreSQL"
echo "Database backend: PostgreSQL and SQLite are available. Use PostgreSQL"
echo "if unsure. If you're running on a low-power device such as Raspberry"
echo "Pi, use SQLite to save resources."
echo ""
ask "Database backend" "postgres" "postgres sqlite mariadb"
ask "Database backend" "postgres" "postgres sqlite"
DATABASE_BACKEND=$ask_result
echo ""
@@ -214,9 +201,9 @@ echo ""
ask_docker_folder "Data folder" ""
DATA_FOLDER=$ask_result
if [[ "$DATABASE_BACKEND" == "postgres" || "$DATABASE_BACKEND" == "mariadb" ]] ; then
if [[ "$DATABASE_BACKEND" == "postgres" ]] ; then
echo ""
echo "The database folder, where your database stores its data."
echo "The database folder, where postgres stores its data."
echo "Leave empty to have this managed by docker."
echo ""
echo "CAUTION: If specified, you must specify an absolute path starting with /"
@@ -224,7 +211,7 @@ if [[ "$DATABASE_BACKEND" == "postgres" || "$DATABASE_BACKEND" == "mariadb" ]] ;
echo ""
ask_docker_folder "Database folder" ""
DATABASE_FOLDER=$ask_result
POSTGRES_FOLDER=$ask_result
fi
echo ""
@@ -278,16 +265,14 @@ if [[ -z $DATA_FOLDER ]] ; then
else
echo "Data folder: $DATA_FOLDER"
fi
if [[ "$DATABASE_BACKEND" == "postgres" || "$DATABASE_BACKEND" == "mariadb" ]] ; then
if [[ -z $DATABASE_FOLDER ]] ; then
echo "Database folder: Managed by docker"
if [[ "$DATABASE_BACKEND" == "postgres" ]] ; then
if [[ -z $POSTGRES_FOLDER ]] ; then
echo "Database (postgres) folder: Managed by docker"
else
echo "Database folder: $DATABASE_FOLDER"
echo "Database (postgres) folder: $POSTGRES_FOLDER"
fi
fi
echo ""
echo "URL: $URL"
echo "Port: $PORT"
echo "Database: $DATABASE_BACKEND"
echo "Tika enabled: $TIKA_ENABLED"
@@ -320,15 +305,9 @@ wget "https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/
SECRET_KEY=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 64 | head -n 1)
DEFAULT_LANGUAGES=("deu eng fra ita spa")
_split_langs="${OCR_LANGUAGE//+/ }"
read -r -a OCR_LANGUAGES_ARRAY <<< "${_split_langs}"
DEFAULT_LANGUAGES="deu eng fra ita spa"
{
if [[ ! $URL == "" ]] ; then
echo "PAPERLESS_URL=$URL"
fi
if [[ ! $USERMAP_UID == "1000" ]] ; then
echo "USERMAP_UID=$USERMAP_UID"
fi
@@ -338,8 +317,8 @@ read -r -a OCR_LANGUAGES_ARRAY <<< "${_split_langs}"
echo "PAPERLESS_TIME_ZONE=$TIME_ZONE"
echo "PAPERLESS_OCR_LANGUAGE=$OCR_LANGUAGE"
echo "PAPERLESS_SECRET_KEY=$SECRET_KEY"
if [[ ! ${DEFAULT_LANGUAGES[*]} =~ ${OCR_LANGUAGES_ARRAY[*]} ]] ; then
echo "PAPERLESS_OCR_LANGUAGES=${OCR_LANGUAGES_ARRAY[*]}"
if [[ ! " ${DEFAULT_LANGUAGES[*]} " =~ ${OCR_LANGUAGE} ]] ; then
echo "PAPERLESS_OCR_LANGUAGES=$OCR_LANGUAGE"
fi
} > docker-compose.env
@@ -357,16 +336,9 @@ if [[ -n $DATA_FOLDER ]] ; then
sed -i "/^\s*data:/d" docker-compose.yml
fi
# If the database folder was provided (not blank), replace the pgdata/dbdata volume with a bind mount
# of the provided folder
if [[ -n $DATABASE_FOLDER ]] ; then
if [[ "$DATABASE_BACKEND" == "postgres" ]] ; then
sed -i "s#- pgdata:/var/lib/postgresql/data#- $DATABASE_FOLDER:/var/lib/postgresql/data#g" docker-compose.yml
sed -i "/^\s*pgdata:/d" docker-compose.yml
elif [[ "$DATABASE_BACKEND" == "mariadb" ]]; then
sed -i "s#- dbdata:/var/lib/mysql#- $DATABASE_FOLDER:/var/lib/mysql#g" docker-compose.yml
sed -i "/^\s*dbdata:/d" docker-compose.yml
fi
if [[ -n $POSTGRES_FOLDER ]] ; then
sed -i "s#- pgdata:/var/lib/postgresql/data#- $POSTGRES_FOLDER:/var/lib/postgresql/data#g" docker-compose.yml
sed -i "/^\s*pgdata:/d" docker-compose.yml
fi
# remove trailing blank lines from end of file
@@ -379,8 +351,8 @@ if [ "$l1" -eq "$l2" ] ; then
fi
${DOCKER_COMPOSE_CMD} pull
docker-compose pull
${DOCKER_COMPOSE_CMD} run --rm -e DJANGO_SUPERUSER_PASSWORD="$PASSWORD" webserver createsuperuser --noinput --username "$USERNAME" --email "$EMAIL"
docker-compose run --rm -e DJANGO_SUPERUSER_PASSWORD="$PASSWORD" webserver createsuperuser --noinput --username "$USERNAME" --email "$EMAIL"
${DOCKER_COMPOSE_CMD} up --detach
docker-compose up -d


@@ -23,15 +23,12 @@
#PAPERLESS_MEDIA_ROOT=../media
#PAPERLESS_STATICDIR=../static
#PAPERLESS_FILENAME_FORMAT=
#PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=
# Security and hosting
#PAPERLESS_SECRET_KEY=change-me
#PAPERLESS_URL=https://example.com
#PAPERLESS_CSRF_TRUSTED_ORIGINS=https://example.com # can be set using PAPERLESS_URL
#PAPERLESS_ALLOWED_HOSTS=example.com,www.example.com # can be set using PAPERLESS_URL
#PAPERLESS_CORS_ALLOWED_HOSTS=https://localhost:8080,https://example.com # can be set using PAPERLESS_URL
#PAPERLESS_ALLOWED_HOSTS=example.com,www.example.com
#PAPERLESS_CORS_ALLOWED_HOSTS=http://example.com,http://localhost:8000
#PAPERLESS_FORCE_SCRIPT_NAME=
#PAPERLESS_STATIC_URL=/static/
#PAPERLESS_AUTO_LOGIN_USERNAME=
@@ -61,18 +58,15 @@
#PAPERLESS_CONSUMER_POLLING=10
#PAPERLESS_CONSUMER_DELETE_DUPLICATES=false
#PAPERLESS_CONSUMER_RECURSIVE=false
#PAPERLESS_CONSUMER_IGNORE_PATTERNS=[".DS_STORE/*", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini"]
#PAPERLESS_CONSUMER_IGNORE_PATTERNS=[".DS_STORE/*", "._*", ".stfolder/*"]
#PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=false
#PAPERLESS_CONSUMER_ENABLE_BARCODES=false
#PAPERLESS_CONSUMER_BARCODE_STRING=PATCHT
#PAPERLESS_OPTIMIZE_THUMBNAILS=true
#PAPERLESS_PRE_CONSUME_SCRIPT=/path/to/an/arbitrary/script.sh
#PAPERLESS_POST_CONSUME_SCRIPT=/path/to/an/arbitrary/script.sh
#PAPERLESS_FILENAME_DATE_ORDER=YMD
#PAPERLESS_FILENAME_PARSE_TRANSFORMS=[]
#PAPERLESS_NUMBER_OF_SUGGESTED_DATES=5
#PAPERLESS_THUMBNAIL_FONT_NAME=
#PAPERLESS_IGNORE_DATES=
#PAPERLESS_ENABLE_UPDATE_CHECK=
# Tika settings
@@ -84,3 +78,4 @@
#PAPERLESS_CONVERT_BINARY=/usr/bin/convert
#PAPERLESS_GS_BINARY=/usr/bin/gs
#PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng

requirements.txt Normal file

@@ -0,0 +1,111 @@
#
# These requirements were autogenerated by pipenv
# To regenerate from the project's Pipfile, run:
#
# pipenv lock --requirements
#
-i https://pypi.python.org/simple
--extra-index-url https://www.piwheels.org/simple
aioredis==1.3.1
anyio==3.5.0; python_full_version >= '3.6.2'
arrow==1.2.2; python_version >= '3.6'
asgiref==3.5.0; python_version >= '3.7'
async-timeout==4.0.2; python_version >= '3.6'
attrs==21.4.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
autobahn==22.2.2; python_version >= '3.7'
automat==20.2.0
backports.zoneinfo==0.2.1; python_version < '3.9'
blessed==1.19.1; python_version >= '2.7'
certifi==2021.10.8
cffi==1.15.0
channels-redis==3.4.0
channels==3.0.4
chardet==4.0.0; python_version >= '3.1'
charset-normalizer==2.0.12; python_version >= '3'
click==8.0.4; python_version >= '3.6'
coloredlogs==15.0.1; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
concurrent-log-handler==0.9.20
constantly==15.1.0
cryptography==36.0.2; python_version >= '3.6'
daphne==3.0.2; python_version >= '3.6'
dateparser==1.1.1
django-cors-headers==3.11.0
django-extensions==3.1.5
django-filter==21.1
django-picklefield==3.0.1; python_version >= '3'
django-q==1.3.9
django==4.0.3
djangorestframework==3.13.1
filelock==3.6.0
fuzzywuzzy[speedup]==0.18.0
gunicorn==20.1.0
h11==0.13.0; python_version >= '3.6'
hiredis==2.0.0; python_version >= '3.6'
httptools==0.4.0
humanfriendly==10.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
hyperlink==21.0.0
idna==3.3; python_version >= '3.5'
imap-tools==0.53.0
img2pdf==0.4.3
importlib-resources==5.6.0; python_version < '3.9'
incremental==21.3.0
inotify-simple==1.3.5; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
inotifyrecursive==0.3.5
joblib==1.1.0; python_version >= '3.6'
langdetect==1.0.9
lxml==4.8.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'
msgpack==1.0.3
numpy==1.22.3; python_version >= '3.8'
ocrmypdf==13.4.1
packaging==21.3; python_version >= '3.6'
pathvalidate==2.5.0
pdfminer.six==20211012
pikepdf==5.1.0
pillow==9.0.1
pluggy==1.0.0; python_version >= '3.6'
portalocker==2.4.0; python_version >= '3'
psycopg2==2.9.3
pyasn1-modules==0.2.8
pyasn1==0.4.8
pycparser==2.21
pyopenssl==22.0.0
pyparsing==3.0.7; python_version >= '3.6'
python-dateutil==2.8.2
python-dotenv==0.20.0
python-gnupg==0.4.8
python-levenshtein==0.12.2
python-magic==0.4.25
pytz-deprecation-shim==0.1.0.post0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4, 3.5'
pytz==2022.1
pyyaml==6.0
redis==3.5.3
regex==2022.3.2; python_version >= '3.6'
reportlab==3.6.9; python_version >= '3.7' and python_version < '4'
requests==2.27.1; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4, 3.5'
scikit-learn==1.0.2
scipy==1.8.0; python_version < '3.11' and python_version >= '3.8'
service-identity==21.1.0
setuptools==61.1.1; python_version >= '3.7'
six==1.16.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'
sniffio==1.2.0; python_version >= '3.5'
sqlparse==0.4.2; python_version >= '3.5'
threadpoolctl==3.1.0; python_version >= '3.6'
tika==1.24
tqdm==4.63.1
twisted[tls]==22.2.0; python_full_version >= '3.6.7'
txaio==22.2.1; python_version >= '3.6'
typing-extensions==4.1.1; python_version >= '3.6'
tzdata==2022.1; python_version >= '3.6'
tzlocal==4.1; python_version >= '3.6'
urllib3==1.26.9; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4' and python_version < '4'
uvicorn[standard]==0.17.6
uvloop==0.16.0
watchdog==2.1.7
watchgod==0.8.1
wcwidth==0.2.5
websockets==10.2
whitenoise==6.0.0
whoosh==2.7.4
zipp==3.7.0; python_version < '3.9'
zope.interface==5.4.0; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4'


@@ -1,12 +1,12 @@
[Unit]
Description=Paperless Celery Beat
Description=Paperless scheduler
Requires=redis.service
[Service]
User=paperless
Group=paperless
WorkingDirectory=/opt/paperless/src
ExecStart=celery --app paperless beat --loglevel INFO
ExecStart=python3 manage.py qcluster
[Install]
WantedBy=multi-user.target


@@ -1,12 +0,0 @@
[Unit]
Description=Paperless Celery Workers
Requires=redis.service
[Service]
User=paperless
Group=paperless
WorkingDirectory=/opt/paperless/src
ExecStart=celery --app paperless worker --loglevel INFO
[Install]
WantedBy=multi-user.target


@@ -1,16 +1,21 @@
#!/usr/bin/env bash
DOCUMENT_ID=${1}
DOCUMENT_FILE_NAME=${2}
DOCUMENT_SOURCE_PATH=${3}
DOCUMENT_THUMBNAIL_PATH=${4}
DOCUMENT_DOWNLOAD_URL=${5}
DOCUMENT_THUMBNAIL_URL=${6}
DOCUMENT_CORRESPONDENT=${7}
DOCUMENT_TAGS=${8}
echo "
A document with an id of ${DOCUMENT_ID} was just consumed. I know the
following additional information about it:
* Generated File Name: ${DOCUMENT_FILE_NAME}
* Archive Path: ${DOCUMENT_ARCHIVE_PATH}
* Source Path: ${DOCUMENT_SOURCE_PATH}
* Created: ${DOCUMENT_CREATED}
* Added: ${DOCUMENT_ADDED}
* Modified: ${DOCUMENT_MODIFIED}
* Thumbnail Path: ${DOCUMENT_THUMBNAIL_PATH}
* Download URL: ${DOCUMENT_DOWNLOAD_URL}
* Thumbnail URL: ${DOCUMENT_THUMBNAIL_URL}


@@ -2,5 +2,5 @@
docker run -p 5432:5432 -e POSTGRES_PASSWORD=password -v paperless_pgdata:/var/lib/postgresql/data -d postgres:13
docker run -d -p 6379:6379 redis:latest
docker run -p 3000:3000 -d gotenberg/gotenberg:7.6
docker run -p 9998:9998 -d ghcr.io/paperless-ngx/tika:latest
docker run -p 3000:3000 -d gotenberg/gotenberg:7
docker run -p 9998:9998 -d apache/tika


@@ -1,13 +0,0 @@
import { defineConfig } from 'cypress'
export default defineConfig({
videosFolder: 'cypress/videos',
screenshotsFolder: 'cypress/screenshots',
fixturesFolder: 'cypress/fixtures',
e2e: {
setupNodeEvents(on, config) {
return require('./cypress/plugins/index.ts')(on, config)
},
baseUrl: 'http://localhost:4200',
},
})

src-ui/cypress.json Normal file

@@ -0,0 +1,9 @@
{
"integrationFolder": "cypress/integration",
"supportFile": "cypress/support/index.ts",
"videosFolder": "cypress/videos",
"screenshotsFolder": "cypress/screenshots",
"pluginsFile": "cypress/plugins/index.ts",
"fixturesFolder": "cypress/fixtures",
"baseUrl": "http://localhost:4200"
}


@@ -1,94 +0,0 @@
describe('document-detail', () => {
beforeEach(() => {
// also uses global fixtures from cypress/support/e2e.ts
this.modifiedDocuments = []
cy.fixture('documents/documents.json').then((documentsJson) => {
cy.intercept('GET', 'http://localhost:8000/api/documents/1/', (req) => {
let response = { ...documentsJson }
response = response.results.find((d) => d.id == 1)
req.reply(response)
})
})
cy.intercept('PUT', 'http://localhost:8000/api/documents/1/', (req) => {
this.modifiedDocuments.push(req.body) // store this for later
req.reply({ result: 'OK' })
}).as('saveDoc')
cy.fixture('documents/1/comments.json').then((commentsJson) => {
cy.intercept(
'GET',
'http://localhost:8000/api/documents/1/comments/',
(req) => {
req.reply(commentsJson.filter((c) => c.id != 10)) // 3
}
)
cy.intercept(
'DELETE',
'http://localhost:8000/api/documents/1/comments/?id=9',
(req) => {
req.reply(commentsJson.filter((c) => c.id != 9 && c.id != 10)) // 2
}
)
cy.intercept(
'POST',
'http://localhost:8000/api/documents/1/comments/',
(req) => {
req.reply(commentsJson) // 4
}
)
})
cy.viewport(1024, 1024)
cy.visit('/documents/1/')
})
it('should activate / deactivate save button when changes are saved', () => {
cy.contains('button', 'Save').should('be.disabled')
cy.get('app-input-text[formcontrolname="title"]')
.type(' additional')
.wait(1500) // this delay is for frontend debounce
cy.contains('button', 'Save').should('not.be.disabled')
})
it('should warn on unsaved changes', () => {
cy.get('app-input-text[formcontrolname="title"]')
.type(' additional')
.wait(1500) // this delay is for frontend debounce
cy.get('button[title="Close"]').click()
cy.contains('You have unsaved changes')
cy.contains('button', 'Cancel').click().wait(150)
cy.contains('button', 'Save').click().wait('@saveDoc').wait(2000) // navigates away after saving
cy.contains('You have unsaved changes').should('not.exist')
})
it('should show a list of comments', () => {
cy.wait(1000).get('a').contains('Comments').click().wait(1000)
cy.get('app-document-comments').find('.card').its('length').should('eq', 3)
})
it('should support comment deletion', () => {
cy.wait(1000).get('a').contains('Comments').click().wait(1000)
cy.get('app-document-comments')
.find('.card')
.first()
.find('button')
.click({ force: true })
.wait(500)
cy.get('app-document-comments').find('.card').its('length').should('eq', 2)
})
it('should support comment insertion', () => {
cy.wait(1000).get('a').contains('Comments').click().wait(1000)
cy.get('app-document-comments')
.find('form textarea')
.type('Testing new comment')
.wait(500)
cy.get('app-document-comments').find('form button').click().wait(1500)
cy.get('app-document-comments').find('.card').its('length').should('eq', 4)
})
})


@@ -1,331 +0,0 @@
import { PaperlessDocument } from 'src/app/data/paperless-document'
describe('documents query params', () => {
beforeEach(() => {
// also uses global fixtures from cypress/support/e2e.ts
cy.fixture('documents/documents.json').then((documentsJson) => {
// mock api filtering
cy.intercept('GET', 'http://localhost:8000/api/documents/*', (req) => {
let response = { ...documentsJson }
if (req.query.hasOwnProperty('ordering')) {
const sort_field = req.query['ordering'].toString().replace('-', '')
const reverse = req.query['ordering'].toString().indexOf('-') !== -1
response.results = (
documentsJson.results as Array<PaperlessDocument>
).sort((docA, docB) => {
let result = 0
switch (sort_field) {
case 'created':
case 'added':
result =
new Date(docA[sort_field]) < new Date(docB[sort_field])
? -1
: 1
break
case 'archive_serial_number':
result = docA[sort_field] < docB[sort_field] ? -1 : 1
break
}
if (reverse) result = -result
return result
})
}
if (req.query.hasOwnProperty('tags__id__in')) {
const tag_ids: Array<number> = req.query['tags__id__in']
.toString()
.split(',')
.map((v) => +v)
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter(
(d) =>
d.tags.length > 0 &&
d.tags.filter((t) => tag_ids.includes(t)).length > 0
)
response.count = response.results.length
} else if (req.query.hasOwnProperty('tags__id__none')) {
const tag_ids: Array<number> = req.query['tags__id__none']
.toString()
.split(',')
.map((v) => +v)
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.tags.filter((t) => tag_ids.includes(t)).length == 0)
response.count = response.results.length
} else if (
req.query.hasOwnProperty('is_tagged') &&
req.query['is_tagged'] == '0'
) {
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.tags.length == 0)
response.count = response.results.length
}
if (req.query.hasOwnProperty('document_type__id')) {
const doctype_id = +req.query['document_type__id']
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.document_type == doctype_id)
response.count = response.results.length
} else if (
req.query.hasOwnProperty('document_type__isnull') &&
req.query['document_type__isnull'] == '1'
) {
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.document_type == undefined)
response.count = response.results.length
}
if (req.query.hasOwnProperty('correspondent__id')) {
const correspondent_id = +req.query['correspondent__id']
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.correspondent == correspondent_id)
response.count = response.results.length
} else if (
req.query.hasOwnProperty('correspondent__isnull') &&
req.query['correspondent__isnull'] == '1'
) {
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.correspondent == undefined)
response.count = response.results.length
}
if (req.query.hasOwnProperty('storage_path__id')) {
const storage_path_id = +req.query['storage_path__id']
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.storage_path == storage_path_id)
response.count = response.results.length
} else if (
req.query.hasOwnProperty('storage_path__isnull') &&
req.query['storage_path__isnull'] == '1'
) {
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.storage_path == undefined)
response.count = response.results.length
}
if (req.query.hasOwnProperty('created__date__gt')) {
const date = new Date(req.query['created__date__gt'])
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => new Date(d.created) > date)
response.count = response.results.length
} else if (req.query.hasOwnProperty('created__date__lt')) {
const date = new Date(req.query['created__date__lt'])
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => new Date(d.created) < date)
response.count = response.results.length
}
if (req.query.hasOwnProperty('added__date__gt')) {
const date = new Date(req.query['added__date__gt'])
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => new Date(d.added) > date)
response.count = response.results.length
} else if (req.query.hasOwnProperty('added__date__lt')) {
const date = new Date(req.query['added__date__lt'])
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => new Date(d.added) < date)
response.count = response.results.length
}
if (req.query.hasOwnProperty('title_content')) {
const title_content_regexp = new RegExp(
req.query['title_content'].toString(),
'i'
)
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter(
(d) =>
title_content_regexp.test(d.title) ||
title_content_regexp.test(d.content)
)
response.count = response.results.length
}
if (req.query.hasOwnProperty('archive_serial_number')) {
const asn = +req.query['archive_serial_number']
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) => d.archive_serial_number == asn)
response.count = response.results.length
} else if (req.query.hasOwnProperty('archive_serial_number__isnull')) {
const isnull = req.query['archive_serial_number__isnull'] == '1'
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter((d) =>
isnull
? d.archive_serial_number == undefined
: d.archive_serial_number != undefined
)
response.count = response.results.length
} else if (req.query.hasOwnProperty('archive_serial_number__gt')) {
const asn = +req.query['archive_serial_number__gt']
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter(
(d) => d.archive_serial_number > 0 && d.archive_serial_number > asn
)
response.count = response.results.length
} else if (req.query.hasOwnProperty('archive_serial_number__lt')) {
const asn = +req.query['archive_serial_number__lt']
response.results = (
documentsJson.results as Array<PaperlessDocument>
).filter(
(d) => d.archive_serial_number > 0 && d.archive_serial_number < asn
)
response.count = response.results.length
}
req.reply(response)
})
})
})
it('should show a list of documents sorted by created', () => {
cy.visit('/documents?sort=created')
cy.get('app-document-card-small').first().contains('No latin title')
})
it('should show a list of documents reverse sorted by created', () => {
cy.visit('/documents?sort=created&reverse=true')
cy.get('app-document-card-small').first().contains('sit amet')
})
it('should show a list of documents sorted by added', () => {
cy.visit('/documents?sort=added')
cy.get('app-document-card-small').first().contains('No latin title')
})
it('should show a list of documents reverse sorted by added', () => {
cy.visit('/documents?sort=added&reverse=true')
cy.get('app-document-card-small').first().contains('sit amet')
})
it('should show a list of documents filtered by any tags', () => {
cy.visit('/documents?sort=created&reverse=true&tags__id__in=2,4,5')
cy.contains('3 documents')
})
it('should show a list of documents filtered by excluded tags', () => {
cy.visit('/documents?sort=created&reverse=true&tags__id__none=2,4')
cy.contains('One document')
})
it('should show a list of documents filtered by no tags', () => {
cy.visit('/documents?sort=created&reverse=true&is_tagged=0')
cy.contains('One document')
})
it('should show a list of documents filtered by document type', () => {
cy.visit('/documents?sort=created&reverse=true&document_type__id=1')
cy.contains('3 documents')
})
it('should show a list of documents filtered by no document type', () => {
cy.visit('/documents?sort=created&reverse=true&document_type__isnull=1')
cy.contains('One document')
})
it('should show a list of documents filtered by correspondent', () => {
cy.visit('/documents?sort=created&reverse=true&correspondent__id=9')
cy.contains('2 documents')
})
it('should show a list of documents filtered by no correspondent', () => {
cy.visit('/documents?sort=created&reverse=true&correspondent__isnull=1')
cy.contains('2 documents')
})
it('should show a list of documents filtered by storage path', () => {
cy.visit('/documents?sort=created&reverse=true&storage_path__id=2')
cy.contains('One document')
})
it('should show a list of documents filtered by no storage path', () => {
cy.visit('/documents?sort=created&reverse=true&storage_path__isnull=1')
cy.contains('3 documents')
})
it('should show a list of documents filtered by title or content', () => {
cy.visit('/documents?sort=created&reverse=true&title_content=lorem')
cy.contains('2 documents')
})
it('should show a list of documents filtered by asn', () => {
cy.visit('/documents?sort=created&reverse=true&archive_serial_number=12345')
cy.contains('One document')
})
it('should show a list of documents filtered by empty asn', () => {
cy.visit(
'/documents?sort=created&reverse=true&archive_serial_number__isnull=1'
)
cy.contains('2 documents')
})
it('should show a list of documents filtered by non-empty asn', () => {
cy.visit(
'/documents?sort=created&reverse=true&archive_serial_number__isnull=0'
)
cy.contains('2 documents')
})
it('should show a list of documents filtered by asn greater than', () => {
cy.visit(
'/documents?sort=created&reverse=true&archive_serial_number__gt=12346'
)
cy.contains('One document')
})
it('should show a list of documents filtered by asn less than', () => {
cy.visit(
'/documents?sort=created&reverse=true&archive_serial_number__lt=12346'
)
cy.contains('One document')
})
it('should show a list of documents filtered by created date greater than', () => {
cy.visit(
'/documents?sort=created&reverse=true&created__date__gt=2022-03-23'
)
cy.contains('3 documents')
})
it('should show a list of documents filtered by created date less than', () => {
cy.visit(
'/documents?sort=created&reverse=true&created__date__lt=2022-03-23'
)
cy.contains('One document')
})
it('should show a list of documents filtered by added date greater than', () => {
cy.visit('/documents?sort=created&reverse=true&added__date__gt=2022-03-24')
cy.contains('2 documents')
})
it('should show a list of documents filtered by added date less than', () => {
cy.visit('/documents?sort=created&reverse=true&added__date__lt=2022-03-24')
cy.contains('2 documents')
})
it('should show a list of documents filtered by multiple filters', () => {
cy.visit(
'/documents?sort=created&reverse=true&document_type__id=1&correspondent__id=9&tags__id__in=4,5'
)
cy.contains('2 documents')
})
})


@@ -1,60 +0,0 @@
describe('tasks', () => {
beforeEach(() => {
this.dismissedTasks = new Set<number>()
cy.fixture('tasks/tasks.json').then((tasksViewsJson) => {
// acknowledge tasks POST
cy.intercept(
'POST',
'http://localhost:8000/api/acknowledge_tasks/',
(req) => {
req.body['tasks'].forEach((t) => this.dismissedTasks.add(t)) // store this for later
req.reply({ result: 'OK' })
}
)
cy.intercept('GET', 'http://localhost:8000/api/tasks/', (req) => {
let response = [...tasksViewsJson]
if (this.dismissedTasks.size) {
response = response.filter((t) => {
return !this.dismissedTasks.has(t.id)
})
}
req.reply(response)
}).as('tasks')
})
cy.visit('/tasks')
cy.wait('@tasks')
})
it('should show a list of dismissable tasks in tabs', () => {
cy.get('tbody').find('tr:visible').its('length').should('eq', 10) // double because collapsible result tr
cy.wait(500) // stabilizes the test, for some reason...
cy.get('tbody')
.find('button:visible')
.contains('Dismiss')
.first()
.click()
.wait('@tasks')
.wait(2000)
.then(() => {
cy.get('tbody').find('tr:visible').its('length').should('eq', 8) // double because collapsible result tr
})
})
it('should allow toggling all tasks in list and warn on dismiss', () => {
cy.get('thead').find('input[type="checkbox"]').first().click()
cy.get('body').find('button').contains('Dismiss selected').first().click()
cy.contains('Confirm')
cy.get('.modal')
.contains('button', 'Dismiss')
.click()
.wait('@tasks')
.wait(2000)
.then(() => {
cy.get('tbody').find('tr:visible').should('not.exist')
})
})
})


@@ -1,46 +0,0 @@
[
{
"id": 10,
"comment": "Testing new comment",
"created": "2022-08-08T04:24:55.176008Z",
"user": {
"id": 1,
"username": "user2",
"firstname": "",
"lastname": ""
}
},
{
"id": 9,
"comment": "Testing one more time",
"created": "2022-02-18T04:24:55.176008Z",
"user": {
"id": 2,
"username": "user1",
"firstname": "",
"lastname": ""
}
},
{
"id": 8,
"comment": "Another comment",
"created": "2021-11-08T04:24:47.925042Z",
"user": {
"id": 2,
"username": "user33",
"firstname": "",
"lastname": ""
}
},
{
"id": 7,
"comment": "Cupcake ipsum dolor sit amet cheesecake candy cookie tiramisu. Donut chocolate chupa chups macaroon brownie halvah pie cheesecake gummies. Sweet chocolate bar candy donut gummi bears bear claw liquorice bonbon shortbread.\n\nDonut chocolate bar candy wafer wafer tiramisu. Gummies chocolate cake muffin toffee carrot cake macaroon. Toffee toffee jelly beans danish lollipop cake.",
"created": "2021-02-08T02:37:49.724132Z",
"user": {
"id": 3,
"username": "admin",
"firstname": "",
"lastname": ""
}
}
]

File diff suppressed because one or more lines are too long


@@ -1 +0,0 @@
{"version":"v1.7.1","update_available":false,"feature_is_set":true}

Some files were not shown because too many files have changed in this diff.