paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-05-05 22:25:25 +00:00

Author	SHA1	Message	Date
Brian Martin	b6ae129ad1	Sample Config and Bug Fix Update sample config to reflect new setting variable. Change consumer to handle density setting as str instead of int.	2016-05-13 23:23:58 -04:00
Brian Martin	52c5aafb3f	Convert Density Add settings variable for the convert density setting. If no variable is set, default to 300.	2016-05-13 22:47:40 -04:00
Daniel Quinn	e96c7448bc	Fix for #107	2016-04-11 23:28:12 +01:00
Daniel Quinn	90939be6af	@Pitkley made a good suggestion in #98	2016-04-10 17:39:49 +01:00
Daniel Quinn	64b72d4337	Added test for duplicates	2016-04-03 18:44:00 +01:00
Daniel Quinn	bbe691f342	Merge pull request #101 from danielquinn/issue/89 Closes #89.	2016-03-28 14:25:56 +01:00
Daniel Quinn	b4e648e1e3	Test All The Things	2016-03-28 14:16:26 +01:00
Daniel Quinn	b92e007e15	Removed log components and introduced signals for tags & correspondents	2016-03-28 11:11:15 +01:00
Daniel Quinn	49b56425e8	Merge branch 'master' into issue/81	2016-03-25 20:56:30 +00:00
Daniel Quinn	b387be6f25	I didn't mean to explicitly set -limit	2016-03-25 20:33:00 +00:00
Daniel Quinn	9991f5a6b2	Introducing optional env vars for ImageMagick	2016-03-25 20:31:15 +00:00
Daniel Quinn	0aa0513004	Modifications for support for dates	2016-03-24 19:18:33 +00:00
Daniel Quinn	1170139127	Added a consume-start and consume-finish signal	2016-03-14 21:20:44 +00:00
Tikitu de Jager	95217e8e21	Use FileInfo directly instead of via indirection	2016-03-07 21:08:07 +02:00
Tikitu de Jager	1f75af0137	Extract filename parsing into testable class	2016-03-07 21:05:04 +02:00
Pit Kleyersburg	fb36a49c26	Add unpaper as another pre-processing step	2016-03-06 15:30:37 +01:00
Daniel Quinn	495ed1c36c	Added thumbnail generation to the consumer	2016-03-05 12:09:06 +00:00
Daniel Quinn	5d4587ef8b	Accounted for .sender in a few places	2016-03-04 09:14:50 +00:00
Daniel Quinn	070463b85a	s/Sender/Correspondent & reworked the (im|ex)porter	2016-03-03 20:52:42 +00:00
Daniel Quinn	fad466477b	More verbose error logging	2016-03-03 18:18:48 +00:00
Daniel Quinn	631aa99d92	No need to pass verbosity around anymore	2016-02-28 00:39:40 +00:00
Daniel Quinn	2fe9b0cbc1	New logging appears to work	2016-02-27 20:18:50 +00:00
Daniel Quinn	1aecb1e63a	Compensate for case and format of jpg vs. jpeg	2016-02-23 20:15:13 +00:00
Daniel Quinn	3a7923e32d	Moved pyocr.get_available_tools() into a method	2016-02-21 02:24:05 +00:00
Daniel Quinn	422ae9303a	pep8	2016-02-21 00:14:50 +00:00
Daniel Quinn	51b19f4c19	Issue #57	2016-02-20 22:30:01 +00:00
Pit Kleyersburg	c45f951ca0	Ignore error if orientation detection fails Fixes an additional issue that came up in #48.	2016-02-19 09:52:32 +01:00
Pit Kleyersburg	c34d57a872	Detect image orientation if the OCR supports it Fixes issue #47.	2016-02-18 09:37:13 +01:00
Daniel Quinn	1e7ece81ee	Fixes #45	2016-02-17 23:07:54 +00:00
Daniel Quinn	6f95b05287	Support appropriate sorting for long documents	2016-02-17 00:10:05 +00:00
Pit Kleyersburg	46f8f492f5	Safely and non-randomly create scratch directory Creating the scratch-files in `_get_grayscale` using a random integer is for one inherently unsafe and can cause a collision. On the other hand, it should be unnecessary given that the files will be cleaned up after the OCR run. Since we don't know if OCR runs might be parallel in the future, this commit implements thread-safe and deterministic directory-creation. Additionally it fixes the call to `_cleanup` by `consume`. In the current implementation `_cleanup` will not be called if the last consumed document failed with an `OCRError`, this commit fixes this.	2016-02-16 12:15:57 +01:00
Daniel Quinn	a0f4f6c5f2	Fixed merge conflict and did some pep8	2016-02-14 17:13:48 +00:00
Pit Kleyersburg	aeab9a0e81	Detect language only on one page of PDF To detect the language currently the entire document gets processed. If a different language has been detected than the default one, the entire document will be processed again for the new language. This PR analyzes the middle page for its language and either processes the remaining pages with the default language if it didn't differ, or processes all pages for the new guessed language. The amount of processed pages comes down from the worst case `2n` to worst case `n+1`.	2016-02-14 17:55:13 +01:00
Daniel Quinn	7843ea5037	Added and implemented a rudimentary logger	2016-02-14 16:09:52 +00:00
Pit Kleyersburg	20b2408dbb	Ensure `OCR_THREADS` is integer, add documentation	2016-02-14 16:37:38 +01:00
Pit Kleyersburg	f5beda9c56	Enable parallel OCR processing At the moment, every page in a PDF will be processed one by one using tesseract. Since the processing of a single page is independent from every other page, one can make use of multi-core machines. This PR introduces a multiprocessing pool to process multiple pages simultaneously. The amount of threads to use can be specified in the environment variable `PAPERLESS_OCR_THREADS`. This will default to the number of cores/hyperthreads Python detects for your system.	2016-02-14 15:57:42 +01:00
Daniel Quinn	a846b3f7b8	Adding some more debugging	2016-02-13 00:57:05 +00:00
Daniel Quinn	2421f559be	Simpler regex	2016-02-12 08:27:09 +00:00
Daniel Quinn	a022fcb8f1	Fixed the auto-naming regexes	2016-02-11 22:05:55 +00:00
Daniel Quinn	48761911b3	Image imports and consumption by mail work	2016-02-06 17:05:36 +00:00

40 Commits
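
Commits f5beda9c56 and 20b2408dbb describe processing PDF pages in a multiprocessing pool, with the worker count read from `PAPERLESS_OCR_THREADS` as an integer and defaulting to the number of cores Python detects. The following is a minimal sketch of that idea, not the code from those commits: the function names `get_ocr_threads`, `ocr_page`, and `ocr_document` are hypothetical, and `ocr_page` is a stand-in for the real per-page tesseract call.

```python
import os
from multiprocessing import Pool, cpu_count


def get_ocr_threads(environ=os.environ):
    """Return the worker count from PAPERLESS_OCR_THREADS as an int.

    Falls back to the core count Python detects when the variable is
    unset or empty. (Hypothetical helper, assumed for illustration.)
    """
    raw = environ.get("PAPERLESS_OCR_THREADS")
    if raw is None or raw.strip() == "":
        return cpu_count()
    return int(raw)  # environment values are strings, so convert explicitly


def ocr_page(page_path):
    """Stand-in for the real per-page OCR call (assumption)."""
    return "text of {}".format(page_path)


def ocr_document(page_paths, threads=None):
    """OCR every page independently in a pool; results keep page order."""
    with Pool(processes=threads or get_ocr_threads()) as pool:
        return pool.map(ocr_page, page_paths)


if __name__ == "__main__":
    print(ocr_document(["page-0.pnm", "page-1.pnm"], threads=2))
```

Because `Pool.map` returns results in input order, the page texts can be joined directly without re-sorting. The explicit `int()` conversion mirrors the concern in 20b2408dbb that a value read from the environment arrives as a string.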