64 Commits

Author SHA1 Message Date
Trenton Holmes e8868d7ebf Entirely removes optipng, updates the ghostscript fallback to also use WebP. Updates the conversion to use a multiprocessing pool 2022-06-11 08:38:49 -07:00
Michael Shamoon 58f2c6a5fc webp thumbnail support with png fallback 2022-06-10 02:28:13 -07:00
shamoon 536576518e Merge pull request #721 from paperless-ngx/bug-fix-date-ignore
Fix Ignore Date Parsing
2022-05-10 16:45:58 -07:00
Trenton Holmes 5b96944940 Updates the ignore date parsing to utilize the settings defined date order, instead of guessing a bit 2022-05-08 16:57:35 -07:00
Trenton Holmes 8a6aaf4e2d Adds additional testing for both date parsing and consumed document created date 2022-05-08 16:57:35 -07:00
Trenton Holmes 3003bdd507 Runs pyupgrade to Python 3.8+ and adds a hook for it 2022-05-06 09:04:08 -07:00
Fantasticle 0baacbef98 update new regex pattern for second boundary 2022-03-31 09:37:15 +02:00
fantasticle 1ecb26a3fb Update regex date match patterns 2022-03-30 12:19:30 +02:00
Simon Siebert 54cbacf4f4 Update parsers.py and test_consumer.py 2022-03-14 19:03:09 +01:00
Trenton Holmes 1771d18a21 Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
kpj fc695896dd Format Python code with black 2022-02-27 15:26:41 +01:00
jonaswinkler 40ce38254b fixes #631 2021-03-14 14:42:48 +01:00
jonaswinkler 416101d557 only import dateparser when required 2021-02-15 11:52:46 +01:00
jonaswinkler 8d6071e977 fix a bug with thumbnail generation when TIKA was enabled 2021-02-09 22:12:43 +01:00
jonaswinkler 431d4fd8e4 rework most of the logging 2021-02-05 01:10:29 +01:00
jonaswinkler bdc247ce49 code style 2021-02-02 23:58:25 +01:00
jonaswinkler 2faa425caf localization for websockets 2021-01-28 22:06:02 +01:00
jonaswinkler 868fd4155a bug fixes, test case fixes 2021-01-26 15:19:56 +01:00
jonaswinkler 05d69c0882 Merge branch 'dev' into feature-websockets-status 2021-01-23 22:22:17 +01:00
Jonas Winkler be94a8e49a Merge pull request #251 from jayme-github/ignore-date
Add option to ignore certain dates in parse_date
2021-01-05 00:19:13 +01:00
jonaswinkler 9f9581e1f8 Merge branch 'dev' into feature-websockets-status 2021-01-04 22:45:56 +01:00
jonaswinkler e97ff3d671 code style 2021-01-02 15:26:09 +01:00
jayme-github 654ee4e62e Add option to ignore certain dates in parse_date
PAPERLESS_IGNORE_DATES allows you to specify a comma-separated list of dates
to ignore during date parsing (from filename and content). This can be
used to specify dates that appear often in documents but are usually
not the document's creation date (like your date of birth).
2021-01-02 15:20:49 +01:00
jonaswinkler 40ef375c15 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler c05bfb894a remove duplicate code 2021-01-01 21:50:45 +01:00
jonaswinkler 713985f259 fixes #218 2020-12-30 15:12:16 +01:00
jonaswinkler 5894060dc5 fixes #25 2020-12-15 13:52:35 +01:00
jonaswinkler 2f7bb01f34 moved metadata extraction to the parsers 2020-12-10 14:57:53 +01:00
jonaswinkler 522ada88ea Merge branch 'dev' into feature-websockets-status 2020-12-06 22:53:54 +01:00
jonaswinkler 4548cf08c7 fixes #78 2020-12-02 18:00:49 +01:00
jonaswinkler 834352130c checking file types against parsers in the consumer. 2020-12-01 15:26:05 +01:00
jonaswinkler aaa6599283 Merge branch 'dev' into feature-ocrmypdf 2020-11-30 16:48:09 +01:00
jonaswinkler f51207fc32 added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type. 2020-11-30 00:40:04 +01:00
Jonas Winkler df801d17e1 reworked the interface of the parsers. 2020-11-25 19:36:39 +01:00
Jonas Winkler 2d559d330d reworked PDF parser that uses OCRmyPDF and produces archive files. 2020-11-25 14:50:43 +01:00
Jonas Winkler 8069c2eb6a add support for archive files. 2020-11-25 14:47:17 +01:00
Jonas Winkler d252a1dcda Merge branch 'dev' into celery-tasks 2020-11-22 22:49:37 +01:00
Jonas Winkler b44f8383e4 code cleanup 2020-11-21 14:03:45 +01:00
Jonas Winkler 41650f20f4 mime type handling 2020-11-20 13:31:03 +01:00
Jonas Winkler 17430210a1 Merge branch 'dev' into celery-tasks 2020-11-19 22:10:57 +01:00
Jonas Winkler c487e5f017 a new setting that allows you to skip thumbnail optimization. 2020-11-18 22:42:05 +01:00
Jonas Winkler 8908bc259e updated logging, logging for the mail consumer to see what's happening 2020-11-18 13:23:30 +01:00
Jonas Winkler d2e22e3f27 Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere. 2020-11-16 23:53:12 +01:00
Jonas Winkler 0421031128 add some more checks. 2020-11-12 21:20:12 +01:00
Jonas Winkler 2e04ba1c04 code style fixes 2020-11-12 21:09:45 +01:00
Jonas Winkler 572e40ca27 backend that supports asgi and status update sockets with channels 2020-11-07 11:31:04 +01:00
Jonas Winkler 28ba634e6a silenced unpaper once and for all 2020-11-03 14:04:21 +01:00
Jonas Winkler f4cebda085 A handy script to redo OCR on all documents 2020-11-03 14:04:11 +01:00
Jonas Winkler 3a08a2d206 made unpaper and convert a little bit nicer to interact with 2020-11-02 19:31:04 +01:00
Jonas Winkler d15405ef56 reworked most of the tesseract parser, better logging 2020-11-02 15:40:44 +01:00