Commit Graph

72 Commits

Author SHA1 Message Date
Trenton H 6f163111ce Upgrades black to v23, upgrades ruff 2023-04-26 09:35:27 -07:00
Trenton H 3bcbd05252 Fixes ruff not running isort against the codebase 2023-04-26 09:35:27 -07:00
Trenton H ce41ac9158 Configures ruff as the one stop linter and resolves warnings it raised 2023-04-01 17:03:52 -07:00
Trenton H c21775980f relock with Python 3.8.15 2023-01-06 17:59:39 -08:00
Trenton H d19bf59f47 Cleans up and improves parser discovery testing, simplifies the determination of supported or not supported extensions and mime types 2023-01-05 08:39:48 -08:00
Trenton H 914661fdbb Don't allow an exception when trying to parse a date to cause complete failure 2022-11-17 13:37:37 -08:00
Matthias Eck 3d0a26fdb1 fix(parsers|test_api): fix failed tests 2022-08-06 19:19:10 +02:00
Matthias Eck a5d2ae2588 feat(parsers): add generator for date parsing 2022-08-06 13:03:20 +02:00
Trenton Holmes e8868d7ebf Entirely removes optipng, updates the Ghostscript fallback to also use WebP. Updates the conversion to use a multiprocessing pool 2022-06-11 08:38:49 -07:00
Michael Shamoon 58f2c6a5fc webp thumbnail support with png fallback 2022-06-10 02:28:13 -07:00
shamoon 536576518e Merge pull request #721 from paperless-ngx/bug-fix-date-ignore
Fix Ignore Date Parsing
2022-05-10 16:45:58 -07:00
Trenton Holmes 5b96944940 Updates the ignore date parsing to utilize the settings defined date order, instead of guessing a bit 2022-05-08 16:57:35 -07:00
Trenton Holmes 8a6aaf4e2d Adds additional testing for both date parsing and consumed document created date 2022-05-08 16:57:35 -07:00
Trenton Holmes 3003bdd507 Runs pyupgrade to Python 3.8+ and adds a hook for it 2022-05-06 09:04:08 -07:00
Fantasticle 0baacbef98 update new regex pattern for second boundary 2022-03-31 09:37:15 +02:00
fantasticle 1ecb26a3fb Update regex date match patterns 2022-03-30 12:19:30 +02:00
Simon Siebert 54cbacf4f4 Update parsers.py and test_consumer.py 2022-03-14 19:03:09 +01:00
Trenton Holmes 1771d18a21 Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
kpj fc695896dd Format Python code with black 2022-02-27 15:26:41 +01:00
jonaswinkler 40ce38254b fixes #631 2021-03-14 14:42:48 +01:00
jonaswinkler 416101d557 only import dateparser when required 2021-02-15 11:52:46 +01:00
jonaswinkler 8d6071e977 fix a bug with thumbnail generation when TIKA was enabled 2021-02-09 22:12:43 +01:00
jonaswinkler 431d4fd8e4 rework most of the logging 2021-02-05 01:10:29 +01:00
jonaswinkler bdc247ce49 code style 2021-02-02 23:58:25 +01:00
jonaswinkler 2faa425caf localization for websockets 2021-01-28 22:06:02 +01:00
jonaswinkler 868fd4155a bug fixes, test case fixes 2021-01-26 15:19:56 +01:00
jonaswinkler 05d69c0882 Merge branch 'dev' into feature-websockets-status 2021-01-23 22:22:17 +01:00
Jonas Winkler be94a8e49a Merge pull request #251 from jayme-github/ignore-date
Add option to ignore certain dates in parse_date
2021-01-05 00:19:13 +01:00
jonaswinkler 9f9581e1f8 Merge branch 'dev' into feature-websockets-status 2021-01-04 22:45:56 +01:00
jonaswinkler e97ff3d671 code style 2021-01-02 15:26:09 +01:00
jayme-github 654ee4e62e Add option to ignore certain dates in parse_date
PAPERLESS_IGNORE_DATES allows specifying a comma-separated list of dates
to ignore during date parsing (from filename and content). This can be
used to specify dates that appear often in documents but are usually
not the document's creation date (like your date of birth).
2021-01-02 15:20:49 +01:00
jonaswinkler 40ef375c15 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler c05bfb894a remove duplicate code 2021-01-01 21:50:45 +01:00
jonaswinkler 713985f259 fixes #218 2020-12-30 15:12:16 +01:00
jonaswinkler 5894060dc5 fixes #25 2020-12-15 13:52:35 +01:00
jonaswinkler 2f7bb01f34 moved metadata extraction to the parsers 2020-12-10 14:57:53 +01:00
jonaswinkler 522ada88ea Merge branch 'dev' into feature-websockets-status 2020-12-06 22:53:54 +01:00
jonaswinkler 4548cf08c7 fixes #78 2020-12-02 18:00:49 +01:00
jonaswinkler 834352130c checking file types against parsers in the consumer. 2020-12-01 15:26:05 +01:00
jonaswinkler aaa6599283 Merge branch 'dev' into feature-ocrmypdf 2020-11-30 16:48:09 +01:00
jonaswinkler f51207fc32 added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type. 2020-11-30 00:40:04 +01:00
Jonas Winkler df801d17e1 reworked the interface of the parsers. 2020-11-25 19:36:39 +01:00
Jonas Winkler 2d559d330d reworked PDF parser that uses OCRmyPDF and produces archive files. 2020-11-25 14:50:43 +01:00
Jonas Winkler 8069c2eb6a add support for archive files. 2020-11-25 14:47:17 +01:00
Jonas Winkler d252a1dcda Merge branch 'dev' into celery-tasks 2020-11-22 22:49:37 +01:00
Jonas Winkler b44f8383e4 code cleanup 2020-11-21 14:03:45 +01:00
Jonas Winkler 41650f20f4 mime type handling 2020-11-20 13:31:03 +01:00
Jonas Winkler 17430210a1 Merge branch 'dev' into celery-tasks 2020-11-19 22:10:57 +01:00
Jonas Winkler c487e5f017 a new setting that allows you to skip thumbnail optimization. 2020-11-18 22:42:05 +01:00
Jonas Winkler 8908bc259e updated logging, logging for the mail consumer to see what's happening 2020-11-18 13:23:30 +01:00