Commit Graph

125 Commits

Author SHA1 Message Date
Michael Eischer 350f29d921 data: replace Tree with TreeNodeIterator
The TreeNodeIterator decodes nodes while iterating over a tree blob.
This should reduce peak memory usage as now only the serialized tree
blob and a single node have to alive at the same time. Using the
iterator has implications for the error handling however. Now it is
necessary that all loops that iterate through a tree check for errors
before using the node returned by the iterator.

The other change is that it is no longer possible to iterate over a tree
multiple times. Instead it must be loaded a second time. This only
affects the tree rewriting code.
2026-01-31 20:03:38 +01:00
Michael Eischer 1e183509d4 data: rework StreamTrees to use synchronous callbacks
The tree.Nodes will be replaced by an iterator to loads and serializes
tree node ondemand. Thus, the processing moves from StreamTrees into the
callback. Schedule them onto the workers used by StreamTrees for proper
load distribution.
2026-01-31 20:03:38 +01:00
Winfried Plappert ce57961f14 restic check with snapshot filters (#5469)
---------

Co-authored-by: Michael Eischer <michael.eischer@fau.de>
2025-11-28 19:12:38 +00:00
Michael Eischer 84dda4dc74 check: use AssociatedBlobSet 2025-11-26 20:59:39 +01:00
Michael Eischer f0955fa931 repository: add Checker() method to repository to replace unchecked cast 2025-10-03 19:34:33 +02:00
Michael Eischer 82971ad7f0 check: split index/pack check into repository package 2025-10-03 19:34:32 +02:00
Michael Eischer bfc2ce97fd check: don't keep extra MasterIndex reference 2025-10-03 19:32:15 +02:00
Michael Eischer 56ac8360c7 data: split node and snapshot code from restic package 2025-10-03 19:10:39 +02:00
Michael Eischer 52eb66929f repository: deduplicate index progress bar initializaton 2025-10-03 18:55:46 +02:00
Michael Eischer ca1e5e10b6 add proper constants for node type 2024-08-31 18:20:01 +02:00
Michael Eischer 6024597028 drop support for s3legacy layout 2024-08-31 17:25:24 +02:00
Michael Eischer 943b6ccfba index: remove support for legacy index format 2024-08-31 17:12:43 +02:00
Michael Eischer c9a4a95848 check: suggest using repair packs to repair truncated pack files
Previously, that help message was only shown for running `check
--read-data`.
2024-07-05 20:04:25 +02:00
Michael Eischer 5e0ea8fcfa pack: move to repository package 2024-05-25 13:13:03 +02:00
Michael Eischer 50ec408302 index: move to repository package 2024-05-25 13:13:03 +02:00
Michael Eischer 447b486c20 index: deduplicate index loading of check and repository 2024-05-24 21:33:17 +02:00
Michael Eischer 1266a4932f repository: fix parameter order of LookupBlobSize
All methods should use blobType followed by ID.
2024-05-24 21:33:17 +02:00
Michael Eischer 4df887406f repository: inline MasterIndex interface into Repository interface 2024-05-24 21:33:17 +02:00
Michael Eischer 94e863885c check: move verification of individual pack file to repository 2024-05-18 21:42:50 +02:00
Michael Eischer a1ca5e15c4 migrations: add temporary hack for s3_layout
The migration will be removed after the next restic release anyways.
Thus, there's no need for a clean implementation.
2024-05-18 21:38:31 +02:00
Michael Eischer 74d90653e0 check: use ReadFull to load pack header in checkPack
This ensures that the pack header is actually read completely.
Previously, for a truncated file it was possible to only read a part of
the header, as backend.Load(...) is not guaranteed to return as many
bytes as requested by the length parameter.
2024-05-18 21:28:54 +02:00
Michael Eischer ff0744b3af check: test checkPack retries 2024-05-18 21:28:54 +02:00
Michael Eischer 3ff063e913 check: verify pack a second time if broken 2024-05-18 21:28:54 +02:00
Michael Eischer e401af07b2 check: fix error message formatting 2024-05-18 21:28:54 +02:00
Michael Eischer ffe5439149 Merge pull request #4605 from MichaelEischer/better-restorer-error-handling
Rework repository.StreamPacks & better restorer error handling
2024-05-01 16:37:41 +02:00
Michael Eischer 940a3159b5 let index.Each() and pack.Size() return error on canceled context
This forces a caller to actually check that the function did complete.
2024-04-22 22:39:32 +02:00
Michael Eischer 666a0b0bdb repository: streamPack: replace streaming with chunked download
Due to the interface of streamPack, we cannot guarantee that operations
progress fast enough that the underlying connections remains open. This
introduces partial failures which massively complicate the error
handling.

Switch to a simpler approach that retrieves the pack in chunks of 32MB.
If a blob is larger than this limit, then it is downloaded separately.

To avoid multiple copies in memory, an auxiliary interface
`discardReader` is introduced that allows directly accessing the
downloaded byte slices, while still supporting the streaming used by the
`check` command.
2024-04-22 21:21:23 +02:00
Michael Eischer ed4a4f8748 check: exclude inaccessible files from the repair pack suggestion 2024-02-12 20:25:15 +01:00
Michael Eischer 4073299a7c check: fix missing error if blob is invalid 2024-02-12 20:20:13 +01:00
Michael Eischer 22a3cea1b3 checker: wrap all pack errors in ErrPackData 2024-02-12 20:19:32 +01:00
Alexander Neumann c0514dd8ba Fix linter errors (except for tests) 2024-02-10 22:58:10 +01:00
Michael Eischer 246559e654 check: cleanup s3 legacy detection 2024-01-27 13:02:04 +01:00
Michael Eischer bfb56b78e1 replace some usages of restic.Repository with more specific interface
This should eventually make it easier to test the code.
2024-01-27 13:02:02 +01:00
Michael Eischer 22d0c3f8dc check: Use PackBlobIterator instead of StreamPack
To only stream the content of a pack file once, check used StreamPack
with a custom pack load function. This combination was always brittle
and complicates using StreamPack everywhere else. Now that StreamPack
internally uses PackBlobIterator use that primitive instead, which is a
much better fit for what the check command requires.
2024-01-19 21:40:36 +01:00
Andrea Gelmini 241916d55b Fix typos 2023-12-06 13:11:55 +01:00
Michael Eischer c7b770eb1f convert MemorizeList to be repository based
Ideally, code that uses a repository shouldn't directly interact with
the underlying backend. Thus, move MemorizeList one layer up.
2023-10-25 23:01:35 +02:00
Michael Eischer 1b8a67fe76 move Backend interface to backend package 2023-10-25 23:00:18 +02:00
Michael Eischer a28940ea29 check: Suggest usage of restic repair packs for corrupted blobs
For now, the guide is only shown if the blob content does not match its
hash. The main intended usage is to handle data corruption errors when
using maximum compression in restic 0.16.0
2023-10-23 18:36:28 +02:00
Michael Eischer 91aef00df3 check: add index loading progress bar 2023-10-01 19:55:29 +02:00
Michael Eischer 3fd0ad7448 repository: list index files only once 2023-10-01 19:53:26 +02:00
greatroar 1678392a6d checker: Make ErrLegacyLayout a value, not a type 2022-12-17 09:41:07 +01:00
Michael Eischer ff7ef5007e Replace most usages of ioutil with the underlying function
The ioutil functions are deprecated since Go 1.17 and only wrap another
library function. Thus directly call the underlying function.

This commit only mechanically replaces the function calls.
2022-12-02 19:36:43 +01:00
greatroar 09c14f33c8 internal/checker: Pass Error.Error pointer receiver 2022-10-14 14:13:32 +02:00
Michael Eischer 2e3f1c08c5 repository: split index into a separate package 2022-10-08 21:15:34 +02:00
Michael Eischer ddcf549eba repository: remove IsMixedPack and add replacement for checker
Repositories with mixed packs are probably quite rare by now. When
loading data blobs from a mixed pack file, this will no longer trigger
caching that file. However, usually tree blobs are accessed first such
that this shouldn't make much of a difference.

The checker gets a simpler replacement.
2022-10-03 12:03:59 +02:00
Michael Eischer 1ebd57247a repository: optimize MasterIndex.Each
Sending data through a channel at very high frequency is extremely
inefficient. Thus use simple callbacks instead of channels.

> name                old time/op  new time/op  delta
> MasterIndexEach-16   6.68s ±24%   0.96s ± 2%  -85.64%  (p=0.008 n=5+5)
2022-09-24 12:21:59 +02:00
Michael Eischer 04e49924fb checker: Fix S3 legacy layout detection 2022-07-23 11:19:32 +02:00
Michael Eischer fcb3ddf181 check: Complain about usage of s3 legacy layout 2022-07-23 11:19:32 +02:00
Michael Eischer 8b8bd4e8ac check: complain about mixed pack files 2022-07-23 11:19:32 +02:00
Michael Eischer 6f53ecc1ae adapt workers based on whether an operation is CPU or IO-bound
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.

Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
2022-07-03 12:19:26 +02:00