Commit Graph

497 Commits

Author SHA1 Message Date
Michael Eischer 5c935e71fa index: also preallocate hashed array tree 2026-05-10 00:35:17 +02:00
Michael Eischer 934c615e51 index: support index preallocation 2026-05-10 00:35:17 +02:00
Michael Eischer ba638b6602 indexmap: use bloom filter to drastically speed up check for unknown blobs
Only in use on 64-bit systems. Use the upper 28 bits of the id of an
index entry as bloom filter. This allows skipping the index entry
traversal most of the time if an id is not stored in the hashmap.

The bloom filter embedded in the index entry id is checked each time
before following a reference to an index entry. This further reduces
the risk of false positives. The bloom filter itself is basically for
free on modern CPUs.

The main performance costs of checking for unknown blobs in the index are
the essentially random RAM accesses for the initial bucket lookup as
well as following the next pointer in the index entries. With the bloom
filter most of the time only the initial bucket lookup is necessary.

This speeds up checking for unknown blobs by a factor of 5 (!), while
having no effect on the lookup of known blobs:

$ benchstat no-bloom with-bloom
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%   9.9ms ± 7%  -79.70%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  47.9ms ± 3%     ~     (p=0.968 n=10+9)

The bloom filter parameters m=28 k=1 were derived empirically, while
also leaving sufficient room for very large repositories. Before this
commit, the final merge index step took roughly 1 second per million
index entries. With m=28, the remaining 36 bits of the id can address
2^36 (about 69 billion) index entries, and it would currently take
19 hours to just merge an index of that size. It is safe to assume
that such large repositories don't exist.

Comparison with other parameter sets:

$ m=28 k=1 versus m=32 k=1
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%   9.7ms ±16%  -80.17%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  48.4ms ± 3%     ~     (p=0.436 n=10+10)

$ m=28 k=1 versus m=24 k=1
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%  10.8ms ±13%  -77.90%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  47.9ms ± 3%     ~     (p=0.684 n=10+10)

$ m=28 k=1 versus m=28 k=2
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%  24.9ms ± 5%  -49.27%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  48.0ms ± 4%     ~     (p=1.000 n=10+10)

`k=2` outright wrecks the performance. This is most likely the case as
it performs worse on longer index entry chains, which also happen to be
the expensive ones to process.

`m=32` yields diminishing returns, while getting within an order of
magnitude of the largest known restic repositories.

Design alternatives:

In principle it would be possible to add a single large bloom filter
instead of embedding them in the index entry ids. However, this bloom
filter would necessarily incur additional random memory accesses and
thus slow things down overall.
2026-05-10 00:35:17 +02:00
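
The idea described in the commit above can be sketched as follows. This is a minimal illustration, not restic's actual indexmap code: the names `ref`, `hashMap`, `bloomBit` and `makeID`, the 36/28 bit split of the reference, and the way the filter bit is derived from the blob ID are all assumptions made for the example. Each reference carries a 28-bit filter (m=28, k=1) summarizing every entry reachable through it, so a lookup can stop before touching the next entry.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	bloomBits = 28             // m=28: width of the embedded filter
	indexBits = 64 - bloomBits // remaining bits address an index entry
	indexMask = 1<<indexBits - 1
)

// ID is a stand-in for a blob ID (hypothetical type for this sketch).
type ID [32]byte

// ref packs an entry index (lower 36 bits, 0 means "no entry") together
// with a bloom filter (upper 28 bits) of all entries reachable through it.
type ref uint64

func (r ref) index() uint64 { return uint64(r) & indexMask }
func (r ref) bloom() uint64 { return uint64(r) >> indexBits }

// bloomBit maps an ID to exactly one filter bit (k=1).
func bloomBit(id ID) uint64 {
	return 1 << (binary.LittleEndian.Uint64(id[8:16]) % bloomBits)
}

type entry struct {
	id   ID
	next ref
}

type hashMap struct {
	buckets []ref
	entries []entry // entries[0] is unused so that index 0 means "none"
	steps   int     // number of entries visited, for demonstration only
}

func newHashMap(buckets int) *hashMap {
	return &hashMap{buckets: make([]ref, buckets), entries: make([]entry, 1)}
}

func (m *hashMap) bucket(id ID) int {
	return int(binary.LittleEndian.Uint64(id[0:8]) % uint64(len(m.buckets)))
}

// add prepends the entry to its bucket chain. The new head reference
// covers the new entry plus everything already in the chain.
func (m *hashMap) add(id ID) {
	b := m.bucket(id)
	old := m.buckets[b]
	m.entries = append(m.entries, entry{id: id, next: old})
	idx := uint64(len(m.entries) - 1)
	m.buckets[b] = ref(idx | (old.bloom()|bloomBit(id))<<indexBits)
}

// has checks the filter before following each reference.
func (m *hashMap) has(id ID) bool {
	bit := bloomBit(id)
	for r := m.buckets[m.bucket(id)]; r.index() != 0; {
		if r.bloom()&bit == 0 {
			return false // id cannot be anywhere further down the chain
		}
		m.steps++
		e := &m.entries[r.index()]
		if e.id == id {
			return true
		}
		r = e.next
	}
	return false
}

// makeID builds a test ID with a chosen bucket byte and filter byte.
func makeID(bucketByte, bloomByte byte) ID {
	var id ID
	id[0], id[8] = bucketByte, bloomByte
	return id
}

func main() {
	m := newHashMap(1) // a single bucket forces one long chain
	m.add(makeID(1, 0))
	m.add(makeID(2, 1))
	m.add(makeID(3, 2))
	fmt.Println("known:", m.has(makeID(1, 0)))
	m.steps = 0
	fmt.Println("unknown:", m.has(makeID(9, 5)), "entries visited:", m.steps)
}
```

Because the filter travels inside the reference that has to be read anyway, the check adds no memory access of its own, which matches the commit's argument against a separate, single large bloom filter.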
Michael Eischer 320f709fbc index: modernize masterindex tests
`b.Loop()` drastically shortens benchmark execution times for tests with
an expensive initialization phase as it only has to happen once now.
2026-05-10 00:35:17 +02:00
Michael Eischer e33ed5d0c1 index: make tests more representative 2026-05-10 00:35:17 +02:00
Michael Eischer 4c0dc9e202 index: support incremental index loading
Do not require a full index reload if only a few additional index files
have been added. This can drastically speed up loading the index in the
mount command.
2026-05-07 22:52:03 +02:00
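
One way to picture incremental loading is to remember which index files are already in memory and read only the new ones. The sketch below is an assumption-laden illustration, not restic's implementation: `indexCache`, `refresh` and the fallback to a full reload when a known file has disappeared are all hypothetical.

```go
package main

import "fmt"

// indexCache remembers which index files have already been loaded
// (hypothetical type for this sketch).
type indexCache struct {
	loaded map[string]struct{}
	loads  int // number of index files read so far, for demonstration
}

func newIndexCache() *indexCache {
	return &indexCache{loaded: make(map[string]struct{})}
}

// refresh brings the cache in line with the current list of index files.
// It returns true if a full reload was necessary.
func (c *indexCache) refresh(current []string) (full bool) {
	cur := make(map[string]struct{}, len(current))
	for _, name := range current {
		cur[name] = struct{}{}
	}

	// Assumption: if a previously loaded file vanished, the in-memory
	// state may be stale, so start over with a full reload.
	for name := range c.loaded {
		if _, ok := cur[name]; !ok {
			c.loaded = make(map[string]struct{})
			full = true
			break
		}
	}

	// Read only the files that are not in memory yet.
	for _, name := range current {
		if _, ok := c.loaded[name]; ok {
			continue
		}
		c.loaded[name] = struct{}{}
		c.loads++ // a real implementation would download and parse here
	}
	return full
}

func main() {
	c := newIndexCache()
	c.refresh([]string{"a", "b"})
	full := c.refresh([]string{"a", "b", "c"}) // only "c" is new
	fmt.Println("full reload:", full, "files read:", c.loads)
}
```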
Michael Eischer d1937a530b clarify pack ID in decryption error (#5710)
The pack ID is now included in full. In addition, the error message now says
that it is a pack file.
2026-02-18 20:43:10 +01:00
Michael Eischer f84d398989 repository: prevent test deadlock within WithBlobUploader
Calling t.Fatal internally triggers runtime.Goexit. This kills the
current goroutine while only running deferred code. Add an extra context
that gets canceled if the goroutine exits while within the user
provided callback.
2026-01-31 19:18:36 +01:00
Michael Eischer f3a89bfff6 Merge pull request #5612 from MichaelEischer/repository-async-saveblob
repository: add async blob upload method
2025-11-26 21:34:35 +01:00
Michael Eischer 5cc8636047 Merge pull request #5614 from MichaelEischer/fix-lookupblobsize
repository: fix LookupBlobSize to also return pending blobs
2025-11-26 21:24:32 +01:00
Michael Eischer 5607fd759f repository: fix race condition for blobSaver shutdown
wg.Go() may not be called after wg.Wait(). This prevents connecting two
errgroups such that the errors are propagated between them if the child
errgroup dynamically starts goroutines. Instead use just a single errgroup,
and sequence the shutdown using a sync.WaitGroup. This is far simpler
and does not require any "clever" tricks.
2025-11-26 21:18:22 +01:00
Michael Eischer 9f87e9096a repository: add tests for SaveBlobAsync 2025-11-26 21:18:22 +01:00
Michael Eischer 046b0e711d repository: add SaveBlobAsync method 2025-11-26 21:18:21 +01:00
Michael Eischer 07d090f233 repository: expose AssociatedBlobSet via repository interface 2025-11-26 20:59:08 +01:00
Michael Eischer 0f05277b47 index: add sub and intersect method to AssociatedSet 2025-11-26 20:59:08 +01:00
Michael Eischer f1aabdd293 index: add test for pending blobs 2025-11-23 18:08:56 +01:00
Michael Eischer 50d376c543 repository: fix LookupBlobSize to also report pending blobs 2025-11-23 17:55:13 +01:00
Michael Eischer cf409b7c66 automatically batch snapshots in copy 2025-11-23 17:40:37 +01:00
Michael Eischer 405813f250 repository: fix LookupBlobSize to also report pending blobs 2025-11-23 17:09:07 +01:00
Michael Eischer 81d8bc4ade repository: replace CopyBlobs with Repack implementation 2025-11-23 16:06:29 +01:00
Winfried Plappert b24b088978 restic copy --batch: The mighty linter
I cave in - no double comment
2025-11-19 07:34:39 +00:00
Winfried Plappert fc3de018bc restic copy --batch - fussy linter
internal/repository/repack.go: I have to please the mighty linter.
2025-11-19 07:29:09 +00:00
Winfried Plappert b87f7586e4 restic copy --batch: a fresh start from commit 382616747
Instead of rebasing my code, I decided to start fresh, since WithBlobUploader()
has been introduced.

changelog/unreleased/issue-5453:
doc/045_working_with_repos.rst:
the usual

cmd/restic/cmd_copy.go:
gather all snaps to be collected - collectAllSnapshots()
run overall copy step - func copyTreeBatched()
helper copySaveSnapshot() to save the corresponding snapshot

internal/repository/repack.go:
introduce wrapper CopyBlobs(), which passes parameter `uploader restic.BlobSaver` from
WithBlobUploader() via copyTreeBatched() to repack().

internal/backend/local/local_windows.go:
I did not touch it, but gofmt did: whitespace
2025-11-19 07:09:24 +00:00
Michael Eischer 14f3bc8232 Merge pull request #5560 from MichaelEischer/index-iterators
index: port to modern Go iterators
2025-11-16 13:24:48 +01:00
Michael Eischer b587c126e0 Fix linter warning 2025-11-16 12:56:37 +01:00
Michael Eischer 9944ef7a7c index: convert AssociatedSet to Go iterators 2025-11-16 12:56:37 +01:00
Michael Eischer 38c543457e index: convert to implement modern Go iterators 2025-11-16 12:56:37 +01:00
Michael Eischer a0925fa922 repository: set progress bar maximum in Repack 2025-11-16 12:51:46 +01:00
Michael Eischer b2afccbd96 repository: remove unused obsoletePacks return values from Repack 2025-11-16 12:51:46 +01:00
Michael Eischer 0624b656b8 Merge pull request #5558 from MichaelEischer/simplify-blob-upload
repository: enforce correct usage of SaveBlob
2025-11-16 12:51:01 +01:00
Michael Eischer c6e33c3954 repository: enforce that SaveBlob is called within WithBlobUploader
This is achieved by removing SaveBlob from the public API and only
returning it via an uploader object that is passed in by
WithBlobUploader.
2025-10-12 18:26:26 +02:00
Michael Eischer ac4642b479 repository: replace StartPackUploader+Flush with WithBlobUploader
The new method combines both steps into a single wrapper function. Thus
it ensures that both are always called in pairs. As an additional
benefit this slightly reduces the boilerplate to upload blobs.
2025-10-08 22:49:45 +02:00
Michael Eischer b7bbb408ee check: refactor pack selection for read data
Drop the `packs` map from the internal state of the checker. Instead the
Packs(...) method now calls a filter callback that can select the
packs intended for checking.
2025-10-03 23:45:05 +02:00
Michael Eischer 4426dfe6a9 repository: replace SetIndex method with internal loadIndexWithCallback method 2025-10-03 19:36:57 +02:00
Michael Eischer f0955fa931 repository: add Checker() method to repository to replace unchecked cast 2025-10-03 19:34:33 +02:00
Michael Eischer 189b295c30 repository: add dedicated test helper 2025-10-03 19:34:33 +02:00
Michael Eischer 82971ad7f0 check: split index/pack check into repository package 2025-10-03 19:34:32 +02:00
Michael Eischer 56ac8360c7 data: split node and snapshot code from restic package 2025-10-03 19:10:39 +02:00
Michael Eischer 52eb66929f repository: deduplicate index progress bar initialization 2025-10-03 18:55:46 +02:00
Michael Eischer b6c50662da repository: don't ignore cache clearing error 2025-10-03 18:22:42 +02:00
Michael Eischer 4dc71f24c5 backends: pass error logger to backends 2025-10-03 18:22:42 +02:00
Michael Eischer 1c7bb15327 Merge pull request #5451 from greatroar/concurrency
Concurrency simplifications
2025-09-24 22:22:40 +02:00
Michael Eischer 4edfd36c8f Merge pull request #5363 from zmanda/fix-gh-5258-backup-exits-with-wrong-code-on-ctrl-c
bugfix: fatal errors do not keep underlying error
2025-09-24 22:04:38 +02:00
Michael Eischer 88bdf20bd8 Reduce linter ignores 2025-09-21 22:24:27 +02:00
Michael Eischer 60d80a6127 Fix linter warnings 2025-09-21 22:24:15 +02:00
Srigovind Nayak ce089f7e2d errors: standardize error wrapping for Fatal errors
* replace all occurrences of `errors.Fatal(err.Error())` with `errors.Fatalf("%s", err)` so that the error wrapping is correct across the codebase

* updated the review comments
2025-09-13 23:32:40 +05:30
Michael Eischer de29d74707 check: fix error reporting on download retry 2025-09-08 11:45:28 +02:00
greatroar 2c39b1f84f internal/repository/index: Simplify MasterIndex concurrency 2025-07-18 15:06:37 +02:00
Srigovind Nayak f13e9c10a4 Add support for additional compression levels fastest and better (#5321)
* repository: expose additional compression levels

* adding better and fastest compression levels for zstd

* repository: add changelog entry for issue-4728

* chore: fix golint issues

* chore: sort compression modes in the help text

* updating review comments
2025-03-31 21:21:12 +02:00
Michael Eischer 13cb90b83a Merge pull request #5295 from MichaelEischer/randomize-pack-order
Randomize blob to pack file assignment
2025-03-25 18:13:49 +01:00