
31 Commits

Author SHA1 Message Date
Michael Eischer 5c935e71fa index: also preallocate hashed array tree 2026-05-10 00:35:17 +02:00
Michael Eischer 934c615e51 index: support index preallocation 2026-05-10 00:35:17 +02:00
Michael Eischer ba638b6602 indexmap: use bloom filter to drastically speed up check for unknown blobs
Only in use on 64-bit systems. Use the upper 28 bits of the id of an
index entry as a bloom filter. This allows skipping the index entry
traversal most of the time if an id is not stored in the hashmap.

The bloom filter embedded in the index entry id is checked each time
before following a reference to an index entry. This further reduces
the risk of false positives. The bloom filter itself is essentially
free on modern CPUs.

The main performance costs of checking for unknown blobs in the index
are the essentially random RAM accesses for the initial bucket lookup
and for following the next pointers in the index entries. With the
bloom filter, usually only the initial bucket lookup is necessary.
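A minimal Go sketch of the idea, for illustration only. The layout (entry position in the low 36 bits of a 64-bit reference, filter in the upper 28 bits), the choice of id bytes for the bucket and for the filter bit, and all names are assumptions, not restic's actual indexmap code. The sketch also assumes that the filter stored in each reference covers every entry reachable from it, so a missing bit proves that the id is absent from the rest of the chain.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical layout of a 64-bit entry reference: the low 36 bits hold the
// entry position, the upper 28 bits (m=28) hold the bloom filter.
const (
	bloomBits = 28
	posBits   = 64 - bloomBits
	posMask   = uint64(1)<<posBits - 1
)

// ID stands in for a SHA-256 blob id.
type ID [32]byte

type entry struct {
	key  ID
	next uint64 // reference to the next entry in the chain, including its filter
}

type indexMap struct {
	buckets []uint64 // bucket heads; 0 means "empty"
	entries []entry  // entries[0] is a dummy so that position 0 means "none"
}

func newIndexMap(nbuckets int) *indexMap {
	return &indexMap{buckets: make([]uint64, nbuckets), entries: make([]entry, 1)}
}

// bucket picks a bucket from the first 8 bytes of the id.
func (m *indexMap) bucket(key ID) int {
	return int(binary.LittleEndian.Uint64(key[:8]) % uint64(len(m.buckets)))
}

// bloomBit picks the single (k=1) filter bit from a different part of the id.
func bloomBit(key ID) uint64 {
	h := binary.LittleEndian.Uint64(key[8:16])
	return uint64(1) << (posBits + h%bloomBits)
}

// add prepends key to its bucket chain. The new head reference covers the
// new key plus the whole old chain, so it inherits the old filter bits.
func (m *indexMap) add(key ID) {
	b := m.bucket(key)
	head := m.buckets[b]
	m.entries = append(m.entries, entry{key: key, next: head})
	pos := uint64(len(m.entries) - 1)
	m.buckets[b] = pos | (head &^ posMask) | bloomBit(key)
}

// has checks the filter before following each reference, so most lookups
// of unknown ids stop right after reading the bucket head.
func (m *indexMap) has(key ID) bool {
	bit := bloomBit(key)
	ref := m.buckets[m.bucket(key)]
	for ref&posMask != 0 {
		if ref&bit == 0 {
			return false // key is definitely not in the remaining chain
		}
		e := &m.entries[ref&posMask]
		if e.key == key {
			return true
		}
		ref = e.next
	}
	return false
}

func main() {
	m := newIndexMap(4)
	var known, unknown ID
	known[0], unknown[0] = 1, 2
	m.add(known)
	fmt.Println(m.has(known), m.has(unknown)) // true false
}
```

Because the filter on the bucket head already summarizes the whole chain, an unknown id usually costs only the bucket read, which matches the cost argument above.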

This speeds up checking for unknown blobs by a factor of 5 (!), while
having no effect on the lookup of known blobs:

$ benchstat no-bloom with-bloom
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%   9.9ms ± 7%  -79.70%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  47.9ms ± 3%     ~     (p=0.968 n=10+9)

The bloom filter parameters m=28 k=1 were derived empirically, while
also leaving sufficient room for very large repositories. Before this
commit, the final index merge step took roughly 1 second per million
index entries. With the chosen bloom filter parameters, the remaining
36 bits of the id can address so many index entries that merging an
index of that size would currently take 19 hours. It is safe to
assume that such large repositories don't exist.
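The 19 hours can be reproduced with a quick calculation, assuming 64-bit entry ids so that m=28 leaves 36 bits, i.e. 2^36 (about 6.9·10^10) addressable entries, and assuming the merge cost of 1 second per million entries scales linearly. mergeHours is a hypothetical helper.

```go
package main

import "fmt"

// mergeHours (hypothetical) estimates the merge time for the largest index
// addressable when bloomBits of a 64-bit entry id hold the bloom filter,
// assuming a linear cost of secondsPerMillion seconds per million entries.
func mergeHours(bloomBits uint, secondsPerMillion float64) float64 {
	entries := float64(uint64(1) << (64 - bloomBits))
	return entries / 1e6 * secondsPerMillion / 3600
}

func main() {
	fmt.Printf("m=28: %.1f h\n", mergeHours(28, 1)) // 2^36 entries
	fmt.Printf("m=32: %.1f h\n", mergeHours(32, 1)) // 2^32 entries
}
```

This prints about 19.1 h for m=28 and about 1.2 h for m=32.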

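Comparison with other parameter sets. The old column below repeats the
no-bloom baseline, so each new column should be compared against the
9.9ms measured for m=28 k=1: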

$ m=28 k=1 versus m=32 k=1
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%   9.7ms ±16%  -80.17%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  48.4ms ± 3%     ~     (p=0.436 n=10+10)

$ m=28 k=1 versus m=24 k=1
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%  10.8ms ±13%  -77.90%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  47.9ms ± 3%     ~     (p=0.684 n=10+10)

$ m=28 k=1 versus m=28 k=2
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%  24.9ms ± 5%  -49.27%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  48.0ms ± 4%     ~     (p=1.000 n=10+10)

`k=2` outright wrecks the performance compared to `k=1` (24.9ms versus
9.9ms). This is most likely because it performs worse on longer index
entry chains, which also happen to be the most expensive ones to process.
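The textbook bloom filter estimate (1 - (1 - 1/m)^(kn))^k, which assumes independent bit positions, offers one way to see why k=2 loses on long chains. Here n is taken to be the number of entries covered by an embedded filter, i.e. the remaining chain length; that mapping is our assumption, and falsePositiveRate is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate is the standard bloom filter estimate for a filter of
// m bits holding n keys, where each key sets k bits.
func falsePositiveRate(m, k, n int) float64 {
	pUnset := math.Pow(1-1/float64(m), float64(k*n))
	return math.Pow(1-pUnset, float64(k))
}

func main() {
	for _, n := range []int{1, 5, 10, 20, 40} {
		fmt.Printf("n=%2d  k=1: %.3f  k=2: %.3f\n", n,
			falsePositiveRate(28, 1, n), falsePositiveRate(28, 2, n))
	}
}
```

With m=28, k=2 gives the lower estimate for short chains, but from about 14 entries on k=1 does, which fits the explanation above.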

`m=32` yields only diminishing returns, while bringing the size limit
to within an order of magnitude of the largest known restic repositories.

Design alternatives:

In principle, it would be possible to use a single large bloom filter
instead of embedding filters in the index entry ids. However, such a
bloom filter would necessarily incur additional random memory accesses
and thus slow things down overall.
2026-05-10 00:35:17 +02:00
Michael Eischer 320f709fbc index: modernize masterindex tests
`b.Loop()` drastically shortens benchmark execution times for tests with
an expensive initialization phase, as the initialization now only has to
run once.
2026-05-10 00:35:17 +02:00
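A minimal benchmark shape using b.Loop, which is available since Go 1.24. Per the Go 1.24 release notes, a benchmark using b.Loop executes its function only once per -count, so setup placed before the loop runs once and is excluded from the measurement. buildIndex is a hypothetical stand-in for an expensive initialization; testing.Benchmark is only used so the example runs outside go test, while in a _test.go file this would be a regular BenchmarkXxx function.

```go
package main

import (
	"fmt"
	"testing"
)

// buildIndex is a hypothetical stand-in for an expensive setup phase.
func buildIndex(n int) map[int]int {
	m := make(map[int]int, n)
	for i := 0; i < n; i++ {
		m[i] = i * 2
	}
	return m
}

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		idx := buildIndex(100_000) // setup: not part of the measured loop
		for b.Loop() {
			_ = idx[42]
		}
	})
	fmt.Println(res.N > 0)
}
```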
Michael Eischer e33ed5d0c1 index: make tests more representative 2026-05-10 00:35:17 +02:00
Michael Eischer 4c0dc9e202 index: support incremental index loading
Do not require a full index reload if only a few additional index files
have been added. This can drastically speed up loading the index in the
mount command.
2026-05-07 22:52:03 +02:00
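A sketch of the decision an incremental loader has to make, using plain string ids for index files. planIndexLoad is hypothetical, and falling back to a full reload when a previously loaded index file has disappeared is our assumption; the commit message only says that added files no longer force a full reload.

```go
package main

import "fmt"

// planIndexLoad (hypothetical) compares the index files loaded so far with
// the files currently in the repository. If a loaded file has disappeared,
// it falls back to a full reload of all current files (an assumption);
// otherwise only the newly added files need to be read.
func planIndexLoad(loaded map[string]bool, current []string) (toLoad []string, full bool) {
	present := make(map[string]bool, len(current))
	for _, id := range current {
		present[id] = true
		if !loaded[id] {
			toLoad = append(toLoad, id)
		}
	}
	for id := range loaded {
		if !present[id] {
			return current, true
		}
	}
	return toLoad, false
}

func main() {
	loaded := map[string]bool{"a": true, "b": true}

	toLoad, full := planIndexLoad(loaded, []string{"a", "b", "c"})
	fmt.Println(toLoad, full) // [c] false

	toLoad, full = planIndexLoad(loaded, []string{"b", "c"})
	fmt.Println(toLoad, full) // [b c] true
}
```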
Michael Eischer 5cc8636047 Merge pull request #5614 from MichaelEischer/fix-lookupblobsize
repository: fix LookupBlobSize to also return pending blobs
2025-11-26 21:24:32 +01:00
Michael Eischer 0f05277b47 index: add sub and intersect method to AssociatedSet 2025-11-26 20:59:08 +01:00
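A simplified generic sketch of what sub and intersect operations on such a set could look like. The real AssociatedSet is not shown in this log; the map-backed type, the method names and signatures, and the in-place semantics are all assumptions.

```go
package main

import "fmt"

// AssociatedSet is a simplified, hypothetical stand-in: a set of keys where
// each key carries an associated value.
type AssociatedSet[K comparable, V any] struct {
	m map[K]V
}

func NewAssociatedSet[K comparable, V any]() *AssociatedSet[K, V] {
	return &AssociatedSet[K, V]{m: make(map[K]V)}
}

func (s *AssociatedSet[K, V]) Set(k K, v V) {
	s.m[k] = v
}

func (s *AssociatedSet[K, V]) Has(k K) bool {
	_, ok := s.m[k]
	return ok
}

func (s *AssociatedSet[K, V]) Len() int {
	return len(s.m)
}

// Sub removes every key that is also contained in other.
func (s *AssociatedSet[K, V]) Sub(other *AssociatedSet[K, V]) {
	for k := range other.m {
		delete(s.m, k)
	}
}

// Intersect keeps only the keys that are also contained in other.
// Deleting from a map while ranging over it is allowed in Go.
func (s *AssociatedSet[K, V]) Intersect(other *AssociatedSet[K, V]) {
	for k := range s.m {
		if !other.Has(k) {
			delete(s.m, k)
		}
	}
}

func main() {
	a := NewAssociatedSet[string, int]()
	a.Set("x", 1)
	a.Set("y", 2)
	c := NewAssociatedSet[string, int]()
	c.Set("x", 1)
	c.Set("y", 2)
	b := NewAssociatedSet[string, int]()
	b.Set("y", 0)

	a.Sub(b)       // a = {x}
	c.Intersect(b) // c = {y}
	fmt.Println(a.Has("x"), a.Has("y"), c.Has("x"), c.Has("y")) // true false false true
}
```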
Michael Eischer f1aabdd293 index: add test for pending blobs 2025-11-23 18:08:56 +01:00
Michael Eischer 50d376c543 repository: fix LookupBlobSize to also report pending blobs 2025-11-23 17:55:13 +01:00
Michael Eischer 405813f250 repository: fix LookupBlobSize to also report pending blobs 2025-11-23 17:09:07 +01:00
Michael Eischer b587c126e0 Fix linter warning 2025-11-16 12:56:37 +01:00
Michael Eischer 9944ef7a7c index: convert AssociatedSet to go iterators 2025-11-16 12:56:37 +01:00
Michael Eischer 38c543457e index: convert to implement modern go iterators 2025-11-16 12:56:37 +01:00
Michael Eischer f0955fa931 repository: add Checker() method to repository to replace unchecked cast 2025-10-03 19:34:33 +02:00
Michael Eischer 189b295c30 repository: add dedicated test helper 2025-10-03 19:34:33 +02:00
Michael Eischer 56ac8360c7 data: split node and snapshot code from restic package 2025-10-03 19:10:39 +02:00
Michael Eischer 1c7bb15327 Merge pull request #5451 from greatroar/concurrency
Concurrency simplifications
2025-09-24 22:22:40 +02:00
Michael Eischer 88bdf20bd8 Reduce linter ignores 2025-09-21 22:24:27 +02:00
greatroar 2c39b1f84f internal/repository/index: Simplify MasterIndex concurrency 2025-07-18 15:06:37 +02:00
Martin Smith 6e45c51509 Fix name including package name and variable shadowing package. 2025-03-23 10:01:19 +00:00
Martin Smith 3788605127 Rename unused parameters to '_'. 2025-03-22 18:20:30 +00:00
Michael Eischer 39e63ee4e3 index: add tests for oversized index handling 2025-02-16 17:42:00 +01:00
Michael Eischer 3b8d15d651 index: rewrite oversized indexes 2025-02-16 17:03:14 +01:00
Michael Eischer 2fd8a3865c index: automatically write full indexes in StorePack 2025-02-16 16:39:38 +01:00
Michael Eischer 99e105eeb6 repository: restrict SaveUnpacked and RemoveUnpacked
Those methods now only allow modifying snapshots. Internal data types
used by the repository are now read-only. The repository-internal code
can bypass the restrictions by wrapping the repository in an
`internalRepository` type.

The restriction itself is implemented by using a new datatype
WriteableFileType in the SaveUnpacked and RemoveUnpacked methods. This
statically ensures that code cannot bypass the access restrictions.

The test changes are somewhat noisy as some of them modify repository
internals and therefore require some way to bypass the access
restrictions. This works by capturing an `internalRepository` or
`Backend` when creating the Repository using a test helper function.
2025-01-13 22:39:57 +01:00
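One way to get such a static restriction in Go, sketched with simplified hypothetical types. Here WriteableFileType is a struct with an unexported field, so outside the defining package the only obtainable value is the exported WriteableSnapshotFile. Whether restic defines WriteableFileType this way is not stated in the commit message; SaveUnpacked and internalRepository mirror the names above, but their signatures and the in-memory storage are assumptions.

```go
package main

import "fmt"

// FileType enumerates repository file types (simplified, hypothetical).
type FileType int

const (
	SnapshotFile FileType = iota
	IndexFile
	PackFile
)

// WriteableFileType wraps a FileType in an unexported field, so other
// packages cannot construct values beyond the ones exported here.
type WriteableFileType struct{ t FileType }

var WriteableSnapshotFile = WriteableFileType{SnapshotFile}

// Repository stores unpacked files in memory for this sketch.
type Repository struct {
	files map[FileType][]string
}

// SaveUnpacked is the public API: it only accepts a WriteableFileType.
func (r *Repository) SaveUnpacked(t WriteableFileType, data string) {
	r.files[t.t] = append(r.files[t.t], data)
}

// internalRepository wraps Repository for repository-internal code. Its
// SaveUnpacked shadows the promoted method and accepts any FileType.
type internalRepository struct{ *Repository }

func (r internalRepository) SaveUnpacked(t FileType, data string) {
	r.files[t] = append(r.files[t], data)
}

func main() {
	repo := &Repository{files: map[FileType][]string{}}
	repo.SaveUnpacked(WriteableSnapshotFile, "snapshot")
	// repo.SaveUnpacked(IndexFile, "index") // does not compile: FileType is not WriteableFileType
	internalRepository{repo}.SaveUnpacked(IndexFile, "index")
	fmt.Println(len(repo.files[SnapshotFile]), len(repo.files[IndexFile])) // 1 1
}
```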
greatroar b5c28a7ba2 internal/restic: Use IDSet.Clone + use maps package
One place where IDSet.Clone is useful was reinventing it, using a
conversion to list, a sort, and a conversion back to map.

Also, use the stdlib "maps" package to implement as much of IDSet as
possible. This requires changing one caller, which assumed that cloning
nil would return a non-nil IDSet.
2024-10-03 21:14:29 +02:00
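The nil caveat comes from the standard library: maps.Clone returns nil for a nil map. A minimal sketch with a simplified IDSet (the ID type here is an assumption): since writing to a nil map panics, a caller that clones a possibly nil set and then inserts into the result has to handle that case.

```go
package main

import (
	"fmt"
	"maps"
)

// ID stands in for a 32-byte id.
type ID [32]byte

// IDSet is a simplified set of ids.
type IDSet map[ID]struct{}

// Clone delegates to the standard library. maps.Clone returns nil for a
// nil map, so cloning a nil IDSet yields a nil IDSet.
func (s IDSet) Clone() IDSet {
	return maps.Clone(s)
}

func main() {
	var empty IDSet
	fmt.Println(empty.Clone() == nil) // true

	s := IDSet{{1}: {}}
	c := s.Clone()
	delete(c, ID{1})
	fmt.Println(len(s), len(c)) // 1 0
}
```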
Michael Eischer 943b6ccfba index: remove support for legacy index format 2024-08-31 17:12:43 +02:00
Srigovind Nayak 068d5b95c3 rewrite: skip saving empty indexes during MasterIndex.Rewrite 2024-08-03 23:34:59 +05:30
Viktor Szépe ac00229386 Fix typos 2024-07-03 20:02:06 +02:00
Michael Eischer 50ec408302 index: move to repository package 2024-05-25 13:13:03 +02:00