Commit Graph

2641 Commits

Author SHA1 Message Date
Michael Eischer 5c935e71fa index: also preallocate hashed array tree 2026-05-10 00:35:17 +02:00
Michael Eischer 934c615e51 index: support index preallocation 2026-05-10 00:35:17 +02:00
Michael Eischer ba638b6602 indexmap: use bloom filter to drastically speed up check for unknown blobs
Only in use on 64-bit systems. Use the upper 28bits of the id of an
index entry as bloom filter. This allows skipping the index entry
traversal most of the time if an id is not stored in the hashmap.

The bloom filter embedded in the index entry id is check each time
before following a reference to an index entry. This further reduces
the risk of false positives. The bloom filter itself is basically for
free on modern CPUs.

The main performance cost of checking for unknown blobs in the index are
the essentially random RAM accesses for the initial bucket lookup as
well as following the next pointer in the index entries. With the bloom
filter most of the time only the initial bucket lookup is necessary.

This speeds up checking for unknown blobs by a factor 5 (!), while
having no effect on the lookup of known blobs:

$ benchstat no-bloom with-bloom
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%   9.9ms ± 7%  -79.70%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  47.9ms ± 3%     ~     (p=0.968 n=10+9)

This bloom filter parameters m=28 k=1 were derived empirically, while
also leaving sufficient room for very large repositories. Before this
commit, the final merge index step took roughly 1 second per million
index entries. With the chosen bloom filter parameters, it would
currently take 19 hours to just merge such an index. It is safe to
assume that such large repositories don't exist.

Comparison with other parameter sets:

$ m=28 k=1 versus m=32 k=1
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%   9.7ms ±16%  -80.17%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  48.4ms ± 3%     ~     (p=0.436 n=10+10)

$ m=28 k=1 versus m=24 k=1
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%  10.8ms ±13%  -77.90%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  47.9ms ± 3%     ~     (p=0.684 n=10+10)

$ m=28 k=1 versus m=28 k=2
name                old time/op  new time/op  delta
IndexHasUnknown-16  49.0ms ± 2%  24.9ms ± 5%  -49.27%  (p=0.000 n=10+10)
IndexHasKnown-16    48.0ms ± 3%  48.0ms ± 4%     ~     (p=1.000 n=10+10)

`k=2` outright wrecks the performance. This is most likely the case as
it performs worse on longer index entry chains, which also happen to be
the expensive ones to process.

`m=32` yields diminishing returns, while getting within an order of
magnitude of the largest known restic repositories.

Design alternatives:

In principle it would be possible to add a single large bloom filter
instead of embedding them in the index entry ids. However, this bloom
filter would necessarily incur additional random memory accesses and
thus slow things down overall.
2026-05-10 00:35:17 +02:00
Michael Eischer 320f709fbc index: modernize masterindex tests
`b.Loop()` drastically shortens benchmark execution times for tests with
an expensive initialization phase as it only has to happen once now.
2026-05-10 00:35:17 +02:00
Michael Eischer e33ed5d0c1 index: make tests more representative 2026-05-10 00:35:17 +02:00
Michael Eischer 39084a912e Merge pull request #5700 from MichaelEischer/err-invalid-env 2026-05-10 00:18:40 +02:00
Michael Eischer 4c0dc9e202 index: support incremental index loading
Do not require a full index reload if only a few additional index files
have been added. This can drastically speed up loading the index in the
mount command.
2026-05-07 22:52:03 +02:00
Michael Eischer 7077500a3b Have backup -vv mention compressed size of added files (#5669)
ui: mention compressed size of added files in `backup -vv`

This is already shown for modified files, but the added files message
wasn't updated when compression was implemented in restic.

Co-authored-by: Ilya Grigoriev <ilyagr@users.noreply.github.com>
2026-02-18 21:24:29 +01:00
Michael Eischer d1937a530b clarify pack ID in decryption error (#5710)
pack ID is included in full. In addition, the error message now says
that it is a pack file.
2026-02-18 20:43:10 +01:00
Michael Eischer 5be6d9c73f fail of RESTIC_READ_CONCURRENCY or RESTIC_COMPRESSION are invalid 2026-02-01 15:57:07 +01:00
gunar 7101f11133 Fail fast for invalid RESTIC_PACK_SIZE env values (#5592)
Co-authored-by: Michael Eischer <michael.eischer@fau.de>
2026-02-01 15:45:31 +01:00
Michael Eischer 67c13c643d Merge pull request #5691 from MichaelEischer/fix-rewriter-error 2026-02-01 12:09:52 +01:00
Michael Eischer ef1d525f22 rewriter: return correct error if tree iteration fails 2026-01-31 22:07:07 +01:00
Michael Eischer 74d60ad223 rewriter: test KeepEmptyDirectory option 2026-01-31 22:01:23 +01:00
Michael Eischer 0d71f70a22 minor cleanups and typos 2026-01-31 22:01:23 +01:00
Winfried Plappert ee154ce0ab restic rewrite --include
added function Count() to the *TreeWriter methods
2026-01-31 19:58:29 +00:00
Winfried Plappert 5148608c39 restic rewrite include - based on restic 0.18.1
cmd/restic/cmd_rewrite.go:
introduction of include filters for this command:
- add include filters, add error checking code
- add new parameter 'keepEmptyDirectoryFunc' to 'walker.NewSnapshotSizeRewriter()',
  so empty directories have to be kept to keep the directory structure intact
- add parameter 'keepEmptySnapshot' to 'filterAndReplaceSnapshot()' to keep snapshots
  intact when nothing is to be included
- introduce helper function 'gatherIncludeFilters()' and 'gatherExcludeFilters()' to
  keep code flow clean

cmd/restic/cmd_rewrite_integration_test.go:
add several new tests around the 'include' functionality

internal/filter/include.go:
this is where is include filter is defined

internal/walker/rewriter.go:
- struct RewriteOpts gains field 'KeepEmtpyDirectory', which is a 'NodeKeepEmptyDirectoryFunc()'
  which defaults to nil, so that al subdirectories are kept
- function 'NewSnapshotSizeRewriter()' gains the parameter 'keepEmptyDirecoryFilter' which
  controls the management of empty subdirectories in case of include filters active

internal/data/tree.go:
gains a function Count() for checking the number if node elements in a newly built tree

internal/walker/rewriter_test.go:
function 'NewSnapshotSizeRewriter()' gets an additional parameter nil to keeps things happy

cmd/restic/cmd_repair_snapshots.go:
function 'filterAndReplaceSnapshot()' gets an additional parameter 'keepEmptySnapshot=nil'

doc/045_working_with_repos.rst:
gets to mention include filters

changelog/unreleased/issue-4278:
the usual announcement file

git rebase master -i produced this

restic rewrite include - keep linter happy

cmd/restic/cmd_rewrite_integration_test.go:
linter likes strings.Contain() better than my strings.Index() >= 0
2026-01-31 19:42:56 +00:00
Michael Eischer ce7c144aac data: add support for unknown keys to treeIterator
While not planned, it's also not completely impossible that a tree node
might get additional top-level fields. As the tree iterator is built
with a strict expectation of the top-level fields, this would result in
a parsing error. Future-proof the code by simply skipping unknown
fields.
2026-01-31 20:03:38 +01:00
Michael Eischer 81948937ca data: test DualTreeIterator 2026-01-31 20:03:38 +01:00
Michael Eischer fa8889eec4 data: test LoadTree+SaveTree cycle 2026-01-31 20:03:38 +01:00
Michael Eischer 6de64911fb data: test TreeFinder 2026-01-31 20:03:38 +01:00
Michael Eischer 17688c2313 data: move TestTreeMap to data package to allow reuse 2026-01-31 20:03:38 +01:00
Michael Eischer e1a5550a27 test: use generics in Equal function signature
This simplifies comparing a typed value against nil. Previously it was
necessary to case nil into the proper type.
2026-01-31 20:03:38 +01:00
Michael Eischer 24d56fe2a6 diff: switch to efficient DualTreeIterator
The previous implementation stored the whole tree in a map and used it
for checking overlap between trees. This is now replaced with the
DualTreeIterator, which iterates over two trees in parallel and returns
the merge stream in order. In case of overlap between both trees, it
returns both nodes at the same time. Otherwise, only a single node is
returned.
2026-01-31 20:03:38 +01:00
Michael Eischer 350f29d921 data: replace Tree with TreeNodeIterator
The TreeNodeIterator decodes nodes while iterating over a tree blob.
This should reduce peak memory usage as now only the serialized tree
blob and a single node have to alive at the same time. Using the
iterator has implications for the error handling however. Now it is
necessary that all loops that iterate through a tree check for errors
before using the node returned by the iterator.

The other change is that it is no longer possible to iterate over a tree
multiple times. Instead it must be loaded a second time. This only
affects the tree rewriting code.
2026-01-31 20:03:38 +01:00
Michael Eischer 1e183509d4 data: rework StreamTrees to use synchronous callbacks
The tree.Nodes will be replaced by an iterator to loads and serializes
tree node ondemand. Thus, the processing moves from StreamTrees into the
callback. Schedule them onto the workers used by StreamTrees for proper
load distribution.
2026-01-31 20:03:38 +01:00
Michael Eischer 25a5aa3520 dump: fix missing error handling if tree cannot be read 2026-01-31 19:18:36 +01:00
Michael Eischer 278e457e1f data: use data.TreeWriter to serialize&write data.Tree
Always serialize trees via TreeJSONBuilder. Add a wrapper called
TreeWriter which combines serialization and saving the tree blob in the
repository. In the future, TreeJSONBuilder will have to upload tree
chunks while the tree is still serialized. This will a wrapper like
TreeWriter, so add it right now already.

The archiver.treeSaver still directly uses the TreeJSONBuilder as it
requires special handling.
2026-01-31 19:18:36 +01:00
Michael Eischer f84d398989 repository: prevent test deadlock within WithBlobUploader
Calling t.Fatal internally triggers runtime.Goexit . This kills the
current goroutine while only running deferred code. Add an extra context
that gets canceled if the go routine exits while within the user
provided callback.
2026-01-31 19:18:36 +01:00
Michael Eischer d82ea53735 data: fix invalid trees used in test cases
data.TestCreateSnapshot which is used in particular by TestFindUsedBlobs
and TestFindUsedBlobs could generate trees with duplicate file names.
This is invalid and going forward will result in an error.
2026-01-31 19:18:36 +01:00
Michael Eischer 880b08f9ec Merge pull request #5627 from MichaelEischer/faster-files-writer
restore: tune fileswriter
2026-01-26 21:45:49 +01:00
Ilya Grigoriev 79c37f3d1a ui: mention compressed size of added files in backup -vv
This is already shown for modified files, but the added files message
wasn't updated when compression was implemented in restic.
2026-01-15 18:39:16 -08:00
Michael Eischer ebc51e60c9 Merge pull request #5626 from MichaelEischer/lazy-status
ui: only redraw status bar if it has not changed
2025-12-03 21:29:35 +01:00
Michael Eischer 1e6ed458ff remove old // +build comments 2025-11-30 11:53:23 +01:00
Michael Eischer 760d0220f4 restorer: scale file cache with workers count 2025-11-30 11:01:01 +01:00
Michael Eischer 24fcfeafcb restore: cache file descriptors
This avoid opening and closing files after each single blob write
2025-11-30 10:56:15 +01:00
Michael Eischer 0ee9360f3e restore: reduce contention while writing files 2025-11-29 23:09:04 +01:00
Michael Eischer ae6d6bd9a6 ui: only redraw status bar if it has not changed 2025-11-29 22:09:41 +01:00
Aneesh N b9afdf795e Fix: Correctly restore ACL inheritance state (#5465)
* Fix: Correctly restore ACL inheritance state

When restoring a file or directory on Windows, the `IsInherited` property of its Access Control Entries (ACEs) was always being set to `False`, even if the ACEs were inherited in the original backup.

This was caused by the restore process calling the `SetNamedSecurityInfo` API without providing context about the object's inheritance policy. By default, this API applies the provided Discretionary Access Control List (DACL) as an explicit set of permissions, thereby losing the original inheritance state.

This commit fixes the issue by inspecting the `Control` flags of the saved Security Descriptor during restore. Based on whether the `SE_DACL_PROTECTED` flag is present, the code now adds the appropriate `PROTECTED_DACL_SECURITY_INFORMATION` or `UNPROTECTED_DACL_SECURITY_INFORMATION` flag to the `SetNamedSecurityInfo` API call.

By providing this crucial inheritance context, the Windows API can now correctly reconstruct the ACL, ensuring the `IsInherited` status of each ACE is preserved as it was at the time of backup.

* Fix: Correctly restore ACL inheritance flags

This commit resolves an issue where the ACL inheritance state (`IsInherited` property) was not being correctly restored for files and directories on Windows.

The root cause was that the `SECURITY_INFORMATION` flags used in the `SetNamedSecurityInfo` API call contained both the `PROTECTED_DACL_SECURITY_INFORMATION` and `UNPROTECTED_DACL_SECURITY_INFORMATION` flags simultaneously. When faced with this conflicting information, the Windows API defaulted to the more restrictive `PROTECTED` behavior, incorrectly disabling inheritance on restored items.

The fix modifies the `setNamedSecurityInfoHigh` function to first clear all existing inheritance-related flags from the `securityInfo` bitmask. It then adds the single, correct flag (`PROTECTED` or `UNPROTECTED`) based on the `SE_DACL_PROTECTED` control bit from the original, saved Security Descriptor.

This ensures that the API receives unambiguous instructions, allowing it to correctly preserve the inheritance state as it was at the time of backup. The accompanying test case for ACL inheritance now passes with this change.

* Fix inheritance flag handling in low-privilege security descriptor restore

When restoring files without admin privileges, the IsInherited property
of Access Control Entries (ACEs) was not being preserved correctly.
The low-privilege restore path (setNamedSecurityInfoLow) was using a
static PROTECTED_DACL_SECURITY_INFORMATION flag, which always marked
the restored DACL as explicitly set rather than inherited.

This commit updates setNamedSecurityInfoLow to dynamically determine
the correct inheritance flag based on the SE_DACL_PROTECTED control
flag from the original security descriptor, matching the behavior of
the high-privilege path (setNamedSecurityInfoHigh).

Changes:
- Update setNamedSecurityInfoLow to accept control flags parameter
- Add logic to set either PROTECTED_DACL_SECURITY_INFORMATION or
  UNPROTECTED_DACL_SECURITY_INFORMATION based on the original SD
- Add TestRestoreSecurityDescriptorInheritanceLowPrivilege to verify
  inheritance is correctly restored in low-privilege scenarios

This ensures that both admin and non-admin restore operations correctly
preserve the inheritance state of ACLs, maintaining the original
permissions flow on child objects.

Addresses review feedback on PR for issue #5427

* Refactor security flags into separate backup/restore variants

Split highSecurityFlags into highBackupSecurityFlags and
highRestoreSecurityFlags to avoid runtime bitwise operations.
This makes the code cleaner and more maintainable by using
appropriate flags for GET vs SET operations.

Addresses review feedback on PR for issue #5427

---------

Co-authored-by: Aneesh Nireshwalia <anireshw@akamai.com>
2025-11-28 19:22:47 +00:00
Winfried Plappert ce57961f14 restic check with snapshot filters (#5469)
---------

Co-authored-by: Michael Eischer <michael.eischer@fau.de>
2025-11-28 19:12:38 +00:00
Michael Eischer f3a89bfff6 Merge pull request #5612 from MichaelEischer/repository-async-saveblob
repository: add async blob upload method
2025-11-26 21:34:35 +01:00
Michael Eischer 5cc8636047 Merge pull request #5614 from MichaelEischer/fix-lookupblobsize
repository: fix LookupBlobSize to also return pending blobs
2025-11-26 21:24:32 +01:00
Michael Eischer 6769d26068 archiver: improve test reliability 2025-11-26 21:21:16 +01:00
Michael Eischer 5607fd759f repository: fix race condition for blobSaver shutdown
wg.Go() may not be called after wg.Wait(). This prevents connecting two
errgroups such that the errors are propagated between them if the child
errgroup dynamically starts goroutines. Instead use just a single errgroup,
and sequence the shutdown using a sync.WaitGroup. This is far simpler
and does not require any "clever" tricks.
2025-11-26 21:18:22 +01:00
Michael Eischer 9f87e9096a repository: add tests for SaveBlobAsync 2025-11-26 21:18:22 +01:00
Michael Eischer d8dcd6d115 archiver: add buffer test 2025-11-26 21:18:22 +01:00
Michael Eischer 3f92987974 archiver: assert number of uploaded chunks in fileSaver test 2025-11-26 21:18:22 +01:00
Michael Eischer 7f6fdcc52c archiver: convert buffer pool to use sync.Pool 2025-11-26 21:18:22 +01:00
Michael Eischer dd6cb0dd8e archiver: port to repository.SaveBlobAsync 2025-11-26 21:18:22 +01:00
Michael Eischer 046b0e711d repository: add SaveBlobAsync method 2025-11-26 21:18:21 +01:00