The snapshot filtering internally converts relative paths to absolute
ones to ensure that the parent snapshots selection works for backups of
relative paths.
Since 0.15 (#4020), inodes are generated as hashes of names, xor'd with
the parent inode. That means that the inode of a/b/b is
h(a/b/b) = h(a) ^ h(b) ^ h(b) = h(a).
I.e., the grandchild has the same inode as the grandparent. GNU find
trips over this because it thinks it has encountered a loop in the
filesystem, and fails to search a/b/b. This happens more generally when
the same name occurs an even number of times.
Fix this by multiplying the parent by a large prime, so the combining
operation is not longer symmetric in its arguments. This is what the FNV
hash does, which we used prior to 0.15. The hash is now
h(a/b/b) = h(b) ^ p*(h(b) ^ p*h(a))
Note that we already ensure that h(x) is never zero.
Collisions can still occur, but they should be much less likely to occur
within a single path.
Fixes#4253.
This turns snapshotFilterOptions from cmd into a restic.SnapshotFilter
type and makes restic.FindFilteredSnapshot and FindFilteredSnapshots
methods on that type. This fixes#4211 by ensuring that hosts and paths
are named struct fields instead of unnamed function arguments in long
lists of such.
Timestamp limits are also included in the new type. To avoid too much
pointer handling, the convention is that time zero means no limit.
That's January 1st, year 1, 00:00 UTC, which is so unlikely a date that
we can sacrifice it for simpler code.
The output is now
```
-v, --verbose be verbose (specify multiple times or a level using --verbose=n, max level/times is 2)
```
instead of
```
-v, --verbose n be verbose (specify multiple times or a level using --verbose=n, max level/times is 2)
```
With debug logging enabled this method would take a lock and then
format the lock as a string. Since PR #4022 landed the string
formatting method has also taken the lock, so this deadlocks.
Instead just record the lock ID, as is done elsewhere.
The code always assumed that the upgrade happens in place. Thus writing
the upgrade to a separate file fails, when trying to remove the file
stored at that location.
Added missing call to scanFinished=true.
This was causing the percent and eta to never get
printed for backup progress even after the scan was finished.
The retry backend does not return the original error, if its execution
is interrupted by canceling the context. Thus, we have to manually
ensure that the invalid data error gets returned.
Additionally, use the retry backend for some of the repository tests, as
this is the configuration which will be used by restic.
The retry printed the filename twice:
```
Load(<lock/04804cba82>, 0, 0) returned error, retrying after 720.254544ms: load(<lock/04804cba82>): invalid data returned
```
now the warning has changed to
```
Load(<lock/04804cba82>, 0, 0) returned error, retrying after 720.254544ms: invalid data returned
```
The StdioWrapper is not used at all by the ProgressPrinters. It is
called a bit earlier than previously. However, as the password prompt
directly accessed stdin/stdout this doesn't cause problems.
The maximum for `--verbose=n` is n=2. Internally it is translated into a
scale from 0 to 3. However, the default (without verbose) is 1, thus the
verbosity level can only be increased two times.
Only the repacking of *un*compressed packs reduces the amount of
uncompressed data. Previously the counter even overflowed for fully
compressed repositories.
Commands should use the normal shutdown path. In addition, the Exitf
function was only used by `dump` and `restore` but not any other command
which introduces the risk of inconsistent behavior.
When reporting an error for a tree, the output message can overlap with
the progress bar output, e.g. `error for tree e91ef6fb:napshots`.
The fix only applies for this specific message and does not work on
Windows.
Revert what seems to be a typo introduced as part of the fix for #2041
in 2018 7d0f2eaf24.
`xbuild` does not look like a go build/tag keyword to me, I failed to
find documentation for it and using `go install -tags '!selfupdate' ...`
has no effect, i.e. self-update code is still compiled.
`+build` however works; updating the OpenBSD port/binary package
security/restic to apply this PR works as expected:
```
$ restic help | grep self
$ restic self-update
unknown command "self-update" for "restic"
```
(Using `go:build` now as per restic's style and gofmt.)
Previously, using `restic-0.14.0p1` on OpenBSD/amd64 7.2-current would
check for a newer version and probably attempt replacing the system wide
root-owned executable (on a read-only filesystem) as unprivileged user:
```
$ restic version
restic 0.14.0 compiled with go1.19.2 on openbsd/amd64
$ restic help | grep self
self-update Update the restic binary
$ restic self-update
writing restic to /usr/local/bin/restic
find latest release of restic at GitHub
restic is up to date
```
(It never tried to actually write besaid path; doing so would fail, so
the current message can be considered misleading.)
Mostly changed the ones that repeat the name of a system call, which is
already contained in os.PathError.Op. internal/fs.Reader had to be
changed to actually return such errors.
The scanner process has only cosmetic effect for the progress printer,
and can be disabled without impacting functionality when the user does
not need an estimate of completion.
In many cases the scanner process can provide beneficial priming of
the file system cache, so as general advice it should not be disabled.
However, tests have shown that backup of NFS and fuse based filesystems,
where stat(2) is relatively expensive, can be significantly faster
without the scanner.
TestRepository and its variants always returned no-op cleanup functions.
If they ever do need to do cleanup, using testing.T.Cleanup is easier
than passing these functions around.
The Original field is meant to remember the original snapshot id if e.g.
changing its tags. It was only set by the `rewrite` command if it was
not set previously. However, a rewritten snapshot is potentially rather
different from the original snapshot. Thus just always set the Original
field. This also makes it easier to later on detect and potentially
remove the original snapshots.
We now check for space that is not reserved for the root user on the
remote, and the check is no longer in a defer block because it wouldn't
fire. Some change in the surrounding code may have led the deferred
function to capture the wrong err variable.
Fixes#3336.
IDs.Less can be rewritten as
string(list[i][:]) < string(list[j][:])
Note that this does not copy the ID's.
The Uniq method was no longer used.
The String method has been reimplemented without first copying into a
separate slice of a custom type.
The Test method was only used in exactly one place, namely when trying
to create a new repository it was used to check whether a config file
already exists.
Use a combination of Stat() and IsNotExist() instead.
Since #3940 the rclone backend returns the commands exit code if it
fails to start. The list of expected errors was missing the "file
already closed"-error which can occur if the http test request first
learns about the closed pipe to rclone before noticing the canceled
context.
Go internally makes sure that a file descriptor is unusable once it was
closed, thus this cannot have unintended side effects (like accidentally
reading from the wrong file due to a reused file descriptor).
The ioutil functions are deprecated since Go 1.17 and only wrap another
library function. Thus directly call the underlying function.
This commit only mechanically replaces the function calls.
Without comma-ok, the runtime inserts the same check with a similar
enough panic message:
interface conversion: interface {} is nil, not *syscall.Stat_t
The new genericized LRU cache no longer needs to have the IDs separately
allocated:
name old time/op new time/op delta
Add-8 494ns ± 2% 388ns ± 2% -21.46% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
Add-8 176B ± 0% 152B ± 0% -13.64% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Add-8 5.00 ± 0% 3.00 ± 0% -40.00% (p=0.000 n=10+10)
Hard links to the same file now get the same inode within the FUSE
mount. Also, inode generation is faster and, more importantly, no longer
allocates.
Benchmarked on Linux/amd64. Old means the benchmark with
sink = fs.GenerateDynamicInode(1, sub.node.Name)
instead of calling inodeFromNode. Results:
name old time/op new time/op delta
Inode/no_hard_links-8 137ns ± 4% 34ns ± 1% -75.20% (p=0.000 n=10+10)
Inode/hard_link-8 33.6ns ± 1% 9.5ns ± 0% -71.82% (p=0.000 n=9+8)
name old alloc/op new alloc/op delta
Inode/no_hard_links-8 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
Inode/hard_link-8 0.00B 0.00B ~ (all equal)
name old allocs/op new allocs/op delta
Inode/no_hard_links-8 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
Inode/hard_link-8 0.00 0.00 ~ (all equal)
The old check did not consider files containing case insensitive
excludes. The check is now implemented as a function of the
excludePatternOptions struct to improve cohesion.
In principle, the JSON format of Tree objects is extensible without
requiring a format change. In order to not loose information just play
it safe and reject rewriting trees for which we could loose data.
The lock test creates a lock and checks that it is not stale. However,
it is possible that the lock is refreshed concurrently, which updates
the lock timestamp. Checking the timestamp in `Stale()` without
synchronization results in a data race. Thus add a lock to prevent
concurrent accesses.
The lock test creates a lock and checks that it is not stale. This also
tests whether the corresponding process still exists. However, it is
possible that the lock is refreshed concurrently, which updates the lock
timestamp. Calling `processExists()` with a value receiver, however,
creates an unsynchronized copy of this field. Thus call the method using
a pointer receiver.
The test did not wait for the mount command to fully shutdown all
running goroutines. This caused the go race detector to report a data
race related to lock refreshes.
==================
WARNING: DATA RACE
Write at 0x0000021bdfdb by goroutine 667:
github.com/restic/restic/internal/backend/retry.TestFastRetries()
/restic/restic/internal/backend/retry/testing.go:7 +0x18f
github.com/restic/restic/cmd/restic.withTestEnvironment()
/restic/restic/cmd/restic/integration_helpers_test.go:175 +0x183
github.com/restic/restic/cmd/restic.TestMountSameTimestamps()
/restic/restic/cmd/restic/integration_fuse_test.go:202 +0xac
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1446 +0x216
testing.(*T).Run.func1()
/usr/lib/go/src/testing/testing.go:1493 +0x47
Previous read at 0x0000021bdfdb by goroutine 609:
github.com/restic/restic/internal/backend/retry.(*Backend).retry()
/restic/restic/internal/backend/retry/backend_retry.go:72 +0x9e
github.com/restic/restic/internal/backend/retry.(*Backend).Remove()
/restic/restic/internal/backend/retry/backend_retry.go:149 +0x17d
github.com/restic/restic/internal/cache.(*Backend).Remove()
/restic/restic/internal/cache/backend.go:38 +0x11d
github.com/restic/restic/internal/restic.(*Lock).Unlock()
/restic/restic/internal/restic/lock.go:190 +0x249
github.com/restic/restic/cmd/restic.refreshLocks.func1()
/restic/restic/cmd/restic/lock.go:86 +0xae
runtime.deferreturn()
/usr/lib/go/src/runtime/panic.go:476 +0x32
github.com/restic/restic/cmd/restic.lockRepository.func2()
/restic/restic/cmd/restic/lock.go:61 +0x71
[...]
Goroutine 609 (finished) created at:
github.com/restic/restic/cmd/restic.lockRepository()
/restic/restic/cmd/restic/lock.go:61 +0x488
github.com/restic/restic/cmd/restic.lockRepo()
/restic/restic/cmd/restic/lock.go:25 +0x219
github.com/restic/restic/cmd/restic.runMount()
/restic/restic/cmd/restic/cmd_mount.go:126 +0x1f8
github.com/restic/restic/cmd/restic.testRunMount()
/restic/restic/cmd/restic/integration_fuse_test.go:61 +0x1ce
github.com/restic/restic/cmd/restic.checkSnapshots.func1()
/restic/restic/cmd/restic/integration_fuse_test.go:90 +0x124
==================
In some rare cases files could be created which contain null IDs (all
zero) in their content list. This was caused by a race condition between
growing the `Content` slice and inserting the blob IDs into it. In some
cases the blob ID was written to the old slice, which a short time
afterwards was replaced with a larger copy, that did not yet contain the
blob ID.
Automatically fall back to hiding files if not authorized to permanently
delete files. This allows using restic with an append-only application
key with B2. Thus, an attacker cannot directly delete backups with the
API key used by restic.
To use this feature create an application key without the deleteFiles
capability. It is recommended to restrict the key to just one bucket.
For example using the b2 command line tool:
b2 create-key --bucket <bucketName> <keyName> listBuckets,readFiles,writeFiles,listFiles
Suggested-by: Daniel Gröber <dxld@darkboxed.org>
We previously checked whether the set of snapshots might have changed
based only on their number, which fails when as many snapshots are
forgotten as are added. Check for the SHA-256 of their id's instead.
The status bar got stuck once the first error was reported, the scanner
completed or some file was backed up. Either case sets a flag that the
scanner has started.
This flag is used to hide the progress bar until the flag is set. Due to
an inverted condition, the opposite happened and the status stopped
refreshing once the flag was set.
In addition, the scannerStarted flag was not set when the scanner just
reported progress information.
As the FileSaver is asynchronously waiting for all blobs of a file to be
stored, the number of active files is higher than the number of files
from which restic is reading concurrently. Thus to not confuse users,
only display files in the status from which restic is currently reading.
After reading and chunking all data in a file, the FutureFile still has
to wait until the FutureBlobs are completed. This was done synchronously
which results in blocking the file saver and prevents the next file from
being read.
By replacing the FutureBlob with a callback, it becomes possible to
complete the FutureFile asynchronously.
We always need both values, except in a test, so we don't need to lock
twice and risk scheduling in between.
Also, removed the resetting in Done. This copied a mutex, which isn't
allowed. Static analyzers tend to trip over that.
The channel-based algorithm had grown quite complicated. This is easier
to reason about and likely to be more performant with very many
CompleteBlob calls.
Counting the first occurrence of a duplicate blob as used and counting
all other as duplicates, independent of which instance of the blob is
kept, is only accurate if all copies of the blob have the same size. This
is no longer the case for a repository containing both compressed and
uncompressed blobs.
Thus for duplicated blobs first count all instances as duplicates and
then subtract the actually used instance later on.
As long as only a small fraction of the data in a repository is
rewritten, the keepBlobs set will be rather small after cleaning it up.
As golang maps do not shrink their memory usage, just copy the contents
over to a new map. However, only copy the map if the cleanup removed at
least half the entries.
The set covers necessary, existing and duplicate blobs. This removes the
duplicate sets used to track whether all necessary blobs also exist.
This reduces the memory usage of prune by about 20-30%.
The RetryBackend tests depend on the mock backend. When the Backend
interface is eventually split from the restic package, this will lead to
a dependency cycle between backend and backend/mock. Thus split the
RetryBackend into a separate package to avoid this problem.
Archiver.Save queries the current time multiple times. This commit
removes one of these calls as they showed up while profiling a backup of
a nearly unchanged dataset containing 3 million files.
There is no need to use a special wildcard `**` to demonstrate negative
patterns. Actually, it is both slower than the simpler variant and seems
to confuse users.
The string form was presumably useful before the introduction of
layouts, but right now it just makes call sequences and garbage
collection more expensive (the latter because every string contains
a pointer to be scanned).
if x { return true } return false => return x
fmt.Sprintf("%v", x) => fmt.Sprint(x) or x.String()
The fmt.Sprintf idiom is still used in the SecretString tests, where it
serves security hardening.
ID.UnmarshalJSON accepted non-JSON input with ' as the string delimiter.
Also, the error message for non-hex input was less informative than it
could be and it performed too many checks.
Changed ParseID to keep the error messages consistent.
The comparison of the current time and the last lock refresh were using
seconds represented as integers. As the test only waits for up to one
second, the associated number truncation can cause the test to take
longer than once second and thus to fail.
Switch to nanoseconds to avoid this problem. This also slightly speeds
up the test.
FindFilteredSnapshots no longer prints errors during snapshot loading on
stderr, but instead passes the error to the callback to allow the caller
to decide on what to do.
In addition, it moves the logic to handle an explicit snapshot list from
the main package to restic.
The helper function uidGidInt used strconv.ParseInt instead of
ParseUint, so it silently ignored some invalid user/group IDs.
Also, improve the error message. "Invalid UID" is more informative than
having "ParseInt" twice (*strconv.NumError displays the function name).
Finally, the user.User struct can be passed by pointer to get reduce
code size.
This package is no longer needed, since we can use the stdlib's
http.NewRequestWithContext.
backend/rclone already did, but it needed a different error check due to
a difference between net/http and ctxhttp.
Also, store the http.Client by value in the REST backend (changed to a
pointer when ctxhttp was introduced) and use errors.WithStack instead
of errors.Wrap where the message was no longer accurate. Errors from
http.NewRequestWithContext will start with "net/http" or "net/url", so
they're easy to identify.
The backup command failed if a directory contains duplicate entries.
Downgrade the severity of this problem from fatal error to a warning.
This allows users to still create a backup.
SaveTree did not use the TreeSaver but rather managed the tree
collection and upload itself. This prevents using the parallelism
offered by the TreeSaver and duplicates all related code. Using the
TreeSaver can provide some speed-ups as all steps within the backup tree
now rely on FutureNodes. This can be especially relevant for backups
with large amounts of explicitly specified files.
The main difference between SaveTree and SaveDir is, that only the
former can save tree blobs in which nodes have a different name than the
actual file on disk. This is the result of resolving name conflicts
between multiple files with the same name. The filename that must be
used within the snapshot is now passed directly to
restic.NodeFromFileInfo. This ensures that a FutureNode already contains
the correct filename.
According to the documentation of exec.Cmd Wait() must not be called
before completing all reads from the pipe returned by StdErrPipe(). Thus
return a context that is canceled once rclone has exited and use that as
a precondition to calling Wait(). This should ensure that all errors
printed to stderr have been copied first.
When rclone fails during the connection setup this currently often
results in a context canceled error. Replace this error with the exit
code from rclone.
When backing up many small files, the unbuffered channels frequently
cause the FileSaver to block when reporting progress information. Thus,
add buffers to these channels to avoid unnecessary scheduling.
As the status information is purely informational, it doesn't matter
that the status reporting shutdown is somewhat racy and could miss a few
final updates.
The only use cases in the code were in errors.IsFatal, backend/b2,
which needs a workaround, and backend.ParseLayout. The last of these
requires all backends to implement error unwrapping in IsNotExist.
All backends except gs already did that.
Repositories with mixed packs are probably quite rare by now. When
loading data blobs from a mixed pack file, this will no longer trigger
caching that file. However, usually tree blobs are accessed first such
that this shouldn't make much of a difference.
The checker gets a simpler replacement.
Monotonic timers are paused during standby. Thus these timers won't fire
after waking up. Fall back to periodic polling to detect too large clock
jumps. See https://github.com/golang/go/issues/35012 for a discussion of
go timers during standby.
While searching for lock file from concurrently running restic
instances, restic ignored unreadable lock files. These can either be
in fact invalid or just be temporarily unreadable. As it is not really
possible to differentiate between both cases, just err on the side of
caution and consider the repository as already locked.
The code retries searching for other locks up to three times to smooth
out temporarily unreadable lock files.
Restic continued e.g. a backup task even when it failed to renew the
lock or failed to do so in time. For example if a backup client enters
standby during the backup this can allow other operations like `prune`
to run in the meantime (after calling `unlock`). After leaving standby
the backup client will continue its backup and upload indexes which
refer pack files that were removed in the meantime.
This commit introduces a goroutine explicitly monitoring for locks that
are not refreshed in time. To simplify the implementation there's now a
separate goroutine to refresh the lock and monitor for timeouts for each
lock. The monitoring goroutine would now cause the backup to fail as the
client has lost it's lock in the meantime.
The lock refresh goroutines are bound to the context used to lock the
repository initially. The context returned by `lockRepo` is also
cancelled when any of the goroutines exits. This ensures that the
context is cancelled whenever for any reason the lock is no longer
refreshed.
Previously the global context was either accessed via gopts.ctx,
stored in a local variable and then used within that function or
sometimes both. This makes it very hard to follow which ctx or a wrapped
version of it reaches which method.
Thus just drop the context from the globalOptions struct and pass it
explicitly to every command line handler method.
Some backends generate additional files for each existing file, e.g.
1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef.sha256
For some commands this leads to an "multiple IDs with prefix" error when
trying to reference a snapshot.
Failing to process data requested from the cache usually indicates a
problem with the returned data. Assume that the cache entry is somehow
damaged and retry downloading it once.
The cross compilation tasks are currently the slowest part of the CI
runs. Splitting it into three parts should reduce its time to roughly
that of the windows CI run.
Sparse files contain large regions containing only zero bytes. Checking
that a blob only contains zeros is possible with over 100GB/s for modern
x86 CPUs. Calculating sha256 hashes is only possible with 500MB/s (or
2GB/s using hardware acceleration). Thus we can speed up the hash
calculation for all zero blobs (which always have length
chunker.MinSize) by checking for zero bytes and then using the
precomputed hash.
The all zeros check is only performed for blobs with the minimal chunk
size, and thus should add no overhead most of the time. For chunks which
are not all zero but have the minimal chunks size, the overhead will be
below 2% based on the above performance numbers.
This allows reading sparse sections of files as fast as the kernel can
return data to us. On my system using BTRFS this resulted in about
4GB/s.
The restorer can issue multiple calls to WriteAt in parallel. This can
result in unexpected orderings of the Truncate and WriteAt calls and
sometimes too short restored files.
We can either preallocate storage for a file or sparsify it. This
detects a pack file as sparse if it contains an all zero block or
consists of only one block. As the file sparsification is just an
approximation, hide it behind a `--sparse` parameter.
This writes files by using (*os.File).Truncate, which resolves to the
truncate system call on Unix.
Compared to the naive loop,
for _, b := range p {
if b != 0 {
return false
}
}
the optimized allZero is about 10× faster:
name old time/op new time/op delta
AllZero-8 1.09ms ± 1% 0.09ms ± 1% -92.10% (p=0.000 n=10+10)
name old speed new speed delta
AllZero-8 3.84GB/s ± 1% 48.59GB/s ± 1% +1166.51% (p=0.000 n=10+10)
`restic unlock` now only shows `successfully removed locks` if there were locks to be removed.
In addition, it also reports the number of the removed lock files.
Sending data through a channel at very high frequency is extremely
inefficient. Thus use simple callbacks instead of channels.
> name old time/op new time/op delta
> MasterIndexEach-16 6.68s ±24% 0.96s ± 2% -85.64% (p=0.008 n=5+5)
This is especially useful if ssh asks for a password or if closing the
initial connection could return an error due to a problematic server
implementation.
bazil/fuse expects us to return context.Canceled to signal that a
syscall was successfully interrupted. Returning a wrapped version of
that error however causes the fuse library to signal an EIO (input/output
error). Thus unwrap context.Canceled errors before returning them.
This results in printing a `(default: $ENV) (default: value)` suffix for
the corresponding options which looks strange. In addition, some of the
environment variables might contain secrets which should not be
displayed.
rclone can exit early for example when the connection to rclone is
relayed for example via ssh: `-o rclone.program='ssh user@example.org
forced-command'`
For backends which are able to atomically replace files, we just can
overwrite the old copy, if it is necessary to retry an upload. This has
the benefit of issuing one operation less and might be beneficial if a
backend storage, due to bugs or similar, could mix up the order of the
upload and delete calls.
When hard deleting the latest file version on B2, this uncovers earlier
versions. If an upload required retries, multiple version might exist
for a file. Thus to reliably delete a file, we have to remove all
versions of it.
Ignored packs were reported as an empty pack by EachByPack. The most
immediate effect of this is that the progress bar for rebuilding the
index reports processing more packs than actually exist.
Previously the buffer was grown incrementally inside `repo.LoadUnpacked`.
But we can do better as we already know how large the index will be.
Allocate a bit more memory to increase the chance that the buffer can be
reused in the future.
Instead of first checking whether a file is in the repository cache and
then opening it, we just can open the file. This saves one stat call. If
the file is in the cache, everything is fine and otherwise the code
follows its normal fallback path.
`init` and `copy` use `--repo2` with two different meaning which has
proven to be confusing for users. `--from-repo` now consistently marks a
source repository from which data is read. `--repo` is now always the
target/destination repository.
sort.Sort is not guaranteed to be stable. Go 1.19 has changed the
sorting algorithm which resulted in changes of the sort order. When
comparing snapshots with identical timestamp but different paths and
tags lists, there is not meaningful order among them. So just keep their
order stable.
Cleanly separate the directory presentation and the snapshot directory
structure. SnapshotsDir now translates the dirStruct into a format
usable by the fuse library and contains only minimal special case rules.
All decisions have moved into SnapshotsDirStructure which now creates a
fully preassembled tree data structure.
For large pack sizes we might be only interested in the first and last
blob of a pack file. Thus stream a pack file in multiple parts if the
gaps between requested blobs grow too large.
Also make the errors a bit less verbose by not prepending the operation,
since pkg/xattr already does that. Old errors looked like
Listxattr: xattr.list /myfiles/.zfs/snapshot: invalid argument
After repacking every blob that should be kept must have been repacked.
We have seen a few cases in which a single blob went missing, which
could have been caused by a bitflip somewhere. This sanity check might
help catch some of these cases.
pkg/sftp.Client.MkdirAll(d) does a Stat to determine if d exists and is
a directory, then a recursive call to create the parent, so the calls
for data/?? each take three round trips. Doing a Mkdir first should
eliminate two round trips for 255/256 data directories as well as all
but one of the top-level directories.
Also, we can do all of the calls concurrently. This may reintroduce some
of the Stat calls when multiple goroutines try to create the same
parent, but at the default number of connections, that should not be
much of a problem.
FutureBlob now uses a Take() method as a more memory-efficient way to
retrieve the futures result. In addition, futures are now collected
while saving the file. As only a limited number of blobs can be queued
for uploading, for a large file nearly all FutureBlobs already have
their result ready, such that the FutureBlob object just consumes
memory.
There is no real difference between the FutureTree and FutureFile
structs. However, differentiating both increases the size of the
FutureNode struct.
The FutureNode struct is now only 16 bytes large on 64bit platforms.
That way is has a very low overhead if the corresponding file/directory
was not processed yet.
There is a special case for nodes that were reused from the parent
snapshot, as a go channel seems to have 96 bytes overhead which would
result in a memory usage regression.
Unused blobs are not a problem but rather expected to exist now that
prune by default does not remove every unused blob. However, the option
has caused questions from users whether a repository is damaged or not,
so just remove that option.
Note that the remaining code is left intact as it is still useful for
our test cases.
It wasn't clear that Google Cloud Storage and Google Drive are two different services and that one should use the rclone backend for the latter. This commit adds a note with this information.
After the `BlobSaver` job is submitted, the buffer can be released and
reused by another `FileSaver` even before `BlobSaver.Save` returns. That
FileSaver will the change `buf.Data` leading to wrong backup statistics.
Found by `go test -race ./...`:
WARNING: DATA RACE
Write at 0x00c0000784a0 by goroutine 41:
github.com/restic/restic/internal/archiver.(*FileSaver).saveFile()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:176 +0x789
github.com/restic/restic/internal/archiver.(*FileSaver).worker()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:242 +0x2af
github.com/restic/restic/internal/archiver.NewFileSaver.func2()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:88 +0x5d
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/michael/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x91
Previous read at 0x00c0000784a0 by goroutine 29:
github.com/restic/restic/internal/archiver.(*BlobSaver).Save()
/home/michael/Projekte/restic/restic/internal/archiver/blob_saver.go:57 +0x1dd
github.com/restic/restic/internal/archiver.(*BlobSaver).Save-fm()
<autogenerated>:1 +0xac
github.com/restic/restic/internal/archiver.(*FileSaver).saveFile()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:191 +0x855
github.com/restic/restic/internal/archiver.(*FileSaver).worker()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:242 +0x2af
github.com/restic/restic/internal/archiver.NewFileSaver.func2()
/home/michael/Projekte/restic/restic/internal/archiver/file_saver.go:88 +0x5d
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/michael/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x91
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.
Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
Use only a single not completed pack file to keep the number of open and
active pack files low. The main change here is to defer hashing the pack
file to the upload step. This prevents the pack assembly step to become
a bottleneck as the only task is now to write data to the temporary pack
file.
The tests are cleaned up to no longer reimplement packer manager
functions.
Now with the asynchronous uploaders there's no more benefit from using
more blob savers than we have CPUs. Thus use just one blob saver for
each CPU we are allowed to use.
Previously, SaveAndEncrypt would assemble blobs into packs and either
return immediately if the pack is not yet full or upload the pack file
otherwise. The upload will block the current goroutine until it
finishes.
Now, the upload is done using separate goroutines. This requires changes
to the error handling. As uploads are no longer tied to a SaveAndEncrypt
call, failed uploads are signaled using an errgroup.
To count the uploaded amount of data, the pack header overhead is no
longer returned by `packer.Finalize` but rather by
`packer.HeaderOverhead`. This helper method is necessary to continue
returning the pack header overhead directly to the responsible call to
`repository.SaveBlob`. Without the method this would not be possible,
as packs are finalized asynchronously.
The short ids are not always unique. In addition, recovering from
damages is easier when having the full ids as that makes it easier to
access the corresponding files.
raw-data summed up the size of the blob plaintexts. However, with
compression this makes little sense as the storage size in the
repository is lower due to compression. Thus sum up the actual size each
blob takes in the repository.
As MergeFinalIndex and index uploads can occur concurrently, it is
necessary for MergeFinalIndex to check whether the IDs for an index were
already set before merging it. Otherwise, we'd loose the ID of an index
which is set _after_ uploading it.
The GlobalOptions struct now embeds a backend.TransportOptions, so it
doesn't need to construct one in open and create. The upload and
download limits are similarly now a struct in internal/limiter that is
embedded in GlobalOptions.
There were three loops over the index in restic prune, to find
duplicates, to determine sizes (in pack.Size) and to generate packInfos.
These three are now one loop. This way, prune doesn't need to construct
a set of duplicate blobs, pack.Size doesn't need to contain special
logic for prune's use case (the onlyHdr argument) and pack.Size doesn't
need to construct a map only to have it immediately transformed into a
different map.
Some quick testing on a 160GiB local repo doesn't show running time or
memory use of restic prune --dry-run changing significantly.
... called backend/sema. I resisted the temptation to call the main
type sema.Phore. Also, semaphores are now passed by value to skip a
level of indirection when using them.
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:
* check for io.EOF with a straight ==. That value should not be wrapped,
and the chunker (whose error is checked in the cases changed) does not
wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
alreadyLockedError to match the stdlib convention that error type
names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
When given a buf that is big enough for a compressed blob but not its
decompressed contents, the copy at the end of LoadBlob would skip the
last part of the contents.
Fixes#3783.
This isn't doing anything. Channels should get cleaned up by the GC when
the last reference to them disappears, just like all other data
structures. Also inlined BufferPool.Put in Buffer.Release, its only
caller.
fd05037e1a changed the allocation batch
size from 256 to 128 under the assumption that an indexEntry is 60 bytes
on amd64, but it's 64: structs are padded out to a multiple of 8 for
alignment reasons. That means we'd waste no space in malloc even without
the batch allocation, at least on 64-bit machines. While that strategy
cuts the overallocation down dramatically for many small indexes, it also
seems to slow allocation down (Go 1.18, Linux, amd64, -benchtime=2s):
name old time/op new time/op delta
DecodeIndex-8 4.67s ± 5% 4.60s ± 1% ~ (p=0.953 n=10+5)
DecodeIndexParallel-8 4.67s ± 3% 4.60s ± 1% ~ (p=0.953 n=10+5)
IndexHasUnknown-8 37.8ns ± 8% 36.5ns ±14% ~ (p=0.841 n=5+5)
IndexHasKnown-8 38.5ns ±12% 37.7ns ±10% ~ (p=0.968 n=5+5)
IndexAlloc-8 615ms ±18% 607ms ± 1% ~ (p=1.000 n=10+5)
IndexAllocParallel-8 245ms ±11% 285ms ± 6% +16.40% (p=0.001 n=10+5)
MasterIndexAlloc-8 286ms ± 9% 275ms ± 2% ~ (p=1.000 n=10+5)
LoadIndex/v1-8 27.0ms ± 4% 26.8ms ± 1% ~ (p=0.690 n=5+5)
LoadIndex/v2-8 22.4ms ± 1% 22.8ms ± 2% +1.48% (p=0.016 n=5+5)
name old alloc/op new alloc/op delta
IndexAlloc-8 446MB ± 0% 446MB ± 0% -0.00% (p=0.000 n=8+4)
IndexAllocParallel-8 446MB ± 0% 446MB ± 0% -0.00% (p=0.008 n=8+5)
MasterIndexAlloc-8 213MB ± 0% 159MB ± 0% -25.47% (p=0.000 n=10+5)
name old allocs/op new allocs/op delta
IndexAlloc-8 913k ± 0% 2632k ± 0% +188.19% (p=0.008 n=5+5)
IndexAllocParallel-8 913k ± 0% 2632k ± 0% +188.21% (p=0.008 n=5+5)
MasterIndexAlloc-8 318k ± 0% 1172k ± 0% +267.86% (p=0.008 n=5+5)
Instead, this patch sets a batch size of 4, which means no space is
wasted by malloc on 64-bit and very little on 32-bit. It still gets very
close to the savings from not allocating in batches, without requiring
special code for bits.UintSize==64. Benchmark results, again for
Linux/amd64:
name old time/op new time/op delta
DecodeIndex-8 4.67s ± 5% 4.83s ± 9% ~ (p=0.315 n=10+10)
DecodeIndexParallel-8 4.67s ± 3% 4.68s ± 4% ~ (p=0.315 n=10+10)
IndexHasUnknown-8 37.8ns ± 8% 44.5ns ±19% ~ (p=0.095 n=5+5)
IndexHasKnown-8 38.5ns ±12% 36.9ns ± 8% ~ (p=0.690 n=5+5)
IndexAlloc-8 615ms ±18% 628ms ±18% ~ (p=0.218 n=10+10)
IndexAllocParallel-8 245ms ±11% 262ms ± 9% +7.02% (p=0.043 n=10+10)
MasterIndexAlloc-8 286ms ± 9% 287ms ±13% ~ (p=1.000 n=10+10)
LoadIndex/v1-8 27.0ms ± 4% 26.8ms ± 0% ~ (p=1.000 n=5+5)
LoadIndex/v2-8 22.4ms ± 1% 22.5ms ± 0% ~ (p=0.056 n=5+5)
name old alloc/op new alloc/op delta
IndexAlloc-8 446MB ± 0% 446MB ± 0% ~ (p=1.000 n=8+10)
IndexAllocParallel-8 446MB ± 0% 446MB ± 0% -0.00% (p=0.000 n=8+8)
MasterIndexAlloc-8 213MB ± 0% 160MB ± 0% -25.02% (p=0.000 n=10+9)
name old allocs/op new allocs/op delta
IndexAlloc-8 913k ± 0% 1333k ± 0% +45.94% (p=0.000 n=8+10)
IndexAllocParallel-8 913k ± 0% 1333k ± 0% +45.94% (p=0.000 n=8+8)
MasterIndexAlloc-8 318k ± 0% 525k ± 0% +64.99% (p=0.000 n=10+10)
The allocation method indexmap.newEntry has also been rewritten in a
form that is a few instructions shorter.
Apparently SMB/CIFS on Linux/macOS returns somewhat random errnos when
trying to sync a windows share which does not support calling fsync for
a directory.
This functionality has gone unused since
4b3dc415ef changed hashing.Reader's only
client to use ioutil.ReadAll on a bufio.Reader wrapping the hashing
Reader.
Reverts bcb852a8d0.
This removes RunWorkers, which had become mere overhead by successive
refactors. It also ensures that each former user of that function
returns any context error that occurs, so failure to complete an
operation is always reported as an error.
Apparently it can take a moment between closing a tempfile marked as
DELETE_ON_CLOSE and it actually being deleted. During that time the file
is inaccessible. Thus just skip deleting the temp file on windows.
Tree packs are cached locally at clients and thus benefit a lot from
being compressed. Ensure this be having prune always repack pack files
containing uncompressed trees.
The `stats` command checks inodes to not count hardlinked files multiple
times into the restore size. This check applies across all snapshots and
not only within snapshots. As a result the result size was far too low
when calculating it for multiple snapshots and it would vary depending
on the order in which snapshots were listed.
If no specific Region is mentioned in RESTIC_REPOSITORY, AWS defaults to
us-east-1. For this reason, users that follow the tutorial and create
their S3 bucket in any other region get the following error:
"Fatal: create repository at [...] client.BucketExists"
Explicitly specifying the AWS region name fixes the issue.
The new option allows prune to operate with nearly no scratch space by only removing
no longer necessary pack files and first deleting the index before
rebuilding it. By first deleting the index it becomes safe to just
delete no longer necessary pack files. However, as a downside there's
now the risk that the repository becomes inaccessible if prune fails.
To recover from that problem a user might have to manually delete the
repository index and then run (a full) `rebuild-index` again.
A compressed index is only about one third the size of an uncompressed
one. Thus increase the number of entries in an index to avoid cluttering
the repository with small indexes.
The config file is not compressed as it should remain readable by older
restic versions such that these can return a proper error.
As the old format for unpacked data does not include a version header,
make use of a trick: The old data is always encoded as JSON. Thus it can
only start with '{' or '['. For any other value the first byte indicates
a versioned format. The version is set to 2 for now. Then the zstd
compressed data follows.
As repack streams packs these occupy one backend connection. Uploading a
new pack also requires a backend connection. To prevent a deadlock
during repack when reaching the backend connections limit, simply limit
the repackWorker count to always leave one connection for uploading.
* Write new file payload to a temp file before touching the original
binary. Minimizes the possibility of failing mid-write and corrupting
the binary.
* On Windows, move the original binary out to a temp file rather than
removing it as the running binary is locked. Fixes issue #2248.
As an exception prune is still allowed to load the index before
snapshots, as it uses exclusive locks. In case of problems with locking
it is also better to load snapshots created after loading the index, as
this will lead to a prune sanity check failure instead of a broken snapshot.
When resolving snapshotIDs in FindFilteredSnapshots either
FindLatestSnapshot or FindSnapshot is called. Both operations issue a
list operation to the backend. When for example passing a long list of
snapshot ids to `forget` this could lead to a large number of list
operations.
These commands filter the snapshots according to some criteria which
essentially requires loading the index before filtering the snapshots.
Thus create a copy of the snapshots list beforehand and use it later on.
During a backup the index is written before the corresponding snapshots.
To ensure that a concurrent/later restic run can read a snapshot's data,
restic thus must first load the snapshots and only afterwards the index.
Otherwise it is not possible to ensure that the loaded index is recent
enough to cover all of the snapshot's data.
Fixes#3687. Uses the cast suggested by @MichaelEischer, except that the
contant isn't cast along, because it's untyped and will be converted by
the compiler as necessary.
Nodes in trees were always printed with a `+` in diff, regardless of
whether or not a dir was added or removed. Let's use the mode we were
passed in printDir().
Closes#3685
The repack operation copies all selected blobs from a set of pack files
into new pack files. For prune the source and destination repositories
are identical. To implement copy, just use a different source and
destination repository.
Removing data based on a policy when the attacker had the opportunity to
add data to your repository comes with some considerations. This is
added to the 060_forget.rst documentation.
That document is also updated to reflect that restic now considers
the current system time while running "forget".
References to the security considerations section are added:
- In `restic forget --help`
- In the threat model (design.rst)
- In the (030) setup section where an append-only setup is referenced
A reference is also to be added to the `rest-server` readme's
append-only paragraph (see my fork).
This commit also resolves a typo (amount->number for countable noun),
changes a password length recommendation into the metric that
actually matters when creating passwords (entropy) since I was editing
these doc files anyway, and updates the outdated copyright year in
`conf.py`.
Some wording in 060_forget (line 21..22) was changed to clarify what
"forget" and "prune" do, to try and avoid the apparent misconception
that "forget" does not remove any data.
This is quite similar to gitignore. If a pattern is suffixed by an
exclamation mark and match a file that was previously matched by a
regular pattern, the match is cancelled. Notably, this can be used
with `--exclude-file` to cancel the exclusion of some files.
Like for gitignore, once a directory is excluded, it is not possible
to include files inside the directory. For example, a user wanting to
only keep `*.c` in some directory should not use:
~/work
!~/work/*.c
But:
~/work/*
!~/work/*.c
I didn't write documentation or changelog entry. I would like to get
feedback if this is the right approach for excluding/including files
at will for backups. I use something like this as an exclude file to
backup my home:
$HOME/**/*
!$HOME/Documents
!$HOME/code
!$HOME/.emacs.d
!$HOME/games
# [...]
node_modules
*~
*.o
*.lo
*.pyc
# [...]
$HOME/code/linux/*
!$HOME/code/linux/.git
# [...]
There are some limitations for this change:
- Patterns are not mixed accross methods: patterns from file are
handled first and if a file is excluded with this method, it's not
possible to reinclude it with `--exclude !something`.
- Patterns starting with `!` are now interpreted as a negative
pattern. I don't think anyone was relying on that.
- The whole list of patterns is walked for each match. We may
optimize later by exiting early if we know no pattern is starting
with `!`.
Fix#233
Otherwise newer Go versions complain that the hash for the installed
version of gox is not in the go.mod, which we don't want anyways because
the tests should use the latest version of gox.
Go 1.18 dropped support for installing binaries via "go get", Go <= 1.16
does not support it. So we need to use the right verb depending on the
Go version.
There's no point in locking the repository just to list the currently
existing lock files. This won't work for an exclusively locked
repository and is also confusing to users.
Load tree blobs with more than 50MB only from a single goroutine. Very
large tree blobs with for example 400 MB size can otherwise require
roughly 1GB * streamTreeParallelism memory.
Loading any parent tree for these only wastes time and memory.
Fixes#3641, where it was shown that the most recent tree will get
picked.
--parent is now implicitly ignored when --stdin is given.
Create a temporary file with a sufficiently random name to essentially
avoid any chance of conflicts. Once the upload has finished remove the
temporary suffix. Interrupted upload thus will be ignored by restic.
cleanup handlers run in the order in which they are added. As Go calls
init() functions in lexical order, the cleanup handler from global.go
was registered before that from lock.go, which is the correct order.
Make this order explicit to ensure that this won't break accidentally.
This ensures that restic won't create lots of new lock files without
deleting them later on.
In some cases a Delete operation on a backend can return a "File does
not exist" error even though the Delete operation succeeded. This can
for example be caused by request retries. This caused restic to forget
about the new lock file and continue trying to remove the old (already
deleted) lock file.
The function supports efficiently loading a specified list of blobs from
a single pack in a streaming fashion. That is there's no need for
temporary files independent of the pack size.
The archiver uses FS.OpenFile, where FS is an instance of the FS
interface. This is different from fs.OpenFile, which uses the OpenFile
method provided by the fs package.
Citing Kerrisk, The Linux Programming Interface:
The O_NOATIME flag is intended for use by indexing and backup
programs. Its use can significantly reduce the amount of disk
activity, because repeated disk seeks back and forth across the
disk are not required to read the contents of a file and to update
the last access time in the file’s i-node[.]
restic used to do this, but the functionality was removed along with the
fadvise call in #670.
Currently, `restic backup` (if a `--parent` is not provided)
will choose the most recent matching snapshot as the parent snapshot.
This makes sense in the usual case,
where we tag the snapshot-being-created with the current time.
However, this doesn't make sense if the user has passed `--time`
and is currently creating a snapshot older than the latest snapshot.
Instead, choose the most recent snapshot
which is not newer than the snapshot-being-created's timestamp,
to avoid any time travel.
Impetus for this change:
I'm using restic for the first time!
I have a number of existing BTRFS snapshots
I am backing up via restic to serve as my initial set of backups.
I initially `restic backup`'d the most recent snapshot to test,
then started backing up each of the other snapshots.
I noticed in `restic cat snapshot <id>` output
that all the remaining snapshots have the most recent as the parent.
Currently restic copy will copy each blob from every snapshot serially,
which has performance implications on high-latency backends such as b2.
This commit introduces 8x parallelism for blob downloads/uploads which
can improve restic copy operations up to 8x for repositories with many
small blobs on b2.
This commit also addresses the TODO comment in the copyTree function.
Related work:
A more thorough improvement of the restic copy performance can be found
in PR #3513
Closes#3595
Choosing to include `stdoutIsTerminal()` as:
- all other instances with `!opts.JSON` do so
- this likely will not affect anything, especially when autorun
- this seems to not be a meaningful enough summary
to include in auto-backup reports
JSON is still likely not guaranteed to work and this is a suboptimal
solution to this. Ideally, #1804 should refactor all print statements,
and define+document(+handle) when stdoutIsTerminal() should be used.
Else, it may end up more inconsistent and bulky
(duplicate lines, longer files).
The keyfile provided by restic's own webserver (https://restic.net) should be
more stable than relying on public keyservers. So I changed the URL to the
GPG keyfile, as recommended by MichaelEischer.
If a request fails with "x509: certificate signed by unknown authority",
the B2 backend now returns the error without retrying the request.
Closes#3556Closes#2355
Per Amazon's product page [1], S3 is officially called "Amazon S3". The
restic project uses the phrase "AWS S3" in some places. This patch
corrects the product name.
[1]:https://aws.amazon.com/s3/
The missing eof with http2 when a response included a content-length
header but no data, has been fixed in golang 1.17.3/1.16.10. Therefore
just drop the canary test and schedule it for removal once go 1.18 is
required as minimum version by restic.
Further code will also output to the terminal and the bar's cursor
positioning causes its output to overlap with the remaining output in a
racy way.
Fixes: #3344
Because there is no guarantee that a cleanup of these directories will occur
after the "restic check", we extend the behavior to detect and manage these
specific cache directories and allow their cleanup too.
Package internal/dump has been reworked so its API consists of a single
type Dumper that handles tar and zip formats. Tree loading and node
writing happen concurrently.
When deleting a file, B2 sometimes returns a "500 Service Unavailable"
error but nevertheless correctly deletes the file. Due to retries in
the B2 library blazer, we sometimes also see a "400 File not present"
error. The retries of restic for the delete request then fail with
"404 File with such name does not exist.".
As we have to rely on request retries in a distributed system to handle
temporary errors, also consider a delete request to be successful if the
file is reported as not existing. This should be safe as B2 claims to
provide a strongly consistent bucket listing and thus a missing file
shouldn't mysteriously show up again later on.
Running restic self-update --quiet no longer
prints "writing restic to /usr/local/bin/restic".
The only output printed with -q is failures or
"successfully updated restic to version 0.12.1"
https://github.com/restic/restic/pull/3535
fix test fail: changelog title can't end with `.`
shorten changelog title
restic dump uses bloblru.Cache to keep buffers alive, but also reuses
evicted buffers. That means large buffers may be used to store small
blobs, causing the cache to think it's using less memory than it
actually does.
This can be caused when the test has uploaded four blobs, then queues
two blobs for upload which are delayed. Then a seventh file can be
opened which lead to a test failure.
Dependencies are fetched at build time and stored in the GOPATH. These paths end up being in the final binary.
Bump restic version to latest and go version to the 1.16.6, which was used to build restic 0.12.1.
After the refactoring status updates were no longer printed in quiet
mode or when the output is not an interactive terminal. However, the
JSON output is often piped to e.g. another program. Thus, don't set the
update frequency to 0 in that case. The status updates are still
disabled for backup --quiet.
This also reduces the status update frequency to 60fps compared to a
potentially much higher value before the refactoring.
* PrintProgress no longer does unnecessary Sprintf calls, and performs
fewer allocations in general
* newProgressMax's callback checks whether the terminal supports
line updates once instead of once per call
* the callback looks up the terminal width once per call instead of
twice (on Windows)
* the status shortening now uses the Unicode-aware version from
internal/ui/termstatus (future-proofing)
1. All other document links goes to the _Read the Docs_ site
2. The GitHub reStructuredText renderer doesn't appear to do includes,
making for a rather empty read.
The rest config normally uses prepareURL to sanitize URLs and ensures
that the URL ends with a slash. However, the test used an URL without a
trailing slash, which after the rest server changes causes test
failures.
For files below 256MB this uses the md5 hash calculated while assembling
the pack file. For larger files the hash for each 100MB part is
calculated on the fly. That hash is also reused as temporary filename.
As restic only uploads encrypted data which includes among others a
random initialization vector, the file hash shouldn't be susceptible to
md5 collision attacks (even though the algorithm is broken).
This enables the backends to request the calculation of a
backend-specific hash. For the currently supported backends this will
always be MD5. The hash calculation happens as early as possible, for
pack files this is during assembly of the pack file. That way the hash
would even capture corruptions of the temporary pack file on disk.
This can be used to check how large a backup is or validate exclusions.
It does not actually write any data to the underlying backend. This is
implemented as a simple overlay backend that accepts writes without
forwarding them, passes through reads, and generally does the minimal
necessary to pretend that progress is actually happening.
Fixes#1542
Example usage:
$ restic -vv --dry-run . | grep add
new /changelog/unreleased/issue-1542, saved in 0.000s (350 B added)
modified /cmd/restic/cmd_backup.go, saved in 0.000s (16.543 KiB added)
modified /cmd/restic/global.go, saved in 0.000s (0 B added)
new /internal/backend/dry/dry_backend_test.go, saved in 0.000s (3.866 KiB added)
new /internal/backend/dry/dry_backend.go, saved in 0.000s (3.744 KiB added)
modified /internal/backend/test/tests.go, saved in 0.000s (0 B added)
modified /internal/repository/repository.go, saved in 0.000s (20.707 KiB added)
modified /internal/ui/backup.go, saved in 0.000s (9.110 KiB added)
modified /internal/ui/jsonstatus/status.go, saved in 0.001s (11.055 KiB added)
modified /restic, saved in 0.131s (25.542 MiB added)
Would add to the repo: 25.892 MiB
Add comment that the check is based on the stdlib HTTP2 client. Refactor
the checks into a function. Return an error if the value in the
Content-Length header cannot be parsed.
Ensure that only snapshots made in the past are taken into account when running restic forget with the within switches (--keep-within, --keep-within- hourly, and friends)
Allow keeping hourly/daily/weekly/monthly/yearly snapshots for a given time period.
This adds the following flags/parameters to restic forget:
--keep-within-hourly duration
--keep-within-daily duration
--keep-within-weekly duration
--keep-within-monthly duration
--keep-within-yearly duration
Includes following changes:
- Add tests for --keep-within-hourly (and friends)
- Add documentation for --keep-within-hourly (and friends)
- Add changelog for --keep-within-hourly (and friends)
The azure-sdk-for-go is the only remaining module without a go.mod file.
Thus we only need indirect dependencies for it. Remove all other
indirect dependencies.
Apparently the standard Go sha256 implementation is now faster than the
assembly implementation. The library now only adds support for SHA
extensions available in some processors.
See https://github.com/minio/sha256-simd/pull/57 for more details.
The first test function ensures that the workaround works as expected.
And the second test function is intended to fail as soon as the issue
has been fixed in golang to allow us to eventually remove the
workaround.
The golang http client does not return an error when a HTTP2 reply
includes a non-zero content length but does not return any data at all.
This scenario can occur e.g. when using rclone when a file stored in a
backend seems to be accessible but then fails to download.
If a pack file is missing try to determine the contained pack ids based
on the repository index. This helps with assessing the damage to a
repository before running `rebuild-index`.
Just passing the list of blobs to packsToBlobs would also work in most
cases, however, it could cause unexpected results when multiple pack
files have the same prefix. Forget found prefixes to prevent this.
Failed pack/blob downloads should be retried. For blobs that fail
decryption assume that the pack file is really damaged and try to
restore the remaining blobs.
* Stop prepending the operation name: it's already part of os.PathError,
leading to repetitive errors like "Chmod: chmod /foo/bar: operation not
permitted".
* Use errors.Is to check for specific errors.
Apparently readahead was disabled by default. Enable readahead with the
Linux default size of 128kB. Larger values seem to have no effect.
This can speed up reading from the fuse mount by at least factor 5.
Speedup for a 1G random file stored in a local repository:
(Only one result shown, but times were quite stable, restarted restic
after each command)
$ dd if=/dev/urandom bs=1M count=1024 of=rand
$ shasum -a 256 tmp/rand
75dd9b374e712577d64672a05b8ceee40dfc45dce6321082d2c2fd51d60c6c2d tmp/rand
before: $ time shasum -a 256 fuse/snapshots/latest/tmp/rand
75dd9b374e712577d64672a05b8ceee40dfc45dce6321082d2c2fd51d60c6c2d fuse/snapshots/latest/tmp/rand
real 0m18.294s
user 0m4.522s
sys 0m3.305s
before: $ time cat fuse/snapshots/latest/tmp/rand > /dev/null
real 0m14.924s
user 0m0.000s
sys 0m4.625s
after: $ time shasum -a 256 fuse/snapshots/latest/tmp/rand
75dd9b374e712577d64672a05b8ceee40dfc45dce6321082d2c2fd51d60c6c2d fuse/snapshots/latest/tmp/rand
real 0m6.106s
user 0m3.115s
sys 0m0.182s
after: $ time cat fuse/snapshots/latest/tmp/rand > /dev/null
real 0m3.096s
user 0m0.017s
sys 0m0.241s
Since the fileInfos are returned in a []interface, they're already
allocated on the heap. Making them pointers explicitly means the
compiler doesn't need to generate fileInfo and *fileInfo versions of the
methods on this type. The binary becomes about 7KiB smaller on
Linux/amd64.
This patch adds a `--latest` option to limit snapshots list to the n
last snapshots. It is very similar to the `--last` one but does not
limit to one entry. It also deprecates the `--last` flag usage in
favor of `--latest 1`
Output example:
$ restic snapshots --latest 2
repository 0d3eb989 opened successfully, password is correct
ID Time Host Tags Paths
------------------------------------------------------------
5a33bdcc 2020-12-14 12:30:00 local /home
73887d8e 2020-12-15 12:30:00 local /home
------------------------------------------------------------
2 snapshots
Signed-off-by: Sébastien Gross <seb•ɑƬ•chezwam•ɖɵʈ•org>
mintty on windows always uses pipes to connect stdout between processes
and for the terminal output. The previous implementation always assumed
that stdout connected to a pipe means that stdout is displayed on a
mintty terminal. However, this detection breaks when using pipes to
connect processes and for powershell which uses pipes when redirecting
to a file.
Now the pipe filename is queried and matched against the pattern used by
msys / cygwin when connected to the terminal. In all other cases assume
that a pipe is just a regular pipe.
Previously the progress bar / status update interval used
stdoutIsTerminal to determine whether it is possible to update the
progress bar or not. However, its implementation differed from the
detection within the backup command which included additional checks to
detect the presence of mintty on Windows. mintty behaves like a terminal
but uses pipes for communication.
This adds stdoutCanUpdateStatus() which calls the same terminal detection
code used by backup. This ensures that all commands consistently switch
between interactive and non-interactive terminal mode.
stdoutIsTerminal() now also returns true whenever stdoutCanUpdateStatus()
does so. This is required to properly handle the special case of mintty.
The `init` and `copy` commands can now use `--repository-file2` flag and
the `$RESTIC_REPOSITORY_FILE2` environment variable.
This also fixes the conflict with the `--repository-file` and `--repo2`
flag.
Tests are added for the initSecondaryGlobalOpts function.
This adds a NOK function to the test helper functions. This NOK tests if
err is not nil, and otherwise fail the test.
With the NOK function a couple of sad paths are tested in the
initSecondaryGlobalOpts function.
In total the tests checks wether the following are passed correct:
- Password
- PasswordFile
- Repo
- RepositoryFile
The following situation must return an error to pass the test:
- no Repo or RepositoryFile defined
- Repo and RepositoryFile defined both
This avoids problems when for some reason the JSON encoding changes.
This also ensures forward compatibility with future restic versions
which might e.g. add new fields to the tree metadata.
This commit changes the error message so that a list of file names is
printed. Before, just the raw map was printed, which is not a great user
interface.
The error returned when finishing the upload of an object was dropped.
This could cause silent upload failures and thus data loss in certain
cases. When a MD5 hash for the uploaded blob is specified, a wrong
hash/damaged upload would return its error via the Close() whose error
was dropped.
The azureAdapter was used directly without a pointer, but the Len()
method was only defined with a pointer receiver (which means Len() is
not present on a azureAdapter{}, only on a pointer to it).
Bugs in the error handling while uploading a file to the backend could
cause incomplete files, e.g. https://github.com/golang/go/issues/42400
which could affect the local backend.
Proactively add sanity checks which will treat an upload as failed if
the reported upload size does not match the actual file size.
Before, the scanner would could files twice if they were included in the
list of backup targets twice, e.g. `restic backup foo foo/bar` would
could the file `foo/bar` twice.
This commit uses the tree structure from the archiver to run the
scanner, so both parts see the same files.
For example `restic find --show-pack-id --blob f78dc991 5b9e4366 ddd8c7d4`
would previously only expand one blob if all of them belong to the same
file.
This assigns an id to each tree root and then keeps track of how many
tree loads (i.e. trees referenced for the first time) are pending per
tree root. Once a tree root and its subtrees were fully processed there
are no more pending tree loads and the tree root is reported as
processed.
When the tomb is created with a canceled context, then the workers
started via `t.Go` exist nearly immediately. Once for the first time all
started goroutines have been stopped, it is not allowed to issue further
calls to `t.Go`. This is a problem when the started goroutines exit
immediately, as for example the first goroutine might already have
stopped before starting the second one, which is not allowed as once the
first goroutines has stopped no goroutines were running.
To fix this race condition the startup and main task of the archiver now
also run within a `t.Go` function. This also allows unifying the error
handling as it is no longer necessary to distinguish between errors
returned by the workers or the saveTree processing. The tomb now just
returns the first error encountered, which should also be the most
descriptive one.
Depending on the used backend, operations started with a canceled
context may fail or not. For example the local backend still works in
large parts when called with a canceled context. Backends transfering
data via http don't work. It is also not possible to retry failed
operations in that state as the RetryBackend will abort with a 'context
canceled' error.
Ensure uniform behavior of all backends by checking for a canceled
context by checking for a canceled context as a first step in the
RetryBackend. This ensures uniform behavior across all backends, as
backends are always wrapped in a RetryBackend.
If the context was canceled then saveTree might receive a treeID or not
depending on the timing. This could cause saveTree to incorrectly return
a nil treeID as valid. Fix this always returning an error when the
context was canceled in the meantime.
A canceled background context lets the blob/tree/fileSavers exit
without reporting an error. The error handling previously replaced
a 'context canceled' error received by the main backup method with
the error reported by the savers. However, in case of a canceled
background context that error is nil, causing restic to loose the
error and save a snapshot with a nil tree.
The counter value needs to be aligned to 64 bit in memory for the
atomic functions to work on some platform (such as 32 bit ARM).
The atomic package says in its documentation:
> These functions require great care to be used correctly. Except for
> special, low-level applications, synchronization is better done with
> channels or the facilities of the sync package.
This commit replaces the atomic functions with a simple sync.Mutex, so
we don't have to care about alignment.
I like the idea of verifying the integrity of applications, I download from the internet. So I was very happy to see that restic does provide SHA256-checksums which are signed with the maintainers PGP key.
The only thing I miss: I could not find a direct way to download the used PGP key and verify the keys fingerprint.
Doing some searches, I found:
* https://github.com/restic/rest-server/issues/121
* https://restic.net/blog/2015-09-16/verifying-code-archive-integrity/
To help other restic users, I think you should add information about your PGP key/fingerprint to this installation doc, too. To save you some precious time, I created a draft, how this doc might be expanded, in this pull-request. You are free to accept it or change the text to your liking.
I copied the key/fingerprint text from: ``restic/restic/master/doc/090_participating.rst``
Thank you for your work in restic!
This adds support for the following environment variables, which were
previously missing:
OS_USER_ID User ID for keystone v3 authentication
OS_USER_DOMAIN_ID User domain ID for keystone v3 authentication
OS_PROJECT_DOMAIN_ID Project domain ID for keystone v3 authentication
OS_TRUST_ID Trust ID for keystone v3 authentication
The canUpdateStatus check was simplified in #2608, but it accidentally flipped
the condition. The correct check is as follows: If the output is a pipe then
restic probably runs in mintty/cygwin. In that case it's possible to
update the output status. In all other cases it isn't.
This commit inverts to condition again to offer the previous and correct
behavior.
On shutdown the backup commands waits for the terminal output goroutine
to stop. However while running in the background the goroutine ignored
the canceled context.
In #2584 this was changed to use the uid/gid of the root node. This
would be okay for the top-level directory of a snapshot, however, this
change also applied to normal directories within a snapshot. This
change reverts the problematic part and adds a test that directory
attributes are represented correctly.
When a file system is mounted at a directory, lstat() returns attributes
of the root node of the mounted file system, including the device ID of
the other file system. The previous code used when --one-file-system is
specified excluded the directory itself because of that.
This commit changes the code so that mountpoints are kept as empty
directories, its attributes set to the root note of the mounted file
system. The behavior mimics `tar`, which does the same.
This code is more strict in what it expects to find in the backend:
depending on the layout, either a directory full of files or a directory
full of such directories.
Note that this fix only solves the statistics problem, if
all duplicates are marked for repacking.
If not all duplicates are marked for repacking, we lack the
information which
The situation that not all duplicates are marked for repacking can occur
when using the `max-repack-size` option
a gs service account may only have object permissions on an existing
bucket but no bucket create/get permissions.
these service accounts currently are blocked from initialization a
restic repository because restic can not determine if the bucket exists.
this PR updates the logic to assume the bucket exists when the bucket
attribute request results in a permissions denied error.
this way, restic can still initialize a repository if the service
account does have object permissions
fixes: https://github.com/restic/restic/issues/3100
UnusedBlobs now directly reads the list of existing blobs from the
repository index. This removes the need for the blobStatusExists flag,
which in turn allows converting the blobRefs map into a BlobSet.
By construction these two errors always show up in pairs: 'size could
not be found' is printed when the blob is not found in the repository
index. That blob is also part of the `blobs` array. Later on, check
iterates over that array and checks whether the blob is marked as
existing. Which cannot be the case as that mark is generated by
iterating over the repository index.
The merged warning no longer reports the blob index within a file. That
information could also be derived by printing the affected tree using
`cat` and searching for the blob.
Updates #2659. This is a case where the stdlib will not handle EINTR for
us, even with Go 1.16. That xattr calls are directly affected can be
seen in the report for issue #2968.
Due to the return if !isFile, the IsDir branch in List was never taken
and subdirectories were traversed recursively.
Also replaced isFile by an IsRegular check, which has been equivalent
since Go 1.12 (golang/go@a2a3dd00c9).
Add a callback to the PruneOptions struct which calculates the number of
bytes allowed to be unused after prune is done. This way, the logic is
closer to the option parsing code.
Also, add an explicit option `unlimited` for the use case when storage
does not matter but bandwidth and time do. Internally, this sets the
maximum number of unused bytes to MaxUint64.
Rework the documentation slightly so that no more "packs" are
mentioned and it talks about "files" instead.
Make it clear in the documentation that the percentage given to
`--max-unused` is relative to the whole repository size after pruning is
done. If specified, it must be below 100%, otherwise the repository
would contain 100% of unused data, which is pointless.
I had a hard time coming up with the correct formula to calculate the
maximum number of unused bytes based on the number of used bytes. For a
fraction `p` (0 ≤ p < 1), a repo with `u` bytes used, and the number of
unused bytes `x` the following holds:
x ≤ p * (u+x)
⇔ x ≤ p*u + p*x
⇔ x - p*x ≤ p*u
⇔ x * (1-p) ≤ p*u
⇔ x ≤ p/(1-p) * u
The restic security model includes full trust of the local machine, so
this should not fix any actual security problems, but it's better to be
safe than sorry.
Fixes#2192.
The file permissions included a go specific directory bit which
accidentially forced the usage of the GNU header format. This leads
to problems with 7zip on Windows or when extended attributes are
used.
The VSS support works for 32 and 64-bit windows, this includes a check that
the restic version matches the OS architecture as required by VSS. The backup
operation will fail the user has not sufficient permissions to use VSS.
Snapshotting volumes also covers mountpoints but skips UNC paths.
Cache locations were documented inconsistently in three places.
The backup docs mentioned PATH being used to find fusermount, which is
never run by restic backup. It now mentions ssh and rclone, which are
used by backends.
The notion of a "system-wide" environment variable makes no sense.
TMPDIR is now mentioned because it allows for optimization and may
have security implications.
The io.Reader interface does not support contexts, such that it is
necessary to embed the context into the backendReaderAt struct. This has
the problem that a reader might suddenly stop working when it's
contained context is canceled. However, this is now problem here as the
reader instances never escape the calling function.
Now that lockRepo receives a context, it is possible that it is canceled
before a lock was created. Thus `unlockRepo` must be able to handle this
case.
The list operation used by RemoveStaleLocks or RemoveAllLocks will
already be canceled by the passed in context. Therefore we can also just
cancel the remove operation as the unlock command won't process all lock
files anyways.
This is no change in behavior as a canceled context did later on cause
the config file creation to fail. Therefore this change just lets the
repository initialization fail a bit earlier.
A pattern part containing "/" is used to mark a path or a pattern as
absolute. However, on Windows the path separator is "\" such that glob
patterns like "?" could match the marker. The code now explicitly skips
the marker when the pattern does not represent an absolute path.
We now use v4 of the module. `backoff.WithMaxRetries` no longer repeats
an operation endlessly when a retry count of 0 is specified. This
required a few fixes for the tests.
The file is already created with the proper permissions, thus the chmod
call is not critical. However, some file systems have don't allow
modifications of the file permissions. Similarly the chmod call in the Remove
operation should not prevent it from working.
Many instances of errors.Wrap in this package would produce messages
like "Open: open <filename>: no such file or directory"; those now omit
the first "Open:" (or "Stat:", or "MkdirAll"). The function readVersion
now appends its own name to the error message, rather than the function
that failed, to make it easier to spot. Other function names (e.g.,
Load) are already added further up in the call chain.
The archiver first called the Select function for a path before checking
whether the Lstat on that path actually worked. The RejectFuncs in
exclude.go worked around this by checking whether they received a nil
os.FileInfo. Checking first is more obvious and requires less code.
This removes the requirement on `restic self-update --output` to point
to a path of an existing file, to overwrite. In case the specified
path does exist we still want to verify that it's a regular file,
rather than a directory or a device, which gets overwritten.
We also want to verify that a path to a new file exists within an
existing directory. The alternative being running into that issue
after the actual download, etc has completed.
While at it I also replace `errors.Errorf` with the more appropriately
verbose `errors.Fatalf`.
Resolves#2491
As an alternative to -r, this allows to read the repository URL
from a file in order to prevent certain types of information leaks,
especially for URLs containing credentials.
Fixes#1458, fixes#2900.
When the backup is interrupted for some reason while the scanner is
still active this could lead to a deadlock. Interruptions are triggered
by canceling the context object used by both the backup progress UI and
the scanner. It is possible that a context is canceled between the
respective check in the scanner and it calling the `ReportTotal` method
of the UI. The latter method sends a message to the UI goroutine.
However, a canceled context will also stop that goroutine, which can
cause the channel send operation to block indefinitely.
This is resolved by adding a `closed` channel which is closed once the
UI goroutine is stopped and serves as an escape hatch for reported UI
updates.
This change covers not just the ReportTotal method but all potentially
affected methods of the progress UI implementation.
Makes the following corrections to the "Applying Policy:" output:
- keep the last 1 snapshots snapshots => keep 1 latest snapshots
- keep the last 1 snapshots, 3 hourly, 5 yearly snapshots => keep 1 latest, 3 hourly, 5 yearly snapshots
This allows creating multiple repositories with identical chunker
parameters which is required for working deduplication when copying
snapshots between different repositories.
This updates all dependencies which causes the following changes:
* The fuse library is set to the last version supporting macOS
* The minimal version of Go required to build restic is now 1.13
When the diff calculation compares two trees with identical id then no
differences between them can ever show up. Optimize for that case by
simply traversing the tree only once to collect all referenced blobs for
a proper calculation of added and removed blobs.
Just skipping the common subtrees is not possible as this would skew the
results if the added or removed blobs are shared with one of the
subtrees.
There was an issue that prevented the dump command from working
correctly when either:
* `/` contained multiple nodes (e.g. `restic backup /`)
* dumping a file in the first sublevel was attempted (e.g. `/foo`)
The help messages suggested that the `ls` command work without
explicitly passing a snapshot ID. However, this was never the case:
without a snapshot ID the command just failed with the error
`Ignoring "", it is not a snapshot id`.
Fixes#2299
Use the `Original` field of the copied snapshot to store a persistent
snapshot ID. This can either be the ID of the source snapshot if
`Original` was not yet set or the previous value stored in the
`Original` field. In order to still copy snapshots modified using the
tags command the source snapshot is compared to all snapshots in the
destination repository which have the same persistent ID. Snapshots are
only considered equal if all fields except `Original` and `Parent`
match. That way modified snapshots are still copied while avoiding
duplicate copies at the same time.
The standard UNIX-style ordering of command-line arguments places
optional flags before other positional arguments. All of restic's
commands support this ordering, but some of the usage strings showed the
flags after the positional arguments (which restic also parses just
fine). This change updates the doc strings to reflect the standard
ordering.
Because the `restic help` command comes directly from Cobra, there does
not appear to be a way to update the argument ordering in its usage
string, so it maintains the non-standard ordering (positional arguments
before optional flags).
Previously the directory stats were reported immediately after calling
`SaveDir`. However, as the latter method saves the tree asynchronously
the stats were still initialized to their nil value. The stats are now
reported via a callback similar to the one used for the fileSaver.
Add a copy command to copy snapshots between repositories. It allows the user
to specify a destination repository, password, password-file, password-command
or key-hint to supply the necessary details to open the destination repository.
You need to supply a list of snapshots to copy, snapshots which already exist
in the destination repository will be skipped.
Note, when using the network this becomes rather slow, as it needs to read the
blocks, decrypt them using the source key, then encrypt them again using the
destination key before finally writing them out to the destination repository.
The slicing operator `slice[low:high]` default to 0 for the lower bound and
len(slice) for the upper bound when either or both are not specified.
Fix the code to use `cap(slice)` to check for the slice capacity.
If a blob in a pack file can be decrypted successfully but contains data
that results in a different hash than stated in the header pack, then
abort repacking. As both the pack header and the blob are
cryptographically verified this either means than a malicious entity
tampered with the backup or indicates hardware problems on the client.
prune should fail with an error in both cases.
The old behavior was problematic in the context of rebuild-index as it
could leave old, possibly invalid index files behind without returning a
fatal error.
Prune calls rebuildIndex before removing any data from the repository.
For this use case failing to delete an old index MUST be treated as a
fatal error. Otherwise the index could still contain an old index file
that refers to blobs/packs that were later on deleted by prune. Later
backup runs will assume that the affected blobs already exist in the
repository which results in a backup which misses data.
The previous check only approximately verified whether all required
blobs were found. However, after forgetting a few snapshots the
repository contains lots of unused blobs whose number can be sufficient
to make up for missing packs.
When coupled with a malfunctioning backend that temporarily returns broken
data this could cause restic to regard the corresponding packs as
invalid and thereby delete data that's still in use. This change lets
restic play it safe and refuse to delete anything if data is missing.
Do not lock the repository if --no-lock global flag is set. This allows
to mount repositories which are archived on a read only system.
Signed-off-by: Sébastien Gross <seb•ɑƬ•chezwam•ɖɵʈ•org>
The seen BlobSet always contained a subset of the entries in blobs.
Thus use blobs instead and avoid the memory overhead of the second set.
Suggested-by: Alexander Weiss <alex@weissfam.de>
As the connection to the rclone child process is now closed after an
unexpected subprocess exit, later requests will cause the http2
transport to try to reestablish a new connection. As previously this never
should have happened, the connection called panic in that case. This
panic is now replaced with a simple error message, as it no longer
indicates an internal problem.
- Add Open() functionality to dir
- only access index for blobs when file is read
- Implement NodeOpener and put one-time file stuff there
- Add comment about locking as suggested by bazil.org/fuse
=> Thanks at Michael Eischer for suggesting the last two improvements
Calling `Close()` on the rclone backend sometimes failed during test
execution with 'signal: Broken pipe'. The stdio connection closed both
the stdin and stdout file descriptors at the same moment, therefore
giving rclone no chance to properly send any final http2 data frames.
Now the stdin connection to rclone is closed first and will only be
forcefully closed after a timeout. In case rclone exits before the
timeout then the stdio connection will be closed normally.
restic did not notice when the rclone subprocess exited unexpectedly.
restic manually created pipes for stdin and stdout and used these for the
connection to the rclone subprocess. The process creating a pipe gets
file descriptors for the sender and receiver side of a pipe and passes
them on to the subprocess. The expected behavior would be that reads or
writes in the parent process fail / return once the child process dies
as a pipe would now just have a reader or writer but not both.
However, this never happened as restic kept the reader and writer
file descriptors of the pipes. `cmd.StdinPipe` and `cmd.StdoutPipe`
close the subprocess side of pipes once the child process was started
and close the parent process side of pipes once wait has finished. We
can't use these functions as we need access to the raw `os.File` so just
replicate that behavior.
The test now uses the fact that the sort is stable. It's not guaranteed
to be, but the test is cleaner and more exhaustive. sortCachedPacksFirst
no longer needs a return value.
In the Google Cloud Storage backend, support specifying access tokens
directly, as an alternative to a credentials file. This is useful when
restic is used non-interactively by some other program that is already
authenticated and eliminates the need to store long lived credentials.
The access token is specified in the GOOGLE_ACCESS_TOKEN environment
variable and takes precedence over GOOGLE_APPLICATION_CREDENTIALS.
If a data blob and a tree blob with the same ID (= same content) exist,
then the checker did not report a data or tree blob as unused when the
blob of the other type was still in use.
The `DuplicateTree` flag is necessary to ensure that failures cannot be
swallowed. The old checker implementation ignores errors from LoadTree
if the corresponding tree was already checked.
Backups traverse the file tree in depth-first order and saves trees on
the way back up. This results in tree packs filled in a way comparable
to the reverse Polish notation. In order to check tree blobs in that
order, the treeFilter would have to delay the forwarding of tree nodes
until all children of it are processed which would complicate the
implementation.
Therefore do the next similar thing and traverse the tree in depth-first
order, but process trees already on the way down. The tree blob ids are
added in reverse order to the backlog, which is once again reverted when
removing the ids from the back of the backlog.
The blobRefs map and the processedTrees IDSet are merged to reduce the
memory usage. The blobRefs map now uses separate flags to track blob
usage as data or tree blob. This prevents skipping of trees whose
content is identical to an already processed data blob. A third flag
tracks whether a blob exists or not, which removes the need for the
blobs IDSet.
Even though the checkTreeWorker skips already processed chunks,
filterTrees did queue the same tree blob on every occurence. This
becomes a serious performance bottleneck for larger number of snapshots
that cover mostly the same directories. Therefore decode a tree blob
exactly once.
The backup command used to return a zero exit code as long as a snapshot
could be created successfully, even if some of the source files could not
be read (in which case the snapshot would contain the rest of the files).
This made it hard for automation/scripts to detect failures/incomplete
backups by looking at the exit code. Restic now returns the following exit
codes for the backup command:
- 0 when the command was successful
- 1 when there was a fatal error (no snapshot created)
- 3 when some source data could not be read (incomplete snapshot created)
Changes proposed in #2763:
- Adding `RESTIC_CACHE_DIR` environment variables (introduced in #2425 for Unix and #2607 for Mac, Win).
- Adding used system-wide environment variables with links to the corresponding section.
The benchmark was actually testing the speed of index lookups.
name old time/op new time/op delta
SaveAndEncrypt-8 101ns ± 2% 31505824ns ± 1% +31311591.31% (p=0.000 n=10+10)
name old speed new speed delta
SaveAndEncrypt-8 41.7TB/s ± 2% 0.0TB/s ± 1% -100.00% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
SaveAndEncrypt-8 1.00B ± 0% 20989508.40B ± 0% +2098950740.00% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SaveAndEncrypt-8 0.00 123.00 ± 0% +Inf% (p=0.000 n=10+9)
(The actual speed is ca. 131MiB/s.)
That site might not have supported https:// when those links were
originally added. It does now.
Also dropping the _spec.html_ ending of the url, there being a `<link
rel="canonical" ...>` tag suggesting that that no longer being the
preferred address.
cmd/restic/globals.go already provides Printf, Println and Warnf wrapper
which get their output streams from the globalOptions object. This
allows for stream replacements when testing.
A side remark to the definition of Index.blob:
Another possibility would have been to use:
blob map[restic.BlobHandle]*indexEntry
This would have led to the following sizes:
key: 32 + 1 = 33 bytes
value: 8 bytes
indexEntry: 8 + 4 + 4 = 16 bytes
each packID: 32 bytes
To save N index entries, we would therefore have needed:
N * OF * (33 + 8) bytes + N * 16 + N * 32 bytes / BP = N * 82 bytes
More precicely, using a pointer instead of a direct entry is the better memory choice if:
OF * 8 bytes + entrysize < OF * entrysize <=> entrysize > 8 bytes * OF/(OF-1)
Under the assumption of OF=1.5, this means using pointers would have been the better choice
if sizeof(indexEntry) > 24 bytes.
- The SaveBlob method now checks for duplicates.
- Moves handling of pending blobs to MasterIndex.
-> also cleans up pending index entries when they are saved in the index
-> when using SaveBlob no need to care about index any longer
- Always check for full index and save it when storing packs.
-> removes the need of an index uploader
-> also removes the verbose "uploaded intermediate index" messages
- The Flush method now also saves the index
- Fix race condition when checking and saving full/non-finalized indexes
errors.Fatalf wraps a error and just keeps an error message as a string.
This prevents the `restic.IsAlreadyLocked(err)` check from working as
the error is no longer an ErrAlreadyLocked.
Just add an additional remark to the error using `errors.WithMessage`.
This command can only be built on Darwin, FreeBSD and Linux
(and if we upgrade bazil.org/fuse, only FreeBSD and Linux:
https://github.com/bazil/fuse/issues/224).
Listing the few supported operating systems explicitly here makes
porting restic to new platforms easier.
The broken poly1305 implementation for arm64 was removed, fixing the
segfault issue on arm64 (gh-2618). The version for amd64 has improved
performance, which shows up in the internal/repository benchmark:
name old speed new speed delta
SaveAndEncrypt-8 110MB/s ± 2% 113MB/s ± 1% +2.24% (p=0.000 n=20+20)
`term.Print` sends the output via a channel to a goroutine which
actually prints the message. This may race with the password prompt
printed by `OpenRepository` resulting in a missing prompt.
The Save methods of the BlobSaver, FileSaver and TreeSaver return early
on when the archiver is stopped due to an error. For that they select on
both the tomb.Dying() and context.Done() channels, which can lead to a
race condition when the tomb is killed due to an error: The tomb first
closes its Dying channel before canceling all child contexts.
Archiver.SaveDir only aborts its execution once the context was
canceled. When the tomb killing is paused between closing its Dying
channel and canceling the child contexts, this lets the
FileSaver/TreeSaver.Save methods return immediately, however, ScanDir
still reads further files causing the test case to fail.
As a killed tomb always cancels all child contexts and as the Savers
always use a context bound to the tomb, it is sufficient to just use
context.Done() as escape hatch in the Save functions. This fixes the
mismatch between SaveDir and Save.
Adjust the tests to use contexts bound to the tomb for all interactions
with the Savers.
restic uses a cleanup hook to ensure that it restores the terminal
configuration to a sane state, when restic is interrupted while reading
a password from the terminal. However, this causes a problem, when
restic runs in a background job, as reconfiguring a terminal will cause
a SIGTTOU to be sent to restic pausing it. Therefore, restic seems to
hang on shutdown when it was running in the background.
This commit changes the behavior to only restore the terminal
configuration if restic was interrupted while reading a password from
the terminal. As reading a password from the terminal requires that
restic is in the foreground, this should avoid restic getting stopped.
Fixes#2298
Issue introduced in #402
In a damaged repository with a missing blob, the error message tried to
dereference the subtreeID field of the current node, which is a file
however. Said field is set to nil for a file thus causing a segfault
when dereferenced.
Fix this by using the actual parentTreeID.
The previous implementation was repeating the implementation that is
found inside of io.WriteString. Simplify by making use of the stdlib's
implementation.
On Linux CIFS (SMB) seems to be incompatible with the async preemption
implementation of Go 1.14. CIFS seems not to restart syscalls (open,
read, chmod, readdir, ...) as expected by Go, which sets SA_RESTART for
its signal handler to have syscalls restarted automatically. This leads
to Go passing up lots of EINTR return codes to restic.
See https://github.com/restic/restic/issues/2659 for a detailed explanation.
The username and hostname for new keys can be specified with the new
--user and --host flags, respectively. The flags are used only by the
`key add` command and are otherwise ignored.
This allows adding keys with for a desired user and host without having
to run restic as that particular user on that particular host, making
automated key management easier.
Co-authored-by: James TD Smith <ahktenzero@mohorovi.cc>
The method only ever receives *hashing.Writers, which don't implement
io.Closer. These come from packerManager.findPacker and have their
actual writers closed in Repository.savePacker. Moving the closing logic
to hashing.Writer results in "file already closed" errors.
When loading a blob, restic first looks up pack files containing the
blob. To avoid unnecessary work an already cached pack file is preferred.
However, if there is only a single pack file to choose from (which is
the normal case) sorting the one-element list won't change anything.
Therefore avoid the unnecessary cache check in that case.
The previous benchmark spent much of its time allocating RNGs and
generating too many random numbers. It now spends 90% of its time
hashing and half of the rest writing to files.
name old time/op new time/op delta
PackerManager-8 319ms ± 1% 247ms ± 1% -22.48% (p=0.000 n=20+18)
name old speed new speed delta
PackerManager-8 143MB/s ± 1% 213MB/s ± 1% +48.63% (p=0.000 n=10+18)
name old alloc/op new alloc/op delta
PackerManager-8 635kB ± 0% 92kB ± 0% -85.48% (p=0.000 n=10+19)
name old allocs/op new allocs/op delta
PackerManager-8 1.64k ± 0% 1.43k ± 0% -12.76% (p=0.000 n=10+20)
Archivers TestMetadataChanged incorrectly clears the Extended Attributes
from the expected metadata of the temporary file. This is incorrect as on
SELinux enabled filesystem, as the kernel will automaticly add a SElinux
label. However, since ExtendedAttributes{} != ExtendedAttributes{nil} we
still need to clear them if there are no attributes found.
The pool was used improperly, causing more allocations to be
performed than without it.
name old time/op new time/op delta
SaveAndEncrypt-8 36.8ms ± 2% 36.9ms ± 2% ~ (p=0.218 n=10+10)
name old speed new speed delta
SaveAndEncrypt-8 114MB/s ± 2% 114MB/s ± 2% ~ (p=0.218 n=10+10)
name old alloc/op new alloc/op delta
SaveAndEncrypt-8 21.1MB ± 0% 21.0MB ± 0% -0.44% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SaveAndEncrypt-8 79.0 ± 0% 77.0 ± 0% -2.53% (p=0.000 n=10+10)
The `dump`, `find`, `forget`, `ls`, `mount`, `restore`, `snapshots`,
`stats` and `tag` commands will now take into account multiple
`--host` and `-H` flags.
Much simpler implementation that guarantees each required pack
is downloaded only once (and hence does not need to manage
pack cache). Also improves large file restore performance.
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
internal/archiver.readdir and internal/fs.ReadDir were unused.
internal/fs.ReadDirNames and internal/archiver.readdirnames were doing
nearly the same thing, except one sorted its output and opened with
fs.O_NOFOLLOW. Both were only used in internal/archiver.
Each of the random test files was split into the same five blobs. The
test fails once the fifth blob is passed on to `SaveBlob`. That is for
certain interleavings of goroutine execution it would be possible for
the test to trigger the testErr just after storing the first file.
The fixed test uses a different file content for each of the nine files
and fails after writing the fourth blob. The file content is also small
enough to ensure that for each file only a single blob is saved. This
guarantees that the test cannot fail before reading the first four
files. FileReadConcurrency = 2 allows up to two files queued for
processing. Therefore the test can at most open the sixth file before it
has to save the fourth file / blob which triggers the testErr.
internal/ui/jsonstatus and termstatus sound similar but are not related
in any way. Instead `internal/ui/backup` and `internal/ui/jsonstatus/status`
are the counterparts. Rename the latter to `internal/ui/json/backup` to
make this clear.
jsonstatus wrote the JSON output without synchronization to the
stdio_wrapper which caused mangling between different status lines.
Use the Print and Error methods of termstatus instead which use a
central goroutine to synchronize output.
The favicon on restic.readthedocs.io still contained the old
superman-style logo.
This changes it to the 32x32 gopher icon also used on https://forum.restic.net/ ,
just converted to .ico using Gimp.
Tested by building the documentation and opening index.html in Chrome.
The new favicon looks fine.
I was running "golangci-lint" and found this two warnings
internal/checker/checker.go:135:18: (*Checker).LoadIndex$3 - result 0 (error) is always nil (unparam)
final := func() error {
^
internal/repository/repository.go:457:18: (*Repository).LoadIndex$3 - result 0 (error) is always nil (unparam)
final := func() error {
^
It turns out that these functions are used only in "RunWorkers(...)",
which is used only two times in whole project right after this "final"
functions.
And because these "final" functions always return "nil", I've
descided, that it would be better to remove requriments for "final" func
to return error to avoid magick "return nil" at their end.
This commit adds support for cross-compiling Restic for ppc64le
as part of the release process.
Add a new file: changelog/unreleased/issue-2277
Modifies: helpers/build-release-binaries/main.go
Modifies: run_integration_tests.go
While it was a nice idea, some tests like the backend integration tests
required credentials which were not available to third-party PRs. This
lead to a lot of uncovered code which confused people. So let's disable
codecov.io for now.
The environment variable RESTIC_PASSWORD_COMMAND works but has
not been documented yet. e.g. it could contain a command that
would fetch the password from a local user keyring
enhances: https://github.com/restic/restic/pull/2094
In some (rare) cases "fake" UID 51234 may exist in a system running a
test. When this is the case, `cmp.Equal(want, node3)` will fail based on
difference between empty string and an actual username present in a
system.
Fixes github issue #2372
Before, build.go only unset GO111MODULE and GOPATH, so the Go compiler
did not see either and worked in Module mode. But if the code is checked
out below ~/go (the default GOPATH), it will detect that the source is
within GOPATH and switch to non-Module mode. Now we're setting
GO111MODULE to "on" explicitly.
Restic used to quit if the repository password was typed incorrectly once.
Restic will now ask the user again for the repository password if typed incorrectly.
The user will now get three tries to input the correct password before restic quits.
Return valid directory info from Lstat() for parent directories of the
specified filename. Previously only "/" and "." were valid directories.
Also set directory mode as this is checked by archiver.
Closes#2063
Windows does not have a concept of a `change time` in the sense as Unix
has it: the field `CreationTime` of the `Win32FileAttributeData` struct
is not updated when attributes or content is changed. So from now on
we're using the `LastWriteTime` as the `change time` on Windows.
Since I could not remember what the value for `Check` means this commit
renames it to `SameFile`: when set to true, the test should make sure
that `FileChanged` should return false (=file is unmodified).
Sometimes restic gets bogus timestamps which cannot be converted to
JSON, because the stdlib JSON encoder returns an error if the year is
not within [0, 9999]. We now make sure that we at least record _some_
timestamp and cap the year either to 0000 or 9999. Before, restic would
refuse to save the file at all, so this improves the status quo.
This fixes#2174 and #1173
The help text for `restic stats` lists a number of modes in a list.
Make sure the "more info" text is a separate paragraph rather than
being part of the list.
Converting the changelog to PDF using pandoc leads to:
! Undefined control sequence.
l.1497 ...mple, by creating a file named ``..\test
This is because \t is interpreted as a control sequence. Use ``
instead of "" to work around this.
With this change it is possible to dump a folder to stdout as a tar. The
It can be used just like the normal dump command:
`./restic dump fa97e6e1 "/data/test/" > test.tar`
Where `/data/test/` is a a folder instead of a file.
This commit is a followup to the addition of the --group-by flag for the
snapshots command. Adding the grouping code there introduced duplicated
code (the forget command also does grouping). This commit refactors
boths sides to only use shared code.
This commit moves the code which is used to group snapshots in the
snapshots command into an own function to deduplicate code shared by the
snapshots command and forget command.
This commit will add json tags to the structs for json output, so all
json variables of the snapshot command output are lowercase and
snake-case.
Furthermore it adds some internal code changes based on the feedback in
the pull request #2087.
This commit adds a --group-by option to the snapshots command, which
behaves similar to the --group-by option of forget. Valid option values
are "host, paths, tags". If this option is given, the output of
snapshots will be divided into multiple tables, according to the value
given (i.e. "host" will create a table of snapshots for each host, that
has a snapshot in the list). Also the JSON output will be grouped.
The default behavior (when --group-by is not given) has not changed.
More to this discussion can be found in issue #2037.
Reading the password from non-terminal stdin used io.ReadFull with a
byte slice of length 1000.
We are now using a Scanner to read one line of input, independent of its
length.
Additionally, if stdin is not a terminal, the password is read only
once instead of twice (in an effort to detect typos).
Fixes#2203
Signed-off-by: Peter Schultz <peter.schultz@classmarkets.com>
This commit changes the signatures for repository.LoadAndDecrypt and
utils.LoadAll to allow passing in a []byte as the buffer to use. This
buffer is enlarged as needed, and returned back to the caller for
further use.
In later commits, this allows reducing allocations by reusing a buffer
for multiple calls, e.g. in a worker function.
This patch makes it more explicit what is meant by the CACHEDIR.TAG file.
It not only has to have this particular name, but also a specific content
(described at http://bford.info/cachedir/spec.html), which is not immediately
obvious to the user.
The `s3.storage-class` option can be passed to restic (using `-o`) to
specify the storage class to be used for S3 objects created by restic.
The storage class is passed as-is to S3, so it needs to be understood by
the API. On AWS, it can be one of `STANDARD`, `STANDARD_IA`,
`ONEZONE_IA`, `INTELLIGENT_TIERING` and `REDUCED_REDUNDANCY`. If
unspecified, the default storage class is used (`STANDARD` on AWS).
You can mix storage classes in the same bucket, and the setting isn't
stored in the restic repository, so be sure to specify it with each
command that writes to S3.
Closes#706
This change allows passing no arguments to rclone, using `-o
rclone.args=""`. It is helpful when running rclone remotely via SSH
using a key with a forced command (via `command=` in `authorized_keys`).
This adds a test of the json output of the forget command, by running it
once, asking it to keep one snapshot, and verifying that the output has
the right number of snapshots listed in the Keep and Remove fields of
the result.
This commit changes the logic slightly: checking the permissions in the
fuse mount when nobody else besides the current user can access the fuse
mount does not sense. The current user has access to the repo files in
addition to the password, so they can access all data regardless of what
the fuse mount does.
Enabling `--allow-root` allows the root user to access the files in the
fuse mount, for this user no permission checks will be done anyway.
The code now enables `DefaultPermissions` automatically when
`--allow-other` is set, it can be disabled with
`--no-default-permissions` to restore the old behavior.
This commit changes the internal file system implementation for reading
data from stdin, it now returns an error when no bytes could be read. I
think it's worth failing in this case, the user instructed restic to
read some data from stdin, and no data was read at all. Maybe it was in
a pipe and some earlier stage failed.
See #2135 for a short discussion.
When restic reads the backup from stdin, the number of bytes processed
was always displayed as zero. The reason is that the UI for the archive
uses the total bytes as returned by the scanner, which is zero for
stdin. So instead we keep track of the real number of bytes processed
and print that at the end.
Closes#2136
This option restores the previous behavior of `mount` by disabling the "DefaultPermissions" FUSE option. This allows any user that can access the mountpoint to read any file from the snapshot. Normal FUSE rules apply, so `allow-root` or `allow-other` can be used to allow users besides the mounting user to access these files.
This enforces the Unix permissions of the snapshot files within the mounted filesystem, which will only allow users to access snapshot files if they had access to the file outside of the snapshot.
Make restic forget --keep-within accept time ranges measured in hours and choose
accordingly which snapshots to keep and which to forget. Add relative tests.
The default value of the `--host` flag was set to 'H' (the shorthand
version of the flag), this caused the snapshot lookup to fail.
Also add shorthand `-H` for `backup` command.
Closes#2040
Some time ago we changed the paths in the repo to always use a slash for
separation, it seems we missed that the `dump` command still uses the
`filepath` package, so on Windows backslashes are used.
Closes#2079
don't create fileInfo structs for empty files. this saves memory.
this also avoids extra serial scan of all fileInfo, which should
make restore faster and more consistent.
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
reworked restore error callback to use file location
path instead of much heavier Node. this reduced restore
memory usage by as much as 50% in some of my tests.
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
* uses less memory as common prefix is only stored once
* stepping stone for simpler error callback api, which
will allow further memory footprint reduction
Signed-off-by: Igor Fedorenko <igor@ifedorenko.com>
When the scanner is slower than the actual backup, the tomb cancels the
context passed to Scan(), which then returns ctx.Err(). In the end, the
main function prints an error message that is not helpful ("Context
cancelled") and exits with an error code although no error occurred.
The code now ignores the error in the context and just uses it for
cancellation. The scanner is not supposed to return an error anyway.
Closes#1978
Sometimes, users run restic without retaining the local cache
directories. This was reported several times in the past.
Restic will now print a message whenever a new cache directory is
created from scratch (i.e. it did not exist before), so users have a
chance to recognize when the cache is not kept between different runs of
restic.
With --blob, --tree and --pack, the find command now lists the snapshots
that contain a specific tree or blob, or the snapshots that contain
blobs belonging to a given pack.
It also displays the pack ID a blob belongs to.
A list of IDs can be given, as long as the IDs are all of the same type.
This commit adds a command called `self-update` which downloads the
latest released version of restic from GitHub and replacing the current
binary with it. It does not rely on any external program (so it'll work
everywhere), but still verifies the GPG signature using the embedded GPG
public key.
By default, the `self-update` command is hidden behind the `selfupdate`
built tag, which is only set when restic is built using `build.go`. The
reason for this is that downstream distributions will then not include
the command by default, so users are encouraged to use the
platform-specific distribution mechanism.
This commit introduces two functions: withinDir() and
approachingMatchingTree()
Both bind the list of directories with a closure, so we don't need to
iterate over the list in the function passed to Walk(). This reduces the
indentation level and since we can just use return, we don't need the
breaks any more.
The case that len(dirs) == 0 can also be handled by the functions with a
return, which saves another indentation level.
The main function body of the function passed to Walk() was reduced to
three cases:
* Within one of the dirs: Print the node, and if recursive operation is
requested, directly return, so the walker continues recursive
traversal
* Approaching one of the dirs: don't print anything, but continue
recursive traversal.
* Nothing of the two: abort walking this branch of the tree.
Adds a SelectByName method to the archive and scanner which only require
the filename as input, and can thus be run before calling lstat on the
file. Can speed up scanning significantly if a lot of filename excludes
are used.
Accessing beyond the end of the file now removes the file from the cache
because it is assumed to be truncated. Usually, this means that the data
is fetched directly from the backend instead.
So, `dep` got an nice new feature to remove tests and non-go files from
`vendor/`, and this brings the size of the vendor directory from ~300MiB
down to ~20MiB. We don that now.
I went pretty loud with this, but I think the performance is bad enough
that it's really worth highlighting, especially since it locks the index
during the prune.
Adds a small warning indicating that b2 bucket names need to be unique. It's an easy mistake to make, and it's surprising to get the following error if you're not accustomed to the way B2 works:
Fatal: create repository at b2:postgres failed: NewBucket: b2_create_bucket: 400: Bucket name is already in use
traverseTree() was meant to call enterDir() whenever a directory is
selected for restore, either explicitly or implicitly (=contains a file
which is to be restored). After restoring a file, leaveDir() is called
in reverse order for all intermediate directories so that the metadata
can be restored.
When a directory is selected implicitly, the metadata for it is
restored. This is different from the previous restorer behavior, which
created implicitly selected intermediate directories with permissions
0700 (only user can read/write it).
This commit changes the behavior back to the old one. Only a directory
is explicitly selected for restore, enterDir()/leaveDir() are called for
it. Otherwise, only visitNode() is called, so visitNode() needs to make
sure the parent directory exists. If the directory is explicitly
included, leaveDir() will then restore the metadata correctly.
When we decide to change the behavior (restore metadata for all
intermediate directories, even if selected implicitly), we should do
that in the selection functions, not here.
This finally resolves#1870
When I backup one of my filesystems which has a lot of Hard Links (Backup directory of burp) the live status shows me 4.5 TB but it only takes up 1.2 TB of space in the repository. This is confusing because my repo is on S3 and I feared a huge Bill. This change should clarify this.
If this PR resolves an issue on GitHub, use "Closes #1234" so that the issue
is closed automatically when this PR is merged.
-->
### Checklist
Checklist
---------
- [ ] I have read the [Contribution Guidelines](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#providing-patches)
- [ ] I have added tests for all changes in this PR
- [ ] I have added documentation for the changes (in the manual)
- [ ] There's a new file in `changelog/unreleased/` that describes the changes for our users (template [here](https://github.com/restic/restic/blob/master/changelog/TEMPLATE))
- [ ] I have run `gofmt` on the code in all commits
- [ ] All commit messages are formatted in the same style as [the other commits in the repo](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#git-commits)
- [ ] I'm done, this Pull Request is ready for review
<!--
You do not need to check all the boxes below all at once. Feel free to take
your time and add more commits. If you're done and ready for review, please
check the last box. Enable a checkbox by replacing [ ] with [x].
-->
- [ ] I have read the [contribution guidelines](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#providing-patches).
- [ ] I have [enabled maintainer edits](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork).
- [ ] I have added tests for all code changes.
- [ ] I have added documentation for relevant changes (in the manual).
- [ ] There's a new file in `changelog/unreleased/` that describes the changes for our users (see [template](https://github.com/restic/restic/blob/master/changelog/TEMPLATE)).
- [ ] I have run `gofmt` on the code in all commits.
- [ ] All commit messages are formatted in the same style as [the other commits in the repo](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#git-commits).
- [ ] I'm done! This pull request is ready for review.
or [good first issue](https://github.com/restic/restic/labels/help%3A%20good%20first%20issue).
If you are already a bit experienced with the restic internals, take a look
at the issues tagged as [help wanted](https://github.com/restic/restic/labels/help%3A%20wanted).
Reporting Bugs
@@ -45,9 +48,8 @@ environment was used and so on. Please tell us at least the following things:
Remember, the easier it is for us to reproduce the bug, the earlier it will be
corrected!
In addition, you can compile restic with debug support by running
`go run build.go -tags debug` and instructing it to create a debug log by
setting the environment variable `DEBUG_LOG` to a file, e.g. like this:
In addition, you can instruct restic to create a debug log by setting the
environment variable `DEBUG_LOG` to a file, e.g. like this:
$ export DEBUG_LOG=/tmp/restic-debug.log
$ restic backup ~/work
@@ -60,35 +62,25 @@ uploading it somewhere or post only the parts that are really relevant.
Development Environment
=======================
In order to compile restic with the `go` tool directly, it needs to be checked
out at the right path within a `GOPATH`. The concept of a `GOPATH` is explained
in ["How to write Go code"](https://golang.org/doc/code.html).
The repository contains the code written for restic in the directories
`cmd/` and `internal/`.
If you do not have a directory with Go code yet, executing the following
instructions in your shell will create one for you and check out the restic
repo:
Make sure you have the minimum required Go version installed. Clone the repo
(without having `$GOPATH` set) and `cd` into the directory:
$ export GOPATH="$HOME/go"
$ mkdir -p "$GOPATH/src/github.com/restic"
$ cd "$GOPATH/src/github.com/restic"
$ unset GOPATH
$ git clone https://github.com/restic/restic
$ cd restic
You can then build restic as follows:
Then use the `go` tool to build restic:
$ go build ./cmd/restic
$ ./restic version
restic compiled manually
compiled with go1.8.3 on linux/amd64
restic 0.14.0-dev (compiled manually) compiled with go1.19 on linux/amd64
The following commands can be used to run all the tests:
You can run all tests with the following command:
$ go test ./cmd/... ./internal/...
The repository contains two sets of directories with code: `cmd/` and
`internal/` contain the code written for restic, whereas `vendor/` contains
copies of libraries restic depends on. The libraries are managed with the
[`dep`](https://github.com/golang/dep) tool.
$ go test ./...
Providing Patches
=================
@@ -102,21 +94,19 @@ down to the following steps:
GitHub. For a new feature, please add an issue before starting to work on
it, so that duplicate work is prevented.
1.First we would kindly ask you to fork our project on GitHub if you haven't
done so already.
1.Next, fork our project on GitHub if you haven't done so already.
2. Clone the repository locally and create a new branch. If you are working on
the code itself, please set up the development environment as described in
the previous section. Especially take care to place your forked repository
at the correct path (`src/github.com/restic/restic`) within your `GOPATH`.
2. Clone your fork of the repository locally and **create a new branch** for
your changes. If you are working on the code itself, please set up the
development environment as described in the previous section.
3.Then commit your changes as fine grained as possible, as smaller patches,
that handle one and only one issue are easier to discuss and merge.
3.Commit your changes to the new branch as fine grained as possible, as
smaller patches, for individual changes, are easier to discuss and merge.
4. Push the new branch with your changes to your fork of the repository.
5. Create a pull request by visiting the GitHub website, it will guide you
through the process.
through the process. Please [allow edits from maintainers](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork).
6. You will receive comments on your code and the feature or bug that they
address. Maybe you need to rework some minor things, in this case push new
@@ -124,30 +114,44 @@ down to the following steps:
existing commit, use common sense to decide which is better), they will be
automatically added to the pull request.
7. If your pull request changes anything that users should be aware of (a
bugfix, a new feature, ...) please add an entry to the file
['CHANGELOG.md'](CHANGELOG.md). It will be used in the announcement of the
next stable release. While writing, ask yourself: If I were the user, what
would I need to be aware of with this change.
7. If your pull request changes anything that users should be aware of
(a bugfix, a new feature, ...) please add an entry as a new file in
`changelog/unreleased` including the issue number in the filename (e.g.
`issue-8756`). Use the template in `changelog/TEMPLATE` for the content.
It will be used in the announcement of the next stable release. While
writing, ask yourself: If I were the user, what would I need to be aware
of with this change?
8.Once your code looks good and passes all the tests, we'll merge it. Thanks
8.Do not edit the man pages under `doc/man` or `doc/manual_rest.rst` -
these are autogenerated before new releases.
9. Once your code looks good and passes all the tests, we'll merge it. Thanks
a lot for your contribution!
Please provide the patches for each bug or feature in a separate branch and
open up a pull request for each.
open up a pull request for each, as this simplifies discussion and merging.
The restic project uses the `gofmt` tool for Go source indentation, so please
run
gofmt -w **/*.go
in the project root directory before committing. Installing the script
`fmt-check` from https://github.com/edsrzf/gofmt-git-hook locally as a
pre-commit hook checks formatting before committing automatically, just copy
this script to `.git/hooks/pre-commit`.
in the project root directory before committing. For each Pull Request, the
formatting is tested with `gofmt` for the latest stable version of Go.
Installing the script `fmt-check` from https://github.com/edsrzf/gofmt-git-hook
locally as a pre-commit hook checks formatting beforecommitting automatically,
just copy this script to `.git/hooks/pre-commit`.
The project is using the program
[`golangci-lint`](https://github.com/golangci/golangci-lint) to run a list of
linters and checkers. It will be run on the code when you submit a PR. In order
to check your code beforehand, you can run `golangci-lint run` manually.
Eventually, we will enable `golangci-lint` for the whole code base. For now,
you can ignore warnings printed for lines you did not modify, those will be
ignored by the CI.
For each pull request, several different systems run the integration tests on
Linux, OS X and Windows. We won't merge any code that does not pass all tests
Linux, macOS and Windows. We won't merge any code that does not pass all tests
for all systems, so when a tests fails, try to find out what's wrong and fix
it. If you need help on this, please leave a comment in the pull request, and
we'll be glad to assist. Having a PR with failing integration tests is nothing
@@ -164,7 +168,7 @@ history and triaging bugs much easier.
Git commit messages have a very terse summary in the first line of the commit
message, followed by an empty line, followed by a more verbose description or a
List of changed things. For examples, please refer to the excellent [How to
Write a Git Commit Message](http://chris.beams.io/posts/git-commit/).
Write a Git Commit Message](https://chris.beams.io/posts/git-commit/).
If you change/add multiple different things that aren't related at all, try to
make several smaller commits. This is much easier to review. Using `git add -p`
restic is a backup program that is fast, efficient and secure. It supports the three major operating systems (Linux, macOS, Windows) and a few smaller ones (FreeBSD, OpenBSD).
For detailed usage and installation instructions check out the [documentation](https://restic.readthedocs.io/en/latest).
You can ask questions in our [Discourse forum](https://forum.restic.net).
Quick start
-----------
Once you've [installed](https://restic.readthedocs.io/en/latest/020_installation.html) restic, start
off with creating a repository for your backups:
$ restic init --repo /tmp/backup
enter password for new backend:
enter password again:
created restic backend 085b3c76b9 at /tmp/backup
Please note that knowledge of your password is required to access the repository.
Losing your password means that your data is irrecoverably lost.
- [Amazon S3](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#amazon-s3) (either from Amazon or using the [Minio](https://minio.io) server)
- And many other services via the [rclone](https://rclone.org) [Backend](https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#other-services-via-rclone)
# Design Principles
Restic is a program that does backups right and was designed with the
following principles in mind:
-**Easy:** Doing backups should be a frictionless process, otherwise
you might be tempted to skip it. Restic should be easy to configure
and use, so that, in the event of a data loss, you can just restore
it. Likewise, restoring data should not be complicated.
-**Fast**: Backing up your data with restic should only be limited by
your network or hard disk bandwidth so that you can backup your files
every day. Nobody does backups if it takes too much time. Restoring
backups should only transfer data that is needed for the files that
are to be restored, so that this process is also fast.
-**Verifiable**: Much more important than backup is restore, so restic
enables you to easily verify that all data can be restored.
-**Secure**: Restic uses cryptography to guarantee confidentiality and
integrity of your data. The location the backup data is stored is
assumed not to be a trusted environment (e.g. a shared space where
others like system administrators are able to access your backups).
Restic is built to secure your data against such attackers.
-**Efficient**: With the growth of data, additional snapshots should
only take the storage of the actual increment. Even more, duplicate
data should be de-duplicated before it is actually written to the
storage back end to save precious backup space.
# Reproducible Builds
The binaries released with each restic version starting at 0.6.1 are
[reproducible](https://reproducible-builds.org/), which means that you can
reproduce a byte identical version from the source code for that
release. Instructions on how to do that are contained in the
-`sftp server (via SSH) <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#sftp>`__
-`HTTP REST server <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#rest-server>`__ (`protocol <doc/100_references.rst#rest-backend>`__`rest-server <https://github.com/restic/rest-server>`__)
-`AWS S3 <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#amazon-s3>`__ (either from Amazon or using the `Minio <https://minio.io>`__ server)
-`OpenStack Swift <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#openstack-swift>`__
- And many other services via the `rclone <https://rclone.org>`__`Backend <https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#other-services-via-rclone>`__
Design Principles
-----------------
Restic is a program that does backups right and was designed with the
following principles in mind:
-**Easy:** Doing backups should be a frictionless process, otherwise
you might be tempted to skip it. Restic should be easy to configure
and use, so that, in the event of a data loss, you can just restore
it. Likewise, restoring data should not be complicated.
-**Fast**: Backing up your data with restic should only be limited by
your network or hard disk bandwidth so that you can backup your files
every day. Nobody does backups if it takes too much time. Restoring
backups should only transfer data that is needed for the files that
are to be restored, so that this process is also fast.
-**Verifiable**: Much more important than backup is restore, so restic
enables you to easily verify that all data can be restored.
-**Secure**: Restic uses cryptography to guarantee confidentiality and
integrity of your data. The location the backup data is stored is
assumed not to be a trusted environment (e.g. a shared space where
others like system administrators are able to access your backups).
Restic is built to secure your data against such attackers.
-**Efficient**: With the growth of data, additional snapshots should
only take the storage of the actual increment. Even more, duplicate
data should be de-duplicated before it is actually written to the
storage back end to save precious backup space.
Reproducible Builds
-------------------
The binaries released with each restic version starting at 0.6.1 are
`reproducible <https://reproducible-builds.org/>`__, which means that you can
easily reproduce a byte identical version from the source code for that
release. Instructions on how to do that are contained in the
Enhancement: Add release binaries for MIPS architectures
We've added a few new architectures for Linux to the release binaries: `mips`,
`mipsle`, `mips64`, and `mip64le`. MIPS is mostly used for low-end embedded
systems.
https://github.com/restic/restic/issues/3191
https://github.com/restic/restic/pull/3208
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.