zfs_autobackup and TrueNAS compatibility issues with mounted datasets #254

Open
M4stakilla opened this issue May 21, 2024 · 3 comments

M4stakilla commented May 21, 2024

Hi,

I'm using your script on TrueNAS Scale Dragonfish and found 2 issues when replicating to a mounted dataset.

After running the script with --debug and trying all commands one by one manually, I was able to figure out the missing commands to make things work properly.
This is the zfs_autobackup command I used for my replication:
root@truenas-backup:~# autobackup-venv/bin/python -m zfs_autobackup.ZfsAutobackup \
  --verbose --debug --ssh-config ../../../.ssh/config --ssh-target truenas-master \
  --rollback --keep-source=0 --keep-target=0 --allow-empty \
  --snapshot-format {}-%Y-%m-%d_%H-%M --zfs-compressed --decrypt \
  --clear-refreservation --strip-path 2 --exclude-received --other-snapshots \
  test master-pool/encrypted-ds

Below is a summary of the individual commands required to make it work on TrueNAS (I skipped all the snapshot, hold and release commands during my testing, as they are not relevant to these issues):

1. (zfs send -L -e -c -v -P backup-pool/encrypted-ds/test3-ds@manual-2024-05-21_10-22) | (ssh -F ../../../.ssh/config truenas-master 'zfs recv -u -x refreservation -v -s master-pool/encrypted-ds/test3-ds')
2. (ssh -F ../../../.ssh/config truenas-master 'zfs mount master-pool/encrypted-ds/test3-ds')
3. (ssh -F ../../../.ssh/config truenas-master 'zfs rollback master-pool/encrypted-ds/test3-ds@manual-2024-05-21_10-22')
4. (zfs send -L -e -c -v -P -i @manual-2024-05-21_10-22 backup-pool/encrypted-ds/test3-ds@manual-2024-05-21_10-35) | (ssh -F ../../../.ssh/config truenas-master 'zfs recv -u -x refreservation -v -s master-pool/encrypted-ds/test3-ds')
5. (ssh -F ../../../.ssh/config truenas-master 'zfs umount master-pool/encrypted-ds/test3-ds')
6. (ssh -F ../../../.ssh/config truenas-master 'zfs mount master-pool/encrypted-ds/test3-ds')
  1. (Done correctly by zfs_autobackup) This is the initial full replication, and it works fine.
  2. (Done correctly by zfs_autobackup) This is the mount of the target dataset, and it works well, since my source dataset has canmount=on and I want to actually mount it on the target as well (I'm only replicating datasets at the third level, so if I understood your warnings about this correctly, I shouldn't run into issues).
  3. (Missing in zfs_autobackup!) Apparently TrueNAS modifies something on the dataset after mounting it, so I need to roll back after the mount; otherwise I get the error "! [Target] STDERR > cannot receive incremental stream: destination master-pool/encrypted-ds/test3-ds has been modified".
  4. (Done correctly by zfs_autobackup) This is the incremental replication of the 2nd snapshot, and it works well.
  5. and 6. (Missing in zfs_autobackup!) When replicating over a mounted dataset in TrueNAS, TrueNAS randomly throws 3 different kinds of errors, which can be resolved by unmounting and remounting the dataset (a consolidated sketch of the missing steps follows below).
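
To make the missing pieces explicit, here is a minimal sketch of steps 3, 5 and 6 as a fixup run against the target. It only consolidates the commands listed above (same dataset and snapshot names), and is not how zfs_autobackup itself would implement it:

#!/usr/bin/bash
# Minimal sketch: the steps zfs_autobackup is missing on TrueNAS, consolidated.
REMOTE_CMD="ssh -F ../../../.ssh/config truenas-master"
DATASET="master-pool/encrypted-ds/test3-ds"
SNAPSHOT="${DATASET}@manual-2024-05-21_10-22"

# Step 3: roll back whatever TrueNAS changed after the mount, so the next
# incremental receive doesn't fail with "destination ... has been modified".
${REMOTE_CMD} "zfs rollback ${SNAPSHOT}"

# Steps 5 and 6: unmount and remount so the TrueNAS middleware stops throwing
# the random quota/path errors listed below.
${REMOTE_CMD} "zfs umount ${DATASET}"
${REMOTE_CMD} "zfs mount ${DATASET}"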

Below are the 3 errors that TrueNAS throws when the dataset is not remounted:

[EFAULT] Failed retreiving USER quotas for master-pool/encrypted-ds/test-ds

 Error: concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 76, in get_quota
    with libzfs.ZFS() as zfs:
  File "libzfs.pyx", line 529, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 78, in get_quota
    quotas = resource.userspace(quota_props)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libzfs.pyx", line 3651, in libzfs.ZFSResource.userspace
libzfs.ZFSException: cannot get used/quota for master-pool/encrypted-ds/test-ds: unsupported version or feature

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 112, in main_worker
    res = MIDDLEWARE._run(*call_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 34, in _call
    with Client(f'ws+unix://{MIDDLEWARE_RUN_DIR}/middlewared-internal.sock', py_exceptions=True) as c:
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 80, in get_quota
    raise CallError(f'Failed retreiving {quota_type} quotas for {ds}')
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for master-pool/encrypted-ds/test-ds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/dataset_quota_and_perms.py", line 225, in get_quota
    quota_list = await self.middleware.call(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1425, in _call
    return await self._call_worker(name, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1431, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1337, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for master-pool/encrypted-ds/test-ds

[EFAULT] Failed retreiving GROUP quotas for master-pool/encrypted-ds/test-ds

 Error: concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 76, in get_quota
    with libzfs.ZFS() as zfs:
  File "libzfs.pyx", line 529, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 78, in get_quota
    quotas = resource.userspace(quota_props)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libzfs.pyx", line 3651, in libzfs.ZFSResource.userspace
libzfs.ZFSException: cannot get used/quota for master-pool/encrypted-ds/test-ds: dataset is busy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 112, in main_worker
    res = MIDDLEWARE._run(*call_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 34, in _call
    with Client(f'ws+unix://{MIDDLEWARE_RUN_DIR}/middlewared-internal.sock', py_exceptions=True) as c:
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 80, in get_quota
    raise CallError(f'Failed retreiving {quota_type} quotas for {ds}')
middlewared.service_exception.CallError: [EFAULT] Failed retreiving GROUP quotas for master-pool/encrypted-ds/test-ds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/dataset_quota_and_perms.py", line 225, in get_quota
    quota_list = await self.middleware.call(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1425, in _call
    return await self._call_worker(name, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1431, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1337, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
middlewared.service_exception.CallError: [EFAULT] Failed retreiving GROUP quotas for master-pool/encrypted-ds/test-ds

[ENOENT] Path /mnt/master-pool/encrypted-ds/test-ds not found

 Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1428, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 191, in nf
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 53, in nf
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/filesystem.py", line 422, in stat
    raise CallError(f'Path {_path} not found', errno.ENOENT)
middlewared.service_exception.CallError: [ENOENT] Path /mnt/master-pool/encrypted-ds/test-ds not found
psy0rz (Owner) commented May 23, 2024

For point 3, try zfs-autobackup v3.3 beta. Maybe this also fixes 6, but maybe that's just a TrueNAS issue?

See https://github.com/psy0rz/zfs_autobackup/wiki/Mounting

M4stakilla (Author) commented May 23, 2024

Hi psy0rz,

Thanks a lot for your response. I agree it is a bit tricky to say whose issue this is (TrueNAS or zfs-autobackup). I will try to raise a TrueNAS bug report for this as well later (but I fear they'll just dismiss it because I'm not supposed to run external scripts like this on "an appliance").

I actually did do all my latest testing with v3.3 beta 3.
zfs-autobackup_debug.log

Attached is the output of my 2 --debug runs, from which I extracted all the manual commands used in my testing.

'3.' wasn't fixed in this version, I'm afraid (see the first run: it only complains that the dataset was modified, but doesn't roll it back).
'6.' can only be done if '5.' is done first, so I'm afraid this isn't resolved either.

Thanks a lot for your hard work, by the way. Much appreciated. I first started scripting this myself in bash, but I could never have done it as well as you have over the years ;)

I already created a bash wrapper script that works around issues 5. and 6. Issue 3. I cannot work around, as it happens in the middle of your script (but it also only occurs on the first, initial replication).

Example below:

#!/usr/bin/bash

DIRECTION="master_to_backup"
SCOPE="all_snapshots"

TASK="${DIRECTION}_${SCOPE}"
SSH_CONFIG_FILE="../../../.ssh/config"
SSH_TARGET="truenas-master"
REMOTE_CMD="ssh -F ${SSH_CONFIG_FILE} ${SSH_TARGET}"

ZFS_AUTOBACKUP_FOLDER="zfs_autobackup-latest"
LOG_FILE="../../../logs/backup/zfs_autobackup-${TASK}.$(date +"%Y-%m-%d_%H-%M").log"
ZFS_AUTOBACKUP_COMMAND="autobackup-venv/bin/python -m zfs_autobackup.ZfsAutobackup"

SNAPSHOT_OPTARGS="--rollback --keep-source=0 --keep-target=0 --allow-empty --snapshot-format {}-%Y-%m-%d_%H-%M"
ZFS_OPTARGS="--zfs-compressed --decrypt --clear-refreservation"
ZFS_AUTOBACKUP_OPTARGS="--strip-path 2 --exclude-received"

cd "$(dirname -- "${BASH_SOURCE[0]}")/${ZFS_AUTOBACKUP_FOLDER}"

if [[ "${DIRECTION}" == "backup_to_master" ]]; then
  SSH_OPTARGS="--ssh-config "${SSH_CONFIG_FILE}" --ssh-target ${SSH_TARGET}"
  
  TARGET_PARENT_DATASET="master-pool/encrypted-ds"
  IMPACTED_DATASETS="$(zfs list -H | awk '{print $1}' | xargs zfs get all | grep " autobackup:${TASK} " | awk '{print $1}' | xargs basename -a 2>/dev/null)"
  UMOUNT_CMD="$(for IMPACTED_DATASET in ${IMPACTED_DATASETS}; do echo -n "${REMOTE_CMD} 'zfs umount ${TARGET_PARENT_DATASET}/${IMPACTED_DATASET}' ; "; done)"
  MOUNT_CMD="$(for IMPACTED_DATASET in ${IMPACTED_DATASETS}; do echo -n "${REMOTE_CMD} 'zfs mount ${TARGET_PARENT_DATASET}/${IMPACTED_DATASET}' ; "; done)"
elif [[ "${DIRECTION}" == "master_to_backup" ]]; then
  SSH_OPTARGS="--ssh-config "${SSH_CONFIG_FILE}" --ssh-source ${SSH_TARGET}"
  
  TARGET_PARENT_DATASET="backup-pool/encrypted-ds"
  IMPACTED_DATASETS="$(${REMOTE_CMD} "zfs list -H | awk '{print \$1}' | xargs zfs get all | grep \" autobackup:${TASK} \" | awk '{print \$1}' | xargs basename -a 2>/dev/null")"
  UMOUNT_CMD="$(for IMPACTED_DATASET in ${IMPACTED_DATASETS}; do echo -n "zfs umount ${TARGET_PARENT_DATASET}/${IMPACTED_DATASET} ; "; done)"
  MOUNT_CMD="$(for IMPACTED_DATASET in ${IMPACTED_DATASETS}; do echo -n "zfs mount ${TARGET_PARENT_DATASET}/${IMPACTED_DATASET} ; "; done)"
else
  echo "'${DIRECTION}' is not a valid \$DIRECTION!"
  exit 1
fi

if [[ "${SCOPE}" == "all_snapshots" ]]; then
  ZFS_AUTOBACKUP_TASK_OPTARGS="--other-snapshots ${TASK} ${TARGET_PARENT_DATASET}"
elif [[ "${SCOPE}" == "latest_snapshot_only" ]]; then
  ZFS_AUTOBACKUP_TASK_OPTARGS=" ${TASK} ${TARGET_PARENT_DATASET}"
else
  echo "'${SCOPE}' is not a valid \$SCOPE!"
  exit 1
fi

mkdir -p "$(dirname "${LOG_FILE}")"

echo "zfs-autobackup command = ${ZFS_AUTOBACKUP_COMMAND} --verbose ${SSH_OPTARGS} ${SNAPSHOT_OPTARGS} ${ZFS_OPTARGS} ${ZFS_AUTOBACKUP_OPTARGS} ${ZFS_AUTOBACKUP_TASK_OPTARGS}" >"${LOG_FILE}"
echo "umount command = ${UMOUNT_CMD}" >>"${LOG_FILE}"
echo "mount command = ${MOUNT_CMD}" >>"${LOG_FILE}"
echo >>"${LOG_FILE}"

# Run zfs-autobackup followed by the umount/mount workaround in the background,
# then follow the log.
nohup /usr/bin/bash -c \
  "{ ${ZFS_AUTOBACKUP_COMMAND} --verbose ${SSH_OPTARGS} ${SNAPSHOT_OPTARGS} ${ZFS_OPTARGS} ${ZFS_AUTOBACKUP_OPTARGS} ${ZFS_AUTOBACKUP_TASK_OPTARGS} ; ${UMOUNT_CMD} ${MOUNT_CMD} }" \
  >>"${LOG_FILE}" 2>&1 \
  & tail -f "${LOG_FILE}"

psy0rz (Owner) commented Aug 13, 2024

Try setting canmount=noauto on the target datasets to see if this solves the issue. ZFS probably tries to mount the datasets and creates a mountpoint directory in a higher dataset. (e.g. https://github.com/psy0rz/zfs_autobackup/wiki/Mounting#target-side)

Also use the --clear-mountpoint option of zfs-autobackup to automatically set canmount=noauto, as sketched below.
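
For reference, a minimal sketch of applying this suggestion, reusing the dataset names and SSH options from the report above:

# On the target, mark an already-received dataset so it is not auto-mounted:
ssh -F ../../../.ssh/config truenas-master \
  'zfs set canmount=noauto master-pool/encrypted-ds/test3-ds'

# For future runs, let zfs-autobackup set canmount=noauto on received datasets
# by adding --clear-mountpoint to the original command:
autobackup-venv/bin/python -m zfs_autobackup.ZfsAutobackup --verbose \
  --ssh-config ../../../.ssh/config --ssh-target truenas-master \
  --clear-mountpoint test master-pool/encrypted-ds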
