Fix rollback disk snapshots on instance snapshot failure#12949
Fix rollback disk snapshots on instance snapshot failure#12949sureshanaparti wants to merge 2 commits intoapache:4.22from
Conversation
|
@blueorangutan package |
|
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 4.22 #12949 +/- ##
============================================
- Coverage 17.60% 17.60% -0.01%
+ Complexity 15677 15676 -1
============================================
Files 5918 5918
Lines 531681 531685 +4
Branches 65005 65006 +1
============================================
- Hits 93623 93616 -7
- Misses 427498 427509 +11
Partials 10560 10560
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17339 |
There was a problem hiding this comment.
Pull request overview
Fixes cleanup/rollback of per-volume disk snapshots when an instance (VM) snapshot operation fails, preventing orphaned snapshots and incorrect snapshot resource counts (issue #12927).
Changes:
- Track created disk snapshots for rollback using a
Map<volumeId, SnapshotInfo>to ensure snapshots are rolled back even if creation fails mid-way. - Harden rollback logic with null checks and improve rollback logging.
- Update the KVM VM snapshot strategy unit test to match the new
createDiskSnapshotmethod signature.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/vmsnapshot/StorageVMSnapshotStrategy.java | Track snapshots for rollback via volume-id map; ensure rollback attempts occur on failures; add null-safety/logging improvements. |
| engine/storage/snapshot/src/test/java/org/apache/cloudstack/storage/vmsnapshot/VMSnapshotStrategyKVMTest.java | Adjust unit test to pass the new rollback map parameter to createDiskSnapshot. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -456,7 +464,7 @@ protected SnapshotInfo createDiskSnapshot(VMSnapshot vmSnapshot, List<SnapshotIn | |||
| if (snapshotInfo == null) { | |||
| throw new CloudRuntimeException("Failed to create snapshot"); | |||
| } else { | |||
There was a problem hiding this comment.
volumeToSnapshotInfoMapForRollback.put(vol.getId(), snapshotInfo) is performed twice (before and after takeSnapshot). If the intent is to ensure rollback coverage when takeSnapshot throws, keep the pre-takeSnapshot put, but drop the else block and only overwrite the map entry if takeSnapshot returns a different SnapshotInfo instance. This removes redundant writes and makes the intent clearer.
| } else { | |
| } | |
| if (snapshotInfo != volumeToSnapshotInfoMapForRollback.get(vol.getId())) { |
| @@ -448,6 +455,7 @@ protected SnapshotInfo createDiskSnapshot(VMSnapshot vmSnapshot, List<SnapshotIn | |||
| vol.addPayload(setPayload(vol, snapshot, quiescevm)); | |||
| SnapshotInfo snapshotInfo = snapshotDataFactory.getSnapshot(snapshot.getId(), vol.getDataStore()); | |||
| snapshotInfo.addPayload(vol.getpayload()); | |||
| volumeToSnapshotInfoMapForRollback.put(vol.getId(), snapshotInfo); | |||
| SnapshotStrategy snapshotStrategy = storageStrategyFactory.getSnapshotStrategy(snapshotInfo, SnapshotOperation.TAKE); | |||
| if (snapshotStrategy == null) { | |||
| throw new CloudRuntimeException("Could not find strategy for snapshot uuid:" + snapshotInfo.getUuid()); | |||
| @@ -456,7 +464,7 @@ protected SnapshotInfo createDiskSnapshot(VMSnapshot vmSnapshot, List<SnapshotIn | |||
| if (snapshotInfo == null) { | |||
| throw new CloudRuntimeException("Failed to create snapshot"); | |||
| } else { | |||
There was a problem hiding this comment.
The new rollback behavior relies on adding the snapshot to the rollback map before snapshotStrategy.takeSnapshot(...) runs so that failures/exceptions still get cleaned up. There’s no unit test covering the failure path (e.g., takeSnapshot throws/returns null) and verifying that rollback deletes the persisted snapshot record. Please add a test for this scenario to prevent regressions of #12927.
DaanHoogland
left a comment
There was a problem hiding this comment.
I may be missing some context, hence my questions?!?
| vol.addPayload(setPayload(vol, snapshot, quiescevm)); | ||
| SnapshotInfo snapshotInfo = snapshotDataFactory.getSnapshot(snapshot.getId(), vol.getDataStore()); | ||
| snapshotInfo.addPayload(vol.getpayload()); | ||
| volumeToSnapshotInfoMapForRollback.put(vol.getId(), snapshotInfo); |
There was a problem hiding this comment.
so it might be added twice right? so why the else at line 466?
There was a problem hiding this comment.
The else is so we have the updated snapshot info to delete. Although, I'm not familiar enough with this code to say that it makes a difference. Anyway, if it does make a difference, I would just remove the else to reduce indentation; if it does not make a difference, I would only add the info once on the hashmap.
| } | ||
| if (!result) { | ||
| for (SnapshotInfo snapshotInfo : forRollback) { | ||
| for (SnapshotInfo snapshotInfo : volumeToSnapshotInfoMapForRollback.values()) { |
There was a problem hiding this comment.
this seems the only usage, why not a HashSet?
There was a problem hiding this comment.
the snapshot info object after take snapshot can be new/updated one, which will be considered during rollback. with HashSet, chances that another snapshot object for the same volume is added if it's new.
There was a problem hiding this comment.
@DaanHoogland I updated it, back to List
|
@blueorangutan package |
|
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17383 |
JoaoJandre
left a comment
There was a problem hiding this comment.
CLGTM, did not test it
Description
This PR fixes rollback disk snapshots on instance snapshot failure.
Fixes #12927
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?