Child is still alive after the big object was deleted after expiration #1515

Open
opened 2025-12-28 17:35:41 +00:00 by sami · 3 comments
Owner

Originally created by @End-rey on GitHub (Oct 6, 2025).

Expected Behavior

When a big object is deleted after expiration, all of its parts are also deleted.

Current Behavior

After expired objects started being deleted on the garbage collector tick (#3582), a situation may arise where a big object has already been deleted after expiration on one node but not yet on another, and the replication policy kicks in. The parent is rejected by the expiration check, but the child is copied, so it becomes available again.

```
2025-10-06T16:06:49.564+0300	error	replicator/process.go:87	could not replicate object	{"component": "Object Replicator", "node": "03e340a3b89e1e6398c7c956eb0497d2a5faf5e61d5b911845a807e5e742fb7050", "object": "F6hNZM2hkgdaPDQ8nP3TATMxGMiX8yqmYYKRi4sH97RF/8RTPQ1sb8HbMP9vvdoMPRHviFTHnrwsprK4Fhp9cpzS7", "error": "copy object using NeoFS API client of the remote node: /dns4/localhost/tcp/38359: status: code = 1024 message = failed to verify and store object locally: validate object format: object did not pass expiration check: object has expired: attribute: 4, current: 5"}
2025-10-06T16:06:49.564+0300	debug	replicator/process.go:36	finish work	{"component": "Object Replicator", "amount of unfinished replicas": 1}
2025-10-06T16:06:49.582+0300	debug	replicator/process.go:91	object successfully replicated	{"component": "Object Replicator", "node": "03e340a3b89e1e6398c7c956eb0497d2a5faf5e61d5b911845a807e5e742fb7050", "object": "F6hNZM2hkgdaPDQ8nP3TATMxGMiX8yqmYYKRi4sH97RF/EBpfw1PwNhazenkBYz1a4XhHCLX9aKvGvEKP781r4VrE"}
```

Possible Solution

Perhaps the policy should perform a more accurate expiration-attribute check, e.g. taking the parent's expiration into account when validating split parts.

Steps to Reproduce (for bugs)

Put a big object with multiple parts and an expiration attribute in a container with replicas on multiple nodes. Wait for the expiration to occur, then head the child objects. Sometimes these parts may still be available.

Context

Flaky test case https://rest.fs.neo.org/HXSaMJXk2g8C14ht8HSi7BBaiYZ1HeWh2xnWPGQCg4H6/4145-1759747062/index.html#suites/87d8da5f92824d091055170c29079bc0/2215af6c5636b4/.

Regression

Yes: nodes store dead objects that will no longer be removed.

Author
Owner

@roman-khimov commented on GitHub (Oct 15, 2025):

Same thing can happen with TS marks.

Author
Owner

@roman-khimov commented on GitHub (Dec 3, 2025):

It seems we can't solve this without a mark of some kind for expired big objects, and it needs to carry the first part's OID so it can match any part of the object.

Author
Owner

@roman-khimov commented on GitHub (Dec 24, 2025):

Or we can just delay the deletion by a single epoch. Normally this race has a rather narrow time window, since nodes eventually synchronize their epoch numbers, so if we drop objects one epoch later this won't be a problem unless the sender is totally out of sync with the network, and in that case it likely has other problems anyway.

Reference
nspcc-dev/neofs-node#1515