Node fails to init shard in some cases #1449

Closed
opened 2025-12-28 17:23:02 +00:00 by sami · 13 comments

Originally created by @roman-khimov on GitHub (Jul 1, 2025).

Originally assigned to: @cthulhu-rider on GitHub.

Expected Behavior

🟢

Current Behavior

🔴

```
2025-07-01T10:19:08.773Z info log/log.go:12 local object storage operation {"shard_id": "EkxisqejLygajrSQwBjzLo", "address": "ETnpgfmz1enwKKoPk3yHF9uod5zUXoUB2nNgb1PxZteX/D1W7erDeAR94b2VgYYAuYcbunn22bW3zrzLumXWtDR1p", "op": "metabase PUT"}
2025/07/01 10:19:08 init shard EkxisqejLygajrSQwBjzLo: could not initialize *shard.metabaseSynchronizer: could not put objects to the meta from blobstor: could not inhume objects: status: code = 2050 message = object is locked
```

Possible Solution

Unknown

Steps to Reproduce (for bugs)

https://rest.fs.neo.org/HXSaMJXk2g8C14ht8HSi7BBaiYZ1HeWh2xnWPGQCg4H6/3489-1751369771/index.html#suites/295ed2e000f8fa6f3ade59cc13b98615/dad9f158acc8a212/

Regression

Maybe.

Your Environment

  • Version used: master

@roman-khimov commented on GitHub (Jul 9, 2025):

So it's a lock and a tombstone in a single shard. Nice combo.

@roman-khimov commented on GitHub (Jul 10, 2025):

https://rest.fs.neo.org/HXSaMJXk2g8C14ht8HSi7BBaiYZ1HeWh2xnWPGQCg4H6/3567-1752135041/index.html#suites/295ed2e000f8fa6f3ade59cc13b98615/6057db5bdc59daee/

@roman-khimov commented on GitHub (Jul 24, 2025):

Blocked by https://github.com/nspcc-dev/neofs-testcases/issues/1108?

@carpawell commented on GitHub (Jul 25, 2025):

> Blocked by https://github.com/nspcc-dev/neofs-testcases/issues/1108?

No.

@carpawell commented on GitHub (Jul 25, 2025):

There was once a temporary state when the error was more detailed but the resync error was not yet skipped: https://rest.fs.neo.org/HXSaMJXk2g8C14ht8HSi7BBaiYZ1HeWh2xnWPGQCg4H6/3636-1753225895/index.html#suites/295ed2e000f8fa6f3ade59cc13b98615/51606c8ab75498c4/

It gives us a TS (tombstone) address, and it can be seen that it was PUT to all 4 nodes successfully; then, at the resyncing stage, it surprisingly turns out that it should not be accepted because its target has been locked (wow).

@cthulhu-rider commented on GitHub (Jul 28, 2025):

Seems like:

  1. 5115e2f48c covered the occurring error, so both TOMBSTONE and LOCK are stored in the metabase.
  2. 7105afffc3 injected a LOCK check into metabase PUT ([here](https://github.com/nspcc-dev/neofs-node/blob/ba3431d3cd5cd9f107aa19db05e8ecff2e161744/pkg/local_object_storage/metabase/put.go#L154-L156)), returning an error.

So the smallest fix I see is to not fail the tombstone PUT completely but to skip the garbage bucket update (as before).
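The proposed fix can be sketched as a tiny decision model (names are illustrative, not the actual neofs-node API): a locked target no longer aborts the tombstone PUT, it only suppresses the garbage-bucket update.

```go
package main

import "fmt"

// putAction models the outcome of a metabase tombstone PUT in this sketch.
type putAction int

const (
	putAndMarkGarbage putAction = iota // tombstone stored, targets marked as garbage
	putSkipGarbage                     // tombstone stored, garbage marking skipped
)

// decideTombstonePut is a hypothetical helper showing the suggested
// behavior: previously a locked target returned an ObjectLocked error
// and failed shard init; here it merely skips the garbage mark.
func decideTombstonePut(targetLocked bool) putAction {
	if targetLocked {
		return putSkipGarbage
	}
	return putAndMarkGarbage
}

func main() {
	fmt.Println(decideTombstonePut(true) == putSkipGarbage)    // true
	fmt.Println(decideTombstonePut(false) == putAndMarkGarbage) // true
}
```

This keeps both objects in the metabase, matching the pre-7105afffc3 behavior described above.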

@cthulhu-rider commented on GitHub (Jul 29, 2025):

Alright, I don't know how this happens in the test (I don't see any removal), but currently LOCKing a TOMBSTONEd object is not prohibited. With a container including a single SN with a single shard, the situation is 100% reproducible via the API. So,

> So it's a lock and a tombstone in a single shard. Nice combo.

is very possible.


@cthulhu-rider commented on GitHub (Jul 29, 2025):

Assuming that LOCK+TOMBSTONE in the same shard is exactly the situation, I think the inconsistency occurs due to the variable order of objects (depending on their IDs) in the resync process:

  1. if O and L go before T, the metabase rejects T (see also 5115e2f48c)
  2. if T goes before O, the metabase saves the TOMBSTONE but does not mark objects as garbage regardless of L ([condition](https://github.com/nspcc-dev/neofs-node/blob/bf5a7518dd5b3b78d3ec9b80a2b3d821223a6f85/pkg/local_object_storage/metabase/put.go#L136-L138))
  3. ~~IIUC, looking at the code, L marks are lost on resync in the new format case~~
  4. IIUC, looking at the code, if T goes before L, L does not protect O from garbaging during resync

I need to test 3 and 4. If 4 is true, it existed before v0.48.0. Overall, the dependence on order is unsafe.

---

It seems inconvenient to me that the handling of old and new L/T formats is now done at different layers. Old ones are handled by [Shard](https://github.com/nspcc-dev/neofs-node/blob/bf5a7518dd5b3b78d3ec9b80a2b3d821223a6f85/pkg/local_object_storage/shard/control.go#L195) while new ones by [Metabase](https://github.com/nspcc-dev/neofs-node/blob/bf5a7518dd5b3b78d3ec9b80a2b3d821223a6f85/pkg/local_object_storage/metabase/put.go#L120). They are equivalent up to the number of associated elements.
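The order dependence can be illustrated with a toy replay model (the rules are a simplification of the observed behavior, not the real metabase logic): a tombstone arriving after the lock is rejected, while a tombstone arriving before the lock marks the object as garbage, and the later lock does not undo that mark.

```go
package main

import "fmt"

// result captures the final metabase state for the target object O in
// this simplified model.
type result struct {
	tombstoneRejected bool // T PUT refused because O was already locked
	objectGarbage     bool // O marked as garbage by T, never cleared by L
}

// replay processes objects "O" (regular), "L" (lock on O) and
// "T" (tombstone for O) in the given resync order.
func replay(order []string) result {
	var r result
	locked := false
	for _, obj := range order {
		switch obj {
		case "L":
			locked = true // a lock protects only against *future* garbage marks
		case "T":
			if locked {
				r.tombstoneRejected = true // point 1: metabase rejects T
			} else {
				r.objectGarbage = true // point 4: mark sticks even if L arrives later
			}
		}
	}
	return r
}

func main() {
	fmt.Println(replay([]string{"O", "L", "T"})) // {true false}
	fmt.Println(replay([]string{"O", "T", "L"})) // {false true}
}
```

The second ordering matches the log below: the tombstone is applied while the object is "NOT LOCKED", and the object is later deleted despite the lock.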


@roman-khimov commented on GitHub (Jul 29, 2025):

Old indexes are to be removed eventually, so we're in a transition phase.


@cthulhu-rider commented on GitHub (Jul 29, 2025):

> IIUC, looking at the code, if T goes before L, L does not protect O from garbaging during resync

Seems so:

```
RESYNC OBJECT REGULAR 4CbFhT87ZEFKqdL1ixvjgUV31nCkmS5RYAD8pmGd2vd6
info	log/log.go:12	local object storage operation	{"shard_id": "UxuJt3X68gigFWa2xAY5BV", "address": "31iz6oYmnyyKvi9DNheozuf5Q1oKmMJcwKkpGDGtw6QZ/4CbFhT87ZEFKqdL1ixvjgUV31nCkmS5RYAD8pmGd2vd6", "op": "metabase PUT"}
RESYNC OBJECT TOMBSTONE 4wc6PgePAGht74KyhQ6vxdnEWv2RnbPdnSpmH5AzJ4sK
OBJECT IS NOT LOCKED 4CbFhT87ZEFKqdL1ixvjgUV31nCkmS5RYAD8pmGd2vd6
info	log/log.go:12	local object storage operation	{"shard_id": "UxuJt3X68gigFWa2xAY5BV", "address": "31iz6oYmnyyKvi9DNheozuf5Q1oKmMJcwKkpGDGtw6QZ/4wc6PgePAGht74KyhQ6vxdnEWv2RnbPdnSpmH5AzJ4sK", "op": "metabase PUT"}
RESYNC OBJECT LOCK 7qGZGGBkQa9yKjW6xgiWAXMWenAMgYw9haeUiroUXEV
info	log/log.go:12	local object storage operation	{"shard_id": "UxuJt3X68gigFWa2xAY5BV", "address": "31iz6oYmnyyKvi9DNheozuf5Q1oKmMJcwKkpGDGtw6QZ/7qGZGGBkQa9yKjW6xgiWAXMWenAMgYw9haeUiroUXEV", "op": "metabase PUT"}
...
info	log/log.go:12	local object storage operation	{"shard_id": "UxuJt3X68gigFWa2xAY5BV", "address": "31iz6oYmnyyKvi9DNheozuf5Q1oKmMJcwKkpGDGtw6QZ/4CbFhT87ZEFKqdL1ixvjgUV31nCkmS5RYAD8pmGd2vd6", "op": "metabase DELETE"}
info	log/log.go:12	local object storage operation	{"shard_id": "UxuJt3X68gigFWa2xAY5BV", "address": "31iz6oYmnyyKvi9DNheozuf5Q1oKmMJcwKkpGDGtw6QZ/4CbFhT87ZEFKqdL1ixvjgUV31nCkmS5RYAD8pmGd2vd6", "op": "DELETE"}
```
@roman-khimov commented on GitHub (Aug 5, 2025):

The same test fails with https://rest.fs.neo.org/HXSaMJXk2g8C14ht8HSi7BBaiYZ1HeWh2xnWPGQCg4H6/3709-1754406330/index.html#suites/295ed2e000f8fa6f3ade59cc13b98615/f2b70d4bbc29088e/ now.

@cthulhu-rider commented on GitHub (Aug 11, 2025):

SN2

```
2025-08-05T14:14:49.409Z	error	policer/check.go:276	receive object header to check policy compliance	{"component": "Object Policer", "object": "EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx/3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9", "error": "(*headsvc.RemoteHeader) could not head object in [/dns4/localhost/tcp/49278]: read object header from NeoFS: all endpoints failed, first error: /dns4/localhost/tcp/49278: rpc failure: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:49278: connect: connection refused\""}
2025-08-05T14:14:49.417Z	debug	policer/check.go:291	shortage of object copies detected	{"component": "Object Policer", "object": "EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx/3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9", "shortage": 1}
2025-08-05T14:14:49.449Z	debug	replicator/process.go:91	object successfully replicated	{"component": "Object Replicator", "node": "03de2effcd84a7a2f54a66aeceb04a8a9af71afdb406c9c4c1e1ad5e038b6bfd18", "object": "EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx/3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9"}
```

SN1

```
2025-08-05T14:14:49.422Z	info	log/log.go:12	local object storage operation	{"shard_id": "QFGp9CEc4sQEx154XnYecA", "address": "EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx/3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9", "op": "PUT"}
2025-08-05T14:14:49.449Z	info	log/log.go:12	local object storage operation	{"shard_id": "QFGp9CEc4sQEx154XnYecA", "address": "EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx/3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9", "op": "metabase PUT"}
2025-08-05T14:15:29.834Z	info	log/log.go:12	local object storage operation	{"shard_id": "Lxp5vocMbi9SLcr9uXmPUZ", "address": "EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx/3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9", "op": "metabase PUT"}
2025-08-05T14:15:45.516Z	info	policer/check.go:169	local replica of the object is redundant in the container, removing...	{"component": "Object Policer", "object": "EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx/3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9"}
```

Test queries:

```
COMMAND: ./neofs-cli --config /Users/runner/work/neofs-node/neofs-node/neofs-testcases/test-run-2025-08-05-13-32-38-967696/env_files/neofs-env-2025-08-05-13-32-39-8657342336/sn_1_jaurfqwotg/sn_1_cli_config_bnxpogzdbh.yml object head --rpc-endpoint 'localhost:49270' --wallet '/Users/runner/work/neofs-node/neofs-node/neofs-testcases/test-run-2025-08-05-13-32-38-967696/env_files/neofs-env-2025-08-05-13-32-39-8657342336/sn_1_jaurfqwotg/sn_1_wallet_wxdmqxjohj' --cid 'EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx' --oid '3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9' --json --ttl 1
RETCODE: 0
Start / End / Elapsed	 14:15:19.506634 / 14:15:19.769501 / 0:00:00.262867

COMMAND: ./neofs-cli --config /Users/runner/work/neofs-node/neofs-node/neofs-testcases/test-run-2025-08-05-13-32-38-967696/env_files/neofs-env-2025-08-05-13-32-39-8657342336/sn_1_jaurfqwotg/sn_1_cli_config_bnxpogzdbh.yml object head --rpc-endpoint 'localhost:49270' --wallet '/Users/runner/work/neofs-node/neofs-node/neofs-testcases/test-run-2025-08-05-13-32-38-967696/env_files/neofs-env-2025-08-05-13-32-39-8657342336/sn_1_jaurfqwotg/sn_1_wallet_wxdmqxjohj' --cid 'EQzSdEmkX6gPwd24Pzzud7hzmf4tJNSzp8Pc1fYENydx' --oid '3KQ2NfgHNhdQjYKv5ptwUEWv6WAhJsUY8ajEsUSuWXm9' --json --ttl 1
RETCODE: 1

STDOUT:
rpc error: read object header via client: status: code = 2049 message = object not found

Start / End / Elapsed	 14:15:55.966558 / 14:15:56.216026 / 0:00:00.249468
```

According to the timings, the test first encountered the +1 replica state (*), and then the restored amount.

(*) `transport: Error while dialing: dial tcp 127.0.0.1:49278: connect: connection refused`: apparently SN3 was shutting down (restarting?) at that moment.

  1. If an SN-SN connection error is not allowed during the runtime of this test, then it needs to be fixed, thereby preventing a replicator trigger leading to the assertion failure.
  2. Otherwise, a state diff is possible, which is what happened. For stabilization, I propose the following scheme:
    1. exec `neofs-cli object nodes` and get the `N` list
    2. do the `Get Nodes With Object` stage. Require the first `REP` (2 now) of `N` to respond with `200`. Require the rest to respond with either `200` or `404`
    3. restart/resync/etc.
    4. repeat `Get Nodes With Object` with the same asserts

I'd stick to 2, but let @evgeniiz321 review 1 first.
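The assertion in step 2 of the proposed scheme can be sketched as a small helper (illustrative, not part of neofs-testcases): given the per-node HEAD status codes in `neofs-cli object nodes` order, the first `REP` nodes must answer 200 and the remaining nodes may answer either 200 or 404, so a temporary extra replica no longer fails the test.

```go
package main

import "fmt"

// checkReplicaStates validates HEAD results against the relaxed
// expectation: statuses[0:rep] must all be 200 (required replicas),
// statuses[rep:] may be 200 (extra copy tolerated) or 404 (no copy).
func checkReplicaStates(statuses []int, rep int) error {
	for i, st := range statuses {
		switch {
		case i < rep && st != 200:
			return fmt.Errorf("node %d: want 200, got %d", i, st)
		case i >= rep && st != 200 && st != 404:
			return fmt.Errorf("node %d: want 200 or 404, got %d", i, st)
		}
	}
	return nil
}

func main() {
	// Extra replica on a non-required node is tolerated.
	fmt.Println(checkReplicaStates([]int{200, 200, 200, 404}, 2) == nil) // true
	// A missing required replica still fails the check.
	fmt.Println(checkReplicaStates([]int{200, 404, 200, 404}, 2) == nil) // false
}
```

Running the same check before and after restart/resync (steps 2 and 4) makes the test insensitive to the replicator briefly creating a +1 copy.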


@roman-khimov commented on GitHub (Aug 11, 2025):

I think the original definition of the problem is no longer relevant. The issue of excessive copies is a test problem to me, because there is a restart in this test and other nodes can and should react to it. Either we make it so that the object copy is unique, or we deal with (sometimes) excessive copies.

So let's reopen this in testcases.
