Shard doesn't switch to read-only mode when shard_ro_error_threshold reached ("error": "could not save the object in any blobovnicza") #873

Closed
opened 2025-12-28 17:20:59 +00:00 by sami · 4 comments

Originally created by @anikeev-yadro on GitHub (Nov 9, 2022).

Originally assigned to: @fyrchik on GitHub.

Related to https://github.com/nspcc-dev/neofs-node/issues/1857

Expected Behavior

Shard should switch to read-only mode when shard_ro_error_threshold is reached.

Current Behavior

Shard doesn't switch to read-only mode when shard_ro_error_threshold is reached.

Steps to Reproduce (for bugs)

1. Inject WRITE disk errors on the shard:

Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.705Z        warn        engine/engine.go:51        can't flush an object to blobstor        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "error count": 7, "error": "could not save the object in any blobovnicza"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.705Z        debug        writecache/flush.go:122        tried to flush items from write-cache        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "count": 2, "start": ""}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.706Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "2/0", "error": "write /data/neofs/data0/blobovnicza/2/0: broken pipe"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.706Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.706Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.706Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.706Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.706Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.706Z        warn        engine/engine.go:51        can't flush an object to blobstor        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "error count": 8, "error": "could not save the object in any blobovnicza"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.707Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.707Z        warn        engine/engine.go:51        can't flush an object to blobstor        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "error count": 9, "error": "could not save the object in any blobovnicza"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.707Z        debug        writecache/flush.go:122        tried to flush items from write-cache        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "count": 2, "start": ""}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.707Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.707Z        warn        engine/engine.go:51        can't flush an object to blobstor        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "error count": 10, "error": "could not save the object in any blobovnicza"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.708Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.708Z        warn        engine/engine.go:51        can't flush an object to blobstor        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "error count": 11, "error": "could not save the object in any blobovnicza"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.708Z        debug        writecache/flush.go:122        tried to flush items from write-cache        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "count": 2, "start": ""}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.708Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.708Z        warn        engine/engine.go:51        can't flush an object to blobstor        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "error count": 12, "error": "could not save the object in any blobovnicza"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.709Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.709Z        warn        engine/engine.go:51        can't flush an object to blobstor        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "error count": 13, "error": "could not save the object in any blobovnicza"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.709Z        debug        writecache/flush.go:122        tried to flush items from write-cache        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "count": 2, "start": ""}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.709Z        debug        blobovniczatree/put.go:64        could not put object to active blobovnicza        {"path": "1/0", "error": "write /data/neofs/data0/blobovnicza/1/0: resource temporarily unavailable"}
Nov 09 13:39:14 az neofs-node[2899]: 2022-11-09T13:39:14.709Z        warn        engine/engine.go:51        can't flush an object to blobstor        {"shard_id": "KGjwru9PsvofvDeCWHYZKL", "error count": 14, "error": "could not save the object in any blobovnicza"}

2. shard_ro_error_threshold is reached, but the shard doesn't switch to read-only mode as configured:

root@az:/etc/neofs/storage# grep shard_ro_error_threshold config.yml
    shard_ro_error_threshold: 10
root@az:/# sudo neofs-cli --endpoint 127.0.0.1:8091 -w /etc/neofs/storage/wallet.json control shards list
Enter password >
Shard KGjwru9PsvofvDeCWHYZKL:
Mode: read-write
Metabase: /srv/neofs/meta0/metabase0.db
Blobstor:
        Path 0: /data/neofs/data0/blobovnicza
        Type 0: blobovnicza
        Path 1: /data/neofs/data0
        Type 1: fstree
Write-cache: /srv/neofs/meta0/write_cache0
Pilorama: /srv/neofs/meta0/pilorama0.db
Error count: 27204
Shard B55apDQRxqLW5XAKyvDEwM:
Mode: read-write
Metabase: /srv/neofs/meta0/metabase1.db
Blobstor:
        Path 0: /data1/neofs/data1/blobovnicza
        Type 0: blobovnicza
        Path 1: /data1/neofs/data1
        Type 1: fstree
Write-cache: /srv/neofs/meta0/write_cache1
Pilorama: /srv/neofs/meta0/pilorama1.db
Error count: 0

Config: config.zip
Config of unreliablefs:

neofs-storage@az:/data$ cat unreliablefs.conf
[errinj_errno]
op_regexp = WRITE
path_regexp = .data0*
probability = 90

Versions:

NeoFS Storage node
Version: v0.34.0-19-gd2cce629
GoVersion: go1.18.4

Your Environment
Server setup and configuration:
cloud, 4 VMs, 4 SN, 4 http gw, 4 s3 gw

Operating System and version (uname -a):
linux vedi 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux

sami 2025-12-28 17:20:59 +00:00

@fyrchik commented on GitHub (Nov 9, 2022):

Related #2013. This is currently expected behaviour. Let's think about whether we can do better.
To be clear: errors from background operations accumulate without switching the mode, while errors from user operations still trigger the mode switch.
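A minimal Go sketch of the behaviour described in this comment, assuming a single per-shard counter and a switch once the counter reaches the threshold (the `>=` comparison is an assumption). The `shard` type and its methods are hypothetical and do not mirror neofs-node's code; they only illustrate why the reported error count climbs past 10 while the mode stays read-write.

```go
package main

import (
	"fmt"
	"sync"
)

// shard is a hypothetical, minimal model; names are illustrative only.
type shard struct {
	mu         sync.Mutex
	readOnly   bool
	errorCount uint32
	threshold  uint32 // models shard_ro_error_threshold; 0 disables switching
}

// reportUserError models an error from a user-initiated operation:
// the counter grows and the shard turns read-only once the threshold is reached.
func (s *shard) reportUserError() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.errorCount++
	if s.threshold > 0 && s.errorCount >= s.threshold {
		s.readOnly = true
	}
}

// reportBackgroundError models an error from a background job such as a
// write-cache flush: the counter grows, but the mode is left unchanged.
func (s *shard) reportBackgroundError() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.errorCount++
}

func main() {
	s := &shard{threshold: 10}
	for i := 0; i < 14; i++ {
		s.reportBackgroundError() // e.g. "can't flush an object to blobstor"
	}
	fmt.Println(s.errorCount, s.readOnly) // 14 false

	s.reportUserError()
	fmt.Println(s.errorCount, s.readOnly) // 15 true
}
```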


@anikeev-yadro commented on GitHub (Nov 9, 2022):

Logs: bug_err_count2.tar.gz.zip


@anikeev-yadro commented on GitHub (Nov 9, 2022):

> Related #2013. This is currently an expected behaviour. Let's think if we can do better. To be clear, background errors accumulate without switching mode. All user operations still trigger mode switch.

For example, we could use two separate counters: one for internal (background) operation errors and one for user operation errors. The mode would switch only when the user operation error counter reaches the threshold.
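The two-counter idea can be sketched the same way. This is a hypothetical illustration, not necessarily what was implemented: background and user-operation errors are tracked separately, only the user counter is compared with the threshold, and `total` shows what a combined "Error count" field could report. The type is not synchronized, so it assumes a single goroutine.

```go
package main

import "fmt"

// splitCounters is a hypothetical sketch of the two-counter proposal.
type splitCounters struct {
	backgroundErrors uint32
	userErrors       uint32
	threshold        uint32 // models shard_ro_error_threshold; 0 disables switching
	readOnly         bool
}

// onBackgroundError counts a background failure without affecting the mode.
func (c *splitCounters) onBackgroundError() { c.backgroundErrors++ }

// onUserError counts a user-operation failure and switches to read-only
// once the user counter alone reaches the threshold.
func (c *splitCounters) onUserError() {
	c.userErrors++
	if c.threshold > 0 && c.userErrors >= c.threshold {
		c.readOnly = true
	}
}

// total is the sum a combined error counter could expose.
func (c *splitCounters) total() uint32 { return c.backgroundErrors + c.userErrors }

func main() {
	c := &splitCounters{threshold: 10}
	for i := 0; i < 100; i++ {
		c.onBackgroundError()
	}
	for i := 0; i < 9; i++ {
		c.onUserError()
	}
	fmt.Println(c.total(), c.readOnly) // 109 false
	c.onUserError()
	fmt.Println(c.total(), c.readOnly) // 110 true
}
```

Keeping the counters apart lets operators still see background failures (for example, failed write-cache flushes) without those failures alone forcing the shard out of read-write mode.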


@fyrchik commented on GitHub (Nov 12, 2022):

Closed via #2032

Reference
nspcc-dev/neofs-node#873