Couldn't get object after node rebooted (gRPC, complex object) #966

Closed
opened 2025-12-28 17:21:19 +00:00 by sami · 5 comments

Originally created by @anikeev-yadro on GitHub (Jan 30, 2023).

Originally assigned to: @carpawell on GitHub.

## Steps to Reproduce (for bugs)

1. Preset objects and start k6 gRPC load:

```
date;./scenarios/preset/preset_grpc.py --size 150000 --containers 40 --out /var/data/tmp/grpc_300123-150000.json --endpoint 10.78.70.118:8080 --preload_obj 20 --wallet scenarios/files/wallet.json --config scenarios/files/empty_pass.yaml

date;./k6 run -e DURATION=600 -e WRITERS=10 -e READERS=10 -e DELETERS=0 -e DELETE_AGE=10 -e REGISTRY_FILE=/var/data/tmp/grpc_reg_300123-150000-1.json -e WRITE_OBJ_SIZE=150000 -e GRPC_ENDPOINTS=10.78.70.118:8080,10.78.70.119:8080,10.78.70.121:8080 -e PREGEN_JSON=/var/data/tmp/grpc_300123-150000.json scenarios/grpc.js
```
2. Rebooted node vedi (10.78.70.120):

```
anikeev@NB-1670:~$ date;ipmitool -I lanplus -H 10.78.68.120 -U admin -P admin power reset
Mon Jan 30 14:47:45 MSK 2023
Chassis Power Control: Reset
```
3. Load finished without errors:

```
root@loader01:/home/service# date;./k6 run -e DURATION=600 -e WRITERS=10 -e READERS=10 -e DELETERS=0 -e DELETE_AGE=10 -e REGISTRY_FILE=/var/data/tmp/grpc_reg_300123-150000-1.json -e WRITE_OBJ_SIZE=150000 -e GRPC_ENDPOINTS=10.78.70.118:8080,10.78.70.119:8080,10.78.70.121:8080 -e PREGEN_JSON=/var/data/tmp/grpc_300123-150000.json -e STREAM_TIMEOUT=30 scenarios/grpc.js
Mon Jan 30 11:46:43 UTC 2023

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: scenarios/grpc.js
     output: -

  scenarios: (100.00%) 2 scenarios, 20 max VUs, 10m5s max duration (incl. graceful stop):
           * read: 10 looping VUs for 10m0s (exec: obj_read, gracefulStop: 5s)
           * write: 10 looping VUs for 10m0s (exec: obj_write, gracefulStop: 5s)

INFO[0001] Pregenerated containers:       40             source=console
INFO[0001] Pregenerated read object size: 150000 Kb      source=console
INFO[0001] Pregenerated total objects:    800            source=console
INFO[0001] Reading VUs:                   10             source=console
INFO[0001] Writing VUs:                   10             source=console
INFO[0001] Deleting VUs:                  0              source=console
INFO[0001] Total VUs:                     20             source=console
INFO[0606] {"cid":"5SFvCr97SeKPpnhskK8XzHMHr6ypTVBMoo1un51YUqFT","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}  source=console

running (10m05.0s), 00/20 VUs, 2665 complete and 4 interrupted iterations
read  ✓ [======================================] 10 VUs  10m0s
write ✓ [======================================] 10 VUs  10m0s

     █ setup

     █ teardown

     data_received............: 324 GB 536 MB/s
     data_sent................: 85 GB  140 MB/s
     iteration_duration.......: avg=4.5s   min=183.65µs med=2.87s max=35.15s p(90)=10.36s p(95)=12.09s
     iterations...............: 2665   4.404614/s
     neofs_obj_get_duration...: avg=2.84s  min=856.32ms med=2.57s max=21.38s p(90)=4.05s  p(95)=4.7s
     neofs_obj_get_total......: 2112   3.490636/s
     neofs_obj_put_duration...: avg=10.31s min=6.58s    med=9.45s max=34.58s p(90)=13.76s p(95)=16.11s
     neofs_obj_put_total......: 557    0.920589/s
     vus......................: 5      min=5      max=20
     vus_max..................: 20     min=20     max=20
```
4. Waited for glagoli to boot up.
5. Started k6 verify.
6. Verify got 121 errors: `context deadline exceeded` and `object not found`:

```
date;./k6 run -e TIME_LIMIT=1200 -e CLIENTS=150 -e GRPC_ENDPOINTS=10.78.70.118:8080,10.78.70.119:8080,10.78.70.120:8080,10.78.70.121:8080 -e REGISTRY_FILE=/var/data/tmp/grpc_reg_15000123-9000-1.json scenarios/verify.js

Mon Jan 30 12:18:54 UTC 2023

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: scenarios/verify.js
     output: -

  scenarios: (100.00%) 1 scenario, 150 max VUs, 20m5s max duration (incl. graceful stop):
           * verify: 553 iterations shared among 150 VUs (maxDuration: 20m0s, exec: obj_verify, gracefulStop: 5s)

INFO[0010] Verify error on 37: context deadline exceeded. Object will be re-tried (bucket: DxDp6Q7ih8fz2fzgwWrFU8WM4pghoCxdbkh
INFO[0010] Verify error on 91: context deadline exceeded. Object will be re-tried (bucket: 8y77dHB
.
.
.
INFO[0049] Verify error on 228: status: code = 2049 message = object not found. Object will be re-tried (bucket: AHURLs4GXXsbT3t)  source=console

running (00m51.1s), 000/150 VUs, 553 complete and 0 interrupted iterations
verify ✓ [======================================] 150 VUs  00m51.0s/20m0s  553/553 shared iters

     █ setup

     data_received........: 0 B 0 B/s
     data_sent............: 0 B 0 B/s
     invalid_obj..........: 121 2.369485/s
     iteration_duration...: avg=11.13s min=8.59ms med=8.87s max=44.13s p(90)=22.36s p(95)=30.96s
     iterations...........: 553 10.829133/s
     skipped_obj..........: 0   0/s
     verified_obj.........: 432 8.459648/s
     vus..................: 1   min=1       max=150
     vus_max..............: 150 min=150     max=150
```

Here is an example object with errors:

```
Jan 30 12:22:31 vedi neofs-node[8520]: 2023-01-30T12:22:31.984Z        debug        get/container.go:18        trying to execute in container...        {"component": "Object.Get service", "request": "HEAD", "address": "AHURLs4GXXsbTDybDMovvKxKy9ehSUs3VHrPRj66fFfo/2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t", "raw": false, "local": false, "with session": false, "with bearer": false, "netmap lookup depth": 0}
Jan 30 12:22:31 vedi neofs-node[8520]: 2023-01-30T12:22:31.984Z        debug        get/container.go:46        process epoch        {"component": "Object.Get service", "request": "HEAD", "address": "AHURLs4GXXsbTDybDMovvKxKy9ehSUs3VHrPRj66fFfo/2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t", "raw": false, "local": false, "with session": false, "with bearer": false, "number": 581}
Jan 30 12:22:31 vedi neofs-node[8520]: 2023-01-30T12:22:31.984Z        debug        get/remote.go:14        processing node...        {"component": "Object.Get service", "request": "HEAD", "address": "AHURLs4GXXsbTDybDMovvKxKy9ehSUs3VHrPRj66fFfo/2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t", "raw": false, "local": false, "with session": false, "with bearer": false}
Jan 30 12:22:31 vedi neofs-node[8520]: 2023-01-30T12:22:31.990Z        debug        get/remote.go:34        remote call failed        {"component": "Object.Get service", "request": "HEAD", "address": "AHURLs4GXXsbTDybDMovvKxKy9ehSUs3VHrPRj66fFfo/2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t", "raw": false, "local": false, "with session": false, "with bearer": false, "error": "unexpected header type <nil>"}
Jan 30 12:22:31 vedi neofs-node[8520]: 2023-01-30T12:22:31.990Z        debug        get/remote.go:14        processing node...        {"component": "Object.Get service", "request": "HEAD", "address": "AHURLs4GXXsbTDybDMovvKxKy9ehSUs3VHrPRj66fFfo/2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t", "raw": false, "local": false, "with session": false, "with bearer": false}
Jan 30 12:22:31 vedi neofs-node[8520]: 2023-01-30T12:22:31.994Z        debug        get/container.go:87        completing the operation        {"component": "Object.Get service", "request": "HEAD", "address": "AHURLs4GXXsbTDybDMovvKxKy9ehSUs3VHrPRj66fFfo/2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t", "raw": false, "local": false, "with session": false, "with bearer": false}
```

## Logs

## Versions

```
NeoFS Storage node
Version: v0.35.0-21-geda6fba3-dirty
GoVersion: go1.18.4
```

## Your Environment

Server setup and configuration: HW, 4 servers, 4 SN, 4 HTTP gw, 4 S3 gw

Operating System and version (`uname -a`):

```
Linux vedi 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux
```

sami closed this issue and added the bug, U0 labels (2025-12-28).

@roman-khimov commented on GitHub (Jan 30, 2023):

Is there anything else interesting for this object in logs? Maybe replication errors?


@anikeev-yadro commented on GitHub (Jan 30, 2023):

Where can I upload 211 MB of logs?
GitHub allows only 25 MB per file.


@roman-khimov commented on GitHub (Jan 30, 2023):

Is that compressed? You can filter it for this object specifically (2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t IIUC).
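The suggested filtering can be done with a plain `grep` on the object ID before compressing and attaching. A minimal sketch; the sample log file below is fabricated for demonstration (on a real node the input would be the node's journal, not a hand-made file):

```shell
#!/bin/sh
# Trim a large node log to just the lines mentioning the problem object,
# then compress the result so it fits under GitHub's 25 MB attachment limit.
OID=2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t

# On a live node this would read the real log, e.g.:
#   grep "$OID" /var/log/neofs-node.log > "vedi_${OID}.txt"
# Here we build a tiny sample log so the filtering step is runnable:
cat > sample.log <<EOF
Jan 30 12:22:31 vedi neofs-node[8520]: remote call failed address=$OID error="unexpected header type <nil>"
Jan 30 12:22:32 vedi neofs-node[8520]: unrelated line about another object
EOF

grep "$OID" sample.log > "vedi_${OID}.txt"   # keep only matching lines
gzip -kf "vedi_${OID}.txt"                   # -k keeps the .txt alongside the .gz
```

This produces one `vedi_<OID>.txt.gz` per node, which is the shape of the attachments posted below.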

@anikeev-yadro commented on GitHub (Jan 30, 2023):

[az_2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t.txt](https://github.com/nspcc-dev/neofs-node/files/10536719/az_2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t.txt)
[buki_2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t.txt](https://github.com/nspcc-dev/neofs-node/files/10536722/buki_2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t.txt)
[glagoli_2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t.txt](https://github.com/nspcc-dev/neofs-node/files/10536723/glagoli_2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t.txt)
[vedi_2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t.txt](https://github.com/nspcc-dev/neofs-node/files/10536724/vedi_2oCx8iK2Zpp9mxDRn4V7JGeLsVXvRw8KNqwEaq5Uc13t.txt)

@anikeev-yadro commented on GitHub (Feb 8, 2023):

I suggest closing this, because the root cause is connected to different epochs on the nodes:

```
service@glagoli:~$ neofs-cli netmap epoch -r 10.78.70.119:8080 -g
799
service@glagoli:~$ neofs-cli netmap epoch -r 10.78.70.118:8080 -g
808
service@glagoli:~$ neofs-cli netmap epoch -r 10.78.70.120:8080 -g
808
service@glagoli:~$ neofs-cli netmap epoch -r 10.78.70.121:8080 -g
792
```

It was fixed in https://github.com/nspcc-dev/neofs-node/issues/2212.
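The divergence above can be checked mechanically. A minimal sketch: on a live cluster each value would come from `neofs-cli netmap epoch -r <endpoint> -g` as shown in the comment; here the CLI is stubbed with the epochs actually observed, so only the comparison logic is demonstrated:

```shell
#!/bin/sh
# Compare the netmap epoch reported by each storage node and report the spread.
# get_epoch is a stub; on a live cluster replace its body with:
#   neofs-cli netmap epoch -r "$1" -g
get_epoch() {
  case "$1" in
    10.78.70.118:8080) echo 808 ;;
    10.78.70.119:8080) echo 799 ;;
    10.78.70.120:8080) echo 808 ;;
    10.78.70.121:8080) echo 792 ;;
  esac
}

min=; max=
for ep in 10.78.70.118:8080 10.78.70.119:8080 10.78.70.120:8080 10.78.70.121:8080; do
  e=$(get_epoch "$ep")
  echo "$ep epoch=$e"
  [ -z "$min" ] || [ "$e" -lt "$min" ] && min=$e
  [ -z "$max" ] || [ "$e" -gt "$max" ] && max=$e
done

echo "epoch spread: $((max - min))"
if [ "$min" -ne "$max" ]; then
  echo "WARNING: nodes disagree on epoch; placement decisions may differ per node"
fi
```

With the values from this issue the spread is 16 epochs, which is enough for nodes to resolve the same object address to different placements.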
