Couldn't get object after node rebooted (s3 load) #959

Closed
opened 2025-12-28 17:21:18 +00:00 by sami · 4 comments
Owner

Originally created by @anikeev-yadro on GitHub (Jan 26, 2023).

Originally assigned to: @fyrchik on GitHub.

Steps to Reproduce (for bugs)

  1. Started k6 s3 load
./k6 run -e DURATION=600 -e WRITERS=100 -e READERS=0 -e DELETERS=0 -e DELETE_AGE=10 -e REGISTRY_FILE=/var/data/tmp/s3_reg_260123-1000-1.json -e WRITE_OBJ_SIZE=1000 -e S3_ENDPOINTS=10.78.70.118:8084,10.78.70.119:8084,10.78.70.120:8084 -e PREGEN_JSON=/var/data/tmp/s3_260123-1.json scenarios/s3.js
  1. Rebooted node glagoli (10.78.70.121)
  2. Load has been finished without errors
./k6 run -e DURATION=600 -e WRITERS=100 -e READERS=0 -e DELETERS=0 -e DELETE_AGE=10 -e REGISTRY_FILE=/var/data/tmp/s3_reg_260123-1000-1.json -e WRITE_OBJ_SIZE=1000 -e S3_ENDPOINTS=10.78.70.118:8084,10.78.70.119:8084,10.78.70.120:8084 -e PREGEN_JSON=/var/data/tmp/s3_260123-1.json scenarios/s3.js

      /\      |‾‾| /‾‾/   /‾‾/
 /\  /  \     |  |/  /   /  /
/  \/    \    |     (   /   ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| __\ _____/ .io

execution: local
script: scenarios/s3.js
output: -

scenarios: (100.00%) 1 scenario, 100 max VUs, 10m5s max duration (incl. graceful stop):
* write: 100 looping VUs for 10m0s (exec: obj_write, gracefulStop: 5s)

INFO[0001] Pregenerated buckets: 40 source=console
INFO[0001] Pregenerated read object size: 64 Kb source=console
INFO[0001] Pregenerated total objects: 800 source=console
INFO[0001] Reading VUs: 0 source=console
INFO[0001] Writing VUs: 100 source=console
INFO[0001] Deleting VUs: 0 source=console
INFO[0001] Total VUs: 100 source=console

running (10m00.1s), 000/100 VUs, 184426 complete and 0 interrupted iterations
write ✓ [======================================] 100 VUs 10m0s

 █ setup

 █ teardown

 aws_obj_put_duration...: avg=318.95ms min=57.72ms  med=133.82ms max=13.15s p(90)=354.68ms p(95)=758.15ms
 aws_obj_put_total......: 184426 307.311864/s
 data_received..........: 0 B    0 B/s
 data_sent..............: 189 GB 314 MB/s
 iteration_duration.....: avg=325.33ms min=314.83µs med=140.39ms max=13.16s p(90)=361.44ms p(95)=764.55ms
 iterations.............: 184426 307.311864/s
 vus....................: 100    min=100      max=100
 vus_max................: 100    min=100      max=100
  1. Waited for glagoli booted up.
  2. Start k6 verify
  3. Verify got 159 errors "object not found"
root@loader01:/home/service# ./k6 run -e TIME_LIMIT=1200 -e CLIENTS=150 -e S3_ENDPOINTS=10.78.70.118:8084,10.78.70.119:8084,10.78.70.120:8084,10.78.70.121:8084 -e REGISTRY_FILE=/var/data/tmp/s3_reg_260123-1000-1.json scenarios/verify.js

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: scenarios/verify.js
     output: -

  scenarios: (100.00%) 1 scenario, 150 max VUs, 20m5s max duration (incl. graceful stop):
           * verify: 184426 iterations shared among 150 VUs (maxDuration: 20m0s, exec: obj_verify, gracefulStop: 5s)

INFO[0052] Verify error on 58982: operation error S3: GetObject, https response error StatusCode: 404, RequestID: e1b91169-64d3-4480-9573-27c7b3427f52, HostID: e292b8be-6506-4972-966e-ad76b7252759, NoSuchKey: . Object will be re-tried  source=console
.
.
.
.
INFO[0172] Verify error on 109949: operation error S3: GetObject, https response error StatusCode: 404, RequestID: d9e7743e-fe83-4d61-b09a-26255fe93980, HostID: e292b8be-6506-4972-966e-ad76b7252759, NoSuchKey: . Object will be re-tried  source=console

running (02m45.0s), 000/150 VUs, 110365 complete and 150 interrupted iterations
verify ✗ [=====================>----------------] 150 VUs  02m40.0s/20m0s  110365/184426 shared iters

     █ setup

     data_received........: 0 B    0 B/s
     data_sent............: 0 B    0 B/s
     invalid_obj..........: 159    0.963816/s
     iteration_duration...: avg=217.3ms min=7.05ms med=99.52ms max=17.02s p(90)=334.62ms p(95)=547.58ms
     iterations...........: 110365 669.003515/s
     skipped_obj..........: 0      0/s
     verified_obj.........: 110206 668.039699/s
     vus..................: 150    min=0        max=150
     vus_max..............: 150    min=0        max=150

  1. After ~20min objects successfully got

Example object with errors:

Jan 26 12:58:11 vedi neofs-s3-gw[8995]: 2023-01-26T12:58:11.203Z        info        api/router.go:162        call method        {"status": 200, "host": "10.78.70.120:8084", "request_id": "af7eee58-d2e7-42d2-8a99-8f3dd8484d4a", "method": "PutObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "OK"}

Jan 26 13:09:05 glagoli neofs-s3-gw[9217]: 2023-01-26T13:09:05.432Z        error        handler/util.go:29        call method        {"status": 404, "request_id": "f36f785d-c011-4da4-8c2d-b469317991ef", "method": "GetObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "could not find object", "error": "NoSuchKey: 404 => The specified key does not exist."}
Jan 26 13:09:05 glagoli neofs-s3-gw[9217]: 2023-01-26T13:09:05.437Z        error        handler/util.go:29        call method        {"status": 404, "request_id": "3ba5a646-ff29-423a-9024-a772eab5ce88", "method": "GetObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "could not find object", "error": "NoSuchKey: 404 => The specified key does not exist."}
Jan 26 13:09:05 glagoli neofs-s3-gw[9217]: 2023-01-26T13:09:05.443Z        error        handler/util.go:29        call method        {"status": 404, "request_id": "d9e7743e-fe83-4d61-b09a-26255fe93980", "method": "GetObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "could not find object", "error": "NoSuchKey: 404 => The specified key does not exist."}

Jan 26 13:28:08 glagoli neofs-s3-gw[9217]: 2023-01-26T13:28:08.404Z        info        api/router.go:162        call method        {"status": 200, "host": "10.78.70.121:8084", "request_id": "fa4b75c0-ccd7-4c51-a9cf-e00200d0352d", "method": "HeadObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "OK"}

Jan 26 13:29:58 glagoli neofs-s3-gw[9217]: 2023-01-26T13:29:58.435Z        info        api/router.go:162        call method        {"status": 200, "host": "10.78.70.121:8084", "request_id": "716cfe3c-0eaf-499c-a326-5ee4bd4a2162", "method": "GetObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "OK"}

Logs

glagoli.log.tar.gz
vedi.log.tar.gz

Versions

NeoFS Storage node
Version: v0.35.0-21-geda6fba3-dirty
GoVersion: go1.18.4

Your Environment

Server setup and configuration:
HW, 4 servers, 4 SN, 4 http qw, 4 s3 gw

Operating System and version (uname -a):
linux vedi 5.10.0-16-amd64 https://github.com/nspcc-dev/neofs-node/issues/1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux

Originally created by @anikeev-yadro on GitHub (Jan 26, 2023). Originally assigned to: @fyrchik on GitHub. ## Steps to Reproduce (for bugs) 1. Started k6 s3 load ``` ./k6 run -e DURATION=600 -e WRITERS=100 -e READERS=0 -e DELETERS=0 -e DELETE_AGE=10 -e REGISTRY_FILE=/var/data/tmp/s3_reg_260123-1000-1.json -e WRITE_OBJ_SIZE=1000 -e S3_ENDPOINTS=10.78.70.118:8084,10.78.70.119:8084,10.78.70.120:8084 -e PREGEN_JSON=/var/data/tmp/s3_260123-1.json scenarios/s3.js ``` 2. Rebooted node glagoli (10.78.70.121) 3. Load has been finished without errors ``` ./k6 run -e DURATION=600 -e WRITERS=100 -e READERS=0 -e DELETERS=0 -e DELETE_AGE=10 -e REGISTRY_FILE=/var/data/tmp/s3_reg_260123-1000-1.json -e WRITE_OBJ_SIZE=1000 -e S3_ENDPOINTS=10.78.70.118:8084,10.78.70.119:8084,10.78.70.120:8084 -e PREGEN_JSON=/var/data/tmp/s3_260123-1.json scenarios/s3.js /\ |‾‾| /‾‾/ /‾‾/ /\ / \ | |/ / / / / \/ \ | ( / ‾‾\ / \ | |\ \ | (‾) | / __________ \ |__| __\ _____/ .io execution: local script: scenarios/s3.js output: - scenarios: (100.00%) 1 scenario, 100 max VUs, 10m5s max duration (incl. graceful stop): * write: 100 looping VUs for 10m0s (exec: obj_write, gracefulStop: 5s) INFO[0001] Pregenerated buckets: 40 source=console INFO[0001] Pregenerated read object size: 64 Kb source=console INFO[0001] Pregenerated total objects: 800 source=console INFO[0001] Reading VUs: 0 source=console INFO[0001] Writing VUs: 100 source=console INFO[0001] Deleting VUs: 0 source=console INFO[0001] Total VUs: 100 source=console running (10m00.1s), 000/100 VUs, 184426 complete and 0 interrupted iterations write ✓ [======================================] 100 VUs 10m0s █ setup █ teardown aws_obj_put_duration...: avg=318.95ms min=57.72ms med=133.82ms max=13.15s p(90)=354.68ms p(95)=758.15ms aws_obj_put_total......: 184426 307.311864/s data_received..........: 0 B 0 B/s data_sent..............: 189 GB 314 MB/s iteration_duration.....: avg=325.33ms min=314.83µs med=140.39ms max=13.16s p(90)=361.44ms p(95)=764.55ms iterations.............: 184426 307.311864/s vus....................: 100 min=100 max=100 vus_max................: 100 min=100 max=100 ``` 4. Waited for glagoli booted up. 5. Start k6 verify 6. Verify got 159 errors "object not found" ``` root@loader01:/home/service# ./k6 run -e TIME_LIMIT=1200 -e CLIENTS=150 -e S3_ENDPOINTS=10.78.70.118:8084,10.78.70.119:8084,10.78.70.120:8084,10.78.70.121:8084 -e REGISTRY_FILE=/var/data/tmp/s3_reg_260123-1000-1.json scenarios/verify.js /\ |‾‾| /‾‾/ /‾‾/ /\ / \ | |/ / / / / \/ \ | ( / ‾‾\ / \ | |\ \ | (‾) | / __________ \ |__| \__\ \_____/ .io execution: local script: scenarios/verify.js output: - scenarios: (100.00%) 1 scenario, 150 max VUs, 20m5s max duration (incl. graceful stop): * verify: 184426 iterations shared among 150 VUs (maxDuration: 20m0s, exec: obj_verify, gracefulStop: 5s) INFO[0052] Verify error on 58982: operation error S3: GetObject, https response error StatusCode: 404, RequestID: e1b91169-64d3-4480-9573-27c7b3427f52, HostID: e292b8be-6506-4972-966e-ad76b7252759, NoSuchKey: . Object will be re-tried source=console . . . . INFO[0172] Verify error on 109949: operation error S3: GetObject, https response error StatusCode: 404, RequestID: d9e7743e-fe83-4d61-b09a-26255fe93980, HostID: e292b8be-6506-4972-966e-ad76b7252759, NoSuchKey: . Object will be re-tried source=console running (02m45.0s), 000/150 VUs, 110365 complete and 150 interrupted iterations verify ✗ [=====================>----------------] 150 VUs 02m40.0s/20m0s 110365/184426 shared iters █ setup data_received........: 0 B 0 B/s data_sent............: 0 B 0 B/s invalid_obj..........: 159 0.963816/s iteration_duration...: avg=217.3ms min=7.05ms med=99.52ms max=17.02s p(90)=334.62ms p(95)=547.58ms iterations...........: 110365 669.003515/s skipped_obj..........: 0 0/s verified_obj.........: 110206 668.039699/s vus..................: 150 min=0 max=150 vus_max..............: 150 min=0 max=150 ``` 7. After ~20min objects successfully got Example object with errors: ``` Jan 26 12:58:11 vedi neofs-s3-gw[8995]: 2023-01-26T12:58:11.203Z info api/router.go:162 call method {"status": 200, "host": "10.78.70.120:8084", "request_id": "af7eee58-d2e7-42d2-8a99-8f3dd8484d4a", "method": "PutObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "OK"} Jan 26 13:09:05 glagoli neofs-s3-gw[9217]: 2023-01-26T13:09:05.432Z error handler/util.go:29 call method {"status": 404, "request_id": "f36f785d-c011-4da4-8c2d-b469317991ef", "method": "GetObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "could not find object", "error": "NoSuchKey: 404 => The specified key does not exist."} Jan 26 13:09:05 glagoli neofs-s3-gw[9217]: 2023-01-26T13:09:05.437Z error handler/util.go:29 call method {"status": 404, "request_id": "3ba5a646-ff29-423a-9024-a772eab5ce88", "method": "GetObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "could not find object", "error": "NoSuchKey: 404 => The specified key does not exist."} Jan 26 13:09:05 glagoli neofs-s3-gw[9217]: 2023-01-26T13:09:05.443Z error handler/util.go:29 call method {"status": 404, "request_id": "d9e7743e-fe83-4d61-b09a-26255fe93980", "method": "GetObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "could not find object", "error": "NoSuchKey: 404 => The specified key does not exist."} Jan 26 13:28:08 glagoli neofs-s3-gw[9217]: 2023-01-26T13:28:08.404Z info api/router.go:162 call method {"status": 200, "host": "10.78.70.121:8084", "request_id": "fa4b75c0-ccd7-4c51-a9cf-e00200d0352d", "method": "HeadObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "OK"} Jan 26 13:29:58 glagoli neofs-s3-gw[9217]: 2023-01-26T13:29:58.435Z info api/router.go:162 call method {"status": 200, "host": "10.78.70.121:8084", "request_id": "716cfe3c-0eaf-499c-a326-5ee4bd4a2162", "method": "GetObject", "bucket": "171b9127-4f9d-457e-b826-684daaa485a4", "object": "557c9631-711e-4030-8a1d-3f90104e7a77", "description": "OK"} ``` ## Logs [glagoli.log.tar.gz](https://github.com/nspcc-dev/neofs-node/files/10510111/glagoli.log.tar.gz) [vedi.log.tar.gz](https://github.com/nspcc-dev/neofs-node/files/10510112/vedi.log.tar.gz) ## Versions ``` NeoFS Storage node Version: v0.35.0-21-geda6fba3-dirty GoVersion: go1.18.4 ``` ## Your Environment Server setup and configuration: HW, 4 servers, 4 SN, 4 http qw, 4 s3 gw Operating System and version (uname -a): linux vedi 5.10.0-16-amd64 https://github.com/nspcc-dev/neofs-node/issues/1 SMP Debian 5.10.127-1 (2022-06-30) x86_64 GNU/Linux
sami 2025-12-28 17:21:18 +00:00
  • closed this issue
  • added the
    bug
    U0
    labels
Author
Owner

@anikeev-yadro commented on GitHub (Jan 27, 2023):

Additional note:
after failover nodes have different epohs:

root@glagoli:~# neofs-cli netmap epoch -r 10.78.70.118:8080 -g
137
root@glagoli:~# neofs-cli netmap epoch -r 10.78.70.119:8080 -g
137
root@glagoli:~# neofs-cli netmap epoch -r 10.78.70.120:8080 -g
137
root@glagoli:~# neofs-cli netmap epoch -r 10.78.70.121:8080 -g
59
@anikeev-yadro commented on GitHub (Jan 27, 2023): Additional note: after failover nodes have different epohs: ``` root@glagoli:~# neofs-cli netmap epoch -r 10.78.70.118:8080 -g 137 root@glagoli:~# neofs-cli netmap epoch -r 10.78.70.119:8080 -g 137 root@glagoli:~# neofs-cli netmap epoch -r 10.78.70.120:8080 -g 137 root@glagoli:~# neofs-cli netmap epoch -r 10.78.70.121:8080 -g 59 ```
Author
Owner

@anikeev-yadro commented on GitHub (Jan 27, 2023):

After I aligned epoch on glagoli use following command:

systemctl restart neofs-ir; systemctl restart neofs-storage; systemctl restart neo-go

objects can be get from glagoli.

@anikeev-yadro commented on GitHub (Jan 27, 2023): After I aligned epoch on glagoli use following command: ``` systemctl restart neofs-ir; systemctl restart neofs-storage; systemctl restart neo-go ``` objects can be get from glagoli.
Author
Owner

@fyrchik commented on GitHub (Jan 27, 2023):

I believe mentioned S3 error is from here nspcc-dev/neofs-s3-gw@50d85dc7ed/api/layer/object.go (L315)

So the error is from the tree service and looks reasonable: tree service is eventually consistent, so we need some time to sync changes which we lost (because of shutdown or other reasons).

@fyrchik commented on GitHub (Jan 27, 2023): I believe mentioned S3 error is from here https://github.com/nspcc-dev/neofs-s3-gw/blob/50d85dc7edabe6a753c346c388bf18bf9134cd90/api/layer/object.go#L315 So the error is from the tree service and looks reasonable: tree service is eventually consistent, so we need some time to sync changes which we lost (because of shutdown or other reasons).
Author
Owner

@anikeev-yadro commented on GitHub (Feb 20, 2023):

works as designed

@anikeev-yadro commented on GitHub (Feb 20, 2023): works as designed
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
nspcc-dev/neofs-node#959
No description provided.