Can't put object after removing storage node from netmap and then restoring it #75

Closed
opened 2025-12-28 17:36:02 +00:00 by sami · 10 comments

Originally created by @vdomnich-yadro on GitHub (Aug 29, 2022).

The issue reproduces on devenv with 4 storage nodes and 1 http gateway.

Steps to Reproduce (for bugs)

NOTE1: In parentheses I provide timestamps from the logs to match the steps with the test run.
NOTE2: I am using specific node names in the steps, but the issue reproduces if other nodes are chosen.
Referenced logs of the steps:
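log1 (https://github.com/nspcc-dev/neofs-http-gw/files/9442533/log1.txt)
log2 (https://github.com/nspcc-dev/neofs-http-gw/files/9442534/log2.txt)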

Steps:

  1. Move storage node s04 to offline state using neofs-cli control set-status (log1 timestamp 06:41:40).
  2. Tick epoch and make sure that the node is not present in the netmap (log1 timestamp 06:41:42-06:41:46)
  3. Stop storage node's Docker container and wait until it stops (log1 timestamp 06:41:47)
  4. Create neofs container with placement policy "3 replicas, 3 storage nodes" (log1 timestamp 06:41:48)
  5. Put object in the container and make sure it was replicated to all 3 alive nodes (log1 timestamp 06:41:50-06:41:52)
  6. Start Docker container of storage node s04 and wait until it goes into running state (log1 timestamp 06:41:53)
  7. Change s04's network status to online, tick epoch and make sure that s04 is present in the netmap (log1 timestamp 06:41:54-06:42:01)
  8. Move storage node s03 to offline state, tick epoch and ensure s03 is not present in the netmap (log1 timestamp 06:42:02-06:42:08)
  9. Wait for the object (from step 5) to replicate to s04 (log1 timestamp 06:42:08-06:42:10).
  10. Return storage node s03 to online state, tick epoch and ensure s03 is present in the netmap (log1 timestamp 06:42:10-06:42:17)
  11. By now all nodes are in the running state, all are ONLINE, READY and should be operating normally. So, we proceed with the other test set:
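  12. Run test suite http_gate (https://github.com/nspcc-dev/neofs-testcases/blob/develop/pytest_tests/testsuites/services/test_http_gate.py)
  13. One or a few tests will fail while putting an object via the http gate (log2 timestamp 06:42:43). Client-side error message: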
Failed to get object via HTTP gate:
  request: /upload/9Ki5V6LGZqzSCiMGdq8y896jqGpfUjpK7jX2PNjpiDnp,
  response: could not store file in neofs: init writing on API client: client failure: missing ID field in the response
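Several of the steps above end with "tick epoch and make sure the node is (not) present in the netmap". A minimal Python sketch of such a check follows. It is not taken from the test suite: `wait_for_netmap_state` and the `get_netmap_nodes` callable are hypothetical names, and how the netmap snapshot is actually obtained (for example via neofs-cli) is left to the caller.

```python
import time
from typing import Callable, Iterable


def wait_for_netmap_state(
    get_netmap_nodes: Callable[[], Iterable[str]],  # hypothetical: returns node names in the current netmap
    node: str,
    expect_present: bool,
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll the netmap until `node` is present (or absent); return True on success."""
    deadline = time.monotonic() + timeout
    while True:
        present = node in set(get_netmap_nodes())
        if present == expect_present:
            return True
        if time.monotonic() >= deadline:
            return False  # state was not reached in time
        time.sleep(interval)


if __name__ == "__main__":
    # Fake snapshots standing in for successive netmap reads after an epoch tick.
    snapshots = iter([["s01", "s02", "s03", "s04"], ["s01", "s02", "s03"]])
    ok = wait_for_netmap_state(lambda: next(snapshots), "s04", expect_present=False, interval=0)
    print("s04 left the netmap:", ok)
```

Polling against a deadline, rather than checking once, keeps the step from depending on how quickly the epoch change propagates.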

Expected Behavior

Object PUT should succeed.

Current Behavior

Some random tests (1-3 tests per run) are failing with Go SDK error "missing ID field in the response"

However, if we do not take storage nodes down and then restore them (steps 1-10), the http gateway tests pass.

In any case client-side logging of neofs-go-sdk is not sufficient to figure out more info about what is going on, so I would like us to extend the logging there so that we can at least see what the response from the storage node was.
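As an illustration of the kind of logging requested here, the sketch below logs the whole response before raising the "missing ID" error, so that the status returned by the storage node is not lost. It is written in Python for brevity and is not neofs-go-sdk code: the response shape (a mapping with `object_id` and `status` keys), `MissingIDError` and `extract_object_id` are all hypothetical.

```python
import logging
from typing import Any, Mapping

log = logging.getLogger("put_debug")


class MissingIDError(Exception):
    """Raised when a PUT response carries no object ID (hypothetical error type)."""


def extract_object_id(response: Mapping[str, Any]) -> str:
    """Return the object ID, logging the full response if the ID is absent."""
    object_id = response.get("object_id")
    if not object_id:
        status = response.get("status")
        # Log everything the node sent back, not only the fact that the ID is missing.
        log.error("PUT response has no object ID; status=%r, response=%r", status, dict(response))
        raise MissingIDError(f"missing ID field in the response (status: {status!r})")
    return object_id


if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)
    try:
        extract_object_id({"status": "session token not found"})
    except MissingIDError as err:
        print(err)
```

Carrying the status into the error text means the underlying cause reaches the client-side message even when debug logging is off.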

Regression

Might be a regression after refactoring of object PUT in neofs-go-sdk, but not sure.

Your Environment

  • Version used:
neo-go 0.99.2-2-g536303ef
neofs-cli v0.31.0-51-gc7c1c257
neofs-authmate v0.23.0-49-g163038b
neofs-node build from commit c7c1c257e1b0da0eafff9943f3297581817778f1
  • Server setup and configuration: devenv with 4 storage nodes and 1 http gateway
sami closed this issue and added the bug and U3 labels 2025-12-28 17:36:02 +00:00

@vdomnich-yadro commented on GitHub (Aug 29, 2022):

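Steps logs:
log1.txt (https://github.com/nspcc-dev/neofs-http-gw/files/9442533/log1.txt)
log2.txt (https://github.com/nspcc-dev/neofs-http-gw/files/9442534/log2.txt)

HTTP gateway log:
http_log.txt (https://github.com/nspcc-dev/neofs-http-gw/files/9442532/http_log.txt)

Storage node logs:
s01_log.txt (https://github.com/nspcc-dev/neofs-http-gw/files/9442535/s01_log.txt)
s02_log.txt (https://github.com/nspcc-dev/neofs-http-gw/files/9442537/s02_log.txt)
s03_log.txt (https://github.com/nspcc-dev/neofs-http-gw/files/9442538/s03_log.txt)
s04_log.txt (https://github.com/nspcc-dev/neofs-http-gw/files/9442539/s04_log.txt)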


@KirillovDenis commented on GitHub (Aug 29, 2022):

Please provide the version of neofs-http-gw.


@vdomnich-yadro commented on GitHub (Aug 29, 2022):

@KirillovDenis sorry for missing it in the first place, http gateway was built from: e0ab9294103bab4a6cf96c5500a5430069d0fce7


@vdomnich-yadro commented on GitHub (Aug 29, 2022):

And the strange thing is that logs of storage nodes do not reflect the failing call at all. Not sure whether the request from http gateway dies before actually hitting any of the storage nodes...


@KirillovDenis commented on GitHub (Aug 29, 2022):

Could you reproduce this issue using http-gw from this branch (https://github.com/KirillovDenis/neofs-http-gw/tree/update_sdk)? We will get a clearer error (instead of missing ID field in the response).


@vdomnich-yadro commented on GitHub (Aug 29, 2022):

@KirillovDenis I tried multiple times with the branch that you suggested, and the only error reported with it was "session token not found". Please find the logs attached (https://github.com/nspcc-dev/neofs-http-gw/files/9446531/session-token-not-found.zip).

I saw "session token not found" a couple of weeks ago and discussed it with @realloc. Per our discussion it was a strange error because the container that we are using in tests is public read-write.


@vdomnich-yadro commented on GitHub (Aug 30, 2022):

A bit more about the steps: if I run the same steps without stopping and then starting the Docker container (steps #3 and #6), the issue does not occur.


@KirillovDenis commented on GitHub (Aug 30, 2022):

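It seems that persisting sessions in nodes is disabled. Try to enable it by using this variable: https://github.com/nspcc-dev/neofs-node/blob/c7c1c257e1b0da0eafff9943f3297581817778f1/config/example/node.env#L20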
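One possible reading of this diagnosis, sketched as a toy Python model rather than actual neofs-node behavior: a node that keeps sessions only in memory forgets them when its container restarts, so a client that reuses a previously issued token gets "session token not found". The `Node` class and its methods are hypothetical, and the assumption that the gateway reuses a cached token across the restart is not confirmed in this thread.

```python
class Node:
    """Toy storage node with either in-memory or persistent session storage."""

    def __init__(self, persist_sessions: bool) -> None:
        self.persist_sessions = persist_sessions
        self._disk: set[str] = set()    # models storage that survives a restart
        self._memory: set[str] = set()  # models state lost on restart

    def create_session(self, token_id: str) -> None:
        # Sessions go to persistent storage only when persistence is enabled.
        (self._disk if self.persist_sessions else self._memory).add(token_id)

    def restart(self) -> None:
        # Stopping and starting the container drops everything held in memory.
        self._memory = set()

    def put(self, token_id: str, payload: bytes) -> str:
        if token_id not in self._memory and token_id not in self._disk:
            raise LookupError("session token not found")
        return f"stored {len(payload)} bytes"


if __name__ == "__main__":
    for persist in (False, True):
        node = Node(persist_sessions=persist)
        node.create_session("t1")  # client obtains a session before the restart
        node.restart()             # analogous to steps 3 and 6
        try:
            print(persist, node.put("t1", b"data"))
        except LookupError as err:
            print(persist, "PUT failed:", err)
```

In this model the failure appears only when the restart happens and persistence is off, which is consistent with the observation that skipping steps 3 and 6 avoids the issue.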


@vdomnich-yadro commented on GitHub (Aug 30, 2022):

@KirillovDenis I changed the setting and made several runs with it. The "session token not found" and "missing ID field in the response" errors did not occur when putting an object.

Should we consider this as resolved?

If yes, then I would ask @alexvanin to make the corresponding change in the devenv branch for nightly builds.


@KirillovDenis commented on GitHub (Aug 30, 2022):

> Should we consider this as resolved?

I think so.

Reference
nspcc-dev/neofs-http-gw#75