evacuate is not working with big objects #808

Closed
opened 2025-12-28 17:20:46 +00:00 by sami · 4 comments
Owner

Originally created by @vkarak1 on GitHub (Oct 17, 2022).

Originally assigned to: @cthulhu-rider on GitHub.

I created object1 with a size of 500M and put it on node1, then set all shards on node4 to read-only (RO) state and issued the evacuate command against one shard; the object count on each node did not change at all.

Expected Behavior

The objects on the evacuated shard should be migrated to node1/node2/node3.

Current Behavior

The evacuate command reports "Shard has successfully been evacuated.", but the object counts do not change at all.

Steps to Reproduce (for bugs)

  1. Create container:
    neofs-cli --rpc-endpoint node1.neofs:8080 --wallet wallet.json container create --name test --policy "REP 1 " --basic-acl public-read-write --await
  2. Create object1 with a size of 500M:
    dd if=/dev/urandom of=object1 bs=1M count=500
  3. Put object1:
    neofs-cli --rpc-endpoint node1.neofs:8080 -w wallet.json object put --file object1 --cid AAPiVrUsdbwJ79KSpJukLSc4AXVM77STF5znhG7tWCYf --no-progress
  4. After some minutes, issue the following command on each node to check its object counts:
    curl -s localhost:6672 | rg neofs_node_object_counter | sed 1,2d
    Please find the result below:

node1:

neofs_node_object_counter{shard="28mtSFx2YYCTjaWyyCKA9b",type="logic"} 6
neofs_node_object_counter{shard="28mtSFx2YYCTjaWyyCKA9b",type="phy"} 6
neofs_node_object_counter{shard="4FuK7KYHCAVBsZ1FVHoCnj",type="logic"} 3
neofs_node_object_counter{shard="4FuK7KYHCAVBsZ1FVHoCnj",type="phy"} 3
neofs_node_object_counter{shard="6wS5Lq2vWCTCVbHYE2z4Nz",type="logic"} 4
neofs_node_object_counter{shard="6wS5Lq2vWCTCVbHYE2z4Nz",type="phy"} 4
neofs_node_object_counter{shard="PkfskRUC2A1b5YyrQXkP74",type="logic"} 1
neofs_node_object_counter{shard="PkfskRUC2A1b5YyrQXkP74",type="phy"} 1

node2:

neofs_node_object_counter{shard="9ckbELdMobJJg9AFXPpeJW",type="logic"} 1
neofs_node_object_counter{shard="9ckbELdMobJJg9AFXPpeJW",type="phy"} 1
neofs_node_object_counter{shard="DPzCBuwKEacPu6vxTJNrxi",type="logic"} 7
neofs_node_object_counter{shard="DPzCBuwKEacPu6vxTJNrxi",type="phy"} 7
neofs_node_object_counter{shard="LGSPLaaFERjKiowZ4u4Gqz",type="logic"} 2
neofs_node_object_counter{shard="LGSPLaaFERjKiowZ4u4Gqz",type="phy"} 2
neofs_node_object_counter{shard="WkkpuWSC55sD347W7eXFDk",type="logic"} 1
neofs_node_object_counter{shard="WkkpuWSC55sD347W7eXFDk",type="phy"} 1

node3: empty

node4:

neofs_node_object_counter{shard="8du5xvUJ3Kt7CmBstSVdSi",type="logic"} 2
neofs_node_object_counter{shard="8du5xvUJ3Kt7CmBstSVdSi",type="phy"} 2
neofs_node_object_counter{shard="8wKjxR3DkukFXgMt5TCNeB",type="logic"} 4
neofs_node_object_counter{shard="8wKjxR3DkukFXgMt5TCNeB",type="phy"} 4
neofs_node_object_counter{shard="HA4ooUXzAYBtGCZrMmt1vJ",type="logic"} 4
neofs_node_object_counter{shard="HA4ooUXzAYBtGCZrMmt1vJ",type="phy"} 4
neofs_node_object_counter{shard="TgL3kqeWRgXRECQTk3QEXC",type="logic"} 6 
neofs_node_object_counter{shard="TgL3kqeWRgXRECQTk3QEXC",type="phy"} 6 
  5. Then move all shards on node4 to RO state:
    neofs-cli control shards set-mode --mode read-only --endpoint localhost:8091 -w /etc/neofs/storage/wallet.json --all
    output:
Shard 8du5xvUJ3Kt7CmBstSVdSi:
Mode: read-only
Error count: 0
Shard 8wKjxR3DkukFXgMt5TCNeB:
Mode: read-only
Error count: 0
Shard TgL3kqeWRgXRECQTk3QEXC:
Mode: read-only
Error count: 0
Shard HA4ooUXzAYBtGCZrMmt1vJ:
Mode: read-only
Error count: 0
  6. Issue the evacuate command on node4:
root@glagoli:/etc/neofs/storage/tatlin-object-sber-tfstate/vkarakozov# neofs-cli  --wallet /etc/neofs/storage/wallet.json --endpoint localhost:8091 control shards evacuate --id HA4ooUXzAYBtGCZrMmt1vJ
Enter password >
Objects moved: 4
Shard has successfully been evacuated.
.
  7. I then expected to see the objects arrive on node1/node2/node3, but the object counts did not change.
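To compare per-node totals before and after evacuation, the per-shard counters from step 4 can be summed; a minimal sketch (`sum_logic` is a hypothetical helper name, and `localhost:6672` is the metrics endpoint assumed from step 4):

```shell
# Sum the "logic" object counters from Prometheus-style metrics on stdin.
sum_logic() {
  grep 'type="logic"' | awk '{sum += $NF} END {print sum + 0}'
}

# Real usage would be (endpoint taken from step 4 above):
#   curl -s localhost:6672 | grep neofs_node_object_counter | sum_logic
# Illustration with sample lines from the node1 output above:
printf '%s\n' \
  'neofs_node_object_counter{shard="28mtSFx2YYCTjaWyyCKA9b",type="logic"} 6' \
  'neofs_node_object_counter{shard="28mtSFx2YYCTjaWyyCKA9b",type="phy"} 6' \
  'neofs_node_object_counter{shard="4FuK7KYHCAVBsZ1FVHoCnj",type="logic"} 3' \
  | sum_logic
```

Running the pipeline per node gives one comparable number per node instead of a per-shard dump.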

Please find netmap snapshot result below:

root@glagoli:/etc/neofs/storage/tatlin-object-sber-tfstate/vkarakozov# neofs-cli netmap snapshot -g -r node1.neofs:8080
Epoch: 13
Node 1: 021f2bf1b5102d3e946fa1590d7f2c1855ee74acabea96c0dcbcc6ead064657058 ONLINE /dns4/node4.neofs/tcp/8080
        Continent: Europe
        Country: Finland
        CountryCode: FI
        Deployed: YACZROKH
        Location: Helsinki (Helsingfors)
        Node: node4
        Price: 10
        SubDiv: Uusimaa
        SubDivCode: 18
        UN-LOCODE: FI HEL
Node 2: 02f175c8b5435709c01d21e553440930e085c62ca130b557379d3981245d54b8a5 ONLINE /dns4/node2.neofs/tcp/8080
        Continent: Europe
        Country: Russia
        CountryCode: RU
        Deployed: YACZROKH
        Location: Saint Petersburg (ex Leningrad)
        Node: node2
        Price: 10
        SubDiv: Sankt-Peterburg
        SubDivCode: SPE
        UN-LOCODE: RU LED
Node 3: 033a267f33db9824adbe9fe06a41080495ce129c7133734caca06a9120c1a0ed9f ONLINE /dns4/node1.neofs/tcp/8080
        Continent: Europe
        Country: Russia
        CountryCode: RU
        Deployed: YACZROKH
        Location: Moskva
        Node: node1
        Price: 10
        SubDiv: Moskva
        SubDivCode: MOW
        UN-LOCODE: RU MOW
Node 4: 039de30ee9429446bce8689ca72111c21faa77747f5fc2181e8c89456d29bd990a ONLINE /dns4/node3.neofs/tcp/8080
        Continent: Europe
        Country: Sweden
        CountryCode: SE
        Deployed: YACZROKH
        Location: Stockholm
        Node: node3
        Price: 10
        SubDiv: Stockholms län
        SubDivCode: AB
        UN-LOCODE: SE STO

Logs: https://github.com/nspcc-dev/neofs-node/files/9801614/evacuate_failure.17.10.zip

Your Environment

NeoFS Storage node
Version: v0.32.0-125-gbcf3df35
GoVersion: go1.18.4

Linux glagoli 5.10.0-18-amd64 #1 SMP Debian 5.10.140-1 (2022-09-02) x86_64 GNU/Linux

Server setup and configuration:
cloud, 4 VMs, 4 SN, 4 http qw, 4 s3 gw

sami closed this issue and added the bug and U3 labels (2025-12-28 17:20:46 +00:00).

@cthulhu-rider commented on GitHub (Oct 25, 2022):

It seems evacuation doesn't work for "small" objects either. Here is what I've seen in the logs during an evacuation job:

2022-10-25T08:40:05.043Z	debug	blobovnicza/control.go:17	creating directory for BoltDB	{"component": "Blobovnicza", "path": "/storage/blobovnicza0/1/2/0", "ro": true}
2022-10-25T08:40:05.043Z	debug	blobovnicza/control.go:31	opening BoltDB	{"component": "Blobovnicza", "path": "/storage/blobovnicza0/1/2/0", "permissions": "-rwxrwxrwx"}
2022-10-25T08:40:05.043Z	warn	engine/put.go:133	could not put object to shard	{"shard_id": "5dB5iRKDNudjbaVHPgFbDr", "error": "shard is in read-only mode"}
2022-10-25T08:40:05.044Z	debug	neofs-node/object.go:453	writing local reputation values	{"epoch": 4, "satisfactory": true}
2022-10-25T08:40:05.048Z	debug	replicator/process.go:66	object successfully replicated	{"component": "Object Replicator", "node": "03ff65b6ae79134a4dce9d0d39d3851e9bab4ee97abf86e81e1c5bbc50cd2826ae", "object": "2EYrreVYGNPA6c4LXQPUpkfNTGPnQBJttpVFCi9S4HvE/8xm3vg7xSf5FLyia4Dr3rCzhawHS6VFQiPShCVGCU87K"}
2022-10-25T08:40:05.048Z	debug	replicator/process.go:23	finish work	{"component": "Object Replicator", "amount of unfinished replicas": 0}

The first strange thing is that the node opened some blobovnicza. Despite this, the node initialized the replication routine as expected. However, the "object successfully replicated" message names a node that already holds the replica, so we have a false-positive replication here.


@cthulhu-rider commented on GitHub (Oct 25, 2022):

We've discussed the behavior with @fyrchik, and it is expected. The evacuation routine doesn't try to place data according to the placement policy; instead, it makes sure the object stays available in the container (at least 1 replica is needed).

@vkarak1 I suggest adjusting the test to this behavior: we expect no more than 1 missing replica of the object, but at least 1 stored replica.

I'm gonna document evacuation more clearly, but this doesn't block the testing.
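The adjusted expectation (at least 1 stored replica) could be checked roughly like this; a sketch only, where `probe_node` stands in for a real per-node lookup (something like `neofs-cli object head -r <endpoint> --cid <cid> --oid <oid>`) and is stubbed here for illustration:

```shell
# Count how many nodes still answer for the object. probe_node is a stub;
# in a real test it would query each storage node for the object's header.
count_replicas() {
  found=0
  for node in "$@"; do
    if probe_node "$node"; then found=$((found + 1)); fi
  done
  echo "$found"
}

# Stub: pretend node1 and node2 hold a replica after evacuation.
probe_node() { case "$1" in node1|node2) return 0 ;; *) return 1 ;; esac; }

found=$(count_replicas node1 node2 node3 node4)
echo "replicas found: $found"
[ "$found" -ge 1 ] && echo "OK: at least 1 replica stored"
```

The test then passes as long as the counted replicas never drop to 0, regardless of which exact node ends up holding the copy.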


@cthulhu-rider commented on GitHub (Oct 26, 2022):

I reproduced the problem... nah, that's actually not a problem in the current system design.

Node A evacuates the objects from the shard to another container node, B. After that, node B's Policer checks whether the object is stored in the container according to its policy, and it is: node A still responds with the object, so B decides that it holds a redundant replica and throws it away.
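That chain of events boils down to a simple redundancy decision on node B; a toy sketch under a "REP 1" policy (the numbers are illustrative, not taken from the logs):

```shell
# Toy model of the Policer's redundancy check under a "REP 1" policy.
REP=1          # replicas required by the container policy
replicas=2     # node A (evacuation source, still answering) + node B (target)

if [ "$replicas" -gt "$REP" ]; then
  echo "node B: replica is redundant, removing local copy"
else
  echo "node B: replica is needed, keeping local copy"
fi
```

With REP 1, any second copy created by evacuation is by definition redundant as long as the source node still answers, which is exactly why the counters never moved.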


@vkarak1 commented on GitHub (Oct 26, 2022):

This is expected behavior with REP 1; closing the issue.
