Add a possibility to add error injections to the code from system tests

sami commented

2025-12-28 17:22:22 +00:00

Owner

Originally created by @evgeniiz321 on GitHub (Sep 23, 2024).

For tests like these - https://github.com/nspcc-dev/neofs-testcases/issues/794 we need a way to put the system under the test to certain conditions that can't be easily and reliably reproduced under regular circumstances, but we can encounter them running the production code.

The most convenient thing from the test perspective would be the possibility to add error injections to some code places. E.g. if we want to get an unfinished object and want to validate how it is cleaned, it would be nice to inject an error during the object creation operation and got such an unfinished object.

Same feature could be also applied to verify more unusual scenarios and different states of the system. To ensure the robust recovery from other possible faulty situations.

I would expect it to be implemented either via env variables/config files before service starts, or, more conveniently via some internal cli commands - like ./neofs-cli inject-error ERROR_LOCATION ERROR_TYPE, where ERROR_LOCATION is a specific place in the code and ERROR_TYPE is either a panic, or a sleep with different number of seconds, or a bool value that somehow is used in the code. E.g.
./neofs-cli inject-error OBJECT_CREATION_POST_PARTS_CREATE RETURN_FALSE

Originally created by @evgeniiz321 on GitHub (Sep 23, 2024). For tests like these - https://github.com/nspcc-dev/neofs-testcases/issues/794 we need a way to put the system under the test to certain conditions that can't be easily and reliably reproduced under regular circumstances, but we can encounter them running the production code. The most convenient thing from the test perspective would be the possibility to add error injections to some code places. E.g. if we want to get an unfinished object and want to validate how it is cleaned, it would be nice to inject an error during the object creation operation and got such an unfinished object. Same feature could be also applied to verify more unusual scenarios and different states of the system. To ensure the robust recovery from other possible faulty situations. I would expect it to be implemented either via env variables/config files before service starts, or, more conveniently via some internal cli commands - like `./neofs-cli inject-error ERROR_LOCATION ERROR_TYPE`, where ERROR_LOCATION is a specific place in the code and ERROR_TYPE is either a panic, or a sleep with different number of seconds, or a bool value that somehow is used in the code. E.g. `./neofs-cli inject-error OBJECT_CREATION_POST_PARTS_CREATE RETURN_FALSE`

sami added the

labels

2025-12-28 17:22:22 +00:00

sami commented

2025-12-28 17:22:22 +00:00

Author

Owner

@roman-khimov commented on GitHub (Sep 24, 2024):

I don't think we can do this in a generic way, too many possible errors. Likely it'd also be some kind of control service feature since it's the node that should do this, not CLI. But for your specific use case we can make some kind of multipart upload via CLI which will push a part of the object, but not finish it.

@roman-khimov commented on GitHub (Sep 24, 2024): I don't think we can do this in a generic way, too many possible errors. Likely it'd also be some kind of control service feature since it's the node that should do this, not CLI. But for your specific use case we can make some kind of multipart upload via CLI which will push a part of the object, but not finish it.

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@evgeniiz321 commented on GitHub (Sep 24, 2024):

We can start with one error location and one specific error for my case, that cause the system to create an unfinished object. And then extend these errors when it is needed, we don't need to cover every possible situation at once. Ideally, every service should implement this type of functionality. It doesn't matter how to invoke it, either via cli, or rest, or by specifying env variable. At least we will have a generic/code syntax to cause it.
Otherwise we always will be implementing hacks into current commands with a different syntax to implement the behavior specific for this command that actually no one needs except tests. This is also the way to do it, but as for me - it is not as straightforward as with error injections.
To sum up, I can live with a 'special' multipart upload, do I need a separate issue for it? Or it will be done here?

@evgeniiz321 commented on GitHub (Sep 24, 2024): We can start with one error location and one specific error for my case, that cause the system to create an unfinished object. And then extend these errors when it is needed, we don't need to cover every possible situation at once. Ideally, every service should implement this type of functionality. It doesn't matter how to invoke it, either via cli, or rest, or by specifying env variable. At least we will have a generic/code syntax to cause it. Otherwise we always will be implementing hacks into current commands with a different syntax to implement the behavior specific for this command that actually no one needs except tests. This is also the way to do it, but as for me - it is not as straightforward as with error injections. To sum up, I can live with a 'special' multipart upload, do I need a separate issue for it? Or it will be done here?

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@roman-khimov commented on GitHub (Sep 24, 2024):

There are good things in error injection, don't get me wrong, I'm just not sure how we can implement it better. Error injection can really simplify testing some hardly reproducible things like disk errors or specific network interaction problems. At the same time there is some cost to it, it can get deep into the code (including hot paths of various kind) and control service will get this super-ability that could be abused.

@roman-khimov commented on GitHub (Sep 24, 2024): There are good things in error injection, don't get me wrong, I'm just not sure how we can implement it better. Error injection can really simplify testing some hardly reproducible things like disk errors or specific network interaction problems. At the same time there is some cost to it, it can get deep into the code (including hot paths of various kind) and control service will get this super-ability that could be abused.

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@carpawell commented on GitHub (Sep 24, 2024):

@evgeniiz321, do you have some understanding of what you want as a new test framework? Is ./neofs-cli inject-error OBJECT_CREATION_POST_PARTS_CREATE RETURN_FALSE your generic suggestion for the whole neofs-cli command? Do you have any more thoughts about how it can be used or only incomplete object load for now?
I do not like the idea of extending original applications (neofs-cli, adm, node, etc) with things that are done only for tests (neofs-lens that inspects storages is OK, it is just a suitable useful debug tool that matches tests' needs). It is kinda intentional breaking of the thing that is made (or planned at least) to be as user-friendly as possible, without any strange functionality. Solution can be a separate internal thing written in Go that does any required thing for tests with any weird interface and be used only by automated python code. We can support it, not release and not attach it to anything except testcases repo.

But again, only if we know that we need it and there is no external request for it. For the only provided wish in this issue, we can just solve https://github.com/nspcc-dev/neofs-node/issues/952. It appeared years ago and tests were not the original requester. I can imagine that it may be helpful in general.

@carpawell commented on GitHub (Sep 24, 2024): @evgeniiz321, do you have some understanding of what you want as a new test framework? Is `./neofs-cli inject-error OBJECT_CREATION_POST_PARTS_CREATE RETURN_FALSE` your generic suggestion for the whole `neofs-cli` command? Do you have any more thoughts about how it can be used or only incomplete object load for now? I do not like the idea of extending original applications (neofs-cli, adm, node, etc) with things that are done _only for tests_ (neofs-lens that inspects storages is OK, it is just a suitable useful debug tool that matches tests' needs). It is kinda intentional breaking of the thing that is made (or planned at least) to be as user-friendly as possible, without any strange functionality. Solution can be a separate internal thing written in Go that does _any required thing for tests with any weird interface_ and be used only by automated python code. We can support it, not release and not attach it to anything except testcases repo. But again, only if we know that we need it and there is no external request for it. For the only provided wish in this issue, we can just solve https://github.com/nspcc-dev/neofs-node/issues/952. It appeared _years_ ago and tests were not the original requester. I can imagine that it may be helpful in general.

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@roman-khimov commented on GitHub (Sep 24, 2024):

separate internal thing

The problem is, sometimes you need to break the node.

@roman-khimov commented on GitHub (Sep 24, 2024): > separate internal thing The problem is, sometimes you need to break the node.

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@evgeniiz321 commented on GitHub (Sep 25, 2024):

@carpawell details of the implementation is on you, from tests it is ok to work with rest api, with a new cli tool (or old one), with env variables, with any other possible variations, you should pick the one that is easiest for you to implement. I'm not sure it can be done via a separate internal thing, but if it can be done - that is fine to me.
There is also a possibility to have special 'debug' builds with this error injection code, so it wouldn't appear on the production code at all.
Right now I would start with the incomplete object load, but we can easily extend it for failover testing - e.g. define set of dependencies for each service, and emulate errors during different states/phases of communication.

@evgeniiz321 commented on GitHub (Sep 25, 2024): @carpawell details of the implementation is on you, from tests it is ok to work with rest api, with a new cli tool (or old one), with env variables, with any other possible variations, you should pick the one that is easiest for you to implement. I'm not sure it can be done via a separate internal thing, but if it can be done - that is fine to me. There is also a possibility to have special 'debug' builds with this error injection code, so it wouldn't appear on the production code at all. Right now I would start with the incomplete object load, but we can easily extend it for failover testing - e.g. define set of dependencies for each service, and emulate errors during different states/phases of communication.

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@cthulhu-rider commented on GitHub (Sep 25, 2024):

these are lightweight scripts to me, they may be written in Go, and not a part of the neofs-cli. I'd maintain them in neofs-testcases repo, and the best - https://github.com/nspcc-dev/neofs-testcases/issues/258

@cthulhu-rider commented on GitHub (Sep 25, 2024): these are lightweight scripts to me, they may be written in Go, and not a part of the neofs-cli. I'd maintain them in neofs-testcases repo, and the best - https://github.com/nspcc-dev/neofs-testcases/issues/258

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@roman-khimov commented on GitHub (Sep 25, 2024):

nspcc-dev/neofs-testcases#258 won't help. And scripts won't help as well. If you need some weird behavior from node, you have to have some code in node to do that.

@roman-khimov commented on GitHub (Sep 25, 2024): nspcc-dev/neofs-testcases#258 won't help. And scripts won't help as well. If you need some weird behavior from node, you have to have some code in node to do that.

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@cthulhu-rider commented on GitHub (Sep 25, 2024):

so its not about CLI overriding but the node. Veeery strange to have this in the "main" node, it's a custom node to me

@cthulhu-rider commented on GitHub (Sep 25, 2024): so its not about CLI overriding but the node. Veeery strange to have this in the "main" node, it's a custom node to me

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@roman-khimov commented on GitHub (Nov 13, 2024):

@carpawell, think of testing our relay/upload problem. How can we do that without a node purposefully failing requests? How can we get a node with this behavior?

@roman-khimov commented on GitHub (Nov 13, 2024): @carpawell, think of testing our relay/upload problem. How can we do that without a node purposefully failing requests? How can we get a node with this behavior?

sami commented

2025-12-28 17:22:23 +00:00

Author

Owner

@carpawell commented on GitHub (Nov 14, 2024):

Do you want to solve it in this issue's context? Well, we need some status response that does not stop a non-container node. This may work: nspcc-dev/neofs-api@d7c033317a/status/types.proto (L116-L123). Put lock when it is not allowed or PUT TS on a lock object. Putting already removed objects also could work.

@carpawell commented on GitHub (Nov 14, 2024): Do you want to solve it in this issue's context? Well, we need some status response that does not stop a non-container node. This may work: https://github.com/nspcc-dev/neofs-api/blob/d7c033317a66aff467728f4ca5a86879edb66e54/status/types.proto#L116-L123. Put lock when it is not allowed or PUT TS on a lock object. Putting already removed objects also could work.

Rows
Columns

Add a possibility to add error injections to the code from system tests #1261