Implement searchv2 #1301

Closed
opened 2025-12-28 17:22:31 +00:00 by sami · 10 comments
Owner

Originally created by @roman-khimov on GitHub (Dec 18, 2024).

I'm always frustrated when we don't have an implementation for https://github.com/nspcc-dev/neofs-api/pull/314.

Describe the solution you'd like

The per-container DB should be structured like:

Given
DELIM = 0x00 // Not a valid UTF-8, can't be attribute name or value
KEY // Attribute key or special attribute that can be searched for
VALUE // Attribute value which is either a string or number, strings are used as is, numbers are converted to fixed-width (256?) BE
PREFIXA // And B/C/D, some byte prefixes
OID // Object ID

For each object the following keys are created (with no values):
PREFIXA_OID // OIDs only to list objects without filters
PREFIXB_KEY_DELIM_VALUE_DELIM_OID // The main search workhorse, created for all keys
PREFIXC_OID_KEY_DELIM_VALUE // Auxiliary for secondary filters and additional data returned

The mechanics is:

  • node accepting the request forwards it to other nodes as in case of original search
  • it accepts answers, merges them and limits to the number requested (or max)
  • it generates the cursor corresponding to the result returned to client (more on cursor below)
  • reply is ready and sent

Each node does the following:

  • if there are no filters we're just looping over PREFIXA in DB, the cursor is OID as is
  • if there are filters, the first one is magic: whatever the key/action/value is there is used to position DB cursor into PREFIXB key if match is EQUAL/NOT_EQUAL/PREFIX/NUM* (don't forget about negative numbers)
  • other matches can be checked for using the same key (key>N && key <M), this can shortcut the search more quickly for numerics
  • iterating over the DB we check all matches via PREFIXC while the first one works (when it's not, we're done)
  • search is limited by the number of elements requested or max (1000), so we return results earlier if we have enough elements
  • we add requested fields of matching elements using PREFIXC
  • a cursor is returned if we're not yet at the end, it's KEY_DELIM_VALUE_DELIM_OID of the last element encoded as base64 or base58, this allows to continue easily
  • if the first filter is NOT_PRESENT we're also looping over PREFIXA and checking PREFIXC

Describe alternatives you've considered

SQL, various other types of DBs. But the scheme above should be sufficient for our primary cases now.

Additional context

#2990, #2757, #2989, https://github.com/nspcc-dev/neofs-api/issues/306

Originally created by @roman-khimov on GitHub (Dec 18, 2024). ## Is your feature request related to a problem? Please describe. I'm always frustrated when we don't have an implementation for https://github.com/nspcc-dev/neofs-api/pull/314. ## Describe the solution you'd like The per-container DB should be structured like: ``` Given DELIM = 0x00 // Not a valid UTF-8, can't be attribute name or value KEY // Attribute key or special attribute that can be searched for VALUE // Attribute value which is either a string or number, strings are used as is, numbers are converted to fixed-width (256?) BE PREFIXA // And B/C/D, some byte prefixes OID // Object ID For each object the following keys are created (with no values): PREFIXA_OID // OIDs only to list objects without filters PREFIXB_KEY_DELIM_VALUE_DELIM_OID // The main search workhorse, created for all keys PREFIXC_OID_KEY_DELIM_VALUE // Auxiliary for secondary filters and additional data returned ``` The mechanics is: * node accepting the request forwards it to other nodes as in case of original search * it accepts answers, merges them and limits to the number requested (or max) * it generates the cursor corresponding to the result returned to client (more on cursor below) * reply is ready and sent Each node does the following: * if there are no filters we're just looping over PREFIXA in DB, the cursor is OID as is * if there are filters, the first one is magic: whatever the key/action/value is there is used to position DB cursor into PREFIXB key if match is EQUAL/NOT_EQUAL/PREFIX/NUM* (don't forget about negative numbers) * other matches can be checked for using the same key (`key>N && key <M`), this can shortcut the search more quickly for numerics * iterating over the DB we check all matches via PREFIXC while the first one works (when it's not, we're done) * search is limited by the number of elements requested or max (1000), so we return results earlier if we have enough elements * we add requested fields of matching elements using PREFIXC * a cursor is returned if we're not yet at the end, it's KEY_DELIM_VALUE_DELIM_OID of the last element encoded as base64 or base58, this allows to continue easily * if the first filter is NOT_PRESENT we're also looping over PREFIXA and checking PREFIXC ## Describe alternatives you've considered SQL, various other types of DBs. But the scheme above should be sufficient for our primary cases now. ## Additional context #2990, #2757, #2989, https://github.com/nspcc-dev/neofs-api/issues/306
sami 2025-12-28 17:22:31 +00:00
Author
Owner

@roman-khimov commented on GitHub (Dec 19, 2024):

Caveat: creating a cursor from merged values can be non-trivial if attribute is not included into the requested list. It can be degraded to a simple OID then (complicating continuation somewhat) and in general most of use cases do need attribute values, but still.

@roman-khimov commented on GitHub (Dec 19, 2024): Caveat: creating a cursor from merged values can be non-trivial if attribute is not included into the requested list. It can be degraded to a simple OID then (complicating continuation somewhat) and in general most of use cases do need attribute values, but still.
Author
Owner

@roman-khimov commented on GitHub (Dec 19, 2024):

Caveat 2: numeric values might require an additional prefix anyway since we can have Index=100500 in one object and Index=abcd in another, using the same prefix they'd be mixed and we can end up treating strings as numbers.

@roman-khimov commented on GitHub (Dec 19, 2024): Caveat 2: numeric values might require an additional prefix anyway since we can have `Index=100500` in one object and `Index=abcd` in another, using the same prefix they'd be mixed and we can end up treating strings as numbers.
Author
Owner

@cthulhu-rider commented on GitHub (Jan 9, 2025):

note: per-container DB describes virtual structure, physically it is split within existing metabases

@cthulhu-rider commented on GitHub (Jan 9, 2025): note: `per-container DB` describes virtual structure, physically it is split within existing metabases
Author
Owner

@roman-khimov commented on GitHub (Jan 9, 2025):

Yes, we need to limit changes to this specific feature (expose API as early as possible) and deal with associated meta code (GC and alike) in future. Search is still possible with multiple DBs since results can be merged similar to the way results from different nodes are merged.

@roman-khimov commented on GitHub (Jan 9, 2025): Yes, we need to limit changes to this specific feature (expose API as early as possible) and deal with associated meta code (GC and alike) in future. Search is still possible with multiple DBs since results can be merged similar to the way results from different nodes are merged.
Author
Owner

@cthulhu-rider commented on GitHub (Jan 10, 2025):

Attribute value which is either a string or number, strings are used as is, numbers are converted to fixed-width (256?) BE

choice is obvious for system fields. For example, owner ID is a string while payload size is an integer

for user-defined attributes it is not so obvious. Like here https://github.com/nspcc-dev/neofs-node/issues/3058#issuecomment-2555706965. In current protocol, there is no way to determine whether user attribute is numeric or not. So, I rly doubt storing them in various formats is legit. But we can resolve this on search query processing. In original search, any non-integer attribute mismatches any numeric query. Do we wanna change this behaviour for SearchV2 somehow?

@roman-khimov u also mentioned some special prefix, could u pls elaborate on this thought?

@cthulhu-rider commented on GitHub (Jan 10, 2025): > Attribute value which is either a string or number, strings are used as is, numbers are converted to fixed-width (256?) BE choice is obvious for system fields. For example, owner ID is a string while payload size is an integer for user-defined attributes it is not so obvious. Like here https://github.com/nspcc-dev/neofs-node/issues/3058#issuecomment-2555706965. In current protocol, there is no way to determine whether user attribute is numeric or not. So, I rly doubt storing them in various formats is legit. But we can resolve this on search query processing. In original search, any non-integer attribute mismatches any numeric query. Do we wanna change this behaviour for SearchV2 somehow? @roman-khimov u also mentioned some special prefix, could u pls elaborate on this thought?
Author
Owner

@roman-khimov commented on GitHub (Jan 10, 2025):

You can only do this content-based, just like you do this now for old search. The only difference is that the choice is made when processing the object instead of when processing the search request.

Special prefix means splitting PREFIXB into B1 and B2 for numeric and string data.

@roman-khimov commented on GitHub (Jan 10, 2025): You can only do this content-based, just like you do this now for old search. The only difference is that the choice is made when processing the object instead of when processing the search request. Special prefix means splitting PREFIXB into B1 and B2 for numeric and string data.
Author
Owner

@cthulhu-rider commented on GitHub (Jan 13, 2025):

if there are no filters we're just looping over PREFIXA in DB, the cursor is OID as is

shouldnt cursor be OID + values of requested attributes to sort/continue in PREFIXC in this case?

UPD: seems like no, missed this requirement nspcc-dev/neofs-api@9f1f12866a/object/service.proto (L554-L555)

@cthulhu-rider commented on GitHub (Jan 13, 2025): > if there are no filters we're just looping over PREFIXA in DB, the cursor is OID as is shouldnt cursor be OID + values of requested attributes to sort/continue in PREFIXC in this case? UPD: seems like no, missed this requirement https://github.com/nspcc-dev/neofs-api/blob/9f1f12866a4742adb7778c51bd632cd240f81262/object/service.proto#L554-L555
Author
Owner

@cthulhu-rider commented on GitHub (Feb 3, 2025):

i'd like to clarify primary seek in proposed algo. Consider objects:

ID1 Height:10 Weight:20
ID2 Height:10 Weight:10
ID3 Height:10 Weight:30

where ID1 < ID2 < ID3

request: FILTER Height>0 Count:1 Attributes:{Weight}

first resp: ID2 Weight:10 cursor:Height_10_ID2

on 2nd request, we position to ID2 in PREFIXB bucket. Then the cursor will go to ID3 and skip ID1, which is wrong: next resp item should be ID1 Weight:20 cursor:Height_10_ID1

this example shows that primary Seek() and Next() can go wrong. Instead, we should iterate over all KEY_DELIM_VALUE_DELIM* items. Or am i missing smth?


one more nuance

if last resp was ID1 Weight:20 cursor:Height_10_ID1, then node should skip ID2 and respond with ID3. If node stores all objects, it can restore Weight attribute of ID1 from the DB to compare other items against it. But if node does not store ID1, it'll respond with ID2 althoughts its Weight is less. For this purpose it would help to have a cursor with all requested attributes' values, not just the primary one

@cthulhu-rider commented on GitHub (Feb 3, 2025): i'd like to clarify primary seek in proposed algo. Consider objects: ``` ID1 Height:10 Weight:20 ID2 Height:10 Weight:10 ID3 Height:10 Weight:30 ``` where `ID1 < ID2 < ID3` request: `FILTER Height>0 Count:1 Attributes:{Weight}` first resp: `ID2 Weight:10 cursor:Height_10_ID2` on 2nd request, we position to `ID2` in `PREFIXB` bucket. Then the cursor will go to `ID3` and skip `ID1`, which is wrong: next resp item should be `ID1 Weight:20 cursor:Height_10_ID1` this example shows that primary `Seek()` and `Next()` can go wrong. Instead, we should iterate over all `KEY_DELIM_VALUE_DELIM*` items. Or am i missing smth? --- one more nuance if last resp was `ID1 Weight:20 cursor:Height_10_ID1`, then node should skip `ID2` and respond with `ID3`. If node stores all objects, it can restore `Weight` attribute of `ID1` from the DB to compare other items against it. But if node does not store `ID1`, it'll respond with `ID2` althoughts its `Weight` is less. For this purpose it would help to have a cursor with all requested attributes' values, not just the primary one
Author
Owner

@roman-khimov commented on GitHub (Feb 3, 2025):

Correct. We have two options here:

  • iterate whole primary attribute prefix again
  • relax ordering requirements for secondary attributes

Our primary use cases for now:

  • OID enumeration, isn't affected
  • block index a < x < b search, isn't affected
  • REST FilePath=smth and we need a timestamp
  • S3 FilePath=smth AND Type=smth AND maybe more AND please give me a lot of addtional attributes

Secondary attribute order does have some advantages for the REST/S3 cases. But to be fair both would benefit a bit more from the reverse order, since when we're talking about time stamps we usually need the latest and it's going to be the last. Implementing reverse result order is certainly not something we want now. We still need this to be simple and to be fast. Both REST and S3 cases are not very likely to produce a lot of results at the same time (very likely to fit into 1000 limit). So I'd opt for relaxing ordering requirements to be "primary attribute only". Easier to implement, will work good enough for current users. If we're to find other use cases we can think of (even more advanced) ordering again.

@roman-khimov commented on GitHub (Feb 3, 2025): Correct. We have two options here: * iterate whole primary attribute prefix again * relax ordering requirements for secondary attributes Our primary use cases for now: * OID enumeration, isn't affected * block index `a < x < b` search, isn't affected * REST `FilePath=smth and we need a timestamp` * S3 `FilePath=smth AND Type=smth AND maybe more AND please give me a lot of addtional attributes` Secondary attribute order does have some advantages for the REST/S3 cases. But to be fair both would benefit a bit more from the _reverse_ order, since when we're talking about time stamps we usually need the latest and it's going to be the last. Implementing reverse result order is certainly not something we want now. We still need this to be simple and to be fast. Both REST and S3 cases are not very likely to produce a lot of results at the same time (very likely to fit into 1000 limit). So I'd opt for relaxing ordering requirements to be "primary attribute only". Easier to implement, will work good enough for current users. If we're to find other use cases we can think of (even more advanced) ordering again.
Author
Owner

@cthulhu-rider commented on GitHub (Feb 3, 2025):

So I'd opt for relaxing ordering requirements to be "primary attribute only". Easier to implement

full agree, lets start with this

@cthulhu-rider commented on GitHub (Feb 3, 2025): > So I'd opt for relaxing ordering requirements to be "primary attribute only". Easier to implement full agree, lets start with this
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
nspcc-dev/neofs-node#1301
No description provided.