mirror of
https://github.com/nspcc-dev/neofs-node.git
synced 2026-03-01 04:29:10 +00:00
Implement searchv2 #1301
Originally created by @roman-khimov on GitHub (Dec 18, 2024).
Is your feature request related to a problem? Please describe.
I'm always frustrated when we don't have an implementation for https://github.com/nspcc-dev/neofs-api/pull/314.
Describe the solution you'd like
The per-container DB should be structured like:
The mechanics is:
Each node does the following:
key>N && key <M), this can shortcut the search more quickly for numerics
Describe alternatives you've considered
SQL, various other types of DBs. But the scheme above should be sufficient for our primary cases now.
Additional context
#2990, #2757, #2989, https://github.com/nspcc-dev/neofs-api/issues/306
@roman-khimov commented on GitHub (Dec 19, 2024):
Caveat: creating a cursor from merged values can be non-trivial if attribute is not included into the requested list. It can be degraded to a simple OID then (complicating continuation somewhat) and in general most of use cases do need attribute values, but still.
@roman-khimov commented on GitHub (Dec 19, 2024):
Caveat 2: numeric values might require an additional prefix anyway since we can have Index=100500 in one object and Index=abcd in another; using the same prefix they'd be mixed and we can end up treating strings as numbers.
@cthulhu-rider commented on GitHub (Jan 9, 2025):
note: "per-container DB" describes virtual structure, physically it is split within existing metabases
@roman-khimov commented on GitHub (Jan 9, 2025):
Yes, we need to limit changes to this specific feature (expose API as early as possible) and deal with associated meta code (GC and alike) in future. Search is still possible with multiple DBs since results can be merged similar to the way results from different nodes are merged.
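A minimal Go sketch of the merging idea mentioned above, assuming each DB (or node) returns its results already sorted by primary attribute value and then by OID. The `item` type, `mergeSorted`, and the string form of the attribute value are hypothetical names for illustration, not the neofs-node API.

```go
package main

import "fmt"

// item is a hypothetical search result: the primary attribute value
// (assumed to be stored in an order-preserving form) plus the object ID.
type item struct {
	Attr string
	OID  string
}

// less orders items by primary attribute value first, then by OID.
func less(a, b item) bool {
	if a.Attr != b.Attr {
		return a.Attr < b.Attr
	}
	return a.OID < b.OID
}

// mergeSorted merges individually sorted result lists into one sorted list
// of at most limit items, dropping objects reported by more than one source.
func mergeSorted(lists [][]item, limit int) []item {
	pos := make([]int, len(lists)) // read position within each list
	var out []item
	for len(out) < limit {
		best := -1
		for i, l := range lists {
			if pos[i] >= len(l) {
				continue // this list is exhausted
			}
			if best < 0 || less(l[pos[i]], lists[best][pos[best]]) {
				best = i
			}
		}
		if best < 0 {
			break // all lists are exhausted
		}
		it := lists[best][pos[best]]
		pos[best]++
		if n := len(out); n > 0 && out[n-1] == it {
			continue // duplicates are adjacent because the output is sorted
		}
		out = append(out, it)
	}
	return out
}

func main() {
	db1 := []item{{"10", "ID1"}, {"20", "ID3"}}
	db2 := []item{{"10", "ID1"}, {"15", "ID2"}}
	fmt.Println(mergeSorted([][]item{db1, db2}, 1000))
}
```

The same routine would apply whether the inputs come from several metabases on one node or from several nodes, as long as all of them use the same ordering.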
@cthulhu-rider commented on GitHub (Jan 10, 2025):
choice is obvious for system fields. For example, owner ID is a string while payload size is an integer
for user-defined attributes it is not so obvious. Like here https://github.com/nspcc-dev/neofs-node/issues/3058#issuecomment-2555706965. In current protocol, there is no way to determine whether user attribute is numeric or not. So, I rly doubt storing them in various formats is legit. But we can resolve this on search query processing. In original search, any non-integer attribute mismatches any numeric query. Do we wanna change this behaviour for SearchV2 somehow?
@roman-khimov u also mentioned some special prefix, could u pls elaborate on this thought?
@roman-khimov commented on GitHub (Jan 10, 2025):
You can only do this content-based, just like you do this now for old search. The only difference is that the choice is made when processing the object instead of when processing the search request.
Special prefix means splitting PREFIXB into B1 and B2 for numeric and string data.
@cthulhu-rider commented on GitHub (Jan 13, 2025):
shouldnt cursor be OID + values of requested attributes to sort/continue in PREFIXC in this case?
UPD: seems like no, missed this requirement
nspcc-dev/neofs-api@9f1f12866a/object/service.proto (L554-L555)
@cthulhu-rider commented on GitHub (Feb 3, 2025):
i'd like to clarify primary seek in proposed algo. Consider objects:
where ID1 < ID2 < ID3
request:
FILTER Height>0 Count:1 Attributes:{Weight}
first resp:
ID2 Weight:10 cursor:Height_10_ID2
on 2nd request, we position to ID2 in PREFIXB bucket. Then the cursor will go to ID3 and skip ID1, which is wrong: next resp item should be
ID1 Weight:20 cursor:Height_10_ID1
this example shows that primary Seek() and Next() can go wrong. Instead, we should iterate over all KEY_DELIM_VALUE_DELIM* items. Or am i missing smth?
one more nuance
if last resp was ID1 Weight:20 cursor:Height_10_ID1, then node should skip ID2 and respond with ID3. If node stores all objects, it can restore Weight attribute of ID1 from the DB to compare other items against it. But if node does not store ID1, it'll respond with ID2 although its Weight is less. For this purpose it would help to have a cursor with all requested attributes' values, not just the primary one
@roman-khimov commented on GitHub (Feb 3, 2025):
Correct. We have two options here:
Our primary use cases for now:
a < x < b search, isn't affected
FilePath=smth and we need a timestamp
FilePath=smth AND Type=smth AND maybe more AND please give me a lot of additional attributes
Secondary attribute order does have some advantages for the REST/S3 cases. But to be fair both would benefit a bit more from the reverse order, since when we're talking about time stamps we usually need the latest and it's going to be the last. Implementing reverse result order is certainly not something we want now. We still need this to be simple and to be fast. Both REST and S3 cases are not very likely to produce a lot of results at the same time (very likely to fit into 1000 limit). So I'd opt for relaxing ordering requirements to be "primary attribute only". Easier to implement, will work well enough for current users. If we're to find other use cases we can think of (even more advanced) ordering again.
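To illustrate why "primary attribute only" ordering keeps continuation simple, here is a hedged Go sketch. It assumes results are ordered by primary attribute value and then by OID, and that the cursor carries exactly those two fields. The `entry` type and the `page` function are hypothetical, and an in-memory sorted slice stands in for the real bucket.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical index record: primary attribute value plus OID.
// It doubles as the cursor, which holds the last returned entry.
type entry struct {
	Primary int64
	OID     string
}

// greater reports whether a sorts strictly after b (by Primary, then OID).
func greater(a, b entry) bool {
	if a.Primary != b.Primary {
		return a.Primary > b.Primary
	}
	return a.OID > b.OID
}

// page returns up to count entries that sort strictly after cur
// (or from the start when cur is nil) and a cursor for the next page,
// which is nil when nothing is left.
func page(sorted []entry, cur *entry, count int) ([]entry, *entry) {
	start := 0
	if cur != nil {
		// First position strictly after the cursor; works even if the
		// cursor's own object is not present in this index.
		start = sort.Search(len(sorted), func(i int) bool { return greater(sorted[i], *cur) })
	}
	end := start + count
	if end > len(sorted) {
		end = len(sorted)
	}
	res := sorted[start:end]
	if len(res) == 0 || end == len(sorted) {
		return res, nil
	}
	last := res[len(res)-1]
	return res, &last
}

func main() {
	idx := []entry{{10, "ID1"}, {10, "ID2"}, {10, "ID3"}}
	var cur *entry
	for {
		var res []entry
		res, cur = page(idx, cur, 1)
		fmt.Println(res)
		if cur == nil {
			break
		}
	}
}
```

Because the cursor alone determines the position, a node that does not store the object named in the cursor can still continue from the right place, which is the difficulty raised for secondary-attribute ordering earlier in the thread.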
@cthulhu-rider commented on GitHub (Feb 3, 2025):
full agree, lets start with this