Consider options to improve listing #250

Closed
opened 2025-12-28 17:36:44 +00:00 by sami · 3 comments

Originally created by @alexvanin on GitHub (Jun 10, 2022).

Originally assigned to: @KirillovDenis on GitHub.

The listing operation can be quite slow on buckets with thousands of objects. It should improve after #524, but there are options for bigger speed improvements.

Parallelize object.Head requests

One listing request starts a chain of object.Head requests into NeoFS. Run these requests in a pool of asynchronous workers.

🟢 Best effort/effect ratio
🔴 It sharply increases the number of concurrent object.Head requests produced by the gate. This will lead to more complex worker management and load distribution (maybe even across NeoFS endpoints).

Stream XML response

If client-side software supports a streamed XML response, we can do that. After every object.Head request, the S3 Gateway does a bit of post-processing (filling prefixes for directories, updating counters) and then sends *data.ObjectInfo into a channel / io.Writer.

On the other side, the request handler reads from the channel / io.Reader and writes XML elements into the HTTP response. Implement a custom xml.Marshaler interface and use the EncodeElement function to write object info manually.

🟢 True streaming
🔴 Not sure if relevant at all. Is there any software that supports XML streaming of S3 responses? The aws-sdk-go library does not support that. I doubt other libraries do.

Remove object.Head requests

Store all required data about the object in the tree node, including size and owner.

🟢 No head requests
🔴 A slightly more complex object upload routine and a larger tree service payload
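To illustrate this option, here is a sketch that assumes a tree node can carry string key/value metadata. `TreeNode`, the `Size` and `Owner` keys, and both helper functions are hypothetical and do not reflect the actual tree service schema. The upload path writes the metadata once, and listing rebuilds the object info from the node alone.

```go
package main

import (
	"fmt"
	"strconv"
)

// TreeNode is a hypothetical tree service node with key/value metadata.
type TreeNode struct {
	FilePath string
	Meta     map[string]string
}

// ObjectInfo is a hypothetical, reduced stand-in for *data.ObjectInfo.
type ObjectInfo struct {
	Name  string
	Size  int64
	Owner string
}

// Hypothetical metadata keys, chosen only for this example.
const (
	metaSize  = "Size"
	metaOwner = "Owner"
)

// newNode is what the upload routine would additionally do:
// record size and owner next to the path.
func newNode(path string, size int64, owner string) TreeNode {
	return TreeNode{
		FilePath: path,
		Meta: map[string]string{
			metaSize:  strconv.FormatInt(size, 10),
			metaOwner: owner,
		},
	}
}

// infoFromNode builds listing data from the node alone, with no head request.
func infoFromNode(n TreeNode) (ObjectInfo, error) {
	rawSize, ok := n.Meta[metaSize]
	if !ok {
		return ObjectInfo{}, fmt.Errorf("node %q: missing %s", n.FilePath, metaSize)
	}
	size, err := strconv.ParseInt(rawSize, 10, 64)
	if err != nil {
		return ObjectInfo{}, fmt.Errorf("node %q: parse %s: %w", n.FilePath, metaSize, err)
	}
	owner, ok := n.Meta[metaOwner]
	if !ok {
		return ObjectInfo{}, fmt.Errorf("node %q: missing %s", n.FilePath, metaOwner)
	}
	return ObjectInfo{Name: n.FilePath, Size: size, Owner: owner}, nil
}

func main() {
	node := newNode("dir/b.txt", 5, "owner-id")
	info, err := infoFromNode(node)
	fmt.Println(info, err)
}
```

Nodes written before such a change would lack these keys, so a real implementation would likely need a fallback to object.Head for them; the sketch only reports the missing key as an error.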


@alexvanin commented on GitHub (Jun 10, 2022):

/cc @KirillovDenis @masterSplinter01 @realloc


@alexvanin commented on GitHub (Jun 10, 2022):

If time to first byte matters, we can send the first XML token immediately and then keep processing as usual.


@KirillovDenis commented on GitHub (Jun 22, 2022):

Done in #538

Reference
nspcc-dev/neofs-s3-gw#250