Default CBF does not work with unspecified / distinct clauses #38

Closed
opened 2025-12-28 18:12:14 +00:00 by sami · 4 comments

Originally created by @alexvanin on GitHub (Jan 26, 2021).

Originally assigned to: @fyrchik on GitHub.

Consider the following rules:

  • REP 2
  • REP 2 CBF 3
  • REP 2 IN X\nSELECT 2 FROM * AS X

They all have an unspecified selection clause, but this is handled correctly (#174).
They also all have an explicit or implicit CBF value of 3 (#156).

In a network map of 4 nodes, I expect GetContainerNodes() to return 4 nodes (#157), because 4 is smaller than the max container size (2 * 3 = 6) but still larger than the min container size (2). Unfortunately, all these placement rules produce 2 nodes, as if CBF were 1.
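The size bounds in that expectation can be written down as a small helper. This is a hypothetical sketch that only encodes the reporter's stated expectation (REP as the minimum container size, REP × CBF as the maximum, and CBF falling back to 3 when left at 0, as in #156); the name `expectedContainerSize` is invented for illustration and is not part of the neofs-api-go API.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultCBF is the implicit container backup factor assumed when a policy
// leaves CBF at 0 (see #156).
const defaultCBF = 3

// expectedContainerSize is a hypothetical helper, not a neofs-api-go API.
// It returns how many of `available` nodes a policy with `rep` replicas and
// backup factor `cbf` is expected to occupy: never fewer than rep (minimum
// container size) and never more than rep*cbf (maximum container size).
func expectedContainerSize(rep, cbf, available int) (int, error) {
	if cbf == 0 {
		cbf = defaultCBF
	}
	minSize, maxSize := rep, rep*cbf
	if available < minSize {
		return 0, errors.New("not enough nodes for the minimum container size")
	}
	if available > maxSize {
		return maxSize, nil
	}
	return available, nil
}

func main() {
	// REP 2 with implicit CBF 3 on a 4-node netmap: 2 <= 4 <= 6.
	fmt.Println(expectedContainerSize(2, 0, 4)) // 4 <nil>
	// The observed result matches CBF 1: max size is 2*1 = 2.
	fmt.Println(expectedContainerSize(2, 1, 4)) // 2 <nil>
}
```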

However, if the rule uses ClauseSame, the function produces the expected number of nodes.

```go
package netmap

import (
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// Every policy below has an implicit or explicit CBF of 3, so on a 4-node
// netmap GetContainerNodes is expected to return all 4 nodes.
// nodeInfoFromAttributes, newPlacementPolicy, newReplica and newSelector are
// assumed to be the package's own test helpers.
func TestPlacementPolicy_CBFWithEmptySelector(t *testing.T) {
	nodes := []NodeInfo{
		nodeInfoFromAttributes("ID", "1", "Attr", "Same"),
		nodeInfoFromAttributes("ID", "2", "Attr", "Same"),
		nodeInfoFromAttributes("ID", "3", "Attr", "Same"),
		nodeInfoFromAttributes("ID", "4", "Attr", "Same"),
	}

	// REP 2 (CBF 0, so the default of 3 applies)
	p1 := newPlacementPolicy(0,
		[]*Replica{newReplica(2, "")},
		nil, // selectors
		nil, // filters
	)

	// REP 2 CBF 3
	p2 := newPlacementPolicy(3,
		[]*Replica{newReplica(2, "")},
		nil, // selectors
		nil, // filters
	)

	// REP 2 IN X, SELECT 2 FROM * AS X (ClauseDistinct, empty attribute)
	p3 := newPlacementPolicy(0,
		[]*Replica{newReplica(2, "X")},
		[]*Selector{newSelector("X", "", ClauseDistinct, 2, "*")},
		nil, // filters
	)

	// Same as p3, but with ClauseSame on Attr: the only case that works.
	p4 := newPlacementPolicy(0,
		[]*Replica{newReplica(2, "X")},
		[]*Selector{newSelector("X", "Attr", ClauseSame, 2, "*")},
		nil, // filters
	)

	nm, err := NewNetmap(NodesFromInfo(nodes))
	require.NoError(t, err)

	v, err := nm.GetContainerNodes(p1, nil)
	require.NoError(t, err)
	assert.Len(t, v.Flatten(), 4)

	v, err = nm.GetContainerNodes(p2, nil)
	require.NoError(t, err)
	assert.Len(t, v.Flatten(), 4)

	v, err = nm.GetContainerNodes(p3, nil)
	require.NoError(t, err)
	assert.Len(t, v.Flatten(), 4)

	v, err = nm.GetContainerNodes(p4, nil)
	require.NoError(t, err)
	assert.Len(t, v.Flatten(), 4)
}
```

Is this a bug, or is it expected behavior for some reason?

/cc @realloc @fyrchik

@realloc commented on GitHub (Jan 26, 2021):

After a talk with @fyrchik, we think that it's a bug. =) A possible solution will be posted after further discussion.

@fyrchik commented on GitHub (Jan 27, 2021):

SELECT N IN Attribute groups nodes by Attribute and returns the N best possible buckets ("best" is defined deterministically by HRW). When the default attribute is used, nodes can't be grouped by the value of an attribute. I have come up with 2 solutions:

  1. Group nodes into different buckets as before (1 node per bucket for the first 3 policies in the original post) and take nodes from other buckets if the placement policy isn't fulfilled. This can be done either for every selection (a) or only for the default attribute (b).
    1.a. In this case, for SELECT 3 IN Country we can receive a placement that uses more than 1 country in each replica. However, this happens only in the worst case, when fewer than 3 countries contain enough nodes to reach CBF.
    1.b. Here we still receive exactly 3 countries for the SELECT from (a). For the default attribute (corresponding to "unique node"), each bucket will contain as many nodes as possible.
  2. Use the unspecified/distinct clause as a switch between 1a/1b. The DISTINCT clause can be set as the default by the query parser, so SELECT 3 IN Country will behave as in 1b, while the raw protobuf format can omit the clause, leading to 1a behaviour.
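The grouping step described in this comment, and why the default attribute ends up with one node per bucket, can be sketched as follows. This is a simplified, hypothetical model: `node`, `groupBy`, `selectBuckets` and the FNV-based `weight` are stand-ins chosen for illustration and are not the HRW hashing used by NeoFS; grouping by the `ID` attribute is used here to mimic the default "unique node" attribute.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// node is a simplified stand-in for a netmap node: an ID plus attributes.
type node struct {
	ID    string
	Attrs map[string]string
}

// groupBy places nodes into buckets keyed by the value of attr.
func groupBy(nodes []node, attr string) map[string][]node {
	buckets := make(map[string][]node)
	for _, n := range nodes {
		key := n.Attrs[attr]
		buckets[key] = append(buckets[key], n)
	}
	return buckets
}

// weight is a stand-in HRW score: FNV-1a over the pivot and the bucket key.
// (Assumption: the real implementation uses a different hash.)
func weight(pivot, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(pivot))
	h.Write([]byte(key))
	return h.Sum64()
}

// selectBuckets deterministically picks the n highest-weighted bucket keys.
func selectBuckets(buckets map[string][]node, n int, pivot string) []string {
	keys := make([]string, 0, len(buckets))
	for k := range buckets {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		wi, wj := weight(pivot, keys[i]), weight(pivot, keys[j])
		if wi != wj {
			return wi > wj
		}
		return keys[i] < keys[j] // tie-break keeps the order deterministic
	})
	if n > len(keys) {
		n = len(keys)
	}
	return keys[:n]
}

func main() {
	var nodes []node
	for _, id := range []string{"1", "2", "3", "4"} {
		nodes = append(nodes, node{ID: id, Attrs: map[string]string{"ID": id, "Attr": "Same"}})
	}
	// Default attribute: every node forms its own bucket, so SELECT 2
	// yields 2 single-node buckets, i.e. 2 nodes, whatever CBF is.
	buckets := groupBy(nodes, "ID")
	total := 0
	for _, k := range selectBuckets(buckets, 2, "container-id") {
		total += len(buckets[k])
	}
	fmt.Println(total) // 2
}
```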

All things considered, I think there is a tradeoff between simplicity from the user's POV and simplicity (i.e. "no special cases") of the selection algorithm. The former is more desirable, in my opinion.

Thoughts? @alexvanin @realloc
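One hypothetical reading of option 1b for the default attribute is to stop treating each node as its own bucket and instead deal the HRW-ordered candidates into the N selected buckets, capping each bucket at CBF. The function name and the round-robin distribution are assumptions made for this sketch; the thread does not say how the eventual fix fills the buckets.

```go
package main

import (
	"errors"
	"fmt"
)

// fillDefaultBuckets is a hypothetical sketch of option 1b for the default
// ("unique node") attribute. Candidates are assumed to be in HRW order
// already; they are dealt round-robin into n buckets of at most cbf nodes.
func fillDefaultBuckets(ordered []string, n, cbf int) ([][]string, error) {
	if n <= 0 || cbf <= 0 {
		return nil, errors.New("n and cbf must be positive")
	}
	if len(ordered) < n {
		return nil, errors.New("not enough nodes for n buckets")
	}
	buckets := make([][]string, n)
	for i, id := range ordered {
		if i >= n*cbf {
			break // every bucket already holds cbf nodes
		}
		buckets[i%n] = append(buckets[i%n], id)
	}
	return buckets, nil
}

func main() {
	// REP 2 CBF 3 on the 4-node netmap from the original post.
	buckets, err := fillDefaultBuckets([]string{"1", "2", "3", "4"}, 2, 3)
	fmt.Println(buckets, err) // [[1 3] [2 4]] <nil>
}
```

Round-robin keeps the buckets balanced; filling one bucket to CBF before moving on would yield the same total of 4 nodes here.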

@alexvanin commented on GitHub (Feb 3, 2021):

The 1b option seems to work great for our cases: it is the simplest option and provides the desired behavior. Option 1a gives more valid placements; however, it decreases overall spatial locality in the system.

I tried 1b with the following scenario and got the expected result:

```
Netmap
---
Country:X [ Node1, Node2, Node3, Node4 ]
Country:Y [ Node5 ]

Placement rule
---
REP 2
CBF 3
SELECT 2 IN DISTINCT Country FROM *

Result
---
Node1, Node2, Node3, Node5
```
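This scenario can be replayed with a small hypothetical model of 1b for a named attribute: the two selected Country buckets are kept as-is, and each contributes at most CBF nodes. The order of nodes inside a bucket is assumed to be their HRW order, which here is simply the listed order; `capPerBucket` and `node` are illustrative names, not neofs-api-go identifiers.

```go
package main

import "fmt"

// node is a simplified stand-in for a netmap node with a Country attribute.
type node struct{ ID, Country string }

// capPerBucket groups nodes by Country and, for each selected country,
// takes at most cbf nodes. Hypothetical sketch: in-bucket order is assumed
// to be HRW order and is taken from the input slice.
func capPerBucket(nodes []node, selected []string, cbf int) []string {
	byCountry := make(map[string][]string)
	for _, n := range nodes {
		byCountry[n.Country] = append(byCountry[n.Country], n.ID)
	}
	var out []string
	for _, c := range selected {
		ids := byCountry[c]
		if len(ids) > cbf {
			ids = ids[:cbf]
		}
		out = append(out, ids...)
	}
	return out
}

func main() {
	netmap := []node{
		{"Node1", "X"}, {"Node2", "X"}, {"Node3", "X"}, {"Node4", "X"},
		{"Node5", "Y"},
	}
	// SELECT 2 IN DISTINCT Country: only X and Y exist, so both are selected.
	fmt.Println(capPerBucket(netmap, []string{"X", "Y"}, 3))
	// Output: [Node1 Node2 Node3 Node5]
}
```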

@alexvanin commented on GitHub (Feb 3, 2021):

Fixed in #252

Reference
nspcc-dev/neofs-api-go#38