
[v6,net-next,07/15] net-shapers: implement shaper cleanup on queue deletion

Message ID: 160421ccd6deedfd4d531f0239e80077f19db1d0.1725457317.git.pabeni@redhat.com
State: Handled Elsewhere
Series: net: introduce TX H/W shaping API

Commit Message

Paolo Abeni Sept. 4, 2024, 1:53 p.m. UTC
Hook into netif_set_real_num_tx_queues() to clean up any shaper
configured on top of the to-be-destroyed TX queues.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/core/dev.c      |  2 ++
 net/core/dev.h      |  4 ++++
 net/shaper/shaper.c | 31 +++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

Comments

Jakub Kicinski Sept. 5, 2024, 1:33 a.m. UTC | #1
On Wed,  4 Sep 2024 15:53:39 +0200 Paolo Abeni wrote:
> +		net_shaper_set_real_num_tx_queues(dev, txq);
> +
>  		dev_qdisc_change_real_num_tx(dev, txq);
>  
>  		dev->real_num_tx_queues = txq;

The dev->lock has to be taken here, around those three lines,
and then set / group must check QUEUE ids against 
dev->real_num_tx_queues, no? Otherwise the work 
net_shaper_set_real_num_tx_queues() does is prone to races?
Paolo Abeni Sept. 5, 2024, 6:02 p.m. UTC | #2
On 9/5/24 03:33, Jakub Kicinski wrote:
> On Wed,  4 Sep 2024 15:53:39 +0200 Paolo Abeni wrote:
>> +		net_shaper_set_real_num_tx_queues(dev, txq);
>> +
>>   		dev_qdisc_change_real_num_tx(dev, txq);
>>   
>>   		dev->real_num_tx_queues = txq;
> 
> The dev->lock has to be taken here, around those three lines,
> and then set / group must check QUEUE ids against
> dev->real_num_tx_queues, no? Otherwise the work
> net_shaper_set_real_num_tx_queues() does is prone to races?

Yes, I think such a race exists, but I'm unsure that taking the lock 
around the above code will be enough.

E.g. if the relevant device has 16 channel queues, the set() can race with 
a channel reconf on a different CPU:

CPU 1						CPU 2

set_channels(8)

driver_set_channel()
// actually change the number of queues to
// 8, dev->real_num_tx_queues is still 16
// dev->lock is not held yet because the
// driver still has to call
// netif_set_real_num_tx_queues()
						set(QUEUE_15,...)
						// will pass validation
						// but queue 15 does not
						// exist anymore

Acquiring dev->lock around set_channels() will not be enough: some drivers 
change the channel count on other paths, e.g. when enabling XDP.

I think/fear we need to replace the dev->lock with the rtnl lock to 
solve the race for good.

/P
Jakub Kicinski Sept. 6, 2024, 1:25 a.m. UTC | #3
On Thu, 5 Sep 2024 20:02:38 +0200 Paolo Abeni wrote:
> > The dev->lock has to be taken here, around those three lines,
> > and then set / group must check QUEUE ids against
> > dev->real_num_tx_queues, no? Otherwise the work
> > net_shaper_set_real_num_tx_queues() does is prone to races?  
> 
> Yes, I think such a race exists, but I'm unsure that taking the lock 
> around the above code will be enough.

I think "enough" will be subjective. Right now patch 7 provides no real
guarantee.

> E.g. if the relevant device has 16 channel queues, the set() can race with 
> a channel reconf on a different CPU:
> 
> CPU 1						CPU 2
> 
> set_channels(8)
> 
> driver_set_channel()
> // actually change the number of queues to
> // 8, dev->real_num_tx_queues is still 16
> // dev->lock is not held yet because the
> // driver still has to call
> // netif_set_real_num_tx_queues()
> 						set(QUEUE_15,...)
> 						// will pass validation
> 						// but queue 15 does not
> 						// exist anymore

That may be true - in my proposal the driver can only expect that, once
netif_set_real_num_tx_queues() returns, the core will not issue rate limit
ops on disabled queues. The driver has to make sure rate limit ops for old
queues are accepted all the way up to the call to set_real, and ops for
new queues are accepted immediately after.

Importantly, the core's state is always consistent - given that both the
flushing inside net_shaper_set_real_num_tx_queues() and the proposed check
would be under netdev->lock.

For the driver -- let me flip the question around -- what do you expect
the locking scheme to be in case of channel count change? Alternatively
we could just expect the driver to take netdev->lock around the
appropriate section of code and we'd do:

void net_shaper_set_real_num_tx_queues(struct net_device *dev, ...)
{
	...
	if (!READ_ONCE(dev->net_shaper_hierarchy))
		return;

	lockdep_assert_held(dev->lock);
	...
}

I had a look at iavf, and there is no relevant locking around the queue
count check at all, so that doesn't help.

> Acquiring dev->lock around set_channels() will not be enough: some drivers 
> change the channel count on other paths, e.g. when enabling XDP.

Indeed, trying to lock before calling the driver would be both a huge
job and destined to fail.

> I think/fear we need to replace the dev->lock with the rtnl lock to 
> solve the race for good.

Maybe :( I think we need *an* answer for:
 - how we expect the driver to protect itself (assuming that the racy
   check in iavf_verify_handle() actually serves some purpose, which
   may not be true);
 - how we ensure consistency of core state (no shapers for queues which
   don't exist, assuming we agree having shapers for queues which
   don't exist is counter productive).

Reverting back to rtnl_lock for all would be sad; the scheme of
expecting the driver to take netdev->lock could work?
It's the model we effectively settled on in devlink.
Core->driver callbacks are always locked by the core,
for driver->core calls driver should explicitly take the lock
(some wrappers for lock+op+unlock are provided).
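
For illustration, such a wrapper could look roughly like this (a minimal
sketch, assuming netdev->lock is a mutex member of struct net_device; the
wrapper name is made up, not an existing API):

static inline void netdev_lock_op(struct net_device *dev,
				  void (*op)(struct net_device *dev))
{
	/* driver->core path: the driver takes the lock explicitly via
	 * this hypothetical wrapper; core->driver callbacks would
	 * instead run with the lock already held by the core.
	 */
	mutex_lock(&dev->lock);
	op(dev);
	mutex_unlock(&dev->lock);
}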
Paolo Abeni Sept. 6, 2024, 2:25 p.m. UTC | #4
On 9/6/24 03:25, Jakub Kicinski wrote:
> For the driver -- let me flip the question around -- what do you expect
> the locking scheme to be in case of channel count change? Alternatively
> we could just expect the driver to take netdev->lock around the
> appropriate section of code and we'd do:
> 
> void net_shaper_set_real_num_tx_queues(struct net_device *dev, ...)
> {
> 	...
> 	if (!READ_ONCE(dev->net_shaper_hierarchy))
> 		return;
> 
> 	lockdep_assert_held(dev->lock);
> 	...
> }

In the iavf case that will be problematic: AFAICS the channel reconf is 
done by 2 consecutive async tasks. The first one, iavf_reset_task, 
changes the actual number of channels, freeing/allocating the driver 
resources, and the 2nd one, iavf_finish_config, notifies the stack by 
issuing netif_set_real_num_tx_queues(). iavf_reset_task can't easily 
wait for iavf_finish_config due to locking order.

> I had a look at iavf, and there is no relevant locking around the queue
> count check at all, so that doesn't help.

Yep, that is racy.

>> Acquiring dev->lock around set_channels() will not be enough: some drivers
>> change the channel count on other paths, e.g. when enabling XDP.
> 
> Indeed, trying to lock before calling the driver would be both a huge
> job and destined to fail.
> 
>> I think/fear we need to replace the dev->lock with the rtnl lock to
>> solve the race for good.
> 
> Maybe :( I think we need *an* answer for:
>   - how we expect the driver to protect itself (assuming that the racy
>     check in iavf_verify_handle() actually serves some purpose, which
>     may not be true);
>   - how we ensure consistency of core state (no shapers for queues which
>     don't exist, assuming we agree having shapers for queues which
>     don't exist is counter productive).

I agree we must delete shapers on removed/deleted queues. The 
driver/firmware could reuse the same H/W resources for a different VF, 
and such a queue must start in the new VF with a default (no shaping) 
config.

> Reverting back to rtnl_lock for all would be sad, the scheme of
> expecting the driver to take netdev->lock could work?
> It's the model we effectively settled on in devlink.
> Core->driver callbacks are always locked by the core,
> for driver->core calls driver should explicitly take the lock
> (some wrappers for lock+op+unlock are provided).

I think/guess/hope the following could work:

- the driver wraps the h/w resources reconfiguration and 
netif_set_real_num_tx_queues() with dev->lock. In the iavf case, that 
means 2 separate critical sections: in iavf_reset_task() and in 
iavf_finish_config().

- the core, under dev->lock, checks the queue id vs real_num_tx_queues 
and calls the shaper ops

- the iavf shaper callbacks would still need to check the queue id vs 
the currently allocated hw resource number, as the shaper ops could run 
in between the 2 mentioned critical sections. The iavf driver could 
still act consistently with the core:

   - if real_num_tx_queues < qid < current_allocated_hw_resources
     set the shaper,
   - if current_allocated_hw_resources < qid < real_num_tx_queues do
     nothing and return success

In both the above scenarios, real_num_tx_queues will be set to 
current_allocated_hw_resources soon by the upcoming 
iavf_finish_config(); the core will update the hierarchy accordingly and 
the state will be consistent.

I think the code will be clearer than this description; let me try to 
share it ASAP (or please block me soon ;)
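
For illustration, the two driver-side cases above could look roughly like
this (a rough sketch only; struct and function names are made up for the
example, not actual iavf code):

struct drv_adapter {
	struct net_device *netdev;
	unsigned int num_active_hw_queues; /* currently allocated hw queues */
};

/* hypothetical helper programming the hw rate limiter for @qid */
static int drv_hw_program_queue_shaper(struct drv_adapter *adapter, u32 qid);

static int drv_shaper_set_queue(struct drv_adapter *adapter, u32 qid)
{
	/* the queue is backed by allocated hw resources: program it,
	 * even if real_num_tx_queues has not caught up yet
	 */
	if (qid < adapter->num_active_hw_queues)
		return drv_hw_program_queue_shaper(adapter, qid);

	/* the hw resources are already gone (shrinking reconf): do
	 * nothing and report success; the upcoming
	 * netif_set_real_num_tx_queues() will let the core delete this
	 * shaper and keep the hierarchy consistent
	 */
	return 0;
}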

Thanks,

Paolo
Jakub Kicinski Sept. 6, 2024, 2:42 p.m. UTC | #5
On Fri, 6 Sep 2024 16:25:25 +0200 Paolo Abeni wrote:
> I think the code will be clearer than this description; let me try to 
> share it ASAP (or please block me soon ;)

No block, AFAIU - sounds great.
Paolo Abeni Sept. 6, 2024, 2:49 p.m. UTC | #6
On 9/5/24 20:02, Paolo Abeni wrote:
> On 9/5/24 03:33, Jakub Kicinski wrote:
>> On Wed,  4 Sep 2024 15:53:39 +0200 Paolo Abeni wrote:
>>> +		net_shaper_set_real_num_tx_queues(dev, txq);
>>> +
>>>    		dev_qdisc_change_real_num_tx(dev, txq);
>>>    
>>>    		dev->real_num_tx_queues = txq;
>>
>> The dev->lock has to be taken here, around those three lines,
>> and then set / group must check QUEUE ids against
>> dev->real_num_tx_queues, no? Otherwise the work
>> net_shaper_set_real_num_tx_queues() does is prone to races?
> 
> Yes, I think such a race exists, but I'm unsure that taking the lock
> around the above code will be enough.
> 
> E.g. if the relevant device has 16 channel queues, the set() can race with
> a channel reconf on a different CPU:
> 
> CPU 1						CPU 2
> 
> set_channels(8)
> 
> driver_set_channel()
> // actually change the number of queues to
> // 8, dev->real_num_tx_queues is still 16
> // dev->lock is not held yet because the
> // driver still has to call
> // netif_set_real_num_tx_queues()
> 						set(QUEUE_15,...)
> 						// will pass validation
> 						// but queue 15 does not
> 						// exist anymore
> 
> Acquiring dev->lock around set_channels() will not be enough: some drivers
> change the channel count on other paths, e.g. when enabling XDP.
> 
> I think/fear we need to replace the dev->lock with the rtnl lock to
> solve the race for good.

I forgot to mention there is another, easier, alternative: keep the max 
queue id check in the drivers. The driver would have to acquire and hold 
the relevant driver-specific lock in the shaper callbacks: 'crit_lock', 
in the iavf case.

Would you be OK with such a 2nd option?

Side note: I think iavf would have to acquire such a lock in the 
callbacks no matter what, or accesses/writes to the ring info could be racy.
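
For reference, the checks under option 2 would look roughly like this in a
shaper callback (a hedged sketch reusing the made-up drv_adapter from the
earlier mail, extended with a crit_lock mutex; not actual iavf code):

static int drv_shaper_set(struct drv_adapter *adapter, u32 qid)
{
	int err;

	/* driver-specific lock, serializing vs channel reconf */
	mutex_lock(&adapter->crit_lock);
	if (qid >= adapter->num_active_hw_queues)
		err = -EINVAL; /* the queue does not exist anymore */
	else
		err = drv_hw_program_queue_shaper(adapter, qid);
	mutex_unlock(&adapter->crit_lock);

	return err;
}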

Thanks,

Paolo
Jakub Kicinski Sept. 6, 2024, 2:56 p.m. UTC | #7
On Fri, 6 Sep 2024 16:49:32 +0200 Paolo Abeni wrote:
> I forgot to mention there is another, easier, alternative: keep the max 
> queue id check in the drivers. The driver would have to acquire and hold 
> the relevant driver-specific lock in the shaper callbacks: 'crit_lock', 
> in the iavf case.
> 
> Would you be OK with such a 2nd option?

I'd strongly prefer if you implemented what was suggested.

> Side note: I think iavf would have to acquire such a lock in the 
> callbacks no matter what, or accesses/writes to the ring info could be racy.

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 8615f16e8456..33629a9d0661 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2948,6 +2948,8 @@  int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
 		if (dev->num_tc)
 			netif_setup_tc(dev, txq);
 
+		net_shaper_set_real_num_tx_queues(dev, txq);
+
 		dev_qdisc_change_real_num_tx(dev, txq);
 
 		dev->real_num_tx_queues = txq;
diff --git a/net/core/dev.h b/net/core/dev.h
index 13c558874af3..d3ea92949ff3 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -37,8 +37,12 @@  void dev_addr_check(struct net_device *dev);
 
 #if IS_ENABLED(CONFIG_NET_SHAPER)
 void net_shaper_flush_netdev(struct net_device *dev);
+void net_shaper_set_real_num_tx_queues(struct net_device *dev,
+				       unsigned int txq);
 #else
 static inline void net_shaper_flush_netdev(struct net_device *dev) {}
+static inline void net_shaper_set_real_num_tx_queues(struct net_device *dev,
+						     unsigned int txq) {}
 #endif
 
 /* sysctls not referred to from outside net/core/ */
diff --git a/net/shaper/shaper.c b/net/shaper/shaper.c
index 1255d532b36a..f6f5712d7b3c 100644
--- a/net/shaper/shaper.c
+++ b/net/shaper/shaper.c
@@ -1201,6 +1201,37 @@  void net_shaper_flush_netdev(struct net_device *dev)
 	net_shaper_flush(&binding);
 }
 
+void net_shaper_set_real_num_tx_queues(struct net_device *dev,
+				       unsigned int txq)
+{
+	struct net_shaper_hierarchy *hierarchy;
+	struct net_shaper_binding binding;
+	int i;
+
+	binding.type = NET_SHAPER_BINDING_TYPE_NETDEV;
+	binding.netdev = dev;
+	hierarchy = net_shaper_hierarchy(&binding);
+	if (!hierarchy)
+		return;
+
+	net_shaper_lock(&binding);
+
+	/* Take action only when decreasing the tx queue number. */
+	for (i = txq; i < dev->real_num_tx_queues; ++i) {
+		struct net_shaper_handle handle;
+		struct net_shaper *shaper;
+
+		handle.scope = NET_SHAPER_SCOPE_QUEUE;
+		handle.id = i;
+		shaper = net_shaper_lookup(&binding, &handle);
+		if (!shaper)
+			continue;
+
+		__net_shaper_delete(&binding, shaper, NULL);
+	}
+	net_shaper_unlock(&binding);
+}
+
 static int __init shaper_init(void)
 {
 	return genl_register_family(&net_shaper_nl_family);
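
With this patch, a driver shrinking its TX queue count gets the shaper
cleanup implicitly; a minimal usage sketch (the helper name is hypothetical):

/* After the hunk above, shrinking the TX queue count also deletes any
 * QUEUE-scope shapers on queues >= new_txq.
 */
static int drv_shrink_tx_queues(struct net_device *dev, unsigned int new_txq)
{
	return netif_set_real_num_tx_queues(dev, new_txq);
}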