Message ID | 4197C471DCF8714FBA1FE32565271C148FFFF4D3@ORSMSX103.amr.corp.intel.com |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 19 Aug 2015, Patil, Kiran wrote:
> Acked-by: Kiran Patil <kiran.patil@intel.com>
Where's the call to preempt_disable() to prevent kernels with preemption
from making numa_node_id() invalid during this iteration?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: > On Wed, 19 Aug 2015, Patil, Kiran wrote: > > > Acked-by: Kiran Patil <kiran.patil@intel.com> > > Where's the call to preempt_disable() to prevent kernels with preemption > from making numa_node_id() invalid during this iteration? David asked this question twice, received no answer and now the patch is in the maintainer tree, destined for mainline. If I was asked this question I would respond The use of numa_mem_id() is racy and best-effort. If the unlikely race occurs, the memory allocation will occur on the wrong node, the overall result being very slightly suboptimal performance. The existing use of numa_node_id() suffers from the same issue. But I'm not the person proposing the patch. Please don't just ignore reviewer comments! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015/10/9 4:20, Andrew Morton wrote: > On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: > >> On Wed, 19 Aug 2015, Patil, Kiran wrote: >> >>> Acked-by: Kiran Patil <kiran.patil@intel.com> >> >> Where's the call to preempt_disable() to prevent kernels with preemption >> from making numa_node_id() invalid during this iteration? > > David asked this question twice, received no answer and now the patch > is in the maintainer tree, destined for mainline. > > If I was asked this question I would respond > > The use of numa_mem_id() is racy and best-effort. If the unlikely > race occurs, the memory allocation will occur on the wrong node, the > overall result being very slightly suboptimal performance. The > existing use of numa_node_id() suffers from the same issue. > > But I'm not the person proposing the patch. Please don't just ignore > reviewer comments! Hi Andrew, Apologize for the slow response due to personal reasons! And thanks for answering the question from David. To be honest, I didn't know how to answer this question before. Actually this question has puzzled me for a long time when dealing with memory hot-removal. For normal cases, it only causes sub-optimal memory allocation if schedule event happens between querying NUMA node id and calling alloc_pages_node(). But what happens if system run into following execution sequence? 1) node = numa_mem_id(); 2) memory hot-removal event triggers 2.1) remove affected memory 2.2) reset pgdat to zero if node becomes empty after memory removal 3) alloc_pages_node(), which may access zero-ed pgdat structure. I haven't found a mechanism to protect system from above sequence yet, so puzzled for a long time already:(. Does stop_machine() protect system from such a execution sequence? Thanks! Gerry > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015/10/09 14:52, Jiang Liu wrote: > On 2015/10/9 4:20, Andrew Morton wrote: >> On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: >> >>> On Wed, 19 Aug 2015, Patil, Kiran wrote: >>> >>>> Acked-by: Kiran Patil <kiran.patil@intel.com> >>> >>> Where's the call to preempt_disable() to prevent kernels with preemption >>> from making numa_node_id() invalid during this iteration? >> >> David asked this question twice, received no answer and now the patch >> is in the maintainer tree, destined for mainline. >> >> If I was asked this question I would respond >> >> The use of numa_mem_id() is racy and best-effort. If the unlikely >> race occurs, the memory allocation will occur on the wrong node, the >> overall result being very slightly suboptimal performance. The >> existing use of numa_node_id() suffers from the same issue. >> >> But I'm not the person proposing the patch. Please don't just ignore >> reviewer comments! > Hi Andrew, > Apologize for the slow response due to personal reasons! > And thanks for answering the question from David. To be honest, > I didn't know how to answer this question before. Actually this > question has puzzled me for a long time when dealing with memory > hot-removal. For normal cases, it only causes sub-optimal memory > allocation if schedule event happens between querying NUMA node id > and calling alloc_pages_node(). But what happens if system run into > following execution sequence? > 1) node = numa_mem_id(); > 2) memory hot-removal event triggers > 2.1) remove affected memory > 2.2) reset pgdat to zero if node becomes empty after memory removal I'm sorry if I misunderstand something. After commit b0dc3a342af36f95a68fe229b8f0f73552c5ca08, there is no memset(). > 3) alloc_pages_node(), which may access zero-ed pgdat structure. ? > > I haven't found a mechanism to protect system from above sequence yet, > so puzzled for a long time already:(. Does stop_machine() protect > system from such a execution sequence? To access pgdat, a pgdat's zone should be on per-pgdat-zonelist. Now, __build_all_zonelists() is called under stop_machine(). That's the reason why you're asking what stop_machine() does. And, as you know, stop_machine() is not protecting anything. The caller may fallback into removed zone. Then, let's think. At first, please note "pgdat" is not removed (and cannot be removed), accessing pgdat's memory will not cause segmentation fault. Just contents are problem. At removal, zone's page related information and pgdat's page related information is cleared. alloc_pages uses zonelist/zoneref/cache to walk each zones without accessing pgdat itself. I think accessing zonelist is safe because it's an array updated by stop_machine(). So, the problem is alloc_pages() can work correctly even if zone contains no page. I think it should work. (Note: zones are included in pgdat. So, zeroing pgdat means zeroing zone and other structures. it will not work.) So, what problem you see now ? I'm sorry I can't chase old discusions. Thanks, -Kame -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015/10/9 17:08, Kamezawa Hiroyuki wrote: > On 2015/10/09 14:52, Jiang Liu wrote: >> On 2015/10/9 4:20, Andrew Morton wrote: >>> On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes >>> <rientjes@google.com> wrote: >>> >>>> On Wed, 19 Aug 2015, Patil, Kiran wrote: >>>> >>>>> Acked-by: Kiran Patil <kiran.patil@intel.com> >>>> >>>> Where's the call to preempt_disable() to prevent kernels with >>>> preemption >>>> from making numa_node_id() invalid during this iteration? >>> >>> David asked this question twice, received no answer and now the patch >>> is in the maintainer tree, destined for mainline. >>> >>> If I was asked this question I would respond >>> >>> The use of numa_mem_id() is racy and best-effort. If the unlikely >>> race occurs, the memory allocation will occur on the wrong node, the >>> overall result being very slightly suboptimal performance. The >>> existing use of numa_node_id() suffers from the same issue. >>> >>> But I'm not the person proposing the patch. Please don't just ignore >>> reviewer comments! >> Hi Andrew, >> Apologize for the slow response due to personal reasons! >> And thanks for answering the question from David. To be honest, >> I didn't know how to answer this question before. Actually this >> question has puzzled me for a long time when dealing with memory >> hot-removal. For normal cases, it only causes sub-optimal memory >> allocation if schedule event happens between querying NUMA node id >> and calling alloc_pages_node(). But what happens if system run into >> following execution sequence? >> 1) node = numa_mem_id(); >> 2) memory hot-removal event triggers >> 2.1) remove affected memory >> 2.2) reset pgdat to zero if node becomes empty after memory removal > > I'm sorry if I misunderstand something. > After commit b0dc3a342af36f95a68fe229b8f0f73552c5ca08, there is no > memset(). Hi Kamezawa, Thanks for the information. The commit solved the issue what I was puzzling about. With this change applied, thing should work as expected. Seems it would be better to enhance __build_all_zonelists() to handle those offlined empty nodes too, but that really doesn't make to much difference:) Thanks for the info again! Thanks! Gerry -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 9a4f2bc70cd2..a8f618cb8eb0 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1516,7 +1516,7 @@ static int i40e_clean_rx_irq_ps(struct i40e_ring *rx_ring, int budget) unsigned int total_rx_bytes = 0, total_rx_packets = 0; u16 rx_packet_len, rx_header_len, rx_sph, rx_hbo; u16 cleaned_count = I40E_DESC_UNUSED(rx_ring); - const int current_node = numa_node_id(); + const int current_node = numa_mem_id(); struct i40e_vsi *vsi = rx_ring->vsi; u16 i = rx_ring->next_to_clean; union i40e_rx_desc *rx_desc;