[V9,06/12] NUMA: Add Linux libnuma detection

Message ID 1377231003-2816-7-git-send-email-gaowanlong@cn.fujitsu.com
State New

Commit Message

Wanlong Gao Aug. 23, 2013, 4:09 a.m. UTC
Add detection of libnuma (mostly contained in the numactl package)
to the configure script. It can be enabled or disabled on the command line;
the default is to use it if available.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 configure | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Andrew Jones Aug. 23, 2013, 8:40 a.m. UTC | #1
----- Original Message -----
> Add detection of libnuma (mostly contained in the numactl package)
> to the configure script. Can be enabled or disabled on the command line,
> default is use if available.
> 
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Is this patch still necessary? I thought that dropping the
numa_num_configured_nodes() calls from patch 8/12 got rid
of the need for this library. Maybe I missed other uses?

drew
Wanlong Gao Aug. 26, 2013, 1:43 a.m. UTC | #2
On 08/23/2013 04:40 PM, Andrew Jones wrote:
> 
> 
> ----- Original Message -----
>> Add detection of libnuma (mostly contained in the numactl package)
>> to the configure script. Can be enabled or disabled on the command line,
>> default is use if available.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
>> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> 
> Is this patch still necessary? I thought that dropping the
> numa_num_configured_nodes() calls from patch 8/12 got rid
> of the need for this library. Maybe I missed other uses?

Yes, in 08/12 we also use mbind(), and in 09/12 we use max_numa_node().

Thanks,
Wanlong Gao

> 
> drew
>
Andrew Jones Aug. 26, 2013, 7:46 a.m. UTC | #3
----- Original Message -----
> On 08/23/2013 04:40 PM, Andrew Jones wrote:
> > 
> > 
> > ----- Original Message -----
> >> Add detection of libnuma (mostly contained in the numactl package)
> >> to the configure script. Can be enabled or disabled on the command line,
> >> default is use if available.
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> >> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> > 
> > Is this patch still necessary? I thought that dropping the
> > numa_num_configured_nodes() calls from patch 8/12 got rid
> > of the need for this library. Maybe I missed other uses?
> 
> Yes, in 08/12 we also use mbind(), 

You don't need a whole library for mbind(), it's a syscall. See syscall(2).
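
For illustration, a minimal sketch of what calling mbind(2) without libnuma
could look like. The MPOL_* values are the ones from the kernel's
uapi/linux/mempolicy.h; raw_mbind() and nodemask_set() are hypothetical names
made up for this example, not something in QEMU or in this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* for syscall() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_mbind */
#include <unistd.h>        /* syscall() */

/* Policy modes, same values as in the kernel's uapi/linux/mempolicy.h. */
#define MPOL_DEFAULT     0
#define MPOL_PREFERRED   1
#define MPOL_BIND        2
#define MPOL_INTERLEAVE  3

#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/* Hypothetical helper: mark one host node in a caller-provided bitmask. */
static void nodemask_set(unsigned long *mask, unsigned int node)
{
    assert(mask != NULL);
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}

/* Hypothetical wrapper taking the same arguments as mbind(2).
 * Returns 0 on success, -1 with errno set on failure. */
static inline long raw_mbind(void *addr, unsigned long len, int mode,
                             const unsigned long *nodemask,
                             unsigned long maxnode, unsigned int flags)
{
    return syscall(SYS_mbind, addr, len, mode, nodemask, maxnode, flags);
}
```

A caller would fill the mask with nodemask_set() and pass it to raw_mbind()
together with one of the MPOL_* modes. The call can only succeed on a Linux
host with NUMA support, and addr has to be page aligned.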

> and in 09/12 we use max_numa_node().

Really? I didn't see it there. And anyway, that goes back to our discussion
about setting qemu's MAX_NODES to whatever we think qemu should support,
and then just checking that we don't blow that limit whenever reading
host node info, i.e.

maxnode = 0;
while (maxnode < MAX_NODES && host_nodes[maxnode])
  node_read(&info[maxnode++]);

type of a thing.
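
Spelled out as a self-contained sketch, with the bound tested before the array
is indexed so that host_nodes[MAX_NODES] is never read. The MAX_NODES value,
struct node_info, the layout of host_nodes[] and the body of node_read() are
placeholders for this example; the thread does not define any of them.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64   /* assumed limit; the thread leaves the value open */

/* Placeholder for whatever per-node data would be kept. */
struct node_info {
    int present;
};

/* Placeholder reader: just records that the node was seen. */
static void node_read(struct node_info *info)
{
    info->present = 1;
}

/* Walk host_nodes[] until the first absent (zero) entry or until
 * MAX_NODES entries have been read, whichever comes first.
 * Both arrays must hold at least MAX_NODES entries.
 * Returns the number of nodes read. */
static int read_host_nodes(const int *host_nodes, struct node_info *info)
{
    int maxnode = 0;

    assert(host_nodes != NULL && info != NULL);

    /* Bound check first, array access second. */
    while (maxnode < MAX_NODES && host_nodes[maxnode])
        node_read(&info[maxnode++]);

    return maxnode;
}
```

With this shape the host can have more nodes than MAX_NODES without the loop
running off the end of either array; the extra nodes are simply ignored.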

And, if there's a place where you really need to know how many host nodes
are currently online, then, like I said earlier, you should just go to sysfs
yourself. libnuma's numa_max_node() returns an int that it only initializes
at library load time, so it's not going to adapt to onlining/offlining.
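
As a sketch of the sysfs route: Linux lists the online nodes as a range list
(for example "0-3" or "0,2-3") in /sys/devices/system/node/online. Reading that
file on every query picks up onlining/offlining. The function names below are
hypothetical and only meant to show the idea.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a kernel range list such as "0-3,5\n" and return the highest
 * number in it, or -1 if the string is empty or malformed. */
static int max_in_range_list(const char *s)
{
    int max = -1;

    assert(s != NULL);

    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi = lo;

        if (end == s || lo < 0)
            return -1;
        if (*end == '-') {              /* "lo-hi" range */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        if (hi > max)
            max = (int)hi;
        if (*end == ',')                /* more entries follow */
            end++;
        else if (*end != '\0' && *end != '\n')
            return -1;
        s = end;
    }
    return max;
}

/* Highest online host node at the time of the call, read fresh from
 * sysfs each time. Returns -1 if the file is missing or unparsable. */
static int host_max_online_node(void)
{
    char buf[256];
    int max = -1;
    FILE *f = fopen("/sys/devices/system/node/online", "r");

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        max = max_in_range_list(buf);
    fclose(f);
    return max;
}
```

The result of host_max_online_node() would still have to be checked against
qemu's own MAX_NODES limit before it is used as an array bound.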

drew

> 
> Thanks,
> Wanlong Gao
> 
> > 
> > drew
> > 
> 
> 
>
Wanlong Gao Aug. 26, 2013, 8:16 a.m. UTC | #4
On 08/26/2013 03:46 PM, Andrew Jones wrote:
>>> Is this patch still necessary? I thought that dropping the
>>> numa_num_configured_nodes() calls from patch 8/12 got rid
>>> of the need for this library. Maybe I missed other uses?
>>
>> Yes, in 08/12 we also use mbind(),
> You don't need a whole library for mbind(), it's a syscall. See syscall(2).
>
>> and in 09/12 we use max_numa_node().
> Really? I didn't see it there. And anyway, that goes back to our discussion
> about setting qemu's MAX_NODES to whatever we think qemu should support,
> and then just checking that we don't blow that limit whenever reading
> host node info, i.e.
> 
> maxnode = 0;
> while (host_nodes[maxnode] && maxnode < MAX_NODES)
>   node_read(&info[maxnode++]);
> 
> type of a thing.
> 
> And, if there's a place you really need to know the current online number
> of host nodes, then, like I said earlier, you should just go to sysfs
> yourself. libnuma:numa_max_node() returns an int that it only initializes
> at library load time, so it's not going to adapt to onlining/offlining.

OK, thank you.
Then I should define the MPOL_* macros in QEMU and use the mbind(2) syscall directly, right?

Thanks,
Wanlong Gao

> 
> drew
>
Andrew Jones Aug. 26, 2013, 8:43 a.m. UTC | #5
----- Original Message -----
> On 08/26/2013 03:46 PM, Andrew Jones wrote:
> >>> Is this patch still necessary? I thought that dropping the
> >>> numa_num_configured_nodes() calls from patch 8/12 got rid
> >>> of the need for this library. Maybe I missed other uses?
> >>
> >> Yes, in 08/12 we also use mbind(),
> > You don't need a whole library for mbind(), it's a syscall. See syscall(2).
> >
> >> and in 09/12 we use max_numa_node().
> > Really? I didn't see it there. And anyway, that goes back to our discussion
> > about setting qemu's MAX_NODES to whatever we think qemu should support,
> > and then just checking that we don't blow that limit whenever reading
> > host node info, i.e.
> > 
> > maxnode = 0;
> > while (host_nodes[maxnode] && maxnode < MAX_NODES)
> >   node_read(&info[maxnode++]);
> > 
> > type of a thing.
> > 
> > And, if there's a place you really need to know the current online number
> > of host nodes, then, like I said earlier, you should just go to sysfs
> > yourself. libnuma:numa_max_node() returns an int that it only initializes
> > at library load time, so it's not going to adapt to onlining/offlining.
> 
> OK, thank you.
> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall directly,
> right?

Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a more
general lib. Whether or not we want to redefine those symbols within
qemu, in order to avoid the dependency on installing numactl-devel, isn't
something I can answer. That's a better question for Anthony. Anthony? Paolo,
any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
linux-header synch script?

thanks,
drew

> 
> Thanks,
> Wanlong Gao
> 
> > 
> > drew
> > 
> 
>
Paolo Bonzini Aug. 28, 2013, 1:44 p.m. UTC | #6
Il 26/08/2013 10:43, Andrew Jones ha scritto:
> 
> ----- Original Message -----
>> On 08/26/2013 03:46 PM, Andrew Jones wrote:
>>>>> Is this patch still necessary? I thought that dropping the
>>>>> numa_num_configured_nodes() calls from patch 8/12 got rid
>>>>> of the need for this library. Maybe I missed other uses?
>>>>
>>>> Yes, in 08/12 we also use mbind(),
>>> You don't need a whole library for mbind(), it's a syscall. See syscall(2).
>>>
>>>> and in 09/12 we use max_numa_node().
>>> Really? I didn't see it there. And anyway, that goes back to our discussion
>>> about setting qemu's MAX_NODES to whatever we think qemu should support,
>>> and then just checking that we don't blow that limit whenever reading
>>> host node info, i.e.
>>>
>>> maxnode = 0;
>>> while (host_nodes[maxnode] && maxnode < MAX_NODES)
>>>   node_read(&info[maxnode++]);
>>>
>>> type of a thing.
>>>
>>> And, if there's a place you really need to know the current online number
>>> of host nodes, then, like I said earlier, you should just go to sysfs
>>> yourself. libnuma:numa_max_node() returns an int that it only initializes
>>> at library load time, so it's not going to adapt to onlining/offlining.
>>
>> OK, thank you.
>> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall directly,
>> right?
> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a more
> general lib. Whether or not we want to redefine those symbols within
> qemu, in order to avoid the dependency on installing numactl-devel, isn't
> something I can answer. That's a better question for Anthony. Anthony? Paolo,
> any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
> linux-header synch script?
> 

I think using libnuma is fine.  In principle this could be used on other
OSes than Linux, I think?

Paolo
Wanlong Gao Aug. 29, 2013, 2:22 a.m. UTC | #7
On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> Il 26/08/2013 10:43, Andrew Jones ha scritto:
>>
>> ----- Original Message -----
>>>> On 08/26/2013 03:46 PM, Andrew Jones wrote:
>>>>>>>>>> Is this patch still necessary? I thought that dropping the
>>>>>>>>>> numa_num_configured_nodes() calls from patch 8/12 got rid
>>>>>>>>>> of the need for this library. Maybe I missed other uses?
>>>>>>>>>>
>>>>>>>>>> Yes, in 08/12 we also use mbind(),
>>>>>> You don't need a whole library for mbind(), it's a syscall. See syscall(2).
>>>>>>
>>>>>>>>>> and in 09/12 we use max_numa_node().
>>>>>> Really? I didn't see it there. And anyway, that goes back to our discussion
>>>>>> about setting qemu's MAX_NODES to whatever we think qemu should support,
>>>>>> and then just checking that we don't blow that limit whenever reading
>>>>>> host node info, i.e.
>>>>>>
>>>>>> maxnode = 0;
>>>>>> while (host_nodes[maxnode] && maxnode < MAX_NODES)
>>>>>>   node_read(&info[maxnode++]);
>>>>>>
>>>>>> type of a thing.
>>>>>>
>>>>>> And, if there's a place you really need to know the current online number
>>>>>> of host nodes, then, like I said earlier, you should just go to sysfs
>>>>>> yourself. libnuma:numa_max_node() returns an int that it only initializes
>>>>>> at library load time, so it's not going to adapt to onlining/offlining.
>>>>
>>>> OK, thank you.
>>>> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall directly,
>>>> right?
>> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a more
>> general lib. Whether or not we want to redefine those symbols within
>> qemu, in order to avoid the dependency on installing numactl-devel, isn't
>> something I can answer. That's a better question for Anthony. Anthony? Paolo,
>> any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
>> linux-header synch script?
>>
> 
> I think using libnuma is fine.  In principle this could be used on other
> OSes than Linux, I think?

But it seems that mbind(2) is a Linux-specific syscall, right?

Thanks,
Wanlong Gao

> 
> Paolo
>
Andrew Jones Aug. 29, 2013, 8:15 a.m. UTC | #8
----- Original Message -----
> On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> > Il 26/08/2013 10:43, Andrew Jones ha scritto:
> >>
> >> ----- Original Message -----
> >>>> On 08/26/2013 03:46 PM, Andrew Jones wrote:
> >>>>>>>>>> Is this patch still necessary? I thought that dropping the
> >>>>>>>>>> numa_num_configured_nodes() calls from patch 8/12 got rid
> >>>>>>>>>> of the need for this library. Maybe I missed other uses?
> >>>>>>>>>>
> >>>>>>>>>> Yes, in 08/12 we also use mbind(),
> >>>>>> You don't need a whole library for mbind(), it's a syscall. See
> >>>>>> syscall(2).
> >>>>>>
> >>>>>>>>>> and in 09/12 we use max_numa_node().
> >>>>>> Really? I didn't see it there. And anyway, that goes back to our
> >>>>>> discussion
> >>>>>> about setting qemu's MAX_NODES to whatever we think qemu should
> >>>>>> support,
> >>>>>> and then just checking that we don't blow that limit whenever reading
> >>>>>> host node info, i.e.
> >>>>>>
> >>>>>> maxnode = 0;
> >>>>>> while (host_nodes[maxnode] && maxnode < MAX_NODES)
> >>>>>>   node_read(&info[maxnode++]);
> >>>>>>
> >>>>>> type of a thing.
> >>>>>>
> >>>>>> And, if there's a place you really need to know the current online
> >>>>>> number
> >>>>>> of host nodes, then, like I said earlier, you should just go to sysfs
> >>>>>> yourself. libnuma:numa_max_node() returns an int that it only
> >>>>>> initializes
> >>>>>> at library load time, so it's not going to adapt to
> >>>>>> onlining/offlining.
> >>>>
> >>>> OK, thank you.
> >>>> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall
> >>>> directly,
> >>>> right?
> >> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a more
> >> general lib. Whether or not we want to redefine those symbols within
> >> qemu, in order to avoid the dependency on installing numactl-devel, isn't
> >> something I can answer. That's a better question for Anthony. Anthony?
> >> Paolo,
> >> any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
> >> linux-header synch script?
> >>
> > 
> > I think using libnuma is fine.  In principle this could be used on other
> > OSes than Linux, I think?
> 
> But seems that mbind(2) is Linux-specific syscall, right?
> 

You would need to avoid calling mbind directly, i.e. use libnuma for all
numa related calls. Then, if libnuma were to support more OSes, qemu would
automatically do so as well (wrt numa). Your mbind() with libnuma would look
like this

numa_set_bind_policy(strict)
numa_tonodemask_memory(addr, size, nodemask)

The problem is that numa_set_bind_policy only takes a bool, and thus only
allows two of the four possible policies

MPOL_BIND        strict == 1
MPOL_PREFERRED   strict == 0

So, due to libnuma's policy setting limitations, and the fact that it doesn't
currently support any OS other than Linux, I prefer the current version of
your series that drops libnuma. If qemu needs to support NUMA on another OS,
we can cross that bridge when we get there.

drew
Andrew Jones Aug. 29, 2013, 8:31 a.m. UTC | #9
----- Original Message -----
> 
> 
> ----- Original Message -----
> > On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> > > Il 26/08/2013 10:43, Andrew Jones ha scritto:
> > >>
> > >> ----- Original Message -----
> > >>>> On 08/26/2013 03:46 PM, Andrew Jones wrote:
> > >>>>>>>>>> Is this patch still necessary? I thought that dropping the
> > >>>>>>>>>> numa_num_configured_nodes() calls from patch 8/12 got rid
> > >>>>>>>>>> of the need for this library. Maybe I missed other uses?
> > >>>>>>>>>>
> > >>>>>>>>>> Yes, in 08/12 we also use mbind(),
> > >>>>>> You don't need a whole library for mbind(), it's a syscall. See
> > >>>>>> syscall(2).
> > >>>>>>
> > >>>>>>>>>> and in 09/12 we use max_numa_node().
> > >>>>>> Really? I didn't see it there. And anyway, that goes back to our
> > >>>>>> discussion
> > >>>>>> about setting qemu's MAX_NODES to whatever we think qemu should
> > >>>>>> support,
> > >>>>>> and then just checking that we don't blow that limit whenever
> > >>>>>> reading
> > >>>>>> host node info, i.e.
> > >>>>>>
> > >>>>>> maxnode = 0;
> > >>>>>> while (host_nodes[maxnode] && maxnode < MAX_NODES)
> > >>>>>>   node_read(&info[maxnode++]);
> > >>>>>>
> > >>>>>> type of a thing.
> > >>>>>>
> > >>>>>> And, if there's a place you really need to know the current online
> > >>>>>> number
> > >>>>>> of host nodes, then, like I said earlier, you should just go to
> > >>>>>> sysfs
> > >>>>>> yourself. libnuma:numa_max_node() returns an int that it only
> > >>>>>> initializes
> > >>>>>> at library load time, so it's not going to adapt to
> > >>>>>> onlining/offlining.
> > >>>>
> > >>>> OK, thank you.
> > >>>> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall
> > >>>> directly,
> > >>>> right?
> > >> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a
> > >> more
> > >> general lib. Whether or not we want to redefine those symbols within
> > >> qemu, in order to avoid the dependency on installing numactl-devel,
> > >> isn't
> > >> something I can answer. That's a better question for Anthony. Anthony?
> > >> Paolo,
> > >> any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
> > >> linux-header synch script?
> > >>
> > > 
> > > I think using libnuma is fine.  In principle this could be used on other
> > > OSes than Linux, I think?
> > 
> > But seems that mbind(2) is Linux-specific syscall, right?
> > 
> 
> You would need to avoid directly calling mbind, i.e. use libnuma for all
> numa related calls. Then, if libnuma were to support more OSes, qemu would
> automatically (wrt to numa) as well. Your mbind() with libnuma would look
> like this
> 
> numa_set_bind_policy(strict)
> numa_tonodemask_memory(addr, size, nodemask)
> 
> The problem is that set_bind_policy only takes a bool, and thus only
> allows two of the four possibly policies
> 
> MPOL_BIND        strict == 1
> MPOL_PREFERRED   strict == 0
> 

Ah, there is a way to get interleave policy

if (policy == interleave) {
   numa_interleave_memory(addr, size, nodemask)
} else {
   numa_set_bind_policy(strict)
   numa_tonodemask_memory(addr, size, nodemask)
}

It's a bit clunky, and I still don't see a way to select MPOL_DEFAULT, nor a
way to use any additional flags, such as MPOL_F_RELATIVE_NODES.


> So, due to libnuma's policy setting limitations, and the fact it doesn't
> currently support more OSes than Linux, then I prefer your current
> series version that drops libnuma. If qemu will need to support NUMA on
> another OS, then we can cross this bridge when we get there.

Patch

diff --git a/configure b/configure
index 18fa608..b82e89a 100755
--- a/configure
+++ b/configure
@@ -243,6 +243,7 @@  gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+numa=""
 
 # parse CC options first
 for opt do
@@ -945,6 +946,10 @@  for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1159,6 +1164,8 @@  echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
 echo "  --enable-tpm             enable TPM support"
 echo "  --disable-libssh2        disable ssh block device support"
 echo "  --enable-libssh2         enable ssh block device support"
+echo "  --disable-numa           disable libnuma support"
+echo "  --enable-numa            enable libnuma support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2412,6 +2419,27 @@  EOF
 fi
 
 ##########################################
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  # keep the requested value here so that --enable-numa errors out below
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+    numa=yes
+    libs_softmmu="-lnuma $libs_softmmu"
+  else
+    if test "$numa" = "yes" ; then
+      feature_not_found "linux NUMA (install numactl?)"
+    fi
+    numa=no
+  fi
+fi
+
+##########################################
 # linux-aio probe
 
 if test "$linux_aio" != "no" ; then
@@ -3613,6 +3641,7 @@  echo "TPM support       $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
+echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3646,6 +3675,9 @@  echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
 echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
 echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
 echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
 
 echo "ARCH=$ARCH" >> $config_host_mak