Message ID | 1377231003-2816-7-git-send-email-gaowanlong@cn.fujitsu.com |
---|---|
State | New |
----- Original Message -----
> Add detection of libnuma (mostly contained in the numactl package)
> to the configure script. Can be enabled or disabled on the command line,
> default is use if available.
>
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>

Is this patch still necessary? I thought that dropping the
numa_num_configured_nodes() calls from patch 8/12 got rid
of the need for this library. Maybe I missed other uses?

drew

On 08/23/2013 04:40 PM, Andrew Jones wrote:
> Is this patch still necessary? I thought that dropping the
> numa_num_configured_nodes() calls from patch 8/12 got rid
> of the need for this library. Maybe I missed other uses?

Yes, in 08/12 we also use mbind(), and in 09/12 we use max_numa_node().

Thanks,
Wanlong Gao

----- Original Message -----
> On 08/23/2013 04:40 PM, Andrew Jones wrote:
> > Is this patch still necessary? I thought that dropping the
> > numa_num_configured_nodes() calls from patch 8/12 got rid
> > of the need for this library. Maybe I missed other uses?
>
> Yes, in 08/12 we also use mbind(),

You don't need a whole library for mbind(), it's a syscall. See syscall(2).

> and in 09/12 we use max_numa_node().

Really? I didn't see it there. And anyway, that goes back to our discussion
about setting qemu's MAX_NODES to whatever we think qemu should support,
and then just checking that we don't blow that limit whenever reading
host node info, i.e.

    maxnode = 0;
    while (host_nodes[maxnode] && maxnode < MAX_NODES)
        node_read(&info[maxnode++]);

type of a thing.

And, if there's a place you really need to know the current online number
of host nodes, then, like I said earlier, you should just go to sysfs
yourself. libnuma:numa_max_node() returns an int that it only initializes
at library load time, so it's not going to adapt to onlining/offlining.

drew

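The sysfs route suggested above could look roughly like the C sketch below. It is an illustration, not code from the series. It assumes the node list is read from /sys/devices/system/node/online in the kernel's range-list form (for example "0-3,5"). MAX_NODES = 64, parse_node_list() and read_online_nodes() are hypothetical names and values chosen for this example. Unlike the quoted loop, the limit is tested before any array slot is written, so an entry at index MAX_NODES is never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 64   /* hypothetical QEMU-side cap, not from the series */

/*
 * Parse a sysfs-style node list such as "0-3,5" into present[].
 * Returns the number of nodes marked present, or -1 on malformed input
 * or when a node id does not fit below MAX_NODES.
 */
static int parse_node_list(const char *list, bool present[MAX_NODES])
{
    const char *p = list;
    int count = 0;

    assert(list != NULL && present != NULL);
    memset(present, 0, MAX_NODES * sizeof(present[0]));

    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p) {
            return -1;                 /* no digits where a number is expected */
        }
        p = end;
        if (*p == '-') {               /* range "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p) {
                return -1;
            }
            p = end;
        }
        /* Check the limit before touching present[]. */
        if (first < 0 || last < first || last >= MAX_NODES) {
            return -1;
        }
        for (long n = first; n <= last; n++) {
            if (!present[n]) {
                present[n] = true;
                count++;
            }
        }
        if (*p == ',') {
            p++;
        } else if (*p != '\0' && *p != '\n') {
            return -1;                 /* unexpected separator */
        }
    }
    return count;
}

/* Re-read the online list on every call, so hotplug changes are seen. */
static int read_online_nodes(bool present[MAX_NODES])
{
    char buf[256];
    int ret = -1;
    FILE *f = fopen("/sys/devices/system/node/online", "r");

    if (f == NULL) {
        return -1;                     /* no NUMA sysfs on this host */
    }
    if (fgets(buf, sizeof(buf), f) != NULL) {
        ret = parse_node_list(buf, present);
    }
    fclose(f);
    return ret;
}
```

Calling read_online_nodes() each time the host topology is needed avoids the staleness described for numa_max_node(), at the cost of one small file read per query.
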
On 08/26/2013 03:46 PM, Andrew Jones wrote:
> You don't need a whole library for mbind(), it's a syscall. See syscall(2).
> [...]
> And, if there's a place you really need to know the current online number
> of host nodes, then, like I said earlier, you should just go to sysfs
> yourself. libnuma:numa_max_node() returns an int that it only initializes
> at library load time, so it's not going to adapt to onlining/offlining.

OK, thank you.
Then I should define MPOL_* macros in QEMU and use mbind(2) syscall directly,
right?

Thanks,
Wanlong Gao

----- Original Message -----
> On 08/26/2013 03:46 PM, Andrew Jones wrote:
> > You don't need a whole library for mbind(), it's a syscall. See syscall(2).
>
> OK, thank you.
> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall directly,
> right?

Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a more
general lib. Whether or not we want to redefine those symbols within
qemu, in order to avoid the dependency on installing numactl-devel, isn't
something I can answer. That's a better question for Anthony. Anthony? Paolo,
any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
linux-header synch script?

thanks,
drew

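For reference, calling mbind(2) without numaif.h could be sketched as below. This is only an illustration of the idea discussed above, not the code from the series. The MPOL_* values are the ones in the kernel's uapi/linux/mempolicy.h, and host_mbind() is a hypothetical wrapper name. The sketch assumes a Linux host whose libc exposes SYS_mbind via <sys/syscall.h>; elsewhere it fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy modes, matching the kernel's uapi/linux/mempolicy.h. */
#ifndef MPOL_DEFAULT
#define MPOL_DEFAULT    0
#define MPOL_PREFERRED  1
#define MPOL_BIND       2
#define MPOL_INTERLEAVE 3
#endif

/*
 * Hypothetical thin wrapper around the raw mbind syscall.
 * Returns 0 on success, or -1 with errno set, like other syscalls.
 * nodemask is a bitmap of node ids; maxnode is the number of bits in it.
 */
static long host_mbind(void *addr, unsigned long len, int mode,
                       const unsigned long *nodemask, unsigned long maxnode,
                       unsigned int flags)
{
    assert(addr != NULL);
#ifdef SYS_mbind
    return syscall(SYS_mbind, addr, len, mode, nodemask, maxnode, flags);
#else
    /* Not a Linux host, or the libc does not know this syscall. */
    (void)len; (void)mode; (void)nodemask; (void)maxnode; (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}
```

With this shape, all four policy modes stay selectable by the caller. A kernel built without NUMA support is expected to reject the call with ENOSYS, so callers should treat that as "no host NUMA" rather than as a hard error.
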
On 26/08/2013 10:43, Andrew Jones wrote:
> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a more
> general lib. Whether or not we want to redefine those symbols within
> qemu, in order to avoid the dependency on installing numactl-devel, isn't
> something I can answer. That's a better question for Anthony. Anthony? Paolo,
> any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
> linux-header synch script?

I think using libnuma is fine. In principle this could be used on other
OSes than Linux, I think?

Paolo

On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> I think using libnuma is fine. In principle this could be used on other
> OSes than Linux, I think?

But seems that mbind(2) is Linux-specific syscall, right?

Thanks,
Wanlong Gao

----- Original Message -----
> On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> > I think using libnuma is fine. In principle this could be used on other
> > OSes than Linux, I think?
>
> But seems that mbind(2) is Linux-specific syscall, right?

You would need to avoid directly calling mbind, i.e. use libnuma for all
numa related calls. Then, if libnuma were to support more OSes, qemu would
automatically (wrt to numa) as well. Your mbind() with libnuma would look
like this

    numa_set_bind_policy(strict)
    numa_tonodemask_memory(addr, size, nodemask)

The problem is that set_bind_policy only takes a bool, and thus only
allows two of the four possible policies

    MPOL_BIND       strict == 1
    MPOL_PREFERRED  strict == 0

So, due to libnuma's policy setting limitations, and the fact it doesn't
currently support more OSes than Linux, then I prefer your current
series version that drops libnuma. If qemu will need to support NUMA on
another OS, then we can cross this bridge when we get there.

drew

----- Original Message -----
> You would need to avoid directly calling mbind, i.e. use libnuma for all
> numa related calls. Then, if libnuma were to support more OSes, qemu would
> automatically (wrt to numa) as well. Your mbind() with libnuma would look
> like this
>
>     numa_set_bind_policy(strict)
>     numa_tonodemask_memory(addr, size, nodemask)
>
> The problem is that set_bind_policy only takes a bool, and thus only
> allows two of the four possible policies
>
>     MPOL_BIND       strict == 1
>     MPOL_PREFERRED  strict == 0

Ah, there is a way to get interleave policy

    if (policy == interleave) {
        numa_interleave_memory(addr, size, nodemask)
    } else {
        numa_set_bind_policy(strict)
        numa_tonodemask_memory(addr, size, nodemask)
    }

a bit clunky. And I still don't see a way to select MPOL_DEFAULT, nor a
way to use any additional flags, such as MPOL_F_RELATIVE_NODES.

> So, due to libnuma's policy setting limitations, and the fact it doesn't
> currently support more OSes than Linux, then I prefer your current
> series version that drops libnuma. If qemu will need to support NUMA on
> another OS, then we can cross this bridge when we get there.

diff --git a/configure b/configure
index 18fa608..b82e89a 100755
--- a/configure
+++ b/configure
@@ -243,6 +243,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+numa=""
 
 # parse CC options first
 for opt do
@@ -945,6 +946,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-numa) numa="no"
+  ;;
+  --enable-numa) numa="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1159,6 +1164,8 @@ echo " --gcov=GCOV use specified gcov [$gcov_tool]"
 echo " --enable-tpm enable TPM support"
 echo " --disable-libssh2 disable ssh block device support"
 echo " --enable-libssh2 enable ssh block device support"
+echo " --disable-numa disable libnuma support"
+echo " --enable-numa enable libnuma support"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2412,6 +2419,27 @@ EOF
 fi
 
 ##########################################
+# libnuma probe
+
+if test "$numa" != "no" ; then
+  numa=no
+  cat > $TMPC << EOF
+#include <numa.h>
+int main(void) { return numa_available(); }
+EOF
+
+  if compile_prog "" "-lnuma" ; then
+    numa=yes
+    libs_softmmu="-lnuma $libs_softmmu"
+  else
+    if test "$numa" = "yes" ; then
+      feature_not_found "linux NUMA (install numactl?)"
+    fi
+    numa=no
+  fi
+fi
+
+##########################################
 # linux-aio probe
 
 if test "$linux_aio" != "no" ; then
@@ -3613,6 +3641,7 @@ echo "TPM support $tpm"
 echo "libssh2 support $libssh2"
 echo "TPM passthrough $tpm_passthrough"
 echo "QOM debugging $qom_cast_debug"
+echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3646,6 +3675,9 @@ echo "extra_cflags=$EXTRA_CFLAGS" >> $config_host_mak
 echo "extra_ldflags=$EXTRA_LDFLAGS" >> $config_host_mak
 echo "qemu_localedir=$qemu_localedir" >> $config_host_mak
 echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
+if test "$numa" = "yes"; then
+  echo "CONFIG_NUMA=y" >> $config_host_mak
+fi
 echo "ARCH=$ARCH" >> $config_host_mak