[v2,3/7] util: Introduce ThreadContext user-creatable object

Message ID 20221010091117.88603-4-david@redhat.com
State New
Series hostmem: NUMA-aware memory preallocation using ThreadContext

Commit Message

David Hildenbrand Oct. 10, 2022, 9:11 a.m. UTC
Setting the CPU affinity of QEMU threads is a bit problematic, because
QEMU doesn't always have permissions to set the CPU affinity itself,
for example, after seccomp has been initialized by QEMU with:
    -sandbox enable=on,resourcecontrol=deny

General information about CPU affinities can be found in the man page of
taskset:
    CPU affinity is a scheduler property that "bonds" a process to a given
    set of CPUs on the system. The Linux scheduler will honor the given CPU
    affinity and the process will not run on any other CPUs.

While upper layers are already aware of how to handle CPU affinities for
long-lived threads like iothreads or vcpu threads, especially short-lived
threads, as used for memory-backend preallocation, are more involved to
handle. These threads are created on demand and upper layers are not even
able to identify and configure them.

Introduce the concept of a ThreadContext, which is essentially a thread
used for creating new threads. All threads created via that context
thread inherit the configured CPU affinity. Consequently, it's
sufficient to create a ThreadContext and configure it once, and have all
threads created via that ThreadContext inherit the same CPU affinity.
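
The inheritance this design relies on can be observed outside QEMU. The
following is a minimal sketch, assuming Linux with glibc; the function names
are made up for illustration and are not part of this patch. It pins the
calling thread to one CPU it is already allowed to use, creates a thread,
and checks that the new thread starts with the same mask.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Thread body: record the affinity mask this new thread starts with. */
static void *record_affinity(void *arg)
{
    cpu_set_t *out = arg;

    CPU_ZERO(out);
    pthread_getaffinity_np(pthread_self(), sizeof(*out), out);
    return NULL;
}

/*
 * Pin the calling thread to the first CPU it may currently use, create a
 * child thread, and report whether the child started with exactly that
 * single-CPU mask. The original mask is restored before returning.
 * Returns false on any error.
 */
static bool child_inherits_affinity(void)
{
    cpu_set_t allowed, pinned, child;
    pthread_t tid;
    bool same = false;
    int cpu;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return false;
    }
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            break;
        }
    }
    if (cpu == CPU_SETSIZE) {
        return false;
    }

    CPU_ZERO(&pinned);
    CPU_SET(cpu, &pinned);
    if (sched_setaffinity(0, sizeof(pinned), &pinned) != 0) {
        return false;
    }

    CPU_ZERO(&child);
    if (pthread_create(&tid, NULL, record_affinity, &child) == 0) {
        pthread_join(tid, NULL);
        same = CPU_EQUAL(&pinned, &child);
    }

    /* Undo the pinning so the caller keeps its original affinity. */
    sched_setaffinity(0, sizeof(allowed), &allowed);
    return same;
}
```

A ThreadContext plays the role of the pinned creator here: whoever asks for
a thread hands the request to the context thread, so the new thread picks up
the context thread's mask rather than the requester's.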

The CPU affinity of a ThreadContext can be configured in two ways:

(1) Obtaining the thread id via the "thread-id" property and setting the
    CPU affinity manually.

(2) Setting the "cpu-affinity" property and letting QEMU try to set the
    CPU affinity itself. This will fail if QEMU doesn't have permissions
    to do so anymore after seccomp was initialized.

A simple QEMU example to set the CPU affinity to CPU 0,1,6,7 would be:
    qemu-system-x86_64 -S \
      -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7

And we can query it via HMP/QMP:
    (qemu) qom-get tc1 cpu-affinity
    [
        0,
        1,
        6,
        7
    ]

But note that due to dynamic library loading this example will not work
before we actually make use of thread_context_create_thread() in QEMU
code, because the type will otherwise not get registered.

A ThreadContext can be reused, simply by reconfiguring the CPU affinity.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/thread-context.h |  57 +++++++
 qapi/qom.json                 |  17 +++
 util/meson.build              |   1 +
 util/oslib-posix.c            |   1 +
 util/thread-context.c         | 278 ++++++++++++++++++++++++++++++++++
 5 files changed, 354 insertions(+)
 create mode 100644 include/qemu/thread-context.h
 create mode 100644 util/thread-context.c

Comments

Markus Armbruster Oct. 11, 2022, 5:47 a.m. UTC | #1
David Hildenbrand <david@redhat.com> writes:

> Setting the CPU affinity of QEMU threads is a bit problematic, because
> QEMU doesn't always have permissions to set the CPU affinity itself,
> for example, with seccomp after initialized by QEMU:
>     -sandbox enable=on,resourcecontrol=deny
>
> General information about CPU affinities can be found in the man page of
> taskset:
>     CPU affinity is a scheduler property that "bonds" a process to a given
>     set of CPUs on the system. The Linux scheduler will honor the given CPU
>     affinity and the process will not run on any other CPUs.
>
> While upper layers are already aware of how to handle CPU affinities for
> long-lived threads like iothreads or vcpu threads, especially short-lived
> threads, as used for memory-backend preallocation, are more involved to
> handle. These threads are created on demand and upper layers are not even
> able to identify and configure them.
>
> Introduce the concept of a ThreadContext, that is essentially a thread
> used for creating new threads. All threads created via that context
> thread inherit the configured CPU affinity. Consequently, it's
> sufficient to create a ThreadContext and configure it once, and have all
> threads created via that ThreadContext inherit the same CPU affinity.
>
> The CPU affinity of a ThreadContext can be configured two ways:
>
> (1) Obtaining the thread id via the "thread-id" property and setting the
>     CPU affinity manually.
>
> (2) Setting the "cpu-affinity" property and letting QEMU try set the
>     CPU affinity itself. This will fail if QEMU doesn't have permissions
>     to do so anymore after seccomp was initialized.
>
> A simple QEMU example to set the CPU affinity to CPU 0,1,6,7 would be:
>     qemu-system-x86_64 -S \
>       -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7
>
> And we can query it via HMP/QMP:
>     (qemu) qom-get tc1 cpu-affinity
>     [
>         0,
>         1,
>         6,
>         7
>     ]
>
> But note that due to dynamic library loading this example will not work
> before we actually make use of thread_context_create_thread() in QEMU
> code, because the type will otherwise not get registered.

What do you mean exactly by "not work"?  It's not "CLI option or HMP
command fails":

    $ upstream-qemu -S -display none -nodefaults -monitor stdio -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7
    QEMU 7.1.50 monitor - type 'help' for more information
    (qemu) qom-get tc1 cpu-affinity
    [
        0,
        1,
        6,
        7
    ]
    (qemu) info cpus
    * CPU #0: thread_id=1670613

Even though the affinities refer to nonexistent CPUs :)

> A ThreadContext can be reused, simply by reconfiguring the CPU affinity.

So, when a thread is created, its affinity comes from its thread context
(if any).  When I later change the context's affinity, it does *not*
affect existing threads, only future ones.  Correct?

> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/qemu/thread-context.h |  57 +++++++
>  qapi/qom.json                 |  17 +++
>  util/meson.build              |   1 +
>  util/oslib-posix.c            |   1 +
>  util/thread-context.c         | 278 ++++++++++++++++++++++++++++++++++
>  5 files changed, 354 insertions(+)
>  create mode 100644 include/qemu/thread-context.h
>  create mode 100644 util/thread-context.c
>
> diff --git a/include/qemu/thread-context.h b/include/qemu/thread-context.h
> new file mode 100644
> index 0000000000..2ebd6b7fe1
> --- /dev/null
> +++ b/include/qemu/thread-context.h
> @@ -0,0 +1,57 @@
> +/*
> + * QEMU Thread Context
> + *
> + * Copyright Red Hat Inc., 2022
> + *
> + * Authors:
> + *  David Hildenbrand <david@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef SYSEMU_THREAD_CONTEXT_H
> +#define SYSEMU_THREAD_CONTEXT_H
> +
> +#include "qapi/qapi-types-machine.h"
> +#include "qemu/thread.h"
> +#include "qom/object.h"
> +
> +#define TYPE_THREAD_CONTEXT "thread-context"
> +OBJECT_DECLARE_TYPE(ThreadContext, ThreadContextClass,
> +                    THREAD_CONTEXT)
> +
> +struct ThreadContextClass {
> +    ObjectClass parent_class;
> +};
> +
> +struct ThreadContext {
> +    /* private */
> +    Object parent;
> +
> +    /* private */
> +    unsigned int thread_id;
> +    QemuThread thread;
> +
> +    /* Semaphore to wait for context thread action. */
> +    QemuSemaphore sem;
> +    /* Semaphore to wait for action in context thread. */
> +    QemuSemaphore sem_thread;
> +    /* Mutex to synchronize requests. */
> +    QemuMutex mutex;
> +
> +    /* Commands for the thread to execute. */
> +    int thread_cmd;
> +    void *thread_cmd_data;
> +
> +    /* CPU affinity bitmap used for initialization. */
> +    unsigned long *init_cpu_bitmap;
> +    int init_cpu_nbits;
> +};
> +
> +void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
> +                                  const char *name,
> +                                  void *(*start_routine)(void *), void *arg,
> +                                  int mode);
> +
> +#endif /* SYSEMU_THREAD_CONTEXT_H */
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 80dd419b39..67d47f4051 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -830,6 +830,21 @@
>              'reduced-phys-bits': 'uint32',
>              '*kernel-hashes': 'bool' } }
>  
> +##
> +# @ThreadContextProperties:
> +#
> +# Properties for thread context objects.
> +#
> +# @cpu-affinity: the list of CPU numbers used as CPU affinity for all threads
> +#                created in the thread context (default: QEMU main thread
> +#                affinity)

Another ignorant question: is the QEMU main thread affinity fixed or
configurable?  If configurable, how?

> +#
> +# Since: 7.2
> +##
> +{ 'struct': 'ThreadContextProperties',
> +  'data': { '*cpu-affinity': ['uint16'] } }
> +
> +
>  ##
>  # @ObjectType:
>  #
> @@ -882,6 +897,7 @@
>      { 'name': 'secret_keyring',
>        'if': 'CONFIG_SECRET_KEYRING' },
>      'sev-guest',
> +    'thread-context',
>      's390-pv-guest',
>      'throttle-group',
>      'tls-creds-anon',
> @@ -948,6 +964,7 @@
>        'secret_keyring':             { 'type': 'SecretKeyringProperties',
>                                        'if': 'CONFIG_SECRET_KEYRING' },
>        'sev-guest':                  'SevGuestProperties',
> +      'thread-context':             'ThreadContextProperties',
>        'throttle-group':             'ThrottleGroupProperties',
>        'tls-creds-anon':             'TlsCredsAnonProperties',
>        'tls-creds-psk':              'TlsCredsPskProperties',

[...]
David Hildenbrand Oct. 11, 2022, 7:53 a.m. UTC | #2
>> But note that due to dynamic library loading this example will not work
>> before we actually make use of thread_context_create_thread() in QEMU
>> code, because the type will otherwise not get registered.
> 
> What do you mean exactly by "not work"?  It's not "CLI option or HMP
> command fails":
> 

For me, if I compile patch #1-#3 only, I get:

$ ./build/qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7
qemu-system-x86_64: invalid object type: thread-context


Reason is that, without a call to thread_context_create_thread(), we 
won't trigger type_init(thread_context_register_types) and consequently, 
the type won't be registered.

Is it really different in your environment? Maybe it depends on the QEMU 
config?
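
For readers unfamiliar with the mechanism: type_init() in QEMU is built on
compiler constructor functions, which run before main() but only for object
files that actually end up in the binary. The sketch below shows just that
constructor pattern with a made-up registry (register_type_name() and
type_is_registered() are hypothetical, not QOM API). That the object file is
dropped at link time when nothing references it is an assumption about how
the util code is linked, not something shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 16

/* Hypothetical stand-in for a type registry; not QEMU's real QOM tables. */
static const char *registered_types[MAX_TYPES];
static size_t n_registered_types;

static void register_type_name(const char *name)
{
    if (n_registered_types < MAX_TYPES) {
        registered_types[n_registered_types++] = name;
    }
}

static bool type_is_registered(const char *name)
{
    for (size_t i = 0; i < n_registered_types; i++) {
        if (strcmp(registered_types[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Runs before main(), but only if this object file is part of the final
 * binary. If the linker never pulls the file in, the constructor never
 * runs and the type name stays unknown.
 */
__attribute__((constructor))
static void register_thread_context_type(void)
{
    register_type_name("thread-context");
}
```

In a single translation unit the constructor always runs, so a lookup of
"thread-context" succeeds; the failure described above only shows up once
the registration lives in a separately linked file that nothing references.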

>      $ upstream-qemu -S -display none -nodefaults -monitor stdio -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7
>      QEMU 7.1.50 monitor - type 'help' for more information
>      (qemu) qom-get tc1 cpu-affinity
>      [
>          0,
>          1,
>          6,
>          7
>      ]
>      (qemu) info cpus
>      * CPU #0: thread_id=1670613
> 
> Even though the affinities refer to nonexistent CPUs :)

CPU affinities are CPU numbers on your system (host), not QEMU vCPU 
numbers. I could talk about physical CPU numbers in the doc here, 
although I am not sure if that really helps. What about "host CPU 
numbers" and in patch #4 "host node numbers"?

Seems to match what we document for @MemoryBackendProperties: 
"@host-nodes: the list of NUMA host nodes to bind the memory to"



But unrelated to that, pthread_setaffinity_np() won't bail out on CPUs 
that are currently not available in the host -- because one might 
online/hotplug them later. It only bails out if none of the CPUs is 
currently available in the host:

https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html


        EINVAL (pthread_setaffinity_np()) The affinity bit mask mask
               contains no processors that are currently physically on
               the system and permitted to the thread according to any
               restrictions that may be imposed by the "cpuset" mechanism
               described in cpuset(7).

It will bail out on CPUs that cannot be available in the host though, 
because it's impossible due to the kernel config:


        EINVAL (pthread_setaffinity_np()) cpuset specified a CPU that was
               outside the set supported by the kernel.  (The kernel
               configuration option CONFIG_NR_CPUS defines the range of
               the set supported by the kernel data type used to
               represent CPU sets.)
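
The first EINVAL case can be reproduced with a small helper. This is a
minimal sketch, assuming Linux with glibc; set_own_affinity() and
lowest_allowed_cpu() are hypothetical names. It applies a list of host CPU
numbers to the calling thread and returns whatever pthread_setaffinity_np()
reports. An empty list contains no usable CPU at all and is therefore
expected to come back as EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/*
 * Apply a list of host CPU numbers to the calling thread. Returns the
 * pthread_setaffinity_np() result: 0 on success, otherwise an errno value.
 */
static int set_own_affinity(const int *cpus, size_t ncpus)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    for (size_t i = 0; i < ncpus; i++) {
        /* cpu_set_t is a fixed-size bitmap; reject numbers that don't fit. */
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE) {
            return EINVAL;
        }
        CPU_SET(cpus[i], &set);
    }
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Lowest host CPU the calling thread may currently run on, or -1 on error. */
static int lowest_allowed_cpu(void)
{
    cpu_set_t set;

    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}
```

Passing a list with at least one currently usable CPU succeeds even if the
other entries name CPUs that are offline, which matches the behaviour
described for the "cpu-affinity" property.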


> 
>> A ThreadContext can be reused, simply by reconfiguring the CPU affinity.
> 
> So, when a thread is created, its affinity comes from its thread context
> (if any).  When I later change the context's affinity, it does *not*
> affect existing threads, only future ones.  Correct?

Yes, that's the current state.

> 
>> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   include/qemu/thread-context.h |  57 +++++++
>>   qapi/qom.json                 |  17 +++
>>   util/meson.build              |   1 +
>>   util/oslib-posix.c            |   1 +
>>   util/thread-context.c         | 278 ++++++++++++++++++++++++++++++++++
>>   5 files changed, 354 insertions(+)
>>   create mode 100644 include/qemu/thread-context.h
>>   create mode 100644 util/thread-context.c
>>
>> diff --git a/include/qemu/thread-context.h b/include/qemu/thread-context.h
>> new file mode 100644
>> index 0000000000..2ebd6b7fe1
>> --- /dev/null
>> +++ b/include/qemu/thread-context.h
>> @@ -0,0 +1,57 @@
>> +/*
>> + * QEMU Thread Context
>> + *
>> + * Copyright Red Hat Inc., 2022
>> + *
>> + * Authors:
>> + *  David Hildenbrand <david@redhat.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef SYSEMU_THREAD_CONTEXT_H
>> +#define SYSEMU_THREAD_CONTEXT_H
>> +
>> +#include "qapi/qapi-types-machine.h"
>> +#include "qemu/thread.h"
>> +#include "qom/object.h"
>> +
>> +#define TYPE_THREAD_CONTEXT "thread-context"
>> +OBJECT_DECLARE_TYPE(ThreadContext, ThreadContextClass,
>> +                    THREAD_CONTEXT)
>> +
>> +struct ThreadContextClass {
>> +    ObjectClass parent_class;
>> +};
>> +
>> +struct ThreadContext {
>> +    /* private */
>> +    Object parent;
>> +
>> +    /* private */
>> +    unsigned int thread_id;
>> +    QemuThread thread;
>> +
>> +    /* Semaphore to wait for context thread action. */
>> +    QemuSemaphore sem;
>> +    /* Semaphore to wait for action in context thread. */
>> +    QemuSemaphore sem_thread;
>> +    /* Mutex to synchronize requests. */
>> +    QemuMutex mutex;
>> +
>> +    /* Commands for the thread to execute. */
>> +    int thread_cmd;
>> +    void *thread_cmd_data;
>> +
>> +    /* CPU affinity bitmap used for initialization. */
>> +    unsigned long *init_cpu_bitmap;
>> +    int init_cpu_nbits;
>> +};
>> +
>> +void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
>> +                                  const char *name,
>> +                                  void *(*start_routine)(void *), void *arg,
>> +                                  int mode);
>> +
>> +#endif /* SYSEMU_THREAD_CONTEXT_H */
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 80dd419b39..67d47f4051 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -830,6 +830,21 @@
>>               'reduced-phys-bits': 'uint32',
>>               '*kernel-hashes': 'bool' } }
>>   
>> +##
>> +# @ThreadContextProperties:
>> +#
>> +# Properties for thread context objects.
>> +#
>> +# @cpu-affinity: the list of CPU numbers used as CPU affinity for all threads
>> +#                created in the thread context (default: QEMU main thread
>> +#                affinity)
> 
> Another ignorant question: is the QEMU main thread affinity fixed or
> configurable?  If configurable, how?

AFAIK, it's only configurable externally, for example, via "virsh 
emulatorpin". There is no QEMU interface to adjust that (because it 
wouldn't work).

Libvirt will essentially trigger "taskset" on the emulator thread to 
change its CPU affinity.
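
What taskset does for a given thread ID boils down to sched_setaffinity(2).
A management tool could issue the same call with the value read from the
"thread-id" property. The following is a minimal sketch, assuming Linux with
glibc; pin_thread_to_cpus() and first_allowed_cpu() are hypothetical helper
names, and error handling is reduced to returning errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Set the CPU affinity of the thread with kernel thread ID @tid to the
 * given host CPU numbers. A @tid of 0 means the calling thread.
 * Returns 0 on success, otherwise an errno value.
 */
static int pin_thread_to_cpus(pid_t tid, const int *cpus, size_t ncpus)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    for (size_t i = 0; i < ncpus; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE) {
            return EINVAL;
        }
        CPU_SET(cpus[i], &set);
    }
    if (sched_setaffinity(tid, sizeof(set), &set) != 0) {
        return errno;
    }
    return 0;
}

/* Lowest host CPU the calling thread may currently run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}
```

Because the call comes from outside QEMU, it is not subject to QEMU's own
seccomp filter, which is why the externally driven route keeps working with
resourcecontrol=deny.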
Markus Armbruster Oct. 12, 2022, 8:02 a.m. UTC | #3
David Hildenbrand <david@redhat.com> writes:

>>> But note that due to dynamic library loading this example will not work
>>> before we actually make use of thread_context_create_thread() in QEMU
>>> code, because the type will otherwise not get registered.
>> 
>> What do you mean exactly by "not work"?  It's not "CLI option or HMP
>> command fails":
>
> For me, if I compile patch #1-#3 only, I get:
>
> $ ./build/qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7
> qemu-system-x86_64: invalid object type: thread-context
>
>
> Reason is that, without a call to thread_context_create_thread(), we won't trigger type_init(thread_context_register_types) and consequently, 
> the type won't be registered.
>
> Is it really different in your environment? Maybe it depends on the QEMU config?

I just tested again, and get the same result as you.  I figure my
previous test was with the complete series.

PATCH 5 appears to make it work.  Suggest to say something like "The
commit after next will make this work".

>>      $ upstream-qemu -S -display none -nodefaults -monitor stdio -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7
>>      QEMU 7.1.50 monitor - type 'help' for more information
>>      (qemu) qom-get tc1 cpu-affinity
>>      [
>>          0,
>>          1,
>>          6,
>>          7
>>      ]
>>      (qemu) info cpus
>>      * CPU #0: thread_id=1670613
>> 
>> Even though the affinities refer to nonexistent CPUs :)
>
> CPU affinities are CPU numbers on your system (host), not QEMU vCPU numbers. I could talk about physical CPU numbers in the doc here, 
> although I am not sure if that really helps. What about "host CPU numbers" and in patch #4 "host node numbers"?

I think this would reduce the confusion opportunities for noobs like me.

> Seems to match what we document for @MemoryBackendProperties: "@host-nodes: the list of NUMA host nodes to bind the memory to"

Even better.

> But unrelated to that, pthread_setaffinity_np() won't bail out on CPUs that are currently not available in the host -- because one might 
> online/hotplug them later. It only bails out if none of the CPUs is currently available in the host:
>
> https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
>
>
>        EINVAL (pthread_setaffinity_np()) The affinity bit mask mask
>               contains no processors that are currently physically on
>               the system and permitted to the thread according to any
>               restrictions that may be imposed by the "cpuset" mechanism
>               described in cpuset(7).
>
> It will bail out on CPUs that cannot be available in the host though, because it's impossible due to the kernel config:
>
>
>        EINVAL (pthread_setaffinity_np()) cpuset specified a CPU that was
>               outside the set supported by the kernel.  (The kernel
>               configuration option CONFIG_NR_CPUS defines the range of
>               the set supported by the kernel data type used to
>               represent CPU sets.)
>
>
>>> A ThreadContext can be reused, simply by reconfiguring the CPU affinity.
>> 
>> So, when a thread is created, its affinity comes from its thread context
>> (if any).  When I later change the context's affinity, it does *not*
>> affect existing threads, only future ones.  Correct?
>
> Yes, that's the current state.

Thanks!

>>> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>

[...]

>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>> index 80dd419b39..67d47f4051 100644
>>> --- a/qapi/qom.json
>>> +++ b/qapi/qom.json
>>> @@ -830,6 +830,21 @@
>>>               'reduced-phys-bits': 'uint32',
>>>               '*kernel-hashes': 'bool' } }
>>>
>>> +##
>>> +# @ThreadContextProperties:
>>> +#
>>> +# Properties for thread context objects.
>>> +#
>>> +# @cpu-affinity: the list of CPU numbers used as CPU affinity for all threads
>>> +#                created in the thread context (default: QEMU main thread
>>> +#                affinity)
>>
>> Another ignorant question: is the QEMU main thread affinity fixed or
>> configurable?  If configurable, how?
>
> AFAIK, it's only configurable externally, for example, via "virsh emulatorpin". There is no QEMU interface to adjust that (because it 
> wouldn't work).
>
> Libvirt will essentially trigger "taskset" on the emulator thread to change its CPU affinity.

I see.

QAPI schema
Acked-by: Markus Armbruster <armbru@redhat.com>
David Hildenbrand Oct. 12, 2022, 8:19 a.m. UTC | #4
Thanks Markus!

> I just tested again, and get the same result as you.  I figure my
> previous test was with the complete series.
> 
> PATCH 5 appears to make it work.  Suggest to say something like "The
> commit after next will make this work".

I'll phrase it like " We'll wire this up next to make it work."

[...]

>>> So, when a thread is created, its affinity comes from its thread context
>>> (if any).  When I later change the context's affinity, it does *not*
>>> affect existing threads, only future ones.  Correct?
>>
>> Yes, that's the current state.
> 
> Thanks!
> 

I'm adding

"Note that the CPU affinity of previously created threads will not get 
adjusted."

and

"In general, the interface behaves like pthread_setaffinity_np(): host 
CPU numbers that are currently not available are ignored; only host CPU 
numbers that are impossible with the current kernel will fail. If the 
list of host CPU numbers does not include a single CPU that is 
available, setting the CPU affinity will fail."
Markus Armbruster Oct. 12, 2022, 10:23 a.m. UTC | #5
David Hildenbrand <david@redhat.com> writes:

> Thanks Markus!
>
>> I just tested again, and get the same result as you.  I figure my
>> previous test was with the complete series.
>> PATCH 5 appears to make it work.  Suggest to say something like "The
>> commit after next will make this work".
>
> I'll phrase it like " We'll wire this up next to make it work."

Works for me!

> [...]
>
>>>> So, when a thread is created, its affinity comes from its thread context
>>>> (if any).  When I later change the context's affinity, it does *not*
>>>> affect existing threads, only future ones.  Correct?
>>>
>>> Yes, that's the current state.
>> 
>> Thanks!
>
> I'm adding
>
> "Note that the CPU affinity of previously created threads will not get adjusted."
>
> and
>
> "In general, the interface behaves like pthread_setaffinity_np(): host CPU numbers that are currently not available are ignored; only host CPU 
> numbers that are impossible with the current kernel will fail. If the list of host CPU numbers does not include a single CPU that is 
> available, setting the CPU affinity will fail."

This is one of the reasons why reviewing your work is such a pleasure:
not only do you answer my questions with clarity and patience, you
proactively improve your patches before I can even think to ask.

Thank you!
David Hildenbrand Oct. 12, 2022, 12:27 p.m. UTC | #6
>>> Thanks!
>>
>> I'm adding
>>
>> "Note that the CPU affinity of previously created threads will not get adjusted."
>>
>> and
>>
>> "In general, the interface behaves like pthread_setaffinity_np(): host CPU numbers that are currently not available are ignored; only host CPU
>> numbers that are impossible with the current kernel will fail. If the list of host CPU numbers does not include a single CPU that is
>> available, setting the CPU affinity will fail."
> 
> This is one of the reasons why reviewing your work is such a pleasure:
> not only do you answer my questions with clarity and patience, you
> proactively improve your patches before I can even think to ask.
> 
> Thank you!

Thanks a lot Markus -- I have to say that your reviews are extremely 
helpful! You ask just the right questions that make one realize which 
parts of the documentation need improvement!
Patch

diff --git a/include/qemu/thread-context.h b/include/qemu/thread-context.h
new file mode 100644
index 0000000000..2ebd6b7fe1
--- /dev/null
+++ b/include/qemu/thread-context.h
@@ -0,0 +1,57 @@ 
+/*
+ * QEMU Thread Context
+ *
+ * Copyright Red Hat Inc., 2022
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef SYSEMU_THREAD_CONTEXT_H
+#define SYSEMU_THREAD_CONTEXT_H
+
+#include "qapi/qapi-types-machine.h"
+#include "qemu/thread.h"
+#include "qom/object.h"
+
+#define TYPE_THREAD_CONTEXT "thread-context"
+OBJECT_DECLARE_TYPE(ThreadContext, ThreadContextClass,
+                    THREAD_CONTEXT)
+
+struct ThreadContextClass {
+    ObjectClass parent_class;
+};
+
+struct ThreadContext {
+    /* private */
+    Object parent;
+
+    /* private */
+    unsigned int thread_id;
+    QemuThread thread;
+
+    /* Semaphore to wait for context thread action. */
+    QemuSemaphore sem;
+    /* Semaphore to wait for action in context thread. */
+    QemuSemaphore sem_thread;
+    /* Mutex to synchronize requests. */
+    QemuMutex mutex;
+
+    /* Commands for the thread to execute. */
+    int thread_cmd;
+    void *thread_cmd_data;
+
+    /* CPU affinity bitmap used for initialization. */
+    unsigned long *init_cpu_bitmap;
+    int init_cpu_nbits;
+};
+
+void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
+                                  const char *name,
+                                  void *(*start_routine)(void *), void *arg,
+                                  int mode);
+
+#endif /* SYSEMU_THREAD_CONTEXT_H */
diff --git a/qapi/qom.json b/qapi/qom.json
index 80dd419b39..67d47f4051 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -830,6 +830,21 @@ 
             'reduced-phys-bits': 'uint32',
             '*kernel-hashes': 'bool' } }
 
+##
+# @ThreadContextProperties:
+#
+# Properties for thread context objects.
+#
+# @cpu-affinity: the list of CPU numbers used as CPU affinity for all threads
+#                created in the thread context (default: QEMU main thread
+#                affinity)
+#
+# Since: 7.2
+##
+{ 'struct': 'ThreadContextProperties',
+  'data': { '*cpu-affinity': ['uint16'] } }
+
+
 ##
 # @ObjectType:
 #
@@ -882,6 +897,7 @@ 
     { 'name': 'secret_keyring',
       'if': 'CONFIG_SECRET_KEYRING' },
     'sev-guest',
+    'thread-context',
     's390-pv-guest',
     'throttle-group',
     'tls-creds-anon',
@@ -948,6 +964,7 @@ 
       'secret_keyring':             { 'type': 'SecretKeyringProperties',
                                       'if': 'CONFIG_SECRET_KEYRING' },
       'sev-guest':                  'SevGuestProperties',
+      'thread-context':             'ThreadContextProperties',
       'throttle-group':             'ThrottleGroupProperties',
       'tls-creds-anon':             'TlsCredsAnonProperties',
       'tls-creds-psk':              'TlsCredsPskProperties',
diff --git a/util/meson.build b/util/meson.build
index 5e282130df..e97cd2d779 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -1,4 +1,5 @@ 
 util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
+util_ss.add(files('thread-context.c'))
 if not config_host_data.get('CONFIG_ATOMIC64')
   util_ss.add(files('atomic64.c'))
 endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 905cbc27cc..28305cdea3 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -42,6 +42,7 @@ 
 #include "qemu/cutils.h"
 #include "qemu/compiler.h"
 #include "qemu/units.h"
+#include "qemu/thread-context.h"
 
 #ifdef CONFIG_LINUX
 #include <sys/syscall.h>
diff --git a/util/thread-context.c b/util/thread-context.c
new file mode 100644
index 0000000000..c921905396
--- /dev/null
+++ b/util/thread-context.c
@@ -0,0 +1,278 @@ 
+/*
+ * QEMU Thread Context
+ *
+ * Copyright Red Hat Inc., 2022
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/thread-context.h"
+#include "qapi/error.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qapi/visitor.h"
+#include "qemu/config-file.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qom/object_interfaces.h"
+#include "qemu/module.h"
+#include "qemu/bitmap.h"
+
+enum {
+    TC_CMD_NONE = 0,
+    TC_CMD_STOP,
+    TC_CMD_NEW,
+};
+
+typedef struct ThreadContextCmdNew {
+    QemuThread *thread;
+    const char *name;
+    void *(*start_routine)(void *);
+    void *arg;
+    int mode;
+} ThreadContextCmdNew;
+
+static void *thread_context_run(void *opaque)
+{
+    ThreadContext *tc = opaque;
+
+    tc->thread_id = qemu_get_thread_id();
+    qemu_sem_post(&tc->sem);
+
+    while (true) {
+        /*
+         * Threads inherit the CPU affinity of the creating thread. For this
+         * reason, we create new (especially short-lived) threads from our
+         * persistent context thread.
+         *
+         * Especially when QEMU is not allowed to set the affinity itself,
+         * management tools can simply set the affinity of the context thread
+         * after creating the context, to have new threads created via
+         * the context inherit the CPU affinity automatically.
+         */
+        switch (tc->thread_cmd) {
+        case TC_CMD_NONE:
+            break;
+        case TC_CMD_STOP:
+            tc->thread_cmd = TC_CMD_NONE;
+            qemu_sem_post(&tc->sem);
+            return NULL;
+        case TC_CMD_NEW: {
+            ThreadContextCmdNew *cmd_new = tc->thread_cmd_data;
+
+            qemu_thread_create(cmd_new->thread, cmd_new->name,
+                               cmd_new->start_routine, cmd_new->arg,
+                               cmd_new->mode);
+            tc->thread_cmd = TC_CMD_NONE;
+            tc->thread_cmd_data = NULL;
+            qemu_sem_post(&tc->sem);
+            break;
+        }
+        default:
+            g_assert_not_reached();
+        }
+        qemu_sem_wait(&tc->sem_thread);
+    }
+}
+
+static void thread_context_set_cpu_affinity(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *l, *host_cpus = NULL;
+    unsigned long *bitmap = NULL;
+    int nbits = 0, ret;
+    Error *err = NULL;
+
+    visit_type_uint16List(v, name, &host_cpus, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    if (!host_cpus) {
+        error_setg(errp, "CPU list is empty");
+        goto out;
+    }
+
+    for (l = host_cpus; l; l = l->next) {
+        nbits = MAX(nbits, l->value + 1);
+    }
+    bitmap = bitmap_new(nbits);
+    for (l = host_cpus; l; l = l->next) {
+        set_bit(l->value, bitmap);
+    }
+
+    if (tc->thread_id != -1) {
+        /*
+         * Note: we won't be adjusting the affinity of any thread that is still
+         * around, but only the affinity of the context thread.
+         */
+        ret = qemu_thread_set_affinity(&tc->thread, bitmap, nbits);
+        if (ret) {
+            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
+        }
+    } else {
+        tc->init_cpu_bitmap = bitmap;
+        bitmap = NULL;
+        tc->init_cpu_nbits = nbits;
+    }
+out:
+    g_free(bitmap);
+    qapi_free_uint16List(host_cpus);
+}
+
+static void thread_context_get_cpu_affinity(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    unsigned long *bitmap, nbits, value;
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *host_cpus = NULL;
+    uint16List **tail = &host_cpus;
+    int ret;
+
+    if (tc->thread_id == -1) {
+        error_setg(errp, "Object not initialized yet");
+        return;
+    }
+
+    ret = qemu_thread_get_affinity(&tc->thread, &bitmap, &nbits);
+    if (ret) {
+        error_setg(errp, "Getting CPU affinity failed: %s", strerror(ret));
+        return;
+    }
+
+    value = find_first_bit(bitmap, nbits);
+    while (value < nbits) {
+        QAPI_LIST_APPEND(tail, value);
+
+        value = find_next_bit(bitmap, nbits, value + 1);
+    }
+    g_free(bitmap);
+
+    visit_type_uint16List(v, name, &host_cpus, errp);
+    qapi_free_uint16List(host_cpus);
+}
+
+static void thread_context_get_thread_id(Object *obj, Visitor *v,
+                                         const char *name, void *opaque,
+                                         Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint64_t value = tc->thread_id;
+
+    visit_type_uint64(v, name, &value, errp);
+}
+
+static void thread_context_instance_complete(UserCreatable *uc, Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(uc);
+    char *thread_name;
+    int ret;
+
+    thread_name = g_strdup_printf("TC %s",
+                               object_get_canonical_path_component(OBJECT(uc)));
+    qemu_thread_create(&tc->thread, thread_name, thread_context_run, tc,
+                       QEMU_THREAD_JOINABLE);
+    g_free(thread_name);
+
+    /* Wait until the context thread has set tc->thread_id. */
+    while (tc->thread_id == -1) {
+        qemu_sem_wait(&tc->sem);
+    }
+
+    if (tc->init_cpu_bitmap) {
+        ret = qemu_thread_set_affinity(&tc->thread, tc->init_cpu_bitmap,
+                                       tc->init_cpu_nbits);
+        if (ret) {
+            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
+        }
+        g_free(tc->init_cpu_bitmap);
+        tc->init_cpu_bitmap = NULL;
+    }
+}
+
+static void thread_context_class_init(ObjectClass *oc, void *data)
+{
+    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+
+    ucc->complete = thread_context_instance_complete;
+    object_class_property_add(oc, "thread-id", "int",
+                              thread_context_get_thread_id, NULL, NULL,
+                              NULL);
+    object_class_property_add(oc, "cpu-affinity", "int",
+                              thread_context_get_cpu_affinity,
+                              thread_context_set_cpu_affinity, NULL, NULL);
+}
+
+static void thread_context_instance_init(Object *obj)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+
+    tc->thread_id = -1;
+    qemu_sem_init(&tc->sem, 0);
+    qemu_sem_init(&tc->sem_thread, 0);
+    qemu_mutex_init(&tc->mutex);
+}
+
+static void thread_context_instance_finalize(Object *obj)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+
+    if (tc->thread_id != -1) {
+        tc->thread_cmd = TC_CMD_STOP;
+        qemu_sem_post(&tc->sem_thread);
+        qemu_thread_join(&tc->thread);
+    }
+    qemu_sem_destroy(&tc->sem);
+    qemu_sem_destroy(&tc->sem_thread);
+    qemu_mutex_destroy(&tc->mutex);
+}
+
+static const TypeInfo thread_context_info = {
+    .name = TYPE_THREAD_CONTEXT,
+    .parent = TYPE_OBJECT,
+    .class_init = thread_context_class_init,
+    .instance_size = sizeof(ThreadContext),
+    .instance_init = thread_context_instance_init,
+    .instance_finalize = thread_context_instance_finalize,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void thread_context_register_types(void)
+{
+    type_register_static(&thread_context_info);
+}
+type_init(thread_context_register_types)
+
+void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
+                                  const char *name,
+                                  void *(*start_routine)(void *), void *arg,
+                                  int mode)
+{
+    ThreadContextCmdNew data = {
+        .thread = thread,
+        .name = name,
+        .start_routine = start_routine,
+        .arg = arg,
+        .mode = mode,
+    };
+
+    qemu_mutex_lock(&tc->mutex);
+    tc->thread_cmd = TC_CMD_NEW;
+    tc->thread_cmd_data = &data;
+    qemu_sem_post(&tc->sem_thread);
+
+    while (tc->thread_cmd != TC_CMD_NONE) {
+        qemu_sem_wait(&tc->sem);
+    }
+    qemu_mutex_unlock(&tc->mutex);
+}