@@ -42,6 +42,7 @@ Program types
.. toctree::
:maxdepth: 1
+ prog_cgroup_sockopt
prog_cgroup_sysctl
prog_flow_dissector
new file mode 100644
@@ -0,0 +1,72 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+BPF_PROG_TYPE_CGROUP_SOCKOPT
+============================
+
+``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
+cgroup hooks:
+
+* ``BPF_CGROUP_GETSOCKOPT`` - called every time process executes ``getsockopt``
+ system call.
+* ``BPF_CGROUP_SETSOCKOPT`` - called every time process executes ``setsockopt``
+ system call.
+
+The context (``struct bpf_sockopt``) has associated socket (``sk``) and
+all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.
+
+BPF_CGROUP_SETSOCKOPT
+=====================
+
+``BPF_CGROUP_SETSOCKOPT`` has a read-only context and this hook has
+access to cgroup and socket local storage.
+
+BPF_CGROUP_GETSOCKOPT
+=====================
+
+``BPF_CGROUP_GETSOCKOPT`` has to fill in ``optval`` and adjust
+``optlen`` accordingly. Input ``optlen`` contains the maximum length
+of data that can be returned to the userspace. In other words, BPF
+program can't increase ``optlen``, it can only decrease it.
+
+Return Type
+===========
+
+* ``0`` - reject the syscall, ``EPERM`` will be returned to the userspace.
+* ``1`` - success: after returning from the BPF hook, kernel will also
+ handle this socket option.
+* ``2`` - success: after returning from the BPF hook, kernel will _not_
+ handle this socket option; control will be returned to the userspace
+ instead.
+
+Cgroup Inheritance
+==================
+
+Suppose, there is the following cgroup hierarchy where each cgroup
+has BPF_CGROUP_GETSOCKOPT attached at each level with
+BPF_F_ALLOW_MULTI flag::
+
+ A (root)
+ \
+ B
+ \
+ C
+
+When the application calls getsockopt syscall from the cgroup C,
+the programs are executed from the bottom up: C, B, A. As long as
+BPF programs in the chain return 1, the execution continues. If
+some program in the C, B, A chain returns 2 (bypass kernel) or
+0 (EPERM), the control is immediately passed passed back to the
+userspace. This is in contrast with any existing per-cgroup BPF
+hook where all programs are called, even if some of them return
+0 (EPERM).
+
+In the example above, if C returns 1 (continue) and then B returns
+0 (EPERM) or 2 (bypass kernel), the program attached to A will _not_
+be executed.
+
+Example
+=======
+
+See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
+of BPF program that handles socket options.
Provide user documentation about sockopt prog type and cgroup hooks. v6: * describe cgroup chaining, add example v2: * use return code 2 for kernel bypass Cc: Martin Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> --- Documentation/bpf/index.rst | 1 + Documentation/bpf/prog_cgroup_sockopt.rst | 72 +++++++++++++++++++++++ 2 files changed, 73 insertions(+) create mode 100644 Documentation/bpf/prog_cgroup_sockopt.rst