mbox series

[v2,0/5] Add support for GE SUNH hot-pluggable connector (was: "drm: add support for hot-pluggable bridges")

Message ID 20240510-hotplug-drm-bridge-v2-0-ec32f2c66d56@bootlin.com
Headers show
Series Add support for GE SUNH hot-pluggable connector (was: "drm: add support for hot-pluggable bridges") | expand

Message

Luca Ceresoli May 10, 2024, 7:10 a.m. UTC
Hello,

this series aims at supporting a Linux device with a connector to
physically add and remove an add-on to/from the main device to augment its
features at runtime, using device tree overlays.

This is the v2 of "drm: add support for hot-pluggable bridges" [0] which
was however more limited in scope, covering only the DRM aspects. This new
series also takes a different approach to the DRM bridge instantiation.

[0] https://lore.kernel.org/all/20240326-hotplug-drm-bridge-v1-0-4b51b5eb75d5@bootlin.com/

Use case
========

This series targets a professional product (GE SUNH) that is composed of a
"main" part running on battery, with the main SoC and able to work
autonomously with limited features, and an optional "add-on" that enables
more features by adding more hardware peripherals, some of which are on
non-discoverable busses such as I2C and MIPI DSI.

The add-on can be connected and disconnected at runtime at any moment by
the end user, and add-on features need to be enabled and disabled
automatically at runtime.

The add-on has status pins that are connected to GPIOs on the main board,
allowing the CPU to detect add-on insertion and removal. It also has a
reset GPIO allowign to reset all peripherals on the add-on at once.

The features provided by the add-on include a display and a battery charger
to recharge the battery of the main part. The display on the add-on has an
LVDS input but the connector between the base and the add-on has a MIPI DSI
bus, so a DSI-to-LVDS bridge is present on the add-on.

Different add-on models can be connected to the main part, and for this a
model ID is stored in the add-on itself so the software running on the CPU
on the main part knows which non-discoverable hardware to probe.

Overall approach
================

Device tree overlays appear as the most natural solution to support the
addition and removal of devices from a running system.

Several features are missing from the mainline Linux kernel in order to
support this use case:

 1. runtime (un)loading of device tree overlays is not supported
 2. if enabled, overlay (un)loading exposes several bugs
 3. the DRM subsystem assumes video bridges are non-removable

This series targets items 1 and 3. Item 2 is being handled separately (see
"Device tree overlay issues" below).

Device tree representation and connector driver
===============================================

The device tree description we propose involves 3 main parts.

1: the main (fixed) device tree

The main device tree describes the connector itself along with the status
and reset GPIOs. It also provides a 'ports' node to describe how the two
sides of the MIPI DSI bus connect to each other. So now, differently from
v1, there is no standalone representation of the DRM bridge because it is
not really a hardware component. Here is how the connector is represented
in the fixed part of the device tree:

    / {
        #include <dt-bindings/gpio/gpio.h>

        addon_connector: addon-connector {
            compatible = "ge,sunh-addon-connector";
            reset-gpios = <&gpio1 1 GPIO_ACTIVE_LOW>;
            plugged-gpios = <&gpio1 2 GPIO_ACTIVE_LOW>;
            powergood-gpios = <&gpio1 3 GPIO_ACTIVE_HIGH>;

            ports {
                #address-cells = <1>;
                #size-cells = <0>;

                port@0 {
                    reg = <0>;

                    hotplug_conn_dsi_in: endpoint {
                        remote-endpoint = <&previous_bridge_out>;
                    };
                };

                port@1 {
                    reg = <1>;

                    hotplug_conn_dsi_out: endpoint {
                        // remote-endpoint to be added by overlay
                    };
                };
            };
        };
    };

The connector has a specific compatible string, and this series adds a
driver supporting it. This driver uses the deivce tree overlay loading and
unloading facilities already implemented by the kernel but not currently
exposed. The driver detects the connection status from the GPIOs and reacts
to a connection event by loading a first overlay (the "base" overlay).

2: the "base" overlay

The "base" overlay describes the common components that are required to
read the model ID. These are identical for all add-on models, thus only one
"base" overlay is needed:

    &i2c1 {
        #address-cells = <1>;
        #size-cells = <0>;

        eeprom@50 {
            compatible = "atmel,24c64";
            reg = <0x50>;

            nvmem-layout {
                compatible = "fixed-layout";
                #address-cells = <1>;
                #size-cells = <1>;

                addon_model_id: addon-model-id@400 {
                    reg = <0x400 0x1>;
                };
            };
        };
    };

    &addon_connector {
        nvmem-cells = <&addon_model_id>;
        nvmem-cell-names = "id";
    };

Here "i2c1" is an I2C bus whose adapter is on the main part (possibly with
some clients) but whose lines extend through the connector to the add-on
where the EEPROM is (possibly with more clients). The EEPROM holds the
model ID of each add-on, using always the same I2C address and memory
offset.

The '&addon_connector' section provides the "glue" to describe how the
NVMEM cell can be accessed via the connector. This allows the connector
driver to read the model ID.

3: the "add-on-specific" overlay

Based on the model ID, the connector driver loads the second overlay, which
describes all the add-on hardware not yet described by the base
overlay. This overlay is model-specific.

    &hotplug_conn_dsi_out {
        remote-endpoint = <&addon_bridge_in>;
    };

    &i2c1 {
        #address-cells = <1>;
        #size-cells = <0>;

        dsi-lvds-bridge@42 {
            compatible = "some-video-bridge";
            reg = <0x42>;

            ports {
                #address-cells = <1>;
                #size-cells = <0>;

                port@0 {
                    reg = <0>;

                    addon_bridge_in: endpoint {
                        remote-endpoint = <&hotplug_conn_dsi_out>;
                        data-lanes = <1 2 3 4>;
                    };
                };

                port@1 {
                    reg = <1>;

                    addon_bridge_out: endpoint {
                        remote-endpoint = <&panel_in>;
                    };
                };
            };
        };
    };

    &{/}
    {
    panel {
        ports {
            #address-cells = <1>;
            #size-cells = <0>;

            port@0{
                reg = <0>;

                panel_in: endpoint {
                    remote-endpoint = <&addon_bridge_out>;
                };
            };
        };
    };

The '&hotplug_conn_dsi_out' section fills the port@1 section that was
initially missing, in order to point to the continuation of the MIPI DSI
line. The rest of this overlay just completes the video pipeline. In the
'&i2c1' section, another I2C client is added to a bus, just as done for the
EEPROM.

After these steps, the add-on is fully described and working on the
system. When the status GPIOs report a disconnection, the overlays are
unloaded in reverse order.

DRM hotplug bridge driver
=========================

DRM natively supports pipelines whose display can be removed, but all the
components preceding it (all the display controller and any bridges) are
assumed to be fixed and cannot be plugged, removed or modified at runtime.

This series adds support for DRM pipelines having a removable part after
the encoder, thus also allowing bridges to be removed and reconnected at
runtime, possibly with different components.

This picture summarizes the  DRM structure implemented by this series:

 .------------------------.
 |   DISPLAY CONTROLLER   |
 | .---------.   .------. |
 | | ENCODER |<--| CRTC | |
 | '---------'   '------' |
 '------|-----------------'
        |
        |DSI            HOTPLUG
        V              CONNECTOR
   .---------.        .--.    .-.        .---------.         .-------.
   | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
   | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
   |         |        |  |    | |        |         |         |       |
   '---------'        '--'    '-'        '---------'         '-------'

 [--- fixed components --]  [----------- removable add-on -----------]

Fixed components include:

 * all components up to the DRM encoder, usually part of the SoC
 * optionally some bridges, in the SoC and/or as external chips

Components on the removable add-on include:

 * one or more bridges
 * a fixed connector (not one natively supporting hotplug such as HDMI)
 * the panel

The video bus is MIPI DSI in the example and in the implementation provided
by this series, but the implementation is meant to allow generalization to
other video busses without native hotplug support, such as parallel video
and LVDS.

Note that the term "connector" in this context has nothing to do with the
"DRM connector" abstraction already present in the DRM subsystem (struct
drm_connector).

In order to support the above use case while being reasonably generic and
self-contained, the implementation is based on the introduction of the
"hotplug-bridge". The hotplug-bridge implementation is as transparent as
possible. In a nutshell, it decouples the preceding and the following
components: the upstream components will believe that the pipeline is
always there, whil the downstream components will undergo a normal
probe/remove process on insertion/removal as if the entire DRM pipeline
were being added/removed. This means all the other DRM components can be
implemented normally, without having to even be aware of hot-plugging.

The hotplug-bridge it is based on a few self-contained additions to
drm_bridge and drm_encoder, which are provided as individual patches in
this series, and does not require any modifications to other bridges.
However the implementation has some tricky aspects that make it more
complex than in an ideal design. See the patch adding the driver for the
details.

Outstanding bugs and issues
===========================

Unsurprisingly, enabling device tree overlay loading/unloading at runtime
is exposing a number of issues. While testing this work and another,
unrelated work also using overlay insertion/removal [1] we ancountered
several and we are working on solving them one by one. Here is a list of
the issues for which we have sent some patches:

 1. ERROR: remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
    - https://lore.kernel.org/all/20240227113426.253232-1-herve.codina@bootlin.com/
 2. leds: gpio: Add devlink between the leds-gpio device and the gpio used
    - https://lore.kernel.org/all/20240220133950.138452-1-herve.codina@bootlin.com/
 3. kobject: 'gpiochip8' ((____ptrval____)): is not initialized, yet kobject_get() is being called.
    - https://lore.kernel.org/all/CAGETcx_YjRzA0joyESsgk=XJKBqqFD7YZeSwKu1a1deo-EyeKw@mail.gmail.com/

Overlay removal is also known for memory leaks of some properties (the
"deadprops" list). This is being examined for a proper solution.

An issue related to devlink appeared since commit 782bfd03c3ae ("of:
property: Improve finding the supplier of a remote-endpoint property"),
merged in v6.8-rc5, as reported in:

 https://lore.kernel.org/all/20240223171849.10f9901d@booty

This is on my todo list as well. The current workaround is reverting
782bfd03c3ae.

Finally, a known issue is related to NVMEM. The connector driver uses NVMEM
notifiers to be notified of cell addition. This works reliably assuming the
cell to be available when the notification fucntion is called, but this
does not happen. Two alternative patches have been sent that would fix
this:

 - https://lore.kernel.org/all/20240130160021.70ddef92@xps-13/
 - https://lore.kernel.org/all/20231229-nvmem-cell-add-notification-v1-1-8d8b426be9f9@bootlin.com/

There was no activity recently about this. Continuing this work in on my
todo list.

[1] https://lore.kernel.org/all/20240430083730.134918-1-herve.codina@bootlin.com

That's all
==========

Thanks for you patience in reading this!

Luca

Changes in v2:
- Added bindings and driver for ge,sunh-addon-connector
- Removed bindings for the hotplug-video-connector, this is now represented
  in DT as part of the ge,sunh-addon-connector
- Various monior improvements to the DRM hotplug-bridge driver
- Link to v1: https://lore.kernel.org/r/20240326-hotplug-drm-bridge-v1-0-4b51b5eb75d5@bootlin.com

Co-developed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
---
Luca Ceresoli (4):
      dt-bindings: connector: add GE SUNH hotplug addon connector
      drm/encoder: add drm_encoder_cleanup_from()
      drm/bridge: hotplug-bridge: add driver to support hot-pluggable DSI bridges
      misc: add ge-addon-connector driver

Paul Kocialkowski (1):
      drm/bridge: add bridge notifier to be notified of bridge addition and removal

 .../connector/ge,sunh-addon-connector.yaml         | 197 +++++++
 MAINTAINERS                                        |  11 +
 drivers/gpu/drm/bridge/Kconfig                     |  15 +
 drivers/gpu/drm/bridge/Makefile                    |   1 +
 drivers/gpu/drm/bridge/hotplug-bridge.c            | 577 +++++++++++++++++++++
 drivers/gpu/drm/drm_bridge.c                       |  35 ++
 drivers/gpu/drm/drm_encoder.c                      |  21 +
 drivers/misc/Kconfig                               |  15 +
 drivers/misc/Makefile                              |   1 +
 drivers/misc/ge-sunh-connector.c                   | 464 +++++++++++++++++
 include/drm/drm_bridge.h                           |  19 +
 include/drm/drm_encoder.h                          |   1 +
 12 files changed, 1357 insertions(+)
---
base-commit: 5e3810587f43a24d2c568b7e08fcff7ce05d71a9
change-id: 20240319-hotplug-drm-bridge-16b86e67fe92

Best regards,

Comments

Greg Kroah-Hartman May 10, 2024, 7:55 a.m. UTC | #1
On Fri, May 10, 2024 at 09:10:41AM +0200, Luca Ceresoli wrote:
> Add a driver to support the runtime hot-pluggable add-on connector on the
> GE SUNH device. This connector allows connecting and disconnecting an
> add-on to/from the main device to augment its features. Connection and
> disconnection can happen at runtime at any moment without notice.
> 
> Different add-on models can be connected, and each has an EEPROM with a
> model identifier at a fixed address.
> 
> The add-on hardware is added and removed using device tree overlay loading
> and unloading.
> 
> Co-developed-by: Herve Codina <herve.codina@bootlin.com>
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> 
> ---
> 
> This commit is new in v2.
> ---
>  MAINTAINERS                      |   1 +
>  drivers/misc/Kconfig             |  15 ++
>  drivers/misc/Makefile            |   1 +
>  drivers/misc/ge-sunh-connector.c | 464 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 481 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 672c26372c92..0bdb4fc496b8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9905,6 +9905,7 @@ F:	drivers/iio/pressure/mprls0025pa*
>  HOTPLUG CONNECTOR FOR GE SUNH ADDONS
>  M:	Luca Ceresoli <luca.ceresoli@bootlin.com>
>  S:	Maintained
> +F:	drivers/misc/ge-sunh-connector.c
>  F:	Documentation/devicetree/bindings/connector/ge,sunh-addon-connector.yaml
>  
>  HP BIOSCFG DRIVER
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 4fb291f0bf7c..99ef2eccbbaa 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -574,6 +574,21 @@ config NSM
>  	  To compile this driver as a module, choose M here.
>  	  The module will be called nsm.
>  
> +config GE_SUNH_CONNECTOR
> +	tristate "GE SUNH hotplug add-on connector"
> +	depends on OF
> +	select OF_OVERLAY
> +	select FW_LOADER
> +	select NVMEM
> +	select DRM_HOTPLUG_BRIDGE

Can these be depends instead of select?  'select' causes dependencies
that are hard, if not almost impossible, to detect at times why
something is being enabled.

> +	help
> +	  Driver for the runtime hot-pluggable add-on connector on the GE SUNH
> +	  device. This connector allows connecting and disconnecting an add-on
> +	  to/from the main device to augment its features. Connection and
> +	  disconnection can be done at runtime at any moment without
> +	  notice. Different add-on models can be connected, and each has an EEPROM
> +	  with a model identifier at a fixed address.

Module name?


> +static void sunh_conn_reset(struct sunh_conn *conn, bool keep_reset)
> +{
> +	dev_dbg(conn->dev, "reset\n");

ftrace is your friend.

> +static int sunh_conn_handle_event(struct sunh_conn *conn, bool plugged)
> +{
> +	int err;
> +
> +	if (plugged == conn->plugged)
> +		return 0;
> +
> +	dev_info(conn->dev, "%s\n", plugged ? "connected" : "disconnected");

Please remove debugging code from stuff you want to see merged.

Same for all dev_info() calls here, when drivers work properly, they are
quiet.

thanks,

greg k-h
Arnd Bergmann May 10, 2024, 10:24 a.m. UTC | #2
On Fri, May 10, 2024, at 09:55, Greg Kroah-Hartman wrote:
> On Fri, May 10, 2024 at 09:10:41AM +0200, Luca Ceresoli wrote:
>>  
>> +config GE_SUNH_CONNECTOR
>> +	tristate "GE SUNH hotplug add-on connector"
>> +	depends on OF
>> +	select OF_OVERLAY
>> +	select FW_LOADER
>> +	select NVMEM
>> +	select DRM_HOTPLUG_BRIDGE
>
> Can these be depends instead of select?  'select' causes dependencies
> that are hard, if not almost impossible, to detect at times why
> something is being enabled.

I think FW_LOADER needs to be 'select' since it is normally
a hidden symbol and gets selected by its users, all the other
ones should be 'depends on'.

    Arnd
Luca Ceresoli May 10, 2024, 10:54 a.m. UTC | #3
Hello Greg,

thanks for reviewing.

On Fri, 10 May 2024 08:55:29 +0100
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> On Fri, May 10, 2024 at 09:10:41AM +0200, Luca Ceresoli wrote:
> > Add a driver to support the runtime hot-pluggable add-on connector on the
> > GE SUNH device. This connector allows connecting and disconnecting an
> > add-on to/from the main device to augment its features. Connection and
> > disconnection can happen at runtime at any moment without notice.
> > 
> > Different add-on models can be connected, and each has an EEPROM with a
> > model identifier at a fixed address.
> > 
> > The add-on hardware is added and removed using device tree overlay loading
> > and unloading.
> > 
> > Co-developed-by: Herve Codina <herve.codina@bootlin.com>
> > Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> > Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> > 
> > ---
> > 
> > This commit is new in v2.
> > ---
> >  MAINTAINERS                      |   1 +
> >  drivers/misc/Kconfig             |  15 ++
> >  drivers/misc/Makefile            |   1 +
> >  drivers/misc/ge-sunh-connector.c | 464 +++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 481 insertions(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 672c26372c92..0bdb4fc496b8 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -9905,6 +9905,7 @@ F:	drivers/iio/pressure/mprls0025pa*
> >  HOTPLUG CONNECTOR FOR GE SUNH ADDONS
> >  M:	Luca Ceresoli <luca.ceresoli@bootlin.com>
> >  S:	Maintained
> > +F:	drivers/misc/ge-sunh-connector.c
> >  F:	Documentation/devicetree/bindings/connector/ge,sunh-addon-connector.yaml
> >  
> >  HP BIOSCFG DRIVER
> > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > index 4fb291f0bf7c..99ef2eccbbaa 100644
> > --- a/drivers/misc/Kconfig
> > +++ b/drivers/misc/Kconfig
> > @@ -574,6 +574,21 @@ config NSM
> >  	  To compile this driver as a module, choose M here.
> >  	  The module will be called nsm.
> >  
> > +config GE_SUNH_CONNECTOR
> > +	tristate "GE SUNH hotplug add-on connector"
> > +	depends on OF
> > +	select OF_OVERLAY
> > +	select FW_LOADER
> > +	select NVMEM
> > +	select DRM_HOTPLUG_BRIDGE  
> 
> Can these be depends instead of select?  'select' causes dependencies
> that are hard, if not almost impossible, to detect at times why
> something is being enabled.

(see reply to Arnd's follow-up e-mail for this)

> > +	help
> > +	  Driver for the runtime hot-pluggable add-on connector on the GE SUNH
> > +	  device. This connector allows connecting and disconnecting an add-on
> > +	  to/from the main device to augment its features. Connection and
> > +	  disconnection can be done at runtime at any moment without
> > +	  notice. Different add-on models can be connected, and each has an EEPROM
> > +	  with a model identifier at a fixed address.  
> 
> Module name?

OK, will add.

> > +static void sunh_conn_reset(struct sunh_conn *conn, bool keep_reset)
> > +{
> > +	dev_dbg(conn->dev, "reset\n");  
> 
> ftrace is your friend.

ACK.

> > +static int sunh_conn_handle_event(struct sunh_conn *conn, bool plugged)
> > +{
> > +	int err;
> > +
> > +	if (plugged == conn->plugged)
> > +		return 0;
> > +
> > +	dev_info(conn->dev, "%s\n", plugged ? "connected" : "disconnected");  
> 
> Please remove debugging code from stuff you want to see merged.
> 
> Same for all dev_info() calls here, when drivers work properly, they are
> quiet.

While agree for other dev_info() calls, this one seems quite similar in
principle to the link up/down messages that get logged by the MII code
at [0]:

  [347229.872315] asix 1-1.3.2:1.0 enx000cf616fecb: link up, 100Mbps,
  full-duplex, lpa 0xC5E1 [347229.920449] asix 1-1.3.2:1.0 enx000cf616fecb: link down

In my case it is logging that a removable part of the hardware has been
added or removed, which appears useful. Do you think it make sense in
this scenario?

Luca

[0] https://elixir.bootlin.com/linux/v6.8.9/source/drivers/net/mii.c#L557
Luca Ceresoli May 10, 2024, 10:54 a.m. UTC | #4
Hello Greg, Arnd,

On Fri, 10 May 2024 12:24:06 +0200
"Arnd Bergmann" <arnd@arndb.de> wrote:

> On Fri, May 10, 2024, at 09:55, Greg Kroah-Hartman wrote:
> > On Fri, May 10, 2024 at 09:10:41AM +0200, Luca Ceresoli wrote:  
> >>  
> >> +config GE_SUNH_CONNECTOR
> >> +	tristate "GE SUNH hotplug add-on connector"
> >> +	depends on OF
> >> +	select OF_OVERLAY
> >> +	select FW_LOADER
> >> +	select NVMEM
> >> +	select DRM_HOTPLUG_BRIDGE  
> >
> > Can these be depends instead of select?  'select' causes dependencies
> > that are hard, if not almost impossible, to detect at times why
> > something is being enabled.  
> 
> I think FW_LOADER needs to be 'select' since it is normally
> a hidden symbol and gets selected by its users, all the other
> ones should be 'depends on'.

I see, makes sense.

And as you pointed that out, I realize perhaps DRM_HOTPLUG_BRIDGE could
become a hidden symbol as it's not expected to be used alone.

Luca
Arnd Bergmann May 10, 2024, 10:57 a.m. UTC | #5
On Fri, May 10, 2024, at 12:54, Luca Ceresoli wrote:
> On Fri, 10 May 2024 12:24:06 +0200 "Arnd Bergmann" <arnd@arndb.de> wrote:
>> On Fri, May 10, 2024, at 09:55, Greg Kroah-Hartman wrote:
>> > On Fri, May 10, 2024 at 09:10:41AM +0200, Luca Ceresoli wrote:  
>> >>  
>> >> +config GE_SUNH_CONNECTOR
>> >> +	tristate "GE SUNH hotplug add-on connector"
>> >> +	depends on OF
>> >> +	select OF_OVERLAY
>> >> +	select FW_LOADER
>> >> +	select NVMEM
>> >> +	select DRM_HOTPLUG_BRIDGE  
>> >
>> > Can these be depends instead of select?  'select' causes dependencies
>> > that are hard, if not almost impossible, to detect at times why
>> > something is being enabled.  
>> 
>> I think FW_LOADER needs to be 'select' since it is normally
>> a hidden symbol and gets selected by its users, all the other
>> ones should be 'depends on'.
>
> I see, makes sense.
>
> And as you pointed that out, I realize perhaps DRM_HOTPLUG_BRIDGE could
> become a hidden symbol as it's not expected to be used alone.

It's slightly easier to keep it as a visible symbol
with 'depends on' though, since otherwise you have to
add 'depends on' statments for anything that DRM_HOTPLUG_BRIDGE
in turn depends on, most notably DRM itself.

     Arnd
Greg Kroah-Hartman May 10, 2024, 11:03 a.m. UTC | #6
On Fri, May 10, 2024 at 12:54:17PM +0200, Luca Ceresoli wrote:
> > > +static int sunh_conn_handle_event(struct sunh_conn *conn, bool plugged)
> > > +{
> > > +	int err;
> > > +
> > > +	if (plugged == conn->plugged)
> > > +		return 0;
> > > +
> > > +	dev_info(conn->dev, "%s\n", plugged ? "connected" : "disconnected");  
> > 
> > Please remove debugging code from stuff you want to see merged.
> > 
> > Same for all dev_info() calls here, when drivers work properly, they are
> > quiet.
> 
> While agree for other dev_info() calls, this one seems quite similar in
> principle to the link up/down messages that get logged by the MII code
> at [0]:
> 
>   [347229.872315] asix 1-1.3.2:1.0 enx000cf616fecb: link up, 100Mbps,
>   full-duplex, lpa 0xC5E1 [347229.920449] asix 1-1.3.2:1.0 enx000cf616fecb: link down
> 
> In my case it is logging that a removable part of the hardware has been
> added or removed, which appears useful. Do you think it make sense in
> this scenario?

Nope, sorry, again, when drivers are working properly, they should be
quiet otherwise they just fill up the log with unneeded messages.

thanks,

greg k-h
Luca Ceresoli May 10, 2024, 3:32 p.m. UTC | #7
Hi Greg, Arnd,

On Fri, 10 May 2024 12:57:24 +0200
"Arnd Bergmann" <arnd@arndb.de> wrote:

> On Fri, May 10, 2024, at 12:54, Luca Ceresoli wrote:
> > On Fri, 10 May 2024 12:24:06 +0200 "Arnd Bergmann" <arnd@arndb.de> wrote:  
> >> On Fri, May 10, 2024, at 09:55, Greg Kroah-Hartman wrote:  
> >> > On Fri, May 10, 2024 at 09:10:41AM +0200, Luca Ceresoli wrote:    
> >> >>  
> >> >> +config GE_SUNH_CONNECTOR
> >> >> +	tristate "GE SUNH hotplug add-on connector"
> >> >> +	depends on OF
> >> >> +	select OF_OVERLAY
> >> >> +	select FW_LOADER
> >> >> +	select NVMEM
> >> >> +	select DRM_HOTPLUG_BRIDGE    
> >> >
> >> > Can these be depends instead of select?  'select' causes dependencies
> >> > that are hard, if not almost impossible, to detect at times why
> >> > something is being enabled.    
> >> 
> >> I think FW_LOADER needs to be 'select' since it is normally
> >> a hidden symbol and gets selected by its users, all the other
> >> ones should be 'depends on'.  
> >
> > I see, makes sense.
> >
> > And as you pointed that out, I realize perhaps DRM_HOTPLUG_BRIDGE could
> > become a hidden symbol as it's not expected to be used alone.  
> 
> It's slightly easier to keep it as a visible symbol
> with 'depends on' though, since otherwise you have to
> add 'depends on' statments for anything that DRM_HOTPLUG_BRIDGE
> in turn depends on, most notably DRM itself.

I see, sure. Thanks both, changes applied locally.

Luca
Rob Herring May 10, 2024, 4:44 p.m. UTC | #8
On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:
> Hello,
> 
> this series aims at supporting a Linux device with a connector to
> physically add and remove an add-on to/from the main device to augment its
> features at runtime, using device tree overlays.
> 
> This is the v2 of "drm: add support for hot-pluggable bridges" [0] which
> was however more limited in scope, covering only the DRM aspects. This new
> series also takes a different approach to the DRM bridge instantiation.
> 
> [0] https://lore.kernel.org/all/20240326-hotplug-drm-bridge-v1-0-4b51b5eb75d5@bootlin.com/
> 
> Use case
> ========
> 
> This series targets a professional product (GE SUNH) that is composed of a
> "main" part running on battery, with the main SoC and able to work
> autonomously with limited features, and an optional "add-on" that enables
> more features by adding more hardware peripherals, some of which are on
> non-discoverable busses such as I2C and MIPI DSI.
> 
> The add-on can be connected and disconnected at runtime at any moment by
> the end user, and add-on features need to be enabled and disabled
> automatically at runtime.
> 
> The add-on has status pins that are connected to GPIOs on the main board,
> allowing the CPU to detect add-on insertion and removal. It also has a
> reset GPIO allowign to reset all peripherals on the add-on at once.
> 
> The features provided by the add-on include a display and a battery charger
> to recharge the battery of the main part. The display on the add-on has an
> LVDS input but the connector between the base and the add-on has a MIPI DSI
> bus, so a DSI-to-LVDS bridge is present on the add-on.
> 
> Different add-on models can be connected to the main part, and for this a
> model ID is stored in the add-on itself so the software running on the CPU
> on the main part knows which non-discoverable hardware to probe.
> 
> Overall approach
> ================
> 
> Device tree overlays appear as the most natural solution to support the
> addition and removal of devices from a running system.
> 
> Several features are missing from the mainline Linux kernel in order to
> support this use case:
> 
>  1. runtime (un)loading of device tree overlays is not supported

Not true. Device specific applying of overlays has been supported 
since we merged DT overlay support. What's not supported is a general 
purpose interface to userspace to change any part of the DT at any point 
in time.

>  2. if enabled, overlay (un)loading exposes several bugs

Hence why there is no general purpose interface.

>  3. the DRM subsystem assumes video bridges are non-removable

Rob
Luca Ceresoli May 14, 2024, 5:11 p.m. UTC | #9
Hello Rob,

On Fri, 10 May 2024 11:44:49 -0500
Rob Herring <robh@kernel.org> wrote:

> On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:

[...]

> > Overall approach
> > ================
> > 
> > Device tree overlays appear as the most natural solution to support the
> > addition and removal of devices from a running system.
> > 
> > Several features are missing from the mainline Linux kernel in order to
> > support this use case:
> > 
> >  1. runtime (un)loading of device tree overlays is not supported  
> 
> Not true. Device specific applying of overlays has been supported 
> since we merged DT overlay support. What's not supported is a general 
> purpose interface to userspace to change any part of the DT at any point 
> in time.

I think I should replace "supported" with "exposed [to user space]". In
other words, there is no user of of_overlay_fdt_apply() /
of_overlay_remove*() in mainline Linux, except in unittest. Would it be
a correct rewording?

> >  2. if enabled, overlay (un)loading exposes several bugs  
> 
> Hence why there is no general purpose interface.

Absolutely.

Best regards,
Luca
Daniel Vetter May 16, 2024, 1:22 p.m. UTC | #10
Apologies for missing v1 ...

On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:
> DRM hotplug bridge driver
> =========================
> 
> DRM natively supports pipelines whose display can be removed, but all the
> components preceding it (all the display controller and any bridges) are
> assumed to be fixed and cannot be plugged, removed or modified at runtime.
> 
> This series adds support for DRM pipelines having a removable part after
> the encoder, thus also allowing bridges to be removed and reconnected at
> runtime, possibly with different components.
> 
> This picture summarizes the  DRM structure implemented by this series:
> 
>  .------------------------.
>  |   DISPLAY CONTROLLER   |
>  | .---------.   .------. |
>  | | ENCODER |<--| CRTC | |
>  | '---------'   '------' |
>  '------|-----------------'
>         |
>         |DSI            HOTPLUG
>         V              CONNECTOR
>    .---------.        .--.    .-.        .---------.         .-------.
>    | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
>    | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
>    |         |        |  |    | |        |         |         |       |
>    '---------'        '--'    '-'        '---------'         '-------'
> 
>  [--- fixed components --]  [----------- removable add-on -----------]
> 
> Fixed components include:
> 
>  * all components up to the DRM encoder, usually part of the SoC
>  * optionally some bridges, in the SoC and/or as external chips
> 
> Components on the removable add-on include:
> 
>  * one or more bridges
>  * a fixed connector (not one natively supporting hotplug such as HDMI)
>  * the panel

So I think at a high level this design approach makes sense, but the
implementation needs some serious thought. One big thing upfront though,
we need to have a clear plan for the overlay hotunload issues, otherwise
trying to make drm bridges hotpluggable makes no sense to me. Hotunload is
very, very tricky, full of lifetime issues, and those need to be sorted
out first or we're just trying to build a castle on quicksand.

For bridges itself I don't think the current locking works. You're trying
to really cleverly hide it all behind a normal-looking bridge driver, but
there's many things beyond that which will blow up if bridges just
disappear. Most importantly the bridge states part of an atomic update.

Now in drm we have drm_connector as the only hotunpluggable thing, and it
took years to sort out all the issues. I think we should either model the
bridge hotunplug locking after that, or just outright reuse the connector
locking and lifetime rules. I much prefer the latter personally.

Anyway the big issues:

- We need to refcount the hotpluggable bridges, because software (like
  atomic state updates) might hang onto pointers for longer than the
  bridge physically exists. Assuming that you can all tear it down
  synchronously will not work.

  If we reuse connector locking/lifetime then we could put the
  hotpluggable part of the bridge chain into the drm_connector, since that
  already has refcounting as needed. It would mean that finding the next
  bridge in the chain becomes a lot more tricky though. With that model
  we'd create a new connector every time the bridge is hotplugged, which I
  think is also the cleaner model (because you might plug in a hdmi
  connector after a panel, so things like the connector type change).
  
- No notifiers please. The create a locking mess with inversions, and
  especially for hotunplug they create the illusion that you can
  synchronously keep up to date with hardware state. That's not possible.
  Fundamentally all bridge drivers which might be hotunplugged need to be
  able to cope with the hardware disappearing any momemnt.

  Most likely changes/fixes we need to make overlay hotunload work will
  impact how exactly this works all ...

  Also note that the entire dance around correctly stopping userspace from
  doing modesets on, see all the relevant changes in
  update_connector_routing(). Relying on hotplugging connectors will sort
  out a lot of these issues in a consistent way.

- Related to this: You're not allowed to shut down hardware behind the
  user's back with drm_atomic_helper_shutdown. We've tried that approach
  with dp mst, it really pisses off userspace when a page_flip that it
  expected to work doesn't work.

- There's also the design aspect that in atomic, only atomic_check is
  allowed to fail, atomic_commit must succeed, even when the hardware is
  gone. Using connectors and their refcounting should help with that.

- Somewhat aside, but I noticed that the bridge->atomic_reset is in
  drm_bridge_attach, and that's kinda the wrong place. It should be in
  drm_mode_config_reset, like all the other ->atomic_reset hooks. That
  would make it a lot clearer that we need to figure out who/when
  ->atomic_reset should be called for hotplugged bridges, maybe as part of
  connector registration when the entire bridge and it's new connector is
  assembled?

- Finally this very much means we need to rethink who/how the connector
  for a bridge is created. The new design is that the main driver creates
  this connector, once the entire bridge exists. But with hotplugging this
  gets a lot more complicated, so we might want to extract a pile of that
  encoder related code from drivers (same way dp mst helpers take care of
  connector creation too, it's just too much of a mess otherwise).

  The current bridge chaining infrastructure requires a lot of hand-rolled
  code in each bridge driver and the encoder, so that might be a good
  thing anyway.

- Finally I think the entire bridge hotplug infrastructure should be
  irrespective of the underlying bus. Which means for the mipi dsi case we
  might also want to look into what's missing to make mipi dsi
  hotunpluggable, at least for the case where it's a proper driver. I
  think we should ignore the old bridge model where driver's stitched it
  all toghether using the component framework, in my opinion that approach
  should be deprecated.

- Finally I think we should have a lot of safety checks, like only bridges
  which declare themselve to be hotunplug safe should be allowed as a part
  of the hotpluggable bridge chain part. All others must still be attached
  before the entire driver is registered with drm_dev_register.

  Or that we only allow bridges with the NO_CONNECTOR flag for
  drm_bridge_attach.

There's probably a pile more fundamental issues I've missed, but this
should get a good discussion started.
-Sima
Luca Ceresoli May 20, 2024, 12:01 p.m. UTC | #11
Hello Daniel,

On Thu, 16 May 2024 15:22:01 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> Apologies for missing v1 ...
> 
> On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:
> > DRM hotplug bridge driver
> > =========================
> > 
> > DRM natively supports pipelines whose display can be removed, but all the
> > components preceding it (all the display controller and any bridges) are
> > assumed to be fixed and cannot be plugged, removed or modified at runtime.
> > 
> > This series adds support for DRM pipelines having a removable part after
> > the encoder, thus also allowing bridges to be removed and reconnected at
> > runtime, possibly with different components.
> > 
> > This picture summarizes the  DRM structure implemented by this series:
> > 
> >  .------------------------.
> >  |   DISPLAY CONTROLLER   |
> >  | .---------.   .------. |
> >  | | ENCODER |<--| CRTC | |
> >  | '---------'   '------' |
> >  '------|-----------------'
> >         |
> >         |DSI            HOTPLUG
> >         V              CONNECTOR
> >    .---------.        .--.    .-.        .---------.         .-------.
> >    | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
> >    | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
> >    |         |        |  |    | |        |         |         |       |
> >    '---------'        '--'    '-'        '---------'         '-------'
> > 
> >  [--- fixed components --]  [----------- removable add-on -----------]
> > 
> > Fixed components include:
> > 
> >  * all components up to the DRM encoder, usually part of the SoC
> >  * optionally some bridges, in the SoC and/or as external chips
> > 
> > Components on the removable add-on include:
> > 
> >  * one or more bridges
> >  * a fixed connector (not one natively supporting hotplug such as HDMI)
> >  * the panel  
> 
> So I think at a high level this design approach makes sense,

Good starting point :)

> but the
> implementation needs some serious thought. One big thing upfront though,
> we need to have a clear plan for the overlay hotunload issues, otherwise
> trying to make drm bridges hotpluggable makes no sense to me. Hotunload is
> very, very tricky, full of lifetime issues, and those need to be sorted
> out first or we're just trying to build a castle on quicksand.
> 
> For bridges itself I don't think the current locking works. You're trying
> to really cleverly hide it all behind a normal-looking bridge driver, but
> there's many things beyond that which will blow up if bridges just
> disappear. Most importantly the bridge states part of an atomic update.

Surely possible as atomic updates are definitely not stimulated in my
use case. Can you recommend any testing tools to be able to trigger any
issues?

The main setups I used for my testing so far are 'modetest -s' for my
daily work and a simple weston setup to periodically test a complete
user space stack.

> Now in drm we have drm_connector as the only hotunpluggable thing, and it
> took years to sort out all the issues. I think we should either model the
> bridge hotunplug locking after that, or just outright reuse the connector
> locking and lifetime rules. I much prefer the latter personally.
> 
> Anyway the big issues:
> 
> - We need to refcount the hotpluggable bridges, because software (like
>   atomic state updates) might hang onto pointers for longer than the
>   bridge physically exists. Assuming that you can all tear it down
>   synchronously will not work.
> 
>   If we reuse connector locking/lifetime then we could put the
>   hotpluggable part of the bridge chain into the drm_connector, since that
>   already has refcounting as needed. It would mean that finding the next
>   bridge in the chain becomes a lot more tricky though. With that model
>   we'd create a new connector every time the bridge is hotplugged, which I
>   think is also the cleaner model (because you might plug in a hdmi
>   connector after a panel, so things like the connector type change).

I have been investigating the option of adding/removing a connector
based on hot-plug/unplug events initially, see my reply to Maxime after
v1 [0]:

> The first approach is based on removing the drm_connector. My laptop
> uses the i915 driver, and I have observed that attaching/removing a
> USB-C dock with an HDMI connector connected to a monitor, a new
> drm_connector appears/disappears for the card. User space gets notified
> and the external monitor is enabled/disabled, just the way a desktop
> user would expect, so this is possible. I had a look at the driver but
> how this magic happens was not clear to me honestly.

So if you could point to where/how this is done, I would certainly
reconsider that.

> - No notifiers please. The create a locking mess with inversions, and
>   especially for hotunplug they create the illusion that you can
>   synchronously keep up to date with hardware state. That's not possible.
>   Fundamentally all bridge drivers which might be hotunplugged need to be
>   able to cope with the hardware disappearing any momemnt.

Can you clarify this point? I'm sorry I fail to understand the
relationship between the use of notifiers and the ability of bridge
drivers to cope with hardware disappearing.

In this patch, the hotplug-bridge uses notifiers to be informed when
the following bridge is disappearing: which other way would you suggest
to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is
not meant to disappear, and it has no actual hardware that can go away,
being just a mechanical connector.

On the opposite side, the following bridges are physically removable
and so their drivers will have to be fixed (if needed) to work when
removal happens, but I don't see how that relates to the DRM core
emitting a notification of such bridges being removed.

> - Related to this: You're not allowed to shut down hardware behind the
>   user's back with drm_atomic_helper_shutdown. We've tried that approach
>   with dp mst, it really pisses off userspace when a page_flip that it
>   expected to work doesn't work.
> 
> - There's also the design aspect that in atomic, only atomic_check is
>   allowed to fail, atomic_commit must succeed, even when the hardware is
>   gone. Using connectors and their refcounting should help with that.

IIUC here you are suggesting again to remove the connector instead of
marking it "disconnected". So, as above, pointers on how that is
achieved would be helpful.

> - Somewhat aside, but I noticed that the bridge->atomic_reset is in
>   drm_bridge_attach, and that's kinda the wrong place. It should be in
>   drm_mode_config_reset, like all the other ->atomic_reset hooks. That
>   would make it a lot clearer that we need to figure out who/when
>   ->atomic_reset should be called for hotplugged bridges, maybe as part of  
>   connector registration when the entire bridge and it's new connector is
>   assembled?
> 
> - Finally this very much means we need to rethink who/how the connector
>   for a bridge is created. The new design is that the main driver creates
>   this connector, once the entire bridge exists. But with hotplugging this
>   gets a lot more complicated, so we might want to extract a pile of that
>   encoder related code from drivers (same way dp mst helpers take care of
>   connector creation too, it's just too much of a mess otherwise).
> 
>   The current bridge chaining infrastructure requires a lot of hand-rolled
>   code in each bridge driver and the encoder, so that might be a good
>   thing anyway.
> 
> - Finally I think the entire bridge hotplug infrastructure should be
>   irrespective of the underlying bus. Which means for the mipi dsi case we
>   might also want to look into what's missing to make mipi dsi
>   hotunpluggable, at least for the case where it's a proper driver. I
>   think we should ignore the old bridge model where driver's stitched it
>   all toghether using the component framework, in my opinion that approach
>   should be deprecated.

The DSI side was one of my headaches on writing this driver, and I
must say I dislike the code in hotplug_bridge_dsi_attach(), with the
need for an initial "dummy" attach and detach the first time. At first
sight I think we need a .update_format callback in struct
mipi_dsi_host_ops so the DSI host is aware of a format change.

Right now there are DSI host drivers which keep a copy of the struct
mipi_dsi_device pointer and read the format from there when starting a
stream (e.g. the tc358768.c driver [1]). That somewhat provides a way
to react to format changes, but keeping a pointer is bad when the DSI
device hot-unplugs, and the new format won't be seen until a
disable/enable cycle. So a new callback looks more robust overall.

Any opinion about this?

> - Finally I think we should have a lot of safety checks, like only bridges
>   which declare themselve to be hotunplug safe should be allowed as a part
>   of the hotpluggable bridge chain part. All others must still be attached
>   before the entire driver is registered with drm_dev_register.

I'm fine with the principle of a "HOTPLUG" flag.

I wonder how that should be implemented, though. Based on my (surely
simplistic) understanding, right now bridges can be both added and
removed, but only in a specific sequence:

 1. add all bridges
 2. use the card
 3. remove all bridges
 4. EOF

We'd need to change to allow:

 1. add fixed bridges (including hotplug-bridge)
 2. add bridges on removable add-on
 3. use card
 4. remove bridges on removable add-on
 5. optionally goto 2
 6. remove fixed bridges (including hotplug-bridge)
 7. EOF

One naïve idea is that the DRM core keeps a flag whenever any fixed
bridge (no HOTLPUG flag) is removed and when that happens forbid adding
bridges as we are now at line 5.

Aside for tons of subtleties I'm surely missing, does this look a
proper approach? I'd not be surprised if there is something better and
more solid.

>   Or that we only allow bridges with the NO_CONNECTOR flag for
>   drm_bridge_attach.
> 
> There's probably a pile more fundamental issues I've missed, but this
> should get a good discussion started.

Sure, I think it has.

Bottom line, I'm clearly not seeing some issues that need to be
considered, and that do not show up for my use case. Based on my
limited DRM knowledge, this was expected, and I'm glad to work on those
issues with some practical indications about the path forward.

Luca

[0] https://lore.kernel.org/all/20240327170849.0c14728d@booty/
[1] https://elixir.bootlin.com/linux/v6.9.1/source/drivers/gpu/drm/bridge/tc358768.c#L435
Daniel Vetter May 21, 2024, 12:01 p.m. UTC | #12
On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote:
> Hello Daniel,
> 
> On Thu, 16 May 2024 15:22:01 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > Apologies for missing v1 ...
> > 
> > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:
> > > DRM hotplug bridge driver
> > > =========================
> > > 
> > > DRM natively supports pipelines whose display can be removed, but all the
> > > components preceding it (all the display controller and any bridges) are
> > > assumed to be fixed and cannot be plugged, removed or modified at runtime.
> > > 
> > > This series adds support for DRM pipelines having a removable part after
> > > the encoder, thus also allowing bridges to be removed and reconnected at
> > > runtime, possibly with different components.
> > > 
> > > This picture summarizes the  DRM structure implemented by this series:
> > > 
> > >  .------------------------.
> > >  |   DISPLAY CONTROLLER   |
> > >  | .---------.   .------. |
> > >  | | ENCODER |<--| CRTC | |
> > >  | '---------'   '------' |
> > >  '------|-----------------'
> > >         |
> > >         |DSI            HOTPLUG
> > >         V              CONNECTOR
> > >    .---------.        .--.    .-.        .---------.         .-------.
> > >    | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
> > >    | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
> > >    |         |        |  |    | |        |         |         |       |
> > >    '---------'        '--'    '-'        '---------'         '-------'
> > > 
> > >  [--- fixed components --]  [----------- removable add-on -----------]
> > > 
> > > Fixed components include:
> > > 
> > >  * all components up to the DRM encoder, usually part of the SoC
> > >  * optionally some bridges, in the SoC and/or as external chips
> > > 
> > > Components on the removable add-on include:
> > > 
> > >  * one or more bridges
> > >  * a fixed connector (not one natively supporting hotplug such as HDMI)
> > >  * the panel  
> > 
> > So I think at a high level this design approach makes sense,
> 
> Good starting point :)
> 
> > but the
> > implementation needs some serious thought. One big thing upfront though,
> > we need to have a clear plan for the overlay hotunload issues, otherwise
> > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is
> > very, very tricky, full of lifetime issues, and those need to be sorted
> > out first or we're just trying to build a castle on quicksand.
> > 
> > For bridges itself I don't think the current locking works. You're trying
> > to really cleverly hide it all behind a normal-looking bridge driver, but
> > there's many things beyond that which will blow up if bridges just
> > disappear. Most importantly the bridge states part of an atomic update.
> 
> Surely possible as atomic updates are definitely not stimulated in my
> use case. Can you recommend any testing tools to be able to trigger any
> issues?

Uh really hard ... You'd need to create an atomic commit that's blocked on
a sync_file in-fence (so that you can extend the race window). And then
hotunplug the bridge chain _before_ you signal that fence.

That's not going to cover all possible races, but at least a large chunk
of the really big ones.

> The main setups I used for my testing so far are 'modetest -s' for my
> daily work and a simple weston setup to periodically test a complete
> user space stack.
> 
> > Now in drm we have drm_connector as the only hotunpluggable thing, and it
> > took years to sort out all the issues. I think we should either model the
> > bridge hotunplug locking after that, or just outright reuse the connector
> > locking and lifetime rules. I much prefer the latter personally.
> > 
> > Anyway the big issues:
> > 
> > - We need to refcount the hotpluggable bridges, because software (like
> >   atomic state updates) might hang onto pointers for longer than the
> >   bridge physically exists. Assuming that you can all tear it down
> >   synchronously will not work.
> > 
> >   If we reuse connector locking/lifetime then we could put the
> >   hotpluggable part of the bridge chain into the drm_connector, since that
> >   already has refcounting as needed. It would mean that finding the next
> >   bridge in the chain becomes a lot more tricky though. With that model
> >   we'd create a new connector every time the bridge is hotplugged, which I
> >   think is also the cleaner model (because you might plug in a hdmi
> >   connector after a panel, so things like the connector type change).
> 
> I have been investigating the option of adding/removing a connector
> based on hot-plug/unplug events initially, see my reply to Maxime after
> v1 [0]:
> 
> > The first approach is based on removing the drm_connector. My laptop
> > uses the i915 driver, and I have observed that attaching/removing a
> > USB-C dock with an HDMI connector connected to a monitor, a new
> > drm_connector appears/disappears for the card. User space gets notified
> > and the external monitor is enabled/disabled, just the way a desktop
> > user would expect, so this is possible. I had a look at the driver but
> > how this magic happens was not clear to me honestly.
> 
> So if you could point to where/how this is done, I would certainly
> reconsider that.

Right now only the dp mst code uses hotplug/unplug (like you've observed
in your testing with i915, usb-c docks are generally all dp mst). For code
reading it might be best to start with the i915 dp mst code and then go
through the helpers.

> > - No notifiers please. The create a locking mess with inversions, and
> >   especially for hotunplug they create the illusion that you can
> >   synchronously keep up to date with hardware state. That's not possible.
> >   Fundamentally all bridge drivers which might be hotunplugged need to be
> >   able to cope with the hardware disappearing any momemnt.
> 
> Can you clarify this point? I'm sorry I fail to understand the
> relationship between the use of notifiers and the ability of bridge
> drivers to cope with hardware disappearing.
> 
> In this patch, the hotplug-bridge uses notifiers to be informed when
> the following bridge is disappearing: which other way would you suggest
> to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is
> not meant to disappear, and it has no actual hardware that can go away,
> being just a mechanical connector.

Yeah you need code to handle that. My point is that using a notifier is
the wrong design, because the notifier has it's own locking. Which tends
to get in the way.

Instead I think that code should be directly in core bridge code (I don't
see the benefit of having this entirely in a separate driver), using drm
locking to make sure there's no races.

> On the opposite side, the following bridges are physically removable
> and so their drivers will have to be fixed (if needed) to work when
> removal happens, but I don't see how that relates to the DRM core
> emitting a notification of such bridges being removed.

Yeah it's not directly related, just my experience that people assume
notifiers provide you more synchronization and race-preventation than they
really do. So it's better to hand-roll to make it all really explicit.

> > - Related to this: You're not allowed to shut down hardware behind the
> >   user's back with drm_atomic_helper_shutdown. We've tried that approach
> >   with dp mst, it really pisses off userspace when a page_flip that it
> >   expected to work doesn't work.
> > 
> > - There's also the design aspect that in atomic, only atomic_check is
> >   allowed to fail, atomic_commit must succeed, even when the hardware is
> >   gone. Using connectors and their refcounting should help with that.
> 
> IIUC here you are suggesting again to remove the connector instead of
> marking it "disconnected". So, as above, pointers on how that is
> achieved would be helpful.

See dp mst code. It's complex unfortunately, so there's some reading
involved :-/
>
> > - Somewhat aside, but I noticed that the bridge->atomic_reset is in
> >   drm_bridge_attach, and that's kinda the wrong place. It should be in
> >   drm_mode_config_reset, like all the other ->atomic_reset hooks. That
> >   would make it a lot clearer that we need to figure out who/when
> >   ->atomic_reset should be called for hotplugged bridges, maybe as part of  
> >   connector registration when the entire bridge and it's new connector is
> >   assembled?
> > 
> > - Finally this very much means we need to rethink who/how the connector
> >   for a bridge is created. The new design is that the main driver creates
> >   this connector, once the entire bridge exists. But with hotplugging this
> >   gets a lot more complicated, so we might want to extract a pile of that
> >   encoder related code from drivers (same way dp mst helpers take care of
> >   connector creation too, it's just too much of a mess otherwise).
> > 
> >   The current bridge chaining infrastructure requires a lot of hand-rolled
> >   code in each bridge driver and the encoder, so that might be a good
> >   thing anyway.
> > 
> > - Finally I think the entire bridge hotplug infrastructure should be
> >   irrespective of the underlying bus. Which means for the mipi dsi case we
> >   might also want to look into what's missing to make mipi dsi
> >   hotunpluggable, at least for the case where it's a proper driver. I
> >   think we should ignore the old bridge model where driver's stitched it
> >   all toghether using the component framework, in my opinion that approach
> >   should be deprecated.
> 
> The DSI side was one of my headaches on writing this driver, and I
> must say I dislike the code in hotplug_bridge_dsi_attach(), with the
> need for an initial "dummy" attach and detach the first time. At first
> sight I think we need a .update_format callback in struct
> mipi_dsi_host_ops so the DSI host is aware of a format change.
> 
> Right now there are DSI host drivers which keep a copy of the struct
> mipi_dsi_device pointer and read the format from there when starting a
> stream (e.g. the tc358768.c driver [1]). That somewhat provides a way
> to react to format changes, but keeping a pointer is bad when the DSI
> device hot-unplugs, and the new format won't be seen until a
> disable/enable cycle. So a new callback looks more robust overall.
> 
> Any opinion about this?

Yeah I don't like the code either.

What might help for prototyping is if you start with a hotpluggeable
bridge where the bridge is a i2c device. That way I think we should be
able to benefit from the driver bind/unbind code that exists already.
Although there's going to be issues with i2c hotunplug too in i2c core
code.

Then lift whatever we learn there to our dsi code. But essentially I think
we should model the driver core model a lot more, so I guess you could use
any hotunplug capable bus as a template. Usb might be closest, since
that's also a packet/message based interface, mostly at least?

> > - Finally I think we should have a lot of safety checks, like only bridges
> >   which declare themselve to be hotunplug safe should be allowed as a part
> >   of the hotpluggable bridge chain part. All others must still be attached
> >   before the entire driver is registered with drm_dev_register.
> 
> I'm fine with the principle of a "HOTPLUG" flag.
> 
> I wonder how that should be implemented, though. Based on my (surely
> simplistic) understanding, right now bridges can be both added and
> removed, but only in a specific sequence:
> 
>  1. add all bridges
>  2. use the card
>  3. remove all bridges
>  4. EOF
> 
> We'd need to change to allow:
> 
>  1. add fixed bridges (including hotplug-bridge)
>  2. add bridges on removable add-on
>  3. use card
>  4. remove bridges on removable add-on
>  5. optionally goto 2
>  6. remove fixed bridges (including hotplug-bridge)
>  7. EOF
> 
> One naïve idea is that the DRM core keeps a flag whenever any fixed
> bridge (no HOTLPUG flag) is removed and when that happens forbid adding
> bridges as we are now at line 5.

If a fixed bridge is removed while the driver is still in use (i.e. before
drm_dev_unregister is called), that's a driver bug. Would be good to catch
that, which is why I think a lot of all the bridge hotplug handling should
be in core bridge code and not the separate hotplug driver, so that we can
enforce all these constraints.

Also conceptually 3 can happen before 2 (but also before), and that's how
dp mst works too. It does add all kinds of complications though ...

> Aside for tons of subtleties I'm surely missing, does this look a
> proper approach? I'd not be surprised if there is something better and
> more solid.

Yeah roughly. If you look through dp mst code then that also adds all
kinds of structures (since dp mst is a routed network really), not just
the connectors. They also all come with refcounts (because the network is
a tree), but like I said I think for your case we can avoid the per-bridge
refcounts by relying on the connector refcount we have already.

But I might be overlooking something, and we need refcounting for each
bridge like dp mst also needs refcounting for all its internal structures
that map the entire hotpluggable display chain. If you want to read some
dp mst code, these internal structures are ports/branches with struct
drm_dp_mst_branch/port.

> >   Or that we only allow bridges with the NO_CONNECTOR flag for
> >   drm_bridge_attach.
> > 
> > There's probably a pile more fundamental issues I've missed, but this
> > should get a good discussion started.
> 
> Sure, I think it has.
> 
> Bottom line, I'm clearly not seeing some issues that need to be
> considered, and that do not show up for my use case. Based on my
> limited DRM knowledge, this was expected, and I'm glad to work on those
> issues with some practical indications about the path forward.

Yeah for learning I think dp mst is best. It's a bit complex, but since
you have a dock you should be able to play around and experiment with the
code with some real hardware.

That should help to get a bit a feel for the complexity, since lots of
opportunities for you to ask why/how locking/refcounting is used and
against which potential issue they protect.

Cheers, Sima
Luca Ceresoli June 5, 2024, 2:47 p.m. UTC | #13
Hello Sima,

thanks again for your comments. Not yet enough for me to clearly focus
what you are suggesting, but getting closer. See my questions below.

On Tue, 21 May 2024 14:01:19 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote:
> > Hello Daniel,
> > 
> > On Thu, 16 May 2024 15:22:01 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > Apologies for missing v1 ...
> > > 
> > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:  
> > > > DRM hotplug bridge driver
> > > > =========================
> > > > 
> > > > DRM natively supports pipelines whose display can be removed, but all the
> > > > components preceding it (all the display controller and any bridges) are
> > > > assumed to be fixed and cannot be plugged, removed or modified at runtime.
> > > > 
> > > > This series adds support for DRM pipelines having a removable part after
> > > > the encoder, thus also allowing bridges to be removed and reconnected at
> > > > runtime, possibly with different components.
> > > > 
> > > > This picture summarizes the  DRM structure implemented by this series:
> > > > 
> > > >  .------------------------.
> > > >  |   DISPLAY CONTROLLER   |
> > > >  | .---------.   .------. |
> > > >  | | ENCODER |<--| CRTC | |
> > > >  | '---------'   '------' |
> > > >  '------|-----------------'
> > > >         |
> > > >         |DSI            HOTPLUG
> > > >         V              CONNECTOR
> > > >    .---------.        .--.    .-.        .---------.         .-------.
> > > >    | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
> > > >    | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
> > > >    |         |        |  |    | |        |         |         |       |
> > > >    '---------'        '--'    '-'        '---------'         '-------'
> > > > 
> > > >  [--- fixed components --]  [----------- removable add-on -----------]
> > > > 
> > > > Fixed components include:
> > > > 
> > > >  * all components up to the DRM encoder, usually part of the SoC
> > > >  * optionally some bridges, in the SoC and/or as external chips
> > > > 
> > > > Components on the removable add-on include:
> > > > 
> > > >  * one or more bridges
> > > >  * a fixed connector (not one natively supporting hotplug such as HDMI)
> > > >  * the panel    
> > > 
> > > So I think at a high level this design approach makes sense,  
> > 
> > Good starting point :)
> >   
> > > but the
> > > implementation needs some serious thought. One big thing upfront though,
> > > we need to have a clear plan for the overlay hotunload issues, otherwise
> > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is
> > > very, very tricky, full of lifetime issues, and those need to be sorted
> > > out first or we're just trying to build a castle on quicksand.
> > > 
> > > For bridges itself I don't think the current locking works. You're trying
> > > to really cleverly hide it all behind a normal-looking bridge driver, but
> > > there's many things beyond that which will blow up if bridges just
> > > disappear. Most importantly the bridge states part of an atomic update.  
> > 
> > Surely possible as atomic updates are definitely not stimulated in my
> > use case. Can you recommend any testing tools to be able to trigger any
> > issues?  
> 
> Uh really hard ... You'd need to create an atomic commit that's blocked on
> a sync_file in-fence (so that you can extend the race window). And then
> hotunplug the bridge chain _before_ you signal that fence.
> 
> That's not going to cover all possible races, but at least a large chunk
> of the really big ones.
> 
> > The main setups I used for my testing so far are 'modetest -s' for my
> > daily work and a simple weston setup to periodically test a complete
> > user space stack.
> >   
> > > Now in drm we have drm_connector as the only hotunpluggable thing, and it
> > > took years to sort out all the issues. I think we should either model the
> > > bridge hotunplug locking after that, or just outright reuse the connector
> > > locking and lifetime rules. I much prefer the latter personally.
> > > 
> > > Anyway the big issues:
> > > 
> > > - We need to refcount the hotpluggable bridges, because software (like
> > >   atomic state updates) might hang onto pointers for longer than the
> > >   bridge physically exists. Assuming that you can all tear it down
> > >   synchronously will not work.
> > > 
> > >   If we reuse connector locking/lifetime then we could put the
> > >   hotpluggable part of the bridge chain into the drm_connector, since that
> > >   already has refcounting as needed. It would mean that finding the next
> > >   bridge in the chain becomes a lot more tricky though. With that model
> > >   we'd create a new connector every time the bridge is hotplugged, which I
> > >   think is also the cleaner model (because you might plug in a hdmi
> > >   connector after a panel, so things like the connector type change).  
> > 
> > I have been investigating the option of adding/removing a connector
> > based on hot-plug/unplug events initially, see my reply to Maxime after
> > v1 [0]:
> >   
> > > The first approach is based on removing the drm_connector. My laptop
> > > uses the i915 driver, and I have observed that attaching/removing a
> > > USB-C dock with an HDMI connector connected to a monitor, a new
> > > drm_connector appears/disappears for the card. User space gets notified
> > > and the external monitor is enabled/disabled, just the way a desktop
> > > user would expect, so this is possible. I had a look at the driver but
> > > how this magic happens was not clear to me honestly.  
> > 
> > So if you could point to where/how this is done, I would certainly
> > reconsider that.  
> 
> Right now only the dp mst code uses hotplug/unplug (like you've observed
> in your testing with i915, usb-c docks are generally all dp mst). For code
> reading it might be best to start with the i915 dp mst code and then go
> through the helpers.
> 
> > > - No notifiers please. The create a locking mess with inversions, and
> > >   especially for hotunplug they create the illusion that you can
> > >   synchronously keep up to date with hardware state. That's not possible.
> > >   Fundamentally all bridge drivers which might be hotunplugged need to be
> > >   able to cope with the hardware disappearing any momemnt.  
> > 
> > Can you clarify this point? I'm sorry I fail to understand the
> > relationship between the use of notifiers and the ability of bridge
> > drivers to cope with hardware disappearing.
> > 
> > In this patch, the hotplug-bridge uses notifiers to be informed when
> > the following bridge is disappearing: which other way would you suggest
> > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is
> > not meant to disappear, and it has no actual hardware that can go away,
> > being just a mechanical connector.  
> 
> Yeah you need code to handle that. My point is that using a notifier is
> the wrong design, because the notifier has it's own locking. Which tends
> to get in the way.
> 
> Instead I think that code should be directly in core bridge code (I don't
> see the benefit of having this entirely in a separate driver), using drm
> locking to make sure there's no races.
> 
> > On the opposite side, the following bridges are physically removable
> > and so their drivers will have to be fixed (if needed) to work when
> > removal happens, but I don't see how that relates to the DRM core
> > emitting a notification of such bridges being removed.  
> 
> Yeah it's not directly related, just my experience that people assume
> notifiers provide you more synchronization and race-preventation than they
> really do. So it's better to hand-roll to make it all really explicit.
> 
> > > - Related to this: You're not allowed to shut down hardware behind the
> > >   user's back with drm_atomic_helper_shutdown. We've tried that approach
> > >   with dp mst, it really pisses off userspace when a page_flip that it
> > >   expected to work doesn't work.
> > > 
> > > - There's also the design aspect that in atomic, only atomic_check is
> > >   allowed to fail, atomic_commit must succeed, even when the hardware is
> > >   gone. Using connectors and their refcounting should help with that.  
> > 
> > IIUC here you are suggesting again to remove the connector instead of
> > marking it "disconnected". So, as above, pointers on how that is
> > achieved would be helpful.  
> 
> See dp mst code. It's complex unfortunately, so there's some reading
> involved :-/
> >  
> > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in
> > >   drm_bridge_attach, and that's kinda the wrong place. It should be in
> > >   drm_mode_config_reset, like all the other ->atomic_reset hooks. That
> > >   would make it a lot clearer that we need to figure out who/when  
> > >   ->atomic_reset should be called for hotplugged bridges, maybe as part of    
> > >   connector registration when the entire bridge and it's new connector is
> > >   assembled?
> > > 
> > > - Finally this very much means we need to rethink who/how the connector
> > >   for a bridge is created. The new design is that the main driver creates
> > >   this connector, once the entire bridge exists. But with hotplugging this
> > >   gets a lot more complicated, so we might want to extract a pile of that
> > >   encoder related code from drivers (same way dp mst helpers take care of
> > >   connector creation too, it's just too much of a mess otherwise).
> > > 
> > >   The current bridge chaining infrastructure requires a lot of hand-rolled
> > >   code in each bridge driver and the encoder, so that might be a good
> > >   thing anyway.
> > > 
> > > - Finally I think the entire bridge hotplug infrastructure should be
> > >   irrespective of the underlying bus. Which means for the mipi dsi case we
> > >   might also want to look into what's missing to make mipi dsi
> > >   hotunpluggable, at least for the case where it's a proper driver. I
> > >   think we should ignore the old bridge model where driver's stitched it
> > >   all toghether using the component framework, in my opinion that approach
> > >   should be deprecated.  
> > 
> > The DSI side was one of my headaches on writing this driver, and I
> > must say I dislike the code in hotplug_bridge_dsi_attach(), with the
> > need for an initial "dummy" attach and detach the first time. At first
> > sight I think we need a .update_format callback in struct
> > mipi_dsi_host_ops so the DSI host is aware of a format change.
> > 
> > Right now there are DSI host drivers which keep a copy of the struct
> > mipi_dsi_device pointer and read the format from there when starting a
> > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way
> > to react to format changes, but keeping a pointer is bad when the DSI
> > device hot-unplugs, and the new format won't be seen until a
> > disable/enable cycle. So a new callback looks more robust overall.
> > 
> > Any opinion about this?  
> 
> Yeah I don't like the code either.
> 
> What might help for prototyping is if you start with a hotpluggeable
> bridge where the bridge is a i2c device. That way I think we should be
> able to benefit from the driver bind/unbind code that exists already.
> Although there's going to be issues with i2c hotunplug too in i2c core
> code.

Good news: the bridge here is an I2C device (ti-sn65dsi84, dsi-to-lvds).

> Then lift whatever we learn there to our dsi code. But essentially I think
> we should model the driver core model a lot more, so I guess you could use
> any hotunplug capable bus as a template. Usb might be closest, since
> that's also a packet/message based interface, mostly at least?
> 
> > > - Finally I think we should have a lot of safety checks, like only bridges
> > >   which declare themselve to be hotunplug safe should be allowed as a part
> > >   of the hotpluggable bridge chain part. All others must still be attached
> > >   before the entire driver is registered with drm_dev_register.  
> > 
> > I'm fine with the principle of a "HOTPLUG" flag.
> > 
> > I wonder how that should be implemented, though. Based on my (surely
> > simplistic) understanding, right now bridges can be both added and
> > removed, but only in a specific sequence:
> > 
> >  1. add all bridges
> >  2. use the card
> >  3. remove all bridges
> >  4. EOF
> > 
> > We'd need to change to allow:
> > 
> >  1. add fixed bridges (including hotplug-bridge)
> >  2. add bridges on removable add-on
> >  3. use card
> >  4. remove bridges on removable add-on
> >  5. optionally goto 2
> >  6. remove fixed bridges (including hotplug-bridge)
> >  7. EOF
> > 
> > One naïve idea is that the DRM core keeps a flag whenever any fixed
> > bridge (no HOTLPUG flag) is removed and when that happens forbid adding
> > bridges as we are now at line 5.  
> 
> If a fixed bridge is removed while the driver is still in use (i.e. before
> drm_dev_unregister is called), that's a driver bug. Would be good to catch
> that, which is why I think a lot of all the bridge hotplug handling should
> be in core bridge code and not the separate hotplug driver, so that we can
> enforce all these constraints.
> 
> Also conceptually 3 can happen before 2 (but also before), and that's how
> dp mst works too. It does add all kinds of complications though ...
> 
> > Aside for tons of subtleties I'm surely missing, does this look a
> > proper approach? I'd not be surprised if there is something better and
> > more solid.  
> 
> Yeah roughly. If you look through dp mst code then that also adds all
> kinds of structures (since dp mst is a routed network really), not just
> the connectors. They also all come with refcounts (because the network is
> a tree), but like I said I think for your case we can avoid the per-bridge
> refcounts by relying on the connector refcount we have already.
> 
> But I might be overlooking something, and we need refcounting for each
> bridge like dp mst also needs refcounting for all its internal structures
> that map the entire hotpluggable display chain. If you want to read some
> dp mst code, these internal structures are ports/branches with struct
> drm_dp_mst_branch/port.
> 
> > >   Or that we only allow bridges with the NO_CONNECTOR flag for
> > >   drm_bridge_attach.
> > > 
> > > There's probably a pile more fundamental issues I've missed, but this
> > > should get a good discussion started.  
> > 
> > Sure, I think it has.
> > 
> > Bottom line, I'm clearly not seeing some issues that need to be
> > considered, and that do not show up for my use case. Based on my
> > limited DRM knowledge, this was expected, and I'm glad to work on those
> > issues with some practical indications about the path forward.  
> 
> Yeah for learning I think dp mst is best. It's a bit complex, but since
> you have a dock you should be able to play around and experiment with the
> code with some real hardware.
> 
> That should help to get a bit a feel for the complexity, since lots of
> opportunities for you to ask why/how locking/refcounting is used and
> against which potential issue they protect.

I had an initial quick look at DP MSI and it looks like a huge amount
code where it's going to take forever to find my way without guidance.
Keep in mind there is no DP hardware involved in this whole work: as I
see it, DP MST is just one user (possibly the only one currently) of
runtime connector insertion and removal.

This is my understanding so far:

 1. DP MST creates/destroys connectors at runtime, based on MST protocol
    info the MST devices use to announce themselves
 2. the DRM core has no idea of how these connectors relate to each
    other (e.g. DP-1 is a child of DP-0, just another unrelated
    connector); only the drivers know, and implement this leveraging the
    dp_mst helpers
 3. DP MST does not involve DRM bridges: everything after the "first"
    connector (USB-C for docks) is just "DP MST stuff"

Point 3 is problematic as my primary need is exactly to have a
removable bridge. Besides, its creation is not under my driver control,
being done via device tree, and there could even be more than 1 bridge,
or even no bridges at all on the removable side.

I'd appreciate your guidance especially about:

 * test atomic change to expose any lifetime bugs, as you suggested
 * which are the exact APIs to handle connector insertion/removal (as
   done by DP MST)
 * comments about the three points above, especially point 3

Pointers to code or other specific indications that lead me to the
exact code would be very appreciated, and definitely save me days of
digging into complex code without a precise knowledge about where to go.

Luca
Luca Ceresoli Aug. 23, 2024, 10:39 a.m. UTC | #14
Hello Sima,

these days I started looking in more detail into some of the topics you
had mentioned in your v2 review. I have questions about those I have
investigated, see below.

On Tue, 21 May 2024 14:01:19 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote:
> > Hello Daniel,
> > 
> > On Thu, 16 May 2024 15:22:01 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > Apologies for missing v1 ...
> > > 
> > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:  
> > > > DRM hotplug bridge driver
> > > > =========================
> > > > 
> > > > DRM natively supports pipelines whose display can be removed, but all the
> > > > components preceding it (all the display controller and any bridges) are
> > > > assumed to be fixed and cannot be plugged, removed or modified at runtime.
> > > > 
> > > > This series adds support for DRM pipelines having a removable part after
> > > > the encoder, thus also allowing bridges to be removed and reconnected at
> > > > runtime, possibly with different components.
> > > > 
> > > > This picture summarizes the  DRM structure implemented by this series:
> > > > 
> > > >  .------------------------.
> > > >  |   DISPLAY CONTROLLER   |
> > > >  | .---------.   .------. |
> > > >  | | ENCODER |<--| CRTC | |
> > > >  | '---------'   '------' |
> > > >  '------|-----------------'
> > > >         |
> > > >         |DSI            HOTPLUG
> > > >         V              CONNECTOR
> > > >    .---------.        .--.    .-.        .---------.         .-------.
> > > >    | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
> > > >    | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
> > > >    |         |        |  |    | |        |         |         |       |
> > > >    '---------'        '--'    '-'        '---------'         '-------'
> > > > 
> > > >  [--- fixed components --]  [----------- removable add-on -----------]
> > > > 
> > > > Fixed components include:
> > > > 
> > > >  * all components up to the DRM encoder, usually part of the SoC
> > > >  * optionally some bridges, in the SoC and/or as external chips
> > > > 
> > > > Components on the removable add-on include:
> > > > 
> > > >  * one or more bridges
> > > >  * a fixed connector (not one natively supporting hotplug such as HDMI)
> > > >  * the panel    
> > > 
> > > So I think at a high level this design approach makes sense,  
> > 
> > Good starting point :)
> >   
> > > but the
> > > implementation needs some serious thought. One big thing upfront though,
> > > we need to have a clear plan for the overlay hotunload issues, otherwise
> > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is
> > > very, very tricky, full of lifetime issues, and those need to be sorted
> > > out first or we're just trying to build a castle on quicksand.
> > > 
> > > For bridges itself I don't think the current locking works. You're trying
> > > to really cleverly hide it all behind a normal-looking bridge driver, but
> > > there's many things beyond that which will blow up if bridges just
> > > disappear. Most importantly the bridge states part of an atomic update.  
> > 
> > Surely possible as atomic updates are definitely not stimulated in my
> > use case. Can you recommend any testing tools to be able to trigger any
> > issues?  
> 
> Uh really hard ... You'd need to create an atomic commit that's blocked on
> a sync_file in-fence (so that you can extend the race window). And then
> hotunplug the bridge chain _before_ you signal that fence.
> 
> That's not going to cover all possible races, but at least a large chunk
> of the really big ones.
> 
> > The main setups I used for my testing so far are 'modetest -s' for my
> > daily work and a simple weston setup to periodically test a complete
> > user space stack.
> >   
> > > Now in drm we have drm_connector as the only hotunpluggable thing, and it
> > > took years to sort out all the issues. I think we should either model the
> > > bridge hotunplug locking after that, or just outright reuse the connector
> > > locking and lifetime rules. I much prefer the latter personally.
> > > 
> > > Anyway the big issues:
> > > 
> > > - We need to refcount the hotpluggable bridges, because software (like
> > >   atomic state updates) might hang onto pointers for longer than the
> > >   bridge physically exists. Assuming that you can all tear it down
> > >   synchronously will not work.
> > > 
> > >   If we reuse connector locking/lifetime then we could put the
> > >   hotpluggable part of the bridge chain into the drm_connector, since that
> > >   already has refcounting as needed. It would mean that finding the next
> > >   bridge in the chain becomes a lot more tricky though. With that model
> > >   we'd create a new connector every time the bridge is hotplugged, which I
> > >   think is also the cleaner model (because you might plug in a hdmi
> > >   connector after a panel, so things like the connector type change).

So, based on your dp__mst hint I added connector creation/removal in the
v3 I sent a few days ago. All other aspects in your list are unchanged
from the v2 you have reviewed.

Now I'm trying to tackle some other of the items you mentioned here,
and locking/lifetime is the hardest one for me to understand at the
moment.

If I get you right, you suggest making all the removable bridges
"owned" by the connector that gets hotplugged, the reason being that
locking and lifetime issues for the hotplug connectors have already
been sorted out, while that has not been done for bridges.

Correct?

Assuming the above is correct, I'm wondering whether this would be a
correct design or rather a shortcut to leverage the connector locking
instead of implementing bridge locking/lifetime. Note I'm not
criticizing, I'm really asking myself this question and I'd like to
know your position about that.

Again about putting the removable bridges inside the hotplug connector,
I'm trying to understand how that may happen in the device tree case.
The hot-pluggable bridge is populated starting from the DT code
(through i2c-core-of in my case being an I2C chip), and that code is
not aware that it is being instantiating a DRM device. So there is
nothing we can do before the bridge probe is called. The bridge probe
then does know much about any connectors. It also does not know about
hotplug: this is a design decision I made to make the regular bridge
drivers reusable without edits for the hotplug case, but it can be
waived if needed.

So the only way I currently to move the bridge inside the connector is
to catch it in the hotplug-bridge driver, when it gets notified about
the new bridge having appeared (was a notifier, I'm changing that, see
below). So the hotplug-bridge would need a function line
drm_connector_add_bridge(conn, br) to make the association.

Is this the direction of development you had in mind?

> > I have been investigating the option of adding/removing a connector
> > based on hot-plug/unplug events initially, see my reply to Maxime after
> > v1 [0]:
> >   
> > > The first approach is based on removing the drm_connector. My laptop
> > > uses the i915 driver, and I have observed that attaching/removing a
> > > USB-C dock with an HDMI connector connected to a monitor, a new
> > > drm_connector appears/disappears for the card. User space gets notified
> > > and the external monitor is enabled/disabled, just the way a desktop
> > > user would expect, so this is possible. I had a look at the driver but
> > > how this magic happens was not clear to me honestly.  
> > 
> > So if you could point to where/how this is done, I would certainly
> > reconsider that.  
> 
> Right now only the dp mst code uses hotplug/unplug (like you've observed
> in your testing with i915, usb-c docks are generally all dp mst). For code
> reading it might be best to start with the i915 dp mst code and then go
> through the helpers.
> 
> > > - No notifiers please. The create a locking mess with inversions, and
> > >   especially for hotunplug they create the illusion that you can
> > >   synchronously keep up to date with hardware state. That's not possible.
> > >   Fundamentally all bridge drivers which might be hotunplugged need to be
> > >   able to cope with the hardware disappearing any momemnt.  
> > 
> > Can you clarify this point? I'm sorry I fail to understand the
> > relationship between the use of notifiers and the ability of bridge
> > drivers to cope with hardware disappearing.
> > 
> > In this patch, the hotplug-bridge uses notifiers to be informed when
> > the following bridge is disappearing: which other way would you suggest
> > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is
> > not meant to disappear, and it has no actual hardware that can go away,
> > being just a mechanical connector.  
> 
> Yeah you need code to handle that. My point is that using a notifier is
> the wrong design, because the notifier has it's own locking. Which tends
> to get in the way.

I went into this subject to see how drm_bridge_add() could inform the
interested bridges using a DRM-specific mechanism instead of standard
notifiers.

However I think to inform *the interested* bridges, at least in the
device tree case, there is a fundamental issue: DRM core has no idea
about the topology. Definitely not at drm_bridge_add() time, way before
drm_bridge_attach() where the 'previous' pointer introduces a minimum
of topology awareness in the DRM core.

Option 1 is to iterate over all the ports and endpoints and for every
remote-endpoint pointing to a bridge, inform the remote bridge about
that via a new optional callback in drm_bridge_funcs. That means likely
informing more bridges than needed, so when they get informed the
bridges will still have to check whether they are interested or not.

Pseudocode:

  void drm_bridge_add(struct drm_bridge *new_bridge)
  {
     // existing code unchanged

+    if (bridge->of_node) {
+       for_each_remote_endpoint(bridge->of_node) {
+          br = of_drm_find_bridge(remote_endpoint);
+          if (br && br->funcs.bridge_event_notify)
+              br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_ADD,
+                                             new_bridge);
+       }
+    }
  }

That means informing both upstream and downstream bridges, which could
even be useful, but anyway there is no reliable way to pick only the
sink or source ports. It also implies having lots of of_*() calls which
iterate over the tree, which is not optimal, but it happens only at
add/removal so it's fine I guess.

Option 2 is be to just inform all the bridges (using the global
bridge_list). Pros and cons:

 * less of_*() calls to crawl around the remote-endpoints
 * simpler code
 * more bridges than needed would be informed, could be be an issue
   if many implement the .bridge_event_notify()
 * notifies even in the non-OF case too, not sure it's useful

What are your thoughts about these two options?

> Instead I think that code should be directly in core bridge code (I don't
> see the benefit of having this entirely in a separate driver), using drm
> locking to make sure there's no races.

Not sure I got what you mean here. Which one of the following?

 1. The entire hotplug-bridge driver should not exist, and the DRM
    core should handle it all (seems not doable, this driver has lots of
    device-specific details)
 2. The hotplug-driver should exist, but the code to attach/detach the
    pipeline on hotplug/unplug should be moved to the core, with the
    hotplug-driver providing callbacks for the device-specific aspects
 3. Same as 2, but additionally the hotplug-bridge should become a
    connector driver (hotplug-connector.c?) -- not sure it can decouple
    the two sides without a bridge however
 4. None of the above

> > On the opposite side, the following bridges are physically removable
> > and so their drivers will have to be fixed (if needed) to work when
> > removal happens, but I don't see how that relates to the DRM core
> > emitting a notification of such bridges being removed.  
> 
> Yeah it's not directly related, just my experience that people assume
> notifiers provide you more synchronization and race-preventation than they
> really do. So it's better to hand-roll to make it all really explicit.
> 
> > > - Related to this: You're not allowed to shut down hardware behind the
> > >   user's back with drm_atomic_helper_shutdown. We've tried that approach
> > >   with dp mst, it really pisses off userspace when a page_flip that it
> > >   expected to work doesn't work.
> > > 
> > > - There's also the design aspect that in atomic, only atomic_check is
> > >   allowed to fail, atomic_commit must succeed, even when the hardware is
> > >   gone. Using connectors and their refcounting should help with that.  
> > 
> > IIUC here you are suggesting again to remove the connector instead of
> > marking it "disconnected". So, as above, pointers on how that is
> > achieved would be helpful.  
> 
> See dp mst code. It's complex unfortunately, so there's some reading
> involved :-/
> >  
> > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in
> > >   drm_bridge_attach, and that's kinda the wrong place. It should be in
> > >   drm_mode_config_reset, like all the other ->atomic_reset hooks. That
> > >   would make it a lot clearer that we need to figure out who/when  
> > >   ->atomic_reset should be called for hotplugged bridges, maybe as part of    
> > >   connector registration when the entire bridge and it's new connector is
> > >   assembled?
> > > 
> > > - Finally this very much means we need to rethink who/how the connector
> > >   for a bridge is created. The new design is that the main driver creates
> > >   this connector, once the entire bridge exists. But with hotplugging this
> > >   gets a lot more complicated, so we might want to extract a pile of that
> > >   encoder related code from drivers (same way dp mst helpers take care of
> > >   connector creation too, it's just too much of a mess otherwise).
> > > 
> > >   The current bridge chaining infrastructure requires a lot of hand-rolled
> > >   code in each bridge driver and the encoder, so that might be a good
> > >   thing anyway.
> > > 
> > > - Finally I think the entire bridge hotplug infrastructure should be
> > >   irrespective of the underlying bus. Which means for the mipi dsi case we
> > >   might also want to look into what's missing to make mipi dsi
> > >   hotunpluggable, at least for the case where it's a proper driver. I
> > >   think we should ignore the old bridge model where driver's stitched it
> > >   all toghether using the component framework, in my opinion that approach
> > >   should be deprecated.  
> > 
> > The DSI side was one of my headaches on writing this driver, and I
> > must say I dislike the code in hotplug_bridge_dsi_attach(), with the
> > need for an initial "dummy" attach and detach the first time. At first
> > sight I think we need a .update_format callback in struct
> > mipi_dsi_host_ops so the DSI host is aware of a format change.
> > 
> > Right now there are DSI host drivers which keep a copy of the struct
> > mipi_dsi_device pointer and read the format from there when starting a
> > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way
> > to react to format changes, but keeping a pointer is bad when the DSI
> > device hot-unplugs, and the new format won't be seen until a
> > disable/enable cycle. So a new callback looks more robust overall.
> > 
> > Any opinion about this?  
> 
> Yeah I don't like the code either.
> 
> What might help for prototyping is if you start with a hotpluggeable
> bridge where the bridge is a i2c device. That way I think we should be
> able to benefit from the driver bind/unbind code that exists already.
> Although there's going to be issues with i2c hotunplug too in i2c core
> code.
> 
> Then lift whatever we learn there to our dsi code. But essentially I think
> we should model the driver core model a lot more, so I guess you could use
> any hotunplug capable bus as a template. Usb might be closest, since
> that's also a packet/message based interface, mostly at least?
> 
> > > - Finally I think we should have a lot of safety checks, like only bridges
> > >   which declare themselve to be hotunplug safe should be allowed as a part
> > >   of the hotpluggable bridge chain part. All others must still be attached
> > >   before the entire driver is registered with drm_dev_register.  
> > 
> > I'm fine with the principle of a "HOTPLUG" flag.
> > 
> > I wonder how that should be implemented, though. Based on my (surely
> > simplistic) understanding, right now bridges can be both added and
> > removed, but only in a specific sequence:
> > 
> >  1. add all bridges
> >  2. use the card
> >  3. remove all bridges
> >  4. EOF
> > 
> > We'd need to change to allow:
> > 
> >  1. add fixed bridges (including hotplug-bridge)
> >  2. add bridges on removable add-on
> >  3. use card
> >  4. remove bridges on removable add-on
> >  5. optionally goto 2
> >  6. remove fixed bridges (including hotplug-bridge)
> >  7. EOF
> > 
> > One naïve idea is that the DRM core keeps a flag whenever any fixed
> > bridge (no HOTLPUG flag) is removed and when that happens forbid adding
> > bridges as we are now at line 5.  
> 
> If a fixed bridge is removed while the driver is still in use (i.e. before
> drm_dev_unregister is called), that's a driver bug. Would be good to catch
> that, which is why I think a lot of all the bridge hotplug handling should
> be in core bridge code and not the separate hotplug driver, so that we can
> enforce all these constraints.
> 
> Also conceptually 3 can happen before 2 (but also before), and that's how
> dp mst works too. It does add all kinds of complications though ...
> 
> > Aside for tons of subtleties I'm surely missing, does this look a
> > proper approach? I'd not be surprised if there is something better and
> > more solid.  
> 
> Yeah roughly. If you look through dp mst code then that also adds all
> kinds of structures (since dp mst is a routed network really), not just
> the connectors. They also all come with refcounts (because the network is
> a tree), but like I said I think for your case we can avoid the per-bridge
> refcounts by relying on the connector refcount we have already.
> 
> But I might be overlooking something, and we need refcounting for each
> bridge like dp mst also needs refcounting for all its internal structures
> that map the entire hotpluggable display chain. If you want to read some
> dp mst code, these internal structures are ports/branches with struct
> drm_dp_mst_branch/port.
> 
> > >   Or that we only allow bridges with the NO_CONNECTOR flag for
> > >   drm_bridge_attach.
> > > 
> > > There's probably a pile more fundamental issues I've missed, but this
> > > should get a good discussion started.  
> > 
> > Sure, I think it has.
> > 
> > Bottom line, I'm clearly not seeing some issues that need to be
> > considered, and that do not show up for my use case. Based on my
> > limited DRM knowledge, this was expected, and I'm glad to work on those
> > issues with some practical indications about the path forward.  
> 
> Yeah for learning I think dp mst is best. It's a bit complex, but since
> you have a dock you should be able to play around and experiment with the
> code with some real hardware.
> 
> That should help to get a bit a feel for the complexity, since lots of
> opportunities for you to ask why/how locking/refcounting is used and
> against which potential issue they protect.
> 
> Cheers, Sima

Luca
Daniel Vetter Aug. 27, 2024, 4:37 p.m. UTC | #15
On Fri, Aug 23, 2024 at 12:39:03PM +0200, Luca Ceresoli wrote:
> Hello Sima,
> 
> these days I started looking in more detail into some of the topics you
> had mentioned in your v2 review. I have questions about those I have
> investigated, see below.
> 
> On Tue, 21 May 2024 14:01:19 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote:
> > > Hello Daniel,
> > > 
> > > On Thu, 16 May 2024 15:22:01 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >   
> > > > Apologies for missing v1 ...
> > > > 
> > > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:  
> > > > > DRM hotplug bridge driver
> > > > > =========================
> > > > > 
> > > > > DRM natively supports pipelines whose display can be removed, but all the
> > > > > components preceding it (all the display controller and any bridges) are
> > > > > assumed to be fixed and cannot be plugged, removed or modified at runtime.
> > > > > 
> > > > > This series adds support for DRM pipelines having a removable part after
> > > > > the encoder, thus also allowing bridges to be removed and reconnected at
> > > > > runtime, possibly with different components.
> > > > > 
> > > > > This picture summarizes the  DRM structure implemented by this series:
> > > > > 
> > > > >  .------------------------.
> > > > >  |   DISPLAY CONTROLLER   |
> > > > >  | .---------.   .------. |
> > > > >  | | ENCODER |<--| CRTC | |
> > > > >  | '---------'   '------' |
> > > > >  '------|-----------------'
> > > > >         |
> > > > >         |DSI            HOTPLUG
> > > > >         V              CONNECTOR
> > > > >    .---------.        .--.    .-.        .---------.         .-------.
> > > > >    | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
> > > > >    | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
> > > > >    |         |        |  |    | |        |         |         |       |
> > > > >    '---------'        '--'    '-'        '---------'         '-------'
> > > > > 
> > > > >  [--- fixed components --]  [----------- removable add-on -----------]
> > > > > 
> > > > > Fixed components include:
> > > > > 
> > > > >  * all components up to the DRM encoder, usually part of the SoC
> > > > >  * optionally some bridges, in the SoC and/or as external chips
> > > > > 
> > > > > Components on the removable add-on include:
> > > > > 
> > > > >  * one or more bridges
> > > > >  * a fixed connector (not one natively supporting hotplug such as HDMI)
> > > > >  * the panel    
> > > > 
> > > > So I think at a high level this design approach makes sense,  
> > > 
> > > Good starting point :)
> > >   
> > > > but the
> > > > implementation needs some serious thought. One big thing upfront though,
> > > > we need to have a clear plan for the overlay hotunload issues, otherwise
> > > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is
> > > > very, very tricky, full of lifetime issues, and those need to be sorted
> > > > out first or we're just trying to build a castle on quicksand.
> > > > 
> > > > For bridges itself I don't think the current locking works. You're trying
> > > > to really cleverly hide it all behind a normal-looking bridge driver, but
> > > > there's many things beyond that which will blow up if bridges just
> > > > disappear. Most importantly the bridge states part of an atomic update.  
> > > 
> > > Surely possible as atomic updates are definitely not stimulated in my
> > > use case. Can you recommend any testing tools to be able to trigger any
> > > issues?  
> > 
> > Uh really hard ... You'd need to create an atomic commit that's blocked on
> > a sync_file in-fence (so that you can extend the race window). And then
> > hotunplug the bridge chain _before_ you signal that fence.
> > 
> > That's not going to cover all possible races, but at least a large chunk
> > of the really big ones.
> > 
> > > The main setups I used for my testing so far are 'modetest -s' for my
> > > daily work and a simple weston setup to periodically test a complete
> > > user space stack.
> > >   
> > > > Now in drm we have drm_connector as the only hotunpluggable thing, and it
> > > > took years to sort out all the issues. I think we should either model the
> > > > bridge hotunplug locking after that, or just outright reuse the connector
> > > > locking and lifetime rules. I much prefer the latter personally.
> > > > 
> > > > Anyway the big issues:
> > > > 
> > > > - We need to refcount the hotpluggable bridges, because software (like
> > > >   atomic state updates) might hang onto pointers for longer than the
> > > >   bridge physically exists. Assuming that you can all tear it down
> > > >   synchronously will not work.
> > > > 
> > > >   If we reuse connector locking/lifetime then we could put the
> > > >   hotpluggable part of the bridge chain into the drm_connector, since that
> > > >   already has refcounting as needed. It would mean that finding the next
> > > >   bridge in the chain becomes a lot more tricky though. With that model
> > > >   we'd create a new connector every time the bridge is hotplugged, which I
> > > >   think is also the cleaner model (because you might plug in a hdmi
> > > >   connector after a panel, so things like the connector type change).
> 
> So, based on your dp__mst hint I added connector creation/removal in the
> v3 I sent a few days ago. All other aspects in your list are unchanged
> from the v2 you have reviewed.
> 
> Now I'm trying to tackle some other of the items you mentioned here,
> and locking/lifetime is the hardest one for me to understand at the
> moment.
> 
> If I get you right, you suggest making all the removable bridges
> "owned" by the connector that gets hotplugged, the reason being that
> locking and lifetime issues for the hotplug connectors have already
> been sorted out, while that has not been done for bridges.
> 
> Correct?

Yes.

> Assuming the above is correct, I'm wondering whether this would be a
> correct design or rather a shortcut to leverage the connector locking
> instead of implementing bridge locking/lifetime. Note I'm not
> criticizing, I'm really asking myself this question and I'd like to
> know your position about that.
> 
> Again about putting the removable bridges inside the hotplug connector,
> I'm trying to understand how that may happen in the device tree case.
> The hot-pluggable bridge is populated starting from the DT code
> (through i2c-core-of in my case being an I2C chip), and that code is
> not aware that it is being instantiating a DRM device. So there is
> nothing we can do before the bridge probe is called. The bridge probe
> then does know much about any connectors. It also does not know about
> hotplug: this is a design decision I made to make the regular bridge
> drivers reusable without edits for the hotplug case, but it can be
> waived if needed.
> 
> So the only way I currently to move the bridge inside the connector is
> to catch it in the hotplug-bridge driver, when it gets notified about
> the new bridge having appeared (was a notifier, I'm changing that, see
> below). So the hotplug-bridge would need a function line
> drm_connector_add_bridge(conn, br) to make the association.
> 
> Is this the direction of development you had in mind?

Not quite. The issue is that while bridges are getting hotplugged, you
don't yet have a connector. So there's nothing you can attach to. What I
had in mind is more:

- The last fixed bridges knows that all subsequent bridges are hotplugged.
  Which means instead of just asking for the next bridge, it needs to ask
  for the fully formed bridge chain, including the connector.

- The hotplug bridge connector code checks every time a new bridge shows
  up whether the chain is now complete. Once that is the case, it creates
  the connector (with the new bridge design this is no longer the job of
  the last bridge in the chain, so we need to require that for
  hotpluggable bridges). Then it can attach all the bridges to that
  connector, and hand the entire fully formed chain to the last fixed
  bridge to hotplug insert and register.

- Hotunplug is the same, but on inverse: The entire chain including the
  connector gets removed in it's entirety, before it gets dissassembled
  and freed (once the last refcount to the connector is dropped).

There is a _lot_ of handwaving in this on the driver model side, but with
this the drm side should be clear and clean at least.

> > > I have been investigating the option of adding/removing a connector
> > > based on hot-plug/unplug events initially, see my reply to Maxime after
> > > v1 [0]:
> > >   
> > > > The first approach is based on removing the drm_connector. My laptop
> > > > uses the i915 driver, and I have observed that attaching/removing a
> > > > USB-C dock with an HDMI connector connected to a monitor, a new
> > > > drm_connector appears/disappears for the card. User space gets notified
> > > > and the external monitor is enabled/disabled, just the way a desktop
> > > > user would expect, so this is possible. I had a look at the driver but
> > > > how this magic happens was not clear to me honestly.  
> > > 
> > > So if you could point to where/how this is done, I would certainly
> > > reconsider that.  
> > 
> > Right now only the dp mst code uses hotplug/unplug (like you've observed
> > in your testing with i915, usb-c docks are generally all dp mst). For code
> > reading it might be best to start with the i915 dp mst code and then go
> > through the helpers.
> > 
> > > > - No notifiers please. The create a locking mess with inversions, and
> > > >   especially for hotunplug they create the illusion that you can
> > > >   synchronously keep up to date with hardware state. That's not possible.
> > > >   Fundamentally all bridge drivers which might be hotunplugged need to be
> > > >   able to cope with the hardware disappearing any momemnt.  
> > > 
> > > Can you clarify this point? I'm sorry I fail to understand the
> > > relationship between the use of notifiers and the ability of bridge
> > > drivers to cope with hardware disappearing.
> > > 
> > > In this patch, the hotplug-bridge uses notifiers to be informed when
> > > the following bridge is disappearing: which other way would you suggest
> > > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is
> > > not meant to disappear, and it has no actual hardware that can go away,
> > > being just a mechanical connector.  
> > 
> > Yeah you need code to handle that. My point is that using a notifier is
> > the wrong design, because the notifier has it's own locking. Which tends
> > to get in the way.
> 
> I went into this subject to see how drm_bridge_add() could inform the
> interested bridges using a DRM-specific mechanism instead of standard
> notifiers.
> 
> However I think to inform *the interested* bridges, at least in the
> device tree case, there is a fundamental issue: DRM core has no idea
> about the topology. Definitely not at drm_bridge_add() time, way before
> drm_bridge_attach() where the 'previous' pointer introduces a minimum
> of topology awareness in the DRM core.

Yeah we need special functions, which the last fixed bridge needs to call
instead of the current set of functions. So instead of of_drm_find_bridge
we need something like of_drm_register_hotplug_source_bridge or similar.

> Option 1 is to iterate over all the ports and endpoints and for every
> remote-endpoint pointing to a bridge, inform the remote bridge about
> that via a new optional callback in drm_bridge_funcs. That means likely
> informing more bridges than needed, so when they get informed the
> bridges will still have to check whether they are interested or not.
> 
> Pseudocode:
> 
>   void drm_bridge_add(struct drm_bridge *new_bridge)
>   {
>      // existing code unchanged
> 
> +    if (bridge->of_node) {
> +       for_each_remote_endpoint(bridge->of_node) {
> +          br = of_drm_find_bridge(remote_endpoint);
> +          if (br && br->funcs.bridge_event_notify)
> +              br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_ADD,
> +                                             new_bridge);
> +       }
> +    }
>   }
> 
> That means informing both upstream and downstream bridges, which could
> even be useful, but anyway there is no reliable way to pick only the
> sink or source ports. It also implies having lots of of_*() calls which
> iterate over the tree, which is not optimal, but it happens only at
> add/removal so it's fine I guess.
> 
> Option 2 is be to just inform all the bridges (using the global
> bridge_list). Pros and cons:
> 
>  * less of_*() calls to crawl around the remote-endpoints
>  * simpler code
>  * more bridges than needed would be informed, could be be an issue
>    if many implement the .bridge_event_notify()
>  * notifies even in the non-OF case too, not sure it's useful
> 
> What are your thoughts about these two options?

I'm not sure why you need to inform bridges? Currently for fixed bridges
we retry with EPROBE_DEFER if the subsequent bridge isn't there yet, which
means we only add the bridge when the entire chain exists. I think
hotplugged bridge chains should work the same, which means you only ever
need to watch the first element shows up. Then you create the connector
for the entire chain and through special callbacks inform the bridge that
called of_drm_register_hotplug_source_bridge that it now has a connector.

All the other bridges can be ignored I think, even when they're part of
the hotplug chain.

I just realized that means we need at least a refcount on hotpluggable
bridges, because otherwise drm_bridge_remove will result in use-after-free
issues. So should probably rename those from drm_bridge_add/remove to
drm_bridge_register/unregister.

For fixed bridges no one else should ever increment the refcount, so
should work exactly the same.

> > Instead I think that code should be directly in core bridge code (I don't
> > see the benefit of having this entirely in a separate driver), using drm
> > locking to make sure there's no races.
> 
> Not sure I got what you mean here. Which one of the following?
> 
>  1. The entire hotplug-bridge driver should not exist, and the DRM
>     core should handle it all (seems not doable, this driver has lots of
>     device-specific details)
>  2. The hotplug-driver should exist, but the code to attach/detach the
>     pipeline on hotplug/unplug should be moved to the core, with the
>     hotplug-driver providing callbacks for the device-specific aspects
>  3. Same as 2, but additionally the hotplug-bridge should become a
>     connector driver (hotplug-connector.c?) -- not sure it can decouple
>     the two sides without a bridge however
>  4. None of the above

3, roughly. The connector creation must be in core bridge code, or things
will go boom.

If you also want to keep the current lifetime model for bridge (without a
refcount, meaning drm_bridge_remove removes the bridge from all use right
away) there's also additional locking and tricks you need in in that
bridge connector code, including the entire bridge chain. Meaning those
hotplugged bridges must not be visible at all to the wider driver.

Or you refcount bridges and make them like driver model struct kobj (or
kobj directly), but that's a pile of work too. That would be the other
design approach, instead of only relying on drm_connector. We might need
both, I haven't dug into the details enough to be sure here.

Cheers, Sima


> > > On the opposite side, the following bridges are physically removable
> > > and so their drivers will have to be fixed (if needed) to work when
> > > removal happens, but I don't see how that relates to the DRM core
> > > emitting a notification of such bridges being removed.  
> > 
> > Yeah it's not directly related, just my experience that people assume
> > notifiers provide you more synchronization and race-preventation than they
> > really do. So it's better to hand-roll to make it all really explicit.
> > 
> > > > - Related to this: You're not allowed to shut down hardware behind the
> > > >   user's back with drm_atomic_helper_shutdown. We've tried that approach
> > > >   with dp mst, it really pisses off userspace when a page_flip that it
> > > >   expected to work doesn't work.
> > > > 
> > > > - There's also the design aspect that in atomic, only atomic_check is
> > > >   allowed to fail, atomic_commit must succeed, even when the hardware is
> > > >   gone. Using connectors and their refcounting should help with that.  
> > > 
> > > IIUC here you are suggesting again to remove the connector instead of
> > > marking it "disconnected". So, as above, pointers on how that is
> > > achieved would be helpful.  
> > 
> > See dp mst code. It's complex unfortunately, so there's some reading
> > involved :-/
> > >  
> > > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in
> > > >   drm_bridge_attach, and that's kinda the wrong place. It should be in
> > > >   drm_mode_config_reset, like all the other ->atomic_reset hooks. That
> > > >   would make it a lot clearer that we need to figure out who/when  
> > > >   ->atomic_reset should be called for hotplugged bridges, maybe as part of    
> > > >   connector registration when the entire bridge and it's new connector is
> > > >   assembled?
> > > > 
> > > > - Finally this very much means we need to rethink who/how the connector
> > > >   for a bridge is created. The new design is that the main driver creates
> > > >   this connector, once the entire bridge exists. But with hotplugging this
> > > >   gets a lot more complicated, so we might want to extract a pile of that
> > > >   encoder related code from drivers (same way dp mst helpers take care of
> > > >   connector creation too, it's just too much of a mess otherwise).
> > > > 
> > > >   The current bridge chaining infrastructure requires a lot of hand-rolled
> > > >   code in each bridge driver and the encoder, so that might be a good
> > > >   thing anyway.
> > > > 
> > > > - Finally I think the entire bridge hotplug infrastructure should be
> > > >   irrespective of the underlying bus. Which means for the mipi dsi case we
> > > >   might also want to look into what's missing to make mipi dsi
> > > >   hotunpluggable, at least for the case where it's a proper driver. I
> > > >   think we should ignore the old bridge model where driver's stitched it
> > > >   all toghether using the component framework, in my opinion that approach
> > > >   should be deprecated.  
> > > 
> > > The DSI side was one of my headaches on writing this driver, and I
> > > must say I dislike the code in hotplug_bridge_dsi_attach(), with the
> > > need for an initial "dummy" attach and detach the first time. At first
> > > sight I think we need a .update_format callback in struct
> > > mipi_dsi_host_ops so the DSI host is aware of a format change.
> > > 
> > > Right now there are DSI host drivers which keep a copy of the struct
> > > mipi_dsi_device pointer and read the format from there when starting a
> > > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way
> > > to react to format changes, but keeping a pointer is bad when the DSI
> > > device hot-unplugs, and the new format won't be seen until a
> > > disable/enable cycle. So a new callback looks more robust overall.
> > > 
> > > Any opinion about this?  
> > 
> > Yeah I don't like the code either.
> > 
> > What might help for prototyping is if you start with a hotpluggeable
> > bridge where the bridge is a i2c device. That way I think we should be
> > able to benefit from the driver bind/unbind code that exists already.
> > Although there's going to be issues with i2c hotunplug too in i2c core
> > code.
> > 
> > Then lift whatever we learn there to our dsi code. But essentially I think
> > we should model the driver core model a lot more, so I guess you could use
> > any hotunplug capable bus as a template. Usb might be closest, since
> > that's also a packet/message based interface, mostly at least?
> > 
> > > > - Finally I think we should have a lot of safety checks, like only bridges
> > > >   which declare themselve to be hotunplug safe should be allowed as a part
> > > >   of the hotpluggable bridge chain part. All others must still be attached
> > > >   before the entire driver is registered with drm_dev_register.  
> > > 
> > > I'm fine with the principle of a "HOTPLUG" flag.
> > > 
> > > I wonder how that should be implemented, though. Based on my (surely
> > > simplistic) understanding, right now bridges can be both added and
> > > removed, but only in a specific sequence:
> > > 
> > >  1. add all bridges
> > >  2. use the card
> > >  3. remove all bridges
> > >  4. EOF
> > > 
> > > We'd need to change to allow:
> > > 
> > >  1. add fixed bridges (including hotplug-bridge)
> > >  2. add bridges on removable add-on
> > >  3. use card
> > >  4. remove bridges on removable add-on
> > >  5. optionally goto 2
> > >  6. remove fixed bridges (including hotplug-bridge)
> > >  7. EOF
> > > 
> > > One naïve idea is that the DRM core keeps a flag whenever any fixed
> > > bridge (no HOTLPUG flag) is removed and when that happens forbid adding
> > > bridges as we are now at line 5.  
> > 
> > If a fixed bridge is removed while the driver is still in use (i.e. before
> > drm_dev_unregister is called), that's a driver bug. Would be good to catch
> > that, which is why I think a lot of all the bridge hotplug handling should
> > be in core bridge code and not the separate hotplug driver, so that we can
> > enforce all these constraints.
> > 
> > Also conceptually 3 can happen before 2 (but also before), and that's how
> > dp mst works too. It does add all kinds of complications though ...
> > 
> > > Aside for tons of subtleties I'm surely missing, does this look a
> > > proper approach? I'd not be surprised if there is something better and
> > > more solid.  
> > 
> > Yeah roughly. If you look through dp mst code then that also adds all
> > kinds of structures (since dp mst is a routed network really), not just
> > the connectors. They also all come with refcounts (because the network is
> > a tree), but like I said I think for your case we can avoid the per-bridge
> > refcounts by relying on the connector refcount we have already.
> > 
> > But I might be overlooking something, and we need refcounting for each
> > bridge like dp mst also needs refcounting for all its internal structures
> > that map the entire hotpluggable display chain. If you want to read some
> > dp mst code, these internal structures are ports/branches with struct
> > drm_dp_mst_branch/port.
> > 
> > > >   Or that we only allow bridges with the NO_CONNECTOR flag for
> > > >   drm_bridge_attach.
> > > > 
> > > > There's probably a pile more fundamental issues I've missed, but this
> > > > should get a good discussion started.  
> > > 
> > > Sure, I think it has.
> > > 
> > > Bottom line, I'm clearly not seeing some issues that need to be
> > > considered, and that do not show up for my use case. Based on my
> > > limited DRM knowledge, this was expected, and I'm glad to work on those
> > > issues with some practical indications about the path forward.  
> > 
> > Yeah for learning I think dp mst is best. It's a bit complex, but since
> > you have a dock you should be able to play around and experiment with the
> > code with some real hardware.
> > 
> > That should help to get a bit a feel for the complexity, since lots of
> > opportunities for you to ask why/how locking/refcounting is used and
> > against which potential issue they protect.
> > 
> > Cheers, Sima
> 
> Luca
> 
> -- 
> Luca Ceresoli, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
Luca Ceresoli Sept. 9, 2024, 1:26 p.m. UTC | #16
Hello Sima,

again thanks for your feedback, and again I'm afraid I'm having a hard
time in understanding large parts of your e-mail. Below I'm trying to
focus on one fundamental aspect only, which I need to understand the
overall approach you are suggesting.

On Tue, 27 Aug 2024 18:37:12 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> On Fri, Aug 23, 2024 at 12:39:03PM +0200, Luca Ceresoli wrote:
> > Hello Sima,
> > 
> > these days I started looking in more detail into some of the topics you
> > had mentioned in your v2 review. I have questions about those I have
> > investigated, see below.
> > 
> > On Tue, 21 May 2024 14:01:19 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote:  
> > > > Hello Daniel,
> > > > 
> > > > On Thu, 16 May 2024 15:22:01 +0200
> > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >     
> > > > > Apologies for missing v1 ...
> > > > > 
> > > > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote:    
> > > > > > DRM hotplug bridge driver
> > > > > > =========================
> > > > > > 
> > > > > > DRM natively supports pipelines whose display can be removed, but all the
> > > > > > components preceding it (all the display controller and any bridges) are
> > > > > > assumed to be fixed and cannot be plugged, removed or modified at runtime.
> > > > > > 
> > > > > > This series adds support for DRM pipelines having a removable part after
> > > > > > the encoder, thus also allowing bridges to be removed and reconnected at
> > > > > > runtime, possibly with different components.
> > > > > > 
> > > > > > This picture summarizes the  DRM structure implemented by this series:
> > > > > > 
> > > > > >  .------------------------.
> > > > > >  |   DISPLAY CONTROLLER   |
> > > > > >  | .---------.   .------. |
> > > > > >  | | ENCODER |<--| CRTC | |
> > > > > >  | '---------'   '------' |
> > > > > >  '------|-----------------'
> > > > > >         |
> > > > > >         |DSI            HOTPLUG
> > > > > >         V              CONNECTOR
> > > > > >    .---------.        .--.    .-.        .---------.         .-------.
> > > > > >    | 0 to N  |        | _|   _| |        | 1 to N  |         |       |
> > > > > >    | BRIDGES |--DSI-->||_   |_  |--DSI-->| BRIDGES |--LVDS-->| PANEL |
> > > > > >    |         |        |  |    | |        |         |         |       |
> > > > > >    '---------'        '--'    '-'        '---------'         '-------'
> > > > > > 
> > > > > >  [--- fixed components --]  [----------- removable add-on -----------]
> > > > > > 
> > > > > > Fixed components include:
> > > > > > 
> > > > > >  * all components up to the DRM encoder, usually part of the SoC
> > > > > >  * optionally some bridges, in the SoC and/or as external chips
> > > > > > 
> > > > > > Components on the removable add-on include:
> > > > > > 
> > > > > >  * one or more bridges
> > > > > >  * a fixed connector (not one natively supporting hotplug such as HDMI)
> > > > > >  * the panel      
> > > > > 
> > > > > So I think at a high level this design approach makes sense,    
> > > > 
> > > > Good starting point :)
> > > >     
> > > > > but the
> > > > > implementation needs some serious thought. One big thing upfront though,
> > > > > we need to have a clear plan for the overlay hotunload issues, otherwise
> > > > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is
> > > > > very, very tricky, full of lifetime issues, and those need to be sorted
> > > > > out first or we're just trying to build a castle on quicksand.
> > > > > 
> > > > > For bridges itself I don't think the current locking works. You're trying
> > > > > to really cleverly hide it all behind a normal-looking bridge driver, but
> > > > > there's many things beyond that which will blow up if bridges just
> > > > > disappear. Most importantly the bridge states part of an atomic update.    
> > > > 
> > > > Surely possible as atomic updates are definitely not stimulated in my
> > > > use case. Can you recommend any testing tools to be able to trigger any
> > > > issues?    
> > > 
> > > Uh really hard ... You'd need to create an atomic commit that's blocked on
> > > a sync_file in-fence (so that you can extend the race window). And then
> > > hotunplug the bridge chain _before_ you signal that fence.
> > > 
> > > That's not going to cover all possible races, but at least a large chunk
> > > of the really big ones.
> > >   
> > > > The main setups I used for my testing so far are 'modetest -s' for my
> > > > daily work and a simple weston setup to periodically test a complete
> > > > user space stack.
> > > >     
> > > > > Now in drm we have drm_connector as the only hotunpluggable thing, and it
> > > > > took years to sort out all the issues. I think we should either model the
> > > > > bridge hotunplug locking after that, or just outright reuse the connector
> > > > > locking and lifetime rules. I much prefer the latter personally.
> > > > > 
> > > > > Anyway the big issues:
> > > > > 
> > > > > - We need to refcount the hotpluggable bridges, because software (like
> > > > >   atomic state updates) might hang onto pointers for longer than the
> > > > >   bridge physically exists. Assuming that you can all tear it down
> > > > >   synchronously will not work.
> > > > > 
> > > > >   If we reuse connector locking/lifetime then we could put the
> > > > >   hotpluggable part of the bridge chain into the drm_connector, since that
> > > > >   already has refcounting as needed. It would mean that finding the next
> > > > >   bridge in the chain becomes a lot more tricky though. With that model
> > > > >   we'd create a new connector every time the bridge is hotplugged, which I
> > > > >   think is also the cleaner model (because you might plug in a hdmi
> > > > >   connector after a panel, so things like the connector type change).  
> > 
> > So, based on your dp__mst hint I added connector creation/removal in the
> > v3 I sent a few days ago. All other aspects in your list are unchanged
> > from the v2 you have reviewed.
> > 
> > Now I'm trying to tackle some other of the items you mentioned here,
> > and locking/lifetime is the hardest one for me to understand at the
> > moment.
> > 
> > If I get you right, you suggest making all the removable bridges
> > "owned" by the connector that gets hotplugged, the reason being that
> > locking and lifetime issues for the hotplug connectors have already
> > been sorted out, while that has not been done for bridges.
> > 
> > Correct?  
> 
> Yes.
> 
> > Assuming the above is correct, I'm wondering whether this would be a
> > correct design or rather a shortcut to leverage the connector locking
> > instead of implementing bridge locking/lifetime. Note I'm not
> > criticizing, I'm really asking myself this question and I'd like to
> > know your position about that.
> > 
> > Again about putting the removable bridges inside the hotplug connector,
> > I'm trying to understand how that may happen in the device tree case.
> > The hot-pluggable bridge is populated starting from the DT code
> > (through i2c-core-of in my case being an I2C chip), and that code is
> > not aware that it is being instantiating a DRM device. So there is
> > nothing we can do before the bridge probe is called. The bridge probe
> > then does know much about any connectors. It also does not know about
> > hotplug: this is a design decision I made to make the regular bridge
> > drivers reusable without edits for the hotplug case, but it can be
> > waived if needed.
> > 
> > So the only way I currently to move the bridge inside the connector is
> > to catch it in the hotplug-bridge driver, when it gets notified about
> > the new bridge having appeared (was a notifier, I'm changing that, see
> > below). So the hotplug-bridge would need a function line
> > drm_connector_add_bridge(conn, br) to make the association.
> > 
> > Is this the direction of development you had in mind?  
> 
> Not quite. The issue is that while bridges are getting hotplugged, you
> don't yet have a connector. So there's nothing you can attach to. What I
> had in mind is more:
> 

There is a fundamental question where your position is not clear to me.
Based on this:

> - The last fixed bridges knows that all subsequent bridges are hotplugged.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   Which means instead of just asking for the next bridge, it needs to ask
>   for the fully formed bridge chain, including the connector.
> 

and this:

> - The hotplug bridge connector code checks every time a new bridge shows
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   up whether the chain is now complete. Once that is the case, it creates
>   the connector (with the new bridge design this is no longer the job of
>   the last bridge in the chain, so we need to require that for
>   hotpluggable bridges). Then it can attach all the bridges to that
>   connector, and hand the entire fully formed chain to the last fixed
>   bridge to hotplug insert and register.

The question is: do you think the hotplug-bridge driver should be
present, or not? To me the two above sentences appear to contradict each
other.

The reason we decided to implement a hotplug DRM bridge is it allows to
decouple the fixed and the remote segments of the pipeline, in such a
way that all the regular bridge drivers don't need any special handling
to support hotplug.

In my work the upstream bridge driver is samsung-dsim.c and the
downstream one is ti-sn65dsi83.c and none of them needed a single line
changed to support hotplug. I think this is useful: virtually any
physical bridge with pins can be used in a hotplug setup, either in the
fixed or in the removable section, so not needing to modify them is
valuable.

OTOH in various parts of this and other e-mails you seem to imply that
there should be no hotplug-bridge (not as a struct drm_bridge, not as
a driver, or both). Except for the fact that there is no chip
implementing such a bridge (there is a physical connector though) I do
not see many reasons.

> - Hotunplug is the same, but on inverse: The entire chain including the
>   connector gets removed in it's entirety, before it gets dissassembled
>   and freed (once the last refcount to the connector is dropped).
> 
> There is a _lot_ of handwaving in this on the driver model side, but with
> this the drm side should be clear and clean at least.
> 
> > > > I have been investigating the option of adding/removing a connector
> > > > based on hot-plug/unplug events initially, see my reply to Maxime after
> > > > v1 [0]:
> > > >     
> > > > > The first approach is based on removing the drm_connector. My laptop
> > > > > uses the i915 driver, and I have observed that attaching/removing a
> > > > > USB-C dock with an HDMI connector connected to a monitor, a new
> > > > > drm_connector appears/disappears for the card. User space gets notified
> > > > > and the external monitor is enabled/disabled, just the way a desktop
> > > > > user would expect, so this is possible. I had a look at the driver but
> > > > > how this magic happens was not clear to me honestly.    
> > > > 
> > > > So if you could point to where/how this is done, I would certainly
> > > > reconsider that.    
> > > 
> > > Right now only the dp mst code uses hotplug/unplug (like you've observed
> > > in your testing with i915, usb-c docks are generally all dp mst). For code
> > > reading it might be best to start with the i915 dp mst code and then go
> > > through the helpers.
> > >   
> > > > > - No notifiers please. The create a locking mess with inversions, and
> > > > >   especially for hotunplug they create the illusion that you can
> > > > >   synchronously keep up to date with hardware state. That's not possible.
> > > > >   Fundamentally all bridge drivers which might be hotunplugged need to be
> > > > >   able to cope with the hardware disappearing any momemnt.    
> > > > 
> > > > Can you clarify this point? I'm sorry I fail to understand the
> > > > relationship between the use of notifiers and the ability of bridge
> > > > drivers to cope with hardware disappearing.
> > > > 
> > > > In this patch, the hotplug-bridge uses notifiers to be informed when
> > > > the following bridge is disappearing: which other way would you suggest
> > > > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is
> > > > not meant to disappear, and it has no actual hardware that can go away,
> > > > being just a mechanical connector.    
> > > 
> > > Yeah you need code to handle that. My point is that using a notifier is
> > > the wrong design, because the notifier has it's own locking. Which tends
> > > to get in the way.  
> > 
> > I went into this subject to see how drm_bridge_add() could inform the
> > interested bridges using a DRM-specific mechanism instead of standard
> > notifiers.
> > 
> > However I think to inform *the interested* bridges, at least in the
> > device tree case, there is a fundamental issue: DRM core has no idea
> > about the topology. Definitely not at drm_bridge_add() time, way before
> > drm_bridge_attach() where the 'previous' pointer introduces a minimum
> > of topology awareness in the DRM core.  
> 
> Yeah we need special functions, which the last fixed bridge needs to call
> instead of the current set of functions. So instead of of_drm_find_bridge
> we need something like of_drm_register_hotplug_source_bridge or similar.

Continuing on the above topic, are you suggesting that there should be
no hotplug-bridge, and that every bridge that can be used as the "last
fixed bridge" should handle the hotplug case individually?

If that is the case, we'd need to modify any bridge driver that people
want to use as "last fixed bridge" to:

 1. know they are the "last fixed bridge" (via device tree?)
 2. use of_drm_register_hotplug_source_bridge()
    instead of of_drm_find_bridge() accordingly

> > Option 1 is to iterate over all the ports and endpoints and for every
> > remote-endpoint pointing to a bridge, inform the remote bridge about
> > that via a new optional callback in drm_bridge_funcs. That means likely
> > informing more bridges than needed, so when they get informed the
> > bridges will still have to check whether they are interested or not.
> > 
> > Pseudocode:
> > 
> >   void drm_bridge_add(struct drm_bridge *new_bridge)
> >   {
> >      // existing code unchanged
> > 
> > +    if (bridge->of_node) {
> > +       for_each_remote_endpoint(bridge->of_node) {
> > +          br = of_drm_find_bridge(remote_endpoint);
> > +          if (br && br->funcs.bridge_event_notify)
> > +              br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_ADD,
> > +                                             new_bridge);
> > +       }
> > +    }
> >   }
> > 
> > That means informing both upstream and downstream bridges, which could
> > even be useful, but anyway there is no reliable way to pick only the
> > sink or source ports. It also implies having lots of of_*() calls which
> > iterate over the tree, which is not optimal, but it happens only at
> > add/removal so it's fine I guess.
> > 
> > Option 2 is be to just inform all the bridges (using the global
> > bridge_list). Pros and cons:
> > 
> >  * less of_*() calls to crawl around the remote-endpoints
> >  * simpler code
> >  * more bridges than needed would be informed, could be be an issue
> >    if many implement the .bridge_event_notify()
> >  * notifies even in the non-OF case too, not sure it's useful
> > 
> > What are your thoughts about these two options?  
> 
> I'm not sure why you need to inform bridges? Currently for fixed bridges
> we retry with EPROBE_DEFER if the subsequent bridge isn't there yet, which
> means we only add the bridge when the entire chain exists. I think
> hotplugged bridge chains should work the same, which means you only ever
> need to watch the first element shows up. Then you create the connector
> for the entire chain and through special callbacks inform the bridge that
> called of_drm_register_hotplug_source_bridge that it now has a connector.
> 
> All the other bridges can be ignored I think, even when they're part of
> the hotplug chain.
> 
> I just realized that means we need at least a refcount on hotpluggable
> bridges, because otherwise drm_bridge_remove will result in use-after-free
> issues. So should probably rename those from drm_bridge_add/remove to
> drm_bridge_register/unregister.
> 
> For fixed bridges no one else should ever increment the refcount, so
> should work exactly the same.
> 
> > > Instead I think that code should be directly in core bridge code (I don't
> > > see the benefit of having this entirely in a separate driver), using drm
> > > locking to make sure there's no races.  
> > 
> > Not sure I got what you mean here. Which one of the following?
> > 
> >  1. The entire hotplug-bridge driver should not exist, and the DRM
> >     core should handle it all (seems not doable, this driver has lots of
> >     device-specific details)
> >  2. The hotplug-driver should exist, but the code to attach/detach the
> >     pipeline on hotplug/unplug should be moved to the core, with the
> >     hotplug-driver providing callbacks for the device-specific aspects
> >  3. Same as 2, but additionally the hotplug-bridge should become a
> >     connector driver (hotplug-connector.c?) -- not sure it can decouple
> >     the two sides without a bridge however
> >  4. None of the above  
> 
> 3, roughly. The connector creation must be in core bridge code, or things
> will go boom.

Based on this I think you mean:

 1. the hotplug-*something* driver should exist
 2. it should add the fixed connector (DSI in my case) on probe
 3. it should add/remove the removable connector (LVDS) on hot(un)plug
 4. it should add _no_ bridge (in the sense of struct drm_bridge)
    [not sure it can still decouple the fixed VS addon pipeline parts]

Could you please clearly state your position about the above points?

Best regards,
Luca

> If you also want to keep the current lifetime model for bridge (without a
> refcount, meaning drm_bridge_remove removes the bridge from all use right
> away) there's also additional locking and tricks you need in in that
> bridge connector code, including the entire bridge chain. Meaning those
> hotplugged bridges must not be visible at all to the wider driver.
> 
> Or you refcount bridges and make them like driver model struct kobj (or
> kobj directly), but that's a pile of work too. That would be the other
> design approach, instead of only relying on drm_connector. We might need
> both, I haven't dug into the details enough to be sure here.
> 
> Cheers, Sima
> 
> 
> > > > On the opposite side, the following bridges are physically removable
> > > > and so their drivers will have to be fixed (if needed) to work when
> > > > removal happens, but I don't see how that relates to the DRM core
> > > > emitting a notification of such bridges being removed.    
> > > 
> > > Yeah it's not directly related, just my experience that people assume
> > > notifiers provide you more synchronization and race-preventation than they
> > > really do. So it's better to hand-roll to make it all really explicit.
> > >   
> > > > > - Related to this: You're not allowed to shut down hardware behind the
> > > > >   user's back with drm_atomic_helper_shutdown. We've tried that approach
> > > > >   with dp mst, it really pisses off userspace when a page_flip that it
> > > > >   expected to work doesn't work.
> > > > > 
> > > > > - There's also the design aspect that in atomic, only atomic_check is
> > > > >   allowed to fail, atomic_commit must succeed, even when the hardware is
> > > > >   gone. Using connectors and their refcounting should help with that.    
> > > > 
> > > > IIUC here you are suggesting again to remove the connector instead of
> > > > marking it "disconnected". So, as above, pointers on how that is
> > > > achieved would be helpful.    
> > > 
> > > See dp mst code. It's complex unfortunately, so there's some reading
> > > involved :-/  
> > > >    
> > > > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in
> > > > >   drm_bridge_attach, and that's kinda the wrong place. It should be in
> > > > >   drm_mode_config_reset, like all the other ->atomic_reset hooks. That
> > > > >   would make it a lot clearer that we need to figure out who/when    
> > > > >   ->atomic_reset should be called for hotplugged bridges, maybe as part of      
> > > > >   connector registration when the entire bridge and it's new connector is
> > > > >   assembled?
> > > > > 
> > > > > - Finally this very much means we need to rethink who/how the connector
> > > > >   for a bridge is created. The new design is that the main driver creates
> > > > >   this connector, once the entire bridge exists. But with hotplugging this
> > > > >   gets a lot more complicated, so we might want to extract a pile of that
> > > > >   encoder related code from drivers (same way dp mst helpers take care of
> > > > >   connector creation too, it's just too much of a mess otherwise).
> > > > > 
> > > > >   The current bridge chaining infrastructure requires a lot of hand-rolled
> > > > >   code in each bridge driver and the encoder, so that might be a good
> > > > >   thing anyway.
> > > > > 
> > > > > - Finally I think the entire bridge hotplug infrastructure should be
> > > > >   irrespective of the underlying bus. Which means for the mipi dsi case we
> > > > >   might also want to look into what's missing to make mipi dsi
> > > > >   hotunpluggable, at least for the case where it's a proper driver. I
> > > > >   think we should ignore the old bridge model where driver's stitched it
> > > > >   all toghether using the component framework, in my opinion that approach
> > > > >   should be deprecated.    
> > > > 
> > > > The DSI side was one of my headaches on writing this driver, and I
> > > > must say I dislike the code in hotplug_bridge_dsi_attach(), with the
> > > > need for an initial "dummy" attach and detach the first time. At first
> > > > sight I think we need a .update_format callback in struct
> > > > mipi_dsi_host_ops so the DSI host is aware of a format change.
> > > > 
> > > > Right now there are DSI host drivers which keep a copy of the struct
> > > > mipi_dsi_device pointer and read the format from there when starting a
> > > > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way
> > > > to react to format changes, but keeping a pointer is bad when the DSI
> > > > device hot-unplugs, and the new format won't be seen until a
> > > > disable/enable cycle. So a new callback looks more robust overall.
> > > > 
> > > > Any opinion about this?    
> > > 
> > > Yeah I don't like the code either.
> > > 
> > > What might help for prototyping is if you start with a hotpluggeable
> > > bridge where the bridge is a i2c device. That way I think we should be
> > > able to benefit from the driver bind/unbind code that exists already.
> > > Although there's going to be issues with i2c hotunplug too in i2c core
> > > code.
> > > 
> > > Then lift whatever we learn there to our dsi code. But essentially I think
> > > we should model the driver core model a lot more, so I guess you could use
> > > any hotunplug capable bus as a template. Usb might be closest, since
> > > that's also a packet/message based interface, mostly at least?
> > >   
> > > > > - Finally I think we should have a lot of safety checks, like only bridges
> > > > >   which declare themselve to be hotunplug safe should be allowed as a part
> > > > >   of the hotpluggable bridge chain part. All others must still be attached
> > > > >   before the entire driver is registered with drm_dev_register.    
> > > > 
> > > > I'm fine with the principle of a "HOTPLUG" flag.
> > > > 
> > > > I wonder how that should be implemented, though. Based on my (surely
> > > > simplistic) understanding, right now bridges can be both added and
> > > > removed, but only in a specific sequence:
> > > > 
> > > >  1. add all bridges
> > > >  2. use the card
> > > >  3. remove all bridges
> > > >  4. EOF
> > > > 
> > > > We'd need to change to allow:
> > > > 
> > > >  1. add fixed bridges (including hotplug-bridge)
> > > >  2. add bridges on removable add-on
> > > >  3. use card
> > > >  4. remove bridges on removable add-on
> > > >  5. optionally goto 2
> > > >  6. remove fixed bridges (including hotplug-bridge)
> > > >  7. EOF
> > > > 
> > > > One naïve idea is that the DRM core keeps a flag whenever any fixed
> > > > bridge (no HOTLPUG flag) is removed and when that happens forbid adding
> > > > bridges as we are now at line 5.    
> > > 
> > > If a fixed bridge is removed while the driver is still in use (i.e. before
> > > drm_dev_unregister is called), that's a driver bug. Would be good to catch
> > > that, which is why I think a lot of all the bridge hotplug handling should
> > > be in core bridge code and not the separate hotplug driver, so that we can
> > > enforce all these constraints.
> > > 
> > > Also conceptually 3 can happen before 2 (but also before), and that's how
> > > dp mst works too. It does add all kinds of complications though ...
> > >   
> > > > Aside for tons of subtleties I'm surely missing, does this look a
> > > > proper approach? I'd not be surprised if there is something better and
> > > > more solid.    
> > > 
> > > Yeah roughly. If you look through dp mst code then that also adds all
> > > kinds of structures (since dp mst is a routed network really), not just
> > > the connectors. They also all come with refcounts (because the network is
> > > a tree), but like I said I think for your case we can avoid the per-bridge
> > > refcounts by relying on the connector refcount we have already.
> > > 
> > > But I might be overlooking something, and we need refcounting for each
> > > bridge like dp mst also needs refcounting for all its internal structures
> > > that map the entire hotpluggable display chain. If you want to read some
> > > dp mst code, these internal structures are ports/branches with struct
> > > drm_dp_mst_branch/port.
> > >   
> > > > >   Or that we only allow bridges with the NO_CONNECTOR flag for
> > > > >   drm_bridge_attach.
> > > > > 
> > > > > There's probably a pile more fundamental issues I've missed, but this
> > > > > should get a good discussion started.    
> > > > 
> > > > Sure, I think it has.
> > > > 
> > > > Bottom line, I'm clearly not seeing some issues that need to be
> > > > considered, and that do not show up for my use case. Based on my
> > > > limited DRM knowledge, this was expected, and I'm glad to work on those
> > > > issues with some practical indications about the path forward.    
> > > 
> > > Yeah for learning I think dp mst is best. It's a bit complex, but since
> > > you have a dock you should be able to play around and experiment with the
> > > code with some real hardware.
> > > 
> > > That should help to get a bit a feel for the complexity, since lots of
> > > opportunities for you to ask why/how locking/refcounting is used and
> > > against which potential issue they protect.
> > > 
> > > Cheers, Sima  
> > 
> > Luca
> > 
> > -- 
> > Luca Ceresoli, Bootlin
> > Embedded Linux and Kernel engineering
> > https://bootlin.com  
>