mbox series

[v4,0/8] Add tda998x (HDMI) support to atmel-hlcdc

Message ID 20180423072301.11962-1-peda@axentia.se
Headers show
Series Add tda998x (HDMI) support to atmel-hlcdc | expand

Message

Peter Rosin April 23, 2018, 7:22 a.m. UTC
Hi!

I naively thought that since there was support for both nxp,tda19988 (in
the tda998x driver) and the atmel-hlcdc, things would be a smooth ride.
But it wasn't, so I started looking around and realized I had to fix
things.

In v1 and v2 I fixed things by making the atmel-hlcdc driver a master
component, but now I fix things by making the tda998x driver a bridge
instead. This was after a suggestion from Boris Brezillon the
atmel-hlcdc maintainer), so there was some risk of bias ... but after
comparing what was needed, I too find the bridge approach better.

In addition to the above, our PCB interface between the SAMA5D3 and the
HDMI encoder is only using 16 bits, and this has to be described
somewhere, or the atmel-hlcdc driver have no chance of selecting the
correct output mode. Since I have similar problems with a ds90c185 lvds
encoder I added patches to override the atmel-hlcdc output format via
DT properties compatible with the media video-interface binding and
things start to play together.

Since this series superseeds the bridge series [1], I have included the
leftover bindings patch for the ti,ds90c185 here.

Anyway, this series solves some real issues for my HW.

And as a last bit I remove the master component code from the tilcdc
driver. It should no longer be needed, but I have only compile-tested
the patch and there could easily be some gotcha. I marked it RFC, and
it should get a Tested-by from someone with a tilcdc/tda998x combo
before it is applied.

Cheers,
Peter

Changes since v3   https://lkml.org/lkml/2018/4/19/736
- moved the meat of the drm_of_media_bus_fmt function from patch 3/7
  to a driver-local atmel_hlcdc_of_bus_fmt function, and squashed with
  patch 4/7 (this is now patch 3/8).
- patch 3/8 now parses and stores the DT bus format together with the
  encoder in a reintroduced struct atmel_hlcdc_rgb_output instead of
  allocating a separate encoder-indexed array for the bus formats.
- patch 5/8 is new and splits tda988x_encoder_dpms into a pair of
  functions to enable/disable separately. This fits better with both
  the current code and for the followup drm_bridge patch (7/8).
- adjust patch 6/8 and 7/8 to the above simplification.
- added patch 8/8 that removes component master code from tilcdc.

Changes since v2   https://lkml.org/lkml/2018/4/17/385
- patch 2/7 fixed spelling and added an example
- patch 4/7 parse the DT up front and store the result indexed by encoder
- old patch 5/6 and 6/6 dropped
- patch 5-7/7 are new and makes the tda998x driver a drm_bridge

Changes since v1   https://lkml.org/lkml/2018/4/9/294
- added reviewed-by from Rob to patch 1/6
- patch 2/6 changed so that the bus format override is in the endpoint
  DT node, and follows the binding of media video-interfaces.
- patch 3/6 is new, it adds drm_of_media_bus_fmt which parses above
  media video-interface binding (partially).
- patch 4/6 now makes use of the above helper (and also fixes problems
  with the 3/5 patch from v1 when no override was specified).
- do not mention unrelated connector display_info details in the cover
  letter and commit messages.

[1]
"Bridge" series v2   https://lkml.org/lkml/2018/3/26/610
"Bridge" series v1   https://lkml.org/lkml/2018/3/17/221

Peter Rosin (8):
  dt-bindings: display: bridge: lvds-transmitter: add ti,ds90c185
  dt-bindings: display: atmel: optional video-interface of endpoints
  drm/atmel-hlcdc: support bus-width (12/16/18/24) in endpoint nodes
  drm/i2c: tda998x: find the drm_device via the drm_connector
  drm/i2c: tda998x: split tda998x_encoder_dpms into enable/disable
  drm/i2c: tda998x: split encoder and component functions from the work
  drm/i2c: tda998x: register as a drm bridge
  drm/tilcdc: decomponentize now that tda998x is a bridge

 .../devicetree/bindings/display/atmel/hlcdc-dc.txt |  26 ++
 .../bindings/display/bridge/lvds-transmitter.txt   |   8 +-
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c     |  70 ++++--
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h       |   1 +
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c   |  67 +++++-
 drivers/gpu/drm/i2c/tda998x_drv.c                  | 268 ++++++++++++++++-----
 drivers/gpu/drm/tilcdc/tilcdc_crtc.c               |  11 -
 drivers/gpu/drm/tilcdc/tilcdc_drv.c                |  67 +-----
 drivers/gpu/drm/tilcdc/tilcdc_drv.h                |   1 -
 drivers/gpu/drm/tilcdc/tilcdc_external.c           |  51 ----
 drivers/gpu/drm/tilcdc/tilcdc_external.h           |   4 +-
 11 files changed, 357 insertions(+), 217 deletions(-)

Comments

Russell King (Oracle) April 23, 2018, 4:08 p.m. UTC | #1
On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
>  static int tda998x_remove(struct i2c_client *client)
>  {
> -	component_del(&client->dev, &tda998x_ops);
> +	struct device *dev = &client->dev;
> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
> +
> +	drm_bridge_remove(&bridge->bridge);
> +	component_del(dev, &tda998x_ops);
> +

I'd like to ask a rather fundamental question about DRM bridge support,
because I suspect that there's a major fsckup here.

The above is the function that deals with the TDA998x device being
unbound from the driver.  With the component API, this results in the
DRM device correctly being torn down, because one of the hardware
devices has gone.

With DRM bridge, the bridge is merely removed from the list of
bridges:

void drm_bridge_remove(struct drm_bridge *bridge)
{
        mutex_lock(&bridge_lock);
        list_del_init(&bridge->list);
        mutex_unlock(&bridge_lock);
}
EXPORT_SYMBOL(drm_bridge_remove);

and the memory backing the "struct tda998x_bridge" (which contains
the struct drm_bridge) will be freed by the devm subsystem.

However, there is no notification into the rest of the DRM subsystem
that the device has gone away.  Worse, the memory that is still in
use by DRM has now been freed, so further use of the DRM device
results in a use-after-free bug.

This is really not good, and to me looks like a fundamental problem
with the DRM bridge code.  I see nothing in the DRM bridge code that
deals with the lifetime of a "DRM bridge" or indeed the lifetime of
the actual device itself.

So, from what I can see, there seems to be a fundamental lifetime
issue with the design of the DRM bridge code.  This needs to be
fixed.
Boris Brezillon April 23, 2018, 5:40 p.m. UTC | #2
On Mon, 23 Apr 2018 09:22:56 +0200
Peter Rosin <peda@axentia.se> wrote:

> This beats the heuristic that the connector is involved in what format
> should be output for cases where this fails.
> 
> E.g. if there is a bridge that changes format between the encoder and the
> connector, or if some of the RGB pins between the lcd controller and the
> encoder are not routed on the PCB.
> 
> This is critical for the devices that have the "conflicting output
> formats" issue (SAM9N12, SAM9X5, SAMA5D3), since the most significant
> RGB bits move around depending on the selected output mode. For
> devices that do not have the "conflicting output formats" issue
> (SAMA5D2, SAMA5D4), this is completely irrelevant.
> 
> Signed-off-by: Peter Rosin <peda@axentia.se>

Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>

> ---
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c   | 70 +++++++++++++++++-------
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h     |  1 +
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c | 67 ++++++++++++++++++++---
>  3 files changed, 111 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
> index d73281095fac..c38a479ada98 100644
> --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
> +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
> @@ -226,6 +226,55 @@ static void atmel_hlcdc_crtc_atomic_enable(struct drm_crtc *c,
>  #define ATMEL_HLCDC_RGB888_OUTPUT	BIT(3)
>  #define ATMEL_HLCDC_OUTPUT_MODE_MASK	GENMASK(3, 0)
>  
> +static int atmel_hlcdc_connector_output_mode(struct drm_connector_state *state)
> +{
> +	struct drm_connector *connector = state->connector;
> +	struct drm_display_info *info = &connector->display_info;
> +	struct drm_encoder *encoder;
> +	unsigned int supported_fmts = 0;
> +	int j;
> +
> +	encoder = state->best_encoder;
> +	if (!encoder)
> +		encoder = connector->encoder;
> +
> +	switch (atmel_hlcdc_encoder_get_bus_fmt(encoder)) {
> +	case 0:
> +		break;
> +	case MEDIA_BUS_FMT_RGB444_1X12:
> +		return ATMEL_HLCDC_RGB444_OUTPUT;
> +	case MEDIA_BUS_FMT_RGB565_1X16:
> +		return ATMEL_HLCDC_RGB565_OUTPUT;
> +	case MEDIA_BUS_FMT_RGB666_1X18:
> +		return ATMEL_HLCDC_RGB666_OUTPUT;
> +	case MEDIA_BUS_FMT_RGB888_1X24:
> +		return ATMEL_HLCDC_RGB888_OUTPUT;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	for (j = 0; j < info->num_bus_formats; j++) {
> +		switch (info->bus_formats[j]) {
> +		case MEDIA_BUS_FMT_RGB444_1X12:
> +			supported_fmts |= ATMEL_HLCDC_RGB444_OUTPUT;
> +			break;
> +		case MEDIA_BUS_FMT_RGB565_1X16:
> +			supported_fmts |= ATMEL_HLCDC_RGB565_OUTPUT;
> +			break;
> +		case MEDIA_BUS_FMT_RGB666_1X18:
> +			supported_fmts |= ATMEL_HLCDC_RGB666_OUTPUT;
> +			break;
> +		case MEDIA_BUS_FMT_RGB888_1X24:
> +			supported_fmts |= ATMEL_HLCDC_RGB888_OUTPUT;
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +
> +	return supported_fmts;
> +}
> +
>  static int atmel_hlcdc_crtc_select_output_mode(struct drm_crtc_state *state)
>  {
>  	unsigned int output_fmts = ATMEL_HLCDC_OUTPUT_MODE_MASK;
> @@ -238,31 +287,12 @@ static int atmel_hlcdc_crtc_select_output_mode(struct drm_crtc_state *state)
>  	crtc = drm_crtc_to_atmel_hlcdc_crtc(state->crtc);
>  
>  	for_each_new_connector_in_state(state->state, connector, cstate, i) {
> -		struct drm_display_info *info = &connector->display_info;
>  		unsigned int supported_fmts = 0;
> -		int j;
>  
>  		if (!cstate->crtc)
>  			continue;
>  
> -		for (j = 0; j < info->num_bus_formats; j++) {
> -			switch (info->bus_formats[j]) {
> -			case MEDIA_BUS_FMT_RGB444_1X12:
> -				supported_fmts |= ATMEL_HLCDC_RGB444_OUTPUT;
> -				break;
> -			case MEDIA_BUS_FMT_RGB565_1X16:
> -				supported_fmts |= ATMEL_HLCDC_RGB565_OUTPUT;
> -				break;
> -			case MEDIA_BUS_FMT_RGB666_1X18:
> -				supported_fmts |= ATMEL_HLCDC_RGB666_OUTPUT;
> -				break;
> -			case MEDIA_BUS_FMT_RGB888_1X24:
> -				supported_fmts |= ATMEL_HLCDC_RGB888_OUTPUT;
> -				break;
> -			default:
> -				break;
> -			}
> -		}
> +		supported_fmts = atmel_hlcdc_connector_output_mode(cstate);
>  
>  		if (crtc->dc->desc->conflicting_output_formats)
>  			output_fmts &= supported_fmts;
> diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h
> index ab32d5b268d2..77bd2d0ae508 100644
> --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h
> +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h
> @@ -454,5 +454,6 @@ void atmel_hlcdc_crtc_irq(struct drm_crtc *c);
>  int atmel_hlcdc_crtc_create(struct drm_device *dev);
>  
>  int atmel_hlcdc_create_outputs(struct drm_device *dev);
> +int atmel_hlcdc_encoder_get_bus_fmt(struct drm_encoder *encoder);
>  
>  #endif /* DRM_ATMEL_HLCDC_H */
> diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c
> index 8db51fb131db..db4d6aaaef93 100644
> --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c
> +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c
> @@ -27,33 +27,86 @@
>  
>  #include "atmel_hlcdc_dc.h"
>  
> +struct atmel_hlcdc_rgb_output {
> +	struct drm_encoder encoder;
> +	int bus_fmt;
> +};
> +
>  static const struct drm_encoder_funcs atmel_hlcdc_panel_encoder_funcs = {
>  	.destroy = drm_encoder_cleanup,
>  };
>  
> +static struct atmel_hlcdc_rgb_output *
> +atmel_hlcdc_encoder_to_rgb_output(struct drm_encoder *encoder)
> +{
> +	return container_of(encoder, struct atmel_hlcdc_rgb_output, encoder);
> +}
> +
> +int atmel_hlcdc_encoder_get_bus_fmt(struct drm_encoder *encoder)
> +{
> +	struct atmel_hlcdc_rgb_output *output;
> +
> +	output = atmel_hlcdc_encoder_to_rgb_output(encoder);
> +
> +	return output->bus_fmt;
> +}
> +
> +static int atmel_hlcdc_of_bus_fmt(struct device_node *ep)
> +{
> +	u32 bus_width = 0;
> +
> +	of_property_read_u32(ep, "bus-width", &bus_width);
> +
> +	switch (bus_width) {
> +	case 0:
> +		return 0;
> +	case 12:
> +		return MEDIA_BUS_FMT_RGB444_1X12;
> +	case 16:
> +		return MEDIA_BUS_FMT_RGB565_1X16;
> +	case 18:
> +		return MEDIA_BUS_FMT_RGB666_1X18;
> +	case 24:
> +		return MEDIA_BUS_FMT_RGB888_1X24;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
>  static int atmel_hlcdc_attach_endpoint(struct drm_device *dev, int endpoint)
>  {
> -	struct drm_encoder *encoder;
> +	struct atmel_hlcdc_rgb_output *output;
> +	struct device_node *ep;
>  	struct drm_panel *panel;
>  	struct drm_bridge *bridge;
>  	int ret;
>  
> +	ep = of_graph_get_endpoint_by_regs(dev->dev->of_node, 0, endpoint);
> +	if (!ep)
> +		return -ENODEV;
> +
>  	ret = drm_of_find_panel_or_bridge(dev->dev->of_node, 0, endpoint,
>  					  &panel, &bridge);
>  	if (ret)
>  		return ret;
>  
> -	encoder = devm_kzalloc(dev->dev, sizeof(*encoder), GFP_KERNEL);
> -	if (!encoder)
> +	output = devm_kzalloc(dev->dev, sizeof(*output), GFP_KERNEL);
> +	if (!output)
>  		return -EINVAL;
>  
> -	ret = drm_encoder_init(dev, encoder,
> +	output->bus_fmt = atmel_hlcdc_of_bus_fmt(ep);
> +	if (output->bus_fmt < 0) {
> +		dev_err(dev->dev, "invalid bus width\n");
> +		return -EINVAL;
> +	}
> +
> +	ret = drm_encoder_init(dev, &output->encoder,
>  			       &atmel_hlcdc_panel_encoder_funcs,
>  			       DRM_MODE_ENCODER_NONE, NULL);
>  	if (ret)
>  		return ret;
>  
> -	encoder->possible_crtcs = 0x1;
> +	output->encoder.possible_crtcs = 0x1;
>  
>  	if (panel) {
>  		bridge = drm_panel_bridge_add(panel, DRM_MODE_CONNECTOR_Unknown);
> @@ -62,7 +115,7 @@ static int atmel_hlcdc_attach_endpoint(struct drm_device *dev, int endpoint)
>  	}
>  
>  	if (bridge) {
> -		ret = drm_bridge_attach(encoder, bridge, NULL);
> +		ret = drm_bridge_attach(&output->encoder, bridge, NULL);
>  		if (!ret)
>  			return 0;
>  
> @@ -70,7 +123,7 @@ static int atmel_hlcdc_attach_endpoint(struct drm_device *dev, int endpoint)
>  			drm_panel_bridge_remove(bridge);
>  	}
>  
> -	drm_encoder_cleanup(encoder);
> +	drm_encoder_cleanup(&output->encoder);
>  
>  	return ret;
>  }

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King (Oracle) April 24, 2018, 8:08 a.m. UTC | #3
On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote:
> On 2018-04-23 18:08, Russell King - ARM Linux wrote:
> > On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
> >>  static int tda998x_remove(struct i2c_client *client)
> >>  {
> >> -	component_del(&client->dev, &tda998x_ops);
> >> +	struct device *dev = &client->dev;
> >> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
> >> +
> >> +	drm_bridge_remove(&bridge->bridge);
> >> +	component_del(dev, &tda998x_ops);
> >> +
> > 
> > I'd like to ask a rather fundamental question about DRM bridge support,
> > because I suspect that there's a major fsckup here.
> > 
> > The above is the function that deals with the TDA998x device being
> > unbound from the driver.  With the component API, this results in the
> > DRM device correctly being torn down, because one of the hardware
> > devices has gone.
> > 
> > With DRM bridge, the bridge is merely removed from the list of
> > bridges:
> > 
> > void drm_bridge_remove(struct drm_bridge *bridge)
> > {
> >         mutex_lock(&bridge_lock);
> >         list_del_init(&bridge->list);
> >         mutex_unlock(&bridge_lock);
> > }
> > EXPORT_SYMBOL(drm_bridge_remove);
> > 
> > and the memory backing the "struct tda998x_bridge" (which contains
> > the struct drm_bridge) will be freed by the devm subsystem.
> > 
> > However, there is no notification into the rest of the DRM subsystem
> > that the device has gone away.  Worse, the memory that is still in
> > use by DRM has now been freed, so further use of the DRM device
> > results in a use-after-free bug.
> > 
> > This is really not good, and to me looks like a fundamental problem
> > with the DRM bridge code.  I see nothing in the DRM bridge code that
> > deals with the lifetime of a "DRM bridge" or indeed the lifetime of
> > the actual device itself.
> > 
> > So, from what I can see, there seems to be a fundamental lifetime
> > issue with the design of the DRM bridge code.  This needs to be
> > fixed.
> 
> Oh crap. A gigantic can of worms...

Yes, it's especially annoying for me, having put the effort in to
the component helper to cover all these cases.

> Would a patch (completely untested btw) along this line of thinking make
> any difference whatsoever?

It looks interesting - from what I can see of the device links code,
it would have the effect of unbinding the DRM device just before
TDA998x is unbound, so that's an improvement.

However, from what I can see, the link vanishes at that point (as
DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results
in nothing further happening - the link will be recreated, but there
appears to be nothing that triggers the "consumer" to rebind at that
point.  Maybe I've missed something?
Peter Rosin April 24, 2018, 1:26 p.m. UTC | #4
On 2018-04-24 12:14, Peter Rosin wrote:
> On 2018-04-24 10:08, Russell King - ARM Linux wrote:
>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote:
>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote:
>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
>>>>>  static int tda998x_remove(struct i2c_client *client)
>>>>>  {
>>>>> -	component_del(&client->dev, &tda998x_ops);
>>>>> +	struct device *dev = &client->dev;
>>>>> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
>>>>> +
>>>>> +	drm_bridge_remove(&bridge->bridge);
>>>>> +	component_del(dev, &tda998x_ops);
>>>>> +
>>>>
>>>> I'd like to ask a rather fundamental question about DRM bridge support,
>>>> because I suspect that there's a major fsckup here.
>>>>
>>>> The above is the function that deals with the TDA998x device being
>>>> unbound from the driver.  With the component API, this results in the
>>>> DRM device correctly being torn down, because one of the hardware
>>>> devices has gone.
>>>>
>>>> With DRM bridge, the bridge is merely removed from the list of
>>>> bridges:
>>>>
>>>> void drm_bridge_remove(struct drm_bridge *bridge)
>>>> {
>>>>         mutex_lock(&bridge_lock);
>>>>         list_del_init(&bridge->list);
>>>>         mutex_unlock(&bridge_lock);
>>>> }
>>>> EXPORT_SYMBOL(drm_bridge_remove);
>>>>
>>>> and the memory backing the "struct tda998x_bridge" (which contains
>>>> the struct drm_bridge) will be freed by the devm subsystem.
>>>>
>>>> However, there is no notification into the rest of the DRM subsystem
>>>> that the device has gone away.  Worse, the memory that is still in
>>>> use by DRM has now been freed, so further use of the DRM device
>>>> results in a use-after-free bug.
>>>>
>>>> This is really not good, and to me looks like a fundamental problem
>>>> with the DRM bridge code.  I see nothing in the DRM bridge code that
>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of
>>>> the actual device itself.
>>>>
>>>> So, from what I can see, there seems to be a fundamental lifetime
>>>> issue with the design of the DRM bridge code.  This needs to be
>>>> fixed.
>>>
>>> Oh crap. A gigantic can of worms...
>>
>> Yes, it's especially annoying for me, having put the effort in to
>> the component helper to cover all these cases.
>>
>>> Would a patch (completely untested btw) along this line of thinking make
>>> any difference whatsoever?
>>
>> It looks interesting - from what I can see of the device links code,
>> it would have the effect of unbinding the DRM device just before
>> TDA998x is unbound, so that's an improvement.
>>
>> However, from what I can see, the link vanishes at that point (as
>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results
>> in nothing further happening - the link will be recreated, but there
>> appears to be nothing that triggers the "consumer" to rebind at that
>> point.  Maybe I've missed something?
> 
> Right, auto-remove is a no-go. So, improving on the previous...

Heh, I didn't address the rebind triggering part at all, and while I'm by
no means responsible or have any deep knowledge, I thought this was true:

- driver .remove for the device owning the drm_bridge is what typically
  calls drm_bridge_remove()
- driver .remove is called as part of the device being unbound from the
  driver by the bus (i2c in this case)
- by registering a link to the consumer, this unbinding will trigger the
  removal of this main drm consumer device as part of the unbinding
- so everything aboput the drm device will be torn down, and everything
  will thus have to be reprobed to get things back

But I could easily have misunderstood just about everything in the above...

And maybe it's really inconvenient to have to trigger a reprobe of the
whole drm cluster? Maybe all drm driver parts should be components?

I have no idea.

Cheers,
Peter

PS. compile-tested the below and drm_bridge.c needs to
#include <drm/drm_device.h>

> (I think drm_panel might suffer from this issue too?)
> 
> Cheers,
> Peter
> 
> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> index 1638bfe9627c..b1365cfee445 100644
> --- a/drivers/gpu/drm/drm_bridge.c
> +++ b/drivers/gpu/drm/drm_bridge.c
> @@ -121,12 +121,17 @@ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge,
>  	if (bridge->dev)
>  		return -EBUSY;
>  
> +	bridge->link = device_link_add(encoder->dev->dev, bridge->owner, 0);
> +	if (!bridge->link)
> +		return -EINVAL;
> +
>  	bridge->dev = encoder->dev;
>  	bridge->encoder = encoder;
>  
>  	if (bridge->funcs->attach) {
>  		ret = bridge->funcs->attach(bridge);
>  		if (ret < 0) {
> +			device_link_del(bridge->link);
>  			bridge->dev = NULL;
>  			bridge->encoder = NULL;
>  			return ret;
> @@ -153,6 +158,8 @@ void drm_bridge_detach(struct drm_bridge *bridge)
>  	if (bridge->funcs->detach)
>  		bridge->funcs->detach(bridge);
>  
> +	device_link_del(bridge->link);
> +
>  	bridge->dev = NULL;
>  }
>  
> diff --git a/drivers/gpu/drm/i2c/tda998x_drv.c b/drivers/gpu/drm/i2c/tda998x_drv.c
> index b8cb6237a38b..29eba4e9a39d 100644
> --- a/drivers/gpu/drm/i2c/tda998x_drv.c
> +++ b/drivers/gpu/drm/i2c/tda998x_drv.c
> @@ -1857,6 +1857,7 @@ tda998x_probe(struct i2c_client *client, const struct i2c_device_id *id)
>  	bridge->dev = dev;
>  	dev_set_drvdata(dev, bridge);
>  
> +	bridge->bridge.owner = dev;
>  	bridge->bridge.funcs = &tda998x_bridge_funcs;
>  #ifdef CONFIG_OF
>  	bridge->bridge.of_node = dev->of_node;
> diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
> index 682d01ba920c..b8f33aba3216 100644
> --- a/include/drm/drm_bridge.h
> +++ b/include/drm/drm_bridge.h
> @@ -224,6 +224,8 @@ struct drm_bridge_funcs {
>  
>  /**
>   * struct drm_bridge - central DRM bridge control structure
> + * @owner: device that owns the bridge
> + * @link: drm consumer <-> bridge supplier
>   * @dev: DRM device this bridge belongs to
>   * @encoder: encoder to which this bridge is connected
>   * @next: the next bridge in the encoder chain
> @@ -233,6 +235,8 @@ struct drm_bridge_funcs {
>   * @driver_private: pointer to the bridge driver's internal context
>   */
>  struct drm_bridge {
> +	struct device *owner;
> +	struct device_link *link;
>  	struct drm_device *dev;
>  	struct drm_encoder *encoder;
>  	struct drm_bridge *next;
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jyri Sarha April 24, 2018, 4:04 p.m. UTC | #5
On 24/04/18 13:14, Peter Rosin wrote:
> On 2018-04-24 10:08, Russell King - ARM Linux wrote:
>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote:
>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote:
>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
>>>>>  static int tda998x_remove(struct i2c_client *client)
>>>>>  {
>>>>> -	component_del(&client->dev, &tda998x_ops);
>>>>> +	struct device *dev = &client->dev;
>>>>> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
>>>>> +
>>>>> +	drm_bridge_remove(&bridge->bridge);
>>>>> +	component_del(dev, &tda998x_ops);
>>>>> +
>>>>
>>>> I'd like to ask a rather fundamental question about DRM bridge support,
>>>> because I suspect that there's a major fsckup here.
>>>>
>>>> The above is the function that deals with the TDA998x device being
>>>> unbound from the driver.  With the component API, this results in the
>>>> DRM device correctly being torn down, because one of the hardware
>>>> devices has gone.
>>>>
>>>> With DRM bridge, the bridge is merely removed from the list of
>>>> bridges:
>>>>
>>>> void drm_bridge_remove(struct drm_bridge *bridge)
>>>> {
>>>>         mutex_lock(&bridge_lock);
>>>>         list_del_init(&bridge->list);
>>>>         mutex_unlock(&bridge_lock);
>>>> }
>>>> EXPORT_SYMBOL(drm_bridge_remove);
>>>>
>>>> and the memory backing the "struct tda998x_bridge" (which contains
>>>> the struct drm_bridge) will be freed by the devm subsystem.
>>>>
>>>> However, there is no notification into the rest of the DRM subsystem
>>>> that the device has gone away.  Worse, the memory that is still in
>>>> use by DRM has now been freed, so further use of the DRM device
>>>> results in a use-after-free bug.
>>>>
>>>> This is really not good, and to me looks like a fundamental problem
>>>> with the DRM bridge code.  I see nothing in the DRM bridge code that
>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of
>>>> the actual device itself.
>>>>
>>>> So, from what I can see, there seems to be a fundamental lifetime
>>>> issue with the design of the DRM bridge code.  This needs to be
>>>> fixed.
>>>
>>> Oh crap. A gigantic can of worms...
>>
>> Yes, it's especially annoying for me, having put the effort in to
>> the component helper to cover all these cases.
>>
>>> Would a patch (completely untested btw) along this line of thinking make
>>> any difference whatsoever?
>>
>> It looks interesting - from what I can see of the device links code,
>> it would have the effect of unbinding the DRM device just before
>> TDA998x is unbound, so that's an improvement.
>>
>> However, from what I can see, the link vanishes at that point (as
>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results
>> in nothing further happening - the link will be recreated, but there
>> appears to be nothing that triggers the "consumer" to rebind at that
>> point.  Maybe I've missed something?
> 
> Right, auto-remove is a no-go. So, improving on the previous...
> 
> (I think drm_panel might suffer from this issue too?)

Yes it does and I took a shot at trying to fix it at the end of the
previous merge window, but gave up as I run out of time. I re-spun the
work now after reading this thread. I add you and Russell to cc.

As far as I understand the re-binding problem can be solved with this patch:

https://lists.freedesktop.org/archives/dri-devel/2018-February/166907.html

The device_links are such a new concept that there is at least hope this
change won't have too unwanted side effects.

> > Cheers,
> Peter
> 
> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> index 1638bfe9627c..b1365cfee445 100644
> --- a/drivers/gpu/drm/drm_bridge.c
> +++ b/drivers/gpu/drm/drm_bridge.c
> @@ -121,12 +121,17 @@ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge,
>  	if (bridge->dev)
>  		return -EBUSY;
>  
> +	bridge->link = device_link_add(encoder->dev->dev, bridge->owner, 0);
> +	if (!bridge->link)
> +		return -EINVAL;
> +

At least this piece code should prepare for the bridge owner and the
master drm device being the same, I which case the link should not be
needed. This happens at least when a panel is attached using
drm_panel_bridge_add().

Best reagards,
Jyri

>  	bridge->dev = encoder->dev;
>  	bridge->encoder = encoder;
>  
>  	if (bridge->funcs->attach) {
>  		ret = bridge->funcs->attach(bridge);
>  		if (ret < 0) {
> +			device_link_del(bridge->link);
>  			bridge->dev = NULL;
>  			bridge->encoder = NULL;
>  			return ret;
> @@ -153,6 +158,8 @@ void drm_bridge_detach(struct drm_bridge *bridge)
>  	if (bridge->funcs->detach)
>  		bridge->funcs->detach(bridge);
>  
> +	device_link_del(bridge->link);
> +
>  	bridge->dev = NULL;
>  }
>  
> diff --git a/drivers/gpu/drm/i2c/tda998x_drv.c b/drivers/gpu/drm/i2c/tda998x_drv.c
> index b8cb6237a38b..29eba4e9a39d 100644
> --- a/drivers/gpu/drm/i2c/tda998x_drv.c
> +++ b/drivers/gpu/drm/i2c/tda998x_drv.c
> @@ -1857,6 +1857,7 @@ tda998x_probe(struct i2c_client *client, const struct i2c_device_id *id)
>  	bridge->dev = dev;
>  	dev_set_drvdata(dev, bridge);
>  
> +	bridge->bridge.owner = dev;
>  	bridge->bridge.funcs = &tda998x_bridge_funcs;
>  #ifdef CONFIG_OF
>  	bridge->bridge.of_node = dev->of_node;
> diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
> index 682d01ba920c..b8f33aba3216 100644
> --- a/include/drm/drm_bridge.h
> +++ b/include/drm/drm_bridge.h
> @@ -224,6 +224,8 @@ struct drm_bridge_funcs {
>  
>  /**
>   * struct drm_bridge - central DRM bridge control structure
> + * @owner: device that owns the bridge
> + * @link: drm consumer <-> bridge supplier
>   * @dev: DRM device this bridge belongs to
>   * @encoder: encoder to which this bridge is connected
>   * @next: the next bridge in the encoder chain
> @@ -233,6 +235,8 @@ struct drm_bridge_funcs {
>   * @driver_private: pointer to the bridge driver's internal context
>   */
>  struct drm_bridge {
> +	struct device *owner;
> +	struct device_link *link;
>  	struct drm_device *dev;
>  	struct drm_encoder *encoder;
>  	struct drm_bridge *next;
> 
>
Russell King (Oracle) April 24, 2018, 5:06 p.m. UTC | #6
On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote:
> On 24/04/18 13:14, Peter Rosin wrote:
> > On 2018-04-24 10:08, Russell King - ARM Linux wrote:
> >> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote:
> >>> On 2018-04-23 18:08, Russell King - ARM Linux wrote:
> >>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
> >>>>>  static int tda998x_remove(struct i2c_client *client)
> >>>>>  {
> >>>>> -	component_del(&client->dev, &tda998x_ops);
> >>>>> +	struct device *dev = &client->dev;
> >>>>> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
> >>>>> +
> >>>>> +	drm_bridge_remove(&bridge->bridge);
> >>>>> +	component_del(dev, &tda998x_ops);
> >>>>> +
> >>>>
> >>>> I'd like to ask a rather fundamental question about DRM bridge support,
> >>>> because I suspect that there's a major fsckup here.
> >>>>
> >>>> The above is the function that deals with the TDA998x device being
> >>>> unbound from the driver.  With the component API, this results in the
> >>>> DRM device correctly being torn down, because one of the hardware
> >>>> devices has gone.
> >>>>
> >>>> With DRM bridge, the bridge is merely removed from the list of
> >>>> bridges:
> >>>>
> >>>> void drm_bridge_remove(struct drm_bridge *bridge)
> >>>> {
> >>>>         mutex_lock(&bridge_lock);
> >>>>         list_del_init(&bridge->list);
> >>>>         mutex_unlock(&bridge_lock);
> >>>> }
> >>>> EXPORT_SYMBOL(drm_bridge_remove);
> >>>>
> >>>> and the memory backing the "struct tda998x_bridge" (which contains
> >>>> the struct drm_bridge) will be freed by the devm subsystem.
> >>>>
> >>>> However, there is no notification into the rest of the DRM subsystem
> >>>> that the device has gone away.  Worse, the memory that is still in
> >>>> use by DRM has now been freed, so further use of the DRM device
> >>>> results in a use-after-free bug.
> >>>>
> >>>> This is really not good, and to me looks like a fundamental problem
> >>>> with the DRM bridge code.  I see nothing in the DRM bridge code that
> >>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of
> >>>> the actual device itself.
> >>>>
> >>>> So, from what I can see, there seems to be a fundamental lifetime
> >>>> issue with the design of the DRM bridge code.  This needs to be
> >>>> fixed.
> >>>
> >>> Oh crap. A gigantic can of worms...
> >>
> >> Yes, it's especially annoying for me, having put the effort in to
> >> the component helper to cover all these cases.
> >>
> >>> Would a patch (completely untested btw) along this line of thinking make
> >>> any difference whatsoever?
> >>
> >> It looks interesting - from what I can see of the device links code,
> >> it would have the effect of unbinding the DRM device just before
> >> TDA998x is unbound, so that's an improvement.
> >>
> >> However, from what I can see, the link vanishes at that point (as
> >> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results
> >> in nothing further happening - the link will be recreated, but there
> >> appears to be nothing that triggers the "consumer" to rebind at that
> >> point.  Maybe I've missed something?
> > 
> > Right, auto-remove is a no-go. So, improving on the previous...
> > 
> > (I think drm_panel might suffer from this issue too?)
> 
> Yes it does and I took a shot at trying to fix it at the end of the
> previous merge window, but gave up as I run out of time. I re-spun the
> work now after reading this thread. I add you and Russell to cc.

Right, and these exact problems are what the component helper is
there to sort out, in a subsystem independent way.

What is the problem with the component helper that people seem to
be soo loathed to use it, instead preferring to come up with sub-
standard and broken alternatives?
Jyri Sarha April 24, 2018, 6:25 p.m. UTC | #7
On 24/04/18 20:06, Russell King - ARM Linux wrote:
> On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote:
>> On 24/04/18 13:14, Peter Rosin wrote:
>>> On 2018-04-24 10:08, Russell King - ARM Linux wrote:
>>>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote:
>>>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote:
>>>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
>>>>>>>  static int tda998x_remove(struct i2c_client *client)
>>>>>>>  {
>>>>>>> -	component_del(&client->dev, &tda998x_ops);
>>>>>>> +	struct device *dev = &client->dev;
>>>>>>> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
>>>>>>> +
>>>>>>> +	drm_bridge_remove(&bridge->bridge);
>>>>>>> +	component_del(dev, &tda998x_ops);
>>>>>>> +
>>>>>>
>>>>>> I'd like to ask a rather fundamental question about DRM bridge support,
>>>>>> because I suspect that there's a major fsckup here.
>>>>>>
>>>>>> The above is the function that deals with the TDA998x device being
>>>>>> unbound from the driver.  With the component API, this results in the
>>>>>> DRM device correctly being torn down, because one of the hardware
>>>>>> devices has gone.
>>>>>>
>>>>>> With DRM bridge, the bridge is merely removed from the list of
>>>>>> bridges:
>>>>>>
>>>>>> void drm_bridge_remove(struct drm_bridge *bridge)
>>>>>> {
>>>>>>         mutex_lock(&bridge_lock);
>>>>>>         list_del_init(&bridge->list);
>>>>>>         mutex_unlock(&bridge_lock);
>>>>>> }
>>>>>> EXPORT_SYMBOL(drm_bridge_remove);
>>>>>>
>>>>>> and the memory backing the "struct tda998x_bridge" (which contains
>>>>>> the struct drm_bridge) will be freed by the devm subsystem.
>>>>>>
>>>>>> However, there is no notification into the rest of the DRM subsystem
>>>>>> that the device has gone away.  Worse, the memory that is still in
>>>>>> use by DRM has now been freed, so further use of the DRM device
>>>>>> results in a use-after-free bug.
>>>>>>
>>>>>> This is really not good, and to me looks like a fundamental problem
>>>>>> with the DRM bridge code.  I see nothing in the DRM bridge code that
>>>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of
>>>>>> the actual device itself.
>>>>>>
>>>>>> So, from what I can see, there seems to be a fundamental lifetime
>>>>>> issue with the design of the DRM bridge code.  This needs to be
>>>>>> fixed.
>>>>>
>>>>> Oh crap. A gigantic can of worms...
>>>>
>>>> Yes, it's especially annoying for me, having put the effort in to
>>>> the component helper to cover all these cases.
>>>>
>>>>> Would a patch (completely untested btw) along this line of thinking make
>>>>> any difference whatsoever?
>>>>
>>>> It looks interesting - from what I can see of the device links code,
>>>> it would have the effect of unbinding the DRM device just before
>>>> TDA998x is unbound, so that's an improvement.
>>>>
>>>> However, from what I can see, the link vanishes at that point (as
>>>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results
>>>> in nothing further happening - the link will be recreated, but there
>>>> appears to be nothing that triggers the "consumer" to rebind at that
>>>> point.  Maybe I've missed something?
>>>
>>> Right, auto-remove is a no-go. So, improving on the previous...
>>>
>>> (I think drm_panel might suffer from this issue too?)
>>
>> Yes it does and I took a shot at trying to fix it at the end of the
>> previous merge window, but gave up as I run out of time. I re-spun the
>> work now after reading this thread. I add you and Russell to cc.
> 
> Right, and these exact problems are what the component helper is
> there to sort out, in a subsystem independent way.
> 
> What is the problem with the component helper that people seem to
> be soo loathed to use it, instead preferring to come up with sub-
> standard and broken alternatives?
> 

Nothing but time. Embedding component helpers seamlessly into drm
framework does not sound like a couple of days job. Right now I simply
do not have time to take on a challenge like that. If someone does it I
am all for it.

However, I would not call device links substandard. They are in the
device core after all.

Best regards,
Jyri
Russell King (Oracle) April 24, 2018, 11:25 p.m. UTC | #8
On Tue, Apr 24, 2018 at 09:25:44PM +0300, Jyri Sarha wrote:
> On 24/04/18 20:06, Russell King - ARM Linux wrote:
> > On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote:
> >> On 24/04/18 13:14, Peter Rosin wrote:
> >>> On 2018-04-24 10:08, Russell King - ARM Linux wrote:
> >>>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote:
> >>>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote:
> >>>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
> >>>>>>>  static int tda998x_remove(struct i2c_client *client)
> >>>>>>>  {
> >>>>>>> -	component_del(&client->dev, &tda998x_ops);
> >>>>>>> +	struct device *dev = &client->dev;
> >>>>>>> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
> >>>>>>> +
> >>>>>>> +	drm_bridge_remove(&bridge->bridge);
> >>>>>>> +	component_del(dev, &tda998x_ops);
> >>>>>>> +
> >>>>>>
> >>>>>> I'd like to ask a rather fundamental question about DRM bridge support,
> >>>>>> because I suspect that there's a major fsckup here.
> >>>>>>
> >>>>>> The above is the function that deals with the TDA998x device being
> >>>>>> unbound from the driver.  With the component API, this results in the
> >>>>>> DRM device correctly being torn down, because one of the hardware
> >>>>>> devices has gone.
> >>>>>>
> >>>>>> With DRM bridge, the bridge is merely removed from the list of
> >>>>>> bridges:
> >>>>>>
> >>>>>> void drm_bridge_remove(struct drm_bridge *bridge)
> >>>>>> {
> >>>>>>         mutex_lock(&bridge_lock);
> >>>>>>         list_del_init(&bridge->list);
> >>>>>>         mutex_unlock(&bridge_lock);
> >>>>>> }
> >>>>>> EXPORT_SYMBOL(drm_bridge_remove);
> >>>>>>
> >>>>>> and the memory backing the "struct tda998x_bridge" (which contains
> >>>>>> the struct drm_bridge) will be freed by the devm subsystem.
> >>>>>>
> >>>>>> However, there is no notification into the rest of the DRM subsystem
> >>>>>> that the device has gone away.  Worse, the memory that is still in
> >>>>>> use by DRM has now been freed, so further use of the DRM device
> >>>>>> results in a use-after-free bug.
> >>>>>>
> >>>>>> This is really not good, and to me looks like a fundamental problem
> >>>>>> with the DRM bridge code.  I see nothing in the DRM bridge code that
> >>>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of
> >>>>>> the actual device itself.
> >>>>>>
> >>>>>> So, from what I can see, there seems to be a fundamental lifetime
> >>>>>> issue with the design of the DRM bridge code.  This needs to be
> >>>>>> fixed.
> >>>>>
> >>>>> Oh crap. A gigantic can of worms...
> >>>>
> >>>> Yes, it's especially annoying for me, having put the effort in to
> >>>> the component helper to cover all these cases.
> >>>>
> >>>>> Would a patch (completely untested btw) along this line of thinking make
> >>>>> any difference whatsoever?
> >>>>
> >>>> It looks interesting - from what I can see of the device links code,
> >>>> it would have the effect of unbinding the DRM device just before
> >>>> TDA998x is unbound, so that's an improvement.
> >>>>
> >>>> However, from what I can see, the link vanishes at that point (as
> >>>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results
> >>>> in nothing further happening - the link will be recreated, but there
> >>>> appears to be nothing that triggers the "consumer" to rebind at that
> >>>> point.  Maybe I've missed something?
> >>>
> >>> Right, auto-remove is a no-go. So, improving on the previous...
> >>>
> >>> (I think drm_panel might suffer from this issue too?)
> >>
> >> Yes it does and I took a shot at trying to fix it at the end of the
> >> previous merge window, but gave up as I run out of time. I re-spun the
> >> work now after reading this thread. I add you and Russell to cc.
> > 
> > Right, and these exact problems are what the component helper is
> > there to sort out, in a subsystem independent way.
> > 
> > What is the problem with the component helper that people seem to
> > be soo loathed to use it, instead preferring to come up with sub-
> > standard and broken alternatives?
> > 
> 
> Nothing but time. Embedding component helpers seamlessly into drm
> framework does not sound like a couple of days job. Right now I simply
> do not have time to take on a challenge like that. If someone does it I
> am all for it.
> 
> However, I would not call device links substandard. They are in the
> device core after all.

Umm, no, I was not talking about the device links, but the tendency to
have subsystem or component specific solutions to this problem.

We're now heading down the path of trying to retrofit the functionality
that one expects and is provided by the component helpers into DRM
bridge and DRM panel by using device links, which appears to only
partially resolve the problem.

On the point of device links, I've been wondering whether the component
helpers could take advantage of the device links, but at the moment
that would cause a regression if there's no facility to re-probe the
"consumer" when a "supplier" returns.

As you bring that up, I would say that they _are_ a substandard way to
solve this problem _if_, as I suspect, they would cause a regression
to the component helper if the component helper were to use them as a
means to control the probing of the "master" device.  This is not a
matter of which part of the kernel they are "in", but the functionality
that they can offer vs what would be expected.
Jyri Sarha April 25, 2018, 8:01 p.m. UTC | #9
On 25/04/18 02:25, Russell King - ARM Linux wrote:
> On Tue, Apr 24, 2018 at 09:25:44PM +0300, Jyri Sarha wrote:
>> On 24/04/18 20:06, Russell King - ARM Linux wrote:
>>> On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote:
>>>> On 24/04/18 13:14, Peter Rosin wrote:
>>>>> On 2018-04-24 10:08, Russell King - ARM Linux wrote:
>>>>>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote:
>>>>>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote:
>>>>>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
>>>>>>>>>  static int tda998x_remove(struct i2c_client *client)
>>>>>>>>>  {
>>>>>>>>> -	component_del(&client->dev, &tda998x_ops);
>>>>>>>>> +	struct device *dev = &client->dev;
>>>>>>>>> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
>>>>>>>>> +
>>>>>>>>> +	drm_bridge_remove(&bridge->bridge);
>>>>>>>>> +	component_del(dev, &tda998x_ops);
>>>>>>>>> +
>>>>>>>>
>>>>>>>> I'd like to ask a rather fundamental question about DRM bridge support,
>>>>>>>> because I suspect that there's a major fsckup here.
>>>>>>>>
>>>>>>>> The above is the function that deals with the TDA998x device being
>>>>>>>> unbound from the driver.  With the component API, this results in the
>>>>>>>> DRM device correctly being torn down, because one of the hardware
>>>>>>>> devices has gone.
>>>>>>>>
>>>>>>>> With DRM bridge, the bridge is merely removed from the list of
>>>>>>>> bridges:
>>>>>>>>
>>>>>>>> void drm_bridge_remove(struct drm_bridge *bridge)
>>>>>>>> {
>>>>>>>>         mutex_lock(&bridge_lock);
>>>>>>>>         list_del_init(&bridge->list);
>>>>>>>>         mutex_unlock(&bridge_lock);
>>>>>>>> }
>>>>>>>> EXPORT_SYMBOL(drm_bridge_remove);
>>>>>>>>
>>>>>>>> and the memory backing the "struct tda998x_bridge" (which contains
>>>>>>>> the struct drm_bridge) will be freed by the devm subsystem.
>>>>>>>>
>>>>>>>> However, there is no notification into the rest of the DRM subsystem
>>>>>>>> that the device has gone away.  Worse, the memory that is still in
>>>>>>>> use by DRM has now been freed, so further use of the DRM device
>>>>>>>> results in a use-after-free bug.
>>>>>>>>
>>>>>>>> This is really not good, and to me looks like a fundamental problem
>>>>>>>> with the DRM bridge code.  I see nothing in the DRM bridge code that
>>>>>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of
>>>>>>>> the actual device itself.
>>>>>>>>
>>>>>>>> So, from what I can see, there seems to be a fundamental lifetime
>>>>>>>> issue with the design of the DRM bridge code.  This needs to be
>>>>>>>> fixed.
>>>>>>>
>>>>>>> Oh crap. A gigantic can of worms...
>>>>>>
>>>>>> Yes, it's especially annoying for me, having put the effort in to
>>>>>> the component helper to cover all these cases.
>>>>>>
>>>>>>> Would a patch (completely untested btw) along this line of thinking make
>>>>>>> any difference whatsoever?
>>>>>>
>>>>>> It looks interesting - from what I can see of the device links code,
>>>>>> it would have the effect of unbinding the DRM device just before
>>>>>> TDA998x is unbound, so that's an improvement.
>>>>>>
>>>>>> However, from what I can see, the link vanishes at that point (as
>>>>>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results
>>>>>> in nothing further happening - the link will be recreated, but there
>>>>>> appears to be nothing that triggers the "consumer" to rebind at that
>>>>>> point.  Maybe I've missed something?
>>>>>
>>>>> Right, auto-remove is a no-go. So, improving on the previous...
>>>>>
>>>>> (I think drm_panel might suffer from this issue too?)
>>>>
>>>> Yes it does and I took a shot at trying to fix it at the end of the
>>>> previous merge window, but gave up as I run out of time. I re-spun the
>>>> work now after reading this thread. I add you and Russell to cc.
>>>
>>> Right, and these exact problems are what the component helper is
>>> there to sort out, in a subsystem independent way.
>>>
>>> What is the problem with the component helper that people seem to
>>> be soo loathed to use it, instead preferring to come up with sub-
>>> standard and broken alternatives?
>>>
>>
>> Nothing but time. Embedding component helpers seamlessly into drm
>> framework does not sound like a couple of days job. Right now I simply
>> do not have time to take on a challenge like that. If someone does it I
>> am all for it.
>>
>> However, I would not call device links substandard. They are in the
>> device core after all.
> 
> Umm, no, I was not talking about the device links, but the tendency to
> have subsystem or component specific solutions to this problem.
> 

Oh yes. But in this case the substandard solution is already there and
it is already widely used, despite it being severely broken. I am merely
trying to fix the existing substandard solution.

I admit that a full integration with component helpers would probably be
more elegant solution to the problem, but the amount of work is just too
much. The change would impact the way all the master drm drivers pull
them selves together. The drivers that already use the component helpers
for some internal stuff will add their own challenge. Separate component
matching implementations are needed for device-tree and ACPI (are ther
more flavors?) etc. I just do not see this happening any time soon (am
happy to be wrong about this).

> We're now heading down the path of trying to retrofit the functionality
> that one expects and is provided by the component helpers into DRM
> bridge and DRM panel by using device links, which appears to only
> partially resolve the problem.
> 
> On the point of device links, I've been wondering whether the component
> helpers could take advantage of the device links, but at the moment
> that would cause a regression if there's no facility to re-probe the
> "consumer" when a "supplier" returns.
> 

I sent a patch[1] in February that solves the re-probing problem at
least in my environment.

> As you bring that up, I would say that they _are_ a substandard way to
> solve this problem _if_, as I suspect, they would cause a regression
> to the component helper if the component helper were to use them as a
> means to control the probing of the "master" device.  This is not a
> matter of which part of the kernel they are "in", but the functionality
> that they can offer vs what would be expected.
> 

The component helpers certainly have many features in that are not
(yet?) in device core's device links, but the amount of work needed to
fix the most pressing problem of attached drm panels or bridges is
couple of magnitudes less with device links when compared to full blown
component helper usage.

Best regards,
Jyri

[1]
https://lists.freedesktop.org/archives/dri-devel/2018-February/166907.html
Russell King (Oracle) July 6, 2018, 10:03 a.m. UTC | #10
On Wed, Apr 25, 2018 at 11:01:15PM +0300, Jyri Sarha wrote:
> On 25/04/18 02:25, Russell King - ARM Linux wrote:
> > On Tue, Apr 24, 2018 at 09:25:44PM +0300, Jyri Sarha wrote:
> >> On 24/04/18 20:06, Russell King - ARM Linux wrote:
> >>> On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote:
> >>>> On 24/04/18 13:14, Peter Rosin wrote:
> >>>>> On 2018-04-24 10:08, Russell King - ARM Linux wrote:
> >>>>>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote:
> >>>>>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote:
> >>>>>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote:
> >>>>>>>>>  static int tda998x_remove(struct i2c_client *client)
> >>>>>>>>>  {
> >>>>>>>>> -	component_del(&client->dev, &tda998x_ops);
> >>>>>>>>> +	struct device *dev = &client->dev;
> >>>>>>>>> +	struct tda998x_bridge *bridge = dev_get_drvdata(dev);
> >>>>>>>>> +
> >>>>>>>>> +	drm_bridge_remove(&bridge->bridge);
> >>>>>>>>> +	component_del(dev, &tda998x_ops);
> >>>>>>>>> +
> >>>>>>>>
> >>>>>>>> I'd like to ask a rather fundamental question about DRM bridge support,
> >>>>>>>> because I suspect that there's a major fsckup here.
> >>>>>>>>
> >>>>>>>> The above is the function that deals with the TDA998x device being
> >>>>>>>> unbound from the driver.  With the component API, this results in the
> >>>>>>>> DRM device correctly being torn down, because one of the hardware
> >>>>>>>> devices has gone.
> >>>>>>>>
> >>>>>>>> With DRM bridge, the bridge is merely removed from the list of
> >>>>>>>> bridges:
> >>>>>>>>
> >>>>>>>> void drm_bridge_remove(struct drm_bridge *bridge)
> >>>>>>>> {
> >>>>>>>>         mutex_lock(&bridge_lock);
> >>>>>>>>         list_del_init(&bridge->list);
> >>>>>>>>         mutex_unlock(&bridge_lock);
> >>>>>>>> }
> >>>>>>>> EXPORT_SYMBOL(drm_bridge_remove);
> >>>>>>>>
> >>>>>>>> and the memory backing the "struct tda998x_bridge" (which contains
> >>>>>>>> the struct drm_bridge) will be freed by the devm subsystem.
> >>>>>>>>
> >>>>>>>> However, there is no notification into the rest of the DRM subsystem
> >>>>>>>> that the device has gone away.  Worse, the memory that is still in
> >>>>>>>> use by DRM has now been freed, so further use of the DRM device
> >>>>>>>> results in a use-after-free bug.
> >>>>>>>>
> >>>>>>>> This is really not good, and to me looks like a fundamental problem
> >>>>>>>> with the DRM bridge code.  I see nothing in the DRM bridge code that
> >>>>>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of
> >>>>>>>> the actual device itself.
> >>>>>>>>
> >>>>>>>> So, from what I can see, there seems to be a fundamental lifetime
> >>>>>>>> issue with the design of the DRM bridge code.  This needs to be
> >>>>>>>> fixed.
> >>>>>>>
> >>>>>>> Oh crap. A gigantic can of worms...
> >>>>>>
> >>>>>> Yes, it's especially annoying for me, having put the effort in to
> >>>>>> the component helper to cover all these cases.
> >>>>>>
> >>>>>>> Would a patch (completely untested btw) along this line of thinking make
> >>>>>>> any difference whatsoever?
> >>>>>>
> >>>>>> It looks interesting - from what I can see of the device links code,
> >>>>>> it would have the effect of unbinding the DRM device just before
> >>>>>> TDA998x is unbound, so that's an improvement.
> >>>>>>
> >>>>>> However, from what I can see, the link vanishes at that point (as
> >>>>>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results
> >>>>>> in nothing further happening - the link will be recreated, but there
> >>>>>> appears to be nothing that triggers the "consumer" to rebind at that
> >>>>>> point.  Maybe I've missed something?
> >>>>>
> >>>>> Right, auto-remove is a no-go. So, improving on the previous...
> >>>>>
> >>>>> (I think drm_panel might suffer from this issue too?)
> >>>>
> >>>> Yes it does and I took a shot at trying to fix it at the end of the
> >>>> previous merge window, but gave up as I run out of time. I re-spun the
> >>>> work now after reading this thread. I add you and Russell to cc.
> >>>
> >>> Right, and these exact problems are what the component helper is
> >>> there to sort out, in a subsystem independent way.
> >>>
> >>> What is the problem with the component helper that people seem to
> >>> be soo loathed to use it, instead preferring to come up with sub-
> >>> standard and broken alternatives?
> >>>
> >>
> >> Nothing but time. Embedding component helpers seamlessly into drm
> >> framework does not sound like a couple of days job. Right now I simply
> >> do not have time to take on a challenge like that. If someone does it I
> >> am all for it.
> >>
> >> However, I would not call device links substandard. They are in the
> >> device core after all.
> > 
> > Umm, no, I was not talking about the device links, but the tendency to
> > have subsystem or component specific solutions to this problem.
> > 
> 
> Oh yes. But in this case the substandard solution is already there and
> it is already widely used, despite it being severely broken. I am merely
> trying to fix the existing substandard solution.
> 
> I admit that a full integration with component helpers would probably be
> more elegant solution to the problem, but the amount of work is just too
> much. The change would impact the way all the master drm drivers pull
> them selves together. The drivers that already use the component helpers
> for some internal stuff will add their own challenge. Separate component
> matching implementations are needed for device-tree and ACPI (are ther
> more flavors?) etc. I just do not see this happening any time soon (am
> happy to be wrong about this).

The issue is actually worse than that:

- drivers that are already componentised can't use bridges
- drivers that use bridges can't use componentised stuff

because bridges don't register themselves with the component helper,
and the helpers in drm_of.c assume that all graph nodes will be
components.

The whole thing about whether stuff is componentised or bridge based
is really getting out of hand, and the push is towards bridge based
stuff even though that is technically inferior when it comes to being
able to develop and test (which involves being able to remove and
re-insert modules.)

Consequently more and more code is being written for bridges, and
the component helper ignored, and the problems with bridges are
being ignored.  This is not healthy.

The problem is only going to get worse.  Someone needs to bite the
bullet and fix bridges before the problem gets any more out of hand.
Russell King (Oracle) July 17, 2018, 3:57 p.m. UTC | #11
Ping - any comments from anyone on this idea?

On Fri, Jul 06, 2018 at 01:43:05PM +0100, Russell King - ARM Linux wrote:
> On Fri, Jul 06, 2018 at 11:03:46AM +0100, Russell King - ARM Linux wrote:
> > On Wed, Apr 25, 2018 at 11:01:15PM +0300, Jyri Sarha wrote:
> > > Oh yes. But in this case the substandard solution is already there and
> > > it is already widely used, despite it being severely broken. I am merely
> > > trying to fix the existing substandard solution.
> > > 
> > > I admit that a full integration with component helpers would probably be
> > > more elegant solution to the problem, but the amount of work is just too
> > > much. The change would impact the way all the master drm drivers pull
> > > them selves together. The drivers that already use the component helpers
> > > for some internal stuff will add their own challenge. Separate component
> > > matching implementations are needed for device-tree and ACPI (are ther
> > > more flavors?) etc. I just do not see this happening any time soon (am
> > > happy to be wrong about this).
> > 
> > The issue is actually worse than that:
> > 
> > - drivers that are already componentised can't use bridges
> > - drivers that use bridges can't use componentised stuff
> > 
> > because bridges don't register themselves with the component helper,
> > and the helpers in drm_of.c assume that all graph nodes will be
> > components.
> > 
> > The whole thing about whether stuff is componentised or bridge based
> > is really getting out of hand, and the push is towards bridge based
> > stuff even though that is technically inferior when it comes to being
> > able to develop and test (which involves being able to remove and
> > re-insert modules.)
> > 
> > Consequently more and more code is being written for bridges, and
> > the component helper ignored, and the problems with bridges are
> > being ignored.  This is not healthy.
> > 
> > The problem is only going to get worse.  Someone needs to bite the
> > bullet and fix bridges before the problem gets any more out of hand.
> 
> This patch (which is actually two patches locally) allows the component
> helper to know what's going on inside the bridge code wrt bridge
> availability, and takes the appropriate action at the correct time.
> No need for device links or similar, or incompatibilities between
> bridges and components.  The only requirement is that bridges set the
> "device" member of struct drm_bridge to opt-in to this.
> 
> Tested with Armada converted to support bridges, TDA998x as a
> componentised bridge, and dumb-vga-dac as a non-componentised bridge:
> 
> root@cubox:~# less /sys/kernel/debug/device_component/display-subsystem
> master name                                            status
> -------------------------------------------------------------
> display-subsystem                                       bound
> 
> device name                                            status
> -------------------------------------------------------------
> port                                               registered
> port                                               registered
> hdmi-encoder                                       registered
> vga-bridge                                         registered
> root@cubox:~# dmesg |grep bound
> [    1.921798] armada-drm display-subsystem: bound f1820000.lcd-controller (ops
> armada_lcd_ops)
> [    1.931014] armada-drm display-subsystem: bound f1810000.lcd-controller (ops
> armada_lcd_ops)
> [    2.069231] armada-drm display-subsystem: bound 1-0070 (ops tda998x_ops)
> [    2.076059] armada-drm display-subsystem: bound vga-bridge (ops dummy_ops)
> 
> Without this, the same DT fails because "vga-bridge" is never added
> to the component helpers.
> 
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 8946dfee4768..b14b3a3655ea 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -602,4 +602,32 @@ void component_del(struct device *dev, const struct component_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(component_del);
>  
> +static int component_dummy_bind(struct device *comp, struct device *master,
> +				void *master_data)
> +{
> +	return 0;
> +}
> +
> +static void component_dummy_unbind(struct device *comp, struct device *master,
> +				   void *master_data)
> +{
> +}
> +
> +static const struct component_ops dummy_ops = {
> +	.bind = component_dummy_bind,
> +	.unbind = component_dummy_unbind,
> +};
> +
> +int component_mark_available(struct device *dev)
> +{
> +	return component_add(dev, &dummy_ops);
> +}
> +EXPORT_SYMBOL_GPL(component_mark_available);
> +
> +void component_mark_unavailable(struct device *dev)
> +{
> +	component_del(dev, &dummy_ops);
> +}
> +EXPORT_SYMBOL_GPL(component_mark_unavailable);
> +
>  MODULE_LICENSE("GPL v2");
> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> index 1638bfe9627c..ce3ccd327916 100644
> --- a/drivers/gpu/drm/drm_bridge.c
> +++ b/drivers/gpu/drm/drm_bridge.c
> @@ -21,6 +21,7 @@
>   * DEALINGS IN THE SOFTWARE.
>   */
>  
> +#include <linux/component.h>
>  #include <linux/err.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
> @@ -73,6 +74,9 @@ void drm_bridge_add(struct drm_bridge *bridge)
>  	mutex_lock(&bridge_lock);
>  	list_add_tail(&bridge->list, &bridge_list);
>  	mutex_unlock(&bridge_lock);
> +
> +	if (bridge->device)
> +		WARN_ON(component_mark_available(bridge->device));
>  }
>  EXPORT_SYMBOL(drm_bridge_add);
>  
> @@ -83,6 +87,9 @@ EXPORT_SYMBOL(drm_bridge_add);
>   */
>  void drm_bridge_remove(struct drm_bridge *bridge)
>  {
> +	if (bridge->device)
> +		component_mark_unavailable(bridge->device);
> +
>  	mutex_lock(&bridge_lock);
>  	list_del_init(&bridge->list);
>  	mutex_unlock(&bridge_lock);
> diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
> index 3270fec46979..e863da14d4d9 100644
> --- a/include/drm/drm_bridge.h
> +++ b/include/drm/drm_bridge.h
> @@ -268,6 +268,7 @@ struct drm_bridge {
>  	struct drm_device *dev;
>  	struct drm_encoder *encoder;
>  	struct drm_bridge *next;
> +	struct device *device;
>  #ifdef CONFIG_OF
>  	struct device_node *of_node;
>  #endif
> diff --git a/include/linux/component.h b/include/linux/component.h
> index e71fbbbc74e2..a1c824485f54 100644
> --- a/include/linux/component.h
> +++ b/include/linux/component.h
> @@ -16,6 +16,10 @@ struct component_ops {
>  int component_add(struct device *, const struct component_ops *);
>  void component_del(struct device *, const struct component_ops *);
>  
> +/* For subsystems where drivers do not call component_add()/component_del() */
> +int component_mark_available(struct device *dev);
> +void component_mark_unavailable(struct device *dev);
> +
>  int component_bind_all(struct device *master, void *master_data);
>  void component_unbind_all(struct device *master, void *master_data);
>  
> 
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
> According to speedtest.net: 13Mbps down 490kbps up
Peter Rosin Aug. 28, 2018, 5:49 p.m. UTC | #12
On 2018-07-06 14:43, Russell King - ARM Linux wrote:
> On Fri, Jul 06, 2018 at 11:03:46AM +0100, Russell King - ARM Linux wrote:
>> On Wed, Apr 25, 2018 at 11:01:15PM +0300, Jyri Sarha wrote:
>>> Oh yes. But in this case the substandard solution is already there and
>>> it is already widely used, despite it being severely broken. I am merely
>>> trying to fix the existing substandard solution.
>>>
>>> I admit that a full integration with component helpers would probably be
>>> more elegant solution to the problem, but the amount of work is just too
>>> much. The change would impact the way all the master drm drivers pull
>>> them selves together. The drivers that already use the component helpers
>>> for some internal stuff will add their own challenge. Separate component
>>> matching implementations are needed for device-tree and ACPI (are ther
>>> more flavors?) etc. I just do not see this happening any time soon (am
>>> happy to be wrong about this).
>>
>> The issue is actually worse than that:
>>
>> - drivers that are already componentised can't use bridges
>> - drivers that use bridges can't use componentised stuff
>>
>> because bridges don't register themselves with the component helper,
>> and the helpers in drm_of.c assume that all graph nodes will be
>> components.
>>
>> The whole thing about whether stuff is componentised or bridge based
>> is really getting out of hand, and the push is towards bridge based
>> stuff even though that is technically inferior when it comes to being
>> able to develop and test (which involves being able to remove and
>> re-insert modules.)
>>
>> Consequently more and more code is being written for bridges, and
>> the component helper ignored, and the problems with bridges are
>> being ignored.  This is not healthy.
>>
>> The problem is only going to get worse.  Someone needs to bite the
>> bullet and fix bridges before the problem gets any more out of hand.
> 
> This patch (which is actually two patches locally) allows the component
> helper to know what's going on inside the bridge code wrt bridge
> availability, and takes the appropriate action at the correct time.
> No need for device links or similar, or incompatibilities between
> bridges and components.  The only requirement is that bridges set the
> "device" member of struct drm_bridge to opt-in to this.
> 
> Tested with Armada converted to support bridges, TDA998x as a
> componentised bridge, and dumb-vga-dac as a non-componentised bridge:
> 
> root@cubox:~# less /sys/kernel/debug/device_component/display-subsystem
> master name                                            status
> -------------------------------------------------------------
> display-subsystem                                       bound
> 
> device name                                            status
> -------------------------------------------------------------
> port                                               registered
> port                                               registered
> hdmi-encoder                                       registered
> vga-bridge                                         registered
> root@cubox:~# dmesg |grep bound
> [    1.921798] armada-drm display-subsystem: bound f1820000.lcd-controller (ops
> armada_lcd_ops)
> [    1.931014] armada-drm display-subsystem: bound f1810000.lcd-controller (ops
> armada_lcd_ops)
> [    2.069231] armada-drm display-subsystem: bound 1-0070 (ops tda998x_ops)
> [    2.076059] armada-drm display-subsystem: bound vga-bridge (ops dummy_ops)
> 
> Without this, the same DT fails because "vga-bridge" is never added
> to the component helpers.

What did you need to do to convert Armada to support bridges? How much
work is it to convert drivers that support bridges so that they
support components? Maybe that's not needed? What happens with tda998x?
I mean, it already calls component_add, and with this there's an
indirect call in drm_bridge_add which it also calls. I guess I'm asking
if a component may call component_add several times without things
sliding sideways?

> 
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 8946dfee4768..b14b3a3655ea 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -602,4 +602,32 @@ void component_del(struct device *dev, const struct component_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(component_del);
>  
> +static int component_dummy_bind(struct device *comp, struct device *master,
> +				void *master_data)
> +{
> +	return 0;
> +}
> +
> +static void component_dummy_unbind(struct device *comp, struct device *master,
> +				   void *master_data)
> +{
> +}
> +
> +static const struct component_ops dummy_ops = {
> +	.bind = component_dummy_bind,
> +	.unbind = component_dummy_unbind,
> +};
> +
> +int component_mark_available(struct device *dev)
> +{
> +	return component_add(dev, &dummy_ops);
> +}
> +EXPORT_SYMBOL_GPL(component_mark_available);
> +
> +void component_mark_unavailable(struct device *dev)
> +{
> +	component_del(dev, &dummy_ops);
> +}
> +EXPORT_SYMBOL_GPL(component_mark_unavailable);
> +

Is this really needed in component.c? I'd say that these dummy
bridge_component_bind/unbind can be added directly in drm_bridge.c
and that the new call to component_mark_available in drm_bridge
could simply be component_add(bridge->device, &bridge_component_ops)
(etc)

>  MODULE_LICENSE("GPL v2");
> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> index 1638bfe9627c..ce3ccd327916 100644
> --- a/drivers/gpu/drm/drm_bridge.c
> +++ b/drivers/gpu/drm/drm_bridge.c
> @@ -21,6 +21,7 @@
>   * DEALINGS IN THE SOFTWARE.
>   */
>  
> +#include <linux/component.h>
>  #include <linux/err.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
> @@ -73,6 +74,9 @@ void drm_bridge_add(struct drm_bridge *bridge)
>  	mutex_lock(&bridge_lock);
>  	list_add_tail(&bridge->list, &bridge_list);
>  	mutex_unlock(&bridge_lock);
> +
> +	if (bridge->device)
> +		WARN_ON(component_mark_available(bridge->device));
>  }
>  EXPORT_SYMBOL(drm_bridge_add);
>  
> @@ -83,6 +87,9 @@ EXPORT_SYMBOL(drm_bridge_add);
>   */
>  void drm_bridge_remove(struct drm_bridge *bridge)
>  {
> +	if (bridge->device)
> +		component_mark_unavailable(bridge->device);
> +
>  	mutex_lock(&bridge_lock);
>  	list_del_init(&bridge->list);
>  	mutex_unlock(&bridge_lock);
> diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
> index 3270fec46979..e863da14d4d9 100644
> --- a/include/drm/drm_bridge.h
> +++ b/include/drm/drm_bridge.h
> @@ -268,6 +268,7 @@ struct drm_bridge {
>  	struct drm_device *dev;
>  	struct drm_encoder *encoder;
>  	struct drm_bridge *next;
> +	struct device *device;

In patch [1] i add struct device *odev (for owner device) and the series
then proceeds to convert all bridges to add a link to its owner device
and to then remove the (below) of_node member.

[1] https://lkml.org/lkml/2018/5/16/382

Would it be bad if all bridges opted in to this? In other words, could
my "odev" and your "device" be shared?

Cheers,
Peter

>  #ifdef CONFIG_OF
>  	struct device_node *of_node;
>  #endif
> diff --git a/include/linux/component.h b/include/linux/component.h
> index e71fbbbc74e2..a1c824485f54 100644
> --- a/include/linux/component.h
> +++ b/include/linux/component.h
> @@ -16,6 +16,10 @@ struct component_ops {
>  int component_add(struct device *, const struct component_ops *);
>  void component_del(struct device *, const struct component_ops *);
>  
> +/* For subsystems where drivers do not call component_add()/component_del() */
> +int component_mark_available(struct device *dev);
> +void component_mark_unavailable(struct device *dev);
> +
>  int component_bind_all(struct device *master, void *master_data);
>  void component_unbind_all(struct device *master, void *master_data);
>  
> 
>
Russell King (Oracle) Aug. 28, 2018, 6:14 p.m. UTC | #13
On Tue, Aug 28, 2018 at 07:49:28PM +0200, Peter Rosin wrote:
> On 2018-07-06 14:43, Russell King - ARM Linux wrote:
> > On Fri, Jul 06, 2018 at 11:03:46AM +0100, Russell King - ARM Linux wrote:
> >> On Wed, Apr 25, 2018 at 11:01:15PM +0300, Jyri Sarha wrote:
> >>> Oh yes. But in this case the substandard solution is already there and
> >>> it is already widely used, despite it being severely broken. I am merely
> >>> trying to fix the existing substandard solution.
> >>>
> >>> I admit that a full integration with component helpers would probably be
> >>> more elegant solution to the problem, but the amount of work is just too
> >>> much. The change would impact the way all the master drm drivers pull
> >>> them selves together. The drivers that already use the component helpers
> >>> for some internal stuff will add their own challenge. Separate component
> >>> matching implementations are needed for device-tree and ACPI (are ther
> >>> more flavors?) etc. I just do not see this happening any time soon (am
> >>> happy to be wrong about this).
> >>
> >> The issue is actually worse than that:
> >>
> >> - drivers that are already componentised can't use bridges
> >> - drivers that use bridges can't use componentised stuff
> >>
> >> because bridges don't register themselves with the component helper,
> >> and the helpers in drm_of.c assume that all graph nodes will be
> >> components.
> >>
> >> The whole thing about whether stuff is componentised or bridge based
> >> is really getting out of hand, and the push is towards bridge based
> >> stuff even though that is technically inferior when it comes to being
> >> able to develop and test (which involves being able to remove and
> >> re-insert modules.)
> >>
> >> Consequently more and more code is being written for bridges, and
> >> the component helper ignored, and the problems with bridges are
> >> being ignored.  This is not healthy.
> >>
> >> The problem is only going to get worse.  Someone needs to bite the
> >> bullet and fix bridges before the problem gets any more out of hand.
> > 
> > This patch (which is actually two patches locally) allows the component
> > helper to know what's going on inside the bridge code wrt bridge
> > availability, and takes the appropriate action at the correct time.
> > No need for device links or similar, or incompatibilities between
> > bridges and components.  The only requirement is that bridges set the
> > "device" member of struct drm_bridge to opt-in to this.
> > 
> > Tested with Armada converted to support bridges, TDA998x as a
> > componentised bridge, and dumb-vga-dac as a non-componentised bridge:
> > 
> > root@cubox:~# less /sys/kernel/debug/device_component/display-subsystem
> > master name                                            status
> > -------------------------------------------------------------
> > display-subsystem                                       bound
> > 
> > device name                                            status
> > -------------------------------------------------------------
> > port                                               registered
> > port                                               registered
> > hdmi-encoder                                       registered
> > vga-bridge                                         registered
> > root@cubox:~# dmesg |grep bound
> > [    1.921798] armada-drm display-subsystem: bound f1820000.lcd-controller (ops
> > armada_lcd_ops)
> > [    1.931014] armada-drm display-subsystem: bound f1810000.lcd-controller (ops
> > armada_lcd_ops)
> > [    2.069231] armada-drm display-subsystem: bound 1-0070 (ops tda998x_ops)
> > [    2.076059] armada-drm display-subsystem: bound vga-bridge (ops dummy_ops)
> > 
> > Without this, the same DT fails because "vga-bridge" is never added
> > to the component helpers.
> 
> What did you need to do to convert Armada to support bridges? How much
> work is it to convert drivers that support bridges so that they
> support components? Maybe that's not needed? What happens with tda998x?
> I mean, it already calls component_add, and with this there's an
> indirect call in drm_bridge_add which it also calls. I guess I'm asking
> if a component may call component_add several times without things
> sliding sideways?

The difference with tda998x is that with the code below (as it stood
in an earlier revision of the bridge code, when we had a separate
bridge->of_node member), bridge->device is not set for the tda998x,
which avoids the duplicated component_add() - which would be illegal
(and cause problems.)

However, I also hacked tda998x to make tda998x_bind() a no-op - without
such a hack, the DRM driver needs to know whether the bridge is tda998x
or not, so it knows whether it needs to create the encoder.

I don't think there's any simple, non-hacky solution to this problem.

> 
> > 
> > diff --git a/drivers/base/component.c b/drivers/base/component.c
> > index 8946dfee4768..b14b3a3655ea 100644
> > --- a/drivers/base/component.c
> > +++ b/drivers/base/component.c
> > @@ -602,4 +602,32 @@ void component_del(struct device *dev, const struct component_ops *ops)
> >  }
> >  EXPORT_SYMBOL_GPL(component_del);
> >  
> > +static int component_dummy_bind(struct device *comp, struct device *master,
> > +				void *master_data)
> > +{
> > +	return 0;
> > +}
> > +
> > +static void component_dummy_unbind(struct device *comp, struct device *master,
> > +				   void *master_data)
> > +{
> > +}
> > +
> > +static const struct component_ops dummy_ops = {
> > +	.bind = component_dummy_bind,
> > +	.unbind = component_dummy_unbind,
> > +};
> > +
> > +int component_mark_available(struct device *dev)
> > +{
> > +	return component_add(dev, &dummy_ops);
> > +}
> > +EXPORT_SYMBOL_GPL(component_mark_available);
> > +
> > +void component_mark_unavailable(struct device *dev)
> > +{
> > +	component_del(dev, &dummy_ops);
> > +}
> > +EXPORT_SYMBOL_GPL(component_mark_unavailable);
> > +
> 
> Is this really needed in component.c? I'd say that these dummy
> bridge_component_bind/unbind can be added directly in drm_bridge.c
> and that the new call to component_mark_available in drm_bridge
> could simply be component_add(bridge->device, &bridge_component_ops)
> (etc)

What if other subsystems want this functionality?  IMHO, it belongs
in the component layer, not in other subsystems where it could end
up being duplicated.

> >  MODULE_LICENSE("GPL v2");
> > diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> > index 1638bfe9627c..ce3ccd327916 100644
> > --- a/drivers/gpu/drm/drm_bridge.c
> > +++ b/drivers/gpu/drm/drm_bridge.c
> > @@ -21,6 +21,7 @@
> >   * DEALINGS IN THE SOFTWARE.
> >   */
> >  
> > +#include <linux/component.h>
> >  #include <linux/err.h>
> >  #include <linux/module.h>
> >  #include <linux/mutex.h>
> > @@ -73,6 +74,9 @@ void drm_bridge_add(struct drm_bridge *bridge)
> >  	mutex_lock(&bridge_lock);
> >  	list_add_tail(&bridge->list, &bridge_list);
> >  	mutex_unlock(&bridge_lock);
> > +
> > +	if (bridge->device)
> > +		WARN_ON(component_mark_available(bridge->device));
> >  }
> >  EXPORT_SYMBOL(drm_bridge_add);
> >  
> > @@ -83,6 +87,9 @@ EXPORT_SYMBOL(drm_bridge_add);
> >   */
> >  void drm_bridge_remove(struct drm_bridge *bridge)
> >  {
> > +	if (bridge->device)
> > +		component_mark_unavailable(bridge->device);
> > +
> >  	mutex_lock(&bridge_lock);
> >  	list_del_init(&bridge->list);
> >  	mutex_unlock(&bridge_lock);
> > diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
> > index 3270fec46979..e863da14d4d9 100644
> > --- a/include/drm/drm_bridge.h
> > +++ b/include/drm/drm_bridge.h
> > @@ -268,6 +268,7 @@ struct drm_bridge {
> >  	struct drm_device *dev;
> >  	struct drm_encoder *encoder;
> >  	struct drm_bridge *next;
> > +	struct device *device;
> 
> In patch [1] i add struct device *odev (for owner device) and the series
> then proceeds to convert all bridges to add a link to its owner device
> and to then remove the (below) of_node member.
> 
> [1] https://lkml.org/lkml/2018/5/16/382
> 
> Would it be bad if all bridges opted in to this? In other words, could
> my "odev" and your "device" be shared?

No (see my explanation above about duplicate registrations not being
permitted.)