diff mbox

[v2,1/2] astbmc: Add NPU slot type and use it in Garrison

Message ID 20160621080448.25155-1-ruscur@russell.cc
State Superseded
Headers show

Commit Message

Russell Currey June 21, 2016, 8:04 a.m. UTC
NPU links are grouped into pairs, signifying that they both correlate to
the same physical GPU.  These pairs are presented in the Garrison slot
table as typical PCI devices, creating what are essentially redundant
entries.  Add a new slot type specifically for NPUs, and subsequently
perform bdfn correlation using it.

Signed-off-by: Russell Currey <ruscur@russell.cc>
---
V2: New change to facilitate bdfn allocation without using pbcq
---
 platforms/astbmc/astbmc.h   |  2 ++
 platforms/astbmc/garrison.c | 36 ++++++++----------------------------
 platforms/astbmc/slots.c    |  7 +++++++
 3 files changed, 17 insertions(+), 28 deletions(-)

Comments

Alistair Popple June 22, 2016, 2:12 a.m. UTC | #1
On Tue, 21 Jun 2016 18:04:47 Russell Currey wrote:
> NPU links are grouped into pairs, signifying that they both correlate to
> the same physical GPU.  These pairs are presented in the Garrison slot
> table as typical PCI devices, creating what are essentially redundant
> entries.  Add a new slot type specifically for NPUs, and subsequently
> perform bdfn correlation using it.
> 
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> ---
> V2: New change to facilitate bdfn allocation without using pbcq
> ---
>  platforms/astbmc/astbmc.h   |  2 ++
>  platforms/astbmc/garrison.c | 36 ++++++++----------------------------
>  platforms/astbmc/slots.c    |  7 +++++++
>  3 files changed, 17 insertions(+), 28 deletions(-)

<snip>
 
> diff --git a/platforms/astbmc/slots.c b/platforms/astbmc/slots.c
> index 36547e1..992f68b 100644
> --- a/platforms/astbmc/slots.c
> +++ b/platforms/astbmc/slots.c
> @@ -70,6 +70,13 @@ static const struct slot_table_entry 
*match_slot_dev_entry(struct phb *phb,
>  			prerror("SLOT: Bad PHB entry type in table !\n");
>  			continue;
>  		}
> +
> +		if (ent->etype == st_npu_slot) {
> +			/* NPU groups are at device level, so ignore function 
*/
> +			if (ent->location == ((pd->bdfn & 0xff) >> 3))

Is this reliable? ent->location is a ST_LOC_NPU_GROUP code but what guarantees 
that the pd->bdfn device number matches the NPU_GROUP number?

Do we need something like "if (ent->location == pd->npu_group)" instead?

- Alistair

> +				return ent;
> +		}
> +
>  		if (ent->location == (pd->bdfn & 0xff))
>  			return ent;
>  	}
>
Russell Currey June 22, 2016, 2:24 a.m. UTC | #2
On Wed, 2016-06-22 at 12:12 +1000, Alistair Popple wrote:
> On Tue, 21 Jun 2016 18:04:47 Russell Currey wrote:
> > NPU links are grouped into pairs, signifying that they both correlate to
> > the same physical GPU.  These pairs are presented in the Garrison slot
> > table as typical PCI devices, creating what are essentially redundant
> > entries.  Add a new slot type specifically for NPUs, and subsequently
> > perform bdfn correlation using it.
> > 
> > Signed-off-by: Russell Currey <ruscur@russell.cc>
> > ---
> > V2: New change to facilitate bdfn allocation without using pbcq
> > ---
> >  platforms/astbmc/astbmc.h   |  2 ++
> >  platforms/astbmc/garrison.c | 36 ++++++++----------------------------
> >  platforms/astbmc/slots.c    |  7 +++++++
> >  3 files changed, 17 insertions(+), 28 deletions(-)
> 
> <snip>
>  
> > diff --git a/platforms/astbmc/slots.c b/platforms/astbmc/slots.c
> > index 36547e1..992f68b 100644
> > --- a/platforms/astbmc/slots.c
> > +++ b/platforms/astbmc/slots.c
> > @@ -70,6 +70,13 @@ static const struct slot_table_entry 
> *match_slot_dev_entry(struct phb *phb,
> >  			prerror("SLOT: Bad PHB entry type in table !\n");
> >  			continue;
> >  		}
> > +
> > +		if (ent->etype == st_npu_slot) {
> > +			/* NPU groups are at device level, so ignore
> > function 
> */
> > +			if (ent->location == ((pd->bdfn & 0xff) >> 3))
> 
> Is this reliable? ent->location is a ST_LOC_NPU_GROUP code but what
> guarantees 
> that the pd->bdfn device number matches the NPU_GROUP number?

It depends.  It works for Garrison since the groups (0, 1) correlate to the
device numbers, which are (0, 1, 8, 9 -- so 0, 0, 1, 1).  There's probably a
more robust way to do things, though.
> 
> Do we need something like "if (ent->location == pd->npu_group)" instead?

I'm hoping to avoid adding NPU-specific stuff to any generic PCI structs, if
possible.

> 
> - Alistair
> 
> > +				return ent;
> > +		}
> > +
> >  		if (ent->location == (pd->bdfn & 0xff))
> >  			return ent;
> >  	}
> >
Alistair Popple June 22, 2016, 4:23 a.m. UTC | #3
On Wed, 22 Jun 2016 12:24:17 Russell Currey wrote:
> On Wed, 2016-06-22 at 12:12 +1000, Alistair Popple wrote:
> > On Tue, 21 Jun 2016 18:04:47 Russell Currey wrote:
> > > NPU links are grouped into pairs, signifying that they both correlate to
> > > the same physical GPU.  These pairs are presented in the Garrison slot
> > > table as typical PCI devices, creating what are essentially redundant
> > > entries.  Add a new slot type specifically for NPUs, and subsequently
> > > perform bdfn correlation using it.
> > > 
> > > Signed-off-by: Russell Currey <ruscur@russell.cc>
> > > ---
> > > V2: New change to facilitate bdfn allocation without using pbcq
> > > ---
> > >  platforms/astbmc/astbmc.h   |  2 ++
> > >  platforms/astbmc/garrison.c | 36 ++++++++----------------------------
> > >  platforms/astbmc/slots.c    |  7 +++++++
> > >  3 files changed, 17 insertions(+), 28 deletions(-)
> > 
> > <snip>
> >  
> > > diff --git a/platforms/astbmc/slots.c b/platforms/astbmc/slots.c
> > > index 36547e1..992f68b 100644
> > > --- a/platforms/astbmc/slots.c
> > > +++ b/platforms/astbmc/slots.c
> > > @@ -70,6 +70,13 @@ static const struct slot_table_entry 
> > *match_slot_dev_entry(struct phb *phb,
> > >  			prerror("SLOT: Bad PHB entry type in table !\n");
> > >  			continue;
> > >  		}
> > > +
> > > +		if (ent->etype == st_npu_slot) {
> > > +			/* NPU groups are at device level, so ignore
> > > function 
> > */
> > > +			if (ent->location == ((pd->bdfn & 0xff) >> 3))
> > 
> > Is this reliable? ent->location is a ST_LOC_NPU_GROUP code but what
> > guarantees 
> > that the pd->bdfn device number matches the NPU_GROUP number?
> 
> It depends.  It works for Garrison since the groups (0, 1) correlate to the
> device numbers, which are (0, 1, 8, 9 -- so 0, 0, 1, 1).  There's probably a
> more robust way to do things, though.

My concern is that this is more by dumb luck than design as doesn't it depend 
on the order in which NPUs are allocated bdfn's in npu_allocate_bdfn?

For example if the links in ST_LOC_NPU_GROUP(1) are assigned bdfn's before 
those in ST_LOC_NPU_GROUP(0) then won't the slot and therefore the GPU 
associations be reversed because those in GROUP(1) will have device number 0? 
Incorrect GPU associations will cause issues in Linux and we don't make any 
guarantees about the order in which links are probed/bdfn's allocated so we 
would need to fix that for this to work (or derive bdfn from group id).

> > Do we need something like "if (ent->location == pd->npu_group)" instead?
> 
> I'm hoping to avoid adding NPU-specific stuff to any generic PCI structs, if
> possible.

Yeah, it's generally best to avoid bloating generic structs with device 
specific fields if possible but in this case it would be justified imho 
(especially as it's only an int).

- Alistair

> > 
> > - Alistair
> > 
> > > +				return ent;
> > > +		}
> > > +
> > >  		if (ent->location == (pd->bdfn & 0xff))
> > >  			return ent;
> > >  	}
> > > 
>
diff mbox

Patch

diff --git a/platforms/astbmc/astbmc.h b/platforms/astbmc/astbmc.h
index 23c31c7..0cf2ac1 100644
--- a/platforms/astbmc/astbmc.h
+++ b/platforms/astbmc/astbmc.h
@@ -20,6 +20,7 @@ 
 
 #define ST_LOC_PHB(chip_id, phb_idx)    ((chip_id) << 16 | (phb_idx))
 #define ST_LOC_DEVFN(dev, fn)	        ((dev) << 3 | (fn))
+#define ST_LOC_NPU_GROUP(group_id)	(group_id)
 
 struct slot_table_entry {
 	enum slot_table_etype {
@@ -27,6 +28,7 @@  struct slot_table_entry {
 		st_phb,
 		st_pluggable_slot,
 		st_builtin_dev,
+		st_npu_slot
 	} etype;
 	uint32_t location;
 	const char *name;
diff --git a/platforms/astbmc/garrison.c b/platforms/astbmc/garrison.c
index 3ff84a3..fb006db 100644
--- a/platforms/astbmc/garrison.c
+++ b/platforms/astbmc/garrison.c
@@ -63,23 +63,13 @@  static const struct slot_table_entry garrison_phb0_3_slot[] = {
 
 static const struct slot_table_entry garrison_npu0_slots[] = {
 	{
-		.etype = st_pluggable_slot,
-		.location = ST_LOC_DEVFN(0,0),
-		.name = "GPU2",
-	},
-	{
-		.etype = st_pluggable_slot,
-		.location = ST_LOC_DEVFN(0,1),
+		.etype = st_npu_slot,
+		.location = ST_LOC_NPU_GROUP(0),
 		.name = "GPU2",
 	},
 	{
-		.etype = st_pluggable_slot,
-		.location = ST_LOC_DEVFN(1,0),
-		.name = "GPU1",
-	},
-	{
-		.etype = st_pluggable_slot,
-		.location = ST_LOC_DEVFN(1,1),
+		.etype = st_npu_slot,
+		.location = ST_LOC_NPU_GROUP(1),
 		.name = "GPU1",
 	},
 	{ .etype = st_end },
@@ -152,23 +142,13 @@  static const struct slot_table_entry garrison_phb1_3_slot[] = {
 
 static const struct slot_table_entry garrison_npu1_slots[] = {
 	{
-		.etype = st_pluggable_slot,
-		.location = ST_LOC_DEVFN(0,0),
-		.name = "GPU4",
-	},
-	{
-		.etype = st_pluggable_slot,
-		.location = ST_LOC_DEVFN(0,1),
+		.etype = st_npu_slot,
+		.location = ST_LOC_NPU_GROUP(0),
 		.name = "GPU4",
 	},
 	{
-		.etype = st_pluggable_slot,
-		.location = ST_LOC_DEVFN(1,0),
-		.name = "GPU3",
-	},
-	{
-		.etype = st_pluggable_slot,
-		.location = ST_LOC_DEVFN(1,1),
+		.etype = st_npu_slot,
+		.location = ST_LOC_NPU_GROUP(1),
 		.name = "GPU3",
 	},
 	{ .etype = st_end },
diff --git a/platforms/astbmc/slots.c b/platforms/astbmc/slots.c
index 36547e1..992f68b 100644
--- a/platforms/astbmc/slots.c
+++ b/platforms/astbmc/slots.c
@@ -70,6 +70,13 @@  static const struct slot_table_entry *match_slot_dev_entry(struct phb *phb,
 			prerror("SLOT: Bad PHB entry type in table !\n");
 			continue;
 		}
+
+		if (ent->etype == st_npu_slot) {
+			/* NPU groups are at device level, so ignore function */
+			if (ent->location == ((pd->bdfn & 0xff) >> 3))
+				return ent;
+		}
+
 		if (ent->location == (pd->bdfn & 0xff))
 			return ent;
 	}