Message ID | AANLkTince6QY0t4gExABG21ZqZ4hMEjGm=mTm7MBfYGo@mail.gmail.com |
---|---|
State | Rejected |
On Sun, 26 Sep 2010, Bryan Wu wrote:

> On Sun, Sep 26, 2010 at 11:01 PM, Nicolas Pitre
> <nicolas.pitre@canonical.com> wrote:
> > On Sun, 26 Sep 2010, Ricardo Salveti de Araujo wrote:
> >
> >> On Fri, Sep 24, 2010 at 03:04:10AM -0300, Ricardo Salveti de Araujo wrote:
> >> > Will also test it with only one cpu to see if this could be realted with SMP
> >> > issues.
> >>
> >> Ok, tested the same kernel but running with only one CPU, for 40 hours (what gave me
> >> 15 builds), and went all fine, without any errors at both userspace and kernelspace.
> >>
> >> So it seems that this data abort exception could be related with concurrency and
> >> SMP support at our kernel.
> >
> > Right. So I'd suggest you keep highmem off, and 2g:2g on (with the
> > VMALLOC_END fix), then try to reliably reproduce the issue with that
> > configuration and fix it before involving highmem again. While highmem
> > may make the problem more visible, it also brings a set of added
> > complexity of its own which would make the tracking of the issue much
> > harder.
> >
> >
> > Nicolas
> >
>
> I disabled CONFIG_CACHE_L2X0 L2 cache controller for omap4. So far for
> the SMP kernel with mem=1G, kernel building is running correctly.
> I will test more. It looks like L2 cache controlling has some issue.

That's with or without highmem involved?


Nicolas
On Mon, Sep 27, 2010 at 12:31 AM, Nicolas Pitre
<nicolas.pitre@canonical.com> wrote:
> On Sun, 26 Sep 2010, Bryan Wu wrote:
>
>> On Sun, Sep 26, 2010 at 11:01 PM, Nicolas Pitre
>> <nicolas.pitre@canonical.com> wrote:
>> > On Sun, 26 Sep 2010, Ricardo Salveti de Araujo wrote:
>> >
>> >> On Fri, Sep 24, 2010 at 03:04:10AM -0300, Ricardo Salveti de Araujo wrote:
>> >> > Will also test it with only one cpu to see if this could be realted with SMP
>> >> > issues.
>> >>
>> >> Ok, tested the same kernel but running with only one CPU, for 40 hours (what gave me
>> >> 15 builds), and went all fine, without any errors at both userspace and kernelspace.
>> >>
>> >> So it seems that this data abort exception could be related with concurrency and
>> >> SMP support at our kernel.
>> >
>> > Right. So I'd suggest you keep highmem off, and 2g:2g on (with the
>> > VMALLOC_END fix), then try to reliably reproduce the issue with that
>> > configuration and fix it before involving highmem again. While highmem
>> > may make the problem more visible, it also brings a set of added
>> > complexity of its own which would make the tracking of the issue much
>> > harder.
>> >
>> >
>> > Nicolas
>> >
>>
>> I disabled CONFIG_CACHE_L2X0 L2 cache controller for omap4. So far for
>> the SMP kernel with mem=1G, kernel building is running correctly.
>> I will test more. It looks like L2 cache controlling has some issue.
>
> That's with or without highmem involved?
>

It's with highmem, but finally it still fails with message like this:
"Unhandled fault: imprecise external abort (0x1406) at 0x400b0000"

Thanks,
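For reference, the status value in that abort message can be decoded by hand. The sketch below is not from the thread. It assumes the ARMv7 short-descriptor DFSR layout (FS[3:0] in bits 3..0, FS[4] in bit 10, domain in bits 7..4, WnR in bit 11, ExT in bit 12); the function name `decode_dfsr` and the one-entry name table are made up for illustration.

```python
# Sketch: decode an ARMv7 short-descriptor DFSR value.
# Assumed bit layout: FS[3:0] = bits 3..0, FS[4] = bit 10,
# domain = bits 7..4, WnR = bit 11, ExT = bit 12.

# Only the one encoding relevant to this thread is listed here.
FS_NAMES = {
    0b10110: "imprecise external abort",  # ARM docs: "asynchronous external abort"
}


def decode_dfsr(fsr):
    """Split a fault status value into its fields (hypothetical helper)."""
    fs = (fsr & 0xF) | (((fsr >> 10) & 0x1) << 4)
    return {
        "fs": fs,
        "name": FS_NAMES.get(fs, "not in this table"),
        "domain": (fsr >> 4) & 0xF,
        "write": bool(fsr & (1 << 11)),  # WnR: fault caused by a write
        "ext": bool(fsr & (1 << 12)),    # ExT: implementation-defined abort type
    }


info = decode_dfsr(0x1406)
print(info)
```

For 0x1406 the status field comes out as 0b10110 with ExT set and WnR clear, which agrees with the "imprecise external abort" text in the log. Because the abort is imprecise, the reported address 0x400b0000 is not necessarily the access that triggered it.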
On Mon, 2010-09-27 at 09:32 +0800, Bryan Wu wrote:
> On Mon, Sep 27, 2010 at 12:31 AM, Nicolas Pitre
> <nicolas.pitre@canonical.com> wrote:
> > On Sun, 26 Sep 2010, Bryan Wu wrote:
> >
> >> On Sun, Sep 26, 2010 at 11:01 PM, Nicolas Pitre
> >> <nicolas.pitre@canonical.com> wrote:
> >> > On Sun, 26 Sep 2010, Ricardo Salveti de Araujo wrote:
> >> >
> >> >> On Fri, Sep 24, 2010 at 03:04:10AM -0300, Ricardo Salveti de Araujo wrote:
> >> >> > Will also test it with only one cpu to see if this could be realted with SMP
> >> >> > issues.
> >> >>
> >> >> Ok, tested the same kernel but running with only one CPU, for 40 hours (what gave me
> >> >> 15 builds), and went all fine, without any errors at both userspace and kernelspace.
> >> >>
> >> >> So it seems that this data abort exception could be related with concurrency and
> >> >> SMP support at our kernel.
> >> >
> >> > Right. So I'd suggest you keep highmem off, and 2g:2g on (with the
> >> > VMALLOC_END fix), then try to reliably reproduce the issue with that
> >> > configuration and fix it before involving highmem again. While highmem
> >> > may make the problem more visible, it also brings a set of added
> >> > complexity of its own which would make the tracking of the issue much
> >> > harder.
> >> >
> >> >
> >> > Nicolas
> >> >
> >>
> >> I disabled CONFIG_CACHE_L2X0 L2 cache controller for omap4. So far for
> >> the SMP kernel with mem=1G, kernel building is running correctly.
> >> I will test more. It looks like L2 cache controlling has some issue.
> >
> > That's with or without highmem involved?
>
> It's with highmem, but finally it still fails with message like this:
> "Unhandled fault: imprecise external abort (0x1406) at 0x400b0000"

Without L2, but with highmem and SMP, I can easily reproduce the issue.
Without L2 and without highmem, but still with SMP, I was able to run for
20 hours (6 builds) with no failure.

So currently I can use 1G when running without highmem and with either SMP
or L2 disabled.

This issue is probably a race condition, but it is hard to trace where
exactly.

Cheers,
diff --git a/debian.ti-omap4/config/config.common.ubuntu b/debian.ti-omap4/config/config.common.ubuntu
index 8d46b55..8f5b7e9 100644
--- a/debian.ti-omap4/config/config.common.ubuntu
+++ b/debian.ti-omap4/config/config.common.ubuntu
@@ -320,8 +320,7 @@ CONFIG_C2PORT=m
 CONFIG_CACHEFILES=m
 # CONFIG_CACHEFILES_DEBUG is not set
 # CONFIG_CACHEFILES_HISTOGRAM is not set
-CONFIG_CACHE_L2X0=y
-CONFIG_CACHE_PL310=y
+# CONFIG_CACHE_L2X0 is not set
 # CONFIG_CAIF is not set
 CONFIG_CAN=m
 CONFIG_CAN_BCM=m
@@ -928,6 +927,7 @@ CONFIG_HID_WACOM=m
 CONFIG_HID_ZEROPLUS=m
 # CONFIG_HID_ZYDACRON is not set
 CONFIG_HIGHMEM=y
+# CONFIG_HIGHPTE is not set
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_HOSTAP=m
 # CONFIG_HOSTAP_FIRMWARE is not set
@@ -1928,8 +1928,6 @@ CONFIG_OMFS_FS=m
 CONFIG_OPROFILE=y
 CONFIG_OSF_PARTITION=y
 # CONFIG_OTUS is not set
-CONFIG_OUTER_CACHE=y
-CONFIG_OUTER_CACHE_SYNC=y
 CONFIG_P54_COMMON=m
 CONFIG_P54_LEDS=y
 CONFIG_P54_SPI=m
@@ -1970,7 +1968,6 @@ CONFIG_PHONET=m
 CONFIG_PHYLIB=y
 # CONFIG_PHYS_ADDR_T_64BIT is not set
 CONFIG_PID_NS=y
-# CONFIG_PL310_ERRATA_588369 is not set
 # CONFIG_PLAT_SPEAR is not set
 CONFIG_PLIP=m
 # CONFIG_PM is not set
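
To confirm that a test kernel really was built with the settings from this patch, the options it touches can be checked in the resulting config file. The following is a minimal sketch, assuming the usual kernel config line forms (`CONFIG_X=value` and `# CONFIG_X is not set`); the helper name `parse_kconfig` and the sample text are hypothetical and only mirror the option names seen in the diff.

```python
import re

# Option names taken from the diff above.
OPTIONS = (
    "CONFIG_CACHE_L2X0",
    "CONFIG_OUTER_CACHE",
    "CONFIG_HIGHMEM",
    "CONFIG_HIGHPTE",
)


def parse_kconfig(text):
    """Map each option to its value, using "n" for '# ... is not set' lines."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            state[unset.group(1)] = "n"
            continue
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            state[assigned.group(1)] = assigned.group(2)
    return state


# Hypothetical sample; in practice read the real config file instead.
sample = """\
# CONFIG_CACHE_L2X0 is not set
CONFIG_HIGHMEM=y
# CONFIG_HIGHPTE is not set
"""

state = parse_kconfig(sample)
for option in OPTIONS:
    print(option, state.get(option, "absent"))
```

An option reported as "absent" is not mentioned in the file at all, which is how `CONFIG_OUTER_CACHE` should appear after the patch, since its line is removed rather than changed to "is not set".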