Message ID | f6812adbe26a68b0545f5fd25b6d9d6a@bga.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Hello Milton,

On Tuesday, November 11, 2008 Milton Miller wrote:

[snip]

>>>>
>>>> #ifdef CONFIG_PTE_64BIT
>>>> typedef unsigned long long pte_basic_t;
>>>> +#ifdef CONFIG_PPC_256K_PAGES
>>>> +#define PTE_SHIFT (PAGE_SHIFT - 7)
>>>
>>> This seems to be missing the comment on how many ptes are actually in
>>> the page that are in the other if and else cases.
>>
>> Ok. I'll fix this. Actually it's another hack: we don't use full page
>> for PTE table because we need to reserve something for PGD

> I don't understand "we need to reserve something for PGD". Do you
> mean that you would not require a second page for the PGD because the
> full pagetable could fit in one page? My first reaction was to say
> then create pgtable-nopgd.h like the other two. The page walkers
> support this with the advent of gigantic pages. Then I realized that
> might not be optimal: while the page table might fit in one page, it
> would mean you always allocate the pte space to cover the full address
> space. Even if your processes spread out over the 3G of address space
> allocated to them (32 bit kernel), you will allocate space for 4G,
> wasting 1/4 of the pte space.

> That does imply you want to allocate the pte page from a slab instead
> of pgalloc. Is that covered?

Well, in the case of 256K PAGE_SIZE we indeed do not need the PGD level
(18 bits are used for the offset, and the remaining 14 bits are for the
PTE index inside the PTE table). Even the full 256K PTE page isn't
necessary to cover the full range: only half of it would be enough (with
14 bits we can address only 16K PTEs).

But the head_44x.S code is essentially based on the assumption of
2-level page addressing. Also, I would guess that eliminating the PGD
level won't be as easy as just re-implementing the TLB-miss handlers in
head_44x.S. So, the current approach to 256K-page support was just a
compromise between the functionality required for the project and the
effort necessary to achieve it.

Regards, Yuri

--
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com
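The arithmetic in the exchange above can be checked with a small C sketch. Only PAGE_SHIFT = 18, PTE_SHIFT = PAGE_SHIFT - 7 and the 8-byte pte_basic_t come from the thread. The sketch assumes 32-bit virtual addresses and that the PGD index takes whatever bits remain after the page offset and the PTE index; all macro and function names are illustrative, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the thread. */
#define PAGE_SHIFT_256K 18u                     /* 256K pages: 18 offset bits */
#define PTE_SHIFT_256K  (PAGE_SHIFT_256K - 7u)  /* quoted patch: 18 - 7 = 11 */
#define PTE_SIZE_BYTES  8u                      /* CONFIG_PTE_64BIT pte_basic_t */

/* Assumption for this sketch: 32-bit virtual addresses. */
#define VA_BITS         32u

/* Number of PTEs in one PTE table under the quoted PTE_SHIFT. */
uint32_t ptes_per_table(void)
{
    return 1u << PTE_SHIFT_256K;
}

/* Bytes actually occupied by one PTE table. */
uint32_t pte_table_bytes(void)
{
    return ptes_per_table() * PTE_SIZE_BYTES;
}

/* Bytes in one 256K page. */
uint32_t page_bytes(void)
{
    return 1u << PAGE_SHIFT_256K;
}

/* Address bits left for the PGD index (assumed: everything not used
 * by the page offset or the PTE index). */
uint32_t pgd_index_bits(void)
{
    assert(PAGE_SHIFT_256K + PTE_SHIFT_256K <= VA_BITS);
    return VA_BITS - PAGE_SHIFT_256K - PTE_SHIFT_256K;
}

/* Size of the single-level table Yuri describes: 14 index bits,
 * one PTE per 256K page of a 32-bit address space. */
uint32_t flat_table_bytes(void)
{
    return (1u << (VA_BITS - PAGE_SHIFT_256K)) * PTE_SIZE_BYTES;
}
```

Under these assumptions one PTE table holds 2048 entries and occupies 16K bytes of its 256K page, leaving 3 address bits for the PGD. The single-level alternative needs 16K entries, which is 128K bytes, or half of one page, matching the figures in the reply.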
On Nov 13, 2008, at 10:32 PM, Yuri Tikhonov wrote:

> On Tuesday, November 11, 2008 Milton Miller wrote:
>>>>> #ifdef CONFIG_PTE_64BIT
>>>>> typedef unsigned long long pte_basic_t;
>>>>> +#ifdef CONFIG_PPC_256K_PAGES
>>>>> +#define PTE_SHIFT (PAGE_SHIFT - 7)
>>>>
>>>> This seems to be missing the comment on how many ptes are actually in
>>>> the page that are in the other if and else cases.
>>>
>>> Ok. I'll fix this. Actually it's another hack: we don't use full page
>>> for PTE table because we need to reserve something for PGD
>
>> I don't understand "we need to reserve something for PGD". Do you
>> mean that you would not require a second page for the PGD because the
>> full pagetable could fit in one page?
...
>> That does imply you want to allocate the pte page from a slab instead
>> of pgalloc. Is that covered?
>
> Well, in the case of 256K PAGE_SIZE we indeed do not need the PGD level
> (18 bits are used for the offset, and the remaining 14 bits are for the
> PTE index inside the PTE table). Even the full 256K PTE page isn't
> necessary to cover the full range: only half of it would be enough (with
> 14 bits we can address only 16K PTEs).
>
> But the head_44x.S code is essentially based on the assumption of
> 2-level page addressing. Also, I would guess that eliminating the PGD
> level won't be as easy as just re-implementing the TLB-miss handlers in
> head_44x.S. So, the current approach to 256K-page support was just a
> compromise between the functionality required for the project and the
> effort necessary to achieve it.

So are you allocating the < PAGE_SIZE levels from slabs (either kmalloc
or dedicated) instead of allocating pages? Or are you wasting the
extra space?

At a very minimum you need to comment this in the code. If I were the
maintainer, I would make not wasting large fractions of pages when the
page size is 256k a merge requirement.

As I said, I'm fine with keeping the page table two levels, but the
tradeoff needs to be documented.

milton
Hello Milton,

On Friday, November 14, 2008 you wrote:

> On Nov 13, 2008, at 10:32 PM, Yuri Tikhonov wrote:
>> On Tuesday, November 11, 2008 Milton Miller wrote:
>>>>>> #ifdef CONFIG_PTE_64BIT
>>>>>> typedef unsigned long long pte_basic_t;
>>>>>> +#ifdef CONFIG_PPC_256K_PAGES
>>>>>> +#define PTE_SHIFT (PAGE_SHIFT - 7)
>>>>>
>>>>> This seems to be missing the comment on how many ptes are actually in
>>>>> the page that are in the other if and else cases.
>>>>
>>>> Ok. I'll fix this. Actually it's another hack: we don't use full page
>>>> for PTE table because we need to reserve something for PGD
>>
>>> I don't understand "we need to reserve something for PGD". Do you
>>> mean that you would not require a second page for the PGD because the
>>> full pagetable could fit in one page?
> ...
>>> That does imply you want to allocate the pte page from a slab instead
>>> of pgalloc. Is that covered?
>>
>> Well, in the case of 256K PAGE_SIZE we indeed do not need the PGD level
>> (18 bits are used for the offset, and the remaining 14 bits are for the
>> PTE index inside the PTE table). Even the full 256K PTE page isn't
>> necessary to cover the full range: only half of it would be enough (with
>> 14 bits we can address only 16K PTEs).
>>
>> But the head_44x.S code is essentially based on the assumption of
>> 2-level page addressing. Also, I would guess that eliminating the PGD
>> level won't be as easy as just re-implementing the TLB-miss handlers in
>> head_44x.S. So, the current approach to 256K-page support was just a
>> compromise between the functionality required for the project and the
>> effort necessary to achieve it.

> So are you allocating the < PAGE_SIZE levels from slabs (either kmalloc
> or dedicated) instead of allocating pages? Or are you wasting the
> extra space?

We are wasting the extra space here.

> At a very minimum you need to comment this in the code. If I were the
> maintainer, I would make not wasting large fractions of pages when the
> page size is 256k a merge requirement. As I said, I'm fine
> with keeping the page table two levels, but the tradeoff needs to be
> documented.

Agreed, we'll document this and re-submit the patch.

Regards, Yuri

--
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com
diff --git a/kernel/fork.c b/kernel/fork.c
index f5fba87..f157ad6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -118,10 +118,7 @@ void __init fork_init(unsigned long mempages)
 	 * value: the thread structures can take up at most half
 	 * of memory.
 	 */
-	if (THREAD_SIZE >= PAGE_SIZE)
-		max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 8;
-	else
-		max_threads = mempages / 8;
+	max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);
 
 	/*
 	 * we need to allow at least 20 threads to boot a system

v2.6.10-rc1-g368b064
commit 368b06415c11e286f6ab3fe7c52bdd5b9b6f3008
Author: dhowells <dhowells>
Commit: dhowells <dhowells>

    [PATCH] fix page size assumption in fork()

    The attached patch fixes fork to get rid of the assumption that
    THREAD_SIZE >= PAGE_SIZE (on the FR-V the smallest available page
    size is 16KB).

    Signed-Off-By: David Howells <dhowells@redhat.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

    BKrev: 4193db17ZJRaaVNEGezHMBUmByER4A

diff --git a/kernel/fork.c b/kernel/fork.c
index eb689d9..f5fba87 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -118,7 +118,11 @@ void __init fork_init(unsigned long mempages)
 	 * value: the thread structures can take up at most half
 	 * of memory.
 	 */
-	max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 8;
+	if (THREAD_SIZE >= PAGE_SIZE)
+		max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 8;
+	else
+		max_threads = mempages / 8;
+
 	/*
 	 * we need to allow at least 20 threads to boot a system
 	 */

v2.4.0-g4214e42
commit 4214e42f96d4051cb77b1b7c2b041715db84ffd9
Author: torvalds <torvalds>
Commit: torvalds <torvalds>

    v2.4.9.11 -> v2.4.9.12

    - Alan Cox: much more merging
    - Pete Zaitcev: ymfpci race fixes
    - Andrea Arkangeli: VM race fix and OOM tweak.
    - Arjan Van de Ven: merge RH kernel fixes
    - Andi Kleen: use more readable 'likely()/unlikely()' instead of __builtin_expect()
    - Keith Owens: fix 64-bit ELF types
    - Gerd Knorr: mark more broken PCI bridges, update btaudio driver
    - Paul Mackerras: powermac driver update
    - me: clean up PTRACE_DETACH to use common infrastructure

    BKrev: 3c603e338Tv2BTX9tkeBFGWLdI-r4Q

diff --git a/kernel/fork.c b/kernel/fork.c
index 9179e23..91aeda9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -72,7 +72,7 @@ void __init fork_init(unsigned long mempages)
 	 * value: the thread structures can take up at most half
 	 * of memory.
 	 */
-	max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 16;
+	max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 8;
 
 	init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;

v2.4.0-gcaeb6d6
commit caeb6d68179ecd9dfeac8fa17daa7150163fa318
Author: torvalds <torvalds>
Commit: torvalds <torvalds>

    v2.4.9.10 -> v2.4.9.11

    - Neil Brown: md cleanups/fixes
    - Andrew Morton: console locking merge
    - Andrea Arkangeli: major VM merge

    BKrev: 3c603e2fnBNvsVsBbJrGD3fFs4xTFg

diff --git a/kernel/fork.c b/kernel/fork.c
index ebfbf2b..9179e23 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -72,7 +72,7 @@ void __init fork_init(unsigned long mempages)
 	 * value: the thread structures can take up at most half
 	 * of memory.
 	 */
-	max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 2;
+	max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 16;
 
 	init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
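To make the effect of the two most recent fork_init() forms concrete, here is a user-space sketch of both calculations. The function names, the zero return used as a sentinel, and the example sizes are assumptions for illustration only; the kernel code performs the division directly on THREAD_SIZE and PAGE_SIZE.

```c
#include <assert.h>

/* Guarded form (the if/else added by the "fix page size assumption"
 * commit): falls back to mempages / 8 when a thread stack is smaller
 * than a page. */
unsigned long max_threads_guarded(unsigned long mempages,
                                  unsigned long thread_size,
                                  unsigned long page_size)
{
    assert(page_size != 0);
    if (thread_size >= page_size)
        return mempages / (thread_size / page_size) / 8;
    return mempages / 8;
}

/* Divisor used by the single-expression form:
 * 8 * THREAD_SIZE / PAGE_SIZE, with integer truncation. */
unsigned long thread_divisor(unsigned long thread_size,
                             unsigned long page_size)
{
    assert(page_size != 0);
    return 8 * thread_size / page_size;
}

/* Single-expression form from the newest diff. Hypothetical safety
 * net for this sketch only: return 0 instead of dividing by zero
 * when the divisor truncates to 0. */
unsigned long max_threads_single_expr(unsigned long mempages,
                                      unsigned long thread_size,
                                      unsigned long page_size)
{
    unsigned long divisor = thread_divisor(thread_size, page_size);

    if (divisor == 0)
        return 0;   /* sentinel: the real expression would fault here */
    return mempages / divisor;
}
```

With 32768 pages and an 8K thread size, both forms give 2048 for 4K pages. For 16K pages the guarded form gives 4096 while the single expression gives 8192, since it accounts for two thread stacks fitting in one page. If one assumes an 8K thread size with 256K pages, the divisor truncates to zero, so that configuration would need the expression adjusted further.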