Message ID | 20100115011604.DCB1CB93@are.twiddle.net
---|---
State | New |
On 01/14/2010 04:38 PM, Richard Henderson wrote:
> Previously, mmap_find_vma could return addresses not properly aligned
> to the target page size. This of course led to all sorts of odd
> problems down the road.
>
> The trivial fix, to simply reject the unaligned address and continue
> searching the address space by increments of one page, is not a good
> idea when there's a 64-bit address space involved. The kernel may well
> continue to return the last available address which we've already
> rejected while we search upward from e.g. 2**42 toward 2**64.
>
> This patch uses a more complex search algorithm that takes the result
> of the previous allocation into account. We normally search upward,
> but notice 2 consecutive identical results and start searching
> downward instead.

I've failed to take guest_base into account properly; my testing
happened to have guest_base = 0. New patch to follow.

r~
On Fri, Jan 15, 2010 at 07:43:48AM -0800, Richard Henderson wrote:
>> This patch uses a more complex search algorithm that takes the result
>> of the previous allocation into account. We normally search upward,
>> but notice 2 consecutive results and start searching downward instead.

> I've failed to take guest_base into account properly; my testing
> happened to have guest_base = 0. New patch to follow.

Did you have time to look at this again?
On 01/25/2010 05:35 AM, Riku Voipio wrote:
> On Fri, Jan 15, 2010 at 07:43:48AM -0800, Richard Henderson wrote:
>>> This patch uses a more complex search algorithm that takes the result
>>> of the previous allocation into account. We normally search upward,
>>> but notice 2 consecutive results and start searching downward instead.
>
>> I've failed to take guest_base into account properly; my testing
>> happened to have guest_base = 0. New patch to follow.
>
> Did you have time to look at this again?

Some. There is an additional problem.

The target's pages are recorded in the physical page mapping via
page_set_flags, whose input address is bounded by
TARGET_PHYS_ADDR_SPACE_BITS, which is local to exec.c. Doubly
unfortunately, page_set_flags silently discards pages that it considers
must be outside the target's address space.

It's fairly easy to get the x86-64 kernel to return a vma outside the
range of any of the existing TARGET_PHYS_ADDR_SPACE_BITS settings. That
works fine inside the TBs, but causes any other syscall to return
-EFAULT.

We have a check

    /* If address fits target address space we've found what we need */
    if ((unsigned long)ptr + size - 1 <= (abi_ulong)-1)
        break;

which kind-of works for 32-bit targets. (You'll note, of course, that
the comparison is against PTR, which is a host address; the correct test
would have been h2g_valid, as seen with -B 0x100000000.) However, it
does nothing for 64-bit targets and the artificial 2**42 virtual address
space limit we impose on most of them.

I talked with pbrook a bit about this on irc, and there seems to be no
simple solution (like exporting TARGET_PHYS_ADDR_SPACE_BITS, possibly
renamed as TARGET_ADDR_SPACE_BITS, in cpu.h) that would be acceptable
to him. Given the number of patches I've already submitted that aren't
being reviewed, I'm unlikely to develop the momentum to totally rewrite
qemu's page table support, particularly without a clue as to what sort
of solution might be acceptable.

(Something like Sparc64's hashed page tables perhaps, which support a
full 64-bit virtual address space?)

r~
diff --git a/linux-user/main.c b/linux-user/main.c
index a0d8ce7..7db9fc3 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2725,12 +2725,9 @@ int main(int argc, char **argv, char **envp)
     /*
      * Read in mmap_min_addr kernel parameter.  This value is used
      * When loading the ELF image to determine whether guest_base
-     * is needed.
-     *
-     * When user has explicitly set the quest base, we skip this
-     * test.
+     * is needed.  It is also used in mmap_find_vma.
      */
-    if (!have_guest_base) {
+    {
         FILE *fp;

         if ((fp = fopen("/proc/sys/vm/mmap_min_addr", "r")) != NULL) {
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 144fb7c..b92fdc4 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -281,8 +281,9 @@ unsigned long last_brk;
  */
 abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size)
 {
-    void *ptr;
+    void *ptr, *prev;
     abi_ulong addr;
+    int wrapped, repeat;

     size = HOST_PAGE_ALIGN(size);
     start &= qemu_host_page_mask;
@@ -292,8 +293,11 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size)
         start = mmap_next_start;
     addr = start;
+    wrapped = (start == 0);
+    repeat = 0;
+    prev = 0;

-    for(;;) {
+    for (;; prev = ptr) {
         /*
          * Reserve needed memory area to avoid a race.
          * It should be discarded using:
@@ -305,20 +309,69 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size)
                    MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE, -1, 0);

         /* ENOMEM, if host address space has no memory */
-        if (ptr == MAP_FAILED)
+        if (ptr == MAP_FAILED) {
             return (abi_ulong)-1;
+        }

-        /* If address fits target address space we've found what we need */
-        if ((unsigned long)ptr + size - 1 <= (abi_ulong)-1)
+        /* Count the number of sequential returns of the same address.
+           This is used to modify the search algorithm below.  */
+        repeat = (ptr == prev ? repeat + 1 : 0);
+
+        if ((unsigned long)ptr & ~TARGET_PAGE_MASK) {
+            /* The address is not properly aligned for the target.  */
+            switch (repeat) {
+            case 0:
+                /* Assume the result that the kernel gave us is the
+                   first with enough free space, so start again at the
+                   next higher target page.  */
+                addr = TARGET_PAGE_ALIGN((unsigned long)ptr);
+                break;
+            case 1:
+                /* Sometimes the kernel decides to perform the allocation
+                   at the top end of memory instead.  Notice this via
+                   sequential allocations that result in the same address.  */
+                /* ??? This can be exacerbated by a successful allocation
+                   at the top of memory on one round, and storing that
+                   result in mmap_next_start.  The next allocation is sure
+                   to start at an address that's going to fail.  */
+                addr = (unsigned long)ptr & TARGET_PAGE_MASK;
+                break;
+            case 2:
+                /* Start over at low memory.  */
+                addr = 0;
+                break;
+            default:
+                /* Fail.  This unaligned block must be the only one left.  */
+                addr = -1;
+                break;
+            }
+        } else if ((unsigned long)ptr + size - 1 <= (abi_ulong)-1) {
             break;
+        } else {
+            /* Since the result the kernel gave didn't fit, start
+               again at low memory.  If any repetition, fail.  */
+            addr = (repeat ? -1 : 0);
+        }

-        /* Unmap and try again with new page */
+        /* Unmap and try again.  */
         munmap(ptr, size);
-        addr += qemu_host_page_size;

-        /* ENOMEM if we check whole of target address space */
-        if (addr == start)
+        /* ENOMEM if we checked the whole of the target address space.  */
+        if (addr == -1) {
             return (abi_ulong)-1;
+        } else if (addr == 0) {
+            if (wrapped) {
+                return (abi_ulong)-1;
+            }
+            wrapped = 1;
+            /* Don't actually use 0 when wrapping, instead indicate
+               that we'd truly like an allocation in low memory.  */
+            addr = (mmap_min_addr > TARGET_PAGE_SIZE
+                    ? TARGET_PAGE_ALIGN(mmap_min_addr)
+                    : TARGET_PAGE_SIZE);
+        } else if (wrapped && addr >= start) {
+            return (abi_ulong)-1;
+        }
     }

     /* Update default start address */
Previously, mmap_find_vma could return addresses not properly aligned
to the target page size. This of course led to all sorts of odd
problems down the road.

The trivial fix, to simply reject the unaligned address and continue
searching the address space by increments of one page, is not a good
idea when there's a 64-bit address space involved. The kernel may well
continue to return the last available address which we've already
rejected while we search upward from e.g. 2**42 toward 2**64.

This patch uses a more complex search algorithm that takes the result
of the previous allocation into account. We normally search upward,
but notice 2 consecutive identical results and start searching
downward instead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 linux-user/main.c |    7 +---
 linux-user/mmap.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 64 insertions(+), 14 deletions(-)