2GB Per Process Limitation IA32
madcompnerd
04-28-2005, 12:58 AM
Alright, I've dug around about this enough; time to pray someone has a clue:
I have a machine with 4GB of physical memory. It's a Xeon, 2.6GHz I believe. My current kernel config has these options set:
User Address Space 3GB
High Memory 64GB
Yet I cannot get a single process to use more than 2GB of memory. I should be able to get 3GB out of a process (although I still don't understand why it's not 4, but that's another story).
Don't argue with me over whether I can get around allocating 2GB; just tell me whether it's possible and what is required. I'm using highmem and she's set for 64GB!
bwkaz
04-28-2005, 07:35 PM
I think the caveat with HIGHMEM_64G is that no individual process gets more than 4G at a time. If that's true, the reason would be the way the Pentium Pro and later processors give access to the extra memory -- you have to jump through all kinds of hoops to point 32-bit addresses at some other physical address, so having the kernel do it on page faults (or whatever) would be horribly slow. It just does it on a context switch instead.
But I don't know that for sure, either.
And I don't know why you're running into a 2G limit -- where are you getting that number from? I thought that the Linux kernel function to allocate memory would never fail, it would just kill off a process if it runs out of memory. (It'll use the OOM killer to do that.) So if you're getting that number from ps, pmap, or top, it may be because the process itself isn't allocating more than 2G.
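Whether an oversized request fails up front (malloc returns NULL) or is granted and only settled later by the OOM killer depends on the kernel's overcommit policy, exposed in /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = strict accounting). Below is a minimal sketch that reads it; read_overcommit_mode is a hypothetical helper name, and it simply returns -1 where that file doesn't exist.

```c
#include <stdio.h>

/* Hypothetical helper: returns the Linux overcommit policy
   (0 = heuristic, 1 = always, 2 = strict), or -1 if it can't be read. */
int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (f == NULL)
        return -1;              /* not Linux, or /proc not mounted */
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;              /* unexpected file contents */
    fclose(f);
    return mode;
}
```

Even in the overcommitting modes, a 32-bit process that has run out of virtual address space gets NULL from malloc, because there is nowhere left to map the memory.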
Actually... if you turn on grsecurity's "segmentation-based PaX" code to prevent data areas (stack, heap) from running code, that will cut the user-usable address space in half. Segmentation-based PaX is faster than paging-based PaX, but it gives you less memory (the classic tradeoff of speed vs. space). Anyway, if you run grsecurity, that might be part of it.
madcompnerd
04-28-2005, 09:03 PM
Not necessarily; malloc returns NULL if it fails. It actually turns out you can get 3GB, which makes sense given the 3GB user address space (but of course someone wants 4GB). I think what happened was that there weren't enough large free regions to get past 2GB (I was allocating 1GB at a time).
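The experiment described above can be sketched as a probe that allocates fixed-size chunks until malloc gives up. This is a hypothetical helper (probe_alloc and MAX_CHUNKS are made-up names), and since it never touches the memory, on an overcommitting kernel it measures how much address space malloc will hand out rather than how much RAM backs it.

```c
#include <stdlib.h>

#define MAX_CHUNKS 4096  /* upper bound on how many chunks we track */

/* Hypothetical probe: allocate `chunk`-byte blocks until malloc returns
   NULL or `limit` bytes have been obtained, then free everything and
   report how many bytes were obtained in total. */
size_t probe_alloc(size_t chunk, size_t limit)
{
    void *blocks[MAX_CHUNKS];
    size_t count = 0, total = 0;

    if (chunk == 0)
        return 0;
    /* total never exceeds limit, so limit - total cannot underflow */
    while (count < MAX_CHUNKS && limit - total >= chunk) {
        void *p = malloc(chunk);
        if (p == NULL)
            break;              /* no free region of `chunk` bytes left */
        blocks[count++] = p;
        total += chunk;
    }
    while (count > 0)
        free(blocks[--count]);
    return total;
}
```

On the 32-bit box, probe_alloc(1UL << 30, SIZE_MAX) would repeat the 1GB-at-a-time test. A smaller chunk size can also use gaps in the address space that are too small for a 1GB block, so comparing the two results is one way to see whether fragmentation is what's limiting you.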
64GB uses PAE, I believe. PAE is slow because it requires pointers to pointers in memory to be able to see those values (since the internal int is actually smaller than the address). It's supposed to incur a 3-6% performance loss. But if you *need* that much, then you *need* that much.
Incidentally, highmem is needed for more than 1GB of RAM, and highmem is expensive as it is. I have yet to figure out why it's needed at 1GB, but oh well.
bwkaz
04-29-2005, 10:43 AM
Originally posted by madcompnerd
"Not necessarily, malloc returns NULL if it fails."
In general, yes, you have to check the return value from malloc if you want your programs to be portable.
But I think I remember hearing that Linux's malloc won't ever return NULL, the kernel will just kill a process if you run out of RAM. Not sure on that though.
"I think what had happened was that there wasn't enough large spaces to allocate 2GB (I was doing it by 1GB at a time)."
That may be -- malloc (or actually, brk(), the system call) may require the allocated addresses to be contiguous.
"64GB uses PAE I believe."
Yep, that's how I understand it.
"PAE is slow because it requires pointers to pointers in memory to be able to see those values (since the internal int is actually smaller than the address)."
Maybe, but I don't think so. I think the reason is the page tables.
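One way to check the contiguity theory is to look at the process's own memory map. A minimal sketch, assuming Linux's /proc/<pid>/maps format (one mapping per line, starting with a hex start-end range, sorted by address); largest_gap is a hypothetical helper name. glibc's malloc normally satisfies large requests with mmap() rather than brk(), but either way a single block needs one contiguous stretch of free virtual addresses, so the largest gap caps the biggest malloc that can succeed.

```c
#include <stdio.h>

/* Hypothetical helper: scan a /proc/<pid>/maps file and return the size of
   the largest unmapped gap below `top` (e.g. 0xC0000000 for a 3G/1G split).
   Returns 0 if the file can't be opened. */
unsigned long largest_gap(const char *maps_path, unsigned long top)
{
    FILE *f = fopen(maps_path, "r");
    unsigned long start, end, prev_end = 0, best = 0;

    if (f == NULL)
        return 0;
    /* Each line starts with "start-end" in hex; skip the rest of the line. */
    while (fscanf(f, "%lx-%lx%*[^\n]", &start, &end) == 2) {
        if (prev_end < top && start > prev_end) {
            unsigned long gap_end = start < top ? start : top;
            if (gap_end - prev_end > best)
                best = gap_end - prev_end;
        }
        if (end > prev_end)
            prev_end = end;
    }
    if (prev_end < top && top - prev_end > best)
        best = top - prev_end;      /* free space above the last mapping */
    fclose(f);
    return best;
}
```

On a 3G/1G kernel, calling largest_gap("/proc/self/maps", 0xC0000000UL) from inside the test program shows roughly the biggest single block that could still fit.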
Page table entries are 32 bits wide (IIRC), but they can't map to arbitrary physical addresses. Some bits out of those 32 are various flags (present/not-present, writable/not-writable, etc.), some are reserved, and some (in a nutshell) map the page to a physical address. (There are actually two or three levels of page tables, each taking a certain part of the linear address and either using it to select an entry in the next table down, or using it to get at the actual physical address.)
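For reference, the non-PAE format described in Intel's IA-32 manuals matches that outline: the low 12 bits of a 32-bit entry hold flags and OS-available bits (bit 0 present, bit 1 writable, bit 2 user-accessible, among others) and bits 31:12 hold the physical frame. A minimal decoding sketch; the macro and function names are illustrative, and only those three flags are decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a 32-bit (non-PAE) IA-32 page-table entry for a 4K page. */
#define PTE_PRESENT   (1u << 0)   /* page is in memory */
#define PTE_WRITABLE  (1u << 1)   /* writes allowed */
#define PTE_USER      (1u << 2)   /* accessible from user mode (ring 3) */
#define PTE_FRAME     0xFFFFF000u /* bits 31:12: physical page frame */

bool pte_present(uint32_t pte)  { return (pte & PTE_PRESENT) != 0; }
bool pte_writable(uint32_t pte) { return (pte & PTE_WRITABLE) != 0; }
bool pte_user(uint32_t pte)     { return (pte & PTE_USER) != 0; }

/* Physical address of the 4K frame the entry points at. */
uint32_t pte_frame(uint32_t pte) { return pte & PTE_FRAME; }
```

The 20-bit frame field is where the "20 bits (4K pages)" figure in the next paragraph comes from.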
Without PAE, the processor uses either 4K or 4M pages (which means that if you view the page-table as one long contiguous table, then for 4G, the part of the PTE that refers to the physical address is either 20 bits (4K pages) or 10 bits (4M pages)).
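A worked version of the 4K-page case: a 32-bit linear address splits into a 10-bit directory index, a 10-bit table index and a 12-bit offset, and 1024 × 1024 × 4096 bytes = 2^32 bytes, exactly 4G. The struct and function names below are illustrative.

```c
#include <stdint.h>

/* Non-PAE IA-32 paging with 4K pages splits a 32-bit linear address
   into 10 + 10 + 12 bits. */
typedef struct {
    unsigned dir;     /* bits 31:22, index into the page directory */
    unsigned table;   /* bits 21:12, index into the page table */
    unsigned offset;  /* bits 11:0, byte offset within the 4K page */
} split4k;

split4k split_linear_4k(uint32_t lin)
{
    split4k s = { lin >> 22, (lin >> 12) & 0x3FFu, lin & 0xFFFu };
    return s;
}

/* With 4M pages the page-table level is skipped: 10 + 22 bits. */
uint32_t offset_in_4m_page(uint32_t lin) { return lin & 0x3FFFFFu; }
```

For example, 0xC0001234 lands in directory entry 768, table entry 1, at offset 0x234.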
With PAE, I believe the processor has to use 4M pages, but the 10 bits of physical address in the PTE get another 4 tacked onto them somehow. (Maybe out of the extra 10 that would be used in 4K-page mode?) These extra 4 bits make it so that the processor can address 64G of memory total (though only in larger pages). (Again, this is if you view the page table as one long contiguous table -- it's actually still multi-level.)
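For comparison, Intel's manuals describe PAE a little differently from the recollection above: entries grow to 64 bits, 4K pages are still supported (large pages become 2M instead of 4M), and a 32-bit linear address is split 2 + 9 + 9 + 12 across a four-entry page-directory-pointer table, a page directory and a page table. The sketch below follows that layout with illustrative names. The wider physical address (36 bits on the early PAE CPUs, hence 64G) comes from the extra bits in the 64-bit entries, not from the linear address.

```c
#include <stdint.h>

/* PAE with 4K pages splits a 32-bit linear address into 2 + 9 + 9 + 12 bits. */
typedef struct {
    unsigned pdpt;    /* bits 31:30, one of 4 page-directory-pointer entries */
    unsigned dir;     /* bits 29:21, index into a 512-entry page directory */
    unsigned table;   /* bits 20:12, index into a 512-entry page table */
    unsigned offset;  /* bits 11:0, byte offset within the 4K page */
} split_pae;

split_pae split_linear_pae(uint32_t lin)
{
    split_pae s = { lin >> 30, (lin >> 21) & 0x1FFu,
                    (lin >> 12) & 0x1FFu, lin & 0xFFFu };
    return s;
}

/* With 2M pages the page-table level is skipped: 2 + 9 + 21 bits. */
uint32_t offset_in_2m_page(uint32_t lin) { return lin & 0x1FFFFFu; }
```

Since 4 × 512 × 512 × 4096 = 2^32, the linear space is still 4G: PAE widens what a process's pages can point at, not how much a single address space can cover.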
I may be remembering wrong, too -- it's been a while since I last read the Pentium Pro development guide PDF on Intel's site that explained all this.
If it is true, then in order to access more than 4G at a time, your process would have to use several different segment registers (to map to different page-table entries, to map to different physical address ranges). The Linux ABI has no way of doing that -- all data accesses happen with respect to %ds, and all stack accesses happen with respect to %ss -- so there's no inherent support for >4G-per-process memory, I don't think.
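One detail that supports that conclusion: on IA-32, segmentation is applied before paging and always produces a 32-bit linear address (base + offset, wrapping at 32 bits), so even juggling several segment registers cannot reach more than 4G of linear space at once. Linux uses flat segments with base 0, so the linear address is just the offset. A minimal sketch of that arithmetic, ignoring limit and privilege checks; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical illustration of IA-32 segment translation (limit and
   privilege checks omitted): the result is always a 32-bit linear
   address, so it wraps around at 4G no matter which segment is used. */
uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;   /* unsigned arithmetic wraps modulo 2^32 */
}
```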
"Incidentally, highmem is needed for more than 1GB of RAM, which is expensive as it is. I have yet to figure out why it's needed at 1GB but oh well."
It's because of the way the kernel maps memory. It has to map the memory into its own address space before it can use it for user programs, I think, and by default the kernel's address space is the upper 1G of the 4G virtual address space.
Turning on HIGHMEM_4G changes that somehow.
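The usual explanation of that "somehow", assuming the default i386 3G/1G split: the kernel maps physical RAM linearly starting at virtual address 0xC0000000, but it keeps roughly the top 128MB of its 1G window for vmalloc and other dynamic mappings, so only about 896MB of RAM can be mapped permanently. Anything above that is high memory, which the kernel maps temporarily when it needs to touch it; that is why HIGHMEM is needed just under 1G. The sketch below models only the permanent mapping, and its constant and function names are illustrative, not the kernel's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants for the default i386 3G/1G split. */
#define KERNEL_BASE     0xC0000000u      /* start of the kernel's 1G window */
#define VMALLOC_RESERVE (128u << 20)     /* kept back for dynamic mappings */
#define LOWMEM_LIMIT    (0x40000000u - VMALLOC_RESERVE)  /* 896MB of RAM */

/* Translate a physical address to its permanent kernel virtual address.
   Returns false for high memory, which has no permanent mapping. */
bool lowmem_virt(uint32_t phys, uint32_t *virt)
{
    if (phys >= LOWMEM_LIMIT)
        return false;
    *virt = KERNEL_BASE + phys;
    return true;
}
```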
madcompnerd
04-29-2005, 10:51 AM
I've seen it return NULL; that's what it does. Dereferencing that NULL will typically kill your program (and depending on the program, that may be the behavior you want). Obviously, if you planned to allocate large amounts of memory, you'd check it.
I don't know what else it could return if it couldn't allocate the RAM... If it returns any other number you have real problems.
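The checking described above can be wrapped once rather than repeated at every call site. A minimal sketch; checked_alloc is a hypothetical name, and reporting to stderr while returning NULL (rather than aborting) is just one possible policy. On glibc, a request that can never be satisfied, such as SIZE_MAX bytes, takes the error path with errno set to ENOMEM.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: try to allocate `size` bytes and report a failure
   immediately, instead of letting a later NULL dereference crash the
   program somewhere far away from the real cause. */
void *checked_alloc(size_t size)
{
    void *p;

    errno = 0;
    p = malloc(size);
    if (p == NULL && size != 0)
        fprintf(stderr, "malloc(%zu) failed: %s\n", size,
                errno != 0 ? strerror(errno) : "unknown error");
    return p;
}
```

The caller still decides what to do with the NULL; the wrapper only guarantees the failure is noticed where it happened.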
I believe what I read about PAE was referring to the final step of handing the address to the address unit, not to internal kernel workings. Obviously, with 32-bit registers you can't store a 36-bit address in the general-purpose registers.
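That point can be made concrete. Assuming the original 36-bit PAE physical address width (later CPUs report more bits), a PAE page-table entry is 64 bits wide and bits 35:12 hold the physical frame, so the resulting physical address needs a 64-bit type; squeezing it into a 32-bit register silently drops the top 4 bits. 2^36 bytes is where the 64G figure comes from. The macro and function names below are illustrative.

```c
#include <stdint.h>

/* Assumed physical address width: 36 bits, as on the first PAE CPUs. */
#define PAE_PHYS_BITS  36u
#define PAE_FRAME_MASK (((UINT64_C(1) << PAE_PHYS_BITS) - 1) & ~UINT64_C(0xFFF))

/* Physical address of the 4K page a 64-bit PAE PTE points at
   (flag bits, including the NX bit 63, are masked off). */
uint64_t pae_pte_phys(uint64_t pte)
{
    return pte & PAE_FRAME_MASK;
}

/* Total physical address space reachable with PAE_PHYS_BITS address bits. */
uint64_t pae_phys_space(void)
{
    return UINT64_C(1) << PAE_PHYS_BITS;
}
```

For an entry pointing at physical 0xF00001000, truncating to 32 bits leaves only 0x00001000, a completely different page.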
http://www.spack.org/wiki/LinuxRamLimits