Why can a 352GB NumPy ndarray be used on an 8GB memory macOS computer?

import numpy as np



array = np.zeros((210000, 210000)) # default numpy.float64

array.nbytes

When I run the above code on my MacBook with 8GB of memory (macOS), no error occurs. But running the same code on a Windows 10 PC with 16GB of memory, an Ubuntu laptop with 12GB, or even a Linux supercomputer with 128GB raises a MemoryError. All the test environments have 64-bit Python 3.6 or 3.7 installed.

edited 2 hours ago

Boann

37.1k1290121

asked 8 hours ago

Blaise Wang

798


macOS extends memory with virtual memory on your disk. Check your process details with Activity Monitor and you'll find a Virtual Memory: 332.71 GB entry. But it's all zeros, so it compresses really, really well.

– Martijn Pieters♦
7 hours ago

@MartijnPieters But Windows 10 and Linux also have similar mechanisms: Windows 10 has virtual memory and Linux has swap. Activity Monitor doesn't show a 332.71 GB VM entry for me. I used sysctl vm.swapusage to see the real VM usage and got 1200 M.

– Blaise Wang
7 hours ago

But they don't compress.

– Martijn Pieters♦
7 hours ago


@MartijnPieters The problem is that Windows 10 has supported RAM compression since build 10525, but it still cannot run the above code.

– Blaise Wang
4 hours ago


python macos numpy memory


2 Answers

@Martijn Pieters' answer is on the right track, but not quite right: this has nothing to do with memory compression, but instead it has to do with virtual memory.

For example, try running the following code on your machine:

import numpy as np

# 10,000 zero-filled float64 arrays of about 3.5 GB each
arrays = [np.zeros((21000, 21000)) for _ in range(0, 10000)]

This code allocates 32TiB of memory, but you won't get an error (at least I didn't, on Linux). If I check htop, I see the following:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command

31362 user       20   0 32.1T 69216 12712 S  0.0  0.4  0:00.22 python
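The 32 TiB figure, and the 352GB in the question title, can be checked with plain arithmetic and without allocating anything. This is only a sketch: zeros_nbytes is a hypothetical helper name, and it assumes the default float64 item size of 8 bytes.

```python
def zeros_nbytes(shape, itemsize=8):
    """Bytes an array of this shape would need; itemsize=8 matches float64."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize

single = zeros_nbytes((21000, 21000))      # one array from the list above
total = single * 10000                     # the whole list of 10,000 arrays
question = zeros_nbytes((210000, 210000))  # the single array from the question

print(f"{total / 2**40:.1f} TiB")                                # 32.1 TiB
print(f"{question / 10**9:.1f} GB, {question / 2**30:.1f} GiB")  # 352.8 GB, 328.6 GiB
```

So the list of small arrays asks for roughly a hundred times more address space than the single array in the question.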

This is because the OS is perfectly willing to overcommit virtual memory. It won't actually assign pages to physical memory until it needs to. The way it works is:

1. calloc asks the OS for some memory to use.

2. The OS looks in the process's page tables and finds a chunk of memory that it's willing to assign. This is a fast operation; the OS just stores the memory address range in an internal data structure.

3. The program writes to one of the addresses.

4. The OS receives a page fault, at which point it actually assigns the page to physical memory. A page is usually a few KiB in size.

5. The OS passes control back to the program, which proceeds without noticing the interruption.
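If you want to poke at this sequence without NumPy, here is a minimal sketch using an anonymous mapping from the standard mmap module. The 64 MiB size and the touch_pages helper are my own choices for illustration; NumPy itself goes through calloc rather than calling mmap like this.

```python
import mmap

PAGE = mmap.PAGESIZE        # page size reported by the OS
SIZE = 64 * 1024 * 1024     # 64 MiB, an arbitrary size for this sketch

def touch_pages(buf, n_pages, page_size=PAGE):
    """Write one byte into each of the first n_pages pages of buf."""
    for i in range(n_pages):
        buf[i * page_size] = 1   # the first write to a page triggers a page fault
    return n_pages

# Reserve address space; nothing has been written to it yet.
buf = mmap.mmap(-1, SIZE)

# Writing makes the OS back those pages, transparently to the program.
touched = touch_pages(buf, 16)

print(buf[0], buf[16 * PAGE])   # 1 0: a touched page holds 1, an untouched one reads as zero
```

Watching the process in htop or Activity Monitor while raising the number of touched pages should show resident memory growing while the virtual size stays the same.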

I have no idea why creating a single huge array doesn't work on Linux or Windows, but I'd expect it to have more to do with the platform's libc's calloc() implementation and the limits imposed there than the operating system.

For fun, try running arrays = [np.ones((21000, 21000)) for _ in range(0, 10000)]. You'll definitely get an out-of-memory error, even on macOS or Linux with swap compression. Yes, certain OSes can compress RAM, but they can't compress it enough to keep you from running out of memory.

edited 21 mins ago

answered 1 hour ago

user60561

9201824

I tried your first example, and Linux did indeed allocate 32T of virtual memory on a server with 128GB of memory. However, my example array = np.zeros((210000, 210000)) still raised a MemoryError. My example only needs 352GB of virtual memory, which seems more reasonable than 32T.

– Blaise Wang
44 mins ago

@BlaiseWang Right, I addressed that in my answer: "I have no idea why creating a single huge array doesn't work on Linux or Windows, but I'd expect it to have more to do with the platform's implementation of libc and the limits imposed there than the operating system." If you'd really like to know why, I'd suggest you review the code in code.woboq.org/userspace/glibc/malloc/malloc.c.html (I can't be bothered to do so).

– user60561
22 mins ago


You are most likely using Mac OS X Mavericks or newer, so 10.9 or up. From that version onwards, macOS uses virtual memory compression: memory requirements that exceed your physical memory are not only redirected to memory pages on disk, but those pages are also compressed to save space.

For your ndarray, you may have requested ~332GB of memory, but it's all a contiguous sequence of NUL bytes at the moment, and that compresses really, really well:

Memory stats from the Activity Monitor, showing a virtual memory size of 332.71 GB but Real Memory Size stat of 9.3 MB

That's a screenshot from the Activity Monitor tool, with the process details of my Python process where I replicated your test (use the (I) icon on the toolbar to open it); this is from the Memory tab, where you can see that the Real Memory Size column is only 9.3 MB used, against a Virtual Memory Size of 332.71GB.

Once you start setting other values for those indices, you'll quickly see the memory stats increase to gigabytes instead of megabytes:

while True:

    index = tuple(np.random.randint(array.shape[0], size=2))

    array[index] = np.random.uniform(-10 ** -307, 10 ** 307)

or you can push the limit further by assigning to every index (in batches, so you can watch the memory grow):

array = array.reshape((-1,))

for i in range(0, array.shape[0], 10**5):

    array[i:i + 10**5] = np.random.uniform(-10 ** -307, 10 ** 307, 10**5)

The process is eventually terminated; my MacBook Pro doesn't have enough swap space to store hard-to-compress gigabytes of random data:

>>> array = array.reshape((-1,))

>>> for i in range(0, array.shape[0], 10**5):

...     array[i:i + 10**5] = np.random.uniform(-10 ** -307, 10 ** 307, 10**5)

...

Killed: 9
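If you'd rather watch the memory grow without getting the process killed, a capped variant of the same loop is sketched below. It assumes NumPy is installed; fill_with_budget, its max_bytes parameter, and the small stand-in array are hypothetical names and values for illustration, and the random range is simplified to (-1, 1).

```python
import numpy as np

def fill_with_budget(array, batch=10**5, max_bytes=64 * 2**20):
    """Fill a 1-D array with random data, stopping once max_bytes are written."""
    written = 0
    for start in range(0, array.shape[0], batch):
        if written >= max_bytes:
            break
        chunk = array[start:start + batch]   # a view into array, not a copy
        chunk[:] = np.random.uniform(-1.0, 1.0, chunk.shape[0])
        written += chunk.nbytes
    return written

small = np.zeros(10**6)   # 8 MB stand-in for the huge array
written = fill_with_budget(small, max_bytes=2 * 2**20)
print(written, small.nbytes)   # 2400000 8000000
```

With the real array you would pass array.reshape((-1,)) and a budget comfortably below your free RAM; everything past the budget stays untouched zero pages.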

You could argue that macOS is being too trusting, letting programs request that much memory without bounds, but with memory compression, memory limits are much more fluid. Your np.zeros() array does fit your system, after all. Even though you probably don't have the swap space to store the uncompressed data, compressed it all fits fine, so macOS allows the allocation and only terminates processes that later take advantage of that generosity.

If you don't want this to happen, use resource.setrlimit() to set a limit on RLIMIT_STACK of, say, 2 ** 14, at which point the OS will segfault Python when it exceeds the limit.
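Before changing any limit, it may help to see what the current ones are. The sketch below only reads limits with resource.getrlimit() and changes nothing. It assumes a POSIX system (the resource module is not available on Windows); current_limits is a hypothetical helper, and RLIMIT_AS (the address-space limit) is my own addition for comparison, not something named above.

```python
import resource

def current_limits(names=("RLIMIT_STACK", "RLIMIT_AS")):
    """Return {name: (soft, hard)} for each limit this platform defines."""
    limits = {}
    for name in names:
        which = getattr(resource, name, None)   # not every platform defines every limit
        if which is not None:
            limits[name] = resource.getrlimit(which)
    return limits

for name, (soft, hard) in current_limits().items():
    unlimited = resource.RLIM_INFINITY
    print(name,
          "soft:", "unlimited" if soft == unlimited else soft,
          "hard:", "unlimited" if hard == unlimited else hard)
```

resource.setrlimit() takes the same (soft, hard) pair, so whatever you read here is also what you would pass back to restore the previous soft limit.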

edited 3 hours ago

answered 7 hours ago

Martijn Pieters♦

715k13825002313

Memory compression should only matter after allocation has already succeeded. The problem here is probably either memory limits (ulimits on Linux, for example) or, more likely, that the allocator can't find a 300GB chunk. If you split the array up into 100 3GB pieces, it would probably work on Windows or Linux (with big enough swap) as well.

– inf
7 hours ago

@inf: I don't have 300GB free on my SSD. I do run out of memory when I start filling the array, randomly.

– Martijn Pieters♦
7 hours ago

Define "run out of memory": do you get a MemoryError, or do you just start filling RAM, swapping, and get OOM-killed?

– inf
6 hours ago

@inf: I'm a little reluctant to actually let it run. As the memory has been allocated by the OS (tracemalloc confirms Python has been given the memory allocation), there won't be a MemoryError, so it'll start swapping and eventually get OOM-killed. But before that point this laptop will be hard to use for a while, as everything else is swapped out first.

– Martijn Pieters♦
6 hours ago

I understand :) But that's what I mean. The allocation doesn't even succeed on Ubuntu and Linux, hence the MemoryError.

– inf
5 hours ago
