Here’s another way to crash MATLAB!
Now that we know MATLAB’s memory manager has some problems with actually cleaning up too many objects at once, we can have some fun with it!
function pushdown(N, crash)
%an N of more than about 1000000 with crash=1 is interesting.
a = {};
for i = 1:N
a = {rand() a};
end
if ~exist('crash', 'var') || ~crash
%feed it to matlab's deallocator one piece at a time, or it will choke.
for i = 1:N
a = a{2};
end
end
end
First try running:
>> pushdown(1000000)
Then try:
>> pushdown(1000000, 1)
What do you get? I get a hard crash.
So not only can you blow the interpreter stack with deallocation, you can blow the C stack as well!
I mean, a system can get by for some purposes without garbage collection… but not when reference-counting is done this badly.
oh wow
You know how errors in destructors are transformed into warnings?
I was doing some comparisons various strategies for filling out arrays in MATLAB. I looked at the docs and noticed an implementation of a doubly linked list, and thought, why not include that and see how badly it performs?
Just pull the code for dlnode out of your own MATLAB help files:
edit ([docroot '/techdoc/matlab_oop/examples/@dlnode/dlnode.m']);
then copy and paste it right into a file named dlnode.m on your matlab path, and then try this:
tail = dlnode(0); for i = 1:510 new = dlnode(i); insertAfter(new, tail); tail = new; end clear tail; clear new;
The output from this overflows the command window buffer so I won’t post the whole thing. Here’s an excerpt:
In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 Warning: The following error was caught while executing 'dlnode' class destructor: Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N) to change the limit. Be aware that exceeding your available stack space can crash MATLAB and/or your computer. > In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64 In dlnode>dlnode.delete at 64
….Yeah, not only is the error you get transformed into a warning, but what an error it was! Linked chains of handle objects pile up destructors on the stack and overflow if too large of a group of handle objects goes out of scope at once. And a goddamn stack overflow gets exception-funneled down into a warning!
Your computer has multiple gigabytes of memory. You can not make a linked list more than 500 items long without breaking MATLAB. Oh, wow.
Well, you can increase the recursion limit, but not far before you get hard crashes.
So how badly does the Mathworks’ own example of a linked list perform, anyway? Well, building a linked list N items long turns out to be between O(N^2) and O(N^3). Wowsers. (For the data-structures-impaired among you, it ought to be O(N).) And it’s factors of ten slower than the next slowest way of building up a list of indeterminate length.
These all — the absurd slowness of object property access, the recursive piling up of destructors on the stack, the exception-swallowing — these are all knock-on effects of the Mathworks’ insane notion that they can do automatic memory management without a garbage collector and guarantee deterministic destruction and support arbitrarily connected reference graphs. Well, a system built under these constraints can do at least one of three things: either it will have accomplished what in the history of computer science has never before been accomplished, or its time and space complexity is going to blow up, or it won’t actually work as advertised.
They might as well have assumed P=NP while they were at it.
UPDATE: It gets worse! 500 was the limit in R2010a, but as of R2011B, there’s twice as much recursion on finalization, so you can only make a list 250 items long!
No Lists
MATLAB is a language supposedly intended to help the user process data. Mind-bogglingly, it does not provide lists, which are pretty fundamental to processing data in the real world. What’s more, it does not even include the possibility of implementing a reasonable facsimile of a list.
If you’ve been indoctrinated into MATLAB, you’ve probably been told about the evils of resizing an array inside of a loop, and how you must avoid that by preallocating. Lately MATLAB presents a helpful compiler warning that tells you if you appear to be resizing an array in a loop. See, if you have an array X that is 10 elements long, and you put a value into X(11), MATLAB has to allocate an entirely new array, with 11 elements, then copy the original 10 elements into the new memory space, copy your new value in, then reclaim the memory taken by the original 10 element array. This takes a bit of time. What’s worse, the longer your list gets the more data it has to copy and the more time that takes — so if you keep adding things onto the end of your array, it transforms an O(n) loop into an O(n2).
So to avoid that massive penalty you can preallocate — make an array with zeros() or similar that’s large enough to hold all your results, before you go and compute your results. The problem with this aside from the extra lines of code (which decreases readability by taking up the reader’s mind with bookkeeping, as well as making a potential entry point for bugs if you mess up calculating the exact size you need) is that a lot of the time it’s just impossible to predict how many slots you need in the array before you actually go and collect the data that needs to go in the array.
One such situation is reading from a log file. When I run an experiment the end product is a text log file that details what happened in the experiment, in chronological order. Since I can’t know at the beginning of the experiment how many trials it has — some trials are thrown out for various reasons and the experiment might just terminate — I can’t preallocate the array to hold trial data when I read in the log file for analysis. Nor can I predict how many events (e.g. fixation breaks, cue onsets) a trial has before I read it in.
Additionally, there are a whole class of algorithms that rely on having a list that you can add or remove entries on the beginning or end of — breadth first graph traversal, for instance, and open up an algorithms text to a random page to see more — and it basically becomes impossible to implement those algorithms in MATLAB.
It’s almost natural to look at the way MATLAB handles adding an n+1th element to an n-element array and wonder, “gee, why recopy all that data? Why not just allocate one new place for entry 11, and add a link from the 10 elements to the eleventh element so that you just make your 10 element list a little longer?” If you could do that, you’d have a ‘list’, it’s such a natural thing to think of that almost every high level language you could consider using — except MATLAB — provides lists as a fundamental datatype. And low level languages like C provide you the tools to construct your own lists and other data structures. MATLAB is rather unique in allowing neither solution within the language.
Do other languages provide lists?
A better question might be, “do any languages in use today not provide lists or the opportunity to build them?”
However, since the two alternatives I’d like to push scientific computing users toward are R and Python, let’s see what they offer.
- Python offers a builtin list datatype, and appending operations in particular are much faster, on average, than MATLAB’s. Python also provides the collections library which offers several different implementations of lists fine tuned to particular tasks — of course, all are transparently usable with the same syntax as regular lists, and can be used wherever a list can be used. If you really want compact arrays with no overhead and don’t care about appending, Python also offers that as well as the multidimensional MATLAB-like arrays of the Numeric package.
- R‘s vector and list data types are implemented with a fixed array, like MATLAB’s, but R is really a LISP under the hood, so it definitely has the flexibility to implement data structures in user space. The standard library package Containers gives you some nice implementations.
Do any other languages used today not provide lists or let you create them?
I can’t think of a single language I’ve used, other than MATLAB, and maybe exercises in school where you write out longhand all the state transitions for a Turing machine, that doesn’t. The 1977 version of FORTRAN, maybe doesn’t. I suppose this tells you that MATLAB met its original late-seventies design goal of being easier to use than FORTRAN. It is now 2009.
What do you mean, I can’t implement a list? Doesn’t MATLAB have like objects ‘n’ stuff?
Give it a try, good luck! The first problem you find is that MATLAB does not support references or pointers (more on this in a later post) which are necessary to build your own list. Perhaps you will try MATLAB’s new ‘handle objects’ or notice that ‘function handles’ are essentially references to their local workspace variables. What you will find is that instantiating a whole damn object or closure for every reference you want to take is incredibly slow — and even if you manage to implement it, the way that MATLAB invokes methods on closures and objects does something very weird with memory management that guarantees you will not meet the o(1) append/delete that you wanted (more on that in a later post too.)
I suppose you could always make like a FORTRAN77 programmer, allocate a gobsmacking huge global array, and roll your own malloc into it. Heh.
What about JAVA, doesn’t that provide lists? And can’t you access it from MATLAB?
Can JAVA variables hold MATLAB data types? Or even handles to MATLAB data types? No. Not even just as an opaque object. (Someone please help this guy!) So you can pass simple numeric values and strings, but cells or function handles or objects (even “handle objects”) can’t be organized in Java containers. So this approach doesn’t get you very far.
What’s the best I can do if I insist on using MATLAB, yet I need to collect values in a non-preallocated way?
Something you can do in MATLAB is make a cell array of cell arrays of cell arrays, This lets you have an ersatz pushdown stack, the closest you can get to a sequence container:
n = 0;
stack = {};
function push(x)
stack = {x stack};
n = n + 1;
end
function x = pop()
x = stack{1};
stack = stack{2};
end
function out = readout()
out = cell(1, n);
for i = n:-1:1
out{i} = pop();
end
end
So you accumulate values with push() and when you’re done you read off an array with readout(). Note that if you try to encapsulate this technique to instantiate stacks using handle objects, or handles to nested functions, you’re in for a nasty surprise as you will not find your speed advantage — a later post to this blog will explore why. I have written a helper function accumulate.m that works around this problem via a persistent struct (with the consequence that you will leak memory if you don’t clean up your lists). Here it is:
function [push, readout] = accumulate(dim)
%function [push, readout] = accumulate(DIM)
%
%Peter Meilstrup, 2008
%
%Records a list of data (e.g. streaming eye position or other
%streaming input data) in memory without the speed penalty of reallocating
%arrays in a loop.
%
%Returns two function handles, 'push', and 'readout', you push data in
%using the pirst, then at the end of your accumulation read it out using
%the second. The accumulated data is concatenated along the dimension DIM.
%
% Example:
%
% >> [push, readout] = linkedlist(2)
% push =
% @linkedlist/doPush
% readout =
% @linkedlist/doReadout
% >> push([1;2]);
% >> push([3 4;5 6]);
% >> push([0; 0]);
% >> readout()
% ans =
% 1 3 4 0
% 2 5 6 0
persistent lists;
persistent counter;
if isempty(lists)
lists = struct();
counter = 0;
end
if ~exist('dim', 'var')
dim = 1;
end
elements = 0;
name = '';
push = @doPush;
readout = @doReadout;
function doPush(what)
if isempty(name)
name = ['t' num2str(counter)];
counter = counter +1;
lists.(name) = {};
elements = 0;
end
lists.(name) = {lists.(name) what};
elements = elements + 1;
end
function out = doReadout()
if ~isempty(name)
l = lists.(name);
lists = rmfield(lists, name);
name = '';
else
l = {};
end
if ~isempty(l)
out = cell(1, elements);
for i = elements:-1:1
out{i} = l{2};
l = l{1};
end
out = cat(dim, out{:});
else
out = [];
end
end
end
ADDENDA: