You know how errors in destructors are transformed into warnings?
I was doing some comparisons of various strategies for filling out arrays in MATLAB. I looked at the docs and noticed an implementation of a doubly linked list, and thought, why not include that and see how badly it performs?
Just pull the code for dlnode out of your own MATLAB help files:
edit ([docroot '/techdoc/matlab_oop/examples/@dlnode/dlnode.m']);
Then copy and paste it right into a file named dlnode.m on your MATLAB path, and try this:
tail = dlnode(0);
for i = 1:510
    new = dlnode(i);
    insertAfter(new, tail);
    tail = new;
end
clear tail;
clear new;
The output from this overflows the command window buffer so I won’t post the whole thing. Here’s an excerpt:
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
Warning: The following error was caught while executing 'dlnode' class
destructor:
Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit. Be aware that exceeding your available stack space can
crash MATLAB and/or your computer.
> In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
In dlnode>dlnode.delete at 64
...Yeah, not only is the error you get transformed into a warning, but what an error it was! Linked chains of handle objects pile up destructors on the stack and overflow if too large a group of handle objects goes out of scope at once. And a goddamn stack overflow gets exception-funneled down into a warning!
Your computer has multiple gigabytes of memory. You cannot make a linked list more than 500 items long without breaking MATLAB. Oh, wow.
Well, you can increase the recursion limit, but not far before you get hard crashes.
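Why should a chain of objects need a stack frame per element? If tearing down node k means first tearing down node k+1, stack depth grows with list length. Below is a minimal Python sketch of that failure mode and of the usual fix (unlink the nodes in a loop). `Node`, `build_list`, and both teardown functions are hypothetical illustrations, not MathWorks' `dlnode` or MATLAB's actual destructor machinery; Python's recursion limit stands in for MATLAB's `RecursionLimit`.

```python
import sys

class Node:
    """Minimal doubly linked list node (hypothetical, not MathWorks' dlnode)."""
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

def build_list(n):
    """Build a list holding 0..n and return its head."""
    head = tail = Node(0)
    for i in range(1, n + 1):
        new = Node(i)
        tail.next = new
        new.prev = tail
        tail = new
    return head

def teardown_recursive(node):
    """Release the rest of the chain before this node: one stack frame per node."""
    if node is None:
        return 0
    released = teardown_recursive(node.next)
    node.prev = node.next = None
    return released + 1

def teardown_iterative(node):
    """Unlink nodes one at a time in a loop: constant stack depth."""
    released = 0
    while node is not None:
        nxt = node.next
        node.prev = node.next = None
        node = nxt
        released += 1
    return released

n = sys.getrecursionlimit() + 10
try:
    teardown_recursive(build_list(n))
except RecursionError:
    print("recursive teardown hit the recursion limit")
print("iterative teardown released", teardown_iterative(build_list(n)), "nodes")
```

Raising the limit only moves the cliff; the loop version never needs more than one frame, whatever the list length.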
So how badly does the Mathworks’ own example of a linked list perform, anyway? Well, building a linked list N items long turns out to be between O(N^2) and O(N^3). Wowsers. (For the data-structures-impaired among you, it ought to be O(N).) And it’s factors of ten slower than the next slowest way of building up a list of indeterminate length.
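The O(N) expectation rests on each append costing amortized constant time. A quick Python sketch that counts element copies makes the difference concrete. Reallocating an exactly-sized buffer on every append costs N(N-1)/2 copies in total. Doubling the capacity whenever the buffer is full costs fewer than 2N. This is a generic model of growable arrays, not a measurement or description of what MATLAB does internally.

```python
def copies_grow_by_one(n):
    """Element copies when the buffer is reallocated to fit exactly on every append."""
    copies = 0
    size = 0
    for _ in range(n):
        copies += size  # copy existing elements into a buffer one slot larger
        size += 1
    return copies

def copies_doubling(n):
    """Element copies when capacity doubles whenever the buffer is full."""
    copies = 0
    size, capacity = 0, 1
    for _ in range(n):
        if size == capacity:
            copies += size  # copy everything into a buffer twice as large
            capacity *= 2
        size += 1
    return copies

for n in (1_000, 10_000, 100_000):
    print(n, copies_grow_by_one(n), copies_doubling(n))
```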
All of these (the absurd slowness of object property access, the recursive piling up of destructors on the stack, the exception-swallowing) are knock-on effects of the Mathworks’ insane notion that they can do automatic memory management without a garbage collector, guarantee deterministic destruction, and support arbitrarily connected reference graphs. Well, a system built under these constraints must do at least one of three things: accomplish what has never before been accomplished in the history of computer science, let its time and space complexity blow up, or fail to work as advertised.
They might as well have assumed P=NP while they were at it.
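CPython runs into the same trade-off and resolves it by giving up one of the three guarantees. Reference counting destroys acyclic garbage immediately. Objects in a reference cycle (and every doubly linked list is one) wait for the separate cycle collector, which runs at times you don't control. The sketch below switches off the automatic collector to make that visible. The `Resource` class is a hypothetical illustration, and the immediate-finalization behavior shown is specific to CPython.

```python
import gc

finalized = []

class Resource:
    """Records its name when finalized (hypothetical illustration class)."""
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        finalized.append(self.name)

gc.disable()  # leave only reference counting running

# Acyclic: the last reference disappears, so finalization happens right away.
r = Resource("acyclic")
del r
after_acyclic = list(finalized)

# Cyclic: each object keeps the other's reference count above zero.
a, b = Resource("a"), Resource("b")
a.peer, b.peer = b, a
del a, b
after_cycle_del = list(finalized)   # the cycle has not been finalized yet

gc.collect()                        # the tracing cycle collector finds the pair
after_collect = sorted(finalized)

gc.enable()
print(after_acyclic, after_cycle_del, after_collect)
```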
UPDATE: It gets worse! 500 was the limit in R2010a, but as of R2011b, there’s twice as much recursion on finalization, so you can only make a list 250 items long!
Errors that happen during onCleanup are transformed into warnings? Really?
It doesn’t help that cleanup functions also can’t be closures: they can’t respond to data about a resource that was gathered while the program ran. But first things first.
Hey: if an error happens in my code, that is an error. My program should not continue unless it specifically handles that error. If an error happens while my program is trying to clean up after itself, that means something is wrong and MATLAB should not force my program to blithely continue and wreak further havoc.
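For contrast, here is a minimal Python 3 sketch of that expectation, with hypothetical `use_resource` and `release` functions. An error raised during cleanup propagates to the caller. If the body had already failed, the original exception is not silently dropped: Python attaches it to the new one as `__context__`.

```python
def release(fail):
    """Hypothetical cleanup step that may itself fail."""
    if fail:
        raise OSError("release failed")

def use_resource(body_fails, release_fails):
    try:
        if body_fails:
            raise ValueError("work failed")
    finally:
        release(release_fails)  # runs on success and on failure

try:
    use_resource(body_fails=True, release_fails=True)
except OSError as exc:
    print("cleanup error propagated:", exc)
    print("original error kept as context:", repr(exc.__context__))
```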
I’ve restrained myself from picking too hard on The Mathworks’ decision to promise deterministic destruction for closures and objects, even though it has unacceptable performance penalties. The reason for the restraint is that I can see the argument for object lifecycle management: when your objects correspond to exclusive resources you hold, you do want to have control and guarantees over when they get released.
Well, I just looked into onCleanup and as usual, the Mathworks fucked it up: You cannot write robust programs using onCleanup, because exceptions during cleanup are swallowed.
So I’m going to have to start kicking at the thirty-misfeature pileup where MATLAB’s memory management meets its error handling, after all.
There are a number of languages that offer both automatic memory management and exceptions. A few of them are Python, R, Java, and MATLAB. All of these except MATLAB handle resource cleanup easily with a try/finally construct (in R, the finally argument to tryCatch), which MATLAB lacks, and most also offer some extra sugar in the form of a try-with-resources, which MATLAB tries to do with deterministic destructors, and fails.
One of these things is not like the others:
Python try/finally
“If finally is present, it specifies a ‘cleanup’ handler…. If there is a saved exception, it is re-raised at the end of the finally clause. If the finally clause raises another exception or executes a return or break statement, the saved exception is lost.”
Python with
“That way, if the caller needs to tell whether the __exit__() invocation *failed* (as opposed to successfully cleaning up before propagating the original error), it can do so.”
Java try/finally
“If a finally clause is executed because of abrupt completion of a try block and the finally clause itself completes abruptly, then the reason for the abrupt completion of the try block is discarded and the new reason for abrupt completion is propagated from there.”
Java try-with-resources
“If exceptions are thrown from both the try block and the try-with-resources statement, then the method readFirstLineFromFile throws the exception thrown from the try block; the exception thrown from the try-with-resources block is suppressed. In Java SE 7 and later, you can retrieve suppressed exceptions”
R tryCatch
“The finally expression is then evaluated in the context in which tryCatch was called; that is, the handlers supplied to the current tryCatch call are not active when the finally expression is evaluated.”
MATLAB delete
“A delete method should not generate errors”
One of these things is not like the others, and the one that fucked up is of course MATLAB.
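Here is a sketch of the two properties these posts are asking for, using Python's standard `contextlib.ExitStack`. The `run` function and its `fail_cleanup` flag are hypothetical. The cleanup callback is an ordinary closure, so at exit it sees data gathered while the block ran. If the cleanup itself raises, the error reaches the caller instead of being downgraded to a warning.

```python
from contextlib import ExitStack

def run(items, log, fail_cleanup=False):
    written = []  # data gathered while the block runs

    def cleanup():
        # Closes over `written`, so it sees what was appended during the
        # block, not a snapshot taken when the callback was registered.
        log.append(f"cleanup saw {written}")
        if fail_cleanup:
            raise RuntimeError("cleanup failed")

    with ExitStack() as stack:
        stack.callback(cleanup)
        for item in items:
            written.append(item)
    return written

log = []
run(["a", "b"], log)
print(log)
```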
MATLAB went decades without having any mechanism resembling function namespaces. If you downloaded code from two authors for use in a single project, and both toolboxes defined a function of the same name, well, you were in for a headache.
Case in point: the Psychtoolbox provides a function called ‘RandSample.’ The statistics toolbox, on the other hand, provides its own, somewhat different function named ‘randsample.’ If you wrote some code using the Statistics toolbox “randsample”, and the person trying to use it had Psychtoolbox installed, or vice versa, you were in for problems. The code fails inside whichever ‘randsample’ you happen to have installed; if you are reasonably quick at deciding the problem is not with randsample itself, you jump up the stack trace a step and look at how randsample was called. (Which reminds me: MATLAB’s default behavior on an error is to print out the stack starting with the root... and if the stack is too deep, it chops off the top of the stack. Anyone who’s remotely sane needs to see the stack starting at the top and working downwards if there is to be any hope of debugging.)
Anyway, you look at this code you downloaded that calls ‘randsample,’ and you try to work out what it’s doing. You compare the code’s usage with the documentation, which you access perhaps by typing ‘help randsample.’ (Oh, but as we will discover below, there’s a delicious way MATLAB will screw you over if you try to read the doc for the function that is now failing.) Presuming you are reading the right documentation string, you try to work out what the calling code is trying to do. Only belatedly do you check ‘which randsample’ and discover that there are two of them (or not, in case the functions being confused are capitalized differently and you are trying to use bad old code written before case sensitivity: ‘which’ has no option to ignore case, which is only the tip of the iceberg…). You realize, finally, that the code you’re trying to run is calling the wrong ‘randsample.’ And then you are presented with a dilemma: do you reorder your path to use the correct function? But what if there is other code using the one that is already on top of the path? Do you choose whichever is better, or more popular, rename the other one to ‘randsample_bad,’ and go on a global grep-and-replace to find out which of your library functions are using the worse one, and change the name in all the places they use it? If you take that path, the next time you upgrade to a new version of something you changed, your edit is going to disappear. Maybe you start maintaining patches against the upstream versions of the code you have to change, just because you are unfortunate enough to want to use more than one person’s toolbox. That assumes the original author even knows what a version control system is, so that there is an “upstream version” to patch against. But there’s no hope of that: you didn’t get the code from a version control repository, you got it from the Mathworks File Exchange.
So if you have any large number of MATLAB functions, you have a headache around ensuring that no two functions can have the same name. The Mathworks File Exchange even includes a tool that inspects your uploaded code to see if the names you’ve given your functions are unique enough: you get downgraded for ‘collisions.’ (It was too hard to fix the language to make it more usable and to make code sharing easier, you see, so they fixed the website to make it less usable and code sharing harder.) This exerts a selective pressure on the ecosystem of user-written functions, with a couple of effects. The first effect is that MATLAB function names tend to be extremely cryptic; you end up with names like (and this is just from Mathworks toolboxes and not user contributions) ‘etfe,’ ‘tf2ca,’ and ‘cumtrapz’. The second effect is that people pack as many completely different behaviors into the same function as they can think of, just to avoid having a second function/file to drag around. The function’s behavior changes arbitrarily based on how many arguments are given, how many output arguments are taken, the classes of the inputs, and so on; helpfully, there is an InputParser class that is written with sufficient unnecessary generality to preserve the completely arbitrary and pattern-free manner in which functions parse their arguments. Very frequently people wind up building functions that behave differently for an input of size 1 than they would if you logically extrapolated the behavior on larger array inputs down to an array of size one. The lack of namespacing is thus a contributing factor to the innumerable problems with small numbers that MATLAB users have to deal with.
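The size-1 trap fits in a few lines. The Statistics toolbox help text quoted further down documents a form `RANDSAMPLE(N,K)` in which a number N stands for the integers `1:N`. Once the same argument slot can also hold a list of values, a one-element list becomes ambiguous. The hypothetical Python functions below accept either and resolve the ambiguity by treating any one-element input as a count. They illustrate the pattern and are not a reimplementation of `randsample`.

```python
import random

def population_of(s):
    """Hypothetical overload: a one-element input means 'the integers 1..N';
    anything longer is taken as the population itself."""
    s = list(s)
    if len(s) == 1:
        return list(range(1, s[0] + 1))
    return s

def sample(s, k, rng=random):
    """Draw k values without replacement from population_of(s)."""
    return rng.sample(population_of(s), k)

survivors = [7]  # e.g. what is left after filtering a longer list
print(sample(survivors, 1))  # some value from 1..7, not necessarily 7
```

Code that works for every list of length two or more quietly changes meaning when the list shrinks to one element.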
Up until very recently, there was simply no way for your code to specify that it wanted the Statistics toolbox’s ‘randsample’ instead of some imposter. Now lately, (which is to say, about 15 to 20 years too lately compared to competing languages,) TMW has introduced “packages” which try to break up the global function namespace. (Technically it’s not a global function namespace, it’s a global function search path, which is something that is more complicated without being more helpful.) We’ll see if packages help going forward into the future. They certainly don’t help with all the existing package-less code out there.
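With real namespaces the caller names the provider, and path order stops mattering. Here is a minimal Python sketch, with `types.ModuleType` standing in for two separately installed packages. The module names and function bodies are hypothetical.

```python
import types

# Two hypothetical toolboxes, each exporting a function with the same name.
stats = types.ModuleType("stats")
stats.randsample = lambda n, k: f"stats.randsample({n}, {k})"

psychtoolbox = types.ModuleType("psychtoolbox")
psychtoolbox.randsample = lambda lst, dims=None: f"psychtoolbox.randsample({lst}, {dims})"

# Each call site says which provider it means, so neither can shadow the other.
print(stats.randsample(10, 3))
print(psychtoolbox.randsample([-1, 0, 1], (10, 10)))
```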
Oh, but there are more wrinkles! At some point TMW decided to switch from a case-insensitive global function namespace to a case-sensitive one. That’s nice, I suppose, in that one function can be called ‘randsample’ and the other can be called ‘RandSample’ and they are technically distinct. Still, if you shipped your code to someone who had one but not the other, they wouldn’t get an obvious error like ‘no such function’ but a cryptic warning about case insensitivity being deprecated, which they’re all too used to seeing and ignoring, followed by a failed program because the imposter randsample got called anyway.
Which brings me to today’s problem. When TMW introduced case sensitivity among function names, they didn’t even fix their own functions to reflect the change. Take, for instance, ‘help’ and ‘doc.’ The help for ‘help’ says (in R2009b),
HELP FUN displays a description of and syntax for the function FUN.
When FUN is in multiple directories on the MATLAB path, HELP displays
information about the first FUN found on the path.
Great! Let’s say I’m having a problem with some code that’s calling ‘randsample.’ Which is the first ‘randsample’ on the path?
K>> which randsample
/Applications/MATLAB_R2009b.app/toolbox/stats/randsample.m
Great! How do I use it?
K>> help randsample
x=RandSample(list,[dims])
Returns a random sample from a list. The optional second argument may be
used to request an array (of size dims) of independent samples. E.g.
RandSample(-1:1,[10,10]) returns a 10x10 array of samples from the list
-1:1. RandSample is a quick way to generate samples (e.g. visual noise)
...
Wait, visual noise? Is that really the help string for randsample?
K>> system(['head -10 ' which('randsample')])
function y = randsample(s, n, k, replace, w)
%RANDSAMPLE Random sample, with or without replacement.
% Y = RANDSAMPLE(N,K) returns Y as a vector of K values sampled
% uniformly at random, without replacement, from the integers 1:N.
Well, 'help' is showing me the wrong help! Maybe I'm too used to 'help'; it's old, and everyone uses 'doc' now. What's the documentation for Psychtoolbox's 'RandSample'?
K>> which RandSample
/Users/peter/eyetracking/library/osx/Psychtoolbox/PsychProbability/RandSample.m
K>> doc RandSample
At this point I am presented with a doc window about... the Statistics Toolbox 'randsample.' Which, again, is not what I asked for.
If The Mathworks can't correctly navigate the stupid function namespace they created for themselves, when they try to implement basic things used every five minutes, like 'help' and 'doc,' how can they expect their users to tolerate it?
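The `help randsample` result above is consistent with `which` matching function names case-sensitively while `help` matches them case-insensitively and takes the first hit on the path. The toy model below reproduces that one result, assuming the Psychtoolbox directory comes before the Statistics toolbox on the path. It is an inference from the transcript, not MATLAB's documented lookup algorithm, and it does not try to explain what `doc` did.

```python
# Hypothetical model of a MATLAB-style search path: directories are searched
# in order and the first matching file wins. The path order is an assumption.
PATH = [
    ("Psychtoolbox/PsychProbability", ["RandSample.m"]),
    ("toolbox/stats", ["randsample.m"]),
]

def names_match(stem, name, case_sensitive):
    if case_sensitive:
        return stem == name
    return stem.lower() == name.lower()

def first_match(name, case_sensitive=True):
    """Return the first file on PATH that defines `name`, or None."""
    for directory, files in PATH:
        for filename in files:
            stem = filename.removesuffix(".m")
            if names_match(stem, name, case_sensitive):
                return f"{directory}/{filename}"
    return None

print("which-style lookup:", first_match("randsample"))
print("help-style lookup: ", first_match("randsample", case_sensitive=False))
```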