oh wow

May 15, 2012 at 12:54 am (crap data structures, dumb memory management, errors in error handling, thirty misfeature pileup)

You know how errors in destructors are transformed into warnings?

I was comparing various strategies for filling out arrays in MATLAB. I looked at the docs and noticed an implementation of a doubly linked list, and thought, why not include that and see how badly it performs?

Just pull the code for dlnode out of your own MATLAB help files:

 edit  ([docroot '/techdoc/matlab_oop/examples/@dlnode/dlnode.m']);

then copy and paste it right into a file named dlnode.m on your MATLAB path, and then try this:

tail = dlnode(0);
for i = 1:510
	new = dlnode(i);
	insertAfter(new, tail);
	tail = new;
end
clear tail;
clear new;

The output from this overflows the command window buffer so I won’t post the whole thing. Here’s an excerpt:

  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
Warning: The following error was caught while executing 'dlnode' class
destructor:
Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit.  Be aware that exceeding your available stack space can
crash MATLAB and/or your computer. 
> In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64
  In dlnode>dlnode.delete at 64

….Yeah, not only is the error you get transformed into a warning, but what an error it was! Linked chains of handle objects pile up destructors on the stack and overflow if too large of a group of handle objects goes out of scope at once. And a goddamn stack overflow gets exception-funneled down into a warning!

Your computer has multiple gigabytes of memory. You cannot make a linked list more than 500 items long without breaking MATLAB. Oh, wow.

Well, you can increase the recursion limit, but not far before you get hard crashes.
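If you do want to poke at the limit, here is a minimal sketch of doing it for one call only, so the setting doesn’t stay raised for the rest of the session. The helper name withRecursionLimit is made up for illustration; the only real knob used is the RecursionLimit root property that the warning above mentions.

```matlab
function withRecursionLimit(limit, fn)
% withRecursionLimit  Hypothetical helper: run fn() with a temporary limit.
%   limit - recursion limit to use while fn runs
%   fn    - function handle taking no arguments
oldLimit = get(0, 'RecursionLimit');
% The guard restores the old limit when this function exits,
% whether fn returns normally or throws.
guard = onCleanup(@() set(0, 'RecursionLimit', oldLimit)); %#ok<NASGU>
set(0, 'RecursionLimit', limit);
fn();
end
```

This only moves the cliff. As noted above, raising the limit far enough trades the warning for a hard crash.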

So how badly does the Mathworks’ own example of a linked list perform, anyway? Well, building a linked list N items long turns out to be between O(N^2) and O(N^3). Wowsers. (For the data-structures-impaired among you, it ought to be O(N).) And it’s factors of ten slower than the next slowest way of building up a list of indeterminate length.
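For reference, this is roughly the kind of comparison I mean, sketched for plain numeric arrays only. The function name compareFill and the three strategies shown are my choices for illustration, not the exact benchmark behind the numbers above, and the timings will vary by release and machine.

```matlab
function t = compareFill(n)
% compareFill  Time three ways of building the vector 1:n element by element.
t = struct();

% Strategy 1: grow the array by one element per iteration.
tic;
a = [];
for i = 1:n
    a(end+1) = i; %#ok<AGROW>
end
t.append = toc;

% Strategy 2: preallocate, which needs the final length up front.
tic;
b = zeros(1, n);
for i = 1:n
    b(i) = i;
end
t.preallocate = toc;

% Strategy 3: double the capacity whenever it runs out, for lists of
% indeterminate length; trim the unused tail at the end.
tic;
c = zeros(1, 16);
count = 0;
for i = 1:n
    if count == numel(c)
        c(2 * numel(c)) = 0;   % assigning past the end grows c, zero-filled
    end
    count = count + 1;
    c(count) = i;
end
c = c(1:count);
t.doubling = toc;

% All three strategies must produce the same vector.
assert(isequal(a, b) && isequal(b, c));
end
```

Calling compareFill at a few sizes (say 1e3, 1e4, 1e5) and watching how each field grows is enough to see which strategies scale linearly.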

These all — the absurd slowness of object property access, the recursive piling up of destructors on the stack, the exception-swallowing — these are all knock-on effects of the Mathworks’ insane notion that they can do automatic memory management without a garbage collector and guarantee deterministic destruction and support arbitrarily connected reference graphs. Well, a system built under these constraints can do at least one of three things: either it will have accomplished what in the history of computer science has never before been accomplished, or its time and space complexity is going to blow up, or it won’t actually work as advertised.

They might as well have assumed P=NP while they were at it.

UPDATE: It gets worse! 500 was the limit in R2010a, but as of R2011b, there’s twice as much recursion on finalization, so you can only make a list 250 items long!

Errors and crashes and namespace inadequacies, just a typical day.

May 11, 2012 at 5:41 pm (a namespace, errors in error handling, my kingdom for a namespace)

I mostly work on MATLAB R2010a because there is a particular library (which is the only reason I am forced to use MATLAB) that only builds on 32-bit, and R2010a was the last 32-bit release.

I recently encountered a couple of bugs in the MATLAB interpreter and wanted to check if they had been fixed in later versions.

Well, launching R2011b, I find a lot of stuff not working. Hell, I try to “edit” a file and it returns me:

>> edit crashme.m
/home/peterm/eyetracking/matlab-bugs/crashme.m
Error using edit (line 66)
Not enough input arguments.

Now, that’s really weird, because obviously I supplied an argument to “edit.” What the hey? I try editing some other files, it works okay. I try invoking edit like edit('crashme.m') and it still fails. Not enough input arguments?

Usually as a programmer there’s a little mental hurdle you have to leap over to even begin to think maybe the problem is with the system and not with what I’m doing. After all, I’m just sitting here banging at the keyboard, and presumably if I’ve selected reliable tools, the likelihood that I banged a wrong button ought to be higher than the likelihood that my tools are busted.

Years of experience have shown that when dealing with MATLAB that little mental hurdle does me no good. So here I go debugging MATLAB’s own code.

Now all I know is it failed in line 66 of “edit.m”, which is preceded by this many endifs (uh, the || operator, have you guys heard of it?)

    57	                                end
    58	                            end
    59	                        end
    60	                    end
    61	                end
    62	            end
    63	        end
    64	    end
    65	catch exception
    66	    throw(exception); % throw so that we don't display stack trace
    67	end

AAAAAAARGH. NO. Do. Not. Do. This.

As I said previously, the entire purpose of exceptions is to propagate out information about the manner of failure, not to disguise the manner of failure behind a lie. I don’t give a shit if it’s an internal function. Propagate out the actual information so I don’t have to break out the debugger on your busted code.
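To make the difference concrete, here is a small sketch contrasting throw with rethrow on a caught MException. All the function names (demoRethrow, viaThrow, viaRethrow, inner) are invented for the demo. As I understand the documented behavior, throw resets the stack to the place where throw was called, while rethrow keeps the stack from the original failure.

```matlab
function names = demoRethrow()
% demoRethrow  Report which function each exception claims to come from.
names = struct();
try
    viaThrow();
catch err
    names.throw = err.stack(1).name;     % top frame after throw(err)
end
try
    viaRethrow();
catch err
    names.rethrow = err.stack(1).name;   % top frame after rethrow(err)
end
end

function viaThrow()
% Mimics the pattern in edit.m: catch, then throw to hide the trace.
try
    inner();
catch err
    throw(err);
end
end

function viaRethrow()
% Same shape, but passes the exception along with its original stack.
try
    inner();
catch err
    rethrow(err);
end
end

function inner()
% The real point of failure.
error('demo:inner', 'failure inside inner');
end
```

If the two fields of the result name different functions, the throw variant has dropped the frame where the failure actually happened, which is exactly the information I needed from edit.m.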

Well, now the only recourse is to reach for the debugger. This is risky: after all, the mere act of bringing up a file in the editor (as the MATLAB GUI does whenever a breakpoint is reached) might very well call “edit” somewhere along the line, so if I set a breakpoint in “edit”, it might go chasing up its own tailpipe.

So I save my work before proceeding.

Setting a breakpoint there on line 66, I find:

K>> edit crashme.m
/home/peterm/eyetracking/matlab-bugs/crashme.m
66      throw(exception); % throw so that we don't display stack trace
K>> getReport(exception)

ans =

Error using sprintf
Not enough input arguments.

Error in message (line 8)
string = sprintf(varargin{:});

Error in edit>openEditor (line 234)
            errMessage = message('MATLAB:Editor:EditorInstantiationFailure');

Error in edit>openWithFileSystem (line 458)
    openEditor(pathName);

Error in edit (line 51)
                        if ~openWithFileSystem(argName, ~isSimpleFile(argName))


K>> 

OH COOL. IT’S AN ERROR YOU HAD WHILE TRYING TO PRINT AN ERROR MESSAGE.

Ah, we see that the original error, which at first claimed nonsensically to be a case of “not enough arguments to edit” was really “not enough arguments to sprintf.” Interesting. And sprintf was called by… Ah! What we have here is yet another goddamned namespace problem. The call to message is actually reaching my function:

K>> which message
/home/peterm/eyetracking/code/graphics/message.m

Are you getting why you should not funnel exceptions yet? If the people at The Mathworks had simply not written a try/catch clause there, I would have seen the cause of the problem without doing anything like rolling up my sleeves.

It seems my “message.m” is shadowing some other “message.m.” Now, I’m reasonably careful. When I originally chose the function name “message”, I looked at the landscape of MATLAB’s stupid global search path (because when nothing inhabits anything like a namespace, you have to tread carefully) and found that “message” is only used by a couple of toolbox methods that should be safe as long as MATLAB’s method dispatch works how it’s supposed to (ha):

>> which -all message
/Applications/MATLAB_R2010a.app/toolbox/shared/spcuilib/@uiservices/message.m                 % uiservices method
/Applications/MATLAB_R2010a.app/toolbox/shared/filterdesignlib/@FilterDesignDialog/message.m  % FilterDesignDialog method

Of course I should have expected that Mathworks would leave no pronounceable string of letters available to users, and ensure that anyone’s code would mysteriously break if they had thought to name something “message.” In 2011b, I find,

K>> which -all message
/home/peterm/eyetracking/code/graphics/message.m
/usr/local/matlab11/toolbox/matlab/lang/message.m % Shadowed

Inspecting “message.m” reveals that it’s a newly added function and that it’s for internal use only. If you had previously written something called “message”…. tough luck. No warnings or nothing, just mysterious failures you have to debug.
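Since MATLAB won’t warn you, one crude defense is to run your own check after each upgrade. Below is a minimal sketch, assuming you keep a list of your own function names; findShadowed is a hypothetical helper built on the same which -all lookup used above. It will also flag harmless class methods that share a name, so treat its output as a list of things to look at rather than a verdict.

```matlab
function shadowed = findShadowed(names)
% findShadowed  List the names that resolve to more than one location.
%   names    - cell array of function names to check
%   shadowed - cell array of the names with multiple matches
shadowed = {};
for k = 1:numel(names)
    hits = which(names{k}, '-all');   % cell array of every match on the path
    if numel(hits) > 1
        shadowed{end+1} = names{k}; %#ok<AGROW>
        warning('shadowcheck:multiple', ...
                '"%s" resolves to %d locations:\n%s', ...
                names{k}, numel(hits), sprintf('  %s\n', hits{:}));
    end
end
end
```

For example, findShadowed({'message'}) on the setup described here should warn about both copies of message.m.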

Mathworks, what on earth are you doing putting more pollution in the global namespace if you supposedly implemented packages back in R2008a? Didn’t you declare back in 2009 that there would be a package you would be moving your internal shit into?

(It’s not been done because the namespace mechanism barely works, is my best guess.)

Anyway, after moving my “message” out of the way…. I can finally open the file in my editor. Beats me why “edit” was choking on giving me an error message in the first place, since it completed successfully. And yeah, the runtime bug I found is still there. Check this out if you want to crash your Matlab:

function crashme()
    crash1();

    function crash1()
        
        x = crash2();
        x();

        function x = crash2()
            x = evalin('caller', '@() eval(''1'');');
        end
    end
end

Cleaning up after yourself: Don’t be a Skinner pigeon.

March 6, 2012 at 12:11 am (errors in error handling)

If you have worked in science, you have almost certainly seen when an experimental rig has a software problem. As an increasing portion of what experimental rigs do moves into software, more and more of lab rigging involves software troubleshooting. To the extent that your lab rigging requires writing original software (which is, approximately, the extent to which your lab rig involves software, multiplied by the extent to which your work involves doing anything original at all), some of the software you will have to troubleshoot will be your own.

This isn’t such a bad position to be in: It’s often much easier to troubleshoot your own mistakes than mistakes made by other people. One of the reasons is you have a better idea of what you were trying to do. But there are yet things you can do to make it even easier on yourself.

Given that you will write programs, and given that you, like me, are imperfect, your programs and rigs will break sometimes. When they do, it is a great help to have some form of error handling in your programs. Now, the phrase “error handling” for a lot of people conjures up ideas about software that knows how to compensate when the disk crashes, or the network cable comes unplugged, or a transient gamma ray zaps a bit in your memory.

This might be true if you’re working in something like telecom, where your system has to soldier on in the face of machine failure, broken cables, and software crashes affecting other calls. Hardcore!

Erlang: The Movie

These kinds of systems are amazing. Scientific data collection systems are not, and don’t need to be.

I don’t worry about gamma rays or unplugged cables. Such environmental interruptions have too many and too unpredictable causes to anticipate, and they cause far too few of the errors to worry about. So what causes the great majority of errors that might befall my program? I do. By the time I’m ready to use a program in my rig, I’ve generated at least hundreds if not thousands of errors, each pertaining to a previous version of the program that I made a mistake in. Probably a few more happen as I’m beginning data collection.

That puts things in a new light, doesn’t it? You don’t “handle” errors, not really. Error handling is really about shortening the debugging loop. Report the error to the programmer -> programmer attempts a fix -> Restart and try again. Those are the three parts. The shorter this loop, the faster you can get your system working.

And you know what a great benefit a short cycle from trial to feedback is. That’s what most old hands, you know, the academics who gave up on learning anything new right after they switched from Fortran or C to Matlab, that’s the first thing they question when faced with the notion that something else might be better. “Does it have an interactive prompt?” Interactive feedback! A read-eval-print loop! Well, jeez, talk about things everyone has these days — by which I mean, things Lisp had in 1958, and what everyone else had already copied all the way back when Cleve Moler decided to make an interactive shell over FORTRAN numerics libraries. You can’t even define functions at Matlab’s quasi-REPL! This is the depth of ignorance that keeps people using Matlab. But I digress.

The “error handling” needs for scientific data collection (at least in my field) are very straightforward. You don’t want a high-availability system; you want a high-troubleshootability system. If something goes wrong in your experiment, you need to be the first person to know about it. You certainly don’t want your experimental apparatus to continue on blithely doing the wrong thing and collecting the wrong data (if any at all). Error handling is a misnomer: you want your system to flatly refuse to continue in the face of errors, and just report them so you can fix the problem and restart.
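In code, “refuse to continue” can be as dull as the sketch below: check what you were handed and raise an error with an identifier the moment it looks wrong. The sample layout [x y t] and the name checkSample are made up for illustration; substitute whatever your rig actually produces.

```matlab
function checkSample(sample)
% checkSample  Hypothetical check for one tracker sample [x y t].
%   Errors out immediately instead of letting bad data into the recording.
if numel(sample) ~= 3
    error('rig:badSample', ...
          'Expected [x y t], got %d elements.', numel(sample));
end
if any(~isfinite(sample))
    error('rig:nonFinite', ...
          'Sample contains NaN or Inf: %s', mat2str(sample));
end
end
```

The identifier (rig:nonFinite and so on) is what lets the one place that does know how to respond tell failures apart; everywhere else, the error should just stop the run.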

So, good news. Compared to high-availability systems that languages like Erlang were designed to support, it is a complete cakewalk to write software that complains at the slightest insult, folds up whenever there’s a stiff breeze, and turns itself off:

Which is not to say that I’ve often seen such a system. Especially not one written in Matlab. There is some complexity to the process of shutting down. Claude Shannon’s Ultimate Machine isn’t just a machine turning itself off. If it merely shut itself off, the lid would stay open, the hand would stay there, and it’d be a chore to reset the machine to its original state so that you could reboot it. Actually, most of the complexity in the Ultimate Machine is about maintaining power to the mechanism just long enough to retract the hand into the box, close the lid and be ready to be switched on again. That’s the “restart” part of the report->fix->restart cycle, you see. The less you have to do to reboot, the faster you can make your program work.

Which brings me to pigeons.

You’ve probably sat beside someone on a rig and watched their reaction when something goes wrong in the software. It’s a funny reaction, only made sad because I see it so often.

Shut everything down, power cycle, and reload everything.

Sometimes it reaches such absurd heights that people memorize, and replicate, over hundreds of trials, an exact sequence of powering-on equipment and launching-of-programs, in a stereotypical order, just because it was what got the rig to function one time.

It’s enough to recall B.F. Skinner’s pigeons — the ones who memorized a whole sequence of exaggerated movements, just because they happened to produce that movement sometime close to when they received a reward.

What drives this ridiculous behavior among experimenters is nothing more or less than software that doesn’t clean up after itself.

Maybe a program on machine A leaves a network connection open and so machine B ain’t listening when it tries to connect again. Or it leaves a filehandle open and it can’t re-open the file again until you quit the runtime and restart it. Maybe the embedded software on your hardware spike windower was written by the kind of C programmers who think longjmp is the same as exceptions, so they never released an internal resource.

And maybe superstitious boot sequences are justified sometimes. Maybe that’s the only way to deal with components that don’t report what’s wrong and don’t reset themselves to a state where they can start over. The only recourse to the experimenter is to reboot fucking everything.

That ain’t the way you should write your program, though. Oh don’t be an inhabitant of a Skinner box you built yourself. What a waste of time.

Shorten the reboot cycle.

We’ll explore how your program can clean up after itself, restoring to the ready-to-boot state; and how it’s particularly hard to do when you write it in Matlab.

Cleaning up after yourself, prologue.

June 22, 2011 at 3:32 pm (errors in error handling, thirty misfeature pileup)

Errors that happen during onCleanup are transformed into warnings? Really?

It doesn’t help that cleanup functions also can’t be closures — they can’t actually respond to data about a resource that was gathered during a program. But first things first.

Hey: if an error happens in my code, that is an error. My program should not continue unless it specifically handles that error. If an error happens while my program is trying to clean up after itself, that means something is wrong and MATLAB should not force my program to blithely continue and wreak further havoc.

I’ve restrained myself from picking too hard on The Mathworks’ decision to promise deterministic destruction for closures and objects, even though it has unacceptable performance penalties. The reason for the restraint is that I can see the argument for object lifecycle management: when your objects correspond to exclusive resources you hold, you do want to have control and guarantees over when they get released.

Well, I just looked into onCleanup and as usual, the Mathworks fucked it up: You cannot write robust programs using onCleanup, because exceptions during cleanup are swallowed.
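Here is the smallest demonstration I can think of, as a sketch. The function name cleanupSwallowDemo is mine. The cleanup routine does nothing but fail, and the function still runs to completion and returns true.

```matlab
function reached = cleanupSwallowDemo()
% cleanupSwallowDemo  Show that an error in a cleanup routine does not
% stop the code that triggered the cleanup.
reached = false;
guard = onCleanup(@() error('demo:cleanup', 'cleanup failed'));
clear guard;      % the cleanup runs here; its error is shown as a warning
reached = true;   % ...and execution carries on regardless
end
```

There is nothing to catch: wrapping the clear in try/catch makes no difference, because by the time control returns to your code the error has already been downgraded.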

So I’m going to have to start kicking at the thirty misfeature pileup where MATLAB’s memory management meets its error handling, after all.

There are a number of languages that offer both automatic memory management and exceptions. A few of them are Python, R, Java, and MATLAB. All of these except MATLAB deal with resource cleanup easily with a try/finally statement, which MATLAB lacks, and most also offer some extra sugar in the form of a try-with-resource, which MATLAB tries to do with deterministic destructors, and fails.
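Lacking try/finally, the closest thing I know of is to spell it out by hand with try/catch and rethrow. The sketch below assumes a simple case (read the first line of a text file); readFirstLine and closeOrError are invented names. Unlike an onCleanup routine, a failure while closing the file surfaces as a real error.

```matlab
function firstLine = readFirstLine(filename)
% readFirstLine  Hand-rolled try/finally around a file handle.
fid = fopen(filename, 'r');
if fid < 0
    error('demo:open', 'Could not open %s', filename);
end
try
    firstLine = fgetl(fid);
catch err
    closeOrError(fid);   % "finally" on the failure path
    rethrow(err);        % keep the original error and its stack
end
closeOrError(fid);       % "finally" on the success path
end

function closeOrError(fid)
% Close the file and treat a failed close as an error, not a warning.
if fclose(fid) ~= 0
    error('demo:close', 'fclose failed for file id %d', fid);
end
end
```

As with a finally clause in Python or Java, if the close fails on the failure path, the close error replaces the original one. The cost of the pattern is that the cleanup call has to be written twice.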

One of these things is not like the others:

Python try/finally
“If finally is present, it specifies a ‘cleanup’ handler…. If there is a saved exception, it is re-raised at the end of the finally clause. If the finally clause raises another exception or executes a return or break statement, the saved exception is lost.”
Python with
“That way, if the caller needs to tell whether the __exit__() invocation *failed* (as opposed to successfully cleaning up before propagating the original error), it can do so.”
Java try/finally
“If a finally clause is executed because of abrupt completion of a try block and the finally clause itself completes abruptly, then the reason for the abrupt completion of the try block is discarded and the new reason for abrupt completion is propagated from there.”
Java try-with-resources
“If exceptions are thrown from both the try block and the try-with-resources statement, then the method readFirstLineFromFile throws the exception thrown from the try block; the exception thrown from the try-with-resources block is suppressed. In Java SE 7 and later, you can retrieve suppressed exceptions”
R tryCatch
“The finally expression is then evaluated in the context in which tryCatch was called; that is, the handlers supplied to the current tryCatch call are not active when the finally expression is evaluated.”
MATLAB delete
“A delete method should not generate errors”

One of these things is not like the others, and the one that fucked up is of course MATLAB.
