Rhetorical question.

July 9, 2011 at 4:10 pm (Uncategorized)

Is anyone providing a useful and innovative service like this for MATLAB?

Surprise me. All the MATLAB users I know are maintaining their own clusters and licensing at great hassle and cost.

Permalink Leave a Comment

Cleaning up after yourself, prologue.

June 22, 2011 at 3:32 pm (errors in error handling, thirty misfeature pileup)

Errors that happen during onCleanup are transformed into warnings? Really?

It doesn’t help that cleanup functions also can’t be closures — they can’t actually respond to data about a resource that was gathered during a program. But first things first.

Hey: if an error happens in my code, that is an error. My program should not continue unless it specifically handles that error. If an error happens while my program is trying to clean up after itself, that means something is wrong and MATLAB should not force my program to blithely continue and wreak further havoc.

I’ve restrained myself from picking too hard on The Mathworks’ decision to promise deterministic destruction for closures and objects, even though it has unacceptable performance penalties. The reason for the restraint is that I can see the argument for object lifecycle management: when your objects correspond to exclusive resources you hold, you do want to have control and guarantees over when they get released.

Well, I just looked into onCleanup and as usual, the Mathworks fucked it up: You cannot write robust programs using onCleanup, because exceptions during cleanup are swallowed.

So I’m going to have to start kicking at the thirty misfeature pileup where MATLAB’s memory management meets its error handling, after all.

There are a number of languages that offer both automatic memory management and exceptions. A few of them are Python, R, Java, and MATLAB. All of these except MATLAB deal with resource cleanup easily with a try/finally statement, which MATLAB lacks, and most also offer some extra sugar in the form of a try-with-resource, which MATLAB tries to do with deterministic destructors, and fails.

One of these things is not like the others:

Python try/finally
“If finally is present, it specifies a ‘cleanup’ handler…. If there is a saved exception, it is re-raised at the end of the finally clause. If the finally clause raises another exception or executes a return or break statement, the saved exception is lost.”
Python with

“That way, if the caller needs to tell whether the __exit__() invocation *failed* (as opposed to successfully cleaning up before
propagating the original error), it can do so.”
Java try/finally
“If a finally clause is executed because of abrupt completion of a try block and the finally clause itself completes abruptly, then the reason for the abrupt completion of the try block is discarded and the new reason for abrupt completion is propagated from there.”
Java try-with-resources
“If exceptions are thrown from both the try block and the try-with-resources statement, then the method readFirstLineFromFile throws the exception thrown from the try block; the exception thrown from the try-with-resources block is suppressed. In Java SE 7 and later, you can retrieve suppressed exceptions”
R tryCatch
“The finally expression is then evaluated in the context in which tryCatch was called; that is, the handlers supplied to the current tryCatch call are not active when the finally expression is evaluated.”
MATLAB delete
“A delete method should not generate errors”

One of these things is not like the others, and the one that fucked up is of course MATLAB.

Permalink Leave a Comment

Profiling

June 20, 2011 at 10:20 pm (doing it wrong, matlab is bad at math)

Occasionally I try to work around problems with MATLAB’s shitty slow performance by using its “profiler.”

A profiler is simple in concept: It records how much time a program spends in various functions, so you can tell which ones are slowing you down. There are a couple of different ways to implement a profiler: You could arrange to interrupt the language runtime every millisecond or so and see what functions you’re in the middle of executing. Alternately, you could arrange to record the time every function call begins or ends. Both these techniques involve either instrumenting the runtime, or concurrency and runtime introspection — that’s the kind of runtime stuff that MATLAB doesn’t provide users access to, so instead of being able to actually write a profiler, we have to rely on what profiling ability TMW’s programmers have already built into the runtime.

But fundamentally, that’s the hard part. After you’ve instrumented your runtime and are recording data while the program runs, everything after that is just tabulating and adding up numbers.

Which leads me to ask, how the fuck is MATLAB fucking up at the “adding up numbers” part so badly?

No, I do not have 36 CPUs in my machine, and mainLoop>go does not recurse on itself at all, either.

what1.png

More after the jump… Read the rest of this entry »

Permalink Leave a Comment

Let’s talk colormaps.

May 7, 2011 at 2:04 pm (powerfully stupid graphics)

Picture 17.png

T. Lennert and J. Martinez-Trujillo. Strength of response suppression to distracter stimuli determines
attentional-filtering performance in primate prefrontal neurons. Neuron, 70(1):141–52, Apr 2011. [pubmed]

[for less ranty, try here.]

Dear everyone who uses colormaps: The ‘jet’ colormap, the default colormap in MATLAB, is a perceptual disaster. STOP USING THAT SHIT. I mean, just what is happening in the lower right sector of these plots?

Okay, more background. The scale is a raster of a modulation index (area under ROC, to be particular) for a group of neurons, plotted over the time from stimulus onset. The big salient feature of each plot is a yellow band. You see the yellow band because each raster plot sorts its neurons according to the “latency,” that is, the earliest time that the modulation passed some arbitrary threshold. Because that arbitrary threshold coincides with yellow, the highest-luminance value on the colormap, effectively the authors have chosen a computational and graphical procedure that says “Hey, sort my data that it makes a nice yellow stripe down the middle, NO MATTER WHAT THE DATA ACTUALLY CONTAINS.”

And what is happening to the right of that stripe? Well, because your spatial resolution for chroma differences (and especially for blue) is much worse than that for luminance differences (which is why sane colormaps, that are not “jet”, always have a monotonic luminance component), you have to get your nose right up to the screen to decide that under the stripe is a pretty good mixup of blue and red and yellow — all over the scale. In other words, some of these cells are well modulated past where they pass the arbitrary threshold, but a lot of cells stop being modulated at all after they dinged the threshold. Which is conveniently difficult to discern due to the colormap — and kind of raises the question of how reasonable that threshold setting is, or the cell inclusion criteria are.

We’ll just note in passing that the sorting is done separately for EACH subplot, so that row-by-row comparisons of the cells can’t be done, and the “neuron number” scale on the left is pretty much meaningless. Which also, by the way, fully undermines the claim that modulation for larger ordinal differences(*) happens faster.

I hope you noticed after you got close up to the screen, that the dark red and dark blue values are a lot harder to distinguish than, say, the yellow-to-cyan colors that makes up the middle of the scale. You know, we should be easily able to see when things are at opposite ends of the scale, right? I mean, if we’re plotting our data, I mean.

What kind of insane colormap has the property that values spanning the extreme ends of the scale stand out less, and can’t be distinguished as easily as values in the noisy middle? Why, it’s MATLAB’s default colormap, of course!

You know, ggplot uses a much better preset for its color maps, as well as better alternatives. Just sayin’.

(*) You want to know the ironic punchline? This catastrophe is figure 4 of a paper ABOUT NEURAL PROCESSING OF ORDINAL COLOR SCALES. Seriously!

I’m not sure whether this belongs on the ranting-about-neuroscience blog, or the ranting-about-MATLAB blog. so it goes on both places.

Permalink 5 Comments

Graphics: you’re doing it wrong.

November 1, 2010 at 6:19 pm (doing it wrong, lying documentation, powerfully stupid graphics)

GETFRAME returns a movie frame. The frame is a snapshot
    of the current axis.

No it bloody well isn’t. GETFRAME (as least on OS X) captures a snapshot of the portion of the display screen that might or might or might not have the current axis on top. If you have a long-running script to produce an animation, you might be tempted to switch over to read email or a PDF while your animation renders. In which case, when all is finished, you’ll find yourself a nice AVI file full of your email and none of your calculations. You want to render animations using GETFRAME, better have a single task computer to dedicate to it, and make sure to turn off the screen saver. What is this, 1992?

Permalink 4 Comments

Every time…

September 26, 2010 at 9:12 pm (powerfully stupid graphics)

Every time I open up MATLAB and try to do something with it, it rewards me with another anachronistic stupidity.

Turns out it is seriously NOT POSSIBLE as of R2009b to have a single figure, containing plots of two images, where each image uses a different colormap.

In the world where you haven’t had to worry about outputting to indexed-color graphics buffers for a decade or so.

Permalink Leave a Comment

Matlab doesn’t know how to draw one ball out of an urn containing one ball.

June 2, 2010 at 12:33 pm (matlab is bad at math, trouble with small numbers)

You are stimulating a discrete Markov process. You have a left-stochastic(*) matrix X where X(j,i) gives the probability of transitioning from state i to the state j. There are a lot of states, and most of the state transition probabilities are zero, so to fit your system into memory, you have built X as a sparse matrix.

You have the index of the current state, i; you want to simulate the next step and randomly draw a value for the index the next state j.

Easy enough to begin with, we find the indices and probabilities of the potential next states like this:

nextStates = find(X(:,i));
weights = X(nextStates,i);

So now we need to generate a random draw from a discrete distribution, where we know the probability of each value of the distribution. Isn’t there a function that does that? Oh yes, randsample. After looking through the help for randsample (a task that is more difficult than it sounds) we might write:

j = randsample(nextStates, 1, 1, full(weights));

Now a really easy question: How and why does this break? Answer below the fold. (hint: look at the category of the post)

(*) while almost all the mathematical literature uses right-stochastic matrices, in MATLAB the sparse matrices are mush slower if you use them that way.

Read the rest of this entry »

Permalink 3 Comments

The Mathworks don’t even know how to look up functions in their own global namespace.

May 30, 2010 at 12:59 am (lying documentation, my kingdom for a namespace, thirty misfeature pileup, unexpressive language)

MATLAB went decades without having any mechanism resembling function namespaces. If you downloaded code from two authors for use in a single project, and both toolboxes defined a function of the same name, well, you were in for a headache.

Case in point: the Psychtoolbox provides a function called ‘RandSample.’ The statistics toolbox, on the other hand, provides its own, somewhat different function named ‘randsample.’ If you wrote some code using the Statistics toolbox “randsample”, and the person trying to use is has Psychtoolbox installed, or vice versa, you were in for problems. The code fails inside of whatever ‘randsample’ you happen to have installed; if you are reasonably quick at deciding the problem is not with randSample itself, you jump up the stack trace a step and look at how randsample was called. (Which reminds me how MATLAB’s default behavior on an error is to print out the stack starting with the root…. and if the stack is too deep, it chops off the top of the stack. Whereas everyone who’s remotely sane needs to see the stack starting at the top and working downwards if there is to be any hope at debugging.)

Anyway, you look at this code you downloaded that calls ‘randsample,’ and you try to work out what it’s doing. You compare the code’s usage with the documentation which you access, perhaps by typing ‘help randsample.’ (Oh, but as we will discover below, there’s a delicious way MATLAB will screw you over if you try to read the doc for the function that is now failing.) Presuming you are reading the right documentation string, you try to work out what the calling code is trying to do. Only belatedly do you check ‘which randsample’ and discover that there are two of them (or not — in case the functions being confused are capitalized differently, and you are trying to use bad old code written before case sensitivity – ‘which’ has no option to ignore case, which is only the tip of the iceberg…) You realize, finally, that the code you’re trying to run is calling the wrong ‘randsample.’  And then you are presented with a dilemma: do you reorder your path to use the correct function? But what if there is other code using the one that is already on top of the path? Choose which is better, or more popular, rename the other one to ‘randsample_bad,’ and go on a global grep-and-replace to find out which of your library functions are using the worse one, and change the name in all the places they use it? If you take that path, the next time you upgrade a new version of something you changed, your edit is going to disappear. Maybe you start maintaining patches against the upstream versions of the code you have to change, just because you are unfortunate enough to want to use more than one person’s toolbox. If the original author even knows what a version control system is so that it can provide an “upstream version.” But there’s no hope of that: you didn’t get the code from a version control repository, you got it from the Mathworks File Exchange.

So if you have any large number of MATLAB functions, you have a headache around ensuring that no two functions can have the same name. The Mathworks File Exchange even includes a tool that inspects your uploaded code to see if the names you’ve given your functions are unique enough — you get downgraded for ‘collisions.’ (It was too hard to fix the language to make it more usable and to make code sharing easier, you see, so they fixed the website to make it less usable and to code sharing harder.) This exerts a selective pressure on the ecosystem of user-written functions, with a couple of effects. The first effect is that MATLAB function names tend to be extremely cryptic; you end up with names like (and this is just from Mathworks toolboxes and not user contributions) ‘etfe,’  ’tf2ca,’ and ‘cumtrapz’. The second effect is that people pack as many completely different behaviors into the same function as they can think of, just to avoid having a second function/file to drag around. The function’s behavior changes arbitrarily based on how many arguments are given, how many output arguments are taken, the classes of the inputs, and so on; helpfully there is an InputParser class that is written with sufficient unnecessary generality to preserve the completely arbitrary and pattern-free manner in which functions parse their arguments. Very frequently people wind up building functions that behave differently for an input of size 1 than they would if you logically extrapolated the behavior on larger array inputs down to an array of size one. The lack of namespacing is thus a contributing factor to the innumerable problems with small numbers that MATLAB users have to deal with.

Up until very recently, there was simply no way for your code to specify that it wanted the Statistics toolbox’s ‘randsample’ instead of some imposter. Now lately, (which is to say, about 15 to 20 years too lately compared to competing languages,) TMW has introduced “packages” which try to break up the global function namespace. (Technically it’s not a global function namespace, it’s a global function search path, which is something that is more complicated without being more helpful.) We’ll see if packages help going forward into the future. They certainly don’t help with all the existing package-less code out there.

Oh, but there are more wrinkles! At some point TMW decided to switch from a case insensitive global function namespace to a case sensitive one. That’s nice, I suppose, in that one function can be called ‘randsample’ and the other can be called ‘RandSample’ and they are technically distinct. Still, if you shipped your code to someone who had one not the other, they wouldn’t get an obvious error like ‘no such function’ but a cryptic warning about case insensitivity being deprecated that they’re all to used to seeing and ignoring, followed by a failed program because the imposter randsample got called anyway.

Which brings me to today’s problem. When TMW introduced case sensitivity among function names, they didn’t even fix their own functions to reflect the change. Take, for instance, ‘help’ and ‘doc.’ The help for ‘help’ says (in R2009b),

HELP FUN displays a description of and syntax for the function FUN.

When FUN is in multiple directories on the MATLAB path, HELP displays
information about the first FUN found on the path.

Great! Let’s say I’m having a problem with some code that’s calling ‘randsample.’ Which is the first ‘randsample’ on the path?

K>> which randsample
/Applications/MATLAB_R2009b.app/toolbox/stats/randsample.m

Great! How do I use it?

help randsample
  x=RandSample(list,[dims])

  Returns a random sample from a list. The optional second argument may be
  used to request an array (of size dims) of independent samples. E.g.
RandSample(-1:1,[10,10]) returns a 10x10 array of samples from the list
  -1:1.  RandSample is a quick way to generate samples (e.g. visual noise)
  ...

Wait, visual noise? is that really the help string for randsample?

K>> system(['head -10 ' which('randsample')])
function y = randsample(s, n, k, replace, w)
%RANDSAMPLE Random sample, with or without replacement.
%   Y = RANDSAMPLE(N,K) returns Y as a vector of K values sampled
%   uniformly at random, without replacement, from the integers 1:N.

Well, ‘help’ is showing me the wrong help! Maybe I’m too used to using ‘help’ but it’s old and everyone uses ‘doc’ now. What’s the documentation for Psychtoolbox’s ‘RandSample’?

K>> which RandSample
/Users/peter/eyetracking/library/osx/Psychtoolbox/PsychProbability/RandSample.m
K>> doc RandSample

At this point I am presented with a doc window about… the Statistics Toolbox ‘randsample.’ Which, again, is not what I asked for.

If The Mathworks can’t correctly navigate the stupid function namespace they created for themselves, when they try to implement basic things used every five minutes, like ‘help’ and ‘doc,’ how can they expect their users to tolerate it?

Permalink 3 Comments

Another day, another way it doesn’t do what it say

May 29, 2010 at 3:14 am (lying documentation, matlab is bad at math, Simple trivia about its fundamental behavior that you probably can't answer)

The documentation for sparse claims,

S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an m-by-n sparse matrix such that S(i(k),j(k)) = s(k).

This claim is, plainly, false. That is to say, there are easy to find values of i, j and s where

sparse(i,j,s,m,n,nzmax)

produces numerically wildly different results from

X = zeros(m, n)
X(sub2ind([m n], i(:), j(:))) = s(:);

and, therefore,

S = sparse(i,j,s,m,n,nzmax)
if all( S( sub2ind(size(S), i, j) == s )
print("documentation passes!")
else
print("documentation is lying!")
end

produces the expected result (i.e. the result you expect from its being written on this blog.)

Can you figure out under what condition sparse behaves differently from the promises the documentation makes?

Hint: I’m populating a stochastic matrix that encodes dynamics over a discretized state space. That space has boundary conditions, so that while in the middle of the state space you might diffuse in N dimensions to your 2^N nearest grid points, at the edges of the state space you are stuck at the wall and have to reflect onto fewer points.

For extra credit, comment on the ease of converting code using non-sparse matrices to code using sparse matrices, given that it won’t even be numerically the same to begin with.

Permalink Leave a Comment

Multiple output arguments

May 28, 2010 at 2:57 am (crap data structures, Uncategorized, unexpressive language)

There are many reasons to be told why multiple output arguments in Matlab were a poor design decision. But here is a start. In an unsuccessful attempt to find a code that’s less than O(N*M) in storage for a problem that in any implementation in a real language is O(N) in storage, (*) I saw that someone wants to know,

Is there a similar way to get position information out of a matrix.

I tried something like,
[row,col]=min(min(matrix))
but that returns only the final row that the element is in, not its exact location.

This poor fellow wants to find the indices of a global minimum of a matrix. Now, min by itself finds the column-wise minimum of a matrix, and in a second output returns the row indices. Unless it’s a matrix with a size of 1 in the first dimension, in which case it mysteriously changes behavior and returns the row-wise minimum of a matrix. If the size of the matrix is 1 in both the first and second dimensions, it changes behavior again and gives you the minimum along the third dimension. No external indication is given of which dimension of minimum min happens to be pursuing… but I digress. The questioner just wants a 2-d min!

But here’s the thing: while composing min twice will give the global minimum of a 2-D array, there is no way you can compose two functions in one expression that makes use of the second output. So there is a failure to generalize: [m,i] = min(x) finds a minimum in one dimension, and tells you where the minimum is. The generalization m = min(min(x)) finds a minimum across two dimensions, but there is no way to use this form to find where the minimum is.

But here’s thing: there’s no simple expression that does it. Best I could come up with is:

i = ind2sub(size(x), second_output_of(@min, x(:)))

That will work, leaving second_output_of to be implemented by anyone who perceives a gaping lack in the language at this point — you see, while composing functions with multiple outputs is an ability virtually mandated by the presence of multiple outputs and the mathematical nature of the problems you might want to solve in MATLAB, the ability to use the second output in an expression is just plain missing.

Well, here’s an even more arcane version that uses only built in functions:

i = ind2sub(size(x), find(x == min(x(:)))

This doubles both the time and space requirements as well as bypassing the second output of min() altogether! What a lesson — if you want a simple matlab expression to find the location of a min in 2-d, you can’t use the existing function (the second output of min) that finds the location of a min in 1-D! Instead, you have to hunt down find, ind2sub, and the incantation (:) and live with code that’s twice as slow. There is no simple expression without these incantations that does it; if you want the simple compositionality of applying min twice you are limited to using multiple statements with intermediary temporary variables:

[m,r] = min(x);

[m,c] = min(m);

i = [r(c) c];

While functions with single outputs have the full compositionality of algebra available to them (well, unless you want do something simple like subscripting the output of a function, ahem), second and subsequent outputs are not composable unless you first assign their outputs to intermediary variables. This is unacceptable. A requirement of intermediary variables is evil. There is a longer post planned on this, but you may search for clues why, here and here, before I go live with that.

This is by no means the only problem with multiple outputs in MATLAB. More later.

(*) the answer to my original problem was something in the extra add-on ($$$) Signal Processing Toolbox. To sort points along grid lines. Well, I’m not the one paying for my license, except via externalities and frustration.

Permalink 1 Comment

Next page »

Follow

Get every new post delivered to your Inbox.