Some folks at Mathworks read this blog. I know because I get referrals from Mathworks internal wikis and bug trackers.
I also know because I’ve seen a few documentation changes. For example, you guys have updated the documentation for “sparse” to reflect that it adds together overlapping indices rather than overwriting them like normal arrays; you updated the docs on “randsample” to reflect that it draws random samples from arrays only if they are at least 2 elements long; you even updated the docs for “getframe” to clarify that you need to turn off the fucking screen saver and walk away from the computer like it’s 1992.
Ahem. You guys are missing the point. Let me repeat from before.
It’s not that MATLAB’s behavior isn’t documented; it’s that the behavior is stupid, and leads to errors when your users (reasonably) assume that behavior would be consistent over array dimensions, or that behavior would be consistent between different but related functions, or that behavior would be consistent within a single function, or that things would just not be busted in general.
I liked the previous, unmodified documentation. It reflected the intentions of your programmers; clearly they were trying to implement the reasonable and useful things they described. It’s too bad you don’t have the wherewithal to finish the job you clearly meant to do and fix the stupid behavior.
GETFRAME returns a movie frame. The frame is a snapshot of the current axis.
No it bloody well isn’t.
GETFRAME (at least on OS X) captures a snapshot of the portion of the display screen that might or might not have the current axis on top. If you have a long-running script to produce an animation, you might be tempted to switch over to read email or a PDF while your animation renders. In which case, when all is finished, you’ll find yourself a nice AVI file full of your email and none of your calculations. You want to render animations using GETFRAME? Better have a single-task computer to dedicate to it, and make sure to turn off the screen saver. What is this, 1992?
MATLAB went decades without having any mechanism resembling function namespaces. If you downloaded code from two authors for use in a single project, and both toolboxes defined a function of the same name, well, you were in for a headache.
Case in point: the Psychtoolbox provides a function called ‘RandSample.’ The Statistics toolbox, on the other hand, provides its own, somewhat different function named ‘randsample.’ If you wrote some code using the Statistics toolbox “randsample”, and the person trying to use it has Psychtoolbox installed, or vice versa, you were in for problems. The code fails inside of whatever ‘randsample’ you happen to have installed; if you are reasonably quick at deciding the problem is not with randSample itself, you jump up the stack trace a step and look at how randsample was called. (Which reminds me how MATLAB’s default behavior on an error is to print out the stack starting with the root… and if the stack is too deep, it chops off the top of the stack. Whereas everyone who’s remotely sane needs to see the stack starting at the top and working downwards if there is to be any hope of debugging.)
Anyway, you look at this code you downloaded that calls ‘randsample,’ and you try to work out what it’s doing. You compare the code’s usage with the documentation, which you access, perhaps, by typing ‘help randsample.’ (Oh, but as we will discover below, there’s a delicious way MATLAB will screw you over if you try to read the doc for the function that is now failing.) Presuming you are reading the right documentation string, you try to work out what the calling code is trying to do. Only belatedly do you check ‘which randsample’ and discover that there are two of them (or not, in case the functions being confused are capitalized differently and you are trying to use bad old code written before case sensitivity; ‘which’ has no option to ignore case, which is only the tip of the iceberg…). You realize, finally, that the code you’re trying to run is calling the wrong ‘randsample.’ And then you are presented with a dilemma: do you reorder your path to use the correct function? But what if there is other code using the one that is already on top of the path? Do you choose which is better, or more popular, rename the other one to ‘randsample_bad,’ and go on a global grep-and-replace to find out which of your library functions are using the worse one, and change the name in all the places they use it? If you take that path, the next time you upgrade something you changed, your edit is going to disappear. Maybe you start maintaining patches against the upstream versions of the code you have to change, just because you are unfortunate enough to want to use more than one person’s toolbox. That assumes the original author even knows what a version control system is, so that they can provide an “upstream version.” But there’s no hope of that: you didn’t get the code from a version control repository, you got it from the Mathworks File Exchange.
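If you want to find these collisions before they find you, including the ones that differ only in capitalization, you can scan the path yourself. MATLAB’s `which name -all` lists every file with exactly that name, but nothing built in ignores case. The sketch below is my own, not anything TMW ships: it only looks at plain `.m` files in the directories returned by `path`, so it misses built-ins, MEX and P-files, and anything inside `+package`, `@class` or `private` folders.

```matlab
% Sketch: report .m files on the search path whose names collide when
% case is ignored. Only scans the directories listed by path(); it does
% not look at built-ins, MEX/P-files, or +package, @class, private folders.
dirs = regexp(path, pathsep, 'split');

names = {};   % lowercased function names, one entry per file found
where = {};   % full path of the file each name came from
for d = 1:numel(dirs)
    files = dir(fullfile(dirs{d}, '*.m'));
    for f = 1:numel(files)
        [~, base] = fileparts(files(f).name);
        names{end+1} = lower(base);                       %#ok<AGROW>
        where{end+1} = fullfile(dirs{d}, files(f).name);  %#ok<AGROW>
    end
end

% Group files by lowercased name and count how many share each name.
[u, ~, idx] = unique(names);
counts = accumarray(idx(:), 1);

% Print every name defined more than once, followed by each file defining it.
for c = find(counts > 1)'
    fprintf('%s\n', u{c});
    fprintf('    %s\n', where{idx == c});
end
```

On a path with both the Statistics toolbox and Psychtoolbox, ‘randsample’ should show up with both files listed under it.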
So if you have any large number of MATLAB functions, you have a headache around ensuring that no two functions can have the same name. The Mathworks File Exchange even includes a tool that inspects your uploaded code to see if the names you’ve given your functions are unique enough; you get downgraded for ‘collisions.’ (It was too hard to fix the language to make it more usable and to make code sharing easier, you see, so they fixed the website to make it less usable and code sharing harder.) This exerts a selective pressure on the ecosystem of user-written functions, with a couple of effects. The first effect is that MATLAB function names tend to be extremely cryptic; you end up with names like (and this is just from Mathworks toolboxes and not user contributions) ‘etfe,’ ‘tf2ca,’ and ‘cumtrapz’. The second effect is that people pack as many completely different behaviors into the same function as they can think of, just to avoid having a second function/file to drag around. The function’s behavior changes arbitrarily based on how many arguments are given, how many output arguments are taken, the classes of the inputs, and so on; helpfully there is an InputParser class that is written with sufficient unnecessary generality to preserve the completely arbitrary and pattern-free manner in which functions parse their arguments. Very frequently people wind up building functions that behave differently for an input of size 1 than they would if you logically extrapolated the behavior on larger array inputs down to an array of size one. The lack of namespacing is thus a contributing factor to the innumerable problems with small numbers that MATLAB users have to deal with.
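The same flavor of size-one special case turns up in core functions too, not just in user code. A small illustration of my own, using the built-in `sum`, whose default dimension is the first non-singleton one: a matrix that happens to have one row gets summed in a different direction than a taller matrix would be.

```matlab
A = [1 2 3; 4 5 6];

colsums = sum(A);          % sums down each column: [5 7 9]
onerow  = sum(A(1, :));    % one-row input: sums across the row instead, giving 6

% Passing the dimension explicitly restores the behavior you would
% extrapolate from larger inputs: the column sums of a 1-by-3 matrix.
onerow_cols = sum(A(1, :), 1);   % [1 2 3]
```

Defensive code ends up passing the dimension argument everywhere, precisely because the default changes meaning when an array dimension shrinks to 1.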
Up until very recently, there was simply no way for your code to specify that it wanted the Statistics toolbox’s ‘randsample’ instead of some imposter. Now lately, (which is to say, about 15 to 20 years too lately compared to competing languages,) TMW has introduced “packages” which try to break up the global function namespace. (Technically it’s not a global function namespace, it’s a global function search path, which is something that is more complicated without being more helpful.) We’ll see if packages help going forward into the future. They certainly don’t help with all the existing package-less code out there.
Oh, but there are more wrinkles! At some point TMW decided to switch from a case-insensitive global function namespace to a case-sensitive one. That’s nice, I suppose, in that one function can be called ‘randsample’ and the other can be called ‘RandSample’ and they are technically distinct. Still, if you shipped your code to someone who had one but not the other, they wouldn’t get an obvious error like ‘no such function’ but a cryptic warning about case insensitivity being deprecated that they’re all too used to seeing and ignoring, followed by a failed program because the imposter randsample got called anyway.
Which brings me to today’s problem. When TMW introduced case sensitivity among function names, they didn’t even fix their own functions to reflect the change. Take, for instance, ‘help’ and ‘doc.’ The help for ‘help’ says (in R2009b),
HELP FUN displays a description of and syntax for the function FUN. When FUN is in multiple directories on the MATLAB path, HELP displays information about the first FUN found on the path.
Great! Let’s say I’m having a problem with some code that’s calling ‘randsample.’ Which is the first ‘randsample’ on the path?
K>> which randsample
/Applications/MATLAB_R2009b.app/toolbox/stats/randsample.m
Great! How do I use it?
help randsample
  x=RandSample(list,[dims])
  Returns a random sample from a list. The optional second argument may be used to request an array (of size dims) of independent samples. E.g. RandSample(-1:1,[10,10]) returns a 10x10 array of samples from the list -1:1. RandSample is a quick way to generate samples (e.g. visual noise) ...
Wait, visual noise? Is that really the help string for randsample?
K>> system(['head -10 ' which('randsample')])
function y = randsample(s, n, k, replace, w)
%RANDSAMPLE Random sample, with or without replacement.
%   Y = RANDSAMPLE(N,K) returns Y as a vector of K values sampled
%   uniformly at random, without replacement, from the integers 1:N.
Well, 'help' is showing me the wrong help! Maybe I'm too used to using 'help' but it's old and everyone uses 'doc' now. What's the documentation for Psychtoolbox's 'RandSample'?
K>> which RandSample
/Users/peter/eyetracking/library/osx/Psychtoolbox/PsychProbability/RandSample.m
K>> doc RandSample
At this point I am presented with a doc window about... the Statistics Toolbox 'randsample.' Which, again, is not what I asked for.
If The Mathworks can't correctly navigate the stupid function namespace they created for themselves, when they try to implement basic things used every five minutes, like 'help' and 'doc,' how can they expect their users to tolerate it?
May 29, 2010 at 3:14 am (lying documentation, matlab is bad at math, Simple trivia about its fundamental behavior that you probably can't answer)
The documentation for sparse says:
S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an m-by-n sparse matrix such that S(i(k),j(k)) = s(k).
This claim is, plainly, false. That is to say, there are easy-to-find values of i, j and s where
S = sparse(i,j,s,m,n,nzmax);
produces numerically wildly different results from
X = zeros(m, n);
X(sub2ind([m n], i(:), j(:))) = s(:);
and where
S = sparse(i,j,s,m,n,nzmax);
if ~all(S(sub2ind(size(S), i(:), j(:))) == s(:))
    disp('documentation is lying!')
end
produces the expected result (i.e. the result you expect from its being written on this blog).
Can you figure out under what condition sparse behaves differently from the promises the documentation makes?
Hint: I’m populating a stochastic matrix that encodes dynamics over a discretized state space. That space has boundary conditions, so that while in the middle of the state space you might diffuse in N dimensions to your 2^N nearest grid points, at the edges of the state space you are stuck at the wall and have to reflect onto fewer points.
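Here is the hint boiled down to one dimension. This is a sketch under my own assumptions, not the original model: a 5-point grid, moves of -1, 0 and +1 each with probability 1/3, and walls handled by clamping. At an edge, two of the three moves land on the same grid point, so the same (from, to) pair shows up twice.

```matlab
n = 5;                      % number of grid points (assumed)
moves = [-1 0 1];           % each move has probability 1/numel(moves)

% One (from, to, p) triplet per state and move.
[from, mv] = ndgrid(1:n, moves);
to = min(max(from + mv, 1), n);          % clamp at the walls
p = ones(size(from)) / numel(moves);

% sparse adds together triplets that share the same (from, to) pair...
P_sparse = sparse(from(:), to(:), p(:), n, n);

% ...while indexed assignment keeps only one of them.
P_dense = zeros(n, n);
P_dense(sub2ind([n n], from(:), to(:))) = p(:);

disp(full(sum(P_sparse, 2))')   % every row sums to 1
disp(sum(P_dense, 2)')          % rows 1 and n sum to 2/3
```

For a stochastic matrix the summing behavior happens to be the one you want. Still, the "sparse" and "dense" spellings of the same construction disagree exactly at the walls.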
For extra credit, comment on the ease of converting code using non-sparse matrices to code using sparse matrices, given that it won’t even be numerically the same to begin with.
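If you do need the two versions to agree while converting, here are two workarounds. These are sketches, and the triplets are made up. `accumarray` gives a dense array the summing behavior of `sparse`. To go the other way, keep only the last occurrence of each (i, j) pair before calling `sparse`, which mimics indexed assignment, where the later write ends up in the array.

```matlab
% Made-up triplets in which the pair (2, 3) appears twice.
i = [1 2 2 3];
j = [1 3 3 2];
s = [10 20 30 40];
m = 3; n = 3;

% Dense array with sparse's semantics: accumarray sums repeated subscripts.
X_summed = accumarray([i(:) j(:)], s(:), [m n]);        % X_summed(2,3) is 50

% Sparse matrix with indexed assignment's semantics: drop all but the
% last occurrence of each (i, j) pair before building it.
lin = sub2ind([m n], i, j);
[~, last] = unique(lin, 'last');
S_overwrite = sparse(i(last), j(last), s(last), m, n);  % S_overwrite(2,3) is 30
```

Either way, you now have to decide which semantics you meant, in code that never had to say before.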