The Mathworks don’t even know how to look up functions in their own global namespace.

May 30, 2010 at 12:59 am (lying documentation, my kingdom for a namespace, thirty misfeature pileup, unexpressive language)

MATLAB went decades without having any mechanism resembling function namespaces. If you downloaded code from two authors for use in a single project, and both toolboxes defined a function of the same name, well, you were in for a headache.

Case in point: the Psychtoolbox provides a function called ‘RandSample.’ The Statistics toolbox, on the other hand, provides its own, somewhat different function named ‘randsample.’ If you wrote some code using the Statistics toolbox ‘randsample’, and the person trying to use it has Psychtoolbox installed, or vice versa, you are in for problems. The code fails inside of whatever ‘randsample’ you happen to have installed; if you are reasonably quick at deciding the problem is not with randsample itself, you jump up the stack trace a step and look at how randsample was called. (Which reminds me how MATLAB’s default behavior on an error is to print out the stack starting with the root… and if the stack is too deep, it chops off the top of the stack. Whereas everyone who’s remotely sane needs to see the stack starting at the top and working downwards if there is to be any hope of debugging.)

Anyway, you look at this code you downloaded that calls ‘randsample,’ and you try to work out what it’s doing. You compare the code’s usage with the documentation, which you access, perhaps, by typing ‘help randsample.’ (Oh, but as we will discover below, there’s a delicious way MATLAB will screw you over if you try to read the doc for the function that is now failing.) Presuming you are reading the right documentation string, you try to work out what the calling code is trying to do. Only belatedly do you check ‘which randsample’ and discover that there are two of them. (Or not: if the functions being confused are capitalized differently, and you are trying to use bad old code written before case sensitivity, you are out of luck, because ‘which’ has no option to ignore case. And that is only the tip of the iceberg…) You realize, finally, that the code you’re trying to run is calling the wrong ‘randsample.’ And then you are presented with a dilemma: do you reorder your path to use the correct function? But what if there is other code using the one that is already on top of the path? Or do you choose which is better, or more popular, rename the other one to ‘randsample_bad,’ and go on a global grep-and-replace to find out which of your library functions are using the worse one, and change the name in all the places they use it? If you take that path, the next time you upgrade to a new version of something you changed, your edit is going to disappear. Maybe you start maintaining patches against the upstream versions of the code you have to change, just because you are unfortunate enough to want to use more than one person’s toolbox. That is, if the original author even knows what a version control system is, so that they can provide an “upstream version.” But there’s no hope of that: you didn’t get the code from a version control repository, you got it from the Mathworks File Exchange.
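Since ‘which’ will not ignore case, a case-insensitive lookup has to be approximated by hand. Here is a minimal sketch, assuming a hypothetical helper called find_name_collisions (it is not part of MATLAB or any toolbox). It only looks at plain .m files in the directories on the search path, so built-ins, MEX and P-files, private folders, class folders and packages are all ignored.

```matlab
function hits = find_name_collisions(name)
% FIND_NAME_COLLISIONS  List .m files on the path matching NAME, ignoring case.
%   Hypothetical helper for illustration; not part of MATLAB.
    dirs = regexp(path, pathsep, 'split');    % every directory on the search path
    hits = {};
    for k = 1:numel(dirs)
        listing = dir(fullfile(dirs{k}, '*.m'));
        for m = 1:numel(listing)
            [~, base] = fileparts(listing(m).name);   % file name without extension
            if strcmpi(base, name)                    % case-insensitive comparison
                hits{end+1} = fullfile(dirs{k}, listing(m).name); %#ok<AGROW>
            end
        end
    end
end
```

Calling find_name_collisions('randsample') on a setup like the one described here would be expected to list both the Statistics toolbox file and the Psychtoolbox ‘RandSample.m’, in path order.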

So if you have any large number of MATLAB functions, you have a headache around ensuring that no two functions can have the same name. The Mathworks File Exchange even includes a tool that inspects your uploaded code to see if the names you’ve given your functions are unique enough; you get downgraded for ‘collisions.’ (It was too hard to fix the language to make it more usable and to make code sharing easier, you see, so they fixed the website to make it less usable and to make code sharing harder.) This exerts a selective pressure on the ecosystem of user-written functions, with a couple of effects. The first effect is that MATLAB function names tend to be extremely cryptic; you end up with names like (and this is just from Mathworks toolboxes and not user contributions) ‘etfe,’ ‘tf2ca,’ and ‘cumtrapz’. The second effect is that people pack as many completely different behaviors into the same function as they can think of, just to avoid having a second function/file to drag around. The function’s behavior changes arbitrarily based on how many arguments are given, how many output arguments are taken, the classes of the inputs, and so on; helpfully there is an InputParser class that is written with sufficient unnecessary generality to preserve the completely arbitrary and pattern-free manner in which functions parse their arguments. Very frequently people wind up building functions that behave differently for an input of size 1 than they would if you logically extrapolated the behavior on larger array inputs down to an array of size one. The lack of namespacing is thus a contributing factor to the innumerable problems with small numbers that MATLAB users have to deal with.
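To make the size-one problem concrete, here is a toy sketch of the pattern. The function toy_sample is hypothetical and written only for illustration; it is not the code of any toolbox. It samples with replacement, and it assumes randi is available.

```matlab
function y = toy_sample(pop, k)
% TOY_SAMPLE  Draw K values from POP with replacement (hypothetical example).
%   A scalar POP is silently reinterpreted as the range 1:POP, which is
%   exactly the kind of special case that breaks size-one inputs.
    if isscalar(pop)
        pop = 1:pop;                      % scalar means "the integers 1..pop"
    end
    y = pop(randi(numel(pop), 1, k));     % k random picks from the population
end
```

With this definition, toy_sample([7 8 9], 2) draws from the list 7, 8, 9 and toy_sample([7 8], 2) draws from 7 and 8, but toy_sample(7, 2) draws from 1 through 7 rather than returning 7 twice. Code that shrinks a list at run time works until the list reaches length one.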

Up until very recently, there was simply no way for your code to specify that it wanted the Statistics toolbox’s ‘randsample’ instead of some imposter. Now lately, (which is to say, about 15 to 20 years too lately compared to competing languages,) TMW has introduced “packages” which try to break up the global function namespace. (Technically it’s not a global function namespace, it’s a global function search path, which is something that is more complicated without being more helpful.) We’ll see if packages help going forward into the future. They certainly don’t help with all the existing package-less code out there.
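For what it’s worth, here is a minimal sketch of what a package looks like in use. The package name ‘mylab’ and its tiny randsample are hypothetical, made up for this example; the script builds the package in a temporary folder so that it can be run as-is.

```matlab
% Build a throwaway package folder; the leading '+' marks it as a package.
pkgRoot = tempname;
mkdir(fullfile(pkgRoot, '+mylab'));

% Write +mylab/randsample.m (hypothetical; samples with replacement).
fid = fopen(fullfile(pkgRoot, '+mylab', 'randsample.m'), 'w');
fprintf(fid, 'function y = randsample(list, n)\n');
fprintf(fid, 'y = list(randi(numel(list), 1, n));\n');
fclose(fid);

% Only the parent of the +mylab folder goes on the path.
addpath(pkgRoot);

% The qualified name says which randsample is meant, whatever else is on the path.
y = mylab.randsample([10 20 30], 5);
```

The catch is that every caller has to spell out mylab.randsample (or import it), which is why this does nothing for code that was written against the bare name.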

Oh, but there are more wrinkles! At some point TMW decided to switch from a case insensitive global function namespace to a case sensitive one. That’s nice, I suppose, in that one function can be called ‘randsample’ and the other can be called ‘RandSample’ and they are technically distinct. Still, if you shipped your code to someone who had one but not the other, they wouldn’t get an obvious error like ‘no such function’ but a cryptic warning about case insensitivity being deprecated that they’re all too used to seeing and ignoring, followed by a failed program because the imposter randsample got called anyway.

Which brings me to today’s problem. When TMW introduced case sensitivity among function names, they didn’t even fix their own functions to reflect the change. Take, for instance, ‘help’ and ‘doc.’ The help for ‘help’ says (in R2009b),

HELP FUN displays a description of and syntax for the function FUN.

When FUN is in multiple directories on the MATLAB path, HELP displays
information about the first FUN found on the path.

Great! Let’s say I’m having a problem with some code that’s calling ‘randsample.’ Which is the first ‘randsample’ on the path?

K>> which randsample
/Applications/MATLAB_R2009b.app/toolbox/stats/randsample.m

Great! How do I use it?

K>> help randsample
  x=RandSample(list,[dims])

  Returns a random sample from a list. The optional second argument may be
  used to request an array (of size dims) of independent samples. E.g.
  RandSample(-1:1,[10,10]) returns a 10x10 array of samples from the list
  -1:1.  RandSample is a quick way to generate samples (e.g. visual noise)
  ...

Wait, visual noise? Is that really the help string for randsample?

K>> system(['head -10 ' which('randsample')])
function y = randsample(s, n, k, replace, w)
%RANDSAMPLE Random sample, with or without replacement.
%   Y = RANDSAMPLE(N,K) returns Y as a vector of K values sampled
%   uniformly at random, without replacement, from the integers 1:N.

Well, 'help' is showing me the wrong help! Maybe I'm too used to using 'help' but it's old and everyone uses 'doc' now. What's the documentation for Psychtoolbox's 'RandSample'?

K>> which RandSample
/Users/peter/eyetracking/library/osx/Psychtoolbox/PsychProbability/RandSample.m
K>> doc RandSample

At this point I am presented with a doc window about... the Statistics Toolbox 'randsample.' Which, again, is not what I asked for.
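One way around both tools, in the same spirit as the ‘head’ call above but without shelling out, is to read the comment block straight from whichever file ‘which’ resolves. This is only a sketch: help_from_file is a hypothetical helper, and it assumes the help text is the first contiguous run of comment lines in the file.

```matlab
function txt = help_from_file(name)
% HELP_FROM_FILE  Return the first comment block of the file WHICH finds for NAME.
%   Hypothetical helper; it bypasses the lookup done by HELP and DOC.
    file = which(name);
    if isempty(file) || ~exist(file, 'file')
        % Built-ins and variables do not resolve to a readable file.
        error('help_from_file:notFound', 'No readable file for "%s".', name);
    end
    fid = fopen(file, 'r');
    closer = onCleanup(@() fclose(fid));   % close the file even on error
    lines = {};
    started = false;
    line = fgetl(fid);
    while ischar(line)                     % fgetl returns -1 at end of file
        trimmed = strtrim(line);
        if strncmp(trimmed, '%', 1)
            started = true;
            lines{end+1} = trimmed(2:end); %#ok<AGROW>  drop the leading '%'
        elseif started
            break;                         % the comment block has ended
        end
        line = fgetl(fid);
    end
    txt = sprintf('%s\n', lines{:});
end
```

Because it asks ‘which’ for the file and nothing else, whatever it prints at least belongs to the function that would actually be called.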

If The Mathworks can't correctly navigate the stupid function namespace they created for themselves, when they try to implement basic things used every five minutes, like 'help' and 'doc,' how can they expect their users to tolerate it?

6 Comments

  1. Matlab doesn’t know how to draw one ball out of an urn containing one ball. « Abandon MATLAB said,

    [...] that does that? Oh yes, randsample. After looking through the help for randsample (a task that is more difficult than it sounds) we might write: j = randsample(nextStates, 1, 1, [...]

  2. Jason Moore said,

    I was using ‘what’ to check if there were mat files in my directory. My directory is named ‘CalibrationData’. Turns out there is a function somewhere on the Matlab path called ‘@calibrationdata’, and the following code kept failing even though I had mat files in the directory:

    dirinfo = what('CalibrationData');
    matfiles = dirinfo.mat;
    if ~isempty(matfiles)
        disp('Yes there are mat files!')
    end

    Matlab name space issues are very annoying…

  3. Dan said,

    I’ve only been using MATLAB for a year and initially couldn’t believe namespace stupidity. I defined my own dummy “length.m” in order to test shadowing. When I ran it I got a really strange answer. Well that was because I had a variable named length in my workspace created by a third party script (some people have yet to learn the magical word “function”). I was about to jump out the window.

    However my favourite has to be things similar to:
    for i = 1:y_size %i is a builtin. You just shadowed it.
    for j = 1:x_size %Guess what j is by default?
    This code seems to be all over the place, including in Mathworks product docs (e.g. parfor).
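The effect of the loop variables described in this comment can be seen in a few lines. This is a minimal sketch meant to be run as a script; the names z1, z2 and z3 are made up for the example.

```matlab
clear i                 % make sure i refers to the built-in imaginary unit
z1 = 2 + 3*i;           % built-in i: z1 is the complex number 2 + 3i
for i = 1:3             % i is now an ordinary loop variable
end
z2 = 2 + 3*i;           % i is the variable (value 3): z2 is the real number 11
z3 = 2 + 3i;            % the literal suffix form does not look up the variable
```

The same expression gives a complex number before the loop and a real one after it, with no warning in between. Writing complex constants with the 3i suffix avoids the lookup entirely.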

  4. cellocgw said,

    Similar: all of us use “foo” and “bar” for throwaway variables. Even Matlab does here and there in the documentation examples. But, (surprise), “bar” is the name of a function which draws a barchart. Which leads me to one of my favorite rants: use of “(” both to contain args of a function and to contain indices of a matrix. There’s a very good reason R uses the “[” and “[[” operators for the latter and reserves “(” for function args.

  5. Gunnar Farnebäck said,

    > Up until very recently, there was simply no way for your code to specify that it wanted the Statistics toolbox’s ‘randsample’ instead of some imposter.

    Actually there was, sort of, but there’s nothing pretty about it. It goes something like this (variations possible):
    1. Save your pwd.
    2. Ask which about the location of some other Statistics toolbox function that you trust more not to be shadowed.
    3. cd to that directory.
    4. randsample_func = @randsample;
    5. cd back to your saved path.
    6. Issue calls to randsample through the randsample_func function pointer.

    Slightly too insane for use in ordinary code, unless you’re desperate, but workable when you write your own impostor function and want to fall back to the original in certain cases.
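The six steps above can be sketched roughly as follows. The helper handle_from_toolbox is hypothetical, and str2func stands in for the literal @randsample of step 4. That the handle stays bound to the file in the temporary directory after the cd back is taken from the comment and assumed here, not guaranteed.

```matlab
function fh = handle_from_toolbox(name, trustedSibling)
% HANDLE_FROM_TOOLBOX  Get a handle to NAME from the folder of TRUSTEDSIBLING.
%   Hypothetical helper. TRUSTEDSIBLING is a function in the same folder
%   that is assumed not to be shadowed.
    siblingFile = which(trustedSibling);         % step 2: locate the toolbox folder
    if isempty(siblingFile)
        error('handle_from_toolbox:notFound', ...
              'Cannot find "%s" on the path.', trustedSibling);
    end
    targetDir = fileparts(siblingFile);
    savedDir = pwd;                              % step 1: save the current folder
    restore = onCleanup(@() cd(savedDir));       % step 5: cd back, even on error
    cd(targetDir);                               % step 3: current folder wins over the path
    fh = str2func(name);                         % step 4: create the handle here
end
```

Step 6 is then a matter of calling through the handle, for instance randsample_func = handle_from_toolbox('randsample', 'normpdf'), where picking normpdf as the trusted neighbour is itself an assumption about what lives in the same folder.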

    • crowding said,

      Yes, I tried implementing this for the laughs when someone mentioned it on comp.soft-sys.matlab like 8 years ago. Do you have any idea how long it takes for MATLAB to rehash the entire search path every time you change it? Worthless if you’re trying to get any computation done.

      “No way” is an elision of “no practical way.” After all, every language can simulate Turing machines and hence every other language. The only difference is how well.
