You’re fixing the wrong thing.

March 7, 2012 at 3:02 am (doing it wrong, lying documentation, trouble with small numbers)

Some folks at MathWorks read this blog. I know because I get referrals from MathWorks internal wikis and bug trackers.

I also know because I’ve seen a few documentation changes. For example, you guys have updated the documentation for “sparse” to reflect that it adds together overlapping indices rather than overwriting them like normal arrays; you updated the docs on “randsample” to reflect that it draws random samples from arrays only if they are at least 2 elements long; you even updated the docs for “getframe” to clarify that you need to turn off the fucking screen saver and walk away from the computer like it’s 1992.

Ahem. You guys are missing the point. Let me repeat from before.

It’s not that MATLAB’s behavior isn’t documented; it’s that the behavior is stupid, and leads to errors when your users (reasonably) assume that behavior would be consistent over array dimensions, or that behavior would be consistent between different but related functions, or that behavior would be consistent within a single function, or that things would just not be busted in general.

I liked the previous, unmodified documentation. It reflected the intentions of your programmers; clearly they were trying to implement the reasonable and useful things they described. It’s too bad you don’t have the wherewithal to finish the job you clearly meant to do and fix the stupid behavior.


Matlab doesn’t know how to draw one ball out of an urn containing one ball.

June 2, 2010 at 12:33 pm (matlab is bad at math, trouble with small numbers)

You are simulating a discrete Markov process. You have a left-stochastic(*) matrix X where X(j,i) gives the probability of transitioning from state i to state j. There are a lot of states, and most of the state transition probabilities are zero, so to fit your system into memory, you have built X as a sparse matrix.

You have the index of the current state, i; you want to simulate the next step and randomly draw a value for the index of the next state, j.

Easy enough to begin with: we find the indices and probabilities of the potential next states like this:

nextStates = find(X(:,i));
weights = X(nextStates,i);

So now we need to generate a random draw from a discrete distribution, where we know the probability of each value of the distribution. Isn’t there a function that does that? Oh yes, randsample. After looking through the help for randsample (a task that is more difficult than it sounds) we might write:

j = randsample(nextStates, 1, 1, full(weights));

Now a really easy question: How and why does this break? Answer below the fold. (hint: look at the category of the post)

(*) While almost all the mathematical literature uses right-stochastic matrices, in MATLAB the sparse matrices are much slower if you use them that way.
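One way to sidestep the question entirely is to skip randsample and draw from the cumulative weights yourself. This is only a sketch, and it assumes the trouble is the one the post title points at: randsample documents that a scalar first argument n means "sample from 1:n", which is exactly what a one-element nextStates looks like. The 3-state matrix below is made up for illustration.

```matlab
% Hypothetical 3-state chain; column i holds the probabilities of leaving state i.
X = sparse([2 3 3 1], [1 1 2 3], [0.25 0.75 1 1], 3, 3);
i = 2;                                 % state 2 has exactly one successor (state 3)

nextStates = find(X(:,i));
weights = full(X(nextStates,i));

% Inverse-CDF draw: take the first bin whose cumulative weight exceeds u.
c = cumsum(weights);
u = rand() * c(end);                   % rand() lies in (0,1), so u < c(end)
j = nextStates(find(u < c, 1, 'first'));
```

Nothing here cares whether nextStates has one element or a thousand, which is the whole point.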

Read the rest of this entry »


Graphics puzzle.

August 20, 2009 at 10:50 pm (matlab is bad at math, powerfully stupid graphics, Simple trivia about its fundamental behavior that you probably can't answer, trouble with small numbers)

Hey, I basically wrote the entire answer to the previous indexing puzzle as a comment, in response to a request for a hint from a MathWorks employee. I didn’t dig up any examples for it. That’s not for lack of ready examples in my code; it’s that I have too many, and the prospect of digging through all the pissed-off code comments I’ve written every time I ran into the same dumb issue doesn’t make me feel good about my productivity in the years I’ve been using MATLAB. So I’ll dig up examples sooner or later.

Funny that I started this blog when I noticed I had reached some critical mass (a blogsweight) of code comments detailing stupid things MATLAB does. Funny because I haven’t even started looking through those comments.

Tonight I was just trying to make a plot on log-log axes — ironically, I was trying to clean and tighten up some graphics code in my archives to make it as fair a comparison as possible for when I go on to show the same graph produced in R using much less code. I wanted two scales and sets of tickmarks on the graph, so I was plotting on two overlapping axes created via plotyy. Some data points on one axis, some lines on the other axis. Something like this:

[ax, h1, h2] = plotyy([1 8], [10/3 90], [2 4], [10 30]);
set(h2, 'Marker', '.', 'LineStyle', 'none');
set(ax(1), 'Xscale', 'log', 'Yscale', 'log');
set(ax(2), 'Xscale', 'log', 'Yscale', 'log'...
    , 'XLim', get(ax(1), 'Xlim'), 'YLim', get(ax(1), 'Ylim'));

Now, correct me if I’m mistaken on the mathematics of logarithms, which I’m not, but if two overlapping axes are both on a log-log scale, and have the same limits — and you can verify that the axes are set up the same–

>> cellfun(@(x) isequal(get(ax(1), x), get(ax(2), x)), {'Xlim', 'Ylim', 'Xscale', 'Yscale', 'XLimMode', 'YLimMode'})
ans =
     1     1     1     1     1     1

– for these log-log axes with identical limits, points (2,10) and (4, 30) ought to lie on top of the line from (1, 10/3) to (8, 90), right?

crap

I’ll give you some time to work out what the problem is.

Or switch to a system that actually plots what you tell it to plot.

If I had wanted crap graphics that bore no relation to the input data I’d use Excel.


Programmatically creating a struct

August 12, 2009 at 2:14 am (crap data structures, trouble with small numbers, unexpressive language)

Say you are parsing a file and you read a list of column names, then a list of values to put into those columns. Simple enough: obviously you are going to put them into a 1×1 struct array, perhaps using the ‘struct’ function. Given a cell array of field names snames and a cell array of values svals to associate with those field names, we can just use struct() to cook up a struct:

arglist = {snames{:};svals{:}}
s = struct(arglist{:})

Goodness, there are a lot of MATLAB-isms packed into that. To work out how it works you need to know: that x{:} extracts cell array contents into a “comma-separated list,” which is not an actual list like a data structure you can access (there aren’t any of those) but a syntax that unpacks a cell array or struct field access (but not a regular array or function call) into a place where the parser otherwise expects expressions separated by commas; that writing out a literal like {x, y, z; a, b, c} lays the elements out row by row BUT indexing with : reads them back in column-major order, which is what interleaves the names with the values; and so on. Let’s ignore the fact that reasonable goddamn languages like Python or R write the above simply as s = dict(zip(snames, svalues)) or s = svalues; names(s)<-snames, respectively, and press on. Did you notice the condition in which the above will fail and break horribly?
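To see the interleaving concretely, here is a small sketch with made-up two-element lists. The variable flat and the sample values are mine, purely for illustration; they are not part of the original recipe.

```matlab
snames = {'a', 'b'};
svals  = {1, 2};

% Written row by row: names in row 1, values in row 2 (a 2x2 cell array).
arglist = {snames{:}; svals{:}};

% Read back with (:), i.e. column by column: name, value, name, value.
flat = arglist(:)';                  % {'a', 1, 'b', 2}

% arglist{:} expands in that same column-major order.
s = struct(arglist{:});              % same as struct('a', 1, 'b', 2)
```

So the whole trick rests on writing in one order and reading in the other.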

You do remember how struct() works, right? You give the field name, then the value, then the second field name, then the second value, the good old ersatz broken named arguments pattern. And the GREAT thing about the broken ersatz named arguments pattern is that every function that implements it does the input parsing just a LITTLE BIT differently. (After writing dozens of name-value parsers, each working a little bit differently, MathWorks went ahead and came up with an inputParser class which gives you all the flexibility to continue making every argument list parser work just a little different.)

Take struct(), for instance. (Please!). Just as an exercise, try making this struct by writing an invocation of struct():

s =
    a: 1
    b: 2

The answer to this exercise is:

s = struct('a', 1, 'b', 2)

But your data are complicated: Some of the structure fields are themselves cell arrays. Now try using struct to make this struct with a singleton cell array:

s =
    a: 1
    b: {[2]}

Got that? Let’s try this one with an empty cell array.

s =
    a: 1
    b: {}

And this one with multiple elements in a cell in a member of the structure:

s =
    a: 1
    b: {'foo', 'bar'}

If you’ve tried this and were surprised, read on.

%Apart from cell2struct, MATLAB's builtin function struct() seems to
%be the only simple way to efficiently programmatically initialize a
%structure. But the behavior of struct() is not simple. It has
%*astonishing* behavior on cell array arguments.
%
%>> struct('a', 1)
%ans =
%    a: 1
%>> struct('a', {1})
%ans =
%    a: 1
%>> struct('a', {1, 2})
%ans =
%1x2 struct array with fields:
%    a
%>> struct('a', {})
%ans =
%0x0 struct array with fields:
%    a
%
% There are two overlapping behaviors: the behavior when the value to put
% in a field is a cell array, and the behavior when it is any other type. This
% makes it tricky to use struct() to make a scalar struct whose field values
% are themselves cell arrays:
%
% >> struct('a', {1, 2}, 'b', {3, 4, 5})
%??? Error using ==> struct
%Array dimensions of input 4 must match those of input 2 or be scalar.
%
% To achieve the intended effect, you must double-wrap such arguments:
%
%>> struct('a', {{1, 2}}, 'b', {{3, 4, 5}})
%ans =
%    a: {[1]  [2]}
%    b: {[3]  [4]  [5]}
%
% Further complicating this is that struct() does 'scalar expansion' of
% non-cell or singleton cell arguments
%
%>> struct('a', {1, 2}, 'b', {3}, 'c', [2 4 5])
%ans =
%1x2 struct array with fields:
%    a
%    b
%    c
%>> ans.b
%ans =
%     3
%ans =
%     3
%>> ans.c
%ans =
%     2    4    5
%ans =
%     2    4    5
%
%From this we can conclude that the only consistent, general way to use
%struct() is to wrap all value arguments in cell arrays.
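Applied to the three exercises above, the double-wrapping rule would look something like this. It is a sketch; s1, s2 and s3 are just my labels for the three cases.

```matlab
% The outer braces are eaten by struct(); the inner ones are the field value.
s1 = struct('a', 1, 'b', {{2}});              % b: {[2]}
s2 = struct('a', 1, 'b', {{}});               % b: {}
s3 = struct('a', 1, 'b', {{'foo', 'bar'}});   % b: {'foo', 'bar'}

% Each result is a 1x1 struct, not a struct array.
disp([numel(s1), numel(s2), numel(s3)]);
```

Drop the outer braces in the second case and you get a 0x0 struct array instead, because an empty cell argument means "zero structs, please".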

All right! So now you know how to REALLY combine a list of names and a list of values into a structure.

In Python:

s = dict(zip(snames, svalues))

In R:

names(svalues) <- snames

And in MATLAB:

svals = num2cell(svals);

arglist = {snames{:};svals{:}};

s = struct(arglist{:});
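As a sanity check, here is a sketch of that recipe on a made-up pair of lists containing an empty cell, the case that wrecks the unwrapped version. The names sNaive, sSafe and sDirect are mine. The cell2struct line is an aside that is not part of the recipe above: MATLAB does ship that builtin, and it takes the two lists directly.

```matlab
snames = {'a', 'b'};
svals  = {1, {}};                        % second value is an empty cell array

% Unwrapped: expands to struct('a', 1, 'b', {}), which is a 0x0 struct array.
naive  = {snames{:}; svals{:}};
sNaive = struct(naive{:});

% Wrapped: num2cell puts each value in its own 1x1 cell, so struct() strips
% one layer and leaves the value itself intact.
wrapped = num2cell(svals);
safe    = {snames{:}; wrapped{:}};
sSafe   = struct(safe{:});               % 1x1 struct with a: 1, b: {}

% Aside: cell2struct builds the same scalar struct from column lists.
sDirect = cell2struct(svals(:), snames(:), 1);
```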


Trouble with small numbers

August 1, 2009 at 10:40 pm (trouble with small numbers)

Yesterday’s post and today’s introduce the “trouble with small numbers” category. More so than any other system I’ve worked with, MATLAB’s functions do strikingly different things for inputs that are 1 element long vs. 2 elements long vs. 3 elements long vs. 0 elements long. This is a hair-pulling annoyance for the programmer who dares to write an algorithm that works with arrays that can be of varying sizes (gasp!). This sounds a bit vague, but that’s because this failing is not localized to any one syntactic feature*; rather, it’s grown like mildew all over the language and toolboxes. Trying to write any program with generality in MATLAB entails walking through a minefield of these unpredictable corner cases. If it still sounds vague, I promise that after a few dozen posts on this category of failing alone you’ll get the idea.

*though the fact that arrays can have any number N of dimensions as long as N is not 1, for some reason, probably has a lot to do with it.

Anyhow, I was synthesizing some audio, and wanted to ramp in and out at the beginning and end of my signal. Here’s what I did:

function signal = ramp(signal, rampin, rampout)
    %impose a ramp over the first and last N samples of a signal.
    signal(1:rampin) = signal(1:rampin) ...
        .* linspace(1/(rampin+2), 1-1/(rampin+2), rampin);
    signal(end-rampout+1:end) = signal(end-rampout+1:end) ...
        .* linspace(1-1/(rampout+2), 1/(rampout+2), rampout);

 

Now, I incorporated this ramp into the function that synthesized my sound; playing with the results, I decided that I wanted a ramp at the end of the sound but not the beginning. So I naturally set rampin=0. Who can guess what happened?

That’s right:

While linspace(a,b,3) returns an array of length 3;
and linspace(a,b,2) returns an array of length 2;
and linspace(a,b,1) returns an array of length 1;
but linspace(a,b,0) returns an array of length 1.

Crash and burn.

Remember, it’s not that MATLAB’s behavior here isn’t documented (it is); it’s that the behavior is stupid, and leads to errors when you (reasonably) assume that the behavior would be consistent over reasonable inputs. If I ask linspace to give me an array of length zero, why wouldn’t it?
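For what it’s worth, the ramp can be built so that it doesn’t depend on what linspace does with a zero count. A minimal sketch, assuming a row-vector signal as in the function above; the names lo, hi, rampUp and rampDown are mine, and for a count of 1 this version yields the first endpoint, which is not necessarily what linspace returns.

```matlab
% Made-up test signal; rampin = 0 is the case that broke the original.
signal  = ones(1, 8);
rampin  = 0;
rampout = 3;

% (0:n-1) is 1-by-0 when n is 0, so each ramp has exactly n elements.
lo = 1/(rampin+2);   hi = 1 - 1/(rampin+2);
rampUp = lo + (0:rampin-1) * (hi - lo) / max(rampin-1, 1);
signal(1:rampin) = signal(1:rampin) .* rampUp;

lo = 1/(rampout+2);  hi = 1 - 1/(rampout+2);
rampDown = hi - (0:rampout-1) * (hi - lo) / max(rampout-1, 1);
signal(end-rampout+1:end) = signal(end-rampout+1:end) .* rampDown;
```

With rampin = 0 both sides of the first assignment are empty and nothing happens, which is what I asked for in the first place.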

If you have an example where the present behavior of linspace is better than actually returning the length of array requested, of course, feel free to comment. I’d like to see how contrived it is.

Might as well compare to the behavior of the linspace-equivalent in R, which gets it right:

> seq(0,1,len=3)
[1] 0.0 0.5 1.0
> seq(0,1,len=2)
[1] 0 1
> seq(0,1,len=1)
[1] 0
> seq(0,1,len=0)
integer(0)

and in Python, which also gets it right:

>>> from numpy import linspace
>>> linspace(0,1,3)
array([ 0. ,  0.5,  1. ])
>>> linspace(0,1,2)
array([ 0.,  1.])
>>> linspace(0,1,1)
array([ 0.])
>>> linspace(0,1,0)
array([], dtype=float64)


For your thoughts, a question in three parts

August 1, 2009 at 7:00 am (Simple trivia about its fundamental behavior that you probably can't answer, trouble with small numbers)

Something to chew on: You are presumably aware of array indexing/slicing, in which you have an expression

C = A(B)

and both A and B are arrays (and B is a non-logical array of integers).

1. Write from memory the rules determining what the size of array C will be, in terms of the sizes of arrays A and B.

2. You just want to linearly index into A and get a vector back as the result of your indexing operation. For all A, is there an array B such that A(B) is a column vector?

3. Proceed to rant about how completely dumb that is. Compare with every other array-slicing language on the planet, if necessary.
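If you want to check your answer to part 1 against the machine, a few probes like these will do. This is a sketch with arbitrary values; only the sizes matter, and the variable names are mine.

```matlab
row = [10 20 30];
col = row';
mat = magic(3);
idx = [1; 3];                     % a column of linear indices

sRowCol = size(row(idx));         % [1 2]: vector source keeps its own orientation
sColRow = size(col(idx'));        % [2 1]
sMatCol = size(mat(idx));         % [2 1]: matrix source takes the index's shape
sMatRow = size(mat(idx'));        % [1 2]
sColon  = size(row(:));           % [3 1]: colon alone always gives a column

disp([sRowCol; sColRow; sMatCol; sMatRow; sColon]);
```

Note that the same index array produces a row or a column depending on what it happens to be indexing.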

