No Named arguments

July 31, 2009 at 7:37 am (essential band-aid helper functions, unexpressive language)

Tom Moertel points out some very interesting and useful consequences of the way R evaluates function arguments, including its built-in named arguments, which, to reiterate a point for the MATLAB-raised audience, it has. Built-in named arguments, that is.

In MATLAB, there is no built in way to give named arguments to your function. The tradition in cases where you would like optional, named arguments is to call with name-value pairs, so that you call functions like so:

f('tens', 2, 'ones', 6)

For the most part, it’s up to the author of a function to parse such an argument list. If you occasionally look at the functions defined in the library and toolboxes, you realize something very depressing: Mathworks engineers have for the most part done the input parsing ad-hoc for every function that takes their kind of ersatz ‘named arguments.’ For example, check out ‘optimset.m’ in the Optimization toolbox, then check out ‘statset.m’ in the Stats toolbox. The people you send your license fee are people who used that money to write both* of those long, completely different, yet functionally isomorphic pieces of code and did not realize anything was amiss.

*Not just both: there are dozens of ad-hoc name/value parsers in the toolboxes and many times more limited subsets of that functionality.

The name-value calling convention was firmly established when I picked up MATLAB 5 (it was part of that version’s hilarious attempt at introducing ‘object-oriented’ features). So it must have taken The Mathworks at least ten years to realize that implementing an ad-hoc argument parser every single time they wanted a little calling flexibility — in their own in-house toolboxes! — was dumb. So finally they introduced the “InputParser” class.

With the InputParser class, you take something that would be expressed in, for example, R as:

f <- function(ones=1, tens=1) {

and write the equivalent Matlab code:

function out = f(varargin)

p = InputParser;
p.addParamValue('ones', 1);
p.addParamValue('tens', 1);
p.parse(varargin{:})
ones = p.Results.ones;
tens = p.Results.tens;

Wow. Let’s compare that to the equivalent Python:

def f(ones=1,tens=1):

So, uh, thanks for that, MATLAB.

Now, if you don’t need the full “flexibility” of the InputParser class, which reads like it was done by an intern from a Java-only undergraduate program,* you can try my function “namedOptions,” a little lighter weight (and with some slightly different capabilities.)

* Seriously, it’s like a LoopObject except not meant as a joke.

Myself, I will be reading up more on those well-thought-out function arguments and scoping rules in R.

Permalink 1 Comment

No Lists

July 28, 2009 at 12:16 am (crap data structures, dumb memory management, essential band-aid helper functions, unexpressive language)

MATLAB is a language supposedly intended to help the user process data. Mind-bogglingly, it does not provide lists, which are pretty fundamental to processing data in the real world. What’s more, it does not even include the possibility of implementing a reasonable facsimile of a list.

If you’ve been indoctrinated into MATLAB, you’ve probably been told about the evils of resizing an array inside of a loop, and how you must avoid that by preallocating. Lately MATLAB presents a helpful compiler warning that tells you if you appear to be resizing an array in a loop. See, if you have an array X that is 10 elements long, and you put a value into X(11), MATLAB has to allocate an entirely new array, with 11 elements, then copy the original 10 elements into the new memory space, copy your new value in, then reclaim the memory taken by the original 10 element array. This takes a bit of time. What’s worse, the longer your list gets the more data it has to copy and the more time that takes — so if you keep adding things onto the end of your array, it transforms an O(n) loop into an O(n2).

So to avoid that massive penalty you can preallocate — make an array with zeros() or similar that’s large enough to hold all your results, before you go and compute your results. The problem with this aside from the extra lines of code (which decreases readability by taking up the reader’s mind with bookkeeping, as well as making a potential entry point for bugs if you mess up calculating the exact size you need) is that a lot of the time it’s just impossible to predict how many slots you need in the array before you actually go and collect the data that needs to go in the array.

One such situation is reading from a log file. When I run an experiment the end product is a text log file that details what happened in the experiment, in chronological order. Since I can’t know at the beginning of the experiment how many trials it has — some trials are thrown out for various reasons and the experiment might just terminate — I can’t preallocate the array to hold trial data when I read in the log file for analysis. Nor can I predict how many events (e.g. fixation breaks, cue onsets) a trial has before I read it in.

Additionally, there are a whole class of algorithms that rely on having a list that you can add or remove entries on the beginning or end of — breadth first graph traversal, for instance, and open up an algorithms text to a random page to see more — and it basically becomes impossible to implement those algorithms in MATLAB.

It’s almost natural to look at the way MATLAB handles adding an n+1th element to an n-element array and wonder, “gee, why recopy all that data? Why not just allocate one new place for entry 11, and add a link from the 10 elements to the eleventh element so that you just make your 10 element list a little longer?” If you could do that, you’d have a ‘list’, it’s such a natural thing to think of that almost every high level language you could consider using — except MATLAB — provides lists as a fundamental datatype. And low level languages like C provide you the tools to construct your own lists and other data structures. MATLAB is rather unique in allowing neither solution within the language.

Do other languages provide lists?

A better question might be, “do any languages in use today not provide lists or the opportunity to build them?”

However, since the two alternatives I’d like to push scientific computing users toward are R and Python, let’s see what they offer.

  • Python offers a builtin list datatype, and appending operations in particular are much faster, on average, than MATLAB’s. Python also provides the collections library which offers several different implementations of lists fine tuned to particular tasks — of course, all are transparently usable with the same syntax as regular lists, and can be used wherever a list can be used. If you really want compact arrays with no overhead and don’t care about appending, Python also offers that as well as the multidimensional MATLAB-like arrays of the Numeric package.
  • R‘s vector and list data types are implemented with a fixed array, like MATLAB’s, but R is really a LISP under the hood, so it definitely has the flexibility to implement data structures in user space. The standard library package Containers gives you some nice implementations.

Do any other languages used today not provide lists or let you create them?

I can’t think of a single language I’ve used, other than MATLAB, and maybe exercises in school where you write out longhand all the state transitions for a Turing machine, that doesn’t. The 1977 version of FORTRAN, maybe doesn’t. I suppose this tells you that MATLAB met its original late-seventies design goal of being easier to use than FORTRAN. It is now 2009.

What do you mean, I can’t implement a list? Doesn’t MATLAB have like objects ‘n’ stuff?

Give it a try, good luck! The first problem you find is that MATLAB does not support references or pointers (more on this in a later post) which are necessary to build your own list. Perhaps you will try MATLAB’s new ‘handle objects’ or notice that ‘function handles’ are essentially references to their local workspace variables. What you will find is that instantiating a whole damn object or closure for every reference you want to take is incredibly slow — and even if you manage to implement it, the way that MATLAB invokes methods on closures and objects does something very weird with memory management that guarantees you will not meet the o(1) append/delete that you wanted (more on that in a later post too.)

I suppose you could always make like a FORTRAN77 programmer, allocate a gobsmacking huge global array, and roll your own malloc into it. Heh.

What about JAVA, doesn’t that provide lists? And can’t you access it from MATLAB?

Can JAVA variables hold MATLAB data types? Or even handles to MATLAB data types? No. Not even just as an opaque object. (Someone please help this guy!) So you can pass simple numeric values and strings, but cells or function handles or objects (even “handle objects”) can’t be organized in Java containers. So this approach doesn’t get you very far.

What’s the best I can do if I insist on using MATLAB, yet I need to collect values in a non-preallocated way?

Something you can do in MATLAB is make a cell array of cell arrays of cell arrays, This lets you have an ersatz pushdown stack, the closest you can get to a sequence container:

n = 0;
stack = {};

function push(x)
	stack = {x stack};
	n = n + 1;
end

function x = pop()
	x = stack{1};
	stack = stack{2};
end

function out = readout()
	out = cell(1, n);
	for i = n:-1:1
		out{i} = pop();
	end
end

So you accumulate values with push() and when you’re done you read off an array with readout(). Note that if you try to encapsulate this technique to instantiate stacks using handle objects, or handles to nested functions, you’re in for a nasty surprise as you will not find your speed advantage — a later post to this blog will explore why. I have written a helper function accumulate.m that works around this problem via a persistent struct (with the consequence that you will leak memory if you don’t clean up your lists). Here it is:

function [push, readout] = accumulate(dim)
%function [push, readout] = accumulate(DIM)
%
%Peter Meilstrup, 2008
%
%Records a list of data (e.g. streaming eye position or other
%streaming input data) in memory without the speed penalty of reallocating
%arrays in a loop.
%
%Returns two function handles, 'push', and 'readout', you push data in
%using the pirst, then at the end of your accumulation read it out using
%the second. The accumulated data is concatenated along the dimension DIM.
%
% Example:
%
% >> [push, readout] = linkedlist(2)
% push =
%     @linkedlist/doPush
% readout =
%     @linkedlist/doReadout
% >> push([1;2]);
% >> push([3 4;5 6]);
% >> push([0; 0]);
% >> readout()
% ans =
%      1     3     4     0
%      2     5     6     0

persistent lists;
persistent counter;

if isempty(lists)
    lists = struct();
    counter = 0;
end

if ~exist('dim', 'var')
    dim = 1;
end

elements = 0;
name = '';

push = @doPush;
readout = @doReadout;

    function doPush(what)
        if isempty(name)
            name = ['t' num2str(counter)];
            counter = counter +1;
            lists.(name) = {};
            elements = 0;
        end
        lists.(name) = {lists.(name) what};
        elements = elements + 1;
    end

    function out = doReadout()
        if ~isempty(name)
            l = lists.(name);
            lists = rmfield(lists, name);
            name = '';
        else
            l = {};
        end
        if ~isempty(l)
            out = cell(1, elements);
            for i = elements:-1:1
                out{i} = l{2};
                l = l{1};
            end
            out = cat(dim, out{:});
        else
            out = [];
        end
    end
end

ADDENDA:

Permalink 4 Comments

Follow

Get every new post delivered to your Inbox.