MATLAB can’t read plain text data out of a wet paper bag.

August 7, 2012 at 1:48 am (crap data structures, matlab doesn't talk to anyone but matlab, powerfully stupid graphics, unexpressive language)

I’m working with someone, and they asked for some of my intermediate data. In the interests of what I thought would be maximum interoperability with whatever data analysis system they preferred, I gave them a .csv file.

It looked something like this:

"target_spacing","subject","sensitivity","direction_content"
2.0943951023932,"as",4.38700587042417,0.1
2.0943951023932,"cj",3.21032026169591,0.1
2.0943951023932,"jb",6.29391042968021,0.1
2.0943951023932,"je",-0.0328585668332463,0.1
2.0943951023932,"ko",-0.431840201907717,0.1
2.0943951023932,"mc",-4.49124425170629,0.1
2.0943951023932,"ml",-0.367323617983639,0.1
2.0943951023932,"nj",2.8780230556088,0.1

Simple enough, right? For example, if you were using R and wanted a scatterplot of “sensitivity” versus “target_spacing,” with symbols coded by subject and colors coded by direction content, you’d do something like:

library(ggplot2)
data <- read.csv('dataset.csv')
qplot(data=data, target_spacing, sensitivity, 
      color=direction_content, pch=subject, 
      geom="point")

On the other hand, the MATLAB script I got back from this person looked more like this:

[ num, txt, data]=xlsread('datafile.csv', 'A2:Q192');

sublist= unique(txt(:, 5));
dirlist=unique(num(:, 3));
spacinglist=unique(num(:, 4));
colorlist=hot(length(dirlist)+3);
symlist={'o', 's' '*' 'x' 'd' '^' 'v' '>' '<' 'p' 'h' 'v' '>' '<' 'p' 'h'};

for s=1:length(sublist)
    for d=1:length(dirlist)
        for sp=1:length(spacinglist)
            ind=find( num(:, 3)==dirlist(d) & num(:, 4)==spacinglist(sp) & strcmp(txt(:,5),sublist{s}));
            if ~isempty(ind)
                plot(spacinglist(sp), num(ind, 6), [symlist{s}], ...
                 'MarkerSize', 5, 'Color', colorlist(d, : ),'MarkerFaceColor', ...
                 colorlist(d, : ));
                 hold on
            end
        end
    end
end

Well, this is fairly typical of MATLAB code as it is found in the wild. Give people matrices and they use explicit numeric indices for everything. Aside from the difficulty of making marker shape and color covary with dimensions of the data, you have to open the file up in Excel and count along its columns to see which variable they think they’re plotting (and after some head-scratching it turns out they weren’t, actually, plotting what they thought).

The factor-of-three reduction in code size when you ask R to do the same thing is pretty typical too.

One of the very useful features of R is that you can assign names almost everywhere you would use an index. So, you never have to worry about whether column 4 is “target_spacing” or something else. You just say “target_spacing”.

For example, let’s say you have some nice rectilinear data, like this cross-tabulation of hair color, eye color, and sex in a group of students:

> data(HairEyeColor)
> HairEyeColor
, , Sex = Male

       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8

, , Sex = Female

       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8

This is just a 3-D array, like Matlab’s 3-D arrays. (Interestingly, Matlab only added multi-dimensional arrays after someone got fed up with the lack of them and went off to write the Numeric package for Python.) And as an aside, NumPy and R have consistent rules for indexing in N dimensions (where N can be 1, 2, 3, or more), while MATLAB forgets about 1-dimensional arrays entirely and, as for consistency, utterly screwed it up.

Ahem. As I was saying, unlike arrays in Matlab, arrays in R can have nice, human-interpretable names attached to their rows, columns, and slices. You can see them in the printout above, or get and set them explicitly with dimnames:

> dimnames(HairEyeColor)
$Hair
[1] "Black" "Brown" "Red"   "Blond"

$Eye
[1] "Brown" "Blue"  "Hazel" "Green"

$Sex
[1] "Male"   "Female"

An array with dimnames lets you access elements by name, not number. So if you want to slice out just the blond, brown-eyed people in this sample, you can just say:

> HairEyeColor['Blond', 'Brown',]
  Male Female 
     3      4

That’s the same as writing HairEyeColor[4, 1,], only you can actually see what it’s trying to accomplish.

Now, I wish you could go a step further and write HairEyeColor[Eye='Brown',Hair='Blue',], and not worry about which order the dimensions come in, but R’s not perfect. Just useful. Actually, you can do that sort of thing with pandas, a Python library billing itself as “R’s data.frame on steroids.”

Meanwhile, if you pay an additional tithe to the Mathworks, you can get the Statistics toolbox, whose “dataset” class is more or less R’s data.frame with hyponatremia. (No ‘NA’, you see.)

Anyway, if you ever ask me to remember that “Female” is 2 (in this dataset) and “Hazel” is 3, well, look, remembering arbitrary correspondences between names and numbers is something humans are just really bad at and computers are very good at, OK? If you’re writing analysis scripts and you find yourself flipping back to the spreadsheet to count columns… just don’t. Why would you do a job the computer should be doing for you?

Being able to refer to things by name makes your code more likely to work. For instance, in the previous example, a humanely designed system would be able to look at a statement you’ve messed up, like HairEyeColor[Eye='Brown',Hair='Blue',], and come back with "uh, there’s no such thing as ‘blue hair’ in this dataset." Which is miles better than coming back with results you didn’t want.

Okay. Before I had the first gin and tonic and decided to cover a topic or two from the syllabus of Stuff That’s In Every Useful Programming Language Except MATLAB 101, I had this script someone sent me, which read in some data from a CSV file I’d sent them. They were using numeric indices into the data because they had loaded it as an array using xlsread, which doesn’t do anything useful with the column headers. But you ought to be able to load each column of data into a separate field of a struct, use the column headers as the field names, and refer to the columns by name that way; that’d be doing pretty well, for MATLAB. So I was planning to tweak this code and send it back with a note about “here’s a nice way to do it better and let the computer take more of your mental load” (this person teaches a course on MATLAB for scientists, you see, so I want to slightly reduce the fucking brain damage that gets propagated out into the academic world).
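To make that concrete, here’s the usage I wanted to enable. A minimal sketch: read_csv_to_struct is a name I’m making up for the reader I assumed must exist, not anything that ships with MATLAB (which is where this story is headed).

read_csv_to_struct('dataset.csv');                % hypothetical reader: header row becomes field names
data = read_csv_to_struct('dataset.csv');
plot(data.target_spacing, data.sensitivity, 'o'); % refer to columns by name
subjects = unique(data.subject);                  % no counting columns in Excel

Rearrange the columns in the spreadsheet and this script still works, which is the whole point.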

All you’d have to do is, instead of reading the CSV as a matrix, use the function that reads from a CSV file and uses the column headings to assign the fields in a struct. You know, that function. The one that does the single bleeding obvious thing to do with CSV files. You know, the CSV-reading function. I mean, for all I rant about it, people get work done with MATLAB. It’s just impossible that Matlab can’t read the world’s most ubiquitous tabular data format usefully. Right?

Well, let’s try it.

The first thing I find is csvread. Aside from being deprecated according to the documentation, there’s another problem: it only reads numeric data. Now, some of the columns in my file have things like a human observer’s initials, or categorical data that’s better expressed to humans with labels like “left” or “right” than by trying to remember which of those corresponds to 0 and which to 1. (R has a built-in “factor” data type to handle categorically enumerable data, while MATLAB has bupkis.) So csvread can’t cut it. Same problem with dlmread.
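You can verify this in about ten seconds (the exact error text varies by release, but csvread falls over at the first thing that isn’t a number):

>> M = csvread('dataset.csv')   % errors out on the header row, and again on "as", "cj", ...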

Next up we have xlsread. That’s what my collaborator used to begin with. Maybe it has an option to get the column names. Well, it won’t even read the file on my MacBook. Nor on the Linux cluster we have in our lab. Ah, see, xlsread only reads a CSV file if it can farm the work out via a goddamn COM call to a motherfucking installed copy of Microsoft Excel, and it only knows how to do that on %$)@&#%%..% Windows. And even if my computers met those conditions, xlsread doesn’t read a file with more than 2^16 rows. Man, I’ve got more than 2^16 rows sitting here just from asking people to look at things and press buttons. Lord help me if I ever have a real dataset.

CSV, you know, one of the world’s most ubiquitous, plain-text, human-readable file formats.

What next? There’s importdata, which purports to DWIM the reading of tabular data. And there’s the “Data Import Wizard,” which just turns out to be a wrapper for importdata.

Except importdata doesn’t handle the way quotes are used in CSV files. And even setting that aside, it doesn’t work at all. It detects that there’s a header row, but it doesn’t actually give me the field names. Why? Some experimentation reveals that it’s, again, completely incapable of handling non-numeric data in columns, even though it purports to put out separate ‘data’ and ‘textdata’ results! Here’s how importdata mangles a perfectly straightforward file:

>> type testfile.txt

Height,Width,Depth,Weight,Label,Age,Speed
95.01,76.21,61.54,40.57,Charlie,20.28,1.53
23.11,45.65,79.19,93.55,Echo,19.87,74.68
60.68,1.85,92.18,91.69,Delta,60.38,44.51
48.60,82.14,73.82,41.03,Alpha,27.22,93.18
89.13,44.47,17.63,89.36,Romeo,19.88,46.60
>> [a delim nheaderlines] = importdata('testfile.txt')
a = 
        data: [5x2 double]
    textdata: {6x7 cell}
delim =
,
nheaderlines =
     1
>> a.textdata
ans = 
  Columns 1 through 6
    'Height'    'Width'    'Depth'    'Weight'    'Label'      'Age'
    '95.01'     '76.21'    '61.54'    '40.57'     'Charlie'    ''   
    '23.11'     '45.65'    '79.19'    '93.55'     'Echo'       ''   
    '60.68'     '1.85'     '92.18'    '91.69'     'Delta'      ''   
    '48.60'     '82.14'    '73.82'    '41.03'     'Alpha'      ''   
    '89.13'     '44.47'    '17.63'    '89.36'     'Romeo'      ''   
  Column 7
    'Speed'
    ''     
    ''     
    ''     
    ''     
    ''     
>> a.data
ans =
   20.2800    1.5300
   19.8700   74.6800
   60.3800   44.5100
   27.2200   93.1800
   19.8800   46.6000

So, it detects the delimiter and the single header row, but it doesn’t give back column names. Why? The ‘textdata’ is full of perfectly reasonable numeric strings that haven’t been converted into, y’know, numbers, and some of them have been blanked out. The ‘data’ pulled out a minority of the numeric columns but gives you no idea which ones it pulled out for you. And that’s the best I’ve seen so far.

The File Exchange was not helpful. csv2struct was just a wrapper for xlsread (so: requires Excel, requires Windows, limited to 2^16 rows). txt2mat claimed to be ultimately versatile and able to handle mixed-datatype CSV files, but it rejected everything I gave it, unless I threw enough regular expressions at its options that I might as well have written my own CSV parser.

So I ended up writing my own fucking CSV parser. And at several points I got waylaid by things like:

  • textscan will skip over blank fields when they occur at the end of a line (and trailing blank fields are common in actual data), and if there isn’t a consistent number of fields (that it doesn’t skip) per line, it will cheerfully forget line boundaries for you. So you need to do the conversion line-at-a-time; see the sketch after this list.

  • There’s no good way to convert a cell array of strings to numbers. str2double tries, but it outputs NaN whenever there’s an empty string, or anything else it can’t convert. So there’s no way to tell whether some converted value is NaN because the file said “NaN” or because the file said “booger police.” See, the thing is, NaN is a value in the IEEE 754 system that is used to represent undefined values, invalid operations, and the like. The purpose of NaN is to signal problems that happened in your arithmetic. NaN is not a missing-value marker, unless you really want something to obscure the places where your math is going wrong. (This is why R allows explicitly missing — NA, not NaN — values in any vector, not just floats.)

  • MATLAB’s version of sscanf can’t apply an alternate locale. If you ever interact with people outside the US, you will see CSV-type files with conventions like “,” for the decimal separator, “.” for the thousands separator, and “;” for the field delimiter. That’s why the whole LOCALE facility in the C standard library exists. R provides the ability to set the locale for the purposes of reading such a file; MATLAB’s documentation, on the other hand, explicitly forbids setting the locale, even in a MEX function.

  • Speaking of MEX functions, I might have gotten this done faster if I had gone that route and written the parser in C/flex/bison like a grownup, instead of expecting MATLAB to be any help at all with stuff like converting several strings to numbers.
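For the morbidly curious, here’s the skeleton of the line-at-a-time workaround. This is a minimal sketch, not my actual parser: it assumes no quoted commas inside fields (handling those per RFC 4180 is exactly the part that makes it a real parser), and it shows the by-hand check you need to tell a real “NaN” in the file apart from str2double shrugging at garbage:

fid = fopen('dataset.csv', 'r');
header = regexprep(regexp(fgetl(fid), ',', 'split'), '^"|"$', '');  % {'target_spacing','subject',...}
rows = {};
tline = fgetl(fid);
while ischar(tline)                             % fgetl returns -1 (not a char) at EOF
    rows{end+1} = regexp(tline, ',', 'split');  % split one line at a time, so trailing
    tline = fgetl(fid);                         % blank fields can't get eaten
end
fclose(fid);

% Convert one column without letting str2double hide junk as NaN:
raw = cellfun(@(r) r{1}, rows, 'UniformOutput', false);
vals = str2double(raw);
bad = isnan(vals) & ~strcmpi(strtrim(raw), 'NaN');
if any(bad)
    error('column 1 contains non-numeric junk, not just a NaN');
end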

So, as you see, reading a CSV file into MATLAB entails a whole lot of bullshit.

By comparison, here’s how you read that exact same file in R.

> x = read.csv("testfile.txt")
> x
  Height Width Depth Weight   Label   Age Speed
1  95.01 76.21 61.54  40.57 Charlie 20.28  1.53
2  23.11 45.65 79.19  93.55    Echo 19.87 74.68
3  60.68  1.85 92.18  91.69   Delta 60.38 44.51
4  48.60 82.14 73.82  41.03   Alpha 27.22 93.18
5  89.13 44.47 17.63  89.36   Romeo 19.88 46.60
> class(x$Width)
[1] "numeric"
> class(x$Label)
[1] "factor"

You see? Ask R to read in a table and it makes a good guess at the appropriate data types and headers, and you can refer to the components by their actual names. This stuff just isn’t that hard.


20 Comments

  1. Richard said,

    This isn’t related, but you might find it interesting that Matlab can no longer open a file in the editor while running a computation, as of R2012a. This is a step back from R2008b, which could achieve this feat.

    I’m referring to using the File->Open command as well, not typing edit foobar.m or similar during a computation. This means that if I start a long-running computation and feel like doing some coding in the meantime, I now have to use a different editor. Something I didn’t have to do four years ago. Now you must predict what you would like to work on and have it all open in advance if you don’t want to just sit and twiddle your thumbs while you wait.

    I was frustrated by this and tried to submit a bug report about it through the Matlab bug-reporting dialog, only to find that it can’t be used during a computation either (it opens and lets you type out your report, but you get to the end and find it can’t populate a drop-down box while Matlab’s running).

    • Antony said,

      If you open a file that is not in the MATLAB path with the editor, for the rest of that session that file will be called in preference to a file of the same name in the MATLAB path. The path is unaltered, and subsequently opening the file that is in the path does not affect the behavior.

      • crowding said,

        what.

        I can’t even be bothered to boot up MATLAB to confirm this.

  2. Dorothy Bishop (@deevybee) said,

    I would just cut and paste a file like this straight into Matlab. Just needs a few curly brackets around it and it is accessible. But I do only deal with fairly small files and I appreciate it’s non-optimal. Sometimes life seems too short and a quick fix is the answer.

    • crowding said,

      Watch this and never ever ever ever mention the words “cut and paste” around me with regard to data.

      Shit like cutting and pasting from Excel to Matlab got a group of patients exactly the wrong chemotherapy drugs in a human subjects trial.

  3. Jordi Gutiérrez Hermoso said,

    About NaN not being for missing values, actually, it is. I’ve had this debate too, thinking as you did that NaN can’t represent missing values, but after reading the damn IEEE 754 spec, it says that, yes, in fact, missing value is another interpretation of NaN, along with an erroneous calculation.

    Fun fact, which I also learned relatively recently: there are many bit patterns that are NaN: any bit pattern that has all of the exponent bits lit and a nonzero mantissa is a NaN. R’s NA value is in fact a special NaN value, with a particular bit pattern in the mantissa (I think the bit pattern is some sort of internal R joke about someone’s birthday or something like that). In Octave, we do have NA values, with the same bit-pattern as R’s NA value, but the problem is that “isnan” also returns true for NA values, because, y’know, NA values *are* NaN (I believe R behaves in the same way).

    This is why Alois Schlögl has written a NaN toolbox for “stats” that skips all NaNs, because for *him*, NaNs only represent missing values (I’ve had many a flamewar with him about this).

    As to your overall point with csv2struct, point taken. I’m trying to implement csv2struct in Octave now (without needing xlsread).

    • crowding said,

      You can ‘interpret’ NaN as a missing value — as far as I’m aware you can ‘interpret’ the bit pattern of a NaN any damn way you please, as the IEEE doesn’t say anything about what they mean, only that some NaNs are produced by some operations. So you can interpret a certain NaN bit pattern as a signal to go down the pub and eat a sandwich and that’d fit the spec. But some NaNs are definitely produced by arithmetic errors, and without a language facility for telling error NaNs apart from whatever other interpretation you want to place on them, the only way you can interpret a NaN in a way that bears correspondence to reality is as “arithmetic error, or some other interpretation.” Which I do not find useful.

      In R, is.nan(NA) is false; you can tell them apart. The implementation being based on a NaN bit pattern is an implementation detail (and it is implemented in other ways for logical and character arrays).

      An appeal to IEEE 754 is only really useful if you’re going to really support IEEE 754; that includes being able to tell NaNs apart. (As long as we’re on IEEE 754, how about rounding modes, any hope there? Whoever does that would get at least a few of the hardcore numerical analyst crowd, if there are any of those people left.)

      • jordigh said,

        You can tell them apart, with some care… is.na(NaN) is also true. Furthermore, telling them apart requires checking the bit pattern in software. That’s slower than doing “x != x”, which is a single hardware instruction (gcc translates it to ucomisd). It makes a difference of a factor of about 10, if memory serves.

        The IEEE 754 standard is quite clear that quiet NaNs’ bit pattern can be used to store “retrospective diagnostic information”, but checking this information can be slower than just checking if it’s NaN or not.

  4. Chris said,

    [I didn't read the post, and I wrote some bullshit suggestion that doesn't work. I didn't even copy-and-paste the example data above and check whether my suggestion produced something useful.]

  5. Maurits said,

    [I didn't read the post, and I wrote some bullshit suggestion that doesn't work. I didn't even copy-and-paste the example data above and check whether my suggestion produced something useful.]

  6. Bobby said,

    [I didn't read the post, and I wrote some bullshit suggestion that doesn't work. Also I was snarky in a "RTFM noob" sort of way, which is ironic because I didn't even copy-and-paste the example data above and check whether my suggestion produced something useful.]

  7. crowding said,

    A few of you have been writing in, in tones varying from “faux-helpful” to “internet asshole,” to suggest that “dataset” reads a CSV file. Aside from not being a part of MATLAB proper, but rather part of the “Statistics toolbox,” one of many shitty paid add-ons that the Mathworks peddles, dataset DOES NOT read CSV files. In fact dataset’s failure at interpreting quoted strings was unfixable enough that it was a better use of my time to write my own CSV parser. So you can knock it off.

    If any of you had actually read this post you would see that an acceptable CSV reader needs to do a few things:

    A CSV parser should handle both numeric and string data (and ideally booleans and dates), and do so automatically. In particular, it should not force you to write a fragile textscan format string that you have to change every time the columns in your spreadsheet get rearranged. The point of using CSV is to be a flexible interchange format; one thing “flexible” entails is that the number and order of variables in the file change as an analysis develops. I am not willing to write a new format string every time there is a minor change, when the default “the type is numeric if the values all look like numbers” works 99% of the time. Especially not the way textscan does it, where supplying a format string disables checking for line breaks! So you’re liable to get misaligned data that makes no sense. (See the sketch just below.)
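    For the record, the kind of thing I’m objecting to looks like this. A sketch: the format string is hand-matched to the four columns of the example file at the top of the post, and has to be rewritten the moment a column moves:

    fid = fopen('dataset.csv', 'r');
    C = textscan(fid, '%f%q%f%f', 'Delimiter', ',', 'HeaderLines', 1);  % %q reads a quoted string
    fclose(fid);

    Four columns today, so '%f%q%f%f' today; add a column tomorrow and every script with that string in it silently breaks.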

    A CSV parser should read a leading header row and convert the headers into dataset variable names or struct field names.

    A CSV parser should handle data with more than 2^16 rows, unlike xlsread. For fuck’s sake this isn’t the damn 1980s, our computers have memory and we’re trying to do science here.

    A CSV parser should interpret quotes. If you are going to call yourself a CSV parser, you just have to handle quotes correctly. Quotes are a part of the CSV format. If you read comma-delimited data without handling quotes, you are not a CSV parser. There is a particular way that the CSV format does quoting, and you can read about it in RFC 4180.

    For example, if one variable column is named “subject”, which is represented in the file’s header line as the word "subject" inside quotes, then when I read the file, the variable name it produces should be “subject”, not, as dataset insists with the above, “x0x22subject0x22”.

    For another example, the first entry under “subject” in the example data above is to be interpreted as as, without quotes; not as "as" with the quotes left unstripped. The quotes there are part of the CSV format, not part of the data. Anything that claims to interpret a data format should be able to interpret the standard syntax of that data format. This is not a difficult thing to understand.

    For a further example, if an entry under a field named “City” contains the string “Eerie, Indiana”, a CSV parser needs to be able to interpret that as a string containing a comma, and it should not produce misaligned data or barf with a “requires the same number of delimiters on each line” error just because the file correctly followed the CSV format. [Being forced to construct a format string with "%q" in some places and not in others is not acceptable; as described above, it easily produces misaligned data.]
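    To be concrete, this is a perfectly legal RFC 4180 record (the city and the number are made up for illustration):

    "City","Population"
    "Eerie, Indiana",4000

    The first comma on the data line is data; the second is a delimiter. A parser that treats them the same is not a CSV parser.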

    This is not a tall order; this is all just what R’s read.csv or Pandas’ read_csv give you out of the box.

  8. Bobby said,

    I can’t be bothered to check on the examples in the post I’m replying to. Freely available tools do a better job than the tools I use that cost thousands, and I find this somehow threatening to me.

    • Bobby said,

      I insist on a point by point rebuttal to a pile of insults!

      • crowding said,

        You have some amusingly wrong ideas, so you get toyed with for one comment.

        That’s an amusing suggestion. I suppose that I could use sed and a particularly complicated regular expression to transform an industry standard RFC 4180 file into a subset of whatever fucking format dataset is even capable of interpreting. And then you’d have to write the inverse transformation on the Matlab side, of course.

        The thing you are missing is that this is all just a roundabout, incredibly stupid way of writing a parser, which, if you paid any fucking attention at all, you’ll see I’ve already done, you fucking idiot. Except that your way of doing it introduces an external dependency on UNIX, which, if you opened your seeing-eyes and paid the slightest bit of attention with your mind-brain, you’ll remember I’m working with someone who uses MATLAB on its most popular platform, Windows. Because here in the adult world we collaborate with people, and use industry-standard text data formats for interchange with people working on different platforms, using tools that can read those industry-standard formats.

        And that’s an interesting perspective you have there, on the relationship between computing and masculinity! Let’s see if it works.

        One of the tropes associated with masculinity is that a man should be in control of things. Things that are in a subservient role to a man, such as a computer, should do his bidding. If the relationship between a man and his computer embodies masculinity, then the computer should do what computers do, in the way that the man tells it to. Follow?

        One of the other tropes associated with masculinity is that a man knows how to use his tools. As in: a computer is a tool for eliminating repetitive effort. A computer is a tool to delegate trivial and error-prone mental labor to. A computer is not a master for you to do bullshit unnecessary labor for.

        So no, I don’t think adopting a subservient role to a fucking machine, repeatedly, mechanically doing for a computer what a computer ought to be doing for me, is particularly masculine. (If I use a more capable toolchain, it actually does it for me in one command — and MATLAB also does it for me, now that I’ve made it do so, as you’ll recall if you have even half a second of short-term memory.) Men get work done.

        On the other hand, committing to rewriting your completely unnecessary textscan format string is exactly the kind of self-enslavement that would be done by the kind of UNIX weenie who mistakes knowing how to push buttons for getting work done, and has never learned to write a parser in his life.

        Men, in comparison, aren’t averse to trying tools that work better, or to correcting the defects in defective tools.

  9. Peter said,

    I am currently working on a MATLAB project spanning almost 2000 lines. The developers who made decisions about how MATLAB works should die a slow death.

  10. Rob said,

    Although it doesn’t fix all your concerns, the dataset object does provide a lot of the functionality found in R data frames.

    http://www.mathworks.co.uk/help/stats/dataset.html

    • crowding said,

      Which is why, in the post above that you are replying to, I referred to Matlab’s dataset class as an inferior version of data frames.

      (Much slower too, as MATLAB’s object system has ridiculously high overhead.)

  11. Zé Pedro said,

    Amazing blog. Obviously, and like anything else in life, R is remarkable at performing a number of tasks; not so noteworthy at performing others.

    Why would anyone go to such lengths to discredit some software, in this case Matlab, on [blah blah blah I like to hear myself pontificate, my emotional reaction to a rhetorical tone blinds me to absorbing any facts]

    • crowding said,

      If that’s what you have got out of this blog then you haven’t been reading very closely.

      R is absolutely not the best — in this very post I point out grounds on which Python’s Pandas package beats R — but you can’t see that that’s what I’m writing, because it’s a curious failing of nonrational people that if anyone ever points out better alternatives to the product they have invested in, loss aversion and the sunk cost fallacy rear their heads; they’d rather circle their wagons around the instrument that wastes their time than invest some time now to learn a thing that saves time in the future. My blog is not pitched at people who act purely on what Kahneman calls System 1. The confrontational tone is by design; for some people (the subset who are “reachable”) it is arresting enough that they pause, their System 2 has some time to take hold, and they actually listen to the embedded facts.

      The first time I coded an analysis and figure in R, I completed it in less than 1/3 the lines of code it had taken me to do a functionally equivalent task in MATLAB, and in equivalent time, including the time it took to learn R. That is, the time it took to obtain a positive return on the investment of learning R was less than one week’s project. The sunk cost fallacy is a pure fallacy, though it may seem otherwise to someone who’s struggled enough learning MATLAB as their first language that they have no idea how transferable the good bits of knowledge are and how little they’ll miss the bad bits.

      And yet every day I wince at places where R wastes my time, and there are areas where I would not even try to use R. I would absolutely not recommend R for experiment control or data collection, for example. There I’d go with Python, or, if concurrency and real-time operation are the constraints, with Erlang.

      R’s also, on balance, a little bit worse at array-centric programming — but MATLAB is a pale shadow of J in that arena.

      But if you prefer writing your code in functional style, which in MATLAB requires a lot of structfun and arrayfun and so on, R is far superior — but Clojure and Incanter are better still.

      And if you’re wedded to for-loops, MATLAB does boast a JIT compiler which nicely speeds up for-loops over built-in arithmetic operations. I hesitate to even mention that, because some folks’ brains shut down when they hear “JIT” and they stop measuring things, thinking that all their problems are solved. In practice, if your loops involve dispatching object methods or nested functions, or even calling non-built-in functions, MATLAB’s JIT does not help, and it’s even slower than R. In any case, never mind, because Julia spanks the pants off both R and MATLAB, and its JIT extends to all Julia functions, including its multiple-dispatch type system (far superior to MATLAB’s objects).

      See, once you have the attitude that all tools suck, it becomes really easy to learn new tools and pick the most appropriate one for your purposes. (The secret is that every time you learn a new tool, it becomes easier to learn another one, and you get better at the ones you already know.) I’ve been going along in this post, charting the tools I’m familiar with, trying to find a scenario under which I’d pick MATLAB, though, and it’s not happening. Curious. Despite fifteen years of experience with MATLAB programming, doing things like implementing a reference object system for MATLAB years before they introduced their own (more verbose and slower) version, it’s not happening.

      Here’s the thing. The majority of outlets writing on MATLAB are sponsored by the Mathworks, or are written by people who have never considered anything else. In comparison, the writings of people who are truly frustrated with, for example, R are readily available. Lectures containing little but derision, and at best grudging acceptance, of R are readily available, written and spoken by people who nonetheless continue to use R sometimes, because sometimes it’s the right choice for the job. Like I continue to use MATLAB sometimes, because of some legacy code (though the infrequency of posts to this blog indicates my success in moving to other systems).

      This is the difference between an open project and a closed project. In an open project, people discuss its flaws and the proposed remedies in the open. The Mathworks does not discuss and debate their product’s flaws in the open — I only see the linkbacks from their intranet to this blog — so I am stepping in to perform for the Mathworks the same service that open community projects perform for themselves. (The Mathworks have even implemented their own walled-off, inferior version of StackOverflow, not only in hopes that they retain moderation control and people don’t escape from the walled garden, but in hopes that people don’t notice how comp.soft-sys.matlab has for several years been both lower-traffic and dumber than either R-help or scipy-users. There’s your community support!)

      As far as “great lengths,” I estimate the time I have spent, to date, on this blog, to be on the order of 1/100 the time I would have saved myself by learning alternatives to MATLAB earlier than I did.

      If you’re a person who is able to try out and rationally evaluate a new tool, and does so with regularity, this blog is not addressed to you. However, your bloviation, which I have elided to help you save face, indicates otherwise.
