Tuesday, June 16, 2009

Simple scripting tasks

This afternoon Katharine asked me to do a very simple task. She needed to create a certain number of random groups from a population of students. The groups weren't purely random -- rather, they had to ensure an even distribution by grade level and gender.

Our student data all lives in an online database, which lets you export it in a reasonably sane way. My approach to her task, which took 10-15 minutes, was as follows (I'll give the physical equivalent of each step):

1. Make an index card with each kid's name, class, and gender on it (export a CSV file)
2. Divide those cards into piles by grade level
3. Divide those piles into a male pile and female pile
4. Shuffle the decks of cards
5. "Deal" from each deck of cards into one "hand" per group needed.
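
As an illustration, the five steps above can be sketched in Python. This is a minimal sketch rather than the original script: the column names (`name`, `grade`, `gender`), the sample rows, and the `make_groups` helper are all assumptions made up for the example.

```python
import csv
import io
import random
from collections import defaultdict

# Hypothetical export; a real CSV would come from the student database.
SAMPLE_CSV = """name,grade,gender
Ana,9,F
Beth,9,F
Carl,9,M
Dev,9,M
Eve,10,F
Fay,10,F
Gus,10,M
Hal,10,M
"""


def make_groups(rows, n_groups, keys=("grade", "gender"), seed=None):
    """Deal rows into n_groups, spreading each (grade, gender) pile evenly."""
    rng = random.Random(seed)

    # Steps 2-3: divide the "cards" into one pile per combination of keys.
    piles = defaultdict(list)
    for row in rows:
        piles[tuple(row[k] for k in keys)].append(row)

    groups = [[] for _ in range(n_groups)]
    seat = 0  # keep dealing where the last pile left off so sizes stay even
    for key in sorted(piles):
        pile = piles[key]
        rng.shuffle(pile)  # step 4: shuffle each deck
        for row in pile:   # step 5: deal one card per hand, round-robin
            groups[seat % n_groups].append(row)
            seat += 1
    return groups


# Step 1: read the exported CSV into one dict per student.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for number, group in enumerate(make_groups(rows, 2), start=1):
    print(number, [student["name"] for student in group])
```

Adding another factor to balance by only means adding a column name to `keys`, which is the kind of scaling the index cards can't offer.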

I'm quite confident most people could come up with the algorithm above -- no programming skills needed. But, in our world, it's still not possible for most people to do the above programmatically. The physical solution is available, of course, but it's not repeatable, and it doesn't scale up well (the beauty of the program is it is equally easy to run for virtually any number of kids, groups, and factors you need to divide by).

My solution was roughly 50 lines of code, though it could have been much shorter. It strikes me that essentially this kind of data management is a large part of what computers are useful for -- whether it be converting one kind of table into another kind of table, or iterating over data to create things. Programs like Excel and tools like mail merge offer some of this power, but they usually fall short at some point[1].

So my question is: is a tool that would put this kind of programming within reach of non-programmers conceivable? Available? In the works somewhere? I've often thought that a desktop-wide macro recorder would be enormously useful[2], but I'm not sure that's really what I'd want. Perhaps AppleScript gets at this, but I'm not sure it does what I want either.

Here's an initial list of features that such a language or tool would need to allow:
  1. Easy access to data in rows, regardless of source, and easy export of data in rows, regardless of needed format.
  2. Easy actions on data in rows
  3. Easy filtering of data.
  4. Simple string operations, search operations, etc., to turn one form of data into another.
  5. Calculations and comparisons.
  6. Shuffling, randomizing, etc.
  7. Graphical interface or other tool that makes syntax errors impossible and semantic errors difficult to make. When recording macros, for example, it is impossible to make a syntax error.

I realize this may be a completely insane idea. Perhaps people who think like me (and therefore want this tool) are already programmers (and therefore don't need it). Still, I have to think there's a population of computer users who can see what makes for an automatable task but simply lack the tools to do it.

I also have to think that given the variety of ways that data presents itself, and the ways in which people regularly need to move data between one application and another, there are many tasks that can't be anticipated by programmers of any given application, but could be automated given the right environment.

[1] In this example, they fall short in a number of places. I've had them fail at much simpler tasks though -- I wanted Excel, for example, to look at a numeric grade and give me a letter grade. Because of the limited syntax of Excel functions, the obvious solution was to use nested IF statements, but Excel caps the number of nested function calls at 7 -- alas, I had more than 7 grades (A, A-, B+, B, B-, C+, C, C-, D, F).
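
For comparison, here is a rough sketch of the same lookup in Python, where a sorted table of cutoffs replaces the nested IFs. The cutoff scores are assumptions chosen for illustration (the actual grading scale isn't given here), and `letter_grade` is a hypothetical helper name.

```python
from bisect import bisect_right

# Assumed cutoffs: the minimum score needed for each grade above F.
CUTOFFS = [60, 70, 73, 77, 80, 83, 87, 90, 93]
LETTERS = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def letter_grade(score):
    """Return the letter for a numeric score by counting the cutoffs it meets."""
    return LETTERS[bisect_right(CUTOFFS, score)]


for score in (59, 60, 85, 93):
    print(score, letter_grade(score))
```

With a table like this, a scale with more grades is just more entries in the two lists, not deeper nesting.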

[2] Growing up, I had such a tool in a very early Mac OS and found it incredibly useful. But macros can't introduce elements like randomness or even simple calculations (unless you get very clever). Trying to imagine a macro language becoming powerful enough for basic scripting, I'd have to imagine GUI equivalents to powerful programming and command-line tools like the random library, grep, sed, and awk, and on and on. One enticing possibility is building on something like the Firefox macro extension (iMacros) -- given the diversity of webpages and webapps out there, nearly everything becomes possible via macros on the web.
