A collection of computer systems and programming tips that you may find useful.
 
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Wednesday, November 16, 2011

Running R scripts on the Command Line


Using the R statistics package via a GUI is great for one-off tasks or for learning the language, but for repeated tasks I want the ability to create and run scripts from the UNIX command line.

There are several ways to do this:

R CMD BATCH executes R code in a script file with output being sent to a file.
$ R CMD BATCH myscript.R myoutputfile
If no output file is given then the output goes to myscript.Rout. There is no way that I know of to have it go to STDOUT. Passing parameters to your script with this approach is a pain. Here is an example script:
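args <- commandArgs(TRUE)  # TRUE: return only the arguments given after --args
for (i in seq_along(args)) {  # seq_along() is safe when no arguments are passed
  print(args[i])
}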

This is invoked with this command:
$ R CMD BATCH  --no-save --no-restore --slave --no-timing "--args foo=1  bar='/my/path/filename'" myscript.R tmp
In particular, note the strange quoting of the arguments, preceded by --args inside the outer quotes - it's not a typo!


That command produces this output in file 'tmp':
[1] "foo=1"
[1] "bar='/my/path/filename'"
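All those '--' options are necessary! Try leaving out --slave and --no-timing and you'll see why: without --slave the output file also contains the R startup banner and an echo of every command in the script, and without --no-timing R appends the timing information from proc.time() to the end of the file.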
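One likely reason for quoting the value as bar='/my/path/filename' is that each argument is then a syntactically valid R assignment. A common idiom (not part of the script above) is to evaluate each argument with eval(parse()), so that foo and bar become ordinary variables inside the script. Below is a minimal sketch; the function name assign_args is hypothetical, and because eval(parse()) executes whatever text it is given, it should only be used with arguments you trust.

```r
# Hypothetical helper: evaluate each trailing argument (e.g. "foo=1",
# "bar='/my/path/filename'") as an R assignment in the given environment.
# eval(parse()) runs arbitrary code, so only use this with trusted input.
assign_args <- function(args = commandArgs(trailingOnly = TRUE),
                        envir = globalenv()) {
  for (a in args) {
    eval(parse(text = a), envir = envir)
  }
  invisible(envir)
}

assign_args()
# With the R CMD BATCH command shown earlier, foo now holds the number 1
# and bar holds the string "/my/path/filename".
```

Without the inner single quotes, bar=/my/path/filename would not parse as R code, which is why the value has to be quoted inside the outer double quotes.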

Thankfully there is a better option ...

Rscript is an executable that is part of the standard R installation.

You can add a 'shebang' line that invokes Rscript to the top of your R script file, make the file executable (chmod +x myscript.R) and run it directly, just like any Perl, Python or Ruby script.

You don't need those extra options, as they are the defaults for Rscript, and you pass command line arguments directly without any of that quoting nonsense.

Here is an example script:
#!/usr/bin/env Rscript
args <- commandArgs(TRUE)
for (i in seq_along(args)) {  # unlike 1:length(args), does nothing when no arguments are given
  print(args[i])
}
Running this with arguments:
$ ./myscript.R foo bar
produces this output on STDOUT (which you can then redirect as you choose):
[1] "foo"
[1] "bar"
Much nicer, but we've still got those [1] index prefixes. If you are passing the output to another program, these are a major pain.

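The way to avoid those is to use cat() instead of print(), BUT you need to include the newline character explicitly as a separate argument to cat(). Also set sep = "", because by default cat() separates its arguments with a space, which would leave a trailing space at the end of every line:
#!/usr/bin/env Rscript
args <- commandArgs(TRUE)
for (i in seq_along(args)) {
  cat(args[i], "\n", sep = "")  # sep = "" avoids a space before the newline
}
results in this output:

$ ./myscript.R foo bar
foo
bar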
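If all you need is to echo a character vector one element per line, base R's writeLines() is a shorter alternative to the cat() loop: it writes each element on its own line with no [1] prefix and no trailing space. This is a swapped-in function rather than the approach used above; a minimal sketch:

```r
#!/usr/bin/env Rscript
# writeLines() prints each element of a character vector on its own line,
# with no "[1]" index prefix and nothing but a newline after each element.
args <- commandArgs(trailingOnly = TRUE)
writeLines(args)
```

Running ./myscript.R foo bar with this version prints foo and bar on separate lines, just like the cat() loop.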


For the sake of completeness, you can run R scripts with a shebang line that invokes R directly. But Rscript seems to be the best solution.


If you want to pass command line arguments as name=value pairs, then you need to parse them out within your script. I haven't got this working in a general sense yet. What I want is to pass arguments like this:
$ ./myscript.R infile="foo" outfile='bar'
But I'm not quite there yet...
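One possible way to get there is sketched below. Note that the shell strips the quotes before R sees them, so ./myscript.R infile="foo" outfile='bar' arrives in commandArgs(TRUE) as the two strings infile=foo and outfile=bar. The helper name parse_named_args is hypothetical: it splits each argument at its first = and returns a named list, skipping anything that does not look like name=value. Values stay as character strings, so convert them yourself (e.g. with as.numeric()) where needed.

```r
#!/usr/bin/env Rscript
# Hypothetical helper: turn name=value arguments into a named list.
# Arguments without a leading "name=" are ignored.
parse_named_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  pairs <- args[grepl("^[^=]+=", args)]
  # Name: everything before the first '='; value: everything after it
  keys <- sub("=.*$", "", pairs)
  values <- sub("^[^=]*=", "", pairs)
  out <- as.list(values)
  names(out) <- keys
  out
}

opts <- parse_named_args()
for (name in names(opts)) {
  cat(name, " = ", opts[[name]], "\n", sep = "")
}
```

$ ./myscript.R infile="foo" outfile='bar'
infile = foo
outfile = bar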

Comments:

William Jarrold said...

Thanks so much!!! Worked right out of the box.


www.phillipburger.net/wordpress said...

Thanks for including the information on the technique for quoting when an argument value is a string. I had previously had my arguments quoted differently and spent hours trying to figure out the problem.
