Read a contiguous set of columns from sheet into an R data.frame. Uses the RInterface for speed.

readColumns(
  sheet,
  startColumn,
  endColumn,
  startRow,
  endRow = NULL,
  as.data.frame = TRUE,
  header = TRUE,
  colClasses = NA,
  ...
)

Arguments

sheet

a Worksheet object.

startColumn

a numeric value for the starting column.

endColumn

a numeric value for the ending column.

startRow

a numeric value for the starting row.

endRow

a numeric value for the ending row. If NULL it reads all the rows in the sheet. If you request more than the existing rows in the sheet, the result will be truncated by the actual row number.

as.data.frame

a logical value indicating if the result should be coerced into a data.frame. If FALSE, the result is a list with one element for each column.

header

a logical value indicating whether the first row corresponding to the first element of the rowIndex vector contains the names of the variables.

colClasses

a character vector that represents the class of each column. Recycled as necessary, or if NA an attempt is made to guess the type of each column by reading the first row of data. Only numeric, character, Date, POSIXct, column types are accepted. Anything else will be coverted to a character type. If the length is less than the number of columns requested, replicate it.

...

other arguments to data.frame, for example stringsAsFactors

Value

A data.frame or a list, depending on the as.data.frameargument.

Details

Use the readColumns function when you want to read a rectangular block of data from an Excel worksheet. If you request columns which are blank, these will be read in as empty character "" columns. Internally, the loop over columns is done in R, the loop over rows is done in Java, so this function achieves good performance when number of rows >> number of columns.

Excel internally stores dates and datetimes as numeric values, and does not keep track of time zones and DST. When a numeric column is formatted as a datetime, it will be converted into POSIXct class with a GMT timezone. If you need a Date column, you need to specify explicitly using colClasses argument.

For a numeric column Excels's errors and blank cells will be returned as NaN values. Excel's #N/A will be returned as NA. Formulas will be evaluated. For a chracter column, blank cells will be returned as "".

See also

read.xlsx2 for reading entire sheets. See also addDataFrame for writing a data.frame to a sheet.

Author

Adrian Dragulescu

Examples


if (FALSE) {

  file <- system.file("tests", "test_import.xlsx", package = "xlsx")

  wb     <- loadWorkbook(file)
  sheets <- getSheets(wb)

  sheet <- sheets[["all"]]
  res <- readColumns(sheet, startColumn=3, endColumn=10, startRow=3,
    endRow=7)

  sheet <- sheets[["NAs"]]
  res <- readColumns(sheet, 1, 6, 1,  colClasses=c("Date", "character",
    "integer", rep("numeric", 2),  "POSIXct"))


}