data.frame
.R/read.xlsx.R
, R/read.xlsx2.R
read.xlsx.Rd
The read.xlsx
function provides a high level API for reading data
from an Excel worksheet. It calls several low level functions in the
process. Its goal is to provide the conveniency of
read.table
by borrowing from its signature.
read.xlsx(
file,
sheetIndex,
sheetName = NULL,
rowIndex = NULL,
startRow = NULL,
endRow = NULL,
colIndex = NULL,
as.data.frame = TRUE,
header = TRUE,
colClasses = NA,
keepFormulas = FALSE,
encoding = "unknown",
password = NULL,
...
)
read.xlsx2(
file,
sheetIndex,
sheetName = NULL,
startRow = 1,
colIndex = NULL,
endRow = NULL,
as.data.frame = TRUE,
header = TRUE,
colClasses = "character",
password = NULL,
...
)
the path to the file to read.
a number representing the sheet index in the workbook.
a character string with the sheet name.
a numeric vector indicating the rows you want to extract.
If NULL
, all rows found will be extracted, unless startRow
or
endRow
are specified.
a number specifying the index of starting row. For
read.xlsx
this argument is active only if rowIndex
is
NULL
.
a number specifying the index of the last row to pull. If
NULL
, read all the rows in the sheet. For read.xlsx
this
argument is active only if rowIndex
is NULL
.
a numeric vector indicating the cols you want to extract.
If NULL
, all columns found will be extracted.
a logical value indicating if the result should be
coerced into a data.frame
. If FALSE
, the result is a list
with one element for each column.
a logical value indicating whether the first row corresponding
to the first element of the rowIndex
vector contains the names of the
variables.
For read.xlsx
a character vector that represent the
class of each column. Recycled as necessary, or if the character vector is
named, unspecified values are taken to be NA
. For read.xlsx2
see readColumns
.
a logical value indicating if Excel formulas should be shown as text in and not evaluated before bringing them in.
encoding to be assumed for input strings. See
read.table
.
a String with the password.
other arguments to data.frame
, for example
stringsAsFactors
A data.frame or a list, depending on the as.data.frame
argument. If some of the columns are read as NA's it's an indication that
the colClasses
argument has not been set properly.
If the sheet is empty, return NULL
. If the sheet does not exist,
return an error.
The function pulls the value of each non empty cell in the worksheet into a
vector of type list
by preserving the data type. If
as.data.frame=TRUE
, this vector of lists is then formatted into a
rectangular shape. Special care is needed for worksheets with ragged data.
An attempt is made to guess the class type of the variable corresponding to
each column in the worksheet from the type of the first non empty cell in
that column. If you need to impose a specific class type on a variable, use
the colClasses
argument. It is recommended to specify the column
classes and not rely on R
to guess them, unless in very simple cases.
Excel internally stores dates and datetimes as numeric values, and does not
keep track of time zones and DST. When a datetime column is brought into ,
it is converted to POSIXct
class with a GMT timezone.
Occasional rounding errors may appear and the and Excel string
representation my differ by one second. For read.xlsx2
bring in a
datetime column as a numeric one and then convert to class POSIXct
or
Date
. Also rounding the POSIXct
column in R usually does the
trick too.
The read.xlsx2
function does more work in Java so it achieves better
performance (an order of magnitude faster on sheets with 100,000 cells or
more). The result of read.xlsx2
will in general be different from
read.xlsx
, because internally read.xlsx2
uses
readColumns
which is tailored for tabular data.
Reading of password protected workbooks is supported for Excel 2007 OOXML format only.
write.xlsx
for writing xlsx
documents. See
also readColumns
for reading only a set of columns into R.
if (FALSE) {
file <- system.file("tests", "test_import.xlsx", package = "xlsx")
res <- read.xlsx(file, 1) # read first sheet
head(res)
# NA. Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
# 1 Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
# 2 Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
# 3 Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
# 4 Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
# 5 California 21198 5114 1.1 71.71 10.3 62.6 20 156361
# 6 Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
# >
# To convert an Excel datetime colum to POSIXct, do something like:
# as.POSIXct((x-25569)*86400, tz="GMT", origin="1970-01-01")
# For Dates, use a conversion like:
# as.Date(x-25569, origin="1970-01-01")
res2 <- read.xlsx2(file, 1)
}