Read a Feather file (an Arrow IPC file): read_feather • Arrow R package

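Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data frames efficient, and to make sharing data across data analysis languages easy. read_feather() can read both Feather Version 1 (V1), a legacy format available since 2016, and Version 2 (V2), which is the Apache Arrow IPC file format. read_ipc_file() is an alias of read_feather().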

Usage

read_feather(file, col_select = NULL, as_data_frame = TRUE, mmap = TRUE)

read_ipc_file(file, col_select = NULL, as_data_frame = TRUE, mmap = TRUE)

Arguments

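file: A character file name or URI, a connection, a raw vector, an Arrow input stream, or a FileSystem with path (SubTreeFileSystem). If a file name or URI is given, an Arrow InputStream is opened and then closed when reading finishes. If an input stream is provided, it is left open.
col_select: A character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select().
as_data_frame: Should the function return a tibble (the default) or an Arrow Table?
mmap: Logical: whether to memory-map the file (default TRUE).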
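As a minimal sketch of how these arguments combine, the snippet below writes mtcars to a temporary file and reads it back three ways. The variable names (by_name, by_tidy, no_mmap) are illustrative only, and the example assumes the arrow package is installed.

```r
library(arrow)

# Write a small Arrow IPC (Feather V2) file to read back.
tf <- tempfile(fileext = ".arrow")
write_feather(mtcars, tf)

# col_select as a character vector of column names.
by_name <- read_feather(tf, col_select = c("mpg", "cyl"))

# col_select as a tidy selection expression.
by_tidy <- read_feather(tf, col_select = starts_with("d"))

# mmap = FALSE reads the file without memory mapping it.
no_mmap <- read_feather(tf, mmap = FALSE)

# Clean up the temporary file.
unlink(tf)
```

Selecting columns at read time avoids materializing columns that are not needed, which matters most for wide files.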

Value

A tibble if as_data_frame is TRUE (the default), or an Arrow Table otherwise.
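The two return types can be compared with a short sketch like the following. It assumes the arrow package is installed; the checks use inherits() rather than an exact class comparison, since the precise data frame class returned may vary between arrow versions.

```r
library(arrow)

tf <- tempfile(fileext = ".arrow")
write_feather(mtcars, tf)

# Default: the result is converted to an R data frame.
df <- read_feather(tf)
is_df <- inherits(df, "data.frame")

# as_data_frame = FALSE: the result stays an Arrow Table.
tbl <- read_feather(tf, as_data_frame = FALSE)
is_table <- inherits(tbl, "Table")

# A Table can be converted to a data frame later, on demand.
converted <- as.data.frame(tbl)

unlink(tf)
```

Keeping the result as a Table can be useful when the data will be passed on to other Arrow functions before, or instead of, being converted to R vectors.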

See also

FeatherReader and RecordBatchReader for lower-level access to reading Arrow IPC data.
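For illustration, a rough sketch of the lower-level route with FeatherReader might look like this. It assumes the R6 interface exposed by the arrow package (ReadableFile$create(), FeatherReader$create(), and the reader's Read() method); check the FeatherReader reference for the version you have installed before relying on it.

```r
library(arrow)

tf <- tempfile(fileext = ".arrow")
write_feather(mtcars, tf)

# Open the file explicitly; the caller is responsible for closing it.
f <- ReadableFile$create(tf)
reader <- FeatherReader$create(f)

# Inspect the column names without reading the data.
cols <- names(reader)

# Read only the selected columns; the result is an Arrow Table.
tab <- reader$Read(c("mpg", "cyl"))

f$close()
unlink(tf)
```

Unlike read_feather(), this route leaves file handling and column selection by name to the caller.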

Examples

library(arrow)
# We recommend the ".arrow" extension for Arrow IPC files (Feather V2).
tf <- tempfile(fileext = ".arrow")
on.exit(unlink(tf))
write_feather(mtcars, tf)
df <- read_feather(tf)
dim(df)
#> [1] 32 11
# Can select columns
df <- read_feather(tf, col_select = starts_with("d"))