RecordBatchWriter 类 — RecordBatchWriter • Arrow R 包

Apache Arrow 定义了两种序列化数据以便进行进程间通信 (IPC) 的格式：一种是“流”格式，另一种是“文件”格式，也称为 Feather。RecordBatchStreamWriter 和 RecordBatchFileWriter 分别是用于将 record batch 写入这些格式的接口。

有关如何使用这些类的指导，请参阅示例部分。

工厂

RecordBatchFileWriter$create() 和 RecordBatchStreamWriter$create() 工厂方法实例化该对象并采用以下参数

sink 一个 OutputStream
schema 一个用于写入数据的 Schema
use_legacy_format 逻辑值：写入格式化的数据，以便 Arrow 库版本 0.14 及更低版本可以读取它。默认为 FALSE。您还可以通过设置环境变量 ARROW_PRE_0_15_IPC_FORMAT=1 来启用此功能。
metadata_version：一个字符串，如“V5”或等效的整数，表示 Arrow IPC MetadataVersion。默认值 (NULL) 将使用最新版本，除非环境变量 ARROW_PRE_1_0_METADATA_VERSION=1，在这种情况下将使用 V4。

方法

$write(x): 写入一个 RecordBatch, Table, 或者 data.frame, 并适当分派到下面的方法
$write_batch(batch): 将一个 RecordBatch 写入到流
$write_table(table): 将一个 Table 写入到流
$close(): 关闭流。请注意，这表示文件结束或流结束——它不会关闭与 sink 的连接。需要单独关闭连接。

参见

write_ipc_stream() 和 write_feather() 为将数据写入这些格式提供了一个更简单的界面，并且足以满足许多用例。write_to_raw() 是将数据序列化到缓冲区的版本。

示例

tf <- tempfile()
on.exit(unlink(tf))

batch <- record_batch(chickwts)

# This opens a connection to the file in Arrow
file_obj <- FileOutputStream$create(tf)
# Pass that to a RecordBatchWriter to write data conforming to a schema
writer <- RecordBatchFileWriter$create(file_obj, batch$schema)
writer$write(batch)
# You may write additional batches to the stream, provided that they have
# the same schema.
# Call "close" on the writer to indicate end-of-file/stream
writer$close()
# Then, close the connection--closing the IPC message does not close the file
file_obj$close()

# Now, we have a file we can read from. Same pattern: open file connection,
# then pass it to a RecordBatchReader
read_file_obj <- ReadableFile$create(tf)
reader <- RecordBatchFileReader$create(read_file_obj)
# RecordBatchFileReader knows how many batches it has (StreamReader does not)
reader$num_record_batches
#> [1] 1
# We could consume the Reader by calling $read_next_batch() until all are,
# consumed, or we can call $read_table() to pull them all into a Table
tab <- reader$read_table()
# Call as.data.frame to turn that Table into an R data.frame
df <- as.data.frame(tab)
# This should be the same data we sent
all.equal(df, chickwts, check.attributes = FALSE)
#> [1] TRUE
# Unlike the Writers, we don't have to close RecordBatchReaders,
# but we do still need to close the file connection
read_file_obj$close()