
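Apache Arrow defines two formats for serializing data for interprocess communication (IPC): a "stream" format and a "file" format, the latter known as Feather. RecordBatchStreamReader and RecordBatchFileReader are the interfaces for accessing record batches from input sources in those formats, respectively.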

For guidance on how to use these classes, see the Examples section.

Factory

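The RecordBatchFileReader$create() and RecordBatchStreamReader$create() factory methods instantiate the object and take a single argument, named according to the class (file for RecordBatchFileReader, stream for RecordBatchStreamReader).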
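As a minimal sketch of the two factories side by side, the snippet below creates one reader of each kind. It assumes the arrow package is installed, and it uses write_ipc_stream() and write_feather() (neither is described on this page) only to produce input files; the variable names are illustrative.

```r
library(arrow)

# Produce one file in each IPC format so that there is something to read.
stream_path <- tempfile()
file_path <- tempfile()
write_ipc_stream(chickwts, stream_path)
write_feather(chickwts, file_path, compression = "uncompressed")

# Each factory takes a single input source, passed positionally here.
stream_con <- ReadableFile$create(stream_path)
stream_reader <- RecordBatchStreamReader$create(stream_con)

file_con <- ReadableFile$create(file_path)
file_reader <- RecordBatchFileReader$create(file_con)

# Both readers can collect their batches into a Table.
stream_tab <- stream_reader$read_table()
file_tab <- file_reader$read_table()

# The readers do not need closing, but the file connections do.
stream_con$close()
file_con$close()
unlink(c(stream_path, file_path))
```

Opening the connection explicitly with ReadableFile$create() keeps the sketch parallel to the Examples section, where the connection is also closed by hand.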

Methods

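  • $read_next_batch(): Returns a RecordBatch, iterating through the reader. If there are no further batches in the reader, it returns NULL.

  • $schema: Returns a Schema (active binding).

  • $batches(): Returns a list of RecordBatches.

  • $read_table(): Collects the reader's RecordBatches into a Table.

  • $get_batch(i): For RecordBatchFileReader, returns a particular batch by integer index.

  • $num_record_batches(): For RecordBatchFileReader, reports how many batches are in the file.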
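The iteration behaviour of $read_next_batch() can be sketched as a loop that stops at the first NULL. This example assumes the arrow package is installed and uses RecordBatchStreamWriter, the stream-format counterpart of the writer in the Examples section, to write the same batch twice; the counters are illustrative.

```r
library(arrow)

tf <- tempfile()
batch <- record_batch(chickwts)

# Write two batches with the same schema in the stream format.
sink <- FileOutputStream$create(tf)
writer <- RecordBatchStreamWriter$create(sink, batch$schema)
writer$write(batch)
writer$write(batch)
writer$close()
sink$close()

stream <- ReadableFile$create(tf)
reader <- RecordBatchStreamReader$create(stream)

# The schema is available before any batch has been read.
reader_schema <- reader$schema

# Pull batches one at a time; NULL signals that the reader is exhausted.
n_batches <- 0
n_rows <- 0
while (!is.null(b <- reader$read_next_batch())) {
  n_batches <- n_batches + 1
  n_rows <- n_rows + b$num_rows
}

stream$close()
unlink(tf)
```

Reading batch by batch keeps only one RecordBatch in use at a time, which is the main reason to prefer this loop over $read_table() for large inputs.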

See also

read_ipc_stream() and read_feather() provide a much simpler interface for reading data from these formats and are sufficient for many use cases.
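For comparison, a round trip through the simpler interface might look like the sketch below. It assumes the arrow package is installed and pairs the two readers with write_ipc_stream() and write_feather(), which are not covered on this page.

```r
library(arrow)

stream_path <- tempfile()
file_path <- tempfile()

# Write the same data frame in the stream format and in the file format.
write_ipc_stream(chickwts, stream_path)
write_feather(chickwts, file_path)

# Each function opens the file, reads every batch, and returns a data frame.
from_stream <- read_ipc_stream(stream_path)
from_file <- read_feather(file_path)

unlink(c(stream_path, file_path))
```

No connection objects or readers have to be created or closed here, which is why these functions cover most everyday cases.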

Examples

tf <- tempfile()
on.exit(unlink(tf))

batch <- record_batch(chickwts)

# This opens a connection to the file in Arrow
file_obj <- FileOutputStream$create(tf)
# Pass that to a RecordBatchWriter to write data conforming to a schema
writer <- RecordBatchFileWriter$create(file_obj, batch$schema)
writer$write(batch)
# You may write additional batches to the stream, provided that they have
# the same schema.
# Call "close" on the writer to indicate end-of-file/stream
writer$close()
# Then, close the connection--closing the IPC message does not close the file
file_obj$close()

# Now, we have a file we can read from. Same pattern: open file connection,
# then pass it to a RecordBatchReader
read_file_obj <- ReadableFile$create(tf)
reader <- RecordBatchFileReader$create(read_file_obj)
# RecordBatchFileReader knows how many batches it has (StreamReader does not)
reader$num_record_batches
#> [1] 1
# We could consume the Reader by calling $read_next_batch() until all are
# consumed, or we can call $read_table() to pull them all into a Table
tab <- reader$read_table()
# Call as.data.frame to turn that Table into an R data.frame
df <- as.data.frame(tab)
# This should be the same data we sent
all.equal(df, chickwts, check.attributes = FALSE)
#> [1] TRUE
# Unlike the Writers, we don't have to close RecordBatchReaders,
# but we do still need to close the file connection
read_file_obj$close()