CSV Reading Options — csv_read_options • Arrow R Package

CSV Reading Options

Usage

csv_read_options(
  use_threads = option_use_threads(),
  block_size = 1048576L,
  skip_rows = 0L,
  column_names = character(0),
  autogenerate_column_names = FALSE,
  encoding = "UTF-8",
  skip_rows_after_names = 0L
)

Arguments

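use_threads: Whether to use the global CPU thread pool.
block_size: Block size (in bytes) requested from the IO layer; also determines the size of chunks when use_threads is TRUE.
skip_rows: Number of lines to skip before reading data (default 0).
column_names: Character vector supplying the column names. If it has length 0 (the default), the first non-skipped row is parsed to generate column names, unless autogenerate_column_names is TRUE.
autogenerate_column_names: Logical: generate column names instead of using the first non-skipped row? The default is FALSE, which uses the first non-skipped row. If TRUE, the column names will be "f0", "f1", ..., "fN".
encoding: The file encoding (default "UTF-8").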
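skip_rows_after_names: Number of lines to skip after the column names (default 0). This number can be larger than the number of rows in one block, and empty rows are counted. The order of application is as follows:
  - skip_rows is applied (if non-zero);
  - column names are read (unless column_names is set);
  - skip_rows_after_names is applied (if non-zero).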
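
The order of application can be illustrated with a small sketch. The file contents here are hypothetical, made up only to give each option one line to act on: a free-text line removed by skip_rows, a header row, a units row removed by skip_rows_after_names, and two data rows.

```r
library(arrow)

# Hypothetical file layout, invented for this illustration.
tf <- tempfile(fileext = ".csv")
writeLines(c(
  "exported by a hypothetical tool", # removed by skip_rows = 1
  "x,y",                             # parsed as the column names
  "m,s",                             # removed by skip_rows_after_names = 1
  "1,2",
  "3,4"
), tf)

df <- read_csv_arrow(
  tf,
  read_options = csv_read_options(skip_rows = 1, skip_rows_after_names = 1)
)

names(df) # the header row supplies the names "x" and "y"
nrow(df)  # only the two data rows remain

unlink(tf)
```

Without skip_rows_after_names, the units row would be read as data, and both columns would likely be inferred as strings rather than integers.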

Examples

tf <- tempfile()
on.exit(unlink(tf))
writeLines("my file has a non-data header\nx\n1\n2", tf)
read_csv_arrow(tf, read_options = csv_read_options(skip_rows = 1))
#> # A tibble: 2 x 1
#>       x
#>   <int>
#> 1     1
#> 2     2
open_csv_dataset(tf, read_options = csv_read_options(skip_rows = 1))
#> FileSystemDataset with 1 csv file
#> 1 columns
#> x: int64
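
The examples above cover only skip_rows. As a further sketch, the following shows how column_names and autogenerate_column_names might be used on a file that has no header row. The file contents and the names "id" and "label" are assumptions chosen for illustration.

```r
library(arrow)

# Hypothetical headerless file: every line is data.
tf <- tempfile(fileext = ".csv")
writeLines(c("1,a", "2,b"), tf)

# Supply the names explicitly; the first row is then read as data.
named <- read_csv_arrow(
  tf,
  read_options = csv_read_options(column_names = c("id", "label"))
)

# Alternatively, let Arrow generate the names "f0", "f1", ...
auto <- read_csv_arrow(
  tf,
  read_options = csv_read_options(autogenerate_column_names = TRUE)
)

names(named) # "id" "label"
names(auto)  # "f0" "f1"

unlink(tf)
```

In both cases no row is consumed as a header, so each result should contain both lines of the file as data.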