Read a JSON file: read_json_arrow • Arrow R Package

Wrapper around JsonTableReader to read a newline-delimited JSON (ndjson) file into a data frame or an Arrow Table.

Usage

read_json_arrow(
  file,
  col_select = NULL,
  as_data_frame = TRUE,
  schema = NULL,
  ...
)

Arguments

file

A character file name or URI, a connection, literal data (either a single string or a raw vector), an Arrow input stream, or a FileSystem with path (SubTreeFileSystem).

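If a file name, a memory-mapped Arrow InputStream will be opened and then closed when finished; compression will be detected from the file extension and handled automatically. If an input stream is provided, it will be left open.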

To be recognised as literal data, the input must be wrapped in I().
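A minimal sketch of the input cases described above, assuming the arrow package is installed. A file name lets read_json_arrow() open and close the file itself. A stream you open yourself (here with ReadableFile$create()) is left open, so closing it is up to you. Literal strings must be wrapped in I(). The variable names and sample records are illustrative only.

```r
library(arrow)

# Write a small ndjson file to a temporary path (illustrative data)
tf <- tempfile(fileext = ".json")
writeLines(c('{"a": 1, "b": "x"}', '{"a": 2, "b": "y"}'), tf)

# Case 1: a file name; read_json_arrow() opens and closes the file itself
df_from_path <- read_json_arrow(tf)

# Case 2: an input stream we opened ourselves; it is left open afterwards
stream <- ReadableFile$create(tf)
df_from_stream <- read_json_arrow(stream)
stream$close()  # our responsibility, since we opened it

# Case 3: literal data must be wrapped in I() to be treated as JSON text
df_literal <- read_json_arrow(I('{"a": 1, "b": "x"}'))

unlink(tf)
```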

col_select

Character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select().
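A short sketch of both col_select forms, assuming the arrow package is installed; the three-column sample data is made up for illustration. Both calls below should keep only x and z.

```r
library(arrow)

json_lines <- c(
  '{"x": 1, "y": 2, "z": "a"}',
  '{"x": 3, "y": 4, "z": "b"}'
)

# Character vector of names, like the select argument of data.table::fread()
by_name <- read_json_arrow(I(json_lines), col_select = c("x", "z"))

# Tidy selection, like dplyr::select(): drop y, keep everything else
by_tidy <- read_json_arrow(I(json_lines), col_select = -y)
```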

as_data_frame

Should the function return a tibble (default) or an Arrow Table?
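As a sketch (assuming the arrow package is installed, with illustrative data), as_data_frame = FALSE keeps the result in Arrow memory as a Table, which can be converted to an R data frame later with as.data.frame().

```r
library(arrow)

json_lines <- c('{"a": 1.5, "b": true}', '{"a": 2.5, "b": false}')

# Keep the result as an Arrow Table instead of converting it to a tibble
tab <- read_json_arrow(I(json_lines), as_data_frame = FALSE)
print(tab$schema)

# Convert to an R data frame only when it is actually needed
df <- as.data.frame(tab)
```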

schema

Schema that describes the table.
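A minimal sketch of overriding type inference with an explicit schema, assuming the arrow package is installed; the column names and types are illustrative. Without a schema, whole-number JSON values are inferred as int64 (shown as <int> in R). Declaring x as float64() should make it arrive as <dbl>.

```r
library(arrow)

json_lines <- c('{"x": 1, "y": 2}', '{"x": 3, "y": 4}')

# No schema: both columns are inferred as int64
inferred <- read_json_arrow(I(json_lines))

# Explicit schema: x read as float64, y kept as int64
sch <- schema(x = float64(), y = int64())
explicit <- read_json_arrow(I(json_lines), schema = sch)
```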

...

Additional options passed to JsonTableReader$create()

Value

A tibble, or a Table if as_data_frame = FALSE.

Details

If passed a path, compression is detected from the file extension (e.g. .json.gz) and handled automatically.
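A small sketch of automatic decompression, assuming the arrow package is installed and was built with gzip support (arrow::codec_is_available("gzip") returns TRUE). The file is written with base R's gzfile(), and only its .gz extension tells read_json_arrow() to decompress it.

```r
library(arrow)

# Write gzip-compressed ndjson to a path ending in .json.gz
tf <- tempfile(fileext = ".json.gz")
con <- gzfile(tf, "w")
writeLines(c('{"a": 1}', '{"a": 2}'), con)
close(con)

# The .gz extension triggers gzip decompression; no extra argument needed
df <- read_json_arrow(tf)
unlink(tf)
```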

If a schema is not provided, Arrow data types are inferred from the data:

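JSON null values convert to the null() type, but can fall back to any other type.
JSON booleans convert to boolean().
JSON numbers convert to int64(), falling back to float64() if a non-integer is encountered.
JSON strings of the form "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to timestamp(unit = "s"), falling back to utf8() if a conversion error occurs.
JSON arrays convert to a list_of() type, and inference proceeds recursively on the arrays' values.
Nested JSON objects convert to a struct() type, and inference proceeds recursively on the objects' values.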
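To inspect what the rules above produce, a sketch like the following (assuming the arrow package is installed; the records are illustrative) reads with as_data_frame = FALSE so the Arrow types are not converted to R types. Per the rules, n should be int64, f float64 (because of 1.5), l a list of int64, and o a struct; print the schema to check the other columns on your version of Arrow.

```r
library(arrow)

json_lines <- c(
  '{"n": 1, "f": 1.5, "s": "2020-01-01 12:00:00", "l": [1, 2], "o": {"k": true}}',
  '{"n": 2, "f": 2, "s": "2020-01-02 08:30:00", "l": [], "o": {"k": null}}'
)

# Keep Arrow types so the inferred schema is visible
tab <- read_json_arrow(I(json_lines), as_data_frame = FALSE)
print(tab$schema)
```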

When as_data_frame = TRUE, Arrow types are further converted to R types.

Examples

tf <- tempfile()
on.exit(unlink(tf))
writeLines('
    { "hello": 3.5, "world": false, "yo": "thing" }
    { "hello": 3.25, "world": null }
    { "hello": 0.0, "world": true, "yo": null }
  ', tf, useBytes = TRUE)

read_json_arrow(tf)
#> # A tibble: 3 x 3
#>   hello world yo   
#>   <dbl> <lgl> <chr>
#> 1  3.5  FALSE thing
#> 2  3.25 NA    NA   
#> 3  0    TRUE  NA   

# Read directly from strings with `I()`
read_json_arrow(I(c('{"x": 1, "y": 2}', '{"x": 3, "y": 4}')))
#> # A tibble: 2 x 2
#>       x     y
#>   <int> <int>
#> 1     1     2
#> 2     3     4