Arrow Arrays — Array • Arrow R Package

An Array is an immutable data array with a logical type and a length. Most logical types are handled by the base Array class; DictionaryArray, ListArray, and StructArray are subclasses.

Factory

The Array$create() factory method instantiates an Array and takes the following arguments:

x: an R vector, list, or data.frame
type: an optional data type for x. If omitted, the type is inferred from the data.

Array$create() returns the appropriate subclass of Array, such as a DictionaryArray when given an R factor.
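A minimal sketch of the factory behaviour described above, assuming the arrow package is installed; the variable names are illustrative. It shows type inference from an R integer vector, an explicit `type` overriding that inference, and `Array$create()` returning a subclass for a factor and for a data.frame.

```r
library(arrow)

# Type inferred from the R vector: an R integer vector becomes int32
ints <- Array$create(c(1L, 2L, 3L))
ints$type$ToString()

# An explicit type overrides inference
big_ints <- Array$create(c(1L, 2L, 3L), type = int64())
big_ints$type$ToString()

# A factor becomes a DictionaryArray, a data.frame becomes a StructArray
fct <- Array$create(factor(c("a", "b", "a")))
class(fct)[1]
df_arr <- Array$create(data.frame(x = 1:2, y = c("a", "b")))
class(df_arr)[1]
```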

To compose a DictionaryArray directly, call DictionaryArray$create(), which takes two arguments:

x: an R vector or Array of integers for the dictionary indices
dict: an R vector or Array of dictionary values (like R factor levels, but not limited to strings)
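A short sketch of composing a DictionaryArray from separate indices and values, assuming the arrow package is installed (`indices` and `levels_dict` are illustrative names). Arrow dictionary indices are zero-based, so `0L` refers to the first entry of `dict`; converting back to R is expected to yield a factor whose levels come from `dict`.

```r
library(arrow)

# Zero-based integer indices into the dictionary
indices <- c(0L, 1L, 0L, 2L)
levels_dict <- c("low", "mid", "high")

dict_arr <- DictionaryArray$create(indices, dict = levels_dict)
dict_arr$type$ToString()

# Converting back to R gives a factor; its levels are the dictionary values
f <- dict_arr$as_vector()
levels(f)
as.character(f)
```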

Usage

a <- Array$create(x)
length(a)

print(a)
a == a

Methods

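$IsNull(i): Return true if the value at index i is null. Does not bounds-check.
$IsValid(i): Return true if the value at index i is valid (not null). Does not bounds-check.
$length(): Number of elements this array contains
$nbytes(): Total number of bytes consumed by the elements of the array
$offset: A relative position into another array's data, to enable zero-copy slicing
$null_count: The number of null entries in the array
$type: Logical type of the data
$type_id(): Type ID
$Equals(other): Whether this array is equal to other
$ApproxEquals(other): Whether this array is approximately equal to other (floating-point values are compared within a tolerance)
$Diff(other): Return a string expressing the difference between two arrays
$data(): Return the underlying ArrayData
$as_vector(): Convert to an R vector
$ToString(): String representation of the array
$Slice(offset, length = NULL): Construct a zero-copy slice of the array with the indicated offset and length. If length is NULL, the slice extends to the end of the array.
$Take(i): Return an Array with the values at the zero-based positions given by the integers i (an R vector or Arrow Array).
$Filter(i, keep_na = TRUE): Return an Array with the values at positions where the logical vector (or Arrow boolean Array) i is TRUE.
$SortIndices(descending = FALSE): Return an Array of integer positions that can be used to rearrange the Array in ascending or descending order
$RangeEquals(other, start_idx, end_idx, other_start_idx): Whether the elements of this array from start_idx to end_idx equal those of other starting at other_start_idx
$cast(target_type, safe = TRUE, options = cast_options(safe)): Return a new Array with the data converted to target_type.
$View(type): Construct a zero-copy view of this array with the given type.
$Validate(): Perform validation checks to detect obvious inconsistencies in the array's internal data. This can be an expensive check, potentially O(length).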
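The selection and ordering methods can be combined; here is a minimal sketch, assuming the arrow package is installed (`arr` and the other names are illustrative). Positions passed to $Take() are zero-based, and the null is expected to sort last with the default options of $SortIndices().

```r
library(arrow)

arr <- Array$create(c(30L, 10L, NA, 20L))

# $Take(): zero-based positions, so 0 and 3 pick the first and last values
taken <- arr$Take(c(0L, 3L))
taken$as_vector()

# $Filter(): keep the elements where the logical vector is TRUE
filtered <- arr$Filter(c(TRUE, FALSE, FALSE, TRUE))
filtered$as_vector()

# $SortIndices() returns positions; passing them to $Take() sorts the array
idx <- arr$SortIndices()
sorted <- arr$Take(idx)
sorted$as_vector()
```

Returning positions rather than a sorted copy lets the same ordering be applied to other arrays of the same length.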

Examples

library(arrow)

my_array <- Array$create(1:10)
my_array$type
#> Int32
#> int32
my_array$cast(int8())
#> Array
#> <int8>
#> [
#>   1,
#>   2,
#>   3,
#>   4,
#>   5,
#>   6,
#>   7,
#>   8,
#>   9,
#>   10
#> ]

# Check if value is null; zero-indexed
na_array <- Array$create(c(1:5, NA))
na_array$IsNull(0)
#> [1] FALSE
na_array$IsNull(5)
#> [1] TRUE
na_array$IsValid(5)
#> [1] FALSE
na_array$null_count
#> [1] 1

# zero-copy slicing; the offset of the new Array will be the same as the index passed to $Slice
new_array <- na_array$Slice(5)
new_array$offset
#> [1] 5

# Compare 2 arrays
na_array2 <- na_array
na_array2 == na_array # element-wise comparison
#> Array
#> <bool>
#> [
#>   true,
#>   true,
#>   true,
#>   true,
#>   true,
#>   null
#> ]
na_array2$Equals(na_array) # overall comparison
#> [1] TRUE
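The $cast() example above converts 1:10 to int8(), where every value fits. A minimal sketch of what the safe argument guards against, assuming the arrow package is installed (the names are illustrative): with the default safe = TRUE, a value that overflows the target type raises an error, while safe = FALSE skips that check, so out-of-range values are not preserved.

```r
library(arrow)

wide <- Array$create(c(1L, 200L))

# Safe cast (the default): 200 does not fit in int8, so an error is raised
safe_result <- tryCatch(wide$cast(int8()), error = function(e) e)
inherits(safe_result, "error")

# Unsafe cast: no overflow check; the result is int8 but 200 is not kept as 200
unsafe_result <- wide$cast(int8(), safe = FALSE)
unsafe_result$type$ToString()
```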