Arrow Flight RPC#

Arrow Flight 是一个基于 Arrow 数据的高性能数据服务 RPC 框架,构建在 gRPCIPC 格式之上。

Flight 围绕 Arrow 记录批次的流进行组织,这些流可以从一个服务下载或上传到另一个服务。一套元数据方法提供了流的发现和内省,以及实现应用程序特定方法的能力。

方法和消息线格式由 Protobuf 定义,这使得可以与可能单独支持 gRPC 和 Arrow 但不支持 Flight 的客户端进行互操作。然而,Flight 实现包含了进一步的优化,以避免 Protobuf 使用中的开销(主要围绕避免过多的内存复制)。

RPC 方法和请求模式#

Flight 定义了一组 RPC 方法,用于上传/下载数据、检索数据流的元数据、列出可用数据流以及实现应用程序特定的 RPC 方法。Flight 服务实现这些方法中的一部分,而 Flight 客户端可以调用这些方法中的任何一个。

数据流由描述符(FlightDescriptor 消息)标识,描述符可以是一个路径或任意的二进制命令。例如,描述符可以编码 SQL 查询、分布式文件系统上文件的路径,甚至是一个序列化的 Python 对象;应用程序可以根据需要使用此消息。

因此,一个 Flight 客户端可以连接到任何服务并执行基本操作。为了方便这一点,Flight 服务预期支持一些常见的请求模式,接下来将介绍。当然,应用程序可以忽略兼容性,简单地将 Flight RPC 方法视为其自身目的的低级构建块。

有关所涉及方法和消息的完整详细信息,请参阅协议缓冲区定义

下载数据#

希望下载数据的客户端将

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% https://apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Metadata Server participant Data Server Client->>Metadata Server: GetFlightInfo(FlightDescriptor) Metadata Server->>Client: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]} Note over Client, Data Server: This may be parallelized loop for each endpoint in FlightInfo.endpoints Client->>Data Server: DoGet(Ticket) Data Server->>Client: stream of FlightData end

通过 DoGet 检索数据。#

  1. 为他们感兴趣的数据集构建或获取 FlightDescriptor

    客户端可能已经知道他们想要的描述符,或者他们可以使用 ListFlights 等方法来发现它们。

  2. 调用 GetFlightInfo(FlightDescriptor) 以获取 FlightInfo 消息。

    Flight 不要求数据与元数据位于同一服务器上。因此,FlightInfo 包含数据所在位置的详细信息,以便客户端可以从适当的服务器获取数据。这以 FlightInfo 中一系列 FlightEndpoint 消息的形式编码。每个端点表示包含响应数据子集的一些位置。

    一个端点包含一个位置列表(服务器地址),可以从中检索数据,以及一个 Ticket,这是一个不透明的二进制令牌,服务器将用它来标识请求的数据。

    如果 FlightInfo.ordered 为 true,则表示来自不同端点的数据之间存在某种顺序。客户端应生成与来自每个端点的数据按顺序从前到后连接起来的结果相同。

    如果 FlightInfo.ordered 为 false,客户端可以以任意顺序返回来自任何端点的数据。来自任何特定端点的数据必须按顺序返回,但来自不同端点的数据可以交错,以允许并行获取。

    请注意,由于某些客户端可能会忽略 FlightInfo.ordered,如果顺序很重要且无法确保客户端支持,服务器应返回单个端点。

    响应还包含其他元数据,例如模式,以及可选的数据集大小估计。

  3. 消费服务器返回的每个端点。

    要消费一个端点,客户端应连接到端点中的一个位置,然后使用端点中的票证调用 DoGet(Ticket)。这将为客户端提供 Arrow 记录批次流。

    如果服务器希望指示数据位于本地服务器而非其他位置,则可以返回空的位置列表。然后客户端可以重用与原始服务器的现有连接来获取数据。否则,客户端必须连接到指示的位置之一。

    服务器可以将“自身”列为与其他服务器位置并列的一个位置。通常这需要服务器知道其公共地址,但它也可以使用特殊的 URI 字符串 arrow-flight-reuse-connection://? 来告诉客户端,他们可以重用与同一服务器的现有连接,而无需能够命名自身。请参阅下面的连接重用

    通过这种方式,端点内部的位置也可以被视为执行旁路负载平衡或服务发现功能。端点可以表示被分区或以其他方式分布式的数据。

    客户端必须消费所有端点才能检索完整的数据集。客户端可以按任何顺序消费端点,甚至可以并行消费,或者将端点分配给多台机器进行消费;这取决于应用程序来实现。客户端还可以使用 FlightInfo.ordered。有关 FlightInfo.ordered 的详细信息,请参阅上一项。

    每个端点可能有一个过期时间(FlightEndpoint.expiration_time)。如果端点有过期时间,客户端可以在过期时间到达之前通过 DoGet 多次获取数据。否则,DoGet 请求是否可以重试由应用程序定义。过期时间表示为 google.protobuf.Timestamp

    如果过期时间较短,客户端可以通过 RenewFlightEndpoint 操作延长过期时间。客户端需要使用 DoActionRenewFlightEndpoint 操作类型来延长过期时间。Action.body 必须是包含要续订的 FlightEndpointRenewFlightEndpointRequest

    客户端可以通过 CancelFlightInfo 操作取消返回的 FlightInfo。客户端需要使用 DoActionCancelFlightInfo 操作类型来取消 FlightInfo

通过运行重型查询下载数据#

客户端可能需要请求一个繁重的查询来下载数据。然而,GetFlightInfo 在查询完成之前不会返回,因此客户端会被阻塞。在这种情况下,客户端可以使用 PollFlightInfo 代替 GetFlightInfo

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% https://apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Metadata Server participant Data Server Client->>Metadata Server: PollFlightInfo(FlightDescriptor) Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor', ...} Client->>Metadata Server: PollFlightInfo(FlightDescriptor') Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor'', ...} Client->>Metadata Server: PollFlightInfo(FlightDescriptor'') Metadata Server->>Client: PollInfo{descriptor: null, info: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]} Note over Client, Data Server: This may be parallelized Note over Client, Data Server: Some endpoints may be processed while polling loop for each endpoint in FlightInfo.endpoints Client->>Data Server: DoGet(Ticket) Data Server->>Client: stream of FlightData end

通过 PollFlightInfo 轮询长时间运行的查询。#

  1. 像以前一样,构建或获取一个 FlightDescriptor

  2. 调用 PollFlightInfo(FlightDescriptor) 以获取 PollInfo 消息。

    服务器应该在第一次调用时尽快响应。因此,客户端不应该等待第一次 PollInfo 响应。

    如果查询尚未完成,PollInfo.flight_descriptor 具有 FlightDescriptor。客户端应使用该描述符(而不是原始 FlightDescriptor)来调用下一个 PollFlightInfo()。服务器应识别一个不一定是最新版本的 PollInfo.flight_descriptor,以防客户端在此期间错过更新。

    如果查询完成,PollInfo.flight_descriptor 将未设置。

    PollInfo.info 是目前可用的结果。它每次都是一个完整的 FlightInfo,而不仅仅是前一个和当前 FlightInfo 之间的差异。服务器每次都应该只向 PollInfo.info 中的端点追加数据。因此,即使查询尚未完成,客户端也可以使用 PollInfo.info 中的 Ticket 运行 DoGet(Ticket)FlightInfo.ordered 也有效。

    服务器不应响应,直到结果与上次不同。这样,客户端可以“长轮询”更新,而无需不断发出请求。如果需要,客户端可以设置一个短超时,以避免阻塞调用。

    PollInfo.progress 可以被设置。它代表查询的进度。如果设置了,该值必须在 [0.0, 1.0] 之间。该值不一定是单调或非递减的。服务器可以通过仅更新 PollInfo.progress 值来响应,尽管它不应该频繁地向客户端发送更新。

    PollInfo.timestamp 是此请求的过期时间。在此时间之后,服务器可能不再接受轮询描述符,并且查询可能会被取消。这可以在调用 PollFlightInfo 时更新。过期时间表示为 google.protobuf.Timestamp

    客户端可以通过 CancelFlightInfo 操作取消查询。

    如果查询失败,服务器应返回错误状态而不是响应。客户端不应轮询请求,除非是 TIMED_OUTUNAVAILABLE,这些可能不是源自服务器的错误。

  3. 像以前一样,消费服务器返回的每个端点。

上传数据#

要上传数据,客户端会

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% https://apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Server Note right of Client: The first FlightData includes a FlightDescriptor Client->>Server: DoPut(FlightData) Client->>Server: stream of FlightData Server->>Client: PutResult{app_metadata}

通过 DoPut 上传数据。#

  1. 像以前一样,构建或获取一个 FlightDescriptor

  2. 调用 DoPut(FlightData) 并上传 Arrow 记录批次流。

    FlightDescriptor 包含在第一条消息中,以便服务器可以识别数据集。

DoPut 允许服务器向客户端发送带有自定义元数据的响应消息。这可以用于实现诸如可恢复写入之类的功能(例如,服务器可以定期发送一条消息,指示目前已提交的行数)。

交换数据#

某些用例可能需要在一个调用中上传和下载数据。虽然这可以通过多次调用模拟,但如果应用程序是有状态的,这可能会很困难。例如,应用程序可能希望实现一个调用,其中客户端上传数据,服务器响应数据的转换;如果使用 DoGetDoPut 实现,这将需要有状态。相反,DoExchange 允许将其实现为单个调用。客户端会

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% https://apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Server Note right of Client: The first FlightData includes a FlightDescriptor Client->>Server: DoExchange(FlightData) par [Client sends data] Client->>Server: stream of FlightData and [Server sends data] Server->>Client: stream of FlightData end

使用 DoExchange 进行复杂数据流。#

  1. 像以前一样,构建或获取一个 FlightDescriptor

  2. 调用 DoExchange(FlightData)

    FlightDescriptor 包含在第一条消息中,与 DoPut 类似。此时,客户端和服务器可以同时向对方流式传输数据。

认证#

Flight 支持各种身份验证方法,应用程序可以根据其需求进行定制。

“握手”认证

这分两部分实现。在连接时,客户端调用 Handshake RPC 方法,应用程序定义的认证处理器可以与服务器上的对应方交换任意数量的消息。然后,处理器提供一个二进制令牌。Flight 客户端随后将在所有未来调用的标头中包含此令牌,服务器认证处理器将验证此令牌。

应用程序可以使用其中任何一部分;例如,它们可以忽略初始握手并在每次调用时发送一个外部获取的令牌(例如,一个 bearer 令牌),或者它们可以在握手期间建立信任并且不验证每次调用的令牌,将连接视为有状态的(一种“登录”模式)。

警告

除非在每次调用时都验证令牌,否则这种模式不安全,尤其是在存在第 7 层负载均衡器(在 gRPC 中很常见)或 gRPC 透明地重新连接客户端的情况下。

基于标头/基于中间件的认证

客户端可以在调用中包含自定义标头。然后可以实现自定义中间件来在服务器端验证和接受/拒绝调用。

相互 TLS (mTLS)

客户端在建立连接期间提供证书,服务器会验证该证书。应用程序无需实现任何身份验证代码,但必须配置和分发证书。

这可能仅在某些实现中可用,并且仅在启用 TLS 时才可用。

一些 Flight 实现也可能暴露底层的 gRPC API,在这种情况下,gRPC 支持的任何认证方法都是可用的。

位置 URI#

Flight 主要根据其下面的 Protobuf 和 gRPC 规范进行定义,但 Arrow 实现也可能支持替代传输(请参阅 Flight RPC)。客户端和服务器需要知道在给定位置的 URI 中使用哪种传输,因此 Flight 实现应使用以下 URI 方案进行给定的传输

传输

URI 方案

gRPC(明文)

grpc: 或 grpc+tcp

gRPC (TLS)

grpc+tls

gRPC(Unix 域套接字)

grpc+unix

(重用连接)

arrow-flight-reuse-connection

HTTP (1)

http: 或 https

备注

  • (1) 有关使用 http/https 作为传输时的语义,请参阅扩展位置 URI。它应该可以通过 GET 请求访问。

    http/https 作为传输。它应该可以通过 GET 请求访问。

连接重用#

上面提到的“重用连接”不是一种特定的传输。相反,它意味着客户端可以尝试对原始获取 FlightInfo 的同一服务器(并通过同一连接)执行 DoGet(即,它曾对其调用 GetFlightInfo 的服务器)。这与未返回特定 Location 时的解释方式相同。

这允许服务器将“自身”作为获取数据的一个可能位置返回,而无需知道自己的公共地址,这在难以或不可能知道此信息的部署中非常有用。例如,开发人员可以将云环境中的远程服务转发到他们的本地机器;在这种情况下,远程服务将无法知道它所访问的本地主机名和端口。

出于兼容性原因,URI 应始终为 arrow-flight-reuse-connection://?,带有尾随的空查询字符串。Java 的 URI 实现不接受 scheme:scheme://,C++ 的实现不接受空字符串,因此明显的候选方案不兼容。所选的表示形式可以被这两种实现以及 Go 的 net/url 和 Python 的 urllib.parse 解析。

扩展位置 URI#

除了替代传输,服务器还可以返回引用外部服务或对象存储位置的 URI。这在中间数据作为 Apache Parquet 文件缓存到云存储上或通过 HTTP 服务可访问的情况下非常有用。在这些场景中,提供一个 URI 供客户端直接下载数据更有效,而不是要求 Flight 服务将其读回内存并从 DoGet 请求中提供。

为了避免 Flight 客户端必须实现对多个不同云存储供应商(例如 AWS S3、Google Cloud)的支持的复杂性,我们将 URI 扩展为只允许 HTTP/HTTPS URI,客户端可以执行简单的 GET 请求来下载数据。身份验证可以通过在 Flight 协议之外协商处理,或者由服务器发送一个预签名 URL,客户端可以向其发出 GET 请求。这应该得到所有当前主要云存储供应商的支持,这意味着只有服务器需要了解底层对象存储 API 的语义。

当使用扩展位置 URI 时,客户端应忽略 FlightEndpointTicket 字段中的任何值。Ticket 仅用于在 Flight 服务上下文中识别数据,当客户端直接从外部服务下载数据时不需要它。

客户端应该假定,除非另有说明,数据是使用序列化和进程间通信 (IPC) 返回的,就像通过 DoGet 调用一样。如果返回的 Content-Type 标头是通用媒体类型,例如 application/octet-stream,客户端仍应假定它是 Arrow IPC 流。对于其他媒体类型,例如 Apache Parquet,服务器应使用客户端能够识别的适当的 IANA 媒体类型。

最后,服务器还可以通过尊重请求中的 Accept 标头,允许客户端选择数据返回的格式。如果请求并支持多种格式,则使用哪种格式由服务器决定。如果请求的内容类型均不受支持,服务器可以响应 406(不可接受)、415(不支持的媒体类型),或者成功响应它支持的不同格式,并附带正确的 Content-Type 标头。

错误处理#

Arrow Flight 定义了自己的一套错误代码。实现因语言而异(例如,在 C++ 中,Unimplemented 是一个通用的 Arrow 错误状态,而在 Java 中它是一个 Flight 特定的异常),但公开了以下集合

错误码

描述

UNKNOWN

未知错误。如果没有其他错误适用,则为默认值。

INTERNAL

服务实现内部发生错误。

INVALID_ARGUMENT

客户端向 RPC 传递了无效参数。

TIMED_OUT

操作超出超时或截止时间。

NOT_FOUND

请求的资源(操作、数据流)未找到。

ALREADY_EXISTS

资源已存在。

CANCELLED

操作被取消(由客户端或服务器取消)。

UNAUTHENTICATED

客户端未通过身份验证。

UNAUTHORIZED

客户端已通过身份验证,但没有请求操作的权限。

UNIMPLEMENTED

RPC 未实现。

UNAVAILABLE

服务器不可用。可能由客户端因连接原因发出。

外部资源#

协议缓冲区定义#

  1/*
  2 * Licensed to the Apache Software Foundation (ASF) under one
  3 * or more contributor license agreements.  See the NOTICE file
  4 * distributed with this work for additional information
  5 * regarding copyright ownership.  The ASF licenses this file
  6 * to you under the Apache License, Version 2.0 (the
  7 * "License"); you may not use this file except in compliance
  8 * with the License.  You may obtain a copy of the License at
  9 * <p>
 10 * https://apache.org/licenses/LICENSE-2.0
 11 * <p>
 12 * Unless required by applicable law or agreed to in writing, software
 13 * distributed under the License is distributed on an "AS IS" BASIS,
 14 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 15 * See the License for the specific language governing permissions and
 16 * limitations under the License.
 17 */
 18
 19syntax = "proto3";
 20import "google/protobuf/timestamp.proto";
 21
 22option java_package = "org.apache.arrow.flight.impl";
 23option go_package = "github.com/apache/arrow-go/arrow/flight/gen/flight";
 24option csharp_namespace = "Apache.Arrow.Flight.Protocol";
 25
 26package arrow.flight.protocol;
 27
 28/*
 29 * A flight service is an endpoint for retrieving or storing Arrow data. A
 30 * flight service can expose one or more predefined endpoints that can be
 31 * accessed using the Arrow Flight Protocol. Additionally, a flight service
 32 * can expose a set of actions that are available.
 33 */
 34service FlightService {
 35
 36  /*
 37   * Handshake between client and server. Depending on the server, the
 38   * handshake may be required to determine the token that should be used for
 39   * future operations. Both request and response are streams to allow multiple
 40   * round-trips depending on auth mechanism.
 41   */
 42  rpc Handshake(stream HandshakeRequest) returns (stream HandshakeResponse) {}
 43
 44  /*
 45   * Get a list of available streams given a particular criteria. Most flight
 46   * services will expose one or more streams that are readily available for
 47   * retrieval. This api allows listing the streams available for
 48   * consumption. A user can also provide a criteria. The criteria can limit
 49   * the subset of streams that can be listed via this interface. Each flight
 50   * service allows its own definition of how to consume criteria.
 51   */
 52  rpc ListFlights(Criteria) returns (stream FlightInfo) {}
 53
 54  /*
 55   * For a given FlightDescriptor, get information about how the flight can be
 56   * consumed. This is a useful interface if the consumer of the interface
 57   * already can identify the specific flight to consume. This interface can
 58   * also allow a consumer to generate a flight stream through a specified
 59   * descriptor. For example, a flight descriptor might be something that
 60   * includes a SQL statement or a Pickled Python operation that will be
 61   * executed. In those cases, the descriptor will not be previously available
 62   * within the list of available streams provided by ListFlights but will be
 63   * available for consumption for the duration defined by the specific flight
 64   * service.
 65   */
 66  rpc GetFlightInfo(FlightDescriptor) returns (FlightInfo) {}
 67
 68  /*
 69   * For a given FlightDescriptor, start a query and get information
 70   * to poll its execution status. This is a useful interface if the
 71   * query may be a long-running query. The first PollFlightInfo call
 72   * should return as quickly as possible. (GetFlightInfo doesn't
 73   * return until the query is complete.)
 74   *
 75   * A client can consume any available results before
 76   * the query is completed. See PollInfo.info for details.
 77   *
 78   * A client can poll the updated query status by calling
 79   * PollFlightInfo() with PollInfo.flight_descriptor. A server
 80   * should not respond until the result would be different from last
 81   * time. That way, the client can "long poll" for updates
 82   * without constantly making requests. Clients can set a short timeout
 83   * to avoid blocking calls if desired.
 84   *
 85   * A client can't use PollInfo.flight_descriptor after
 86   * PollInfo.expiration_time passes. A server might not accept the
 87   * retry descriptor anymore and the query may be cancelled.
 88   *
 89   * A client may use the CancelFlightInfo action with
 90   * PollInfo.info to cancel the running query.
 91   */
 92  rpc PollFlightInfo(FlightDescriptor) returns (PollInfo) {}
 93
 94  /*
 95   * For a given FlightDescriptor, get the Schema as described in Schema.fbs::Schema
 96   * This is used when a consumer needs the Schema of flight stream. Similar to
 97   * GetFlightInfo this interface may generate a new flight that was not previously
 98   * available in ListFlights.
 99   */
100   rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}
101
102  /*
103   * Retrieve a single stream associated with a particular descriptor
104   * associated with the referenced ticket. A Flight can be composed of one or
105   * more streams where each stream can be retrieved using a separate opaque
106   * ticket that the flight service uses for managing a collection of streams.
107   */
108  rpc DoGet(Ticket) returns (stream FlightData) {}
109
110  /*
111   * Push a stream to the flight service associated with a particular
112   * flight stream. This allows a client of a flight service to upload a stream
113   * of data. Depending on the particular flight service, a client consumer
114   * could be allowed to upload a single stream per descriptor or an unlimited
115   * number. In the latter, the service might implement a 'seal' action that
116   * can be applied to a descriptor once all streams are uploaded.
117   */
118  rpc DoPut(stream FlightData) returns (stream PutResult) {}
119
120  /*
121   * Open a bidirectional data channel for a given descriptor. This
122   * allows clients to send and receive arbitrary Arrow data and
123   * application-specific metadata in a single logical stream. In
124   * contrast to DoGet/DoPut, this is more suited for clients
125   * offloading computation (rather than storage) to a Flight service.
126   */
127  rpc DoExchange(stream FlightData) returns (stream FlightData) {}
128
129  /*
130   * Flight services can support an arbitrary number of simple actions in
131   * addition to the possible ListFlights, GetFlightInfo, DoGet, DoPut
132   * operations that are potentially available. DoAction allows a flight client
133   * to do a specific action against a flight service. An action includes
134   * opaque request and response objects that are specific to the type action
135   * being undertaken.
136   */
137  rpc DoAction(Action) returns (stream Result) {}
138
139  /*
140   * A flight service exposes all of the available action types that it has
141   * along with descriptions. This allows different flight consumers to
142   * understand the capabilities of the flight service.
143   */
144  rpc ListActions(Empty) returns (stream ActionType) {}
145}
146
147/*
148 * The request that a client provides to a server on handshake.
149 */
150message HandshakeRequest {
151
152  /*
153   * A defined protocol version
154   */
155  uint64 protocol_version = 1;
156
157  /*
158   * Arbitrary auth/handshake info.
159   */
160  bytes payload = 2;
161}
162
163message HandshakeResponse {
164
165  /*
166   * A defined protocol version
167   */
168  uint64 protocol_version = 1;
169
170  /*
171   * Arbitrary auth/handshake info.
172   */
173  bytes payload = 2;
174}
175
176/*
177 * A message for doing simple auth.
178 */
179message BasicAuth {
180  string username = 2;
181  string password = 3;
182}
183
184message Empty {}
185
186/*
187 * Describes an available action, including both the name used for execution
188 * along with a short description of the purpose of the action.
189 */
190message ActionType {
191  string type = 1;
192  string description = 2;
193}
194
195/*
196 * A service specific expression that can be used to return a limited set
197 * of available Arrow Flight streams.
198 */
199message Criteria {
200  bytes expression = 1;
201}
202
203/*
204 * An opaque action specific for the service.
205 */
206message Action {
207  string type = 1;
208  bytes body = 2;
209}
210
211/*
212 * An opaque result returned after executing an action.
213 */
214message Result {
215  bytes body = 1;
216}
217
218/*
219 * Wrap the result of a getSchema call
220 */
221message SchemaResult {
222  // The schema of the dataset in its IPC form:
223  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
224  //   4 bytes - the byte length of the payload
225  //   a flatbuffer Message whose header is the Schema
226  bytes schema = 1;
227}
228
229/*
230 * The name or tag for a Flight. May be used as a way to retrieve or generate
231 * a flight or be used to expose a set of previously defined flights.
232 */
233message FlightDescriptor {
234
235  /*
236   * Describes what type of descriptor is defined.
237   */
238  enum DescriptorType {
239
240    // Protobuf pattern, not used.
241    UNKNOWN = 0;
242
243    /*
244     * A named path that identifies a dataset. A path is composed of a string
245     * or list of strings describing a particular dataset. This is conceptually
246     *  similar to a path inside a filesystem.
247     */
248    PATH = 1;
249
250    /*
251     * An opaque command to generate a dataset.
252     */
253    CMD = 2;
254  }
255
256  DescriptorType type = 1;
257
258  /*
259   * Opaque value used to express a command. Should only be defined when
260   * type = CMD.
261   */
262  bytes cmd = 2;
263
264  /*
265   * List of strings identifying a particular dataset. Should only be defined
266   * when type = PATH.
267   */
268  repeated string path = 3;
269}
270
271/*
272 * The access coordinates for retrieval of a dataset. With a FlightInfo, a
273 * consumer is able to determine how to retrieve a dataset.
274 */
275message FlightInfo {
276  // The schema of the dataset in its IPC form:
277  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
278  //   4 bytes - the byte length of the payload
279  //   a flatbuffer Message whose header is the Schema
280  bytes schema = 1;
281
282  /*
283   * The descriptor associated with this info.
284   */
285  FlightDescriptor flight_descriptor = 2;
286
287  /*
288   * A list of endpoints associated with the flight. To consume the
289   * whole flight, all endpoints (and hence all Tickets) must be
290   * consumed. Endpoints can be consumed in any order.
291   *
292   * In other words, an application can use multiple endpoints to
293   * represent partitioned data.
294   *
295   * If the returned data has an ordering, an application can use
296   * "FlightInfo.ordered = true" or should return the all data in a
297   * single endpoint. Otherwise, there is no ordering defined on
298   * endpoints or the data within.
299   *
300   * A client can read ordered data by reading data from returned
301   * endpoints, in order, from front to back.
302   *
303   * Note that a client may ignore "FlightInfo.ordered = true". If an
304   * ordering is important for an application, an application must
305   * choose one of them:
306   *
307   * * An application requires that all clients must read data in
308   *   returned endpoints order.
309   * * An application must return the all data in a single endpoint.
310   */
311  repeated FlightEndpoint endpoint = 3;
312
313  // Set these to -1 if unknown.
314  int64 total_records = 4;
315  int64 total_bytes = 5;
316
317  /*
318   * FlightEndpoints are in the same order as the data.
319   */
320  bool ordered = 6;
321
322  /*
323   * Application-defined metadata.
324   *
325   * There is no inherent or required relationship between this
326   * and the app_metadata fields in the FlightEndpoints or resulting
327   * FlightData messages. Since this metadata is application-defined,
328   * a given application could define there to be a relationship,
329   * but there is none required by the spec.
330   */
331  bytes app_metadata = 7;
332}
333
334/*
335 * The information to process a long-running query.
336 */
337message PollInfo {
338  /*
339   * The currently available results.
340   *
341   * If "flight_descriptor" is not specified, the query is complete
342   * and "info" specifies all results. Otherwise, "info" contains
343   * partial query results.
344   *
345   * Note that each PollInfo response contains a complete
346   * FlightInfo (not just the delta between the previous and current
347   * FlightInfo).
348   *
349   * Subsequent PollInfo responses may only append new endpoints to
350   * info.
351   *
352   * Clients can begin fetching results via DoGet(Ticket) with the
353   * ticket in the info before the query is
354   * completed. FlightInfo.ordered is also valid.
355   */
356  FlightInfo info = 1;
357
358  /*
359   * The descriptor the client should use on the next try.
360   * If unset, the query is complete.
361   */
362  FlightDescriptor flight_descriptor = 2;
363
364  /*
365   * Query progress. If known, must be in [0.0, 1.0] but need not be
366   * monotonic or nondecreasing. If unknown, do not set.
367   */
368  optional double progress = 3;
369
370  /*
371   * Expiration time for this request. After this passes, the server
372   * might not accept the retry descriptor anymore (and the query may
373   * be cancelled). This may be updated on a call to PollFlightInfo.
374   */
375  google.protobuf.Timestamp expiration_time = 4;
376}
377
378/*
379 * The request of the CancelFlightInfo action.
380 *
381 * The request should be stored in Action.body.
382 */
383message CancelFlightInfoRequest {
384  FlightInfo info = 1;
385}
386
387/*
388 * The result of a cancel operation.
389 *
390 * This is used by CancelFlightInfoResult.status.
391 */
392enum CancelStatus {
393  // The cancellation status is unknown. Servers should avoid using
394  // this value (send a NOT_FOUND error if the requested query is
395  // not known). Clients can retry the request.
396  CANCEL_STATUS_UNSPECIFIED = 0;
397  // The cancellation request is complete. Subsequent requests with
398  // the same payload may return CANCELLED or a NOT_FOUND error.
399  CANCEL_STATUS_CANCELLED = 1;
400  // The cancellation request is in progress. The client may retry
401  // the cancellation request.
402  CANCEL_STATUS_CANCELLING = 2;
403  // The query is not cancellable. The client should not retry the
404  // cancellation request.
405  CANCEL_STATUS_NOT_CANCELLABLE = 3;
406}
407
408/*
409 * The result of the CancelFlightInfo action.
410 *
411 * The result should be stored in Result.body.
412 */
413message CancelFlightInfoResult {
414  CancelStatus status = 1;
415}
416
417/*
418 * An opaque identifier that the service can use to retrieve a particular
419 * portion of a stream.
420 *
421 * Tickets are meant to be single use. It is an error/application-defined
422 * behavior to reuse a ticket.
423 */
424message Ticket {
425  bytes ticket = 1;
426}
427
428/*
429 * A location to retrieve a particular stream from. This URI should be one of
430 * the following:
431 *  - An empty string or the string 'arrow-flight-reuse-connection://?':
432 *    indicating that the ticket can be redeemed on the service where the
433 *    ticket was generated via a DoGet request.
434 *  - A valid grpc URI (grpc://, grpc+tls://, grpc+unix://, etc.):
435 *    indicating that the ticket can be redeemed on the service at the given
436 *    URI via a DoGet request.
437 *  - A valid HTTP URI (http://, https://, etc.):
438 *    indicating that the client should perform a GET request against the
439 *    given URI to retrieve the stream. The ticket should be empty
440 *    in this case and should be ignored by the client. Cloud object storage
441 *    can be utilized by presigned URLs or mediating the auth separately and
442 *    returning the full URL (e.g. https://amzn-s3-demo-bucket.s3.us-west-2.amazonaws.com/...).
443 *
444 * We allow non-Flight URIs for the purpose of allowing Flight services to indicate that
445 * results can be downloaded in formats other than Arrow (such as Parquet) or to allow
446 * direct fetching of results from a URI to reduce excess copying and data movement.
447 * In these cases, the following conventions should be followed by servers and clients:
448 *
449 *  - Unless otherwise specified by the 'Content-Type' header of the response,
450 *    a client should assume the response is using the Arrow IPC Streaming format.
451 *    Usage of an IANA media type like 'application/octet-stream' should be assumed to
452 *    be using the Arrow IPC Streaming format.
453 *  - The server may allow the client to choose a specific response format by
454 *    specifying an 'Accept' header in the request, such as 'application/vnd.apache.parquet'
455 *    or 'application/vnd.apache.arrow.stream'. If multiple types are requested and
456 *    supported by the server, the choice of which to use is server-specific. If
457 *    none of the requested content-types are supported, the server may respond with
458 *    either 406 (Not Acceptable) or 415 (Unsupported Media Type), or successfully
459 *    respond with a different format that it does support along with the correct
460 *    'Content-Type' header.
461 *
462 * Note: new schemes may be proposed in the future to allow for more flexibility based
463 * on community requests.
464 */
465message Location {
466  string uri = 1;
467}
468
469/*
470 * A particular stream or split associated with a flight.
471 */
472message FlightEndpoint {
473
474  /*
475   * Token used to retrieve this stream.
476   */
477  Ticket ticket = 1;
478
479  /*
480   * A list of URIs where this ticket can be redeemed via DoGet().
481   *
482   * If the list is empty, the expectation is that the ticket can only
483   * be redeemed on the current service where the ticket was
484   * generated.
485   *
486   * If the list is not empty, the expectation is that the ticket can be
487   * redeemed at any of the locations, and that the data returned will be
488   * equivalent. In this case, the ticket may only be redeemed at one of the
489   * given locations, and not (necessarily) on the current service. If one
490   * of the given locations is "arrow-flight-reuse-connection://?", the
491   * client may redeem the ticket on the service where the ticket was
492   * generated (i.e., the same as above), in addition to the other
493   * locations. (This URI was chosen to maximize compatibility, as 'scheme:'
494   * or 'scheme://' are not accepted by Java's java.net.URI.)
495   *
496   * In other words, an application can use multiple locations to
497   * represent redundant and/or load balanced services.
498   */
499  repeated Location location = 2;
500
501  /*
502   * Expiration time of this stream. If present, clients may assume
503   * they can retry DoGet requests. Otherwise, it is
504   * application-defined whether DoGet requests may be retried.
505   */
506  google.protobuf.Timestamp expiration_time = 3;
507
508  /*
509   * Application-defined metadata.
510   *
511   * There is no inherent or required relationship between this
512   * and the app_metadata fields in the FlightInfo or resulting
513   * FlightData messages. Since this metadata is application-defined,
514   * a given application could define there to be a relationship,
515   * but there is none required by the spec.
516   */
517  bytes app_metadata = 4;
518}
519
520/*
521 * The request of the RenewFlightEndpoint action.
522 *
523 * The request should be stored in Action.body.
524 */
525message RenewFlightEndpointRequest {
526  FlightEndpoint endpoint = 1;
527}
528
529/*
530 * A batch of Arrow data as part of a stream of batches.
531 */
532message FlightData {
533
534  /*
535   * The descriptor of the data. This is only relevant when a client is
536   * starting a new DoPut stream.
537   */
538  FlightDescriptor flight_descriptor = 1;
539
540  /*
541   * Header for message data as described in Message.fbs::Message.
542   */
543  bytes data_header = 2;
544
545  /*
546   * Application-defined metadata.
547   */
548  bytes app_metadata = 3;
549
550  /*
551   * The actual batch of Arrow data. Preferably handled with minimal-copies
552   * coming last in the definition to help with sidecar patterns (it is
553   * expected that some implementations will fetch this field off the wire
554   * with specialized code to avoid extra memory copies).
555   */
556  bytes data_body = 1000;
557}
558
559/**
560 * The response message associated with the submission of a DoPut.
561 */
562message PutResult {
563  bytes app_metadata = 1;
564}
565
566/*
567 * EXPERIMENTAL: Union of possible value types for a Session Option to be set to.
568 *
569 * By convention, an attempt to set a valueless SessionOptionValue should
570 * attempt to unset or clear the named option value on the server.
571 */
572message SessionOptionValue {
573  message StringListValue {
574    repeated string values = 1;
575  }
576
577  oneof option_value {
578    string string_value = 1;
579    bool bool_value = 2;
580    sfixed64 int64_value = 3;
581    double double_value = 4;
582    StringListValue string_list_value = 5;
583  }
584}
585
586/*
587 * EXPERIMENTAL: A request to set session options for an existing or new (implicit)
588 * server session.
589 *
590 * Sessions are persisted and referenced via a transport-level state management, typically
591 * RFC 6265 HTTP cookies when using an HTTP transport.  The suggested cookie name or state
592 * context key is 'arrow_flight_session_id', although implementations may freely choose their
593 * own name.
594 *
595 * Session creation (if one does not already exist) is implied by this RPC request, however
596 * server implementations may choose to initiate a session that also contains client-provided
597 * session options at any other time, e.g. on authentication, or when any other call is made
598 * and the server wishes to use a session to persist any state (or lack thereof).
599 */
600message SetSessionOptionsRequest {
601  map<string, SessionOptionValue> session_options = 1;
602}
603
604/*
605 * EXPERIMENTAL: The results (individually) of setting a set of session options.
606 *
607 * Option names should only be present in the response if they were not successfully
608 * set on the server; that is, a response without an Error for a name provided in the
609 * SetSessionOptionsRequest implies that the named option value was set successfully.
610 */
611message SetSessionOptionsResult {
612  enum ErrorValue {
613    // Protobuf deserialization fallback value: The status is unknown or unrecognized.
614    // Servers should avoid using this value. The request may be retried by the client.
615    UNSPECIFIED = 0;
616    // The given session option name is invalid.
617    INVALID_NAME = 1;
618    // The session option value or type is invalid.
619    INVALID_VALUE = 2;
620    // The session option cannot be set.
621    ERROR = 3;
622  }
623
624  message Error {
625    ErrorValue value = 1;
626  }
627
628  map<string, Error> errors = 1;
629}
630
631/*
632 * EXPERIMENTAL: A request to access the session options for the current server session.
633 *
634 * The existing session is referenced via a cookie header or similar (see
635 * SetSessionOptionsRequest above); it is an error to make this request with a missing,
636 * invalid, or expired session cookie header or other implementation-defined session
637 * reference token.
638 */
639message GetSessionOptionsRequest {
640}
641
642/*
643 * EXPERIMENTAL: The result containing the current server session options.
644 */
645message GetSessionOptionsResult {
646    map<string, SessionOptionValue> session_options = 1;
647}
648
649/*
650 * Request message for the "Close Session" action.
651 *
652 * The exiting session is referenced via a cookie header.
653 */
654message CloseSessionRequest {
655}
656
657/*
658 * The result of closing a session.
659 */
660message CloseSessionResult {
661  enum Status {
662    // Protobuf deserialization fallback value: The session close status is unknown or
663    // not recognized. Servers should avoid using this value (send a NOT_FOUND error if
664    // the requested session is not known or expired). Clients can retry the request.
665    UNSPECIFIED = 0;
666    // The session close request is complete. Subsequent requests with
667    // the same session produce a NOT_FOUND error.
668    CLOSED = 1;
669    // The session close request is in progress. The client may retry
670    // the close request.
671    CLOSING = 2;
672    // The session is not closeable. The client should not retry the
673    // close request.
674    NOT_CLOSEABLE = 3;
675  }
676
677  Status status = 1;
678}