Arrow Flight RPC#

Arrow Flight 是一个基于 Arrow 数据的高性能数据服务 RPC 框架,构建在 gRPCIPC 格式 之上。

Flight 以 Arrow 记录批次的流为组织核心,可以从另一个服务下载或上传到另一个服务。一组元数据方法提供了流的发现和自省,以及实现应用程序特定方法的能力。

方法和消息线格式由 Protobuf 定义,从而能够与可能分别支持 gRPC 和 Arrow 但不支持 Flight 的客户端实现互操作性。但是,Flight 实现包含进一步的优化以避免 Protobuf 使用中的开销(主要围绕避免过多的内存复制)。

RPC 方法和请求模式#

Flight 定义了一组 RPC 方法,用于上传/下载数据、检索有关数据流的元数据、列出可用的数据流以及实现应用程序特定的 RPC 方法。Flight 服务实现这些方法的某个子集,而 Flight 客户端可以调用这些方法中的任何一个。

数据流由描述符(FlightDescriptor 消息)标识,这些描述符要么是路径,要么是任意二进制命令。例如,描述符可以编码 SQL 查询、分布式文件系统上文件的路径,甚至可以是腌制的 Python 对象;应用程序可以根据需要使用此消息。

因此,一个 Flight 客户端可以连接到任何服务并执行基本操作。为了促进这一点,Flight 服务 *预期* 支持一些常见的请求模式,如下所述。当然,应用程序可以忽略兼容性,而只是将 Flight RPC 方法视为其自身目的的低级构建块。

有关所涉及方法和消息的完整详细信息,请参阅 协议缓冲区定义

下载数据#

希望下载数据的客户端将

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% http://www.apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Metadata Server participant Data Server Client->>Metadata Server: GetFlightInfo(FlightDescriptor) Metadata Server->>Client: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]} Note over Client, Data Server: This may be parallelized loop for each endpoint in FlightInfo.endpoints Client->>Data Server: DoGet(Ticket) Data Server->>Client: stream of FlightData end

通过 DoGet 检索数据。#

  1. 构建或获取他们感兴趣的数据集的 FlightDescriptor

    客户端可能已经知道他们想要的描述符,或者他们可以使用诸如 ListFlights 之类的方法来发现它们。

  2. 调用 GetFlightInfo(FlightDescriptor) 以获取 FlightInfo 消息。

    Flight 不要求数据与元数据位于同一服务器上。因此,FlightInfo 包含有关数据所在位置的详细信息,以便客户端可以从相应的服务器获取数据。这被编码为 FlightInfo 内的一系列 FlightEndpoint 消息。每个端点表示包含响应数据子集的某个位置。

    端点包含可以从中检索此数据的位置(服务器地址)列表和一个 Ticket,这是一个不透明的二进制令牌,服务器将使用它来识别正在请求的数据。

    如果 FlightInfo.ordered 为真,则表示不同端点的数据之间存在某种顺序。客户端应该产生与从每个端点返回的数据按顺序从前到后连接时相同的结果。

    如果 FlightInfo.ordered 为假,则客户端可以以任意顺序返回来自任何端点的数据。来自任何特定端点的数据必须按顺序返回,但来自不同端点的数据可以交错以允许并行获取。

    请注意,由于某些客户端可能会忽略 FlightInfo.ordered,如果顺序很重要且无法确保客户端支持,则服务器应返回单个端点。

    响应还包含其他元数据,例如模式,以及可选的数据集大小估计。

  3. 使用服务器返回的每个端点。

    要使用端点,客户端应连接到端点中的一个位置,然后使用端点中的票证调用 DoGet(Ticket)。这将为客户端提供 Arrow 记录批次的流。

    如果服务器希望指示数据位于本地服务器而不是其他位置,则它可以返回一个空的 location 列表。然后,客户端可以使用与原始服务器的现有连接来获取数据。否则,客户端必须连接到指示的位置之一。

    服务器可以在其他服务器位置旁边列出“自身”作为位置。通常,这需要服务器知道其公共地址,但它也可以使用特殊的 URI 字符串 arrow-flight-reuse-connection://? 告诉客户端他们可以使用与同一服务器的现有连接,而无需能够命名自身。请参阅下面 连接重用

    这样,端点内部的位置也可以被认为是执行旁路负载平衡或服务发现功能。并且端点可以表示已分区或以其他方式分布的数据。

    客户端必须使用所有端点才能检索完整的数据集。客户端可以以任何顺序或甚至并行使用端点,或者在多台机器之间分配端点以供使用;这由应用程序实现。客户端还可以使用 FlightInfo.ordered。有关 FlightInfo.ordered 的详细信息,请参阅上一项。

    每个端点可能都有过期时间(FlightEndpoint.expiration_time)。如果端点有过期时间,则客户端可以通过 DoGet 多次获取数据,直到达到过期时间为止。否则,DoGet 请求是否可以重试由应用程序定义。过期时间表示为 google.protobuf.Timestamp

    如果过期时间很短,则客户端可以通过 RenewFlightEndpoint 操作来延长过期时间。客户端需要使用 DoActionRenewFlightEndpoint 操作类型来延长过期时间。Action.body 必须是具有要更新的 FlightEndpointRenewFlightEndpointRequest

    客户端可能能够通过 CancelFlightInfo 操作取消返回的 FlightInfo。客户端需要使用 DoActionCancelFlightInfo 操作类型来取消 FlightInfo

通过运行繁重的查询下载数据#

客户端可能需要请求一个繁重的查询来下载数据。但是,GetFlightInfo 不会在查询完成之前返回,因此客户端会被阻塞。在这种情况下,客户端可以使用 PollFlightInfo 而不是 GetFlightInfo

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% http://www.apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Metadata Server participant Data Server Client->>Metadata Server: PollFlightInfo(FlightDescriptor) Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor', ...} Client->>Metadata Server: PollFlightInfo(FlightDescriptor') Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor'', ...} Client->>Metadata Server: PollFlightInfo(FlightDescriptor'') Metadata Server->>Client: PollInfo{descriptor: null, info: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]} Note over Client, Data Server: This may be parallelized Note over Client, Data Server: Some endpoints may be processed while polling loop for each endpoint in FlightInfo.endpoints Client->>Data Server: DoGet(Ticket) Data Server->>Client: stream of FlightData end

通过 PollFlightInfo 轮询长时间运行的查询。#

  1. 构建或获取 FlightDescriptor,如前所述。

  2. 调用 PollFlightInfo(FlightDescriptor) 以获取 PollInfo 消息。

    服务器应该在第一次调用时尽快响应。因此,客户端不应该等待第一个 PollInfo 响应。

    如果查询未完成,PollInfo.flight_descriptor 将具有 FlightDescriptor。客户端应该使用该描述符(而不是原始的 FlightDescriptor)来调用下一个 PollFlightInfo()。如果客户端在两者之间错过了更新,服务器应该识别不一定是最新的 PollInfo.flight_descriptor

    如果查询已完成,则 PollInfo.flight_descriptor 未设置。

    PollInfo.info 是迄今为止可用的当前结果。每次都是完整的 FlightInfo,而不仅仅是前一个和当前 FlightInfo 之间的增量。服务器每次都应该仅附加到 PollInfo.info 中的端点。因此,即使查询尚未完成,客户端也可以使用 PollInfo.info 中的 Ticket 运行 DoGet(Ticket)FlightInfo.ordered 也有效。

    服务器不应在结果与上次不同之前响应。这样,客户端就可以“长时间轮询”更新,而无需不断发出请求。如果需要,客户端可以设置短超时以避免阻塞调用。

    PollInfo.progress 可能已设置。它表示查询的进度。如果已设置,则该值必须在 [0.0, 1.0] 中。该值不一定单调或非递减。服务器可能只通过更新 PollInfo.progress 值来响应,尽管它不应该向客户端发送大量更新。

    PollInfo.timestamp 是此请求的过期时间。在此之后,服务器可能不再接受轮询描述符,并且查询可能会被取消。这可能会在调用 PollFlightInfo 时更新。过期时间表示为 google.protobuf.Timestamp

    客户端可能能够通过 CancelFlightInfo 操作取消查询。

    如果查询失败,服务器应该返回错误状态而不是响应。客户端不应轮询请求,除非 TIMED_OUTUNAVAILABLE,这可能不是来自服务器的。

  3. 使用服务器返回的每个端点,如前所述。

上传数据#

要上传数据,客户端将

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% http://www.apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Server Note right of Client: The first FlightData includes a FlightDescriptor Client->>Server: DoPut(FlightData) Client->>Server: stream of FlightData Server->>Client: PutResult{app_metadata}

通过 DoPut 上传数据。#

  1. 构建或获取 FlightDescriptor,如前所述。

  2. 调用 DoPut(FlightData) 并上传 Arrow 记录批次的流。

    第一个消息中包含 FlightDescriptor,以便服务器可以识别数据集。

DoPut 允许服务器向客户端发送带有自定义元数据的响应消息。这可用于实现诸如可恢复写入之类的事情(例如,服务器可以定期发送消息以指示迄今为止已提交多少行)。

交换数据#

某些用例可能需要在单个调用中上传和下载数据。虽然这可以使用多个调用来模拟,但如果应用程序是有状态的,这可能很困难。例如,应用程序可能希望实现一个客户端上传数据并且服务器响应该数据转换的调用;如果使用 DoGetDoPut 实现,这将需要是有状态的。相反,DoExchange 允许将其实现为单个调用。客户端将

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% http://www.apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Server Note right of Client: The first FlightData includes a FlightDescriptor Client->>Server: DoExchange(FlightData) par [Client sends data] Client->>Server: stream of FlightData and [Server sends data] Server->>Client: stream of FlightData end

使用 DoExchange 进行复杂的数据流。#

  1. 构建或获取 FlightDescriptor,如前所述。

  2. 调用 DoExchange(FlightData)

    DoPut 一样,第一个消息中包含 FlightDescriptor。此时,客户端和服务器都可以同时将数据流式传输到另一方。

身份验证#

Flight 支持各种身份验证方法,应用程序可以根据其需要对其进行自定义。

“握手”身份验证

这是分两部分实现的。在连接时,客户端调用Handshake RPC 方法,并且应用程序定义的身份验证处理程序可以与其在服务器上的对应方交换任意数量的消息。然后,处理程序提供一个二进制令牌。Flight 客户端随后将在所有未来调用的标头中包含此令牌,该令牌由服务器身份验证处理程序进行验证。

应用程序可以使用其中的任何部分;例如,它们可以忽略初始握手并在每次调用时发送外部获取的令牌(例如,承载令牌),或者它们可以在握手期间建立信任并且不验证每次调用的令牌,将连接视为有状态的(“登录”模式)。

警告

除非在每次调用时都验证令牌,否则此模式不安全,尤其是在存在第 7 层负载均衡器的情况下,这在使用 gRPC 时很常见,或者如果 gRPC 透明地重新连接客户端。

基于标头/中间件的身份验证

客户端可以在调用中包含自定义标头。然后可以实现自定义中间件以在服务器端验证和接受/拒绝调用。

双向 TLS (mTLS)

客户端在连接建立期间提供证书,该证书由服务器验证。应用程序不需要实现任何身份验证代码,但必须预配和分发证书。

这可能仅在某些实现中可用,并且仅在也启用 TLS 时可用。

一些 Flight 实现也可能公开底层的 gRPC API,在这种情况下,任何gRPC 支持的身份验证方法都可用。

位置 URI#

Flight 主要根据其 Protobuf 和 gRPC 规范(如下所示)进行定义,但 Arrow 实现也可能支持替代传输(请参阅Flight RPC)。客户端和服务器需要知道对于给定位置中的 URI 使用哪种传输,因此 Flight 实现应为给定传输使用以下 URI 方案

传输

URI 方案

gRPC(纯文本)

grpc: 或 grpc+tcp

gRPC(TLS)

grpc+tls

gRPC(Unix 域套接字)

grpc+unix

(重用连接)

arrow-flight-reuse-connection

UCX(纯文本)

ucx

连接重用#

上面的“重用连接”不是特定传输。相反,这意味着客户端可能会尝试对与最初从中获取 FlightInfo 的相同服务器(以及通过相同连接)执行 DoGet(即,它针对其调用了 GetFlightInfo)。这与未返回任何特定Location时的方式相同。

这允许服务器将“自身”作为获取数据的可能位置之一返回,而无需知道自己的公共地址,这在难以或不可能知道此地址的部署中非常有用。例如,开发人员可能会将云环境中的远程服务转发到其本地机器;在这种情况下,远程服务将无法知道它正在通过其访问的本地主机名和端口。

出于兼容性原因,URI 应始终为arrow-flight-reuse-connection://?,后跟空查询字符串。Java 的 URI 实现不接受scheme:scheme://,并且 C++ 的实现不接受空字符串,因此明显的候选者不兼容。所选表示形式可以由这两种实现以及 Go 的net/url和 Python 的urllib.parse进行解析。

错误处理#

Arrow Flight 定义了自己的错误代码集。实现因语言而异(例如,在 C++ 中,Unimplemented 是一个通用的 Arrow 错误状态,而在 Java 中它是 Flight 特定的异常),但会公开以下集合

错误代码

描述

UNKNOWN

未知错误。如果其他错误不适用,则为默认值。

INTERNAL

服务实现中发生内部错误。

INVALID_ARGUMENT

客户端向 RPC 传递了无效参数。

TIMED_OUT

操作超过了超时或截止时间。

NOT_FOUND

未找到请求的资源(操作、数据流)。

ALREADY_EXISTS

资源已存在。

CANCELLED

操作已取消(由客户端或服务器取消)。

UNAUTHENTICATED

客户端未经身份验证。

UNAUTHORIZED

客户端已通过身份验证,但没有请求操作的权限。

UNIMPLEMENTED

RPC 未实现。

UNAVAILABLE

服务器不可用。客户端可能出于连接原因发出此错误。

外部资源#

协议缓冲区定义#

  1/*
  2 * Licensed to the Apache Software Foundation (ASF) under one
  3 * or more contributor license agreements.  See the NOTICE file
  4 * distributed with this work for additional information
  5 * regarding copyright ownership.  The ASF licenses this file
  6 * to you under the Apache License, Version 2.0 (the
  7 * "License"); you may not use this file except in compliance
  8 * with the License.  You may obtain a copy of the License at
  9 * <p>
 10 * http://www.apache.org/licenses/LICENSE-2.0
 11 * <p>
 12 * Unless required by applicable law or agreed to in writing, software
 13 * distributed under the License is distributed on an "AS IS" BASIS,
 14 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 15 * See the License for the specific language governing permissions and
 16 * limitations under the License.
 17 */
 18
 19syntax = "proto3";
 20import "google/protobuf/timestamp.proto";
 21
 22option java_package = "org.apache.arrow.flight.impl";
 23option go_package = "github.com/apache/arrow-go/arrow/flight/gen/flight";
 24option csharp_namespace = "Apache.Arrow.Flight.Protocol";
 25
 26package arrow.flight.protocol;
 27
 28/*
 29 * A flight service is an endpoint for retrieving or storing Arrow data. A
 30 * flight service can expose one or more predefined endpoints that can be
 31 * accessed using the Arrow Flight Protocol. Additionally, a flight service
 32 * can expose a set of actions that are available.
 33 */
 34service FlightService {
 35
 36  /*
 37   * Handshake between client and server. Depending on the server, the
 38   * handshake may be required to determine the token that should be used for
 39   * future operations. Both request and response are streams to allow multiple
 40   * round-trips depending on auth mechanism.
 41   */
 42  rpc Handshake(stream HandshakeRequest) returns (stream HandshakeResponse) {}
 43
 44  /*
 45   * Get a list of available streams given a particular criteria. Most flight
 46   * services will expose one or more streams that are readily available for
 47   * retrieval. This api allows listing the streams available for
 48   * consumption. A user can also provide a criteria. The criteria can limit
 49   * the subset of streams that can be listed via this interface. Each flight
 50   * service allows its own definition of how to consume criteria.
 51   */
 52  rpc ListFlights(Criteria) returns (stream FlightInfo) {}
 53
 54  /*
 55   * For a given FlightDescriptor, get information about how the flight can be
 56   * consumed. This is a useful interface if the consumer of the interface
 57   * already can identify the specific flight to consume. This interface can
 58   * also allow a consumer to generate a flight stream through a specified
 59   * descriptor. For example, a flight descriptor might be something that
 60   * includes a SQL statement or a Pickled Python operation that will be
 61   * executed. In those cases, the descriptor will not be previously available
 62   * within the list of available streams provided by ListFlights but will be
 63   * available for consumption for the duration defined by the specific flight
 64   * service.
 65   */
 66  rpc GetFlightInfo(FlightDescriptor) returns (FlightInfo) {}
 67
 68  /*
 69   * For a given FlightDescriptor, start a query and get information
 70   * to poll its execution status. This is a useful interface if the
 71   * query may be a long-running query. The first PollFlightInfo call
 72   * should return as quickly as possible. (GetFlightInfo doesn't
 73   * return until the query is complete.)
 74   *
 75   * A client can consume any available results before
 76   * the query is completed. See PollInfo.info for details.
 77   *
 78   * A client can poll the updated query status by calling
 79   * PollFlightInfo() with PollInfo.flight_descriptor. A server
 80   * should not respond until the result would be different from last
 81   * time. That way, the client can "long poll" for updates
 82   * without constantly making requests. Clients can set a short timeout
 83   * to avoid blocking calls if desired.
 84   *
 85   * A client can't use PollInfo.flight_descriptor after
 86   * PollInfo.expiration_time passes. A server might not accept the
 87   * retry descriptor anymore and the query may be cancelled.
 88   *
 89   * A client may use the CancelFlightInfo action with
 90   * PollInfo.info to cancel the running query.
 91   */
 92  rpc PollFlightInfo(FlightDescriptor) returns (PollInfo) {}
 93
 94  /*
 95   * For a given FlightDescriptor, get the Schema as described in Schema.fbs::Schema
 96   * This is used when a consumer needs the Schema of flight stream. Similar to
 97   * GetFlightInfo this interface may generate a new flight that was not previously
 98   * available in ListFlights.
 99   */
100   rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}
101
102  /*
103   * Retrieve a single stream associated with a particular descriptor
104   * associated with the referenced ticket. A Flight can be composed of one or
105   * more streams where each stream can be retrieved using a separate opaque
106   * ticket that the flight service uses for managing a collection of streams.
107   */
108  rpc DoGet(Ticket) returns (stream FlightData) {}
109
110  /*
111   * Push a stream to the flight service associated with a particular
112   * flight stream. This allows a client of a flight service to upload a stream
113   * of data. Depending on the particular flight service, a client consumer
114   * could be allowed to upload a single stream per descriptor or an unlimited
115   * number. In the latter, the service might implement a 'seal' action that
116   * can be applied to a descriptor once all streams are uploaded.
117   */
118  rpc DoPut(stream FlightData) returns (stream PutResult) {}
119
120  /*
121   * Open a bidirectional data channel for a given descriptor. This
122   * allows clients to send and receive arbitrary Arrow data and
123   * application-specific metadata in a single logical stream. In
124   * contrast to DoGet/DoPut, this is more suited for clients
125   * offloading computation (rather than storage) to a Flight service.
126   */
127  rpc DoExchange(stream FlightData) returns (stream FlightData) {}
128
129  /*
130   * Flight services can support an arbitrary number of simple actions in
131   * addition to the possible ListFlights, GetFlightInfo, DoGet, DoPut
132   * operations that are potentially available. DoAction allows a flight client
133   * to do a specific action against a flight service. An action includes
134   * opaque request and response objects that are specific to the type action
135   * being undertaken.
136   */
137  rpc DoAction(Action) returns (stream Result) {}
138
139  /*
140   * A flight service exposes all of the available action types that it has
141   * along with descriptions. This allows different flight consumers to
142   * understand the capabilities of the flight service.
143   */
144  rpc ListActions(Empty) returns (stream ActionType) {}
145}
146
147/*
148 * The request that a client provides to a server on handshake.
149 */
150message HandshakeRequest {
151
152  /*
153   * A defined protocol version
154   */
155  uint64 protocol_version = 1;
156
157  /*
158   * Arbitrary auth/handshake info.
159   */
160  bytes payload = 2;
161}
162
163message HandshakeResponse {
164
165  /*
166   * A defined protocol version
167   */
168  uint64 protocol_version = 1;
169
170  /*
171   * Arbitrary auth/handshake info.
172   */
173  bytes payload = 2;
174}
175
176/*
177 * A message for doing simple auth.
178 */
179message BasicAuth {
180  string username = 2;
181  string password = 3;
182}
183
184message Empty {}
185
186/*
187 * Describes an available action, including both the name used for execution
188 * along with a short description of the purpose of the action.
189 */
190message ActionType {
191  string type = 1;
192  string description = 2;
193}
194
195/*
196 * A service specific expression that can be used to return a limited set
197 * of available Arrow Flight streams.
198 */
199message Criteria {
200  bytes expression = 1;
201}
202
203/*
204 * An opaque action specific for the service.
205 */
206message Action {
207  string type = 1;
208  bytes body = 2;
209}
210
211/*
212 * An opaque result returned after executing an action.
213 */
214message Result {
215  bytes body = 1;
216}
217
218/*
219 * Wrap the result of a getSchema call
220 */
221message SchemaResult {
222  // The schema of the dataset in its IPC form:
223  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
224  //   4 bytes - the byte length of the payload
225  //   a flatbuffer Message whose header is the Schema
226  bytes schema = 1;
227}
228
229/*
230 * The name or tag for a Flight. May be used as a way to retrieve or generate
231 * a flight or be used to expose a set of previously defined flights.
232 */
233message FlightDescriptor {
234
235  /*
236   * Describes what type of descriptor is defined.
237   */
238  enum DescriptorType {
239
240    // Protobuf pattern, not used.
241    UNKNOWN = 0;
242
243    /*
244     * A named path that identifies a dataset. A path is composed of a string
245     * or list of strings describing a particular dataset. This is conceptually
246     *  similar to a path inside a filesystem.
247     */
248    PATH = 1;
249
250    /*
251     * An opaque command to generate a dataset.
252     */
253    CMD = 2;
254  }
255
256  DescriptorType type = 1;
257
258  /*
259   * Opaque value used to express a command. Should only be defined when
260   * type = CMD.
261   */
262  bytes cmd = 2;
263
264  /*
265   * List of strings identifying a particular dataset. Should only be defined
266   * when type = PATH.
267   */
268  repeated string path = 3;
269}
270
271/*
272 * The access coordinates for retrieval of a dataset. With a FlightInfo, a
273 * consumer is able to determine how to retrieve a dataset.
274 */
275message FlightInfo {
276  // The schema of the dataset in its IPC form:
277  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
278  //   4 bytes - the byte length of the payload
279  //   a flatbuffer Message whose header is the Schema
280  bytes schema = 1;
281
282  /*
283   * The descriptor associated with this info.
284   */
285  FlightDescriptor flight_descriptor = 2;
286
287  /*
288   * A list of endpoints associated with the flight. To consume the
289   * whole flight, all endpoints (and hence all Tickets) must be
290   * consumed. Endpoints can be consumed in any order.
291   *
292   * In other words, an application can use multiple endpoints to
293   * represent partitioned data.
294   *
295   * If the returned data has an ordering, an application can use
296   * "FlightInfo.ordered = true" or should return the all data in a
297   * single endpoint. Otherwise, there is no ordering defined on
298   * endpoints or the data within.
299   *
300   * A client can read ordered data by reading data from returned
301   * endpoints, in order, from front to back.
302   *
303   * Note that a client may ignore "FlightInfo.ordered = true". If an
304   * ordering is important for an application, an application must
305   * choose one of them:
306   *
307   * * An application requires that all clients must read data in
308   *   returned endpoints order.
309   * * An application must return the all data in a single endpoint.
310   */
311  repeated FlightEndpoint endpoint = 3;
312
313  // Set these to -1 if unknown.
314  int64 total_records = 4;
315  int64 total_bytes = 5;
316
317  /*
318   * FlightEndpoints are in the same order as the data.
319   */
320  bool ordered = 6;
321
322  /*
323   * Application-defined metadata.
324   *
325   * There is no inherent or required relationship between this
326   * and the app_metadata fields in the FlightEndpoints or resulting
327   * FlightData messages. Since this metadata is application-defined,
328   * a given application could define there to be a relationship,
329   * but there is none required by the spec.
330   */
331  bytes app_metadata = 7;
332}
333
334/*
335 * The information to process a long-running query.
336 */
337message PollInfo {
338  /*
339   * The currently available results.
340   *
341   * If "flight_descriptor" is not specified, the query is complete
342   * and "info" specifies all results. Otherwise, "info" contains
343   * partial query results.
344   *
345   * Note that each PollInfo response contains a complete
346   * FlightInfo (not just the delta between the previous and current
347   * FlightInfo).
348   *
349   * Subsequent PollInfo responses may only append new endpoints to
350   * info.
351   *
352   * Clients can begin fetching results via DoGet(Ticket) with the
353   * ticket in the info before the query is
354   * completed. FlightInfo.ordered is also valid.
355   */
356  FlightInfo info = 1;
357
358  /*
359   * The descriptor the client should use on the next try.
360   * If unset, the query is complete.
361   */
362  FlightDescriptor flight_descriptor = 2;
363
364  /*
365   * Query progress. If known, must be in [0.0, 1.0] but need not be
366   * monotonic or nondecreasing. If unknown, do not set.
367   */
368  optional double progress = 3;
369
370  /*
371   * Expiration time for this request. After this passes, the server
372   * might not accept the retry descriptor anymore (and the query may
373   * be cancelled). This may be updated on a call to PollFlightInfo.
374   */
375  google.protobuf.Timestamp expiration_time = 4;
376}
377
378/*
379 * The request of the CancelFlightInfo action.
380 *
381 * The request should be stored in Action.body.
382 */
383message CancelFlightInfoRequest {
384  FlightInfo info = 1;
385}
386
387/*
388 * The result of a cancel operation.
389 *
390 * This is used by CancelFlightInfoResult.status.
391 */
392enum CancelStatus {
393  // The cancellation status is unknown. Servers should avoid using
394  // this value (send a NOT_FOUND error if the requested query is
395  // not known). Clients can retry the request.
396  CANCEL_STATUS_UNSPECIFIED = 0;
397  // The cancellation request is complete. Subsequent requests with
398  // the same payload may return CANCELLED or a NOT_FOUND error.
399  CANCEL_STATUS_CANCELLED = 1;
400  // The cancellation request is in progress. The client may retry
401  // the cancellation request.
402  CANCEL_STATUS_CANCELLING = 2;
403  // The query is not cancellable. The client should not retry the
404  // cancellation request.
405  CANCEL_STATUS_NOT_CANCELLABLE = 3;
406}
407
408/*
409 * The result of the CancelFlightInfo action.
410 *
411 * The result should be stored in Result.body.
412 */
413message CancelFlightInfoResult {
414  CancelStatus status = 1;
415}
416
417/*
418 * An opaque identifier that the service can use to retrieve a particular
419 * portion of a stream.
420 *
421 * Tickets are meant to be single use. It is an error/application-defined
422 * behavior to reuse a ticket.
423 */
424message Ticket {
425  bytes ticket = 1;
426}
427
428/*
429 * A location where a Flight service will accept retrieval of a particular
430 * stream given a ticket.
431 */
432message Location {
433  string uri = 1;
434}
435
436/*
437 * A particular stream or split associated with a flight.
438 */
439message FlightEndpoint {
440
441  /*
442   * Token used to retrieve this stream.
443   */
444  Ticket ticket = 1;
445
446  /*
447   * A list of URIs where this ticket can be redeemed via DoGet().
448   *
449   * If the list is empty, the expectation is that the ticket can only
450   * be redeemed on the current service where the ticket was
451   * generated.
452   *
453   * If the list is not empty, the expectation is that the ticket can be
454   * redeemed at any of the locations, and that the data returned will be
455   * equivalent. In this case, the ticket may only be redeemed at one of the
456   * given locations, and not (necessarily) on the current service. If one
457   * of the given locations is "arrow-flight-reuse-connection://?", the
458   * client may redeem the ticket on the service where the ticket was
459   * generated (i.e., the same as above), in addition to the other
460   * locations. (This URI was chosen to maximize compatibility, as 'scheme:'
461   * or 'scheme://' are not accepted by Java's java.net.URI.)
462   *
463   * In other words, an application can use multiple locations to
464   * represent redundant and/or load balanced services.
465   */
466  repeated Location location = 2;
467
468  /*
469   * Expiration time of this stream. If present, clients may assume
470   * they can retry DoGet requests. Otherwise, it is
471   * application-defined whether DoGet requests may be retried.
472   */
473  google.protobuf.Timestamp expiration_time = 3;
474
475  /*
476   * Application-defined metadata.
477   *
478   * There is no inherent or required relationship between this
479   * and the app_metadata fields in the FlightInfo or resulting
480   * FlightData messages. Since this metadata is application-defined,
481   * a given application could define there to be a relationship,
482   * but there is none required by the spec.
483   */
484  bytes app_metadata = 4;
485}
486
487/*
488 * The request of the RenewFlightEndpoint action.
489 *
490 * The request should be stored in Action.body.
491 */
492message RenewFlightEndpointRequest {
493  FlightEndpoint endpoint = 1;
494}
495
496/*
497 * A batch of Arrow data as part of a stream of batches.
498 */
499message FlightData {
500
501  /*
502   * The descriptor of the data. This is only relevant when a client is
503   * starting a new DoPut stream.
504   */
505  FlightDescriptor flight_descriptor = 1;
506
507  /*
508   * Header for message data as described in Message.fbs::Message.
509   */
510  bytes data_header = 2;
511
512  /*
513   * Application-defined metadata.
514   */
515  bytes app_metadata = 3;
516
517  /*
518   * The actual batch of Arrow data. Preferably handled with minimal-copies
519   * coming last in the definition to help with sidecar patterns (it is
520   * expected that some implementations will fetch this field off the wire
521   * with specialized code to avoid extra memory copies).
522   */
523  bytes data_body = 1000;
524}
525
526/**
527 * The response message associated with the submission of a DoPut.
528 */
529message PutResult {
530  bytes app_metadata = 1;
531}
532
533/*
534 * EXPERIMENTAL: Union of possible value types for a Session Option to be set to.
535 *
536 * By convention, an attempt to set a valueless SessionOptionValue should
537 * attempt to unset or clear the named option value on the server.
538 */
539message SessionOptionValue {
540  message StringListValue {
541    repeated string values = 1;
542  }
543
544  oneof option_value {
545    string string_value = 1;
546    bool bool_value = 2;
547    sfixed64 int64_value = 3;
548    double double_value = 4;
549    StringListValue string_list_value = 5;
550  }
551}
552
553/*
554 * EXPERIMENTAL: A request to set session options for an existing or new (implicit)
555 * server session.
556 *
557 * Sessions are persisted and referenced via a transport-level state management, typically
558 * RFC 6265 HTTP cookies when using an HTTP transport.  The suggested cookie name or state
559 * context key is 'arrow_flight_session_id', although implementations may freely choose their
560 * own name.
561 *
562 * Session creation (if one does not already exist) is implied by this RPC request, however
563 * server implementations may choose to initiate a session that also contains client-provided
564 * session options at any other time, e.g. on authentication, or when any other call is made
565 * and the server wishes to use a session to persist any state (or lack thereof).
566 */
567message SetSessionOptionsRequest {
568  map<string, SessionOptionValue> session_options = 1;
569}
570
571/*
572 * EXPERIMENTAL: The results (individually) of setting a set of session options.
573 *
574 * Option names should only be present in the response if they were not successfully
575 * set on the server; that is, a response without an Error for a name provided in the
576 * SetSessionOptionsRequest implies that the named option value was set successfully.
577 */
578message SetSessionOptionsResult {
579  enum ErrorValue {
580    // Protobuf deserialization fallback value: The status is unknown or unrecognized.
581    // Servers should avoid using this value. The request may be retried by the client.
582    UNSPECIFIED = 0;
583    // The given session option name is invalid.
584    INVALID_NAME = 1;
585    // The session option value or type is invalid.
586    INVALID_VALUE = 2;
587    // The session option cannot be set.
588    ERROR = 3;
589  }
590
591  message Error {
592    ErrorValue value = 1;
593  }
594
595  map<string, Error> errors = 1;
596}
597
598/*
599 * EXPERIMENTAL: A request to access the session options for the current server session.
600 *
601 * The existing session is referenced via a cookie header or similar (see
602 * SetSessionOptionsRequest above); it is an error to make this request with a missing,
603 * invalid, or expired session cookie header or other implementation-defined session
604 * reference token.
605 */
606message GetSessionOptionsRequest {
607}
608
609/*
610 * EXPERIMENTAL: The result containing the current server session options.
611 */
612message GetSessionOptionsResult {
613    map<string, SessionOptionValue> session_options = 1;
614}
615
616/*
617 * Request message for the "Close Session" action.
618 *
619 * The exiting session is referenced via a cookie header.
620 */
621message CloseSessionRequest {
622}
623
624/*
625 * The result of closing a session.
626 */
627message CloseSessionResult {
628  enum Status {
629    // Protobuf deserialization fallback value: The session close status is unknown or
630    // not recognized. Servers should avoid using this value (send a NOT_FOUND error if
631    // the requested session is not known or expired). Clients can retry the request.
632    UNSPECIFIED = 0;
633    // The session close request is complete. Subsequent requests with
634    // the same session produce a NOT_FOUND error.
635    CLOSED = 1;
636    // The session close request is in progress. The client may retry
637    // the close request.
638    CLOSING = 2;
639    // The session is not closeable. The client should not retry the
640    // close request.
641    NOT_CLOSEABLE = 3;
642  }
643
644  Status status = 1;
645}