Arrow Flight RPC#

Arrow Flight 是一个基于 Arrow 数据的高性能数据服务 RPC 框架,它构建在 gRPCIPC 格式 之上。

Flight 围绕着 Arrow 记录批次的流进行组织,这些流可以从另一个服务下载或上传到另一个服务。一组元数据方法提供流的发现和内省,以及实现特定于应用程序的方法的能力。

方法和消息线格式由 Protobuf 定义,这使得与可能分别支持 gRPC 和 Arrow 但不支持 Flight 的客户端实现互操作性。但是,Flight 实现包括进一步的优化,以避免 Protobuf 使用中的开销(主要围绕避免过多的内存复制)。

RPC 方法和请求模式#

Flight 定义了一组 RPC 方法,用于上传/下载数据、检索有关数据流的元数据、列出可用的数据流以及实现特定于应用程序的 RPC 方法。Flight 服务实现这些方法的某个子集,而 Flight 客户端可以调用这些方法中的任何一个。

数据流由描述符(FlightDescriptor 消息)标识,这些描述符要么是路径,要么是任意的二进制命令。例如,描述符可以编码 SQL 查询、分布式文件系统上的文件路径,甚至可以编码 Python 对象的 pickle;应用程序可以根据需要使用此消息。

因此,一个 Flight 客户端可以连接到任何服务并执行基本操作。为了促进这一点,Flight 服务应该支持一些常见的请求模式,这些模式将在下面描述。当然,应用程序可以忽略兼容性,并将 Flight RPC 方法简单地视为其自身目的的低级构建块。

有关方法和消息的完整详细信息,请参阅 协议缓冲区定义

下载数据#

想要下载数据的客户端会

../_images/DoGet.mmd.svg

通过 DoGet 检索数据。#

  1. 构建或获取他们感兴趣的数据集的 FlightDescriptor

    客户端可能已经知道他们想要的描述符,或者他们可以使用 ListFlights 之类的方法来发现它们。

  2. 调用 GetFlightInfo(FlightDescriptor) 以获取 FlightInfo 消息。

    Flight 不要求数据与元数据位于同一服务器上。因此,FlightInfo 包含有关数据位置的详细信息,以便客户端可以从适当的服务器获取数据。这被编码为 FlightInfo 中的一系列 FlightEndpoint 消息。每个端点代表包含响应数据子集的某个位置。

    端点包含一个位置列表(服务器地址),这些位置可以从中检索数据,以及一个 Ticket,这是一个不透明的二进制令牌,服务器将使用它来识别正在请求的数据。

    如果 FlightInfo.ordered 为真,则表示不同端点的数据之间存在某种顺序。客户端应该产生与从每个端点返回的数据按顺序从前到后连接时相同的结果。

    如果 FlightInfo.ordered 为假,则客户端可以以任意顺序返回来自任何端点的数据。来自任何特定端点的数据必须按顺序返回,但来自不同端点的数据可以交织在一起以允许并行获取。

    请注意,由于某些客户端可能会忽略 FlightInfo.ordered,如果顺序很重要并且无法确保客户端支持,则服务器应该返回单个端点。

    响应还包含其他元数据,例如模式,以及可选的数据集大小估计。

  3. 使用服务器返回的每个端点。

    要使用端点,客户端应该连接到端点中的一个位置,然后使用端点中的票证调用 DoGet(Ticket)。这将为客户端提供 Arrow 记录批次的流。

    如果服务器希望指示数据位于本地服务器而不是其他位置,则它可以返回一个空的位置列表。然后,客户端可以重用与原始服务器的现有连接来获取数据。否则,客户端必须连接到指示的位置之一。

    服务器可能会将“自身”作为位置列在其他服务器位置旁边。通常,这需要服务器知道其公共地址,但它也可以使用特殊的 URI 字符串 arrow-flight-reuse-connection://? 来告诉客户端他们可以重用与同一服务器的现有连接,而无需能够命名自身。请参阅下面的 连接重用

    这样,端点中的位置也可以被认为是执行旁路负载均衡或服务发现功能。端点可以表示已分区或以其他方式分布的数据。

    客户端必须使用所有端点来检索完整的数据集。客户端可以以任何顺序甚至并行使用端点,或者将端点分布在多台机器上以供使用;这由应用程序来实现。客户端还可以使用 FlightInfo.ordered。有关 FlightInfo.ordered 的详细信息,请参阅上一项。

    每个端点可能都有过期时间 (FlightEndpoint.expiration_time)。如果端点有过期时间,则客户端可以通过 DoGet 多次获取数据,直到达到过期时间。否则,是否可以重试 DoGet 请求是应用程序定义的。过期时间表示为 google.protobuf.Timestamp

    如果过期时间很短,则客户端可以通过 RenewFlightEndpoint 操作来延长过期时间。客户端需要使用 DoActionRenewFlightEndpoint 操作类型来延长过期时间。 Action.body 必须是具有要续期的 FlightEndpointRenewFlightEndpointRequest

    客户端可以通过 CancelFlightInfo 操作来取消返回的 FlightInfo。客户端需要使用 DoActionCancelFlightInfo 操作类型来取消 FlightInfo

通过运行繁重的查询来下载数据#

客户端可能需要请求一个繁重的查询来下载数据。但是,GetFlightInfo 不会在查询完成之前返回,因此客户端会被阻塞。在这种情况下,客户端可以使用 PollFlightInfo 而不是 GetFlightInfo

../_images/PollFlightInfo.mmd.svg

通过 PollFlightInfo 轮询长时间运行的查询。#

  1. 构建或获取 FlightDescriptor,如前所述。

  2. 调用 PollFlightInfo(FlightDescriptor) 以获取 PollInfo 消息。

    服务器应该在第一次调用时尽快响应。因此,客户端不应该等待第一个 PollInfo 响应。

    如果查询尚未完成,则 PollInfo.flight_descriptor 具有 FlightDescriptor。客户端应该使用描述符(而不是原始的 FlightDescriptor)来调用下一个 PollFlightInfo()。服务器应该识别不一定是最新的 PollInfo.flight_descriptor,以防客户端在两者之间错过更新。

    如果查询已完成,则 PollInfo.flight_descriptor 未设置。

    PollInfo.info 是目前为止可用的结果。它每次都是一个完整的 FlightInfo,而不仅仅是先前和当前 FlightInfo 之间的增量。服务器应该每次只追加到 PollInfo.info 中的端点。因此,即使查询尚未完成,客户端也可以使用 PollInfo.info 中的 Ticket 运行 DoGet(Ticket)FlightInfo.ordered 也有效。

    服务器不应该在结果与上次不同之前响应。这样,客户端就可以“长时间轮询”更新,而无需不断地发出请求。如果需要,客户端可以设置一个短超时以避免阻塞调用。

    PollInfo.progress 可能被设置。它表示查询的进度。如果设置了,该值必须在 [0.0, 1.0] 范围内。该值不一定是单调的或非递减的。服务器可以通过仅更新 PollInfo.progress 值来响应,尽管它不应该向客户端发送大量更新。

    PollInfo.timestamp 是此请求的过期时间。超过此时间后,服务器可能不再接受轮询描述符,并且查询可能会被取消。这可能会在调用 PollFlightInfo 时更新。过期时间表示为 google.protobuf.Timestamp

    客户端可以通过 CancelFlightInfo 操作来取消查询。

    如果查询失败,服务器应该返回错误状态而不是响应。客户端不应该轮询请求,除非是 TIMED_OUTUNAVAILABLE,它们可能不是来自服务器的。

  3. 像以前一样消费服务器返回的每个端点。

上传数据#

要上传数据,客户端需要

../_images/DoPut.mmd.svg

通过 DoPut 上传数据。#

  1. 构建或获取 FlightDescriptor,如前所述。

  2. 调用 DoPut(FlightData) 并上传一系列 Arrow 记录批次。

    第一个消息中包含 FlightDescriptor,以便服务器可以识别数据集。

DoPut 允许服务器向客户端发送带有自定义元数据的响应消息。这可以用来实现诸如可恢复写入之类的事情(例如,服务器可以定期发送消息来指示到目前为止已提交了多少行)。

交换数据#

某些用例可能需要在单个调用中上传和下载数据。虽然这可以通过多个调用来模拟,但如果应用程序是有状态的,这可能很困难。例如,应用程序可能希望实现一个调用,其中客户端上传数据,服务器响应该数据的转换;如果使用 DoGetDoPut 实现,这将需要是有状态的。相反,DoExchange 允许将其作为单个调用实现。客户端需要

../_images/DoExchange.mmd.svg

使用 DoExchange 的复杂数据流。#

  1. 构建或获取 FlightDescriptor,如前所述。

  2. 调用 DoExchange(FlightData)

    第一个消息中包含 FlightDescriptor,与 DoPut 一样。此时,客户端和服务器都可以同时将数据流式传输到对方。

身份验证#

Flight 支持各种身份验证方法,应用程序可以根据自己的需要进行自定义。

“握手”身份验证

这在两个部分中实现。在连接时,客户端调用 Handshake RPC 方法,应用程序定义的身份验证处理程序可以与服务器上的对应处理程序交换任意数量的消息。然后,处理程序提供一个二进制令牌。Flight 客户端随后将在所有未来调用的标头中包含此令牌,该令牌将由服务器身份验证处理程序验证。

应用程序可以使用此部分的任何部分;例如,它们可以忽略初始握手并在每次调用时发送外部获取的令牌(例如,承载令牌),或者它们可以在握手期间建立信任,并且不验证每次调用的令牌,将连接视为有状态的(“登录”模式)。

警告

除非在每次调用时验证令牌,否则这种模式并不安全,尤其是在存在第 7 层负载均衡器的情况下,这在 gRPC 中很常见,或者如果 gRPC 透明地重新连接客户端。

基于标头/基于中间件的身份验证

客户端可以在调用中包含自定义标头。然后可以实现自定义中间件来验证和接受/拒绝服务器端的调用。

双向 TLS (mTLS)

客户端在连接建立期间提供证书,该证书由服务器验证。应用程序不需要实现任何身份验证代码,但必须配置和分发证书。

这可能仅在某些实现中可用,并且仅在启用 TLS 时可用。

某些 Flight 实现也可能公开底层的 gRPC API,在这种情况下,任何 gRPC 支持的身份验证方法 都可用。

位置 URI#

Flight 主要是在其 Protobuf 和 gRPC 规范(如下)中定义的,但 Arrow 实现也可能支持替代传输(参见 Flight RPC)。客户端和服务器需要知道对给定 URI 在位置中使用哪种传输,因此 Flight 实现应该对给定传输使用以下 URI 方案

传输

URI 方案

gRPC(明文)

grpc: 或 grpc+tcp

gRPC(TLS)

grpc+tls

gRPC(Unix 域套接字)

grpc+unix

(重用连接)

arrow-flight-reuse-connection

UCX(明文)

ucx

连接重用#

上面的“重用连接”不是特定的传输。相反,这意味着客户端可能会尝试对与最初获取 FlightInfo 的服务器(以及通过相同的连接)执行 DoGet(即,它对 GetFlightInfo 的调用)。这与没有返回特定 Location 时的情况解释相同。

这允许服务器将“自身”作为获取数据的可能位置之一返回,而无需知道自己的公共地址,这在难以或不可能知道此地址的部署中很有用。例如,开发人员可能会将云环境中的远程服务转发到他们的本地机器;在这种情况下,远程服务将无法知道它被访问的本地主机名和端口。

出于兼容性原因,URI 应该始终为 arrow-flight-reuse-connection://?,带有尾随的空查询字符串。Java 的 URI 实现不接受 scheme:scheme://,而 C++ 的实现不接受空字符串,因此明显的候选者不兼容。所选表示形式可以被两种实现解析,以及 Go 的 net/url 和 Python 的 urllib.parse

错误处理#

Arrow Flight 定义了自己的错误代码集。实现因语言而异(例如,在 C++ 中,Unimplemented 是一个通用的 Arrow 错误状态,而在 Java 中它是一个特定于 Flight 的异常),但以下集合是公开的

错误代码

描述

UNKNOWN

未知错误。如果没有任何其他错误适用,则为默认值。

INTERNAL

服务实现内部发生错误。

INVALID_ARGUMENT

客户端向 RPC 传递了无效参数。

TIMED_OUT

操作超过了超时或截止时间。

NOT_FOUND

请求的资源(操作、数据流)未找到。

ALREADY_EXISTS

资源已存在。

CANCELLED

操作被取消(由客户端或服务器取消)。

UNAUTHENTICATED

客户端未经身份验证。

UNAUTHORIZED

客户端已通过身份验证,但没有请求操作的权限。

UNIMPLEMENTED

RPC 未实现。

UNAVAILABLE

服务器不可用。可能由客户端出于连接原因发出。

外部资源#

协议缓冲区定义#

  1/*
  2 * Licensed to the Apache Software Foundation (ASF) under one
  3 * or more contributor license agreements.  See the NOTICE file
  4 * distributed with this work for additional information
  5 * regarding copyright ownership.  The ASF licenses this file
  6 * to you under the Apache License, Version 2.0 (the
  7 * "License"); you may not use this file except in compliance
  8 * with the License.  You may obtain a copy of the License at
  9 * <p>
 10 * http://www.apache.org/licenses/LICENSE-2.0
 11 * <p>
 12 * Unless required by applicable law or agreed to in writing, software
 13 * distributed under the License is distributed on an "AS IS" BASIS,
 14 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 15 * See the License for the specific language governing permissions and
 16 * limitations under the License.
 17 */
 18
 19syntax = "proto3";
 20import "google/protobuf/timestamp.proto";
 21
 22option java_package = "org.apache.arrow.flight.impl";
 23option go_package = "github.com/apache/arrow/go/arrow/flight/gen/flight";
 24option csharp_namespace = "Apache.Arrow.Flight.Protocol";
 25
 26package arrow.flight.protocol;
 27
 28/*
 29 * A flight service is an endpoint for retrieving or storing Arrow data. A
 30 * flight service can expose one or more predefined endpoints that can be
 31 * accessed using the Arrow Flight Protocol. Additionally, a flight service
 32 * can expose a set of actions that are available.
 33 */
 34service FlightService {
 35
 36  /*
 37   * Handshake between client and server. Depending on the server, the
 38   * handshake may be required to determine the token that should be used for
 39   * future operations. Both request and response are streams to allow multiple
 40   * round-trips depending on auth mechanism.
 41   */
 42  rpc Handshake(stream HandshakeRequest) returns (stream HandshakeResponse) {}
 43
 44  /*
 45   * Get a list of available streams given a particular criteria. Most flight
 46   * services will expose one or more streams that are readily available for
 47   * retrieval. This api allows listing the streams available for
 48   * consumption. A user can also provide a criteria. The criteria can limit
 49   * the subset of streams that can be listed via this interface. Each flight
 50   * service allows its own definition of how to consume criteria.
 51   */
 52  rpc ListFlights(Criteria) returns (stream FlightInfo) {}
 53
 54  /*
 55   * For a given FlightDescriptor, get information about how the flight can be
 56   * consumed. This is a useful interface if the consumer of the interface
 57   * already can identify the specific flight to consume. This interface can
 58   * also allow a consumer to generate a flight stream through a specified
 59   * descriptor. For example, a flight descriptor might be something that
 60   * includes a SQL statement or a Pickled Python operation that will be
 61   * executed. In those cases, the descriptor will not be previously available
 62   * within the list of available streams provided by ListFlights but will be
 63   * available for consumption for the duration defined by the specific flight
 64   * service.
 65   */
 66  rpc GetFlightInfo(FlightDescriptor) returns (FlightInfo) {}
 67
 68  /*
 69   * For a given FlightDescriptor, start a query and get information
 70   * to poll its execution status. This is a useful interface if the
 71   * query may be a long-running query. The first PollFlightInfo call
 72   * should return as quickly as possible. (GetFlightInfo doesn't
 73   * return until the query is complete.)
 74   *
 75   * A client can consume any available results before
 76   * the query is completed. See PollInfo.info for details.
 77   *
 78   * A client can poll the updated query status by calling
 79   * PollFlightInfo() with PollInfo.flight_descriptor. A server
 80   * should not respond until the result would be different from last
 81   * time. That way, the client can "long poll" for updates
 82   * without constantly making requests. Clients can set a short timeout
 83   * to avoid blocking calls if desired.
 84   *
 85   * A client can't use PollInfo.flight_descriptor after
 86   * PollInfo.expiration_time passes. A server might not accept the
 87   * retry descriptor anymore and the query may be cancelled.
 88   *
 89   * A client may use the CancelFlightInfo action with
 90   * PollInfo.info to cancel the running query.
 91   */
 92  rpc PollFlightInfo(FlightDescriptor) returns (PollInfo) {}
 93
 94  /*
 95   * For a given FlightDescriptor, get the Schema as described in Schema.fbs::Schema
 96   * This is used when a consumer needs the Schema of flight stream. Similar to
 97   * GetFlightInfo this interface may generate a new flight that was not previously
 98   * available in ListFlights.
 99   */
100   rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}
101
102  /*
103   * Retrieve a single stream associated with a particular descriptor
104   * associated with the referenced ticket. A Flight can be composed of one or
105   * more streams where each stream can be retrieved using a separate opaque
106   * ticket that the flight service uses for managing a collection of streams.
107   */
108  rpc DoGet(Ticket) returns (stream FlightData) {}
109
110  /*
111   * Push a stream to the flight service associated with a particular
112   * flight stream. This allows a client of a flight service to upload a stream
113   * of data. Depending on the particular flight service, a client consumer
114   * could be allowed to upload a single stream per descriptor or an unlimited
115   * number. In the latter, the service might implement a 'seal' action that
116   * can be applied to a descriptor once all streams are uploaded.
117   */
118  rpc DoPut(stream FlightData) returns (stream PutResult) {}
119
120  /*
121   * Open a bidirectional data channel for a given descriptor. This
122   * allows clients to send and receive arbitrary Arrow data and
123   * application-specific metadata in a single logical stream. In
124   * contrast to DoGet/DoPut, this is more suited for clients
125   * offloading computation (rather than storage) to a Flight service.
126   */
127  rpc DoExchange(stream FlightData) returns (stream FlightData) {}
128
129  /*
130   * Flight services can support an arbitrary number of simple actions in
131   * addition to the possible ListFlights, GetFlightInfo, DoGet, DoPut
132   * operations that are potentially available. DoAction allows a flight client
133   * to do a specific action against a flight service. An action includes
134   * opaque request and response objects that are specific to the type action
135   * being undertaken.
136   */
137  rpc DoAction(Action) returns (stream Result) {}
138
139  /*
140   * A flight service exposes all of the available action types that it has
141   * along with descriptions. This allows different flight consumers to
142   * understand the capabilities of the flight service.
143   */
144  rpc ListActions(Empty) returns (stream ActionType) {}
145}
146
147/*
148 * The request that a client provides to a server on handshake.
149 */
150message HandshakeRequest {
151
152  /*
153   * A defined protocol version
154   */
155  uint64 protocol_version = 1;
156
157  /*
158   * Arbitrary auth/handshake info.
159   */
160  bytes payload = 2;
161}
162
163message HandshakeResponse {
164
165  /*
166   * A defined protocol version
167   */
168  uint64 protocol_version = 1;
169
170  /*
171   * Arbitrary auth/handshake info.
172   */
173  bytes payload = 2;
174}
175
176/*
177 * A message for doing simple auth.
178 */
179message BasicAuth {
180  string username = 2;
181  string password = 3;
182}
183
184message Empty {}
185
186/*
187 * Describes an available action, including both the name used for execution
188 * along with a short description of the purpose of the action.
189 */
190message ActionType {
191  string type = 1;
192  string description = 2;
193}
194
195/*
196 * A service specific expression that can be used to return a limited set
197 * of available Arrow Flight streams.
198 */
199message Criteria {
200  bytes expression = 1;
201}
202
203/*
204 * An opaque action specific for the service.
205 */
206message Action {
207  string type = 1;
208  bytes body = 2;
209}
210
211/*
212 * The request of the CancelFlightInfo action.
213 *
214 * The request should be stored in Action.body.
215 */
216message CancelFlightInfoRequest {
217  FlightInfo info = 1;
218}
219
220/*
221 * The request of the RenewFlightEndpoint action.
222 *
223 * The request should be stored in Action.body.
224 */
225message RenewFlightEndpointRequest {
226  FlightEndpoint endpoint = 1;
227}
228
229/*
230 * An opaque result returned after executing an action.
231 */
232message Result {
233  bytes body = 1;
234}
235
236/*
237 * The result of a cancel operation.
238 *
239 * This is used by CancelFlightInfoResult.status.
240 */
241enum CancelStatus {
242  // The cancellation status is unknown. Servers should avoid using
243  // this value (send a NOT_FOUND error if the requested query is
244  // not known). Clients can retry the request.
245  CANCEL_STATUS_UNSPECIFIED = 0;
246  // The cancellation request is complete. Subsequent requests with
247  // the same payload may return CANCELLED or a NOT_FOUND error.
248  CANCEL_STATUS_CANCELLED = 1;
249  // The cancellation request is in progress. The client may retry
250  // the cancellation request.
251  CANCEL_STATUS_CANCELLING = 2;
252  // The query is not cancellable. The client should not retry the
253  // cancellation request.
254  CANCEL_STATUS_NOT_CANCELLABLE = 3;
255}
256
257/*
258 * The result of the CancelFlightInfo action.
259 *
260 * The result should be stored in Result.body.
261 */
262message CancelFlightInfoResult {
263  CancelStatus status = 1;
264}
265
266/*
267 * Wrap the result of a getSchema call
268 */
269message SchemaResult {
270  // The schema of the dataset in its IPC form:
271  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
272  //   4 bytes - the byte length of the payload
273  //   a flatbuffer Message whose header is the Schema
274  bytes schema = 1;
275}
276
277/*
278 * The name or tag for a Flight. May be used as a way to retrieve or generate
279 * a flight or be used to expose a set of previously defined flights.
280 */
281message FlightDescriptor {
282
283  /*
284   * Describes what type of descriptor is defined.
285   */
286  enum DescriptorType {
287
288    // Protobuf pattern, not used.
289    UNKNOWN = 0;
290
291    /*
292     * A named path that identifies a dataset. A path is composed of a string
293     * or list of strings describing a particular dataset. This is conceptually
294     *  similar to a path inside a filesystem.
295     */
296    PATH = 1;
297
298    /*
299     * An opaque command to generate a dataset.
300     */
301    CMD = 2;
302  }
303
304  DescriptorType type = 1;
305
306  /*
307   * Opaque value used to express a command. Should only be defined when
308   * type = CMD.
309   */
310  bytes cmd = 2;
311
312  /*
313   * List of strings identifying a particular dataset. Should only be defined
314   * when type = PATH.
315   */
316  repeated string path = 3;
317}
318
319/*
320 * The access coordinates for retrieval of a dataset. With a FlightInfo, a
321 * consumer is able to determine how to retrieve a dataset.
322 */
323message FlightInfo {
324  // The schema of the dataset in its IPC form:
325  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
326  //   4 bytes - the byte length of the payload
327  //   a flatbuffer Message whose header is the Schema
328  bytes schema = 1;
329
330  /*
331   * The descriptor associated with this info.
332   */
333  FlightDescriptor flight_descriptor = 2;
334
335  /*
336   * A list of endpoints associated with the flight. To consume the
337   * whole flight, all endpoints (and hence all Tickets) must be
338   * consumed. Endpoints can be consumed in any order.
339   *
340   * In other words, an application can use multiple endpoints to
341   * represent partitioned data.
342   *
343   * If the returned data has an ordering, an application can use
344   * "FlightInfo.ordered = true" or should return the all data in a
345   * single endpoint. Otherwise, there is no ordering defined on
346   * endpoints or the data within.
347   *
348   * A client can read ordered data by reading data from returned
349   * endpoints, in order, from front to back.
350   *
351   * Note that a client may ignore "FlightInfo.ordered = true". If an
352   * ordering is important for an application, an application must
353   * choose one of them:
354   *
355   * * An application requires that all clients must read data in
356   *   returned endpoints order.
357   * * An application must return the all data in a single endpoint.
358   */
359  repeated FlightEndpoint endpoint = 3;
360
361  // Set these to -1 if unknown.
362  int64 total_records = 4;
363  int64 total_bytes = 5;
364
365  /*
366   * FlightEndpoints are in the same order as the data.
367   */
368  bool ordered = 6;
369
370  /*
371   * Application-defined metadata.
372   *
373   * There is no inherent or required relationship between this
374   * and the app_metadata fields in the FlightEndpoints or resulting
375   * FlightData messages. Since this metadata is application-defined,
376   * a given application could define there to be a relationship,
377   * but there is none required by the spec.
378   */
379  bytes app_metadata = 7;
380}
381
382/*
383 * The information to process a long-running query.
384 */
385message PollInfo {
386  /*
387   * The currently available results.
388   *
389   * If "flight_descriptor" is not specified, the query is complete
390   * and "info" specifies all results. Otherwise, "info" contains
391   * partial query results.
392   *
393   * Note that each PollInfo response contains a complete
394   * FlightInfo (not just the delta between the previous and current
395   * FlightInfo).
396   *
397   * Subsequent PollInfo responses may only append new endpoints to
398   * info.
399   *
400   * Clients can begin fetching results via DoGet(Ticket) with the
401   * ticket in the info before the query is
402   * completed. FlightInfo.ordered is also valid.
403   */
404  FlightInfo info = 1;
405
406  /*
407   * The descriptor the client should use on the next try.
408   * If unset, the query is complete.
409   */
410  FlightDescriptor flight_descriptor = 2;
411
412  /*
413   * Query progress. If known, must be in [0.0, 1.0] but need not be
414   * monotonic or nondecreasing. If unknown, do not set.
415   */
416  optional double progress = 3;
417
418  /*
419   * Expiration time for this request. After this passes, the server
420   * might not accept the retry descriptor anymore (and the query may
421   * be cancelled). This may be updated on a call to PollFlightInfo.
422   */
423  google.protobuf.Timestamp expiration_time = 4;
424}
425
426/*
427 * A particular stream or split associated with a flight.
428 */
429message FlightEndpoint {
430
431  /*
432   * Token used to retrieve this stream.
433   */
434  Ticket ticket = 1;
435
436  /*
437   * A list of URIs where this ticket can be redeemed via DoGet().
438   *
439   * If the list is empty, the expectation is that the ticket can only
440   * be redeemed on the current service where the ticket was
441   * generated.
442   *
443   * If the list is not empty, the expectation is that the ticket can be
444   * redeemed at any of the locations, and that the data returned will be
445   * equivalent. In this case, the ticket may only be redeemed at one of the
446   * given locations, and not (necessarily) on the current service. If one
447   * of the given locations is "arrow-flight-reuse-connection://?", the
448   * client may redeem the ticket on the service where the ticket was
449   * generated (i.e., the same as above), in addition to the other
450   * locations. (This URI was chosen to maximize compatibility, as 'scheme:'
451   * or 'scheme://' are not accepted by Java's java.net.URI.)
452   *
453   * In other words, an application can use multiple locations to
454   * represent redundant and/or load balanced services.
455   */
456  repeated Location location = 2;
457
458  /*
459   * Expiration time of this stream. If present, clients may assume
460   * they can retry DoGet requests. Otherwise, it is
461   * application-defined whether DoGet requests may be retried.
462   */
463  google.protobuf.Timestamp expiration_time = 3;
464
465  /*
466   * Application-defined metadata.
467   *
468   * There is no inherent or required relationship between this
469   * and the app_metadata fields in the FlightInfo or resulting
470   * FlightData messages. Since this metadata is application-defined,
471   * a given application could define there to be a relationship,
472   * but there is none required by the spec.
473   */
474  bytes app_metadata = 4;
475}
476
477/*
478 * A location where a Flight service will accept retrieval of a particular
479 * stream given a ticket.
480 */
481message Location {
482  string uri = 1;
483}
484
485/*
486 * An opaque identifier that the service can use to retrieve a particular
487 * portion of a stream.
488 *
489 * Tickets are meant to be single use. It is an error/application-defined
490 * behavior to reuse a ticket.
491 */
492message Ticket {
493  bytes ticket = 1;
494}
495
496/*
497 * A batch of Arrow data as part of a stream of batches.
498 */
499message FlightData {
500
501  /*
502   * The descriptor of the data. This is only relevant when a client is
503   * starting a new DoPut stream.
504   */
505  FlightDescriptor flight_descriptor = 1;
506
507  /*
508   * Header for message data as described in Message.fbs::Message.
509   */
510  bytes data_header = 2;
511
512  /*
513   * Application-defined metadata.
514   */
515  bytes app_metadata = 3;
516
517  /*
518   * The actual batch of Arrow data. Preferably handled with minimal-copies
519   * coming last in the definition to help with sidecar patterns (it is
520   * expected that some implementations will fetch this field off the wire
521   * with specialized code to avoid extra memory copies).
522   */
523  bytes data_body = 1000;
524}
525
526/**
527 * The response message associated with the submission of a DoPut.
528 */
529message PutResult {
530  bytes app_metadata = 1;
531}
532
533/*
534 * EXPERIMENTAL: Union of possible value types for a Session Option to be set to.
535 *
536 * By convention, an attempt to set a valueless SessionOptionValue should
537 * attempt to unset or clear the named option value on the server.
538 */
539message SessionOptionValue {
540  message StringListValue {
541    repeated string values = 1;
542  }
543
544  oneof option_value {
545    string string_value = 1;
546    bool bool_value = 2;
547    sfixed64 int64_value = 3;
548    double double_value = 4;
549    StringListValue string_list_value = 5;
550  }
551}
552
553/*
554 * EXPERIMENTAL: A request to set session options for an existing or new (implicit)
555 * server session.
556 *
557 * Sessions are persisted and referenced via a transport-level state management, typically
558 * RFC 6265 HTTP cookies when using an HTTP transport.  The suggested cookie name or state
559 * context key is 'arrow_flight_session_id', although implementations may freely choose their
560 * own name.
561 *
562 * Session creation (if one does not already exist) is implied by this RPC request, however
563 * server implementations may choose to initiate a session that also contains client-provided
564 * session options at any other time, e.g. on authentication, or when any other call is made
565 * and the server wishes to use a session to persist any state (or lack thereof).
566 */
567message SetSessionOptionsRequest {
568  map<string, SessionOptionValue> session_options = 1;
569}
570
571/*
572 * EXPERIMENTAL: The results (individually) of setting a set of session options.
573 *
574 * Option names should only be present in the response if they were not successfully
575 * set on the server; that is, a response without an Error for a name provided in the
576 * SetSessionOptionsRequest implies that the named option value was set successfully.
577 */
578message SetSessionOptionsResult {
579  enum ErrorValue {
580    // Protobuf deserialization fallback value: The status is unknown or unrecognized.
581    // Servers should avoid using this value. The request may be retried by the client.
582    UNSPECIFIED = 0;
583    // The given session option name is invalid.
584    INVALID_NAME = 1;
585    // The session option value or type is invalid.
586    INVALID_VALUE = 2;
587    // The session option cannot be set.
588    ERROR = 3;
589  }
590
591  message Error {
592    ErrorValue value = 1;
593  }
594
595  map<string, Error> errors = 1;
596}
597
598/*
599 * EXPERIMENTAL: A request to access the session options for the current server session.
600 *
601 * The existing session is referenced via a cookie header or similar (see
602 * SetSessionOptionsRequest above); it is an error to make this request with a missing,
603 * invalid, or expired session cookie header or other implementation-defined session
604 * reference token.
605 */
606message GetSessionOptionsRequest {
607}
608
609/*
610 * EXPERIMENTAL: The result containing the current server session options.
611 */
612message GetSessionOptionsResult {
613    map<string, SessionOptionValue> session_options = 1;
614}
615
616/*
617 * Request message for the "Close Session" action.
618 *
619 * The exiting session is referenced via a cookie header.
620 */
621message CloseSessionRequest {
622}
623
624/*
625 * The result of closing a session.
626 */
627message CloseSessionResult {
628  enum Status {
629    // Protobuf deserialization fallback value: The session close status is unknown or
630    // not recognized. Servers should avoid using this value (send a NOT_FOUND error if
631    // the requested session is not known or expired). Clients can retry the request.
632    UNSPECIFIED = 0;
633    // The session close request is complete. Subsequent requests with
634    // the same session produce a NOT_FOUND error.
635    CLOSED = 1;
636    // The session close request is in progress. The client may retry
637    // the close request.
638    CLOSING = 2;
639    // The session is not closeable. The client should not retry the
640    // close request.
641    NOT_CLOSEABLE = 3;
642  }
643
644  Status status = 1;
645}