Arrow Flight RPC#

Arrow Flight 是一个基于 Arrow 数据的高性能数据服务 RPC 框架,构建于 gRPCIPC 格式之上。

Flight 的核心是以 Arrow 记录批次流的形式组织数据,这些流可以是从另一个服务下载,也可以是上传到另一个服务。一套元数据方法提供了流的发现和内省功能,以及实现特定于应用程序的方法的能力。

方法和消息线格式由 Protobuf 定义,从而实现了与可能单独支持 gRPC 和 Arrow 但不支持 Flight 的客户端之间的互操作性。然而,Flight 的实现包含进一步的优化,以避免在使用 Protobuf 时产生开销(主要是避免过多的内存复制)。

RPC 方法和请求模式#

Flight 定义了一套 RPC 方法,用于上传/下载数据、检索数据流的元数据、列出可用数据流以及实现特定于应用程序的 RPC 方法。Flight 服务实现这些方法中的一部分,而 Flight 客户端可以调用这些方法中的任何一个。

数据流由描述符(FlightDescriptor 消息)标识,描述符可以是路径或任意二进制命令。例如,描述符可以编码 SQL 查询、分布式文件系统上的文件路径,甚至是一个 pickled 的 Python 对象;应用程序可以根据需要使用此消息。

因此,一个 Flight 客户端可以连接到任何服务并执行基本操作。为了方便这一点,Flight 服务“期望”支持一些常见的请求模式,如下所述。当然,应用程序可能会忽略兼容性,而仅仅将 Flight RPC 方法视为用于自身目的的低级构建块。

有关所涉及的方法和消息的完整详细信息,请参阅协议缓冲定义

下载数据#

希望下载数据的客户端将

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% https://apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Metadata Server participant Data Server Client->>Metadata Server: GetFlightInfo(FlightDescriptor) Metadata Server->>Client: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]} Note over Client, Data Server: This may be parallelized loop for each endpoint in FlightInfo.endpoints Client->>Data Server: DoGet(Ticket) Data Server->>Client: stream of FlightData end

通过 DoGet 检索数据。#

  1. 为其感兴趣的数据集构造或获取一个 FlightDescriptor

    客户端可能已经知道他们想要的描述符是什么,或者他们可以使用 ListFlights 等方法来发现它们。

  2. 调用 GetFlightInfo(FlightDescriptor) 以获取 FlightInfo 消息。

    Flight 不要求数据和元数据位于同一服务器上。因此,FlightInfo 包含有关数据位置的详细信息,以便客户端可以从适当的服务器获取数据。这编码为 FlightInfo 中的一系列 FlightEndpoint 消息。每个端点表示包含响应数据子集的位置。

    端点包含一个位置列表(服务器地址),可以从中检索此数据,以及一个 Ticket,这是一个不透明的二进制令牌,服务器将使用它来标识请求的数据。

    如果 FlightInfo.ordered 为真,则表示来自不同端点的数据之间存在某种顺序。客户端应产生与每个端点返回的数据按顺序从前到后连接起来相同的结果。

    如果 FlightInfo.ordered 为假,客户端可以以任意顺序从任何端点返回数据。来自任何特定端点的数据必须按顺序返回,但来自不同端点的数据可以交错以允许并行获取。

    请注意,由于某些客户端可能会忽略 FlightInfo.ordered,如果顺序很重要且无法确保客户端支持,服务器应返回单个端点。

    响应还包含其他元数据,例如模式,以及可选的数据集大小估计。

  3. 消费服务器返回的每个端点。

    要消费端点,客户端应连接到端点中的一个位置,然后使用端点中的 ticket 调用 DoGet(Ticket)。这将为客户端提供 Arrow 记录批次流。

    如果服务器希望表明数据位于本地服务器而不是不同位置,则可以返回一个空的位置列表。然后,客户端可以重用与原始服务器的现有连接来获取数据。否则,客户端必须连接到指示的位置之一。

    服务器可以将“自身”列为与其他服务器位置并列的位置。通常,这需要服务器知道其公共地址,但它也可以使用特殊的 URI 字符串 arrow-flight-reuse-connection://? 来告诉客户端它们可以重用与同一服务器的现有连接,而无需能够命名自身。参见下面的连接复用

    通过这种方式,端点内的位置也可以被视为执行 look-aside 负载均衡或服务发现功能。端点可以表示已分区或以其他方式分发的数据。

    客户端必须消费所有端点才能检索完整数据集。客户端可以按任何顺序,甚至并行地消费端点,或者将端点分配给多台机器进行消费;这取决于应用程序的实现。客户端还可以使用 FlightInfo.ordered。有关 FlightInfo.ordered 的详细信息,请参见上一点。

    每个端点可能有过期时间(FlightEndpoint.expiration_time)。如果端点有过期时间,客户端可以通过 DoGet 多次获取数据,直到达到过期时间。否则,是否可以重试 DoGet 请求由应用程序定义。过期时间表示为 google.protobuf.Timestamp

    如果过期时间很短,客户端可能可以通过 RenewFlightEndpoint 操作延长过期时间。客户端需要使用 DoActionRenewFlightEndpoint 操作类型来延长过期时间。Action.body 必须是包含要续订的 FlightEndpointRenewFlightEndpointRequest

    客户端可能可以通过 CancelFlightInfo 操作取消返回的 FlightInfo。客户端需要使用 DoActionCancelFlightInfo 操作类型来取消 FlightInfo

通过运行繁重查询下载数据#

客户端可能需要请求一个繁重的查询来下载数据。然而,GetFlightInfo 在查询完成之前不会返回,因此客户端会被阻塞。在这种情况下,客户端可以使用 PollFlightInfo 代替 GetFlightInfo

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% https://apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Metadata Server participant Data Server Client->>Metadata Server: PollFlightInfo(FlightDescriptor) Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor', ...} Client->>Metadata Server: PollFlightInfo(FlightDescriptor') Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor'', ...} Client->>Metadata Server: PollFlightInfo(FlightDescriptor'') Metadata Server->>Client: PollInfo{descriptor: null, info: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]} Note over Client, Data Server: This may be parallelized Note over Client, Data Server: Some endpoints may be processed while polling loop for each endpoint in FlightInfo.endpoints Client->>Data Server: DoGet(Ticket) Data Server->>Client: stream of FlightData end

通过 PollFlightInfo 轮询长时间运行的查询。#

  1. 像之前一样,构造或获取一个 FlightDescriptor

  2. 调用 PollFlightInfo(FlightDescriptor) 以获取 PollInfo 消息。

    服务器应在第一次调用时尽快响应。因此客户端不应等待第一个 PollInfo 响应。

    如果查询尚未完成,PollInfo.flight_descriptor 包含一个 FlightDescriptor。客户端应使用该描述符(而不是原始的 FlightDescriptor)调用下一个 PollFlightInfo()。服务器应该能够识别不一定是最新状态的 PollInfo.flight_descriptor,以防客户端错过中间的更新。

    如果查询已完成,PollInfo.flight_descriptor 不会设置。

    PollInfo.info 是当前可用的结果。每次都是一个完整的 FlightInfo,而不仅仅是之前和当前 FlightInfo 之间的差值。服务器每次只应向 PollInfo.info 中的端点追加。因此,即使查询尚未完成,客户端也可以使用 PollInfo.info 中的 Ticket 运行 DoGet(Ticket)FlightInfo.ordered 也有效。

    服务器不应响应,直到结果与上次不同。这样,客户端可以“长轮询”更新,而无需持续发出请求。客户端可以设置短超时,如果需要,以避免阻塞调用。

    PollInfo.progress 可能被设置。它表示查询的进度。如果设置了,值必须在 [0.0, 1.0] 范围内。该值不一定是单调或非递减的。服务器可以通过仅更新 PollInfo.progress 值来响应,尽管不应向客户端发送垃圾邮件般的更新。

    PollInfo.timestamp 是此请求的过期时间。在此时间过后,服务器可能不再接受轮询描述符,并且查询可能会被取消。这可能会在调用 PollFlightInfo 时更新。过期时间表示为 google.protobuf.Timestamp

    客户端可能可以通过 CancelFlightInfo 操作取消查询。

    如果查询失败,服务器应返回错误状态而不是响应。除了 TIMED_OUTUNAVAILABLE 之外,客户端不应轮询请求,这些错误可能并非源自服务器。

  3. 像之前一样,消费服务器返回的每个端点。

上传数据#

要上传数据,客户端将

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% https://apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Server Note right of Client: The first FlightData includes a FlightDescriptor Client->>Server: DoPut(FlightData) Client->>Server: stream of FlightData Server->>Client: PutResult{app_metadata}

通过 DoPut 上传数据。#

  1. 像之前一样,构造或获取一个 FlightDescriptor

  2. 调用 DoPut(FlightData) 并上传 Arrow 记录批次流。

    FlightDescriptor 包含在第一条消息中,以便服务器可以识别数据集。

DoPut 允许服务器向客户端发送包含自定义元数据的响应消息。这可以用于实现诸如可恢复写入之类的功能(例如,服务器可以定期发送一条消息,指示到目前为止已提交的行数)。

交换数据#

某些用例可能需要在单个调用中上传和下载数据。虽然这可以通过多次调用来模拟,但如果应用程序是有状态的,这可能会很困难。例如,应用程序可能希望实现一个调用,其中客户端上传数据,服务器返回对该数据的转换;如果使用 DoGetDoPut 实现,这将需要有状态。相反,DoExchange 允许将其实现为单个调用。客户端将

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% https://apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Server Note right of Client: The first FlightData includes a FlightDescriptor Client->>Server: DoExchange(FlightData) par [Client sends data] Client->>Server: stream of FlightData and [Server sends data] Server->>Client: stream of FlightData end

使用 DoExchange 的复杂数据流。#

  1. 像之前一样,构造或获取一个 FlightDescriptor

  2. 调用 DoExchange(FlightData)

    FlightDescriptor 包含在第一条消息中,与 DoPut 类似。此时,客户端和服务器可以同时向对方流式传输数据。

身份验证#

Flight 支持多种身份验证方法,应用程序可以根据其需求进行定制。

“握手”身份验证

这分为两部分实现。在连接时,客户端调用 Handshake RPC 方法,应用程序定义的身份验证处理程序可以与其在服务器端的对应方交换任意数量的消息。然后,处理程序提供一个二进制令牌。Flight 客户端随后将在所有未来的调用请求头中包含此令牌,该令牌由服务器身份验证处理程序验证。

应用程序可以使用其中的任何一部分;例如,它们可以忽略初始握手,并在每次调用时发送外部获取的令牌(例如,bearer 令牌),或者它们可以在握手期间建立信任,并且不对每次调用验证令牌,将连接视为有状态(“登录”模式)。

警告

除非在每次调用时验证令牌,否则此模式不安全,特别是在存在第 7 层负载均衡器(gRPC 中常见)或 gRPC 透明地重新连接客户端的情况下。

基于头部/中间件的身份验证

客户端可以在调用中包含自定义头部。然后可以实现自定义中间件来验证和接受/拒绝服务器端的调用。

相互 TLS (mTLS)

客户端在建立连接期间提供证书,服务器进行验证。应用程序无需实现任何身份验证代码,但必须配置和分发证书。

此功能可能仅在某些实现中可用,并且仅在启用 TLS 时可用。

某些 Flight 实现也可能暴露底层的 gRPC API,在这种情况下,gRPC 支持的任何身份验证方法都可用。

位置 URI#

Flight 主要根据其 Protobuf 和 gRPC 规范(如下)来定义,但 Arrow 实现也可能支持替代传输(参见Flight RPC)。客户端和服务器需要知道对于 Location 中的给定 URI 使用哪种传输方式,因此 Flight 实现应为给定传输使用以下 URI scheme

传输

URI Scheme

gRPC (纯文本)

grpc: 或 grpc+tcp

gRPC (TLS)

grpc+tls

gRPC (Unix 域套接字)

grpc+unix

(重用连接)

arrow-flight-reuse-connection

连接复用#

上面的“重用连接”不是特定的传输。相反,它意味着客户端可以尝试对最初获取 FlightInfo 的同一服务器(并通过同一连接)执行 DoGet(即,对该服务器调用 GetFlightInfo)。这与未返回特定 Location 时解释方式相同。

这使得服务器可以将其“自身”作为获取数据的可能位置之一返回,而无需知道自己的公共地址,这在难以或不可能知道此信息的部署中非常有用。例如,开发人员可以将云环境中的远程服务转发到本地机器;在这种情况下,远程服务将无法知道它正被访问的本地主机名和端口。

出于兼容性原因,URI 应始终为 arrow-flight-reuse-connection://?,带有末尾的空查询字符串。Java 的 URI 实现不接受 scheme:scheme://,而 C++ 的实现不接受空字符串,因此显而易见的候选方案不兼容。所选的表示形式可以由这两种实现以及 Go 的 net/url 和 Python 的 urllib.parse 进行解析。

错误处理#

Arrow Flight 定义了自己的一套错误代码。实现方式在不同语言之间有所不同(例如,在 C++ 中,Unimplemented 是一个通用的 Arrow 错误状态,而在 Java 中是一个 Flight 特定的异常),但公开了以下集合

错误代码

描述

UNKNOWN

未知错误。如果没有其他错误适用时的默认值。

INTERNAL

服务实现内部发生错误。

INVALID_ARGUMENT

客户端向 RPC 传递了无效参数。

TIMED_OUT

操作超出超时或截止时间。

NOT_FOUND

未找到请求的资源(操作、数据流)。

ALREADY_EXISTS

资源已存在。

CANCELLED

操作被取消(由客户端或服务器)。

UNAUTHENTICATED

客户端未进行身份验证。

UNAUTHORIZED

客户端已进行身份验证,但没有执行请求操作的权限。

UNIMPLEMENTED

RPC 未实现。

UNAVAILABLE

服务器不可用。可能由客户端因连接原因发出。

外部资源#

协议缓冲定义#

  1/*
  2 * Licensed to the Apache Software Foundation (ASF) under one
  3 * or more contributor license agreements.  See the NOTICE file
  4 * distributed with this work for additional information
  5 * regarding copyright ownership.  The ASF licenses this file
  6 * to you under the Apache License, Version 2.0 (the
  7 * "License"); you may not use this file except in compliance
  8 * with the License.  You may obtain a copy of the License at
  9 * <p>
 10 * https://apache.org/licenses/LICENSE-2.0
 11 * <p>
 12 * Unless required by applicable law or agreed to in writing, software
 13 * distributed under the License is distributed on an "AS IS" BASIS,
 14 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 15 * See the License for the specific language governing permissions and
 16 * limitations under the License.
 17 */
 18
 19syntax = "proto3";
 20import "google/protobuf/timestamp.proto";
 21
 22option java_package = "org.apache.arrow.flight.impl";
 23option go_package = "github.com/apache/arrow-go/arrow/flight/gen/flight";
 24option csharp_namespace = "Apache.Arrow.Flight.Protocol";
 25
 26package arrow.flight.protocol;
 27
 28/*
 29 * A flight service is an endpoint for retrieving or storing Arrow data. A
 30 * flight service can expose one or more predefined endpoints that can be
 31 * accessed using the Arrow Flight Protocol. Additionally, a flight service
 32 * can expose a set of actions that are available.
 33 */
 34service FlightService {
 35
 36  /*
 37   * Handshake between client and server. Depending on the server, the
 38   * handshake may be required to determine the token that should be used for
 39   * future operations. Both request and response are streams to allow multiple
 40   * round-trips depending on auth mechanism.
 41   */
 42  rpc Handshake(stream HandshakeRequest) returns (stream HandshakeResponse) {}
 43
 44  /*
 45   * Get a list of available streams given a particular criteria. Most flight
 46   * services will expose one or more streams that are readily available for
 47   * retrieval. This api allows listing the streams available for
 48   * consumption. A user can also provide a criteria. The criteria can limit
 49   * the subset of streams that can be listed via this interface. Each flight
 50   * service allows its own definition of how to consume criteria.
 51   */
 52  rpc ListFlights(Criteria) returns (stream FlightInfo) {}
 53
 54  /*
 55   * For a given FlightDescriptor, get information about how the flight can be
 56   * consumed. This is a useful interface if the consumer of the interface
 57   * already can identify the specific flight to consume. This interface can
 58   * also allow a consumer to generate a flight stream through a specified
 59   * descriptor. For example, a flight descriptor might be something that
 60   * includes a SQL statement or a Pickled Python operation that will be
 61   * executed. In those cases, the descriptor will not be previously available
 62   * within the list of available streams provided by ListFlights but will be
 63   * available for consumption for the duration defined by the specific flight
 64   * service.
 65   */
 66  rpc GetFlightInfo(FlightDescriptor) returns (FlightInfo) {}
 67
 68  /*
 69   * For a given FlightDescriptor, start a query and get information
 70   * to poll its execution status. This is a useful interface if the
 71   * query may be a long-running query. The first PollFlightInfo call
 72   * should return as quickly as possible. (GetFlightInfo doesn't
 73   * return until the query is complete.)
 74   *
 75   * A client can consume any available results before
 76   * the query is completed. See PollInfo.info for details.
 77   *
 78   * A client can poll the updated query status by calling
 79   * PollFlightInfo() with PollInfo.flight_descriptor. A server
 80   * should not respond until the result would be different from last
 81   * time. That way, the client can "long poll" for updates
 82   * without constantly making requests. Clients can set a short timeout
 83   * to avoid blocking calls if desired.
 84   *
 85   * A client can't use PollInfo.flight_descriptor after
 86   * PollInfo.expiration_time passes. A server might not accept the
 87   * retry descriptor anymore and the query may be cancelled.
 88   *
 89   * A client may use the CancelFlightInfo action with
 90   * PollInfo.info to cancel the running query.
 91   */
 92  rpc PollFlightInfo(FlightDescriptor) returns (PollInfo) {}
 93
 94  /*
 95   * For a given FlightDescriptor, get the Schema as described in Schema.fbs::Schema
 96   * This is used when a consumer needs the Schema of flight stream. Similar to
 97   * GetFlightInfo this interface may generate a new flight that was not previously
 98   * available in ListFlights.
 99   */
100   rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}
101
102  /*
103   * Retrieve a single stream associated with a particular descriptor
104   * associated with the referenced ticket. A Flight can be composed of one or
105   * more streams where each stream can be retrieved using a separate opaque
106   * ticket that the flight service uses for managing a collection of streams.
107   */
108  rpc DoGet(Ticket) returns (stream FlightData) {}
109
110  /*
111   * Push a stream to the flight service associated with a particular
112   * flight stream. This allows a client of a flight service to upload a stream
113   * of data. Depending on the particular flight service, a client consumer
114   * could be allowed to upload a single stream per descriptor or an unlimited
115   * number. In the latter, the service might implement a 'seal' action that
116   * can be applied to a descriptor once all streams are uploaded.
117   */
118  rpc DoPut(stream FlightData) returns (stream PutResult) {}
119
120  /*
121   * Open a bidirectional data channel for a given descriptor. This
122   * allows clients to send and receive arbitrary Arrow data and
123   * application-specific metadata in a single logical stream. In
124   * contrast to DoGet/DoPut, this is more suited for clients
125   * offloading computation (rather than storage) to a Flight service.
126   */
127  rpc DoExchange(stream FlightData) returns (stream FlightData) {}
128
129  /*
130   * Flight services can support an arbitrary number of simple actions in
131   * addition to the possible ListFlights, GetFlightInfo, DoGet, DoPut
132   * operations that are potentially available. DoAction allows a flight client
133   * to do a specific action against a flight service. An action includes
134   * opaque request and response objects that are specific to the type action
135   * being undertaken.
136   */
137  rpc DoAction(Action) returns (stream Result) {}
138
139  /*
140   * A flight service exposes all of the available action types that it has
141   * along with descriptions. This allows different flight consumers to
142   * understand the capabilities of the flight service.
143   */
144  rpc ListActions(Empty) returns (stream ActionType) {}
145}
146
147/*
148 * The request that a client provides to a server on handshake.
149 */
150message HandshakeRequest {
151
152  /*
153   * A defined protocol version
154   */
155  uint64 protocol_version = 1;
156
157  /*
158   * Arbitrary auth/handshake info.
159   */
160  bytes payload = 2;
161}
162
163message HandshakeResponse {
164
165  /*
166   * A defined protocol version
167   */
168  uint64 protocol_version = 1;
169
170  /*
171   * Arbitrary auth/handshake info.
172   */
173  bytes payload = 2;
174}
175
176/*
177 * A message for doing simple auth.
178 */
179message BasicAuth {
180  string username = 2;
181  string password = 3;
182}
183
184message Empty {}
185
186/*
187 * Describes an available action, including both the name used for execution
188 * along with a short description of the purpose of the action.
189 */
190message ActionType {
191  string type = 1;
192  string description = 2;
193}
194
195/*
196 * A service specific expression that can be used to return a limited set
197 * of available Arrow Flight streams.
198 */
199message Criteria {
200  bytes expression = 1;
201}
202
203/*
204 * An opaque action specific for the service.
205 */
206message Action {
207  string type = 1;
208  bytes body = 2;
209}
210
211/*
212 * An opaque result returned after executing an action.
213 */
214message Result {
215  bytes body = 1;
216}
217
218/*
219 * Wrap the result of a getSchema call
220 */
221message SchemaResult {
222  // The schema of the dataset in its IPC form:
223  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
224  //   4 bytes - the byte length of the payload
225  //   a flatbuffer Message whose header is the Schema
226  bytes schema = 1;
227}
228
229/*
230 * The name or tag for a Flight. May be used as a way to retrieve or generate
231 * a flight or be used to expose a set of previously defined flights.
232 */
233message FlightDescriptor {
234
235  /*
236   * Describes what type of descriptor is defined.
237   */
238  enum DescriptorType {
239
240    // Protobuf pattern, not used.
241    UNKNOWN = 0;
242
243    /*
244     * A named path that identifies a dataset. A path is composed of a string
245     * or list of strings describing a particular dataset. This is conceptually
246     *  similar to a path inside a filesystem.
247     */
248    PATH = 1;
249
250    /*
251     * An opaque command to generate a dataset.
252     */
253    CMD = 2;
254  }
255
256  DescriptorType type = 1;
257
258  /*
259   * Opaque value used to express a command. Should only be defined when
260   * type = CMD.
261   */
262  bytes cmd = 2;
263
264  /*
265   * List of strings identifying a particular dataset. Should only be defined
266   * when type = PATH.
267   */
268  repeated string path = 3;
269}
270
271/*
272 * The access coordinates for retrieval of a dataset. With a FlightInfo, a
273 * consumer is able to determine how to retrieve a dataset.
274 */
275message FlightInfo {
276  // The schema of the dataset in its IPC form:
277  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
278  //   4 bytes - the byte length of the payload
279  //   a flatbuffer Message whose header is the Schema
280  bytes schema = 1;
281
282  /*
283   * The descriptor associated with this info.
284   */
285  FlightDescriptor flight_descriptor = 2;
286
287  /*
288   * A list of endpoints associated with the flight. To consume the
289   * whole flight, all endpoints (and hence all Tickets) must be
290   * consumed. Endpoints can be consumed in any order.
291   *
292   * In other words, an application can use multiple endpoints to
293   * represent partitioned data.
294   *
295   * If the returned data has an ordering, an application can use
296   * "FlightInfo.ordered = true" or should return the all data in a
297   * single endpoint. Otherwise, there is no ordering defined on
298   * endpoints or the data within.
299   *
300   * A client can read ordered data by reading data from returned
301   * endpoints, in order, from front to back.
302   *
303   * Note that a client may ignore "FlightInfo.ordered = true". If an
304   * ordering is important for an application, an application must
305   * choose one of them:
306   *
307   * * An application requires that all clients must read data in
308   *   returned endpoints order.
309   * * An application must return the all data in a single endpoint.
310   */
311  repeated FlightEndpoint endpoint = 3;
312
313  // Set these to -1 if unknown.
314  int64 total_records = 4;
315  int64 total_bytes = 5;
316
317  /*
318   * FlightEndpoints are in the same order as the data.
319   */
320  bool ordered = 6;
321
322  /*
323   * Application-defined metadata.
324   *
325   * There is no inherent or required relationship between this
326   * and the app_metadata fields in the FlightEndpoints or resulting
327   * FlightData messages. Since this metadata is application-defined,
328   * a given application could define there to be a relationship,
329   * but there is none required by the spec.
330   */
331  bytes app_metadata = 7;
332}
333
334/*
335 * The information to process a long-running query.
336 */
337message PollInfo {
338  /*
339   * The currently available results.
340   *
341   * If "flight_descriptor" is not specified, the query is complete
342   * and "info" specifies all results. Otherwise, "info" contains
343   * partial query results.
344   *
345   * Note that each PollInfo response contains a complete
346   * FlightInfo (not just the delta between the previous and current
347   * FlightInfo).
348   *
349   * Subsequent PollInfo responses may only append new endpoints to
350   * info.
351   *
352   * Clients can begin fetching results via DoGet(Ticket) with the
353   * ticket in the info before the query is
354   * completed. FlightInfo.ordered is also valid.
355   */
356  FlightInfo info = 1;
357
358  /*
359   * The descriptor the client should use on the next try.
360   * If unset, the query is complete.
361   */
362  FlightDescriptor flight_descriptor = 2;
363
364  /*
365   * Query progress. If known, must be in [0.0, 1.0] but need not be
366   * monotonic or nondecreasing. If unknown, do not set.
367   */
368  optional double progress = 3;
369
370  /*
371   * Expiration time for this request. After this passes, the server
372   * might not accept the retry descriptor anymore (and the query may
373   * be cancelled). This may be updated on a call to PollFlightInfo.
374   */
375  google.protobuf.Timestamp expiration_time = 4;
376}
377
378/*
379 * The request of the CancelFlightInfo action.
380 *
381 * The request should be stored in Action.body.
382 */
383message CancelFlightInfoRequest {
384  FlightInfo info = 1;
385}
386
387/*
388 * The result of a cancel operation.
389 *
390 * This is used by CancelFlightInfoResult.status.
391 */
392enum CancelStatus {
393  // The cancellation status is unknown. Servers should avoid using
394  // this value (send a NOT_FOUND error if the requested query is
395  // not known). Clients can retry the request.
396  CANCEL_STATUS_UNSPECIFIED = 0;
397  // The cancellation request is complete. Subsequent requests with
398  // the same payload may return CANCELLED or a NOT_FOUND error.
399  CANCEL_STATUS_CANCELLED = 1;
400  // The cancellation request is in progress. The client may retry
401  // the cancellation request.
402  CANCEL_STATUS_CANCELLING = 2;
403  // The query is not cancellable. The client should not retry the
404  // cancellation request.
405  CANCEL_STATUS_NOT_CANCELLABLE = 3;
406}
407
408/*
409 * The result of the CancelFlightInfo action.
410 *
411 * The result should be stored in Result.body.
412 */
413message CancelFlightInfoResult {
414  CancelStatus status = 1;
415}
416
417/*
418 * An opaque identifier that the service can use to retrieve a particular
419 * portion of a stream.
420 *
421 * Tickets are meant to be single use. It is an error/application-defined
422 * behavior to reuse a ticket.
423 */
424message Ticket {
425  bytes ticket = 1;
426}
427
428/*
429 * A location where a Flight service will accept retrieval of a particular
430 * stream given a ticket.
431 */
432message Location {
433  string uri = 1;
434}
435
436/*
437 * A particular stream or split associated with a flight.
438 */
439message FlightEndpoint {
440
441  /*
442   * Token used to retrieve this stream.
443   */
444  Ticket ticket = 1;
445
446  /*
447   * A list of URIs where this ticket can be redeemed via DoGet().
448   *
449   * If the list is empty, the expectation is that the ticket can only
450   * be redeemed on the current service where the ticket was
451   * generated.
452   *
453   * If the list is not empty, the expectation is that the ticket can be
454   * redeemed at any of the locations, and that the data returned will be
455   * equivalent. In this case, the ticket may only be redeemed at one of the
456   * given locations, and not (necessarily) on the current service. If one
457   * of the given locations is "arrow-flight-reuse-connection://?", the
458   * client may redeem the ticket on the service where the ticket was
459   * generated (i.e., the same as above), in addition to the other
460   * locations. (This URI was chosen to maximize compatibility, as 'scheme:'
461   * or 'scheme://' are not accepted by Java's java.net.URI.)
462   *
463   * In other words, an application can use multiple locations to
464   * represent redundant and/or load balanced services.
465   */
466  repeated Location location = 2;
467
468  /*
469   * Expiration time of this stream. If present, clients may assume
470   * they can retry DoGet requests. Otherwise, it is
471   * application-defined whether DoGet requests may be retried.
472   */
473  google.protobuf.Timestamp expiration_time = 3;
474
475  /*
476   * Application-defined metadata.
477   *
478   * There is no inherent or required relationship between this
479   * and the app_metadata fields in the FlightInfo or resulting
480   * FlightData messages. Since this metadata is application-defined,
481   * a given application could define there to be a relationship,
482   * but there is none required by the spec.
483   */
484  bytes app_metadata = 4;
485}
486
487/*
488 * The request of the RenewFlightEndpoint action.
489 *
490 * The request should be stored in Action.body.
491 */
492message RenewFlightEndpointRequest {
493  FlightEndpoint endpoint = 1;
494}
495
496/*
497 * A batch of Arrow data as part of a stream of batches.
498 */
499message FlightData {
500
501  /*
502   * The descriptor of the data. This is only relevant when a client is
503   * starting a new DoPut stream.
504   */
505  FlightDescriptor flight_descriptor = 1;
506
507  /*
508   * Header for message data as described in Message.fbs::Message.
509   */
510  bytes data_header = 2;
511
512  /*
513   * Application-defined metadata.
514   */
515  bytes app_metadata = 3;
516
517  /*
518   * The actual batch of Arrow data. Preferably handled with minimal-copies
519   * coming last in the definition to help with sidecar patterns (it is
520   * expected that some implementations will fetch this field off the wire
521   * with specialized code to avoid extra memory copies).
522   */
523  bytes data_body = 1000;
524}
525
526/**
527 * The response message associated with the submission of a DoPut.
528 */
529message PutResult {
530  bytes app_metadata = 1;
531}
532
533/*
534 * EXPERIMENTAL: Union of possible value types for a Session Option to be set to.
535 *
536 * By convention, an attempt to set a valueless SessionOptionValue should
537 * attempt to unset or clear the named option value on the server.
538 */
539message SessionOptionValue {
540  message StringListValue {
541    repeated string values = 1;
542  }
543
544  oneof option_value {
545    string string_value = 1;
546    bool bool_value = 2;
547    sfixed64 int64_value = 3;
548    double double_value = 4;
549    StringListValue string_list_value = 5;
550  }
551}
552
553/*
554 * EXPERIMENTAL: A request to set session options for an existing or new (implicit)
555 * server session.
556 *
557 * Sessions are persisted and referenced via a transport-level state management, typically
558 * RFC 6265 HTTP cookies when using an HTTP transport.  The suggested cookie name or state
559 * context key is 'arrow_flight_session_id', although implementations may freely choose their
560 * own name.
561 *
562 * Session creation (if one does not already exist) is implied by this RPC request, however
563 * server implementations may choose to initiate a session that also contains client-provided
564 * session options at any other time, e.g. on authentication, or when any other call is made
565 * and the server wishes to use a session to persist any state (or lack thereof).
566 */
567message SetSessionOptionsRequest {
568  map<string, SessionOptionValue> session_options = 1;
569}
570
571/*
572 * EXPERIMENTAL: The results (individually) of setting a set of session options.
573 *
574 * Option names should only be present in the response if they were not successfully
575 * set on the server; that is, a response without an Error for a name provided in the
576 * SetSessionOptionsRequest implies that the named option value was set successfully.
577 */
578message SetSessionOptionsResult {
579  enum ErrorValue {
580    // Protobuf deserialization fallback value: The status is unknown or unrecognized.
581    // Servers should avoid using this value. The request may be retried by the client.
582    UNSPECIFIED = 0;
583    // The given session option name is invalid.
584    INVALID_NAME = 1;
585    // The session option value or type is invalid.
586    INVALID_VALUE = 2;
587    // The session option cannot be set.
588    ERROR = 3;
589  }
590
591  message Error {
592    ErrorValue value = 1;
593  }
594
595  map<string, Error> errors = 1;
596}
597
598/*
599 * EXPERIMENTAL: A request to access the session options for the current server session.
600 *
601 * The existing session is referenced via a cookie header or similar (see
602 * SetSessionOptionsRequest above); it is an error to make this request with a missing,
603 * invalid, or expired session cookie header or other implementation-defined session
604 * reference token.
605 */
606message GetSessionOptionsRequest {
607}
608
609/*
610 * EXPERIMENTAL: The result containing the current server session options.
611 */
612message GetSessionOptionsResult {
613    map<string, SessionOptionValue> session_options = 1;
614}
615
616/*
617 * Request message for the "Close Session" action.
618 *
619 * The exiting session is referenced via a cookie header.
620 */
621message CloseSessionRequest {
622}
623
624/*
625 * The result of closing a session.
626 */
627message CloseSessionResult {
628  enum Status {
629    // Protobuf deserialization fallback value: The session close status is unknown or
630    // not recognized. Servers should avoid using this value (send a NOT_FOUND error if
631    // the requested session is not known or expired). Clients can retry the request.
632    UNSPECIFIED = 0;
633    // The session close request is complete. Subsequent requests with
634    // the same session produce a NOT_FOUND error.
635    CLOSED = 1;
636    // The session close request is in progress. The client may retry
637    // the close request.
638    CLOSING = 2;
639    // The session is not closeable. The client should not retry the
640    // close request.
641    NOT_CLOSEABLE = 3;
642  }
643
644  Status status = 1;
645}