Releases: aliyun/aliyun-odps-java-sdk

v0.48.7-public

07 Aug 02:54

Changelog

[0.48.7-public] - 2024-08-07

Enhancements

  • TableTunnel Configuration Optimization: Introduced the tags attribute to TableTunnel Configuration, enabling users to attach custom tags to tunnel operations for enhanced logging and management. These tags are recorded in the tenant-level information schema.
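    import java.util.Arrays;
    import com.aliyun.odps.Odps;
    import com.aliyun.odps.tunnel.Configuration;
    import com.aliyun.odps.tunnel.TableTunnel;

    Odps odps; // assumed to be an already initialized Odps client
    Configuration configuration =
        Configuration.builder(odps)
                     .withTags(Arrays.asList("tag1", "tag2")) // custom tags attached to tunnel operations
                     .build();
    TableTunnel tableTunnel = odps.tableTunnel(configuration);
    // Proceed with tunnel operations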
  • Instance Enhancement: Added the waitForTerminatedAndGetResult method to the Instance class. It incorporates the optimizations made to the SQLExecutor interface in versions 0.48.6 and 0.48.7. For usage, refer to com.aliyun.odps.sqa.SQLExecutorImpl.getOfflineResultSet.

Improve

  • SQLExecutor Offline Job Processing Optimization: Results of offline jobs executed by SQLExecutor can now be retrieved as soon as the critical processing stages finish, without waiting for the job to complete fully. This significantly reduces end-to-end latency and improves response speed and resource utilization.

Fixes

  • TunnelRetryHandler NPE Fix: Fixed a potential NullPointerException in the getRetryPolicy method when the error code was null.
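
The kind of failure fixed here can be illustrated with a small JDK-only sketch. It is not the SDK's implementation: the class RetryPolicyLookup, the Policy enum, the default policy, and the error code string are all hypothetical. It only shows why a lookup keyed on a possibly null error code needs a guard.

```java
public class RetryPolicyLookup {
    // Hypothetical policies, for illustration only.
    enum Policy { DEFAULT, NO_RETRY }

    // Guards against a missing error code before switching on it.
    static Policy getRetryPolicy(String errorCode) {
        if (errorCode == null) {
            // Assumption: fall back to a default policy when no code is available.
            return Policy.DEFAULT;
        }
        // Without the guard above, switching on a null String throws NullPointerException.
        switch (errorCode) {
            case "NoPermission": // hypothetical error code
                return Policy.NO_RETRY;
            default:
                return Policy.DEFAULT;
        }
    }

    public static void main(String[] args) {
        System.out.println(getRetryPolicy(null));           // prints DEFAULT
        System.out.println(getRetryPolicy("NoPermission")); // prints NO_RETRY
    }
}
```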

0.48.6-public

17 Jul 12:20

Changelog

[0.48.6-public] - 2024-07-17

Added

  • Serializable Support:
    • Key data types like ArrayRecord, Column, TableSchema, and TypeInfo now support serialization and deserialization, enabling caching and inter-process communication.
  • Predicate Pushdown:
    • Introduced Attribute type predicates to specify column names.
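
Serialization support means instances can be written to bytes and restored later, for example for a cache or to pass them to another process. A minimal round-trip sketch using standard Java serialization follows. ColumnInfo is a hypothetical stand-in for an SDK type such as Column, so the sketch runs without the SDK on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {
    // Hypothetical stand-in for a serializable SDK type.
    static class ColumnInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String type;

        ColumnInfo(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Writes any Serializable value to a byte array.
    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Restores an object from bytes produced by serialize().
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ColumnInfo original = new ColumnInfo("id", "BIGINT");
        ColumnInfo copy = (ColumnInfo) deserialize(serialize(original));
        System.out.println(copy.name + " " + copy.type); // prints "id BIGINT"
    }
}
```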

Changed

  • Tunnel Interface Refactoring:
    • Refactored Tunnel-related interfaces to include seamless retry logic, greatly enhancing stability and robustness.
    • Removed TunnelRetryStrategy and ConfigurationImpl classes, which are now replaced by TunnelRetryHandler and Configuration respectively.

Improve

  • SQLExecutor Optimization:
    • Improved performance when executing offline SQL jobs through the SQLExecutor interface: fetching results now takes one fewer network request per job, which decreases end-to-end latency.

Fixed

  • Decimal Read in Table.read:
    • Fixed an issue where the Table.read interface did not handle trailing zeros of decimal values as expected.

v0.48.5-public

18 Jun 03:41

Changelog

[0.48.5-public] - 2024-06-18

Added

  • Added the getPartitionSpecs method to the Table interface. Compared to the getPartitions method, this method does not require fetching detailed partition information, resulting in faster execution.

Changes

  • Removed the isPrimaryKey method from the Column class. This method was initially added to support users in specifying certain columns as primary keys when creating a table. However, it was found to be misleading in read scenarios, as it does not communicate with the server. Therefore, it is not suitable for determining whether a column is a primary key. Moreover, when using this method for table creation, primary keys should be table-level fields (since primary keys are ordered), and this method neglected the order of primary keys, leading to a flawed design. Hence, it will be removed in version 0.48.6.

    For read scenarios, users should use the Table.getPrimaryKey() method to retrieve primary keys. For table creation, users can now use the withPrimaryKeys method in the TableCreator to specify primary keys during table creation.

Fixes

  • Fixed an issue where RecordConverter threw an exception when formatting a String-type value in a Record whose underlying data was a byte[].

v0.48.4-public

03 Jun 12:27

Changelog

[0.48.4-public] - 2024-06-04

New

  • Writing MaxCompute tables through table-api now supports the JSON and TIMESTAMP_NTZ types.
  • Continued improvements to the odps-sdk-udf functionality.

Change

  • The Table.read() interface now removes trailing zeros from Decimal values by default (without switching to scientific notation).
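
The difference between dropping trailing zeros and falling into scientific notation can be shown with java.math.BigDecimal. This is a JDK-only sketch of the behavior described above, not the SDK's code; the class and method names are made up for the example.

```java
import java.math.BigDecimal;

public class DecimalTrailingZeros {
    // Drops trailing zeros and renders the value without an exponent.
    static String format(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(format(new BigDecimal("1.500")));  // prints 1.5
        System.out.println(format(new BigDecimal("100.00"))); // prints 100
        // toString() on the stripped value would use scientific notation instead:
        System.out.println(new BigDecimal("100.00").stripTrailingZeros()); // prints 1E+2
    }
}
```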

Fix

  • Fixed an issue where ArrayRecord did not support the getBytes method for the JSON type.

v0.48.3-public

21 May 08:03

Changelog

[0.48.3-public] - 2024-05-21

Added

  • Support for passing retryStrategy when building UpsertSession.

Changed

  • The onFlushFail(String, int) method in UpsertStream.Listener has been marked as @Deprecated in favor of onFlushFail(Throwable, int). The deprecated method will be removed in version 0.50.0.
  • Default compression algorithm for Tunnel upsert has been changed to ODPS_LZ4_FRAME.

Fixed

  • Fixed an issue where data couldn't be written correctly in Tunnel upsert when the compression algorithm was set to something other than ZLIB.
  • Fixed a resource leak in UpsertSession that could persist for a long time if close was not explicitly called by the user.
  • Fixed an exception thrown by Tunnel data retrieval interfaces (preview, download) when encountering invalid Decimal types (such as inf, nan) in tables; will now return null to align with the getResult interface.
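
The "return null instead of throwing" behavior for invalid decimals can be sketched with the JDK alone. The parser below is hypothetical and is not how the SDK decodes Tunnel data. It only illustrates that java.math.BigDecimal has no representation for values such as inf or nan, so a lenient reader has to map them to null.

```java
import java.math.BigDecimal;

public class LenientDecimalParser {
    // Returns null for text that is not a finite decimal (for example "inf" or "nan").
    static BigDecimal parseOrNull(String text) {
        if (text == null) {
            return null;
        }
        try {
            return new BigDecimal(text.trim());
        } catch (NumberFormatException e) {
            // BigDecimal cannot represent infinities or NaN.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("12.50")); // prints 12.50
        System.out.println(parseOrNull("inf"));   // prints null
        System.out.println(parseOrNull("nan"));   // prints null
    }
}
```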

v0.48.2-public

08 May 04:05

Changelog

[0.48.2-public] - 2024-05-08

Important fixes

Fixed an issue where Tunnel upsert relied on the user's local time zone when bucketing primary keys of DATE and DATETIME types. This could lead to incorrect bucketing and abnormal query results. Users who rely on this feature are strongly recommended to upgrade to version 0.48.2.
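
Why a time zone dependency breaks bucketing can be seen with java.time. The sketch below is an assumption-laden illustration, not the SDK's bucketing algorithm: the bucket count and the modulo-based bucket function are hypothetical. The point is only that the same wall-clock value maps to different epoch seconds in different zones, so any bucket derived from it differs between clients unless a fixed zone such as UTC is used.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimeZoneBucketing {
    static final long BUCKETS = 7; // hypothetical bucket count

    // Zone-dependent conversion: the result changes with the client's time zone.
    static long epochSecondsInZone(LocalDateTime value, ZoneId zone) {
        return value.atZone(zone).toEpochSecond();
    }

    // Zone-independent conversion: always interprets the value as UTC.
    static long epochSecondsUtc(LocalDateTime value) {
        return value.toEpochSecond(ZoneOffset.UTC);
    }

    // Hypothetical bucket function used only for this illustration.
    static int bucket(long epochSeconds) {
        return (int) Math.floorMod(epochSeconds, BUCKETS);
    }

    public static void main(String[] args) {
        LocalDateTime key = LocalDateTime.of(2024, 1, 1, 0, 0);
        long shanghai = epochSecondsInZone(key, ZoneId.of("Asia/Shanghai"));
        long utc = epochSecondsUtc(key);
        System.out.println("Shanghai: " + shanghai + " -> bucket " + bucket(shanghai));
        System.out.println("UTC:      " + utc + " -> bucket " + bucket(utc));
    }
}
```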

Added

  • Table adds the getTableLifecycleConfig() method to obtain the lifecycle configuration of hierarchical storage.
  • TableReadSession now supports predicate pushdown.

v0.48.1-public

07 May 06:54

Changelog

[0.48.1-public] - 2024-05-07

Added

Arrow and ANTLR Libraries: Added new includes to the Maven Shade Plugin configuration for better handling and packaging of specific libraries. These includes ensure that certain essential libraries are correctly packaged into the final shaded artifact. The newly included libraries are:

  • org.apache.arrow:arrow-format:jar
  • org.apache.arrow:arrow-memory-core:jar
  • org.apache.arrow:arrow-memory-netty:jar
  • org.antlr:ST4:jar
  • org.antlr:antlr-runtime:jar
  • org.antlr:antlr4:jar
  • org.antlr:antlr4-runtime:jar

Relocation Adjustments

Shaded Relocation for ANTLR and StringTemplate: The configuration now includes updated relocation rules for org.antlr and org.stringtemplate.v4 packages to prevent potential conflicts with other versions of these libraries that may exist in the classpath. The new shaded patterns are:

  • org.stringtemplate.v4 relocated to com.aliyun.odps.thirdparty.org.stringtemplate.v4
  • org.antlr relocated to com.aliyun.odps.thirdparty.antlr

0.48.0-public

19 Apr 10:04
a45c4d0

Changelog

[0.48.0-public] - 2024-04-22

Added

  • Introduced odps-sdk-udf module to allow batch data reading in UDFs for MaxCompute, significantly improving performance in high-volume data scenarios.
  • Table now supports retrieving ColumnMaskInfo, aiding in data desensitization scenarios and relevant information acquisition.
  • Support for setting a proxy through the odps.getRestClient().setProxy(Proxy) method.
  • RecordReader is now iterable and provides a RecordReader.stream() method, enabling conversion to a Stream of Record objects.
  • Added new parameters upsertConcurrentNum and upsertNetworkNum in TableAPI RestOptions, giving users finer control over upsert operations performed via the TableAPI.
  • Support for Builder pattern in constructing TableSchema.
  • Support for toString method in ArrayRecord.
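
The iterable reader with a stream() method mentioned in the list above follows a common JDK pattern: implement Iterable and derive the Stream from its spliterator. The sketch below uses a hypothetical RowReader over strings, not the SDK's RecordReader, to show the pattern in isolation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IterableReaderSketch {
    // Hypothetical reader: exposes rows through Iterable and derives stream() from it.
    static class RowReader implements Iterable<String> {
        private final List<String> rows;

        RowReader(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public Iterator<String> iterator() {
            return rows.iterator();
        }

        // Sequential stream backed by the default Iterable.spliterator().
        Stream<String> stream() {
            return StreamSupport.stream(spliterator(), false);
        }
    }

    public static void main(String[] args) {
        RowReader reader = new RowReader(Arrays.asList("a", "b", "c"));
        for (String row : reader) {
            System.out.println(row);
        }
        List<String> upper = reader.stream().map(String::toUpperCase).collect(Collectors.toList());
        System.out.println(upper); // prints [A, B, C]
    }
}
```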

Improved

  • UploadSession now supports configuration of the GET_BLOCK_ID parameter to speed up session creation when the client does not need blockId.
  • Enhanced table creation method using the builder pattern (TableCreator), making table creation simpler.

Fixed

  • Fixed a bug in Upsert Session where the timeout setting was configured incorrectly.
  • Fixed the issue where TimestampWritable computed one second less when nanoseconds were negative.
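
The source does not describe the cause of the TimestampWritable bug, but off-by-one-second errors with negative nanoseconds are a general pitfall of truncating division. The JDK-only sketch below (class and method names are made up) shows how floor-based arithmetic normalizes a seconds/nanoseconds pair, and that java.time.Instant applies the same rule.

```java
import java.time.Instant;

public class NegativeNanos {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Normalizes (seconds, nanos) so that the nanos part ends up in [0, 999_999_999].
    static long[] normalize(long seconds, long nanos) {
        long carry = Math.floorDiv(nanos, NANOS_PER_SECOND); // -1 for nanos = -1
        long rest = Math.floorMod(nanos, NANOS_PER_SECOND);  // 999_999_999 for nanos = -1
        return new long[] { seconds + carry, rest };
    }

    public static void main(String[] args) {
        long[] normalized = normalize(10L, -1L);
        System.out.println(normalized[0] + " s, " + normalized[1] + " ns"); // prints "9 s, 999999999 ns"

        // Instant.ofEpochSecond applies the same adjustment.
        Instant instant = Instant.ofEpochSecond(10L, -1L);
        System.out.println(instant.getEpochSecond() + " s, " + instant.getNano() + " ns"); // prints "9 s, 999999999 ns"
    }
}
```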

Changed

  • When a user uses StsAccount without passing an StsToken, the account is now treated as an AliyunAccount.

v0.47.0-public

08 Apr 11:16
0faa2d5

Changelog

[0.47.0-public] - 2024-04-08

Added

  • Support for new Stream type that enables incremental queries.
  • Added a preview method to TableTunnel for previewing data.
  • OdpsRecordConverter for parsing and formatting records.
  • Enhancements to the Projects class with create and delete methods now available, and update method made public. Operations related to the group-api package are now marked as deprecated.
  • Improved Schemas class to support filtering schemas with SchemaFilter, listing schemas, and retrieving detailed schema metadata.
  • DownloadSession introduces new parameter disableModifiedCheck to bypass modification checks and fetchBlockId to skip block ID list retrieval.
  • TableWriteSession supports writing TIMESTAMP_NTZ / JSON types and adds a new parameter MaxFieldSize.
  • TABLE_API adds predicate related classes to support predicate pushdown in the future.

Changed

  • The read method in the Table class is now implemented with TableTunnel.preview. It supports the new MaxCompute types, and time types have switched to Java 8 time types without a time zone.
  • The default MapWritable implementation switched from HashMap to LinkedHashMap to ensure order.
  • Column class now supports creation using the Builder pattern.
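
The effect of the MapWritable change from HashMap to LinkedHashMap can be seen with the two JDK classes directly: LinkedHashMap iterates in insertion order, while HashMap makes no ordering guarantee. A small sketch (the class and helper names are made up for the example):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderedMapDemo {
    // Collects the keys in the order the map iterates over them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> ordered = new LinkedHashMap<>();
        ordered.put("zebra", 1);
        ordered.put("apple", 2);
        ordered.put("mango", 3);
        // LinkedHashMap preserves insertion order.
        System.out.println(keysInIterationOrder(ordered)); // prints [zebra, apple, mango]
    }
}
```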

Improved

  • TableReadSession now introduces new parameters maxBatchRawSize and splitMaxFileNum.
  • UpsertSession enhancements:
    • Supports writing partial columns.
    • Allows setting the number of Netty thread pools with the default changed to 1.
    • Enables setting maximum concurrency with the default value changed to 16.
  • TableTunnel now supports setting quotaName option.

Important Correction Notice for Version 0.47.0

19 Apr 09:13
8c56cd1

We have identified an issue with the recent release of version 0.47.0: code intended for the upcoming version 0.48.0 was inadvertently included. We understand that this may have caused confusion, and we sincerely apologize for any inconvenience.

Impact Assessment

After assessing all the code, we can confirm that all changes are backward-compatible. Therefore, there is no need to worry about the current version negatively impacting usage. You can rest assured that it is safe to use.

Recommended Action

Considering that the released version 0.47.0 actually contains the content of version 0.48.0, we advise you to review the complete release notes for version 0.48.0 to prepare for the upcoming features and improvements. If you have any questions or require assistance during usage, please do not hesitate to contact us.

Acknowledgments and Appreciation

We deeply apologize for any inconvenience this matter may have caused and thank you for your understanding and support. The MaxCompute team is committed to earning your trust and we will continue working to ensure that such issues are avoided in future releases.

Should you encounter any issues or require further assistance, please visit our support page at help.aliyun.com, or raise an issue in our GitHub repository.

Thank you for your continued support.

Best Regards,
The MaxCompute Team
