Releases: aliyun/aliyun-odps-java-sdk
v0.48.7-public
Changelog
[0.48.7-public] - 2024-08-07
Enhancements
- TableTunnel Configuration Optimization: Introduced the `tags` attribute to the TableTunnel `Configuration`, enabling users to attach custom tags to tunnel operations for enhanced logging and management. These tags are recorded in the tenant-level information schema.
```java
Odps odps = ...; // an already-initialized Odps client
Configuration configuration = Configuration.builder(odps)
        .withTags(Arrays.asList("tag1", "tag2")) // tags recorded in the information schema
        .build();
TableTunnel tableTunnel = odps.tableTunnel(configuration);
// Proceed with tunnel operations
```
- Instance Enhancement: Added the `waitForTerminatedAndGetResult` method to the `Instance` class, integrating the optimization strategies introduced for the `SQLExecutor` interface in versions 0.48.6 and 0.48.7 to enhance operational efficiency. Refer to `com.aliyun.odps.sqa.SQLExecutorImpl.getOfflineResultSet` for usage.
Improved
- SQLExecutor Offline Job Processing Optimization: Significantly reduced end-to-end latency: offline jobs executed by `SQLExecutor` can now return results as soon as the critical processing stages finish, without waiting for the job to fully complete, improving response speed and resource utilization.
Fixed
- TunnelRetryHandler NPE Fix: Fixed a potential NullPointerException in the `getRetryPolicy` method when the error code was `null`.
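The fix amounts to a null guard before the error code is consulted. A minimal sketch of the pattern, assuming a policy lookup keyed by error code (the class body, `RetryPolicy` enum, and the `SLOW_DOWN` code below are illustrative stand-ins, not the SDK's actual implementation):

```java
import java.util.Map;

public class TunnelRetryHandlerSketch {
    public enum RetryPolicy { NO_RETRY, BACKOFF_RETRY }

    // Hypothetical policy table; immutable maps like this throw NPE on
    // get(null), which is exactly the class of bug the guard prevents.
    private static final Map<String, RetryPolicy> POLICIES =
            Map.of("SLOW_DOWN", RetryPolicy.BACKOFF_RETRY);

    public RetryPolicy getRetryPolicy(String errorCode) {
        // Guard first: a null error code previously triggered the NPE.
        if (errorCode == null) {
            return RetryPolicy.NO_RETRY;
        }
        return POLICIES.getOrDefault(errorCode, RetryPolicy.NO_RETRY);
    }
}
```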
0.48.6-public
Changelog
[0.48.6-public] - 2024-07-17
Added
- Serializable Support: Key data types such as `ArrayRecord`, `Column`, `TableSchema`, and `TypeInfo` now support serialization and deserialization, enabling caching and inter-process communication.
- Predicate Pushdown: Introduced `Attribute`-type predicates to specify column names.
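Serializable support means these types can be round-tripped through standard Java serialization, e.g. for a cache or an IPC channel. A generic helper sketch (the `roundTrip` name is ours; with 0.48.6+ on the classpath, the same call would apply to an `ArrayRecord` or `TableSchema`):

```java
import java.io.*;

public class SerializationSketch {
    // Round-trip any Serializable value through a byte array, as one
    // might do to cache a TableSchema or ship a record between processes.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```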
Changed
- Tunnel Interface Refactoring: Refactored Tunnel-related interfaces to include seamless retry logic, greatly enhancing stability and robustness. Removed the `TunnelRetryStrategy` and `ConfigurationImpl` classes, which are replaced by `TunnelRetryHandler` and `Configuration`, respectively.
Improved
- SQLExecutor Optimization: Improved performance when executing offline SQL jobs through the `SQLExecutor` interface, eliminating one network request per job when fetching results and thereby decreasing end-to-end latency.
Fixed
- Decimal Read in Table.read: Fixed an issue in the `Table.read` interface where trailing zeros of the `decimal` type were not handled as expected.
v0.48.5-public
Changelog
[0.48.5-public] - 2024-06-18
Added
- Added the `getPartitionSpecs` method to the `Table` interface. Unlike the `getPartitions` method, it does not fetch detailed partition information, resulting in faster execution.
Changed
- Removed the `isPrimaryKey` method from the `Column` class. This method was originally added so that users could designate certain columns as primary keys when creating a table. However, it proved misleading in read scenarios: it does not communicate with the server, so it cannot be used to determine whether a column is actually a primary key. Moreover, when used for table creation, primary keys should be table-level fields (since primary keys are ordered), and the method ignored their order, a design flaw. Hence, it will be removed in version 0.48.6. For read scenarios, use `Table.getPrimaryKey()` to retrieve primary keys; for table creation, use the new `withPrimaryKeys` method on `TableCreator` to specify primary keys.
Fixed
- Fixed an issue in `RecordConverter` where formatting a `Record` field of type `String` would throw an exception when the underlying data was a `byte[]`.
v0.48.4-public
Changelog
[0.48.4-public] - 2024-06-04
Added
- Writing MaxCompute tables via `table-api` now supports the `JSON` and `TIMESTAMP_NTZ` types.
- Continued improvements to `odps-sdk-udf` functionality.
Changed
- When the `Table.read()` interface encounters the `DECIMAL` type, it now strips trailing zeros by default (but does not use scientific notation).
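This default matches what `BigDecimal` produces with `stripTrailingZeros()` followed by `toPlainString()` (which avoids falling back to scientific notation). A small illustration of the described behavior, not the SDK's actual code path:

```java
import java.math.BigDecimal;

public class DecimalTrim {
    // Strip trailing zeros without switching to scientific notation,
    // mirroring the behavior described for Table.read().
    public static String trim(String decimalLiteral) {
        return new BigDecimal(decimalLiteral).stripTrailingZeros().toPlainString();
    }
}
```

Note that `stripTrailingZeros()` alone can yield `1E+2` for `100`; `toPlainString()` is what keeps the plain form.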
Fixed
- Fixed an issue where `ArrayRecord` did not support the `getBytes` method for the `JSON` type.
v0.48.3-public
Changelog
[0.48.3-public] - 2024-05-21
Added
- Support for passing a `retryStrategy` when building an `UpsertSession`.
Changed
- The `onFlushFail(String, int)` interface in `UpsertStream.Listener` has been marked `@Deprecated` in favor of the `onFlushFail(Throwable, int)` interface. The deprecated interface will be removed in version 0.50.0.
- The default compression algorithm for Tunnel upsert has been changed to `ODPS_LZ4_FRAME`.
Fixed
- Fixed an issue in Tunnel upsert where data could not be written correctly when the compression algorithm was set to anything other than `ZLIB`.
- Fixed a resource leak in `UpsertSession` that could persist for a long time if `close` was not explicitly called.
- Fixed an exception thrown by the Tunnel data-retrieval interfaces (`preview`, `download`) when a table contained invalid `Decimal` values (such as `inf` or `nan`); they now return `null`, consistent with the `getResult` interface.
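Java's `BigDecimal` simply cannot represent `inf` or `nan`, so mapping such cells to `null` is the natural lossless choice. A sketch of the guard (a hypothetical helper, not the SDK's internal code):

```java
import java.math.BigDecimal;

public class DecimalGuard {
    // Parse a decimal cell, returning null for values BigDecimal cannot
    // represent (e.g. "inf", "nan"), as the fixed interfaces now do.
    public static BigDecimal parseOrNull(String cell) {
        try {
            return new BigDecimal(cell);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```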
v0.48.2-public
Changelog
[0.48.2-public] - 2024-05-08
Important fixes
- Fixed an issue in Tunnel upsert where bucketing primary keys of the `DATE` and `DATETIME` types depended on the user's local time zone. This could lead to incorrect bucketing and abnormal query results. Users who rely on this feature are strongly advised to upgrade to version 0.48.2.
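The underlying pitfall: converting a `DATE` to a bucket key via legacy `java.util.Date` APIs pulls in the client's default time zone, so the same date can hash to different buckets on different machines, whereas `LocalDate.toEpochDay()` is zone-independent. A sketch of a zone-safe bucket key (the modulo hash is illustrative, not the SDK's actual bucketing function):

```java
import java.time.LocalDate;

public class DateBucketing {
    // Zone-independent bucket assignment for a DATE primary key:
    // the epoch day is the same regardless of the client's time zone.
    public static int bucketFor(LocalDate date, int numBuckets) {
        long epochDay = date.toEpochDay();
        return (int) Math.floorMod(epochDay, (long) numBuckets);
    }
}
```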
Added
- `Table` adds a `getTableLifecycleConfig()` method to obtain the lifecycle configuration of tiered storage.
- `TableReadSession` now supports predicate pushdown.
v0.48.1-public
Changelog
[0.48.1-public] - 2024-05-07
Added
Arrow and ANTLR Libraries: Added new includes to the Maven Shade Plugin configuration for better handling and packaging of specific libraries. These includes ensure that certain essential libraries are correctly packaged into the final shaded artifact. The newly included libraries are:
- org.apache.arrow:arrow-format:jar
- org.apache.arrow:arrow-memory-core:jar
- org.apache.arrow:arrow-memory-netty:jar
- org.antlr:ST4:jar
- org.antlr:antlr-runtime:jar
- org.antlr:antlr4:jar
- org.antlr:antlr4-runtime:jar
Relocation Adjustments
Shaded Relocation for ANTLR and StringTemplate: The configuration now includes updated relocation rules for the org.antlr and org.stringtemplate.v4 packages to prevent conflicts with other versions of these libraries on the classpath. The new shaded patterns are:
- org.stringtemplate.v4 relocated to com.aliyun.odps.thirdparty.org.stringtemplate.v4
- org.antlr relocated to com.aliyun.odps.thirdparty.antlr
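In maven-shade-plugin terms, the two rules above correspond to relocation entries of the following shape (a config sketch; the surrounding `<plugin>`/`<configuration>` elements of the SDK's actual POM are assumed):

```xml
<relocations>
  <relocation>
    <pattern>org.stringtemplate.v4</pattern>
    <shadedPattern>com.aliyun.odps.thirdparty.org.stringtemplate.v4</shadedPattern>
  </relocation>
  <relocation>
    <pattern>org.antlr</pattern>
    <shadedPattern>com.aliyun.odps.thirdparty.antlr</shadedPattern>
  </relocation>
</relocations>
```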
0.48.0-public
Changelog
[0.48.0-public] - 2024-04-22
Added
- Introduced the `odps-sdk-udf` module to allow batch data reading in UDFs for MaxCompute, significantly improving performance in high-volume data scenarios.
- `Table` now supports retrieving `ColumnMaskInfo`, aiding data-masking scenarios and the acquisition of related information.
- Support for setting proxies via the `odps.getRestClient().setProxy(Proxy)` method.
- Implemented an iterable `RecordReader` and a `RecordReader.stream()` method, enabling conversion to a Stream of `Record` objects.
- Added the `upsertConcurrentNum` and `upsertNetworkNum` parameters to `TableAPI RestOptions`, giving users finer-grained control over upsert operations performed via the TableAPI.
- Support for the `Builder` pattern when constructing a `TableSchema`.
- Support for the `toString` method in `ArrayRecord`.
Changed
- When a user uses `StsAccount` without providing an `StsToken`, it is now treated as an `AliyunAccount`.
Improved
- `UploadSession` now supports configuring the `GET_BLOCK_ID` parameter to speed up session creation when the client does not need `blockId`.
- Enhanced the builder-pattern table-creation method (`TableCreator`), making table creation simpler.
Fixed
- Fixed a bug in `UpsertSession` where the connection timeout was configured incorrectly.
- Fixed an issue where `TimestampWritable` computed one second less when the nanosecond part was negative.
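This is a classic normalization bug: when a timestamp is stored as seconds plus a nanosecond adjustment, a negative adjustment must borrow from the seconds field rather than be truncated. `java.time.Instant.ofEpochSecond(long, long)` performs exactly this floor-based normalization; a small illustration of the correct behavior (not the `TimestampWritable` code itself):

```java
import java.time.Instant;

public class NanosNormalization {
    // Normalize a (seconds, nanoAdjustment) pair the way Instant does:
    // a negative adjustment borrows from the seconds field instead of
    // silently losing a second.
    public static Instant normalize(long seconds, long nanoAdjustment) {
        return Instant.ofEpochSecond(seconds, nanoAdjustment);
    }
}
```

For example, `(10 s, -1 ns)` normalizes to `9 s` and `999999999 ns`, not `10 s`.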
v0.47.0-public
Changelog
[0.47.0-public] - 2024-04-08
Added
- Support for the new Stream type, enabling incremental queries.
- Added a `preview` method to `TableTunnel` for data-preview purposes.
- Introduced `OdpsRecordConverter` for parsing and formatting records.
- Enhanced the `Projects` class: `create` and `delete` methods are now available, and the `update` method is public. Operations related to the `group-api` package are now marked as deprecated.
- Improved the `Schemas` class to support filtering schemas with `SchemaFilter`, listing schemas, and retrieving detailed schema metadata.
- `DownloadSession` introduces the new parameters `disableModifiedCheck`, to bypass modification checks, and `fetchBlockId`, to skip block ID list retrieval.
- `TableWriteSession` supports writing the `TIMESTAMP_NTZ`/`JSON` types and adds a new parameter, `MaxFieldSize`.
- `TABLE_API` adds `predicate`-related classes to support predicate pushdown in the future.
Changed
- The implementation of the `read` method in the `Table` class has been replaced with `TableTunnel.preview`, supporting new MaxCompute types; time types switch to Java 8 time types without a time zone.
- The default `MapWritable` implementation switched from `HashMap` to `LinkedHashMap` to ensure ordering.
- The `Column` class now supports creation via the Builder pattern.
Improved
- `TableReadSession` introduces the new parameters `maxBatchRawSize` and `splitMaxFileNum`.
- `UpsertSession` enhancements:
  - Supports writing partial columns.
  - Allows setting the number of Netty thread pools (default changed to 1).
  - Allows setting the maximum concurrency (default changed to 16).
- `TableTunnel` now supports setting the `quotaName` option.
Important Correction Notice for Version 0.47.0
We have identified an issue with the recent release of version 0.47.0, where code intended for the upcoming release of version 0.48.0 was inadvertently included. We understand that this may have caused some confusion, and we sincerely apologize for any inconvenience this may have caused.
Impact Assessment
After assessing all the code, we can confirm that all changes are backward-compatible. Therefore, there is no need to worry about the current version negatively impacting usage. You can rest assured that it is safe to use.
Recommended Action
Considering that the released version 0.47.0 actually contains the content of version 0.48.0, we advise you to review the complete release notes for version 0.48.0 to prepare for the upcoming features and improvements. If you have any questions or require assistance during usage, please do not hesitate to contact us.
Acknowledgments and Appreciation
We deeply apologize for any inconvenience this matter may have caused and thank you for your understanding and support. The MaxCompute team is committed to earning your trust and we will continue working to ensure that such issues are avoided in future releases.
Should you encounter any issues or require further assistance, please visit our support page at help.aliyun.com, or raise an issue in our GitHub repository.
Thank you for your continued support.
Best Regards,
The MaxCompute Team