
Intel® Extension for OpenXLA* 0.6.0

@vsanghavi vsanghavi released this 08 Apr 02:40
85c05dd

Major Features

Intel® Extension for OpenXLA* is an Intel-optimized PyPI package that extends the official OpenXLA framework to Intel GPUs. Built on the PJRT plugin mechanism, it enables seamless execution of JAX models on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series Graphics.

This release contains the following major features:

JAX Upgrade:

Toolkit & Driver Support:

Library & Compatibility Enhancements:

  • oneDNN v3.7 support added.
  • Supports Python versions: 3.10, 3.11, 3.12, 3.13.
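
Given the supported Python versions above, a minimal install-and-verify sketch (the PyPI package name matches the release title; pair it with the matching jax/jaxlib release per the install docs):

```shell
# Install the plugin into a supported Python (3.10-3.13) environment.
pip install intel-extension-for-openxla

# Confirm JAX enumerates Intel GPU devices through the PJRT plugin.
python -c "import jax; print(jax.devices())"
```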

Known Caveats

  • Flan-T5 and Gemma models depend on TensorFlow Text, which does not support Python 3.13.
  • Multi-process API support is still experimental and may cause hang issues with collectives.
  • The following JAX unit tests (UTs) must be skipped when using Intel Extension for OpenXLA:
    • Mock GPU Tests: mock_gpu_test & mock_gpu_topology_test (SYCL devices are not supported)
    • Pallas Tests: gpu_ops_test, pallas_shape_poly_test, pallas_vmap_test (Pallas calls are not currently supported on the SYCL backend)
    • Multi-process GPU Test: multiprocess_gpu_test (multi-process environments are not currently supported on the SYCL backend)
    • Profiling Tests: pgle_test (SYCL devices are not supported by the TensorFlow profiling APIs)
    • FFI Tests (the JAXPR-to-MLIR lowering rule is presently missing for the SYCL backend)
    • BCOO test failure: the test_bcoo_mul_sparse5 UT in sparse_bcoo_bcsr_test.py fails with rolling driver version 2507.12 due to a known issue.
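
A sketch of skipping the unsupported test files above when running the JAX unit-test suite with pytest (the tests/ paths assume a JAX source checkout and are illustrative; adjust them to your layout):

```shell
# Test files from the caveats above that should be skipped on the SYCL backend.
SKIP=(mock_gpu_test.py mock_gpu_topology_test.py gpu_ops_test.py \
      pallas_shape_poly_test.py pallas_vmap_test.py \
      multiprocess_gpu_test.py pgle_test.py)

# Expand each file name into a pytest --ignore option.
IGNORES=("${SKIP[@]/#/--ignore=tests/}")

# Run the suite only when pytest is available (paths are illustrative).
command -v pytest >/dev/null && pytest "${IGNORES[@]}" tests/ || true
```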

Bugs & Performance Fixes

  • A GPT-2 Causal Language Modeling (CLM) failure caused by a bug in the backward pass of FlashAttention has been fixed.
  • Fixed an OpenXLA build failure with the 2025.1.0.426 compiler.
  • A Gemma-7B real-time inference performance regression (caused by disabling region analysis for copy insertion) is fixed by setting the flag XLA_FLAGS=--xla_gpu_copy_insertion_use_region_analysis=true.
  • Flan-T5 inference performance regression (~8%) observed with oneAPI Base Toolkit 2025.0.1 has been resolved.
  • Accuracy drop in GPT-J model has been fixed.
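
The Gemma-7B workaround flag above can be applied as follows (the entry-point script is hypothetical; any JAX workload launched from the same shell inherits the flag):

```shell
# Re-enable region analysis for copy insertion before launching the workload.
export XLA_FLAGS="--xla_gpu_copy_insertion_use_region_analysis=true"

# Hypothetical entry point; substitute your own JAX inference script.
# python run_gemma_inference.py
```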

Deprecations

  • JAX v0.4.30 is no longer supported.
    • Refer to the JAX change log for migration steps.
    • If your application requires JAX v0.4.30, downgrade the Intel Extension for OpenXLA version to v0.5.0.
  • Intel® Data Center GPU Flex Series is no longer supported.
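
For applications that must stay on JAX v0.4.30, a pinned-downgrade sketch per the note above (the jaxlib pin is an assumption; check the v0.5.0 install notes for the exact pairing):

```shell
# Pin the previous plugin release alongside JAX v0.4.30.
# The jaxlib version is an assumed pairing; verify against the v0.5.0 docs.
pip install "intel-extension-for-openxla==0.5.0" "jax==0.4.30" "jaxlib==0.4.30"
```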

Documentation