Major Features
Intel® Extension for OpenXLA* is an Intel-optimized PyPI package that extends the official OpenXLA framework on Intel GPUs. Built on the PJRT plugin mechanism, it enables seamless execution of JAX models on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series Graphics.
This release contains the following major features:
JAX Upgrade:
- Upgraded JAX to v0.4.38, ensuring compatibility between jax and jaxlib.
- Preliminary support has been added for Intel® Arc™ B-Series Graphics.
- For details on JAX and jaxlib versioning, refer to: How are jax and jaxlib versioned.
| intel-extension-for-openxla | jaxlib | jax |
| --- | --- | --- |
| 0.6.0 | 0.4.38 | 0.4.38 |
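The pairing above can be expressed as a small compatibility check; this is a minimal illustrative sketch (the `versions_match` helper and `COMPAT` table are not part of the package, and the values are copied from the table above):

```python
# Version pairing for this release, taken from the release notes table.
COMPAT = {
    "intel-extension-for-openxla": "0.6.0",
    "jaxlib": "0.4.38",
    "jax": "0.4.38",
}

def versions_match(jax_version: str, jaxlib_version: str) -> bool:
    """jax and jaxlib must be the same version (0.4.38) for this release."""
    return jax_version == jaxlib_version == COMPAT["jax"]
```

In an installed environment, the actual versions could be read with `importlib.metadata.version("jax")` and `importlib.metadata.version("jaxlib")` and compared the same way.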
Toolkit & Driver Support:
- Intel® Deep Learning Essentials 2025.1 support added.
- Upgraded driver: supports LTS release 2350.136.
Library & Compatibility Enhancements:
- oneDNN v3.7 support added.
- Supports Python versions: 3.10, 3.11, 3.12, 3.13.
Known Caveats
- Flan-T5 and Gemma models depend on TensorFlow Text, which does not support Python 3.13.
- Multi-process API support is still experimental and may cause hang issues with collectives.
- The following JAX unit tests (UTs) must be skipped when using Intel Extension for OpenXLA:
  - Mock GPU tests: `mock_gpu_test`, `mock_gpu_topology_test` (SYCL device not supported).
  - Pallas tests: `gpu_ops_test`, `pallas_shape_poly_test`, `pallas_vmap_test` (Pallas calls are not currently supported for the SYCL backend).
  - Multi-process GPU test: `multiprocess_gpu_test` (multi-process environments are currently not supported for the SYCL backend).
  - Profiling test: `pgle_test` (SYCL device not supported in TensorFlow profiling APIs).
  - FFI tests (JAXPR-to-MLIR lowering rule is presently missing for the SYCL backend).
- BCOOTest failure: a UT in the test file `sparse_bcoo_bcsr_test.py` (`test_bcoo_mul_sparse5`) fails with rolling driver version 2507.12 due to a known issue.
Bugs & Performance Fixes
- GPT-2 Causal Language Modeling (CLM) failure caused by a bug in the backward pass of FlashAttention has been fixed.
- Fixed an OpenXLA build failure with the 2025.1.0.426 compiler.
- Gemma-7B real-time inference performance regression (caused by disabling region analysis for copy insertion) is fixed by setting the flag `XLA_FLAGS=--xla_gpu_copy_insertion_use_region_analysis=true`.
- Flan-T5 inference performance regression (~8%) observed with oneAPI Base Toolkit 2025.0.1 has been resolved.
- Accuracy drop in GPT-J model has been fixed.
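Regarding the `XLA_FLAGS` workaround above: XLA reads the `XLA_FLAGS` environment variable when it initializes, so the flag should be set before JAX is imported. A minimal sketch (the JAX import is commented out so the snippet stands alone; the surrounding model code is omitted):

```python
import os

# Set before importing jax: XLA parses XLA_FLAGS at initialization time.
os.environ["XLA_FLAGS"] = "--xla_gpu_copy_insertion_use_region_analysis=true"

# import jax  # JAX/XLA picks up the flag from the environment at import.
```

Alternatively, the variable can be exported in the shell before launching the workload.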
Deprecations
- JAX v0.4.30 is no longer supported.
- Refer to the JAX change log for migration steps.
- If your application requires JAX v0.4.30, downgrade the Intel Extension for OpenXLA version to v0.5.0.
- Intel® Data Center GPU Flex Series is no longer supported.
Documentation
- Introduction to Intel® Extension for OpenXLA*
- Accelerating JAX models on Intel GPUs via PJRT
- How JAX and OpenXLA Enabled an Argonne Workload and Quality Assurance on Aurora Supercomputer
- JAX and OpenXLA - Part 1: Execution Process & Underlying Logic
- JAX and OpenXLA - Part 2: Execution Process & Underlying Logic
- How are jax and jaxlib versioned?