
Intel® Extension for OpenXLA* 0.6.0

@vsanghavi vsanghavi released this 08 Apr 02:40
85c05dd

Major Features

Intel® Extension for OpenXLA* is an Intel-optimized PyPI package that extends the official OpenXLA framework to Intel GPUs. Built on the PJRT plugin mechanism, it enables seamless execution of JAX models on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series Graphics.

This release contains the following major features:

JAX Upgrade:

Toolkit & Driver Support:

Library & Compatibility Enhancements:

  • oneDNN v3.7 support added.
  • Supports Python versions: 3.10, 3.11, 3.12, 3.13.
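
Given the supported Python versions above, a minimal install-and-verify sketch (the PyPI package name matches the release title; pair it with the matching jax/jaxlib release per the install docs):

```shell
# Install the plugin into a supported Python (3.10-3.13) environment.
pip install intel-extension-for-openxla

# Confirm JAX enumerates Intel GPU devices through the PJRT plugin.
python -c "import jax; print(jax.devices())"
```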

Known Caveats

  • Flan-T5 and Gemma models depend on TensorFlow Text, which does not support Python 3.13.
  • Multi-process API support is still experimental and may cause hang issues with collectives.
  • The following JAX unit tests (UTs) must be skipped when using Intel Extension for OpenXLA:
    • Mock GPU Tests: mock_gpu_test & mock_gpu_topology_test (SYCL devices are not supported)
    • Pallas Tests: gpu_ops_test, pallas_shape_poly_test, pallas_vmap_test (Pallas calls are not currently supported on the SYCL backend)
    • Multi-process GPU Test: multiprocess_gpu_test (multi-process environments are not currently supported on the SYCL backend)
    • Profiling Tests: pgle_test (SYCL devices are not supported by the TensorFlow profiling APIs)
    • FFI Tests (the JAXPR-to-MLIR lowering rule is presently missing for the SYCL backend)
    • BCOO test failure: the test_bcoo_mul_sparse5 UT in sparse_bcoo_bcsr_test.py fails with rolling driver version 2507.12 due to a known issue.
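
A sketch of skipping the unsupported test files above when running the JAX unit-test suite with pytest (the tests/ paths assume a JAX source checkout and are illustrative; adjust them to your layout):

```shell
# Test files from the caveats above that should be skipped on the SYCL backend.
SKIP=(mock_gpu_test.py mock_gpu_topology_test.py gpu_ops_test.py \
      pallas_shape_poly_test.py pallas_vmap_test.py \
      multiprocess_gpu_test.py pgle_test.py)

# Expand each file name into a pytest --ignore option.
IGNORES=("${SKIP[@]/#/--ignore=tests/}")

# Run the suite only when pytest is available (paths are illustrative).
command -v pytest >/dev/null && pytest "${IGNORES[@]}" tests/ || true
```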

Bugs & Performance Fixes

  • A GPT-2 Causal Language Modeling (CLM) failure caused by a bug in the backward pass of FlashAttention has been fixed.
  • Fixed an OpenXLA build failure with the 2025.1.0.426 compiler.
  • A Gemma-7B real-time inference performance regression (caused by disabling region analysis for copy insertion) is fixed by setting the flag XLA_FLAGS=--xla_gpu_copy_insertion_use_region_analysis=true.
  • Flan-T5 inference performance regression (~8%) observed with oneAPI Base Toolkit 2025.0.1 has been resolved.
  • Accuracy drop in GPT-J model has been fixed.
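
The Gemma-7B workaround flag above can be applied as follows (the entry-point script is hypothetical; any JAX workload launched from the same shell inherits the flag):

```shell
# Re-enable region analysis for copy insertion before launching the workload.
export XLA_FLAGS="--xla_gpu_copy_insertion_use_region_analysis=true"

# Hypothetical entry point; substitute your own JAX inference script.
# python run_gemma_inference.py
```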

Deprecations

  • JAX v0.4.30 is no longer supported.
    • Refer to the JAX change log for migration steps.
    • If your application requires JAX v0.4.30, downgrade the Intel Extension for OpenXLA version to v0.5.0.
  • Intel® Data Center GPU Flex Series is no longer supported.
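
For applications that must stay on JAX v0.4.30, a pinned-downgrade sketch per the note above (the jaxlib pin is an assumption; check the v0.5.0 install notes for the exact pairing):

```shell
# Pin the previous plugin release alongside JAX v0.4.30.
# The jaxlib version is an assumed pairing; verify against the v0.5.0 docs.
pip install "intel-extension-for-openxla==0.5.0" "jax==0.4.30" "jaxlib==0.4.30"
```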

Documentation