Skip to content

[SYCL][UR][L0] Implement urAdapterGetLastError #10784

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jandres742
Copy link
Contributor

No description provided.

@jandres742 jandres742 requested a review from a team as a code owner August 11, 2023 04:40
Comment on lines +183 to +184
*Error = ErrorMessageCode;
return UR_RESULT_SUCCESS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this how urAdapterGetLastError is defined to always return success?
In a matching PI API we need to return error if SYCL RT needs to abort, or success if this is rather a warning. Is this happening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urAdapterGetLastError doesnt return an error depending on the message. It may return an error if the call to urAdapterGetLastError fails, but not because of the error it is interpreting.

See here https://oneapi-src.github.io/unified-runtime/core/api.html?highlight=uradaptergetlasterror#uradaptergetlasterror

Example usage:

if (::urQueueCreate(hContext, hDevice, nullptr, &hQueue) ==
        ::UR_RESULT_ERROR_ADAPTER_SPECIFIC) {
    const char* pMessage;
    int32_t error;
    ::urAdapterGetLastError(hAdapter, &pMessage, &error);
}

Another call may have returned UR_RESULT_ERROR_ADAPTER_SPECIFIC. Then application (in this case SYCL RT) would call urAdapterGetLastError to get the description of the message for the call that previously failed (*Message) and the error code (*Error).

This is the same implementation we have in other adapters, for instance CU:

UR_APIEXPORT ur_result_t UR_APICALL urAdapterGetLastError(

It is up to the application to parse this info and determining the level of severity.

@callumfare , @kbenzie : is my understanding correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original idea was that adapters that raise the error decide on the severity (not the SYCL RT), see here

/// \return PI_SUCCESS if plugin is indicating non-fatal warning. Any other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smaslov-intel : so something like this?

UR_APIEXPORT ur_result_t UR_APICALL urAdapterGetLastError(
    ur_adapter_handle_t Adapter, ///< [in] handle of the platform instance
    const char **Message, ///< [out] pointer to a C string where the adapter
                          ///< specific error message will be stored.
    int32_t *Error ///< [out] pointer to an integer where the adapter specific
                   ///< error code will be stored.
) {
  *Message = ErrorMessage;
  *Error = ErrorMessageCode;
  if(static_cast<ur_result_t>(ErrorMessageCode) != UR_RESULT_SUCCESS) {
    return UR_RESULT_ERROR_ADAPTER_SPECIFIC;
  }
  return UR_RESULT_SUCCESS;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps just return ErrorMessageCode?
Then I don't see why we need the Error argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Error argument is intended for returning the adapter specific error code, in this case a ze_result_t.

The ur_result_t returned from urAdapterGetLastError should be UR_RESULT_SUCCESS if the error is non-fatal/warning, and UR_RESULT_ERROR_ADAPTER_SPECIFIC if it is an unrecoverable/hard error.

Currently Error isn't hooked up to piPluginGetLastError because it doesn't have an equivelent argument. This should be resolved as part of the work to replace PI with UR in the SYCL RT.

As for what to do with ErrorMessageCode, if depends how its being set when an adapter specific error occurs. If its being set to either UR_RESULT_SUCCESS or UR_RESULT_ERROR_ADAPTER_SPECIFIC it should be returned from urAdapterGetLastError. If it contains a ze_result_t then it should be used to set *Error.

Here's a recent change to the CUDA adapter that's addressing a similar issue https://github.com/intel/llvm/pull/10626/files#diff-1efcd510b7db74f7166a4a06ccd048867c04911e8809ef7c39b9821aa66583e5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discovered a while ago that the implementation of piPluginGetLastError in pi2ur was incorrect, and the cuda adapter was doing the wrong thing - it was fixed in #10626

To match PI the return code is used to signify whether the result is treated as a warning (UR_RESULT_SUCCESS) or an error (any other return code).

The purpose of the Error argument is to return native error codes (e.g. CUresult) from the adapter, but the cuda adapter doesn't implement this yet.

@jandres742
Copy link
Contributor Author

Being addressed in #10953. Closing.

@jandres742 jandres742 closed this Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants