Implement SYCL kernels in noncentral_chisquare #1054

Merged

Conversation

LukichevaPolina
Contributor

No description provided.

@samir-nasibli left a comment

Is this an optimization? If so, how is this proven? Is there any measurement data?

kernel_parallel_for_func2);
};
event_out = DPNP_QUEUE.submit(kernel_func2);
event_out.wait();

How many kernels are you submitting here?

@LukichevaPolina
Contributor Author

Is this an optimization? If so, how is this proven? Is there any measurement data?

The purpose of these changes is not optimization, but to implement the execution on the device to prevent copying data to the host.

@samir-nasibli

Is this an optimization? If so, how is this proven? Is there any measurement data?

The purpose of these changes is not optimization, but to implement the execution on the device to prevent copying data to the host.

Is it really copying data to the host? How was that established?

Even if this is not an optimization, and assuming a copy does occur here, do these changes really not degrade the execution time?

@Alexander-Makaryev
Contributor

Is this an optimization? If so, how is this proven? Is there any measurement data?

The purpose of these changes is not optimization, but to implement the execution on the device to prevent copying data to the host.

Is it really copying data to the host? How was that established?

Even if this is not an optimization, and assuming a copy does occur here, do these changes really not degrade the execution time?

DPNPC_ptr_adapter<_DataType> result1_ptr(result, size, true, true); The copy happens here because we explicitly pass true for the third parameter.
We have to make this copy as long as there is 'host' code that works with the pointer.
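
To illustrate the point about the copy, here is a minimal sketch in plain C++ of an adapter that copies its data into a host buffer on construction and copies it back on destruction when asked to. It is only a model of the behaviour described above, not dpnp code: HostCopyAdapter, CopyStats and fill_via_adapter are hypothetical names, the parameter meanings are assumptions, and an ordinary array stands in for device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter, used only to make the copies visible.
struct CopyStats {
    std::size_t elements_copied = 0;
};

// Hypothetical stand-in for a pointer adapter. This is NOT dpnp's
// DPNPC_ptr_adapter; it only models "copy in on construction, copy back on
// destruction" in plain C++.
template <typename T>
class HostCopyAdapter {
public:
    HostCopyAdapter(T* src, std::size_t size, bool copy_to_host,
                    bool copy_back, CopyStats& stats)
        : src_(src), size_(size), copied_(copy_to_host),
          copy_back_(copy_to_host && copy_back), stats_(stats)
    {
        assert(src != nullptr || size == 0);
        if (copied_) {
            // First copy: source -> host buffer.
            host_.assign(src_, src_ + size_);
            stats_.elements_copied += size_;
        }
    }

    ~HostCopyAdapter()
    {
        if (copy_back_) {
            // Second copy: host buffer -> source.
            std::copy(host_.begin(), host_.end(), src_);
            stats_.elements_copied += size_;
        }
    }

    HostCopyAdapter(const HostCopyAdapter&) = delete;
    HostCopyAdapter& operator=(const HostCopyAdapter&) = delete;

    // Pointer the 'host' code works with: the copy if one was made,
    // otherwise the original pointer.
    T* get_ptr() { return copied_ ? host_.data() : src_; }

private:
    T* src_;
    std::size_t size_;
    bool copied_;
    bool copy_back_;
    CopyStats& stats_;
    std::vector<T> host_;
};

// Fills `data` with 1.0 through the adapter and returns the number of
// elements that were copied along the way.
std::size_t fill_via_adapter(std::vector<double>& data, bool copy_to_host)
{
    CopyStats stats;
    {
        HostCopyAdapter<double> adapter(data.data(), data.size(),
                                        copy_to_host, true, stats);
        double* p = adapter.get_ptr();
        for (std::size_t i = 0; i < data.size(); ++i) {
            p[i] = 1.0;
        }
    } // adapter destroyed here; the copy-back, if requested, happens now
    return stats.elements_copied;
}
```

In this model, filling an 8-element array with the copy flag set moves 16 elements (8 in, 8 back), while the same fill without the flag moves none. That round trip is what running the computation entirely on the device is meant to avoid.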

@samir-nasibli

Is this an optimization? If so, how is this proven? Is there any measurement data?

The purpose of these changes is not optimization, but to implement the execution on the device to prevent copying data to the host.

Is it really copying data to the host? How was that established?
Even if this is not an optimization, and assuming a copy does occur here, do these changes really not degrade the execution time?

DPNPC_ptr_adapter<_DataType> result1_ptr(result, size, true, true); The copy happens here because we explicitly pass true for the third parameter. We have to make this copy as long as there is 'host' code that works with the pointer.

Please note that a lot of kernels are added to the queue here.
Is there any data confirming that these changes, even taking the copies into account, do not degrade the execution time?

Contributor

@Alexander-Makaryev left a comment

👍

@Alexander-Makaryev merged commit 0c7de14 into IntelPython:master on Dec 30, 2021