-
Notifications
You must be signed in to change notification settings - Fork 51
Ensure that data_p_
is typed with the underlying type of data_
#267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b3c65cc
to
84ff462
Compare
struct get_underlying_type { | ||
using type = T; | ||
}; | ||
} // namespace traits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The traits
namespace may be overkill, I'm not sure.
I have seen this idea used elsewhere though. One of those places is Rcpp:
https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/traits/storage_type.h
LOGICAL_GET_REGION(data_->data_, pos, length_, reinterpret_cast<int*>(buf_.data())); | ||
LOGICAL_GET_REGION(data_->data_, pos, length_, buf_.data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buf_
now has type int*
here already, so no need to cast it
return reinterpret_cast<r_bool*>(LOGICAL(data)); | ||
return LOGICAL(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the motivating example. get_p()
now has a return type of int*
for logicals, which is exactly what LOGICAL()
returns, so no need for the cast
return buf_[pos_ - block_start_]; | ||
return static_cast<T>(buf_[pos_ - block_start_]); | ||
} else { | ||
return data_->data_p_[pos_]; | ||
return static_cast<T>(data_->data_p_[pos_]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need these, the implicit casts to T
will kick in as needed (for T = r_bool
and T = uint8_t
) but it feels like good practice to be explicit?
return is_altrep_ ? static_cast<r_bool>(LOGICAL_ELT(data_, pos)) : data_p_[pos]; | ||
return is_altrep_ ? LOGICAL_ELT(data_, pos) : data_p_[pos]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGICAL_ELT()
and data_p_[]
now return the same type (an int
). So in theory the right thing to do would be to wrap the whole ternary condition in static_cast<r_bool>
, but I'm relying on the implicit cast to r_bool
instead - this seems to be what we do elsewhere.
I am not sure what best practice is here, rely on implicit casts to the return value, or be explicit with static_cast<r_bool>
? I think I'd rather see the explicit cast, so there is no guessing involved. I'll probably do a follow up PR to try this out everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently it is a Core Guideline to avoid implicit conversions where possible http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c164-avoid-implicit-conversion-operators
I resonate strongly with this! However, I will leave this as is for now since we use implicit casts in many cases already.
It would be even more apparent that we should be using an explicit static_cast<r_bool>
if we had made the "r_bool
from int
" constructor explicit
, i.e. explicit r_bool(int value)
in r_bool.hpp
, which would have forced us to use a static cast here. I feel like this would be better practice, maybe we should look into this. (Note that I did this for explicit r_complex(Rcomplex value)
in the complex PR and really like the safety of it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that explicit is better.
Previously, a `reinterpret_cast()` was used before assigning to `data_` so that: - For raws, `RAW()` was reinterpreted from `Rbyte* -> uint8_t*` - For logicals, `LOGICAL()` was reinterpreted from `int* -> r_bool*`
84ff462
to
d6b750b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We can look into the explicit stuff in a follow up PR.
This PR attempts to force
data_p_
to always be a pointer to the "underlying" type ofSEXP data_
.Currently, this is not the case.
data_p_
currently has typeT*
, which means that for various vectors it maps to:int*
double*
r_bool*
(rather thanint*
)uint8_t*
(rather thanRbyte*
)SEXP*
r_string*
(rather thanSEXP*
)The
logicals
andraws
cases make me a little nervous. Whendata_p_
is set for these classes, it is done so like:data_p_ = reinterpret_cast<r_bool*>(LOGICAL(data))
I think we can avoid this by instead ensuring that
data_p_
has typeint*
here, and is just set toLOGICAL(data)
. Then we rely on casts fromr_bool
toint
and vice versa when setting and getting elements.I think it definitely cleaned up
logicals.hpp
andraws.hpp
nicely, as some intermediatestatic_cast()
andreinterpret_cast()
calls were removed.I'm also not entirely sure how this
reinterpret_cast()
is currently working, but I assume it works becauser_bool
just has 1 variable member,int value_
, which is the same size as anint
that we'd get from theint*
? Nevertheless, this new approach feels safer and easier to understand to me.The general idea of this PR is to introduce a compile time "underlying" type that we can deduce from
T
. The default is justT
, which works forint
anddouble
, but this is overridden forr_bool
anduint8_t
.We then use this underlying type for the buffer (
buf_
) and for the pointer type.This PR should have other benefits too. I am working on adding complexes, and I think I will use an
r_complex
intermediate type which will also define a custom underlying type, like:Additionally, for read-only lists and character vectors, we could set the
data_p_
toDATAPTR()
, like we do in vctrs https://github.com/r-lib/vctrs/blob/7260d31a31b87ece16c7bbda3521e91d1c0f242f/src/vctrs-core.h#L97. This could provide much faster access, since we don't have to go throughVECTOR_ELT()
, and should be safe for read-only access. That would require theget_underlying_type<r_string>
I've added here (it currently goes unused).