-
Notifications
You must be signed in to change notification settings - Fork 156
Add some basic docs #895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some basic docs #895
Conversation
#[derive(Debug, Deserialize)] | ||
pub struct Config { | ||
pub keys: Keys, | ||
} | ||
|
||
/// Site context object that contains global data | ||
pub struct InputData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to consider renaming this. InputData is descriptive enough to make me think it's something specific but not descriptive enough to know what that might be.
Considering it's sort of a collection of cached items we just want to keep around, something purposefully vague like SiteCtx
might be better.
It's also interesting that it's referred to as Db
elsewhere in the code (presumably because it has the pool
and cached index
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to see this renamed -- I didn't ever come up with a nice name for it. I think SiteCtx is not a bad idea.
#[derive(Debug, Default, Deserialize)] | ||
pub struct Keys { | ||
/// GitHub API token from the `GITHUB_API_TOKEN` env variable | ||
pub github: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fields might have better names like github_api_token
and github_webhook_secret
.
@@ -434,11 +434,11 @@ pub struct LabelId(pub u8, pub u32); | |||
#[derive(Serialize, Deserialize, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)] | |||
pub struct ArtifactIdNumber(pub u32); | |||
|
|||
/// Cached results of various queries. | |||
#[derive(Debug, Clone, PartialEq, Eq, Default)] | |||
pub struct Index { | |||
commits: Indexed<Commit>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to add comments to these fields, but I'm still not exactly sure what they are. For instance, commits seems to be from the artifact
table with type
master or try. I'm not exactly sure what that should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say more about your confusion? These fields are intended to be essentially duplicates of the various tables (artifacts, error_series, pstat_series, etc.) so that we can avoid a network round-trip in the case of Postgres when fetching the ID used in the "main" table.
To some extent I think the presence of these is mostly an artifact of not wanting to write too much SQL -- it's much easier to have ad-hoc searching in Rust code than threading through SQL queries. In practice IIRC it was also a little faster, but whether that's still true or if I'm remembering right I'm less sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is a bit old as I know understand more about the structure. Perhaps we should do a short write up on the database structure. We could then refer to that here.
@@ -15,24 +15,33 @@ pub use self_profile::{QueryData, SelfProfile}; | |||
#[derive(Debug, Copy, Clone, PartialEq, PartialOrd, Deserialize)] | |||
pub struct DeltaTime(#[serde(with = "round_float")] pub f64); | |||
|
|||
/// The bound of a range changes in codebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bound
doesn't seem to always be a bound in the true sense. If it's a Commit
we're usually looking for exact matches. I'm not sure what Bound
is really supposed to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is supposed to be "exact" in the sense of identifying a particular commit in our storage, at least at the time of access.
Commits hashes are easy, obviously, though the string in Commit(String) can also be the artifact ID (e.g., 1.53.0 or whatever the user provides on the command line when recording data locally).
pub fn left_match(&self, commit: &Commit) -> bool { | ||
let last_month = chrono::Utc::now().date().naive_utc() - chrono::Duration::days(30); | ||
match self { | ||
Bound::Commit(sha) => commit.sha == **sha, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, for example, we don't treat the Bound
as a bound but rather an exact match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the thing that might be missing in the intuition is that for the date bounds, we don't want an exact match because there may not be commits on that day, but you still want to have the query for e.g. the last month or year of data to work, not error out saying "no commits on 2021-01-01" or something like that.
With commits there's not really such a problem; it doesn't make sense to have a partial match there as much.
/// | ||
/// Searches in the indexed commits either from the right or left | ||
/// If not found in the indexed commits, searches through the artifacts cache | ||
pub fn data_for(data: &Index, is_left: bool, bound: Bound) -> Option<ArtifactId> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was hard for me to grok at first. I mean it's simple to understand, but every time I came across it, I'd have to reread the entire body because the name didn't help me at all to remember what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As always, happy to make any changes -- I've never had good experiences naming things in rustc-perf.
@@ -257,18 +259,30 @@ pub async fn compare_given_commits( | |||
})) | |||
} | |||
|
|||
/// Data associated with a specific date | |||
/// Data associated with a specific artifact | |||
#[derive(Debug, Clone, Serialize)] | |||
pub struct DateData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be renamed ArtifactData
as this is for a given artifact which may or may not have a date associated with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable. It was named prior to the notion of artifacts -- and "CommitData" was already used for something else I think at that time...
#[derive(Debug, Clone, Serialize)] | ||
pub struct DateData { | ||
/// The artifact in question | ||
pub commit: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be artifact: ArtifactId
since commit
is wrong (it might be an artifact "tag" and not a commit sha) and there's no need for it to be a string at this point.
/// For the given `ArtifactId`, consume the first datapoint in each of the given `SeriesResponse` | ||
/// | ||
/// It is assumed that the provided ArtifactId is the same as artifact id returned as the next data | ||
/// point from all of the series `SeriesResponse`s. If this is not true, this function will panic. | ||
async fn consume_one<'a, T>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally understand why this is called consume_one
but I don't really like the name....
/// The incremental cache state | ||
/// | ||
/// These are usually reported to users in a "flipped" way. For example, | ||
/// `Cache::Empty` means we're doing a "full" build. We present this to users as "full". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW it would be fine by me to rename the enum variants here -- they were named when we were trying different things but it's no big deal to rename if needed.
/// | ||
/// * $profile refers to the flavor of compilation like debug, doc, opt(timized), etc. | ||
/// * $cache refers to how much of the compilation must be done and how much is cached | ||
/// (e.g., "incr-unchanged" == compiling with full incremental cache and no code having changed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's plausible we could have non-String data here too. IIRC it's like this because making sure all the JS code is up to date when changing the Rust serialization was a bit of a pain - and Serde can be a bit picky in terms of serializing enums and struct nicely when they're nested inside vecs/hashmaps etc
Left comments on most of your comments - largely I think everything looks good in terms of existing docs, don't think I had any concerns. |
I'm going to merge this since there's no real concerns. I'll submit a small PR renaming things. I definitely feel much better about the code now. As far as documentation goes, I think the only other major thing I'd like to do is to document the database structure. Other changes can just be made along the way. |
Starting to add some docs to make onboarding into the codebase a bit easier.