Skip to content

Add some basic docs #895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 9, 2021
Merged

Add some basic docs #895

merged 5 commits into from
Jul 9, 2021

Conversation

rylev
Copy link
Member

@rylev rylev commented Jul 7, 2021

Starting to add some docs to make onboarding into the codebase a bit easier.

@rylev rylev requested a review from Mark-Simulacrum July 7, 2021 16:02
#[derive(Debug, Deserialize)]
pub struct Config {
pub keys: Keys,
}

/// Site context object that contains global data
pub struct InputData {
Copy link
Member Author

@rylev rylev Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to consider renaming this. InputData is descriptive enough to make me think it's something specific but not descriptive enough to know what that might be.

Considering it's sort of a collection of cached items we just want to keep around, something purposefully vague like SiteCtx might be better.

It's also interesting that it's referred to as Db elsewhere in the code (presumably because it has the pool and cached index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see this renamed -- I didn't ever come up with a nice name for it. I think SiteCtx is not a bad idea.

#[derive(Debug, Default, Deserialize)]
pub struct Keys {
/// GitHub API token from the `GITHUB_API_TOKEN` env variable
pub github: Option<String>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields might have better names like github_api_token and github_webhook_secret.

@@ -434,11 +434,11 @@ pub struct LabelId(pub u8, pub u32);
#[derive(Serialize, Deserialize, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct ArtifactIdNumber(pub u32);

/// Cached results of various queries.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct Index {
commits: Indexed<Commit>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to add comments to these fields, but I'm still not exactly sure what they are. For instance, commits seems to be from the artifact table with type master or try. I'm not exactly sure what that should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say more about your confusion? These fields are intended to be essentially duplicates of the various tables (artifacts, error_series, pstat_series, etc.) so that we can avoid a network round-trip in the case of Postgres when fetching the ID used in the "main" table.

To some extent I think the presence of these is mostly an artifact of not wanting to write too much SQL -- it's much easier to have ad-hoc searching in Rust code than threading through SQL queries. In practice IIRC it was also a little faster, but whether that's still true or if I'm remembering right I'm less sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is a bit old as I know understand more about the structure. Perhaps we should do a short write up on the database structure. We could then refer to that here.

@@ -15,24 +15,33 @@ pub use self_profile::{QueryData, SelfProfile};
#[derive(Debug, Copy, Clone, PartialEq, PartialOrd, Deserialize)]
pub struct DeltaTime(#[serde(with = "round_float")] pub f64);

/// The bound of a range changes in codebase
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bound doesn't seem to always be a bound in the true sense. If it's a Commit we're usually looking for exact matches. I'm not sure what Bound is really supposed to be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is supposed to be "exact" in the sense of identifying a particular commit in our storage, at least at the time of access.

Commits hashes are easy, obviously, though the string in Commit(String) can also be the artifact ID (e.g., 1.53.0 or whatever the user provides on the command line when recording data locally).

pub fn left_match(&self, commit: &Commit) -> bool {
let last_month = chrono::Utc::now().date().naive_utc() - chrono::Duration::days(30);
match self {
Bound::Commit(sha) => commit.sha == **sha,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, for example, we don't treat the Bound as a bound but rather an exact match.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the thing that might be missing in the intuition is that for the date bounds, we don't want an exact match because there may not be commits on that day, but you still want to have the query for e.g. the last month or year of data to work, not error out saying "no commits on 2021-01-01" or something like that.

With commits there's not really such a problem; it doesn't make sense to have a partial match there as much.

///
/// Searches in the indexed commits either from the right or left
/// If not found in the indexed commits, searches through the artifacts cache
pub fn data_for(data: &Index, is_left: bool, bound: Bound) -> Option<ArtifactId> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was hard for me to grok at first. I mean it's simple to understand, but every time I came across it, I'd have to reread the entire body because the name didn't help me at all to remember what it does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, happy to make any changes -- I've never had good experiences naming things in rustc-perf.

@@ -257,18 +259,30 @@ pub async fn compare_given_commits(
}))
}

/// Data associated with a specific date
/// Data associated with a specific artifact
#[derive(Debug, Clone, Serialize)]
pub struct DateData {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be renamed ArtifactData as this is for a given artifact which may or may not have a date associated with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. It was named prior to the notion of artifacts -- and "CommitData" was already used for something else I think at that time...

#[derive(Debug, Clone, Serialize)]
pub struct DateData {
/// The artifact in question
pub commit: String,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be artifact: ArtifactId since commit is wrong (it might be an artifact "tag" and not a commit sha) and there's no need for it to be a string at this point.

/// For the given `ArtifactId`, consume the first datapoint in each of the given `SeriesResponse`
///
/// It is assumed that the provided ArtifactId is the same as artifact id returned as the next data
/// point from all of the series `SeriesResponse`s. If this is not true, this function will panic.
async fn consume_one<'a, T>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally understand why this is called consume_one but I don't really like the name....

/// The incremental cache state
///
/// These are usually reported to users in a "flipped" way. For example,
/// `Cache::Empty` means we're doing a "full" build. We present this to users as "full".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW it would be fine by me to rename the enum variants here -- they were named when we were trying different things but it's no big deal to rename if needed.

///
/// * $profile refers to the flavor of compilation like debug, doc, opt(timized), etc.
/// * $cache refers to how much of the compilation must be done and how much is cached
/// (e.g., "incr-unchanged" == compiling with full incremental cache and no code having changed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's plausible we could have non-String data here too. IIRC it's like this because making sure all the JS code is up to date when changing the Rust serialization was a bit of a pain - and Serde can be a bit picky in terms of serializing enums and struct nicely when they're nested inside vecs/hashmaps etc

@Mark-Simulacrum
Copy link
Member

Left comments on most of your comments - largely I think everything looks good in terms of existing docs, don't think I had any concerns.

@rylev
Copy link
Member Author

rylev commented Jul 9, 2021

I'm going to merge this since there's no real concerns. I'll submit a small PR renaming things. I definitely feel much better about the code now. As far as documentation goes, I think the only other major thing I'd like to do is to document the database structure. Other changes can just be made along the way.

@rylev rylev merged commit 2132441 into rust-lang:master Jul 9, 2021
@rylev rylev deleted the add-docs1 branch July 9, 2021 08:14
@rylev rylev mentioned this pull request Jul 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants