Skip to content

Add some basic docs #895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 9, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 17 additions & 7 deletions collector/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,24 +15,33 @@ pub use self_profile::{QueryData, SelfProfile};
#[derive(Debug, Copy, Clone, PartialEq, PartialOrd, Deserialize)]
pub struct DeltaTime(#[serde(with = "round_float")] pub f64);

/// The bound of a range changes in codebase
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bound doesn't seem to always be a bound in the true sense. If it's a Commit we're usually looking for exact matches. I'm not sure what Bound is really supposed to be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is supposed to be "exact" in the sense of identifying a particular commit in our storage, at least at the time of access.

Commits hashes are easy, obviously, though the string in Commit(String) can also be the artifact ID (e.g., 1.53.0 or whatever the user provides on the command line when recording data locally).

///
/// This can either be the upper or lower bound
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Bound {
// sha, unverified
/// An unverified git commit (in sha form)
Commit(String),
/// A date in time
Date(NaiveDate),
/// No bound
None,
}

impl Bound {
/// Tests whether self bounds the commit to the left
pub fn left_match(&self, commit: &Commit) -> bool {
let last_month = chrono::Utc::now().date().naive_utc() - chrono::Duration::days(30);
match self {
Bound::Commit(sha) => commit.sha == **sha,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, for example, we don't treat the Bound as a bound but rather an exact match.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the thing that might be missing in the intuition is that for the date bounds, we don't want an exact match because there may not be commits on that day, but you still want to have the query for e.g. the last month or year of data to work, not error out saying "no commits on 2021-01-01" or something like that.

With commits there's not really such a problem; it doesn't make sense to have a partial match there as much.

Bound::Date(date) => commit.date.0.naive_utc().date() >= *date,
Bound::None => last_month <= commit.date.0.naive_utc().date(),
Bound::None => {
let last_month = chrono::Utc::now().date().naive_utc() - chrono::Duration::days(30);
last_month <= commit.date.0.naive_utc().date()
}
}
}

/// Tests whether self bounds the commit to the right
pub fn right_match(&self, commit: &Commit) -> bool {
match self {
Bound::Commit(sha) => commit.sha == **sha,
Expand Down Expand Up @@ -148,7 +157,7 @@ pub fn run_command(cmd: &mut Command) -> anyhow::Result<()> {
pub fn robocopy(
from: &std::path::Path,
to: &std::path::Path,
extra_args: &[&dyn AsRef<std::ffi::OsStr>]
extra_args: &[&dyn AsRef<std::ffi::OsStr>],
) -> anyhow::Result<()> {
let mut cmd = Command::new("robocopy");
cmd.arg(from).arg(to).arg("/s").arg("/e");
Expand Down Expand Up @@ -219,7 +228,7 @@ pub fn command_output(cmd: &mut Command) -> anyhow::Result<process::Output> {
output.status,
String::from_utf8_lossy(&output.stderr),
String::from_utf8_lossy(&output.stdout)
));
));
}

Ok(output)
Expand All @@ -246,7 +255,8 @@ pub struct MasterCommit {
/// Note that this does not contain try commits today, so it should not be used
/// to validate hashes or expand them generally speaking. This may also change
/// in the future.
pub async fn master_commits() -> Result<Vec<MasterCommit>, Box<dyn std::error::Error + Sync + Send>> {
pub async fn master_commits() -> Result<Vec<MasterCommit>, Box<dyn std::error::Error + Sync + Send>>
{
let response = reqwest::get("https://triage.rust-lang.org/bors-commit-list").await?;
Ok(response.json().await?)
}
}
28 changes: 27 additions & 1 deletion database/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -189,13 +189,18 @@ impl Ord for Commit {
}
}

/// The compilation profile (i.e., how the crate was built)
#[derive(
Debug, Copy, Clone, PartialEq, Eq, Hash, PartialOrd, Ord, serde::Serialize, serde::Deserialize,
)]
pub enum Profile {
/// A checked build (i.e., no codegen)
Check,
/// A debug build (i.e., low optimizations)
Debug,
/// A doc build
Doc,
/// An optimized "release" build
Opt,
}

Expand Down Expand Up @@ -227,15 +232,23 @@ impl fmt::Display for Profile {
}
}

/// The incremental cache state
///
/// These are usually reported to users in a "flipped" way. For example,
/// `Cache::Empty` means we're doing a "full" build. We present this to users as "full".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW it would be fine by me to rename the enum variants here -- they were named when we were trying different things but it's no big deal to rename if needed.

#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash, serde::Serialize, serde::Deserialize)]
#[serde(tag = "variant", content = "name")]
pub enum Cache {
/// Empty cache (i.e., full build)
#[serde(rename = "full")]
Empty,
/// Empty cache but still incremental (i.e., a full incremental build)
#[serde(rename = "incr-full")]
IncrementalEmpty,
/// Cache is fully up-to-date (i.e., nothing has changed)
#[serde(rename = "incr-unchanged")]
IncrementalFresh,
/// Cache is mostly up-to-date but something has been changed
#[serde(rename = "incr-patched")]
IncrementalPatch(PatchName),
}
Expand Down Expand Up @@ -378,9 +391,12 @@ pub enum Label {
Query(QueryLabel),
}

/// An identifier for a built version of the compiler
#[derive(Deserialize, Serialize, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum ArtifactId {
/// A built version of the compiler at an exact commit
Commit(Commit),
/// A symbolic tag for a built compiler like "1.51.0"
Artifact(String),
}

Expand Down Expand Up @@ -434,16 +450,26 @@ pub struct LabelId(pub u8, pub u32);
#[derive(Serialize, Deserialize, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct ArtifactIdNumber(pub u32);

/// Id lookups for various things
///
/// This is a quick way to find what the database id for something
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct Index {
/// Id look for a commit
commits: Indexed<Commit>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to add comments to these fields, but I'm still not exactly sure what they are. For instance, commits seems to be from the artifact table with type master or try. I'm not exactly sure what that should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say more about your confusion? These fields are intended to be essentially duplicates of the various tables (artifacts, error_series, pstat_series, etc.) so that we can avoid a network round-trip in the case of Postgres when fetching the ID used in the "main" table.

To some extent I think the presence of these is mostly an artifact of not wanting to write too much SQL -- it's much easier to have ad-hoc searching in Rust code than threading through SQL queries. In practice IIRC it was also a little faster, but whether that's still true or if I'm remembering right I'm less sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is a bit old as I know understand more about the structure. Perhaps we should do a short write up on the database structure. We could then refer to that here.

/// Id lookup of the errors for a crate
artifacts: Indexed<Box<str>>,

/// Id lookup of the errors for a crate
errors: Indexed<Crate>,
/// Id lookup of a given process stastic profile
pstats: Indexed<(Crate, Profile, Cache, ProcessStatistic)>,
/// Id lookup of a given process query label
queries: Indexed<(Crate, Profile, Cache, QueryLabel)>,
}

/// An index lookup
///
/// Given a `T` find what its database id is
#[derive(Debug, Clone, Serialize, Deserialize)]
struct Indexed<T> {
#[serde(with = "index_serde")]
Expand Down
23 changes: 19 additions & 4 deletions site/src/comparison.rs
Original file line number Diff line number Diff line change
Expand Up @@ -237,14 +237,17 @@ pub async fn compare_given_commits(
Some(b) => b,
None => return Ok(None),
};
let cids = Arc::new(vec![a.clone().into(), b.clone().into()]);
let cids = Arc::new(vec![a.clone(), b.clone()]);

// get all crates, cache, and profile combinations for the given stat
let query = selector::Query::new()
.set::<String>(Tag::Crate, selector::Selector::All)
.set::<String>(Tag::Cache, selector::Selector::All)
.set::<String>(Tag::Profile, selector::Selector::All)
.set(Tag::ProcessStatistic, selector::Selector::One(stat.clone()));

// `responses` contains a series iterators. The first element in the iterator is the data
// for `a` and the second is the data for `b`
let mut responses = data.query::<Option<f64>>(query, cids).await?;

let conn = data.conn().await;
Expand All @@ -257,18 +260,30 @@ pub async fn compare_given_commits(
}))
}

/// Data associated with a specific date
/// Data associated with a specific artifact
#[derive(Debug, Clone, Serialize)]
pub struct DateData {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be renamed ArtifactData as this is for a given artifact which may or may not have a date associated with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. It was named prior to the notion of artifacts -- and "CommitData" was already used for something else I think at that time...

/// The artifact in question
pub commit: String,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be artifact: ArtifactId since commit is wrong (it might be an artifact "tag" and not a commit sha) and there's no need for it to be a string at this point.

/// The date of the artifact if known
pub date: Option<Date>,
/// The pr of the artifact if known
pub pr: Option<u32>,
pub commit: String,
/// Benchmark data in the form "$crate-$profile" -> Vec<("$cache", nanoseconds)>
///
/// * $profile refers to the flavor of compilation like debug, doc, opt(timized), etc.
/// * $cache refers to how much of the compilation must be done and how much is cached
/// (e.g., "incr-unchanged" == compiling with full incremental cache and no code having changed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's plausible we could have non-String data here too. IIRC it's like this because making sure all the JS code is up to date when changing the Rust serialization was a bit of a pain - and Serde can be a bit picky in terms of serializing enums and struct nicely when they're nested inside vecs/hashmaps etc

pub data: HashMap<String, Vec<(String, f64)>>,
// crate -> nanoseconds
/// Bootstrap data in the form "$crate" -> nanoseconds
pub bootstrap: HashMap<String, u64>,
}

impl DateData {
/// For the given `ArtifactId`, consume the first datapoint in each of the given `SeriesResponse`
///
/// It is assumed that the provided ArtifactId is the same as artifact id returned as the next data
/// point from all of the series `SeriesResponse`s. If this is not true, this function will panic.
async fn consume_one<'a, T>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally understand why this is called consume_one but I don't really like the name....

conn: &dyn database::Connection,
commit: ArtifactId,
Expand Down
13 changes: 11 additions & 2 deletions site/src/load.rs
Original file line number Diff line number Diff line change
Expand Up @@ -64,23 +64,32 @@ impl TryCommit {
}
}

/// Keys for accessing various services
///
/// At the moment only used for accessing GitHub
#[derive(Debug, Default, Deserialize)]
pub struct Keys {
/// GitHub API token from the `GITHUB_API_TOKEN` env variable
pub github: Option<String>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields might have better names like github_api_token and github_webhook_secret.

/// GitHub webhook secret from the `GITHUB_WEBHOOK_SECRET` env variable
pub secret: Option<String>,
}

/// Site configuration
#[derive(Debug, Deserialize)]
pub struct Config {
pub keys: Keys,
}

/// Site context object that contains global data
pub struct InputData {
Copy link
Member Author

@rylev rylev Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to consider renaming this. InputData is descriptive enough to make me think it's something specific but not descriptive enough to know what that might be.

Considering it's sort of a collection of cached items we just want to keep around, something purposefully vague like SiteCtx might be better.

It's also interesting that it's referred to as Db elsewhere in the code (presumably because it has the pool and cached index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see this renamed -- I didn't ever come up with a nice name for it. I think SiteCtx is not a bad idea.

/// Site configuration
pub config: Config,

/// Cached site landing page
pub landing_page: ArcSwap<Option<Arc<crate::api::graph::Response>>>,

/// Index of various common queries
pub index: ArcSwap<crate::db::Index>,
/// Database connection pool
pub pool: Pool,
}

Expand Down
12 changes: 8 additions & 4 deletions site/src/selector.rs
Original file line number Diff line number Diff line change
Expand Up @@ -33,22 +33,26 @@ use std::fmt;
use std::ops::RangeInclusive;
use std::sync::Arc;

pub fn data_for(data: &Index, is_left: bool, query: Bound) -> Option<ArtifactId> {
/// Finds the most appropriate `ArtifactId` for a given bound.
///
/// Searches the commits in the index either from the left or the right.
/// If not found in those commits, searches through the artifacts in the index.
pub fn data_for(data: &Index, is_left: bool, bound: Bound) -> Option<ArtifactId> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was hard for me to grok at first. I mean it's simple to understand, but every time I came across it, I'd have to reread the entire body because the name didn't help me at all to remember what it does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, happy to make any changes -- I've never had good experiences naming things in rustc-perf.

let commits = data.commits();
let commit = if is_left {
commits
.iter()
.find(|commit| query.left_match(commit))
.find(|commit| bound.left_match(commit))
.cloned()
} else {
commits
.iter()
.rfind(|commit| query.left_match(commit))
.rfind(|commit| bound.left_match(commit))
.cloned()
};
commit.map(|c| ArtifactId::Commit(c)).or_else(|| {
data.artifacts()
.find(|aid| match &query {
.find(|aid| match &bound {
Bound::Commit(c) => *c == **aid,
Bound::Date(_) => false,
Bound::None => false,
Expand Down