[refactor] Move checkpoint saving into trainer #4034

Merged (12 commits) on Jun 4, 2020

@ervteng ervteng commented May 28, 2020

Proposed change(s)

This PR removes the --save-freq CLI option and replaces it with a checkpoint_interval option in the YAML config. The interval is now counted in trainer steps rather than global steps, and can be set differently for each trainer. The PR also slightly refactors the summary-writing logic to reduce duplicated code.
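
For illustration only, here is a minimal sketch of how a per-trainer interval counted in trainer steps could work. It is not the code in this PR: `CheckpointSchedule`, `should_checkpoint`, `simulate`, and the behavior names are hypothetical, and the sketch assumes trainer steps can advance by more than one at a time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CheckpointSchedule:
    """Hypothetical helper: tracks when one trainer is due for a checkpoint."""

    checkpoint_interval: int  # counted in this trainer's own steps, not global steps
    next_checkpoint_step: int = field(init=False)

    def __post_init__(self) -> None:
        if self.checkpoint_interval <= 0:
            raise ValueError("checkpoint_interval must be positive")
        self.next_checkpoint_step = self.checkpoint_interval

    def should_checkpoint(self, trainer_step: int) -> bool:
        """Return True when trainer_step has reached or passed the next boundary."""
        if trainer_step < self.next_checkpoint_step:
            return False
        # Move the boundary to the first multiple of the interval above trainer_step,
        # so a large jump in steps triggers at most one save per call.
        self.next_checkpoint_step = (
            trainer_step // self.checkpoint_interval + 1
        ) * self.checkpoint_interval
        return True


def simulate(schedule: CheckpointSchedule, step_size: int, total_steps: int) -> List[int]:
    """Advance a trainer in increments of step_size and record the steps that save."""
    saved = []
    for step in range(step_size, total_steps + 1, step_size):
        if schedule.should_checkpoint(step):
            saved.append(step)
    return saved


# Each trainer gets its own schedule, so intervals can differ per trainer.
schedules = {
    "BehaviorA": CheckpointSchedule(checkpoint_interval=1000),
    "BehaviorB": CheckpointSchedule(checkpoint_interval=2500),
}
```

Because each schedule only sees its own trainer's step count, a slow trainer is not forced to save on a fast trainer's cadence, which is the practical difference from a single global-step frequency.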

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@ervteng ervteng requested a review from andrewcoh May 28, 2020 18:20
@ervteng ervteng marked this pull request as ready for review May 28, 2020 18:20

ervteng commented May 28, 2020

@andrewcoh will this break the saving in the ghost trainer?

@ervteng ervteng merged commit 8b54e2e into master Jun 4, 2020
@delete-merged-branch delete-merged-branch bot deleted the develop-trainersavingmodel branch June 4, 2020 00:17
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 4, 2021