
Archived Data Files

Carl Raiden Worley edited this page Jan 2, 2020 · 4 revisions

When debugging failed tests, it may be useful to examine the mongod data files that were archived after the failed test run. This page documents the process of archiving and things that may be worth considering when using archived data files.

Configuring Resmoke to Archive Data Files

When resmoke executes a test, it archives the data files for the mongod fixtures only if the test fails and the test suite is configured to do so. Configuration for each suite can be found in buildscripts/resmokeconfig/suites. A suite is configured to archive under the archive heading beneath the executor options. There are separate options for hooks and for js test files, and each can be set to archive on any failure or only on failures of specific hooks or test files. For example, the configuration of the core suite only archives data files when the ValidateCollections test hook fails:

core_archive_configuration

The concurrency_replication suite, on the other hand, archives data files when there is a failure in the CheckReplDBHashInBackground hook, ValidateCollectionsInBackground hook, CheckReplDBHash hook, ValidateCollections hook, or any js test:

concurrency_replication_archive_configuration

Note that tests may be either a boolean, which applies to every js test, or a list of specific tests.
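The boolean-or-list semantics described above can be sketched in Python. This is an illustration only, not resmoke's actual implementation: the function names, the dictionary layout, and the example test path are assumptions made for this sketch, and the dictionaries merely mirror the two suite configurations described in the text.

```python
from typing import Dict, Iterable, Union

# A setting is either a boolean (all or nothing) or a list of names.
ArchiveSetting = Union[bool, Iterable[str]]


def should_archive(setting: ArchiveSetting, name: str) -> bool:
    """Return True if a failure in `name` should trigger archiving."""
    if isinstance(setting, bool):
        return setting
    return name in set(setting)


def archive_on_failure(archive_config: Dict[str, ArchiveSetting],
                       kind: str, name: str) -> bool:
    """`kind` is "hooks" or "tests"; a missing key means no archiving."""
    return should_archive(archive_config.get(kind, False), name)


# Hypothetical in-memory mirrors of the archive sections described above.
core_archive = {"hooks": ["ValidateCollections"]}
concurrency_replication_archive = {
    "hooks": [
        "CheckReplDBHashInBackground",
        "ValidateCollectionsInBackground",
        "CheckReplDBHash",
        "ValidateCollections",
    ],
    "tests": True,
}
```

With these definitions, `archive_on_failure(core_archive, "tests", "jstests/core/example.js")` is false because the core suite lists no js tests, while the same call against `concurrency_replication_archive` is true for any test name.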

Using Archived Data Files

When data files are archived and stored in S3, they are kept for 180 days and may be downloaded from the relevant Evergreen task page. If no data file is listed, then either there was no test failure, the test wasn't configured to archive data files, or there was an error in archiving. (In the last case, please contact STM.) Example of a task with archived data files:

archived_task_page

The data files are packaged in a tarball named after the failed test or hook that triggered archiving. After downloading and untarring, they can be used by starting a mongod with the dbpath set to the data file directory. Make sure the mongod version used is exactly the same as the version in the commit with the failed test; using a different version may cause invariants to fail at startup. In most cases, it makes sense to bring up a standalone mongod instance for each collected set of data files, because that allows the data files to be examined as they were on each node when the failure was recorded. If the original test used a replica set fixture, creating a local replica set may allow data to propagate between nodes and change the data files from their state at failure time.
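The download-untar-start workflow above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the helper names and the base port are hypothetical, and the layout inside the tarball is not specified here, so the sketch guesses that each node's data directory can be recognized by the `WiredTiger.wt` file it contains. Only the `--dbpath` and `--port` options of mongod are used.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(tarball: Path, dest: Path) -> Path:
    """Untar a downloaded archive into `dest` and return `dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as tar:
        # Only extract archives you trust, such as ones from your own task.
        tar.extractall(dest)
    return dest


def find_dbpaths(root: Path) -> List[Path]:
    """Assumption: a directory holding WiredTiger.wt is a mongod dbpath."""
    return sorted(p.parent for p in Path(root).rglob("WiredTiger.wt"))


def standalone_commands(dbpaths: List[Path], base_port: int = 27017,
                        mongod: str = "mongod") -> List[List[str]]:
    """Build one standalone mongod command line per data directory."""
    return [
        [mongod, "--dbpath", str(dbpath), "--port", str(base_port + i)]
        for i, dbpath in enumerate(dbpaths)
    ]
```

Each returned command can be passed to `subprocess.Popen`. Point the `mongod` argument at a binary built from the same commit as the failed test. Giving each node its own port and omitting any replica set option keeps the instances standalone, so no data propagates between them.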

That said, if you do want to initialize a local replica set based on archived data files, make sure to use the same ports and replica set name as in the original invocation (both can be seen in the logs). Otherwise, each node will enter the REMOVED state and will need to be reconfigured.

Data Archival and Core Dumps

When resmoke detects that a test has failed, it dynamically generates a new FixtureKillTestCase for immediate execution. This test case sends a SIGKILL to each running mongod. A failure to kill a process, for example because it is no longer running, is recorded as a test case failure so the logs can be examined. Afterwards, resmoke executes a FixtureSetupTestCase so that further tests can be executed. Keep in mind that some time may pass between when a test failure is recorded and when the SIGKILL is sent to the mongods, and writes to disk may occur during that time, so the data files may not perfectly represent the state at test failure. If a test failed because of a crash or an assert.soon() failure, there may be a core dump available, which can be used to view more recent data than WiredTiger had written to disk.
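The kill step can be pictured with a short Python sketch. It is not the code behind FixtureKillTestCase; the function name and the mapping of node names to `subprocess.Popen` handles are assumptions made for illustration, and `signal.SIGKILL` is only available on POSIX systems.

```python
import signal
import subprocess
from typing import Dict, List


def kill_fixtures(processes: Dict[str, subprocess.Popen]) -> List[str]:
    """Send SIGKILL to each process; return the names that could not be killed.

    A process that has already exited counts as a failure, mirroring the
    behaviour described above where the failure is recorded for inspection.
    """
    failures = []
    for name, proc in processes.items():
        if proc.poll() is not None:
            # Already exited (for example, it crashed earlier).
            failures.append(name)
            continue
        proc.send_signal(signal.SIGKILL)
        proc.wait()  # Reap the process so its data files are no longer in use.
    return failures
```

Because SIGKILL cannot be caught, the process gets no chance to shut down cleanly, so whatever is on disk at that moment is what ends up in the archive.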
