Skip to content

Minimal viable merge command #821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 17, 2024
Merged

Conversation

d-ronnqvist
Copy link
Contributor

@d-ronnqvist d-ronnqvist commented Feb 9, 2024

Bug/issue #, if applicable: rdar://114730477

Summary

This adds a first version of a docc merge command that combines documentation archives for different modules into a "combined" documentation archive.

Two notable features that are missing in this initial version are:

  • Merging the LMDB navigator index
  • Landing pages (either authored or synthesized) for the combined archive.

However, those can be added incrementally. The merge command in its current form is enough to already be usable and bring value. For example, it allows us to host our documentation for SwiftDocC and for SwiftDocCUtilities from the same repository without breaking their navigator sidebar.

Dependencies

None

Testing

  • Build documentation for two or more different targets.

  • Run docc merge /path/to/First.doccarchive /path/to/Second.doccarchive --output-path /path/to/Something.doccarchive

  • Use a local web server to host "Something.doccarchive".

    • It should be possible to browse both localhost:PORT/documentation/first and localhost:PORT/documentation/second
      • Each target displays its own assets (if assets had the same name in both)
      • Each target displays a navigator sidebar

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • Updated documentation if necessary

@d-ronnqvist d-ronnqvist changed the title Merge command Minimal viable merge command Feb 9, 2024
Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great start! I think we should be a bit more careful with validations...

  1. Can we use a temporary staging area to create the merged result in, before creating the output directory if there are no errors? This would avoid the possibility of leaving a partially merged, invalid archive in the case there are errors.

  2. Also we should not be deleting existing directories under any circumstances. If the users of the merge command wants to delete an existing output directory, possibly because they are repeatedly merging archives over and over again for some reason, then they should use rm -rf as needed in the terminal or from a shell script. Docc should not do this for them.

  3. Finally, let's run a validation on the paths present inside the provided archives: It's possible that two or more archives could have the same page paths for some reason. Or maybe the user provides a repeated archive path by accident on the command line. This should be a validation error I think. We don't want to be overwriting pages in an undetermined manner.

mutating func validate() throws {
let fileManager = Docc.Merge._fileManager

guard !archives.isEmpty else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to merge just one archive? Shouldn't we also raise a validation error when archives.count == 1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, merging a single archive is supported. Doing so will inactivate all the external links in that archive.


// MARK: - Inputs & outputs

@OptionGroup(title: "Inputs & outputs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a standard idiom for using ParsableArguments ? What's the reasoning behind having this option group? Why not simply place the properties directly on Merge and get rid of InputAndOutputOptions ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining option groups customizes how the help text for the command looks. It barely matters for a small command like this but for a command with many more options like docc convert it makes a big difference. I created an "Inputs & output" groups because two other commands already have one.

return ActionResult(didEncounterError: true, outputs: [])
}

try? fileManager.removeItem(at: outputURL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should raise a validation error in Merge if the output directory already exists. Otherwise, the user might delete an entire directory structure by accident if they have a typo or other mistake in their command line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f98be84 with a new testErrorWhenOutputDirectoryIsNotEmpty test

}

try? fileManager.removeItem(at: outputURL)
try fileManager.copyItem(at: firstArchive, to: outputURL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to create a temporary directory here, and use that to build up the result, and then copy/move it to the final location when and if the entire merge operation is successful. Otherwise, the user would end up with a partially merged but invalid result if there was some error or exception during the merge processing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f98be84

}
let renderIndex = try JSONDecoder().decode(RenderIndex.self, from: jsonIndexData)

try combinedJSONIndex.merge(renderIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be sure that none of the archives will have overlapping paths? Should we validate this before merging the indexes like this? And before copying around all of the files above?

Probably we should have a "pre-flight" check... meaning we iterate over all the files to be merged, to be sure none of them overlap or conflict. If they do, we can report an error before actually copying any of them.

Or alternatively, as mentioned in an earlier comment, if we use a temporary directory as a staging area, then we could abort the merge if we find any conflicting paths, before creating the output path and writing anything into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f98be84 with a new testErrorWhenArchivesContainOverlappingData test

Raise an error if the archives to merge has overlapping data

Use a temporary directory while merging the archives
@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work thanks for adding all of that extra validation!

var archivesByTopLevelTutorialDirectory = ArchivesByDirectoryName()

// Gather all the top level /data/documentation and /data/tutorials directories to ensure that the different archives don't have overlapping data
for archive in archives {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid hard-coding the list of prefixes like this? (e.g. "documentation" and "tutorials") This will lead to bugs someday if we ever add more prefixes. Maybe just search for the contents of the entire "data" directory? Or do that but exclude some known files we want to exclude?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did it this way because I only wanted to check the if more than one archive had the same top-level module or technology. It's not intended to be supported to merge multiple archives that contain the same modules and/or technologies so checking the top-level directories is sufficient.

I updated the to code do two nested loops to gather all the unique module names and technology names inside the data directory's direct subdirectories.

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

creating intermediate directories as necessary
@patshaughnessy
Copy link
Contributor

@swift-ci please test

@patshaughnessy patshaughnessy merged commit 491d26a into swiftlang:main Feb 17, 2024
d-ronnqvist added a commit to d-ronnqvist/swift-docc that referenced this pull request Feb 19, 2024
* Move assets into subdirectory to avoid collisions when merging archives

* Add basic merge command to combine documentation archives

rdar://114730477

* Publish docs for SwiftDocC and SwiftDocCUtilities as a combined archive

* Pass flags to generate source links for published framework docs

* Raise an error if the merge output directory already exists

Raise an error if the archives to merge has overlapping data

Use a temporary directory while merging the archives

* Consider all (future) subdirectories in the data directory

* Fix to docc merge command: Copy data/documentation and data/tutorials,
creating intermediate directories as necessary

---------

Co-authored-by: Pat Shaughnessy <[email protected]>
@d-ronnqvist d-ronnqvist deleted the merge-command branch February 19, 2024 08:20
xedin added a commit that referenced this pull request Feb 19, 2024
@dreampiggy
Copy link

Finally. Good work

@AndrewVorobev
Copy link

#1131
Is it possible to make somehow navigation through combined modules?

@d-ronnqvist
Copy link
Contributor Author

Is it possible to make somehow navigation through combined modules?

Yes. I replied to the new issue that you filed.

@dreampiggy
Copy link

Does this PR, support the rendering web page (maybe this is another tool, not swift-docc core) cross references of symbols?

For example:

  • ModuleA:
public class A {}
  • ModuleB
public class B : A{}

Seems currently the simple docc merge A.doccarchive B.doccarchive -o output.doccarchive, and setup the preview at output.doccarchive/documentation, the class B and A symbol are highlighted, but can not has hyper link cross the ModuleA to ModuleB 😂

image

@d-ronnqvist
Copy link
Contributor Author

Note: this is a one year old closed pull request.

Merging archives into a combined archives is an orthogonal feature to linking between archives. If the two archives were built with cross target links, the merged archive would display cross target links. If two archives without cross target links are merged, the combined output also won't have cross target links.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants