CXX-3097 update sitemaps and patch mongocxx-3.11.0 pages with redirects #1239
Summary
Resolves CXX-3097. See also #1238.
This is a follow-up to #1209. This PR, along with #1238, implements the thorough solution described there.
Note
Some iteration and followup changes may be required after examining actual search engine results following the deployment of these changes.
Sitemaps and Sitemap Indexes
The search engine quality of API doc pages is currently very poor: the cached /current page is labeled as mongocxx-3.6.3, with subpages referencing even older versions (3.0.1, 3.1.2, etc.). This suggests that search engines have not indexed the API doc pages in a long time (likely since August 15, 2016, per the <lastmod> values in the current sitemap), nor with any appreciable thoroughness (API doc pages are not referenced by the sitemap at all). This issue may be directly attributed to the long-outdated sitemap.xml. Per Google:
Doxygen 1.9.7 introduced a SITEMAP_URL configuration option that generates a sitemap describing the generated API doc pages. This option was enabled in #1209 and used when generating the mongocxx-3.11.0 API doc pages. This PR finally puts the generated sitemap to use by introducing a sitemap index file.

Although size limits appear to be the primary motivation behind sitemap index files, the format offers a simple and straightforward way to integrate Doxygen-generated sitemap files. See the updated release instructions in #1238. These changes should finally enable search engines to crawl the C++ Driver's API doc pages and return up-to-date, relevant results for C++ Driver library interfaces.
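For illustration, a sitemap index is a small XML document in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace whose <sitemap> entries each point to a child sitemap via <loc>, optionally with a <lastmod> date. The Python sketch below builds such a file with the standard library. It is not the index file added by this PR; the URLs in the usage block (the mongocxx.org layout and the location of the Doxygen-generated sitemap) are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Build a sitemap index document as a string.

    entries: iterable of (loc, lastmod) pairs, where loc is the absolute URL of a
    child sitemap and lastmod is a W3C date string such as "2024-10-01", or None
    to omit the <lastmod> element for that entry.
    """
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # default_namespace emits xmlns="..." on the root instead of ns0: prefixes.
    body = ET.tostring(root, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap_index([
        # Hypothetical site-wide sitemap generated by Hugo.
        ("https://mongocxx.org/sitemap.xml", None),
        # Hypothetical location of the Doxygen-generated API doc sitemap.
        ("https://mongocxx.org/api/current/sitemap.xml", "2024-10-01"),
    ]))
```

Because each child sitemap keeps its own <lastmod>, a release only needs to bump the date on the entry for the regenerated API docs; the other entries can stay untouched.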
Redirection and Canonical URLs
Because /current is a symlink (and is expected to be the most commonly used URL path when referencing or first navigating to the API doc pages), and because of the GitHub Pages static site structure, finding a satisfying way to tell both users and search engines that /current pages are actually aliases for their /mongocxx-<version> equivalents was a challenge. Hugo (used for page generation) does not appear to offer a convenient way to implement redirects for subpages, and Doxygen does not support such behavior through its configuration options either. Therefore, a new patch script is introduced instead (in #1238) to directly modify the HTML pages with redirect routines.

Redirection is implemented using the window.location.replace() pattern. This pattern was chosen because it does not create a browser history entry for the /current page prior to the redirect, which avoids the perpetual "Go to Last Page -> Redirected Right Back" problem. The redirect is guarded by a conditional check to avoid perpetual redirection.

A "canonical element" (a <link rel="canonical"> element) is included alongside the redirection script.
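The redirect and canonical element described above can be sketched as a small patch helper. This is a hypothetical Python sketch, not the script from #1238: it assumes the API docs are served at https://mongocxx.org/api/current/ and https://mongocxx.org/api/mongocxx-<version>/, and that each Doxygen page contains a literal <head> tag. It injects a static <link rel="canonical"> pointing at the versioned URL, plus a script that calls window.location.replace() only when the current path starts with the /current/ prefix, so the versioned page never redirects again.

```python
import json
from html import escape

# Hypothetical assumptions for illustration: host and path layout of the API docs.
SITE = "https://mongocxx.org"
API_ROOT = "/api"


def redirect_markup(relpath: str, version: str) -> str:
    """Return the canonical <link> and guarded redirect <script> for one page."""
    current = f"{API_ROOT}/current/"
    versioned = f"{API_ROOT}/mongocxx-{version}/"
    canonical = f"{SITE}{versioned}{relpath}"
    link = f'<link rel="canonical" href="{escape(canonical, quote=True)}">'
    script = "\n".join([
        "<script>",
        "(function () {",
        # json.dumps produces valid JavaScript string literals for the prefixes.
        f"  var cur = {json.dumps(current)}, ver = {json.dumps(versioned)};",
        "  var loc = window.location;",
        "  // Guard: only /current/ pages redirect, so the target never redirects again.",
        "  if (loc.pathname.startsWith(cur)) {",
        "    // replace() swaps the history entry, so Back skips the /current/ page.",
        "    loc.replace(ver + loc.pathname.slice(cur.length) + loc.search + loc.hash);",
        "  }",
        "})();",
        "</script>",
    ])
    return link + "\n" + script


def patch_page(page_html: str, relpath: str, version: str) -> str:
    """Inject the markup right after the first literal <head> tag.

    A page that already has a canonical element is returned unchanged,
    so re-running the patch is a no-op.
    """
    if 'rel="canonical"' in page_html:
        return page_html
    marker = "<head>"
    i = page_html.find(marker)
    if i < 0:
        raise ValueError("page has no literal <head> tag")
    end = i + len(marker)
    return page_html[:end] + "\n" + redirect_markup(relpath, version) + page_html[end:]


if __name__ == "__main__":
    page = "<html><head><title>client</title></head><body></body></html>"
    print(patch_page(page, "classmongocxx_1_1client.html", "3.11.0"))
```

Testing a path prefix rather than comparing full URLs keeps the query string and #anchor fragment intact across the redirect, and the canonical href is written statically so crawlers see it without executing the script.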
The canonical element (alongside the redirection script itself) tells search engines the "canonical URL" of every /current page. This should ensure search engines understand that the /current pages are "aliases" for their versioned URL equivalents and prefer the versioned URL for indexing purposes.

These changes have a very nice benefit: when a new release updates the /current symlink, the old API doc pages do not need to be updated at all. The new API doc pages (which also contain the redirect routines) will automatically work with their new /current status, while the old API doc pages remain navigable via their versioned URL paths (which search engines will remember as canonical, rather than their old /current aliases). Updating the <lastmod> field of the sitemap index entry for /current pages should also trigger (re-)indexing of the API doc pages by search engines according to the updated symlink and the new canonical URLs. This permits up-to-date search engine results following a new release while preserving the "stability" of old API doc page indexes.

Legacy Pages
The legacy doc pages are given a sitemap priority of 0.0 (the default is 0.5) to discourage search engines from ranking them above the current API doc pages, which are given a priority of 1.0. The obsolete /categories and /tags pages are replaced with stub pages that redirect users to the front page.
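Sitemap priority is a per-URL hint in the range 0.0 to 1.0, and a URL without a <priority> element is treated as 0.5. A minimal Python sketch of emitting it, assuming a hypothetical list of (URL, priority) pairs; the URLs in the usage block are placeholders, and this is not necessarily how Hugo or this PR sets the values.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages):
    """Build a <urlset> sitemap string from (loc, priority) pairs.

    priority must lie in [0.0, 1.0]; pass None to omit <priority>, in which case
    the sitemap protocol treats the URL as having the default priority of 0.5.
    """
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, priority in pages:
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if priority is not None:
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"sitemap priority must be in [0.0, 1.0], got {priority}")
            # One decimal place, e.g. "0.0" or "1.0".
            ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    return ET.tostring(root, encoding="unicode", default_namespace=SITEMAP_NS)


if __name__ == "__main__":
    print(build_urlset([
        ("https://mongocxx.org/legacy-v3/", 0.0),                       # hypothetical legacy page
        ("https://mongocxx.org/api/mongocxx-3.11.0/index.html", 1.0),   # hypothetical API doc page
        ("https://mongocxx.org/", None),                                # omitted: default 0.5
    ]))
```

Validating the range up front keeps an out-of-range value from silently producing a sitemap that crawlers may reject or ignore.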