fix: search only for valid attachment links in Confluence page #535
def _find_hyperlink_tags(self, html_soup: "BeautifulSoup") -> list["Tag"]:
    """Find hyperlink tags in the HTML.

    Overwrite this method to customize the tag search.
Alternatively, I'm thinking this method could be extended instead of overwritten, so that a subclass would do the following. The drawback is that you then can't use a customized query, which you might want to do.
tags = super()._find_hyperlink_tags(html_soup)
return [tag for tag in tags if subclass_predicate(tag)]
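To make the trade-off concrete, here is a minimal, self-contained sketch of both options. It is not the project's code: it swaps BeautifulSoup for the standard-library `html.parser` so it runs without dependencies, and the `Tag` stand-in, the `_all_anchor_tags` helper, the two subclass names, the `/download/attachments/` path check, and the `confluence-embedded-file` class name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Tag:
    """Hypothetical stand-in for bs4's Tag: just a name and its attributes."""
    name: str
    attrs: dict = field(default_factory=dict)


class _AnchorCollector(HTMLParser):
    """Collects every <a> start tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.tags.append(Tag(tag, dict(attrs)))


def _all_anchor_tags(html: str) -> list[Tag]:
    # Hypothetical helper: parse the HTML and return all <a> tags.
    collector = _AnchorCollector()
    collector.feed(html)
    return collector.tags


class HtmlMixin:
    def _find_hyperlink_tags(self, html: str) -> list[Tag]:
        """Default search: every <a> tag that carries an href."""
        return [t for t in _all_anchor_tags(html) if t.attrs.get("href")]


class ExtendingConfig(HtmlMixin):
    """Option from the comment: reuse the base query, then filter."""

    def _find_hyperlink_tags(self, html: str) -> list[Tag]:
        tags = super()._find_hyperlink_tags(html)
        # Assumed predicate: attachment links contain this path segment.
        return [t for t in tags if "/download/attachments/" in t.attrs["href"]]


class OverridingConfig(HtmlMixin):
    """Option from the PR: replace the query entirely."""

    def _find_hyperlink_tags(self, html: str) -> list[Tag]:
        # Assumed custom query: match on a CSS class instead of the href.
        return [
            t for t in _all_anchor_tags(html)
            if "confluence-embedded-file" in (t.attrs.get("class") or "").split()
        ]


SAMPLE = (
    '<a href="/download/attachments/1/report.pdf" '
    'class="confluence-embedded-file">report</a>'
    '<a href="https://example.com/page">page</a>'
    '<a name="anchor-only"></a>'
)
```

The extending variant can only narrow whatever the base query already returns, whereas the overriding variant is free to match on something the base query never looks at, which is the flexibility the comment says would be lost.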
LGTM
Modify the HtmlMixin to allow for customized hyperlink tag search in subclasses using the mixin by overwriting a dedicated method. Overwrite said method in the ConfluenceDownloaderConfig to target only embedded attachment files when running with the extract_files flag.