Unable to read docx containing pictures linking to internal bookmarks: KeyError: "There is no item named 'word/#MyBookmark' in the archive" #902
Comments
I'm inclined to think the quick fix (if you're patching your fork) is to add code like this:
    if self.target_ref and self.target_ref.startswith("#"):
        return RTM.EXTERNAL
If you want to study the spec, perhaps there is some mention or clarification there of the criteria for internal/external, and a more robust long-term behavior could be proposed.
I suppose an alternative is to add code that distinguishes "internal" hyperlinks from internal "parts" and only loads internal part references. But that sounds like a bigger deal that would involve a broader scope. If I was going to do that I would rework the whole [...]
Now that I'm thinking about it, maybe just a try-except-continue around the part-loading bit would be a better alternative, since that would also address the NULL target issue. Maybe that could just wrap that [...]
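To make the two ideas in the comment above concrete, here is a minimal sketch. It does not use python-docx itself: `Rel`, `load_parts`, and the dict standing in for the zip archive are hypothetical names invented for illustration, and the real package reader is structured differently. It only shows the logic: report fragment-only targets as external, and skip any target that cannot be loaded.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

EXTERNAL = "External"
INTERNAL = "Internal"


@dataclass
class Rel:
    """Hypothetical stand-in for a serialized relationship (not the python-docx class)."""

    r_id: str
    target_ref: Optional[str]
    declared_mode: str = INTERNAL

    @property
    def target_mode(self) -> str:
        # Quick fix: a target such as "#MyBookmark" names a bookmark,
        # not a package part, so report it as external.
        if self.target_ref and self.target_ref.startswith("#"):
            return EXTERNAL
        return self.declared_mode


def load_parts(rels: List[Rel], archive: Dict[str, bytes], base: str = "word/") -> Dict[str, bytes]:
    """Load only targets that resolve to real archive members."""
    parts = {}
    for rel in rels:
        if rel.target_mode == EXTERNAL:
            continue
        try:
            parts[rel.r_id] = archive[base + rel.target_ref]
        except (KeyError, TypeError):
            # try-except-continue alternative: a missing member raises KeyError,
            # a missing (None) target raises TypeError; both are skipped.
            continue
    return parts
```

With either guard in place, a relationship list containing an image, a bookmark link, and a relationship with no target loads only the image part.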
#1350 not yet merged
Thank you @scanny, your fix has solved my issue. I am waiting for a new version.
I also faced a similar issue; I tried the change from the existing PR and it works. It would be very useful if the proposed PR could get merged.
Possible temporary fixes: in your requirements.txt, instead of python-docx==1.1.2 or similar, use one of these:
OPTION A: Thank you al-rahul
OPTION B: This last one is a merge into the current master branch as of the latest available version (including resolved conflicts)
Yes, it's interesting, @TommasoPetrolito. Maybe @scanny can unblock this situation regarding this very old bug (5 years).
The document below contains a picture with a hyperlink to an internal bookmark.
PictureBookmarks.docx (The very last picture links to the very first Heading1)
I get this error message when reading the file using python-docx:
KeyError: "There is no item named 'word/#MyBookmark' in the archive"
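The error message itself comes from Python's zipfile module, which raises KeyError when asked for a member that is not in the archive. A small sketch, assuming the bookmark reference ends up being resolved as if it were a part name under word/ (the in-memory archive below is a hypothetical stand-in for the .docx file):

```python
import io
import zipfile

# Build a tiny in-memory archive standing in for a .docx package.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<document/>")

# Treating "#MyBookmark" as a part name relative to "word/" asks the
# archive for a member that does not exist.
with zipfile.ZipFile(buf) as zf:
    try:
        zf.read("word/#MyBookmark")
        message = None
    except KeyError as err:
        message = str(err)

print(message)
```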
Stack trace
How to recreate
This is achieved by:
Then the hyperlink ends up like this (notice the a:hlinkClick relationship ID):
Now, in word/_rels/document.xml.rels, we get:
This item breaks python-docx for me. I'll admit I'm using a 2.5-year-old version of the package, since I needed to modify things for my own use case, so I am not sure whether this has been fixed since then. I was looking for whether this had been solved somehow, and it seems to be very much related to this issue.
Investigation
I see in the pkgreader that the target_mode can be used to identify external targets, and that external targets receive special treatment to avoid such zipfile issues. External targets are recognized in the relationship file for e.g. hyperlinks to web sites, and add a Target attribute to the <Relationship> object.
From what I gather, RT.HYPERLINK elements that have a Target starting with # should be treated specially, like some sort of internal bookmark relationship (or similar).
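One way to check whether a document is affected is to scan its relationships part for hyperlink entries whose Target is only a fragment and that are not marked external. The sketch below uses only the standard library; SAMPLE_RELS is a hypothetical example written for illustration (it is not the content of the attached file), and bookmark_hyperlinks is an invented helper name.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
IMAGE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"

# Hypothetical relationships part, for illustration only.
SAMPLE_RELS = f"""
<Relationships xmlns="{PKG_NS}">
  <Relationship Id="rId1" Type="{IMAGE}" Target="media/image1.png"/>
  <Relationship Id="rId2" Type="{HYPERLINK}" Target="https://example.com" TargetMode="External"/>
  <Relationship Id="rId3" Type="{HYPERLINK}" Target="#MyBookmark"/>
</Relationships>
"""


def bookmark_hyperlinks(rels_xml):
    """Return (Id, Target) pairs for hyperlinks that point at a bookmark fragment."""
    root = ET.fromstring(rels_xml)
    found = []
    for rel in root.iter(f"{{{PKG_NS}}}Relationship"):
        target = rel.get("Target", "")
        is_external = rel.get("TargetMode") == "External"
        if rel.get("Type") == HYPERLINK and target.startswith("#") and not is_external:
            found.append((rel.get("Id"), target))
    return found


print(bookmark_hyperlinks(SAMPLE_RELS))
```

For a real file, the same function could be fed the bytes of word/_rels/document.xml.rels read from the .docx with zipfile.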