inserting xml-snippet into docx using the python-docx api #55

Closed
ghost opened this issue May 22, 2014 · 20 comments

Comments

@ghost

ghost commented May 22, 2014

We need to change the header text orientation of tables. We are aware that this may not be possible with the current state of the API. We identify the XML snippet to be inserted using opc-diag, as suggested elsewhere. Can we use XML-snippet insertion to achieve this? If so, what is the API call to insert XML at a specific point in the docx?
-- sub

@scanny
Contributor

scanny commented May 22, 2014

Hi Sub, can you give me an idea what the snippet you need to insert would look like? There are a couple different possible approaches. The best choice probably depends mostly on the size of the snippet, but if you could provide some example XML, like we want to insert something like: this in the document roughly: here. That would help me see what's likely to be the best way.

Also, if you could mention how you plan to identify the right place, that would be a help, like what object you'll have a reference to as a starting point.

@ghost
Author

ghost commented May 22, 2014

Hi Scanny, we need such a feature as a workaround for the following four requirements, unless the API offers a simpler way:

  • header text orientation in tables
  • straddling of columns and/or rows in tables
  • mid-document layout changes between portrait & landscape modes (and vice-versa)
  • table cell background colour (text-background colour apparently does not fill the cell) (very low priority)

We are still studying how to specify the location and how to insert the snippet. Any hints on the above would be appreciated.
— sub

@scanny
Contributor

scanny commented May 22, 2014

These would be four separate jobs, each requiring its own method, although the same approach would probably work for all four.

In general I'd say these are small enough that inserting the elements one at a time using the lxml API would be the best approach.

A good approach is to do a before and after diff using opc-diag (http://opc-diag.readthedocs.org/). You create the simplest possible file containing the "before" situation, like a 2x2 table with regular orientation headers, perhaps saved as "before.docx". Then you make the change you want using Word and save it again as "after.docx". Then you diff the two files with opc-diag with something like:

$ opc diff-item before.docx after.docx document.xml

This should narrow right down what XML changes need to be made.

If you can send me one of those diffs for, say, header text orientation, I can give you an example code snippet to get you going.
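
If opc-diag is not at hand, the before/after comparison described above can be approximated with the Python standard library. This is only a rough sketch and not a replacement for opc-diag: the function names `read_part_lines` and `diff_part` are made up for illustration, and the part name `word/document.xml` is the usual location of the main document part inside a .docx package.

```python
import difflib
import zipfile
from xml.dom import minidom


def read_part_lines(docx_path, part_name="word/document.xml"):
    """Return one package part as a list of pretty-printed XML lines."""
    with zipfile.ZipFile(docx_path) as package:
        xml_bytes = package.read(part_name)
    pretty = minidom.parseString(xml_bytes).toprettyxml(indent="  ")
    # Pretty-printing can leave whitespace-only lines; drop them.
    return [line for line in pretty.splitlines() if line.strip()]


def diff_part(before_path, after_path, part_name="word/document.xml"):
    """Return a unified diff (list of lines) of one part in two packages."""
    before = read_part_lines(before_path, part_name)
    after = read_part_lines(after_path, part_name)
    return list(
        difflib.unified_diff(
            before, after, fromfile=before_path, tofile=after_path, lineterm=""
        )
    )
```

Calling `print("\n".join(diff_part("before.docx", "after.docx")))` would then print a diff in roughly the same shape as the opc-diag output.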

@ghost
Author

ghost commented May 22, 2014

as requested for table-header-orientation: I am sending you only the opc-diag diffs found for document.xml and ignoring diffs found for core.xml , app.xml & settings.xml.

— sub

@@ -21,21 +21,27 @@
     <w:gridCol w:w="4893"/>
     <w:gridCol w:w="4893"/>
   </w:tblGrid>
-  <w:tr w:rsidR="001C3170" w:rsidTr="001C3170">
+  <w:tr w:rsidR="001C3170" w:rsidTr="00DD5E16">
     <w:trPr>
+      <w:cantSplit/>
       <w:trHeight w:val="1450"/>
     </w:trPr>
     <w:tc>
       <w:tcPr>
         <w:tcW w:w="4893" w:type="dxa"/>
+        <w:textDirection w:val="btLr"/>
       </w:tcPr>
-      <w:p w:rsidR="001C3170" w:rsidRDefault="001C3170"/>
+      <w:p w:rsidR="001C3170" w:rsidRDefault="001C3170" w:rsidP="002E1BE0">
+        <w:pPr>
+          <w:ind w:left="113" w:right="113"/>
+        </w:pPr>
+      </w:p>
     </w:tc>
     <w:tc>
       <w:tcPr>
         <w:tcW w:w="4893" w:type="dxa"/>
       </w:tcPr>
-      <w:p w:rsidR="001C3170" w:rsidRDefault="001C3170"/>
+      <w:p w:rsidR="001C3170" w:rsidRDefault="001C3170" w:rsidP="00DD5E16"/>
     </w:tc>
   </w:tr>
   <w:tr w:rsidR="001C3170" w:rsidTr="001C3170">
@@ -56,7 +62,7 @@
     </w:tc>
   </w:tr>
 </w:tbl>
-<w:p w:rsidR="00000000" w:rsidRDefault="001C3170"/>
+<w:p w:rsidR="00000000" w:rsidRDefault="00DD5E16"/>
 <w:sectPr w:rsidR="00000000">
   <w:pgSz w:w="12240" w:h="15840"/>
   <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>

@scanny
Contributor

scanny commented May 22, 2014

Ok, good, this narrows it right down.

First thing is we can ignore all the changes to w:rsid.. attributes, those are part of the revision tracking mechanism and just indicate this is a new revision.

The operative changes are:

  • adding a <w:cantSplit> element to the row properties
  • adding a <w:textDirection> element to the cell properties
  • adding indentation to the paragraph properties

I give an example here of the textDirection element since that seems to be the key one:

from docx.oxml.shared import OxmlElement, qn

def set_vert_cell_direction(cell):
    tc = cell._tc
    tcPr = tc.tcPr
    textDirection = OxmlElement('w:textDirection')
    textDirection.set(qn('w:val'), 'btLr')
    tcPr.append(textDirection)

cell._tc is the internal reference to the docx.oxml.table.CT_Tc instance containing the <w:tc> element. tc.tcPr is its child tcPr element. OxmlElement creates a new element from a tagname and the set method on it sets an attribute. append() on an element adds another element as the last child.

Does that give you enough to go on?

@ghost
Author

ghost commented May 23, 2014

trying it out...
@ghost
Author

ghost commented May 23, 2014

Using your hints, I have been able to achieve the following:

  • change table header text orientation
  • straddling multiple rows within a table
  • setting a specific table cell background colour

However, I still face some challenges with:
a) straddling multiple columns within a table
b) mid-document page orientation
c) setting the header row height (code not yet written)

We would appreciate your help with points a) and b).
— sub

@scanny
Contributor

scanny commented May 23, 2014

Pick one and post the diff and I'll see what I can offer in the way of advice :)

@ghost
Author

ghost commented May 24, 2014

unable to find a workaround for b) mid-document page orientation change, thanks
— sub

@scanny
Contributor

scanny commented May 24, 2014

I need to see the diff out of opc-diag like you sent for the first one.

@ghost
Author

ghost commented May 24, 2014

oops! here they are:

landscape2portrait:

@@ -22,8 +22,8 @@
   </w:pPr>
 </w:p>
 <w:p w:rsidR="00936EEB" w:rsidRDefault="00936EEB"/>
-<w:sectPr w:rsidR="00936EEB" w:rsidSect="008312B3">
-  <w:pgSz w:w="15840" w:h="12240" w:orient="landscape"/>
+<w:sectPr w:rsidR="00936EEB" w:rsidSect="006C03E9">
+  <w:pgSz w:w="12240" w:h="15840"/>
   <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
   <w:cols w:space="720"/>
   <w:docGrid w:linePitch="360"/>

portrait2landscape:

@@ -20,9 +20,13 @@
       <w:docGrid w:linePitch="360"/>
     </w:sectPr>
   </w:pPr>
+  <w:proofErr w:type="spellStart"/>
+  <w:proofErr w:type="gramStart"/>
   <w:r>
     <w:t>asd</w:t>
   </w:r>
+  <w:proofErr w:type="spellEnd"/>
+  <w:proofErr w:type="gramEnd"/>
 </w:p>
 <w:p w:rsidR="00936EEB" w:rsidRDefault="00152D42">
   <w:r>
@@ -30,8 +34,8 @@
     <w:t>asd</w:t>
   </w:r>
 </w:p>
-<w:sectPr w:rsidR="00936EEB" w:rsidSect="004D104E">
-  <w:pgSz w:w="12240" w:h="15840"/>
+<w:sectPr w:rsidR="00936EEB" w:rsidSect="003F2429">
+  <w:pgSz w:w="15840" w:h="12240" w:orient="landscape"/>
   <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
   <w:cols w:space="720"/>
   <w:docGrid w:linePitch="360"/>

@scanny
Contributor

scanny commented May 24, 2014

You can ignore the w:rsid.. attributes, they're part of the revision tracking scheme. You can also ignore the proofErr bits, those are the red squiggly lines under spelling errors. The w:pgSz element is the one you want.

If there is only one section throughout the document, e.g. it is all portrait or all landscape, then you'll find the w:sectPr parent element as the last child of <w:document><w:body>. If there are other section breaks, they are in the pPr element of the last paragraph in the section.

You can access the "sentinel" <w:sectPr> element using the following internals if you like:

sectPr = document._document_part._element.body._sentinel_sectPr

There is a little more on sections here:
http://python-docx.readthedocs.org/en/latest/dev/analysis/features/sections.html
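
As a rough illustration of the `w:pgSz` change seen in the two diffs, the sketch below flips a section between portrait and landscape by swapping the width and height and setting or removing `w:orient`. It uses only the standard library's ElementTree on a parsed `<w:sectPr>` element, so it runs without python-docx; the helper names `w_name` and `set_orientation` are hypothetical, and whether the same calls behave identically on the element objects python-docx hands out is an assumption that would need checking.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w_name(local):
    """Build the namespace-qualified (Clark notation) name for a w: tag."""
    return "{%s}%s" % (W_NS, local)


def set_orientation(sectPr, landscape):
    """Rewrite the w:pgSz child of a sectPr for the requested orientation."""
    pgSz = sectPr.find(w_name("pgSz"))
    if pgSz is None:
        raise ValueError("sectPr has no w:pgSz child")
    width = int(pgSz.get(w_name("w")))
    height = int(pgSz.get(w_name("h")))
    long_side, short_side = max(width, height), min(width, height)
    if landscape:
        pgSz.set(w_name("w"), str(long_side))
        pgSz.set(w_name("h"), str(short_side))
        pgSz.set(w_name("orient"), "landscape")
    else:
        pgSz.set(w_name("w"), str(short_side))
        pgSz.set(w_name("h"), str(long_side))
        # The portrait diff carries no w:orient attribute, so drop it.
        pgSz.attrib.pop(w_name("orient"), None)
    return pgSz
```

For a mid-document change, the element to modify would be the `<w:sectPr>` held in the `pPr` of the last paragraph of the section in question, as described above, rather than the sentinel one at the end of the body.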

@ghost
Author

ghost commented May 25, 2014

thanks, will try…

@ghost
Author

ghost commented May 26, 2014

Found the solution using your tips, thanks a lot.
From the previous wish-list, we are still lost with column straddling and would much appreciate your help.
— sub

@@ -23,33 +23,22 @@

         <w:gridCol w:w="2371"/>
         <w:gridCol w:w="2371"/>
       </w:tblGrid>
-      <w:tr w:rsidR="007C1D62" w:rsidTr="007C1D62">
+      <w:tr w:rsidR="006B42ED" w:rsidTr="00633BDB">
         <w:trPr>
           <w:trHeight w:val="1015"/>
         </w:trPr>
         <w:tc>
           <w:tcPr>
-            <w:tcW w:w="2371" w:type="dxa"/>
+            <w:tcW w:w="7113" w:type="dxa"/>
+            <w:gridSpan w:val="3"/>
           </w:tcPr>
-          <w:p w:rsidR="007C1D62" w:rsidRDefault="007C1D62"/>
+          <w:p w:rsidR="006B42ED" w:rsidRDefault="006B42ED"/>
         </w:tc>
         <w:tc>
           <w:tcPr>
             <w:tcW w:w="2371" w:type="dxa"/>
           </w:tcPr>
-          <w:p w:rsidR="007C1D62" w:rsidRDefault="007C1D62"/>
-        </w:tc>
-        <w:tc>
-          <w:tcPr>
-            <w:tcW w:w="2371" w:type="dxa"/>
-          </w:tcPr>
-          <w:p w:rsidR="007C1D62" w:rsidRDefault="007C1D62"/>
-        </w:tc>
-        <w:tc>
-          <w:tcPr>
-            <w:tcW w:w="2371" w:type="dxa"/>
-          </w:tcPr>
-          <w:p w:rsidR="007C1D62" w:rsidRDefault="007C1D62"/>
+          <w:p w:rsidR="006B42ED" w:rsidRDefault="006B42ED"/>
         </w:tc>
       </w:tr>
       <w:tr w:rsidR="007C1D62" w:rsidTr="007C1D62">
@@ -111,7 +100,7 @@

         </w:tc>
       </w:tr>
     </w:tbl>
-    <w:p w:rsidR="00000000" w:rsidRDefault="007C1D62"/>
+    <w:p w:rsidR="00000000" w:rsidRDefault="006B42ED"/>
     <w:sectPr w:rsidR="00000000">
       <w:pgSz w:w="12240" w:h="15840"/>
       <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
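
Read at face value, the first hunk of this diff does three things: it widens the first cell's `w:tcW` to the combined width (3 × 2371 = 7113), adds `<w:gridSpan w:val="3"/>` right after it, and removes the two cells that were absorbed. A minimal sketch of that transformation on a parsed `<w:tr>` element, using only the standard library, could look like the following. The names `w_name` and `span_columns` are hypothetical, the widths are assumed to be in `dxa` units, the cells are assumed not to be merged already, and any content in the removed cells is simply discarded.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w_name(local):
    """Build the namespace-qualified (Clark notation) name for a w: tag."""
    return "{%s}%s" % (W_NS, local)


def span_columns(tr, start, span):
    """Make the cell at index `start` of row `tr` span `span` grid columns."""
    cells = tr.findall(w_name("tc"))
    group = cells[start:start + span]
    if span < 2 or len(group) != span:
        raise ValueError("row does not have enough cells to span")
    first = group[0]
    tcPr = first.find(w_name("tcPr"))
    if tcPr is None:
        tcPr = ET.Element(w_name("tcPr"))
        first.insert(0, tcPr)
    # Add up the widths of all cells being combined (assumed dxa).
    total = 0
    for cell in group:
        tcW = cell.find("%s/%s" % (w_name("tcPr"), w_name("tcW")))
        if tcW is not None:
            total += int(tcW.get(w_name("w"), "0"))
    first_tcW = tcPr.find(w_name("tcW"))
    if first_tcW is not None and total:
        first_tcW.set(w_name("w"), str(total))
    gridSpan = tcPr.find(w_name("gridSpan"))
    if gridSpan is None:
        gridSpan = ET.Element(w_name("gridSpan"))
        # Place gridSpan after cnfStyle/tcW, matching the diff's ordering.
        index = 0
        for i, child in enumerate(tcPr):
            if child.tag in (w_name("cnfStyle"), w_name("tcW")):
                index = i + 1
        tcPr.insert(index, gridSpan)
    gridSpan.set(w_name("val"), str(span))
    # Remove the cells that the first cell now covers.
    for cell in group[1:]:
        tr.remove(cell)
    return first
```

Applied to the four-cell header row from the "before" file with `start=0` and `span=3`, this would leave two cells, the first one 7113 wide with a grid span of 3.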

@trampas

trampas commented May 28, 2014

I was trying to change text direction on a cell using the code above:

def set_vert_cell_direction(cell):
    tc = cell._tc
    print "TC is "
    print tc
    tcPr = tc.tcPr
    print "TcPr is "
    print tcPr
    textDirection = OxmlElement('w:textDirection')
    textDirection.set(qn('w:val'), 'btLr')
    tcPr.append(textDirection)

I get the error:

TC is
<Element {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tc at 0x4ece030>
TcPr is
None
Traceback (most recent call last):
  File "C:\Projects\Metrology\iPERL\python\testing\wr7_csv_parse.py", line 454, in <module>
    AddTablesToDoc(files,document)
  File "C:\Projects\Metrology\iPERL\python\testing\wr7_csv_parse.py", line 333, in AddTablesToDoc
    set_vert_cell_direction(row_cells[0])
  File "C:\Projects\Metrology\iPERL\python\testing\wr7_csv_parse.py", line 25, in set_vert_cell_direction
    tcPr.append(textDirection)
AttributeError: 'NoneType' object has no attribute 'append'

Am I missing something?

@scanny
Contributor

scanny commented May 29, 2014

@trampas the tcPr (table cell properties) child element is optional. When it's not present, tc.tcPr returns None. If you get None, you'll need to add a tcPr element before you can add a <w:textDirection> child to it.

Most of the relevant XML schema definitions are here:
http://python-docx.readthedocs.org/en/latest/dev/analysis/features/table.html

You can get it in the right place with something like this:

...
tcPr = tc.tcPr
if tcPr is None:
    tcPr = OxmlElement('w:tcPr')
    tc.insert(0, tcPr)
...

Note that the ordering of child elements within tcPr is significant, so just appending a textDirection element might cause a "repair-step" error on document load if there's already one or more child elements within the tcPr element.
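
To make the ordering concern concrete, here is a small standard-library sketch that creates `w:tcPr` when it is missing and then inserts `w:textDirection` before any sibling that has to follow it, instead of appending blindly. The list of following siblings reflects my reading of the `CT_TcPr` schema and should be treated as an assumption to verify; `w_name` and `set_text_direction` are hypothetical names, and the code works on plain ElementTree elements rather than on python-docx objects.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w_name(local):
    """Build the namespace-qualified (Clark notation) name for a w: tag."""
    return "{%s}%s" % (W_NS, local)


# Children assumed to come after w:textDirection in the tcPr sequence.
SUCCESSORS = tuple(
    w_name(t)
    for t in ("tcFitText", "vAlign", "hideMark", "headers",
              "cellIns", "cellDel", "cellMerge", "tcPrChange")
)


def set_text_direction(tc, val="btLr"):
    """Set the text direction of a w:tc element, keeping child order valid."""
    tcPr = tc.find(w_name("tcPr"))
    if tcPr is None:
        # tcPr is optional; when present it is the first child of the cell.
        tcPr = ET.Element(w_name("tcPr"))
        tc.insert(0, tcPr)
    existing = tcPr.find(w_name("textDirection"))
    if existing is not None:
        # Update in place rather than adding a second element.
        existing.set(w_name("val"), val)
        return existing
    text_direction = ET.Element(w_name("textDirection"), {w_name("val"): val})
    index = len(tcPr)
    for i, child in enumerate(tcPr):
        if child.tag in SUCCESSORS:
            index = i  # insert just before the first successor
            break
    tcPr.insert(index, text_direction)
    return text_direction
```

With a cell whose `tcPr` already holds `w:tcW` and `w:vAlign`, the new element ends up between the two, which is where the sequence places it.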

@scanny
Contributor

scanny commented May 29, 2014

@subregi I think you got yours working, right? If not feel free to reopen, closing for now.

scanny closed this as completed May 29, 2014

@trampas

trampas commented May 29, 2014

Thank you!

I got it working!

I also added the ability to merge cells and bold text in a cell. My next task is changing column widths and font colors.

Thanks
Trampas

@scanny
Contributor

scanny commented May 29, 2014

Glad to hear it Trampas :)

@trampas

trampas commented Mar 26, 2015

I have just upgraded python-docx and the following code no longer works.

    hdr_cells = table.rows[0].cells
    print hdr_cells
    hdr_cells[0].text = title
    set_merge(hdr_cells[0], nCols)
    for ki in range(nCols - 1):
        if len(hdr_cells) > 1:
            n = hdr_cells[1]._tc
            if n is not None:
                hdr_cells._tr.remove(n)

Specifically I get the following error:
hdr_cells._tr.remove(n)
AttributeError: 'tuple' object has no attribute '_tr'

I use the _tr for accessing XML for the row in several places and was
wondering if there was a better way with new code?

For example, I set the row height like this:

def set_row_height(row, height):
    tr = row._tr
    trPr = tr.find(qn('w:trPr'))
    if trPr is None:
        trPr = OxmlElement('w:trPr')
        tr.append(trPr)
    trHeight = OxmlElement('w:trHeight')
    trHeight.set(qn('w:val'), str(height))
    trPr.append(trHeight)

Thanks
Trampas
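
The thread ends without a reply to this question. As a hedged sketch only, the row-height helper can be written so that a newly created `w:trPr` lands ahead of the cells rather than at the end of the row, and so that an existing `w:trHeight` is updated instead of duplicated. It uses the standard library on a parsed `<w:tr>` element; `w_name` is a hypothetical helper, the placement of `w:trPr` after an optional `w:tblPrEx` is my reading of the schema, and the `w:val` height is taken to be in twentieths of a point.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w_name(local):
    """Build the namespace-qualified (Clark notation) name for a w: tag."""
    return "{%s}%s" % (W_NS, local)


def set_row_height(tr, height):
    """Set w:trHeight on a w:tr element, creating w:trPr if needed."""
    trPr = tr.find(w_name("trPr"))
    if trPr is None:
        trPr = ET.Element(w_name("trPr"))
        # Assumed order: optional tblPrEx first, then trPr, then the cells.
        has_tblPrEx = len(tr) > 0 and tr[0].tag == w_name("tblPrEx")
        tr.insert(1 if has_tblPrEx else 0, trPr)
    trHeight = trPr.find(w_name("trHeight"))
    if trHeight is None:
        trHeight = ET.Element(w_name("trHeight"))
        # Inserting first keeps it ahead of any revision-tracking children.
        trPr.insert(0, trHeight)
    trHeight.set(w_name("val"), str(height))
    return trHeight
```

With python-docx, the row element would presumably still be reached through the row object, as in `tr = row._tr` in the function quoted above, rather than through the tuple of cells.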
