
How to return the heading number #590

Open
tfitzhardinge opened this issue Dec 31, 2018 · 14 comments
Comments

@tfitzhardinge

Hi

I have managed to return the heading text in a given file (thanks to the great experts at Stack Overflow); however, I cannot return the heading number. See the code below. Is there a way to print/return the heading number (the list numbering value, not the heading level)?

Thanks

import docx

doc=docx.Document('filename.docx')

def iter_heading(paragraphs):
    for paragraph in paragraphs:
        if paragraph.style.name == 'Heading 1':
            yield paragraph

for heading in iter_heading(doc.paragraphs):
    print(heading.text)
@lilyzhaochina

up

@ppebay

ppebay commented Apr 23, 2019

This is loosely related but is the closest issue I could find to my question: how can the heading number(s) be made visible? Currently,
document.add_heading(heading_title, 1)
creates a section heading with the desired content, size, indentation, etc., but NO section number.

It is my understanding from the current documentation that making the numbers appear in the rendered document is not doable via the current API, but that a numbering style is envisioned for the future. Am I missing something? Thank you.

@chdelfosse

chdelfosse commented May 28, 2019

A solution: call addHeaderNumbering just before saving the document.

# --- from https://github.com/python-openxml/python-docx/issues/590,
# --- mods by CD
import re

def iter_heading(paragraphs):
    for paragraph in paragraphs:
        isItHeading=re.match('Heading ([1-9])',paragraph.style.name)
        if isItHeading:
            yield int(isItHeading.groups()[0]),paragraph

def addHeaderNumbering(document):
    # hNums[1]..hNums[4] count Heading 1-4; index 0 is unused
    hNums=[0,0,0,0,0]
    for index,hx in iter_heading(document.paragraphs):
        # ---put zeroes below---
        for i in range(index+1,5):
            hNums[i]=0
        # ---increment this---
        hNums[index]+=1
        # ---prepare the string---
        hStr=""
        for i in range(1,index+1):
            hStr+="%d."%hNums[i]
        # ---add the numbering (setting .text replaces the runs, so character formatting is lost)---
        hx.text=hStr+" "+hx.text
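The counting logic in `addHeaderNumbering` can be pulled out into a pure function that is easy to test without a .docx file. This is a minimal sketch, assuming only the built-in `Heading 1` to `Heading 9` style names; `heading_level` and `number_levels` are hypothetical helper names, not part of python-docx. It keeps one counter per level, so all nine heading levels work:

```python
import re
from typing import Iterable, List

# Matches the built-in heading style names "Heading 1" ... "Heading 9" only
HEADING_RE = re.compile(r"Heading ([1-9])$")


def heading_level(style_name: str) -> int:
    """Return 1-9 for 'Heading N' style names and 0 for anything else."""
    m = HEADING_RE.match(style_name)
    return int(m.group(1)) if m else 0


def number_levels(levels: Iterable[int]) -> List[str]:
    """Turn heading levels (1-9) into labels such as '1.', '1.1.', '2.'."""
    counters = [0] * 10  # counters[1]..counters[9]; index 0 unused
    labels = []
    for level in levels:
        counters[level] += 1
        for deeper in range(level + 1, 10):
            counters[deeper] = 0  # a new heading restarts every deeper level
        labels.append("".join(f"{counters[i]}." for i in range(1, level + 1)))
    return labels


print(number_levels([1, 2, 2, 1, 2, 3]))
# ['1.', '1.1.', '1.2.', '2.', '2.1.', '2.1.1.']
```

With python-docx, the levels would come from `heading_level(p.style.name)` for each paragraph in `document.paragraphs`, skipping the ones that return 0.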

@yaleLeeNGA

The solution above does work. Still, is this feature planned to be added to the python-docx library?
@ppebay

@jfthuong

jfthuong commented Jan 17, 2020

@chdelfosse

Hi, 2 remarks:

  1. You could replace isItHeading.groups()[0] with isItHeading.group(1), which would be more elegant...
  2. You do not support the case with 5 or more levels of headings ;)
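Note that `group(0)` is the whole match, not the first captured group, so the drop-in replacement for `groups()[0]` is `group(1)`. A quick check with the standard `re` module:

```python
import re

# group(0) is the whole match; group(1) is the first parenthesized group
m = re.match(r"Heading ([1-9])", "Heading 2")
print(m.group(0))     # Heading 2
print(m.group(1))     # 2
print(m.groups()[0])  # 2, same as group(1)
```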

I have this function to help name the headings:

from typing import List

def get_heading_numbers(level: int, hierarchy: List[int]) -> str:
    """Return heading numbers crumbpath (level starts with '1', and not 0)"""
    # We need to fill-up indexes of 0 before the level, if needed
    # We clean-up elements after the level
    # Then join all elements with "." character
    index = level - 1
    for _ in range(len(hierarchy), index + 1):
        hierarchy.append(0)
    del hierarchy[index + 1 :]

    hierarchy[index] += 1
    # Skipped levels are stored as 0 but displayed as 1
    return ".".join(str(e or 1) for e in hierarchy)

hierarchy: List[int] = list()
print(get_heading_numbers(1, hierarchy))  # 1
print(get_heading_numbers(1, hierarchy))  # 2
print(get_heading_numbers(2, hierarchy))  # 2.1
print(get_heading_numbers(2, hierarchy))  # 2.2
print(get_heading_numbers(3, hierarchy))  # 2.2.1
print(get_heading_numbers(1, hierarchy))  # 3
print(get_heading_numbers(2, hierarchy))  # 3.1
print(get_heading_numbers(5, hierarchy))  # 3.1.1.1.1

@bushnerd
Great, it works. But it changes the content of the original docx.
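If the goal is only to read the numbers, the same counting can return (number, text) pairs and leave every paragraph untouched. A minimal sketch; `heading_numbers` is a hypothetical helper, it only looks at the built-in `Heading 1` to `Heading 9` style names, and it assumes headings are numbered purely by their position, not by the document's own list definitions in numbering.xml:

```python
import re
from typing import Iterable, List, Tuple


def heading_numbers(paragraphs: Iterable) -> List[Tuple[str, str]]:
    """Return (number, text) pairs for 'Heading 1'-'Heading 9' paragraphs.

    Each paragraph only needs `.style.name` and `.text` (as python-docx
    paragraphs have); nothing is modified.
    """
    counters = [0] * 10  # counters[1]..counters[9]; index 0 unused
    result = []
    for p in paragraphs:
        m = re.match(r"Heading ([1-9])$", p.style.name)
        if not m:
            continue
        level = int(m.group(1))
        counters[level] += 1
        counters[level + 1:] = [0] * (9 - level)  # restart deeper levels
        number = ".".join(str(c) for c in counters[1:level + 1])
        result.append((number, p.text))
    return result
```

With python-docx this would be called as `heading_numbers(docx.Document('filename.docx').paragraphs)`; nothing needs to be saved afterwards.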

@chdelfosse

so what? nobody said you had to use it

@beyond2002

Haven't found a solution yet. UP

@de-adshot

Any solutions for this? It's been more than 2 years.

@chdelfosse

chdelfosse commented May 15, 2021 via email

@disarticulate

So I've looked into this several times. The issue appears to be that the underlying XML is very convoluted in how apps like Word generate the actual heading numbers.

Here's a primer: http://officeopenxml.com/WPnumbering.php
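In that scheme a numbered paragraph (or its style) carries a `w:numPr` with a `w:numId` and a `w:ilvl`; the `w:num` with that `numId` points to a `w:abstractNum`, whose `w:lvl` entries hold the `w:lvlText` template (for example `%1.%2.`). A minimal sketch of that lookup using only the standard library, assuming the input is the `word/numbering.xml` part of the .docx and ignoring `w:lvlOverride`; `lvl_texts` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def lvl_texts(numbering_xml: Union[str, bytes], num_id: str) -> Dict[str, str]:
    """Map ilvl -> lvlText (e.g. '0' -> '%1.') for the list a numId points to."""
    root = ET.fromstring(numbering_xml)
    # w:num links a numId to an abstractNumId
    abstract_id = None
    for num in root.findall(f"{W}num"):
        if num.get(f"{W}numId") == num_id:
            abstract_id = num.find(f"{W}abstractNumId").get(f"{W}val")
            break
    if abstract_id is None:
        return {}
    # w:abstractNum holds one w:lvl per indentation level
    for abstract in root.findall(f"{W}abstractNum"):
        if abstract.get(f"{W}abstractNumId") == abstract_id:
            return {
                lvl.get(f"{W}ilvl"): lvl.find(f"{W}lvlText").get(f"{W}val")
                for lvl in abstract.findall(f"{W}lvl")
                if lvl.find(f"{W}lvlText") is not None
            }
    return {}
```

The bytes to pass in can be read with `zipfile.ZipFile('filename.docx').read('word/numbering.xml')`.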

It seems to be implemented in the JavaScript docx library here: https://docx.js.org/#/usage/numbering

Regardless, it's pretty difficult to parse, since it refers to document-level metadata (the numbering definitions): the number is not stored as shown, but is calculated by the application from the numbered paragraphs that precede it in the document.

While this would be a great feature, I can already see that someone would need to spend quite a bit of energy getting it to work.

@UchihaArk

I implemented a simple version of the function, but it does not support multi-level numbers such as 1.1 or 1.1.1:

from lxml import etree

def get_number_text(paragraph, num_id_map):
    numbering = paragraph.part.numbering_part.numbering_definitions._numbering
    root = etree.fromstring(numbering.xml)
    # Direct paragraph numbering overrides numbering inherited from the style
    p_pPr = paragraph.paragraph_format.element.pPr
    s_pPr = paragraph.style.paragraph_format.element.pPr
    if p_pPr is not None and p_pPr.numPr is not None:
        num_id = p_pPr.numPr.numId.val
    elif s_pPr is not None and s_pPr.numPr is not None:
        num_id = s_pPr.numPr.numId.val
    else:
        return
    if num_id in num_id_map:
        val = num_id_map[num_id] + 1
        # Heuristic: restart the counters of lists with a higher numId
        for key in num_id_map:
            if key > num_id:
                num_id_map[key] = 0
    else:
        val = 1

    num_id_map[num_id] = val
    abstractNumId = 0
    for num_data in numbering.num_lst:
        if num_id == num_data.numId:
            abstractNumId = num_data.abstractNumId.val
            break
    num_text = str(val)  # fallback if no lvlText is found
    abstract_nums = root.xpath(f'.//w:abstractNum[@w:abstractNumId="{abstractNumId}"]',
                               namespaces=root.nsmap)
    for abstract_num in abstract_nums:
        lvls_1 = abstract_num.xpath('.//w:lvl[@w:ilvl="0"]', namespaces=root.nsmap)
        if lvls_1:
            lvlText = lvls_1[0].xpath('./w:lvlText/@w:val', namespaces=root.nsmap)
            if lvlText:
                num_text = lvlText[0].replace("%1", str(val))
    paragraph.text = num_text + " " + paragraph.text
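`get_number_text` only fills in `%1`. Multi-level numbers such as 1.1 or 1.1.1 come from templates like `%1.%2.`, where each `%N` takes the current counter of level N-1. A minimal sketch of that substitution, assuming decimal numbering that starts at 1 (it ignores `w:start`, `w:numFmt`, `w:isLgl` and `w:lvlOverride`, and shows skipped levels as 0, which Word may render differently); `render_lvl_text` and `advance` are hypothetical helpers:

```python
import re
from typing import List


def render_lvl_text(lvl_text: str, counters: List[int]) -> str:
    """Fill a lvlText template such as '%1.%2.' from per-level counters.

    counters[0] is the current count at ilvl 0, counters[1] at ilvl 1, etc.,
    so %N is replaced by counters[N - 1]. Decimal format only.
    """
    return re.sub(r"%([1-9])", lambda m: str(counters[int(m.group(1)) - 1]), lvl_text)


def advance(counters: List[int], ilvl: int) -> List[int]:
    """Return new counters after a paragraph at ilvl: bump that level, drop deeper ones."""
    padded = counters + [0] * (ilvl + 1 - len(counters))
    padded[ilvl] += 1
    return padded[: ilvl + 1]


counters: List[int] = []
for ilvl, template in [(0, "%1."), (1, "%1.%2."), (1, "%1.%2."), (2, "%1.%2.%3."), (0, "%1.")]:
    counters = advance(counters, ilvl)
    print(render_lvl_text(template, counters))  # 1. / 1.1. / 1.2. / 1.2.1. / 2.
```

In a real document, `ilvl` would come from the paragraph's `w:numPr/w:ilvl` and the template from the matching `w:lvl` of the abstract numbering definition, with one counter list kept per `numId`.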

@chdelfosse

chdelfosse commented Jan 12, 2024 via email

@nguyendangson

It seems that python-docx has no function to extract heading numbers, so I created one. You can use it; see my GitHub: https://github.com/nguyendangson/extract_number_heading_python-docx
