Basic HTML Reader #254

Merged
merged 3 commits into from
May 30, 2014
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -4,7 +4,7 @@ This is the changelog between releases of PHPWord. Releases are listed in revers

## 0.11.0 - Not yet released

-This release marked the change of PHPWord license from LGPL 2.1 to LGPL 3. Four new elements were added: TextBox, ListItemRun, Field, and Line. Relative and absolute positioning for images and textboxes were added. Writer classes were refactored into parts, elements, and styles. ODT and RTF features were enhanced. Ability to add elements to PHPWord object via HTML were implemeted. RTF reader were initiated.
+This release marked the change of PHPWord license from LGPL 2.1 to LGPL 3. Four new elements were added: TextBox, ListItemRun, Field, and Line. Relative and absolute positioning for images and textboxes were added. Writer classes were refactored into parts, elements, and styles. ODT and RTF features were enhanced. Ability to add elements to PHPWord object via HTML were implemeted. RTF and HTML reader were initiated.

### Features

@@ -33,6 +33,7 @@ This release marked the change of PHPWord license from LGPL 2.1 to LGPL 3. Four
- RTF Reader: Basic RTF reader - @ivanlanin GH-72 GH-252
- Element: New `Line` element - @basjan GH-253
- Title: Ability to apply numbering in heading - @ivanlanin GH-193
+- HTML Reader: Basic HTML reader - @ivanlanin GH-80 GH-254

### Bugfixes

110 changes: 55 additions & 55 deletions docs/intro.rst
@@ -117,61 +117,61 @@ Writers
Readers
~~~~~~~

+---------------------------+----------------------+--------+-------+-------+
| Features | | DOCX | ODT | RTF |
+===========================+======================+========+=======+=======+
| **Document Properties** | Standard | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Custom | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| **Element Type** | Text | ✓ | ✓ | ✓ |
+---------------------------+----------------------+--------+-------+-------+
| | Text Run | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Title | ✓ | ✓ | |
+---------------------------+----------------------+--------+-------+-------+
| | Link | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Preserve Text | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Text Break | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Page Break | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | List | ✓ | ✓ | |
+---------------------------+----------------------+--------+-------+-------+
| | Table | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Image | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Object | | | |
+---------------------------+----------------------+--------+-------+-------+
| | Watermark | | | |
+---------------------------+----------------------+--------+-------+-------+
| | Table of Contents | | | |
+---------------------------+----------------------+--------+-------+-------+
| | Header | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Footer | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Footnote | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| | Endnote | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+
| **Graphs** | 2D basic graphs | | | |
+---------------------------+----------------------+--------+-------+-------+
| | 2D advanced graphs | | | |
+---------------------------+----------------------+--------+-------+-------+
| | 3D graphs | | | |
+---------------------------+----------------------+--------+-------+-------+
| **Math** | OMML support | | | |
+---------------------------+----------------------+--------+-------+-------+
| | MathML support | | | |
+---------------------------+----------------------+--------+-------+-------+
| **Bonus** | Encryption | | | |
+---------------------------+----------------------+--------+-------+-------+
| | Protection | | | |
+---------------------------+----------------------+--------+-------+-------+
+---------------------------+----------------------+--------+-------+-------+-------+
| Features | | DOCX | ODT | RTF | HTML |
+===========================+======================+========+=======+=======+=======+
| **Document Properties** | Standard | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Custom | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| **Element Type** | Text | ✓ | ✓ | ✓ | ✓ |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Text Run | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Title | ✓ | ✓ | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Link | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Preserve Text | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Text Break | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Page Break | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | List | ✓ | ✓ | | ✓ |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Table | ✓ | | | ✓ |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Image | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Object | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Watermark | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Table of Contents | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Header | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Footer | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Footnote | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Endnote | ✓ | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| **Graphs** | 2D basic graphs | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | 2D advanced graphs | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | 3D graphs | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| **Math** | OMML support | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | MathML support | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| **Bonus** | Encryption | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+
| | Protection | | | | |
+---------------------------+----------------------+--------+-------+-------+-------+

Contributing
------------
56 changes: 28 additions & 28 deletions docs/src/documentation.md
@@ -111,34 +111,34 @@ Below are the supported features for each file formats.

### Readers

| Features | | DOCX | ODT | RTF |
|-------------------------|--------------------|------|-----|-----|
| **Document Properties** | Standard | ✓ | | |
| | Custom | ✓ | | |
| **Element Type** | Text | ✓ | ✓ | ✓ |
| | Text Run | ✓ | | |
| | Title | ✓ | ✓ | |
| | Link | ✓ | | |
| | Preserve Text | ✓ | | |
| | Text Break | ✓ | | |
| | Page Break | ✓ | | |
| | List | ✓ | ✓ | |
| | Table | ✓ | | |
| | Image | ✓ | | |
| | Object | | | |
| | Watermark | | | |
| | Table of Contents | | | |
| | Header | ✓ | | |
| | Footer | ✓ | | |
| | Footnote | ✓ | | |
| | Endnote | ✓ | | |
| **Graphs** | 2D basic graphs | | | |
| | 2D advanced graphs | | | |
| | 3D graphs | | | |
| **Math** | OMML support | | | |
| | MathML support | | | |
| **Bonus** | Encryption | | | |
| | Protection | | | |
| Features | | DOCX | ODT | RTF | HTML|
|-------------------------|--------------------|------|-----|-----|-----|
| **Document Properties** | Standard | ✓ | | | |
| | Custom | ✓ | | | |
| **Element Type** | Text | ✓ | ✓ | ✓ | ✓ |
| | Text Run | ✓ | | | |
| | Title | ✓ | ✓ | | |
| | Link | ✓ | | | |
| | Preserve Text | ✓ | | | |
| | Text Break | ✓ | | | |
| | Page Break | ✓ | | | |
| | List | ✓ | ✓ | | ✓ |
| | Table | ✓ | | | ✓ |
| | Image | ✓ | | | |
| | Object | | | | |
| | Watermark | | | | |
| | Table of Contents | | | | |
| | Header | ✓ | | | |
| | Footer | ✓ | | | |
| | Footnote | ✓ | | | |
| | Endnote | ✓ | | | |
| **Graphs** | 2D basic graphs | | | | |
| | 2D advanced graphs | | | | |
| | 3D graphs | | | | |
| **Math** | OMML support | | | | |
| | MathML support | | | | |
| **Bonus** | Encryption | | | | |
| | Protection | | | | |

## Contributing

6 changes: 5 additions & 1 deletion samples/Sample_26_Html.php
@@ -8,7 +8,11 @@
$section = $phpWord->addSection();
$html = '<h1>Adding element via HTML</h1>';
$html .= '<p>Some well formed HTML snippet needs to be used</p>';
-$html .= '<p>With for example <strong>some <em>inline</em> formatting</strong></p>';
+$html .= '<p>With for example <strong>some<sup>1</sup> <em>inline</em> formatting</strong><sub>1</sub></p>';
+$html .= '<p>Unordered (bulleted) list:</p>';
+$html .= '<ul><li>Item 1</li><li>Item 2</li><ul><li>Item 2.1</li><li>Item 2.1</li></ul></ul>';
+$html .= '<p>Ordered (numbered) list:</p>';
+$html .= '<ol><li>Item 1</li><li>Item 2</li></ol>';

\PhpOffice\PhpWord\Shared\Html::addHtml($section, $html);

15 changes: 15 additions & 0 deletions samples/Sample_30_ReadHTML.php
@@ -0,0 +1,15 @@
<?php
include_once 'Sample_Header.php';

// Read contents
$name = basename(__FILE__, '.php');
$source = realpath(__DIR__ . "/resources/{$name}.html");

echo date('H:i:s'), " Reading contents from `{$source}`", EOL;
$phpWord = \PhpOffice\PhpWord\IOFactory::load($source, 'HTML');

// Save file
echo write($phpWord, basename(__FILE__, '.php'), $writers);
if (!CLI) {
include_once 'Sample_Footer.php';
}
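
The sample above locates its input by reusing the script's own file name under `resources/`. As a rough illustration of that naming convention only, the sketch below builds the same kind of path from an arbitrary script path. The helper name `resourcePathFor` is hypothetical and not part of this PR, and the sketch skips `realpath()` because that function returns `false` when the target file does not exist.

```php
<?php
/**
 * Hypothetical helper (not part of PHPWord): map a sample script path to the
 * matching file under its "resources" directory, mirroring the convention
 * used in Sample_30_ReadHTML.php.
 */
function resourcePathFor(string $scriptFile, string $extension): string
{
    // basename() with a suffix argument strips the trailing ".php"
    $name = basename($scriptFile, '.php');

    // Same layout as the sample: <script dir>/resources/<script name>.<ext>
    return dirname($scriptFile) . "/resources/{$name}.{$extension}";
}

echo resourcePathFor('/tmp/samples/Sample_30_ReadHTML.php', 'html'), PHP_EOL;
```

Keeping the resource name tied to the script name means a new sample only needs a matching file under `resources/`, with no path edits in the script itself.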
15 changes: 15 additions & 0 deletions samples/resources/Sample_30_ReadHTML.html
@@ -0,0 +1,15 @@
<html>
<head>
<meta charset="UTF-8" />
<title>PHPWord</title>
</head>
<body>
<h1>Adding element via HTML</h1>
<p>Some well formed HTML snippet needs to be used</p>
<p>With for example <strong>some<sup>1</sup> <em>inline</em> formatting</strong><sub>1</sub></p>
<p>Unordered (bulleted) list:</p>
<ul><li>Item 1</li><li>Item 2</li><ul><li>Item 2.1</li><li>Item 2.1</li></ul></ul>
<p>Ordered (numbered) list:</p>
<ol><li>Item 1</li><li>Item 2</li></ol>
</body>
</html>
2 changes: 1 addition & 1 deletion src/PhpWord/IOFactory.php
@@ -51,7 +51,7 @@ public static function createWriter(PhpWord $phpWord, $name = 'Word2007')
*/
public static function createReader($name = 'Word2007')
{
-if (!in_array($name, array('ReaderInterface', 'Word2007', 'ODText', 'RTF'))) {
+if (!in_array($name, array('ReaderInterface', 'Word2007', 'ODText', 'RTF', 'HTML'))) {
throw new Exception("\"{$name}\" is not a valid reader.");
}
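
The change above only extends the list of accepted reader names. A minimal standalone sketch of that allow-list check follows. The function `isValidReaderName` is a hypothetical stand-in rather than PHPWord's own code, and the strict third argument to `in_array()` is an addition the original line does not use.

```php
<?php
/**
 * Hypothetical stand-in for the name check in IOFactory::createReader().
 * The list of names is copied from the diff; the function itself is not
 * part of PHPWord.
 */
function isValidReaderName(string $name): bool
{
    $allowed = array('ReaderInterface', 'Word2007', 'ODText', 'RTF', 'HTML');

    // Strict comparison: names must match exactly, including letter case
    return in_array($name, $allowed, true);
}

var_dump(isValidReaderName('HTML')); // accepted after this PR
var_dump(isValidReaderName('html')); // rejected: the comparison is case-sensitive
```

Because the match is case-sensitive, a caller has to pass `'HTML'` exactly as listed, which is what the sample script in this PR does.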

50 changes: 50 additions & 0 deletions src/PhpWord/Reader/HTML.php
@@ -0,0 +1,50 @@
<?php
/**
* This file is part of PHPWord - A pure PHP library for reading and writing
* word processing documents.
*
* PHPWord is free software distributed under the terms of the GNU Lesser
* General Public License version 3 as published by the Free Software Foundation.
*
* For the full copyright and license information, please read the LICENSE
* file that was distributed with this source code. For the full list of
* contributors, visit https://github.com/PHPOffice/PHPWord/contributors.
*
* @link https://github.com/PHPOffice/PHPWord
* @copyright 2010-2014 PHPWord contributors
* @license http://www.gnu.org/licenses/lgpl.txt LGPL version 3
*/

namespace PhpOffice\PhpWord\Reader;

use PhpOffice\PhpWord\PhpWord;
use PhpOffice\PhpWord\Shared\Html as HTMLParser;

/**
* HTML Reader class
*
* @since 0.11.0
*/
class HTML extends AbstractReader implements ReaderInterface
{
/**
* Loads PhpWord from file
*
* @param string $docFile
* @throws \Exception
* @return \PhpOffice\PhpWord\PhpWord
*/
public function load($docFile)
{
$phpWord = new PhpWord();

if ($this->canRead($docFile)) {
$section = $phpWord->addSection();
HTMLParser::addHtml($section, file_get_contents($docFile), true);
} else {
throw new \Exception("Cannot read {$docFile}.");
}

return $phpWord;
}
}
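
The `load()` method above follows a check-then-delegate pattern: confirm the file can be read, pass its contents to the HTML parser, and throw otherwise. The sketch below reproduces only that control flow without depending on PHPWord. The function `loadHtmlFile`, its `$parse` callback, and the `is_file()`/`is_readable()` test are assumptions made for illustration; the real `canRead()` implementation lives in `AbstractReader` and is not shown in this diff.

```php
<?php
/**
 * Hypothetical stand-in for the control flow of Reader\HTML::load().
 * $parse replaces the call to Shared\Html::addHtml(); the readability test
 * is an assumed substitute for AbstractReader::canRead().
 */
function loadHtmlFile(string $docFile, callable $parse)
{
    if (!is_file($docFile) || !is_readable($docFile)) {
        // Same message shape as the reader in the diff
        throw new \Exception("Cannot read {$docFile}.");
    }

    // Hand the raw file contents to the parser callback
    return $parse(file_get_contents($docFile));
}

// Usage: a missing file surfaces as an exception the caller can handle
try {
    loadHtmlFile(__DIR__ . '/no-such-file.html', 'strlen');
} catch (\Exception $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Throwing on an unreadable path, as the reader does, leaves it to the caller to decide whether a missing input is fatal or can be skipped.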