The Fastest Word File Writer

Highlights

It is a fast, full-featured writer for the Word file format. It can also write RTF files, but I removed that function from the version published here. This version is a technology demonstration only; commercial use is not permitted.

Why is it fast?

  • First, we use GC Allocator to speed up memory allocation.
  • Second, we build a DOM tree whose internal data structures are similar to the Word file format. It provides powerful APIs and reduces the number of I/O operations.
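
The internals of GC Allocator are not shown on this page. As a rough sketch of why this style of allocation can be fast, assume it behaves like a region (arena) allocator: memory is carved out of large blocks by bumping a pointer, and everything is released at once when the arena goes away. The class name Arena and its members are hypothetical and are not wpio's API; the code is standard C++11, so it needs a newer compiler than Visual C++ 6.0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical arena; NOT the allocator that wpio actually uses.
class Arena {
public:
    explicit Arena(std::size_t blockSize = 64 * 1024) : blockSize_(blockSize) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns `size` bytes aligned to `align` (which must be a power of two).
    // There is no per-object free: all blocks are released by the destructor.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t p = alignUp(cur_, align);
        if (blocks_.empty() || p + size > end_) {
            // Current block is exhausted: start a new one that is large
            // enough for this request plus alignment padding.
            const std::size_t need = size + align;
            const std::size_t n = need > blockSize_ ? need : blockSize_;
            blocks_.push_back(
                std::unique_ptr<unsigned char[]>(new unsigned char[n]));
            cur_ = reinterpret_cast<std::uintptr_t>(blocks_.back().get());
            end_ = cur_ + n;
            p = alignUp(cur_, align);
        }
        cur_ = p + size;  // bump the pointer past the new allocation
        return reinterpret_cast<void*>(p);
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    static std::uintptr_t alignUp(std::uintptr_t v, std::size_t align) {
        return (v + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    }

    std::size_t blockSize_;
    std::uintptr_t cur_ = 0;
    std::uintptr_t end_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

In this sketch an allocation usually costs one addition and one comparison, and teardown costs one free per block rather than one per object, which is the kind of saving a writer that creates many small DOM nodes would benefit from.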

Full-featured

  • Yes. It can write all Word 2003 features, such as rich text, fonts/stylesheets, bullets/lists, headers/footers, footnotes/endnotes, fields, bookmarks, annotations, frames, tables, drawings (including pictures, text boxes, group shapes, OLE objects, etc.), revisions, and more.
  • You can choose to save as a Word file or as an RTF file (RTF output is not included in the published version). It is also easy to support a new file format, because our DOM tree also provides reading APIs to access all data.

A quick look

Our Word file writer is named "wpio" (wpio.dll on Windows, libwpio.so on Linux).

You can download our demo applications. Here is a "hello, world" demonstration:

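#pragma warning(disable: 4192)
#import "wpio.dll"

using namespace WpioLib;

#define outputPath(file)    L ## file

int main()
{
    // COM must be initialized before the writer object can be created.
    if (FAILED(CoInitialize(NULL)))
        return 1;
    {
        // Create the root writer object from its ProgID.
        WpioWriterPtr writer("Wpio.Writer");

        writer->NewWordDocfile(
            outputPath("testFirst.doc"), STGM_CREATE|STGM_READWRITE|STGM_SHARE_EXCLUSIVE);

        WpioDocument* doc = writer->Data;

        doc->NewNormalSection();
        doc->NewNormalParagraph();
        doc->NewNormalSpan();
        doc->AddText(L"Hello world!\x0d", 13); // 12 characters plus the paragraph mark

        writer->Save();
    }   // writer is released here, before COM is shut down
    CoUninitialize();
    return 0;
}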

Yes, that's all. Simple, isn't it? Let's try to run it:

  1. First, register wpio.dll. We provide register.bat; just double-click it to run the registration script. You can also type this at the command line: regsvr32 wpio.dll
  2. Second, create a project (Visual C++ 6.0 or higher) and build it.
  3. Last, double-click the demo application to run it. It creates a Word file named "testFirst.doc" that contains only the text "Hello world!".
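
The demo passes 13 as the second argument of AddText, which matches the 12 characters of "Hello world!" plus the trailing paragraph mark (\x0d). Assuming that argument is a character count (the page does not document the signature), a small helper can avoid counting by hand. textLength is a hypothetical name and is not part of wpio; the code is standard C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper, not part of wpio: returns the number of wide
// characters in `text`. A trailing paragraph mark (\x0d) is counted;
// the terminating null is not.
inline long textLength(const wchar_t* text) {
    assert(text != nullptr);
    return static_cast<long>(std::wcslen(text));
}
```

With it, the call in the demo could be written as doc->AddText(text, textLength(text)), where text points to L"Hello world!\x0d".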

A better COM

There is one important thing to explain about our demo. Notice this line:

WpioDocument* doc = writer->Data;

I did not write it like this (as you normally might):

WpioDocumentPtr doc = writer->Data;
// or: CComPtr<WpioDocument> doc = writer->Data;

COM is bad in one sense: you have to pay close attention to the reference counts of objects. A single mistake may cause a memory leak, and such leaks are harder to eliminate than those in an application that doesn't use COM.

GC Allocator makes things simple. We provide a DOM tree using COM technology, but only the root object, "CWpioWriter" (its ProgID is "Wpio.Writer"), has a reference count; none of the other objects own reference counts. They are allocated by GC Allocator and are destroyed automatically when the root object is destroyed. This is why the demo can hold the document in a plain WpioDocument* pointer.
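
The ownership model described above can be sketched in a few lines. This is only an illustration under stated assumptions: Root, Node, and create are hypothetical names, not wpio's classes, and a plain ownership list stands in for GC Allocator. Only the root is reference counted; children are handed out as plain pointers and are destroyed together with the root.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical types for illustration; these are NOT wpio's classes.
class Root {
public:
    Root() : refs_(1) {}  // the creator holds the first reference
    Root(const Root&) = delete;
    Root& operator=(const Root&) = delete;

    void AddRef() { ++refs_; }

    // Drops one reference and returns how many remain.
    // At zero the root deletes itself, and all of its children with it.
    unsigned long Release() {
        assert(refs_ > 0);
        const unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }

    // Creates a child owned by the root. The caller gets a plain pointer:
    // it never calls AddRef/Release on the child and must not use the
    // pointer after the root has been destroyed.
    template <class T, class... Args>
    T* create(Args&&... args) {
        // Reserve the bookkeeping slot first, so nothing leaks if the
        // vector has to grow or the constructor throws.
        owned_.push_back(Owned{nullptr, &destroy<T>});
        T* obj = new T(std::forward<Args>(args)...);
        owned_.back().ptr = obj;
        return obj;
    }

private:
    struct Owned {
        void* ptr;
        void (*destroy)(void*);
    };

    template <class T>
    static void destroy(void* p) { delete static_cast<T*>(p); }

    ~Root() {
        // Destroy children in reverse order of creation.
        for (auto it = owned_.rbegin(); it != owned_.rend(); ++it)
            it->destroy(it->ptr);
    }

    unsigned long refs_;
    std::vector<Owned> owned_;
};

// A child type that counts live instances so the cleanup can be observed.
struct Node {
    static int live;
    int value;
    explicit Node(int v) : value(v) { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;
```

The trade-off is the same one the text describes: client code cannot leak a child by forgetting a Release, but it also must not keep a child pointer longer than the root lives.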

About the download resources

Download from here: our demo applications (this version is a technology demonstration only; commercial use is not permitted).

Directory tree:

wpio
   |--- bin
   |      |--- input
   |      |        +--- bullet.jpg, test.png
   |      |
   |      |--- output
   |      |
   |      |--- wpio.dll, register.bat
   |      |--- testwpio.exe, testdw.exe
   |      +--- cppunit.dll, zlib1.dll 
   | 
   +--- test
           +--- testwpio
                     +--- testwpio.dsp

wpio.dll: Our Word file writer component.

register.bat: A script to register wpio.dll.

testwpio.exe: A demo application that uses wpio. Source code is also provided.

testdw.exe: A demo application that writes Word files without wpio, using the underlying non-COM components directly. It generates more than 100 Word files. Normally, each Word file tests one Word feature.

Performance

testwpio.exe has two performance test cases. You can run them, open the generated files in Microsoft Word, and save each one as another Word file. Feel free to compare the saving speed.

One is called "drawingPerformance". It generates a Word file with 32768 pages. Every page contains one drawing shape of a randomly chosen type. Here is the source code:

void drawingPerformance()
{
    srand( (int)time(NULL) );
 
    WpioWriterPtr writer("Wpio.Writer");
 
    writer->NewWordDocfile(
        outputPath("drawingPerformance.doc"), 
        STGM_CREATE|STGM_READWRITE|STGM_SHARE_EXCLUSIVE);
 
    WpioDocument* doc = writer->Data;
 
    doc->NewNormalSection();
 
    for (int i = 0; i < 0x8000; ++i)    // 0x8000 = 32768 pages
    {
        doc->NewNormalParagraph();
        doc->NewNormalSpan();

        // One shape per page, with a random shape type in the range 1..128.
        DgioShape* shape = doc->AddShape(FALSE);
        shape->ShapeType = (DgioShapeType)((rand() & 0x7f) + 1);
        // makeAnchor is not defined in this excerpt.
        shape->WpClientAnchor = makeAnchor(0x768, 0x8D4, 0x10EC, 0x1018);

        doc->AddChar(0x0c); // new page
    }
 
    doc->AddChar(0x0d);
 
    writer->Save();
}

The other test case is called "textPerformance". It generates 65536 paragraphs of text. Here is the source code:

void textPerformance()
{
    WpioWriterPtr writer("Wpio.Writer");
 
    writer->NewWordDocfile(
        outputPath("textPerformance.doc"), 
        STGM_CREATE|STGM_READWRITE|STGM_SHARE_EXCLUSIVE);
 
    WpioDocument* doc = writer->Data;
 
    doc->NewNormalSection();
 
    for (int i = 0; i < 0x10000; ++i)   // 0x10000 = 65536 paragraphs
    {
        doc->NewNormalParagraph();
        doc->NewNormalSpan();
        doc->AddText(L"Hello world!\x0d", 13); // 12 characters plus the paragraph mark
    }
 
    writer->Save();
}
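
To put a number on how long a test case takes to generate its file, a small timing helper is enough. This is a minimal sketch in standard C++11 (so it needs a newer compiler than Visual C++ 6.0); averageMs is a hypothetical name and is not part of testwpio.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `fn` `runs` times and returns the average
// wall-clock time per run, in milliseconds.
template <class Fn>
double averageMs(Fn fn, int runs = 1) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / runs;
}
```

For example, averageMs(textPerformance) would time one run of the test case above. It measures only generation with wpio; the time Microsoft Word needs to save the same document still has to be measured by hand.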
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License