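Although WordML, the XML generated from Word documents, is not a standard, the .doc converter Docvert first converts a document to OASIS OpenDocument, which is a standard, and from there can convert it to HTML, RSS, or any other XML format.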

Docvert is easy to integrate because it uses a simple REST-style interface. It is released under the LGPL, so although it is open source there are no legal problems with developing proprietary software on top of it. The XML it produces is more structured and easier to understand than WordML or the .DOC format.
Docvert builds upon the work of several word processors, such as Abiword and, soon, KOffice. The Docvert project is very grateful to benefit from their years of reverse-engineering work.
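
To give a feel for what integrating over a REST-style interface could look like, here is a minimal Python sketch that builds an HTTP POST carrying a .doc file as a multipart/form-data upload. It is an illustration only: the endpoint URL, the form field name `document`, and the helper `build_conversion_request` are all hypothetical and are not taken from Docvert's documentation, so check the real interface before relying on any of them.

```python
import uuid
import urllib.request


def build_conversion_request(endpoint_url, filename, file_bytes, field_name="document"):
    """Build (but do not send) a multipart/form-data POST carrying one file.

    endpoint_url and field_name are placeholders chosen for this sketch;
    they are NOT Docvert's documented API.
    """
    # A random boundary string separates the parts of the multipart body.
    boundary = uuid.uuid4().hex

    # Part header describing the uploaded file.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")

    # Closing boundary that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    body = head + file_bytes + tail

    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Under these assumptions, a caller would read the .doc file as bytes, pass them to `build_conversion_request` along with the address of its own Docvert installation, and send the result with `urllib.request.urlopen`. What the response contains (OpenDocument, HTML, or another format) depends on the server and is not something this sketch assumes.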