We ❤️ Open Source

A community education resource


5 elements to explore in ODT files

Use tools such as unzip and zipinfo to examine an OpenDocument Text word processing file and extract data from it.

Word processing files used to be closed, proprietary formats. In some older word processors, the document file was essentially a memory dump from the word processor. While this made for faster loading of the document into the word processor, it also made the document file format an opaque mess.

Around 2005, the Organization for the Advancement of Structured Information Standards (OASIS) defined an open format for office documents of all types, the Open Document Format for Office Applications (ODF). You may also see ODF referred to as simply “OpenDocument Format.” It is an open standard based on the OpenOffice.org XML file specification. ODF includes several file types, including ODT for OpenDocument Text documents. There’s a lot to explore in an ODT file, and it starts with a zip file.

The zip file structure

Like all ODF files, an ODT file is actually a set of XML documents and other files wrapped in a zip file container. Using zip means files take less room on disk, but it also means you can use standard zip tools to examine an ODF file.

I have an article about IT leadership called “Nibbled to death by ducks” that I saved as an ODT file. Since this is an ODF file, which is a zip file container, you can use the unzip command line tool to examine it:

$ unzip -l Ducks.odt 
Archive:  Ducks.odt
 Length      Date    Time    Name
---------  ---------- -----   ----
      39  05-09-2023 23:41   mimetype
   12713  05-09-2023 23:41   Thumbnails/thumbnail.png
  915001  05-09-2023 23:41   Pictures/10000201000004500000026DBF6636B0B9352031.png
   10879  05-09-2023 23:41   content.xml
   20048  05-09-2023 23:41   styles.xml
    9576  05-09-2023 23:41   settings.xml
     757  05-09-2023 23:41   meta.xml
     260  05-09-2023 23:41   manifest.rdf
       0  05-09-2023 23:41   Configurations2/accelerator/
       0  05-09-2023 23:41   Configurations2/toolpanel/
       0  05-09-2023 23:41   Configurations2/statusbar/
       0  05-09-2023 23:41   Configurations2/progressbar/
       0  05-09-2023 23:41   Configurations2/toolbar/
       0  05-09-2023 23:41   Configurations2/popupmenu/
       0  05-09-2023 23:41   Configurations2/floater/
       0  05-09-2023 23:41   Configurations2/menubar/
    1192  05-09-2023 23:41   META-INF/manifest.xml
---------                     -------
  970465                     17 files

If you have the zipinfo tool installed on your system, you can use that to see more details about the ODT file contents:

$ zipinfo Ducks.odt 
Archive:  Ducks.odt
Zip file size: 938232 bytes, number of entries: 17
-rw----     2.0 fat       39 b- stor 23-May-09 23:41 mimetype
-rw----     2.0 fat    12713 b- stor 23-May-09 23:41 Thumbnails/thumbnail.png
-rw----     2.0 fat   915001 b- stor 23-May-09 23:41 Pictures/10000201000004500000026DBF6636B0B9352031.png
-rw----     2.0 fat    10879 bl defN 23-May-09 23:41 content.xml
-rw----     2.0 fat    20048 bl defN 23-May-09 23:41 styles.xml
-rw----     2.0 fat     9576 bl defN 23-May-09 23:41 settings.xml
-rw----     2.0 fat      757 bl defN 23-May-09 23:41 meta.xml
-rw----     2.0 fat      260 bl defN 23-May-09 23:41 manifest.rdf
-rw----     2.0 fat        0 b- stor 23-May-09 23:41 Configurations2/accelerator/
-rw----     2.0 fat        0 b- stor 23-May-09 23:41 Configurations2/toolpanel/
-rw----     2.0 fat        0 b- stor 23-May-09 23:41 Configurations2/statusbar/
-rw----     2.0 fat        0 b- stor 23-May-09 23:41 Configurations2/progressbar/
-rw----     2.0 fat        0 b- stor 23-May-09 23:41 Configurations2/toolbar/
-rw----     2.0 fat        0 b- stor 23-May-09 23:41 Configurations2/popupmenu/
-rw----     2.0 fat        0 b- stor 23-May-09 23:41 Configurations2/floater/
-rw----     2.0 fat        0 b- stor 23-May-09 23:41 Configurations2/menubar/
-rw----     2.0 fat     1192 bl defN 23-May-09 23:41 META-INF/manifest.xml
17 files, 970465 bytes uncompressed, 936092 bytes compressed:  3.5%
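If you would rather inspect the archive from a script, the same listing can be approximated with the zipfile module in Python's standard library. This is a minimal sketch, not part of the original workflow: the list_entries function name is my own, and it assumes the ODT file is a readable zip archive.

```python
import zipfile


def list_entries(odt_path):
    """Return (name, size, compressed size, method) for each archive member."""
    methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(odt_path) as odt:
        return [
            (
                info.filename,
                info.file_size,
                info.compress_size,
                methods.get(info.compress_type, "other"),
            )
            for info in odt.infolist()
        ]


# Usage (assumes Ducks.odt is in the current directory):
# for name, size, packed, method in list_entries("Ducks.odt"):
#     print(f"{size:>9} {packed:>9} {method:<8} {name}")
```

The "stored" and "deflated" labels correspond to the stor and defN columns in the zipinfo output.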

Elements in the ODT zip file structure

I want to highlight a few elements of the zip file structure:

1. Note that the mimetype file is the first file in the archive, and it is stored uncompressed. That is part of the ODF packaging specification.

2. The mimetype file contains a single line that defines the ODF document. Programs that process ODT files, such as a word processor, can use this file to verify the MIME type of the document. For an ODT file, this should always be:

application/vnd.oasis.opendocument.text

3. The META-INF directory has a single manifest.xml file in it. This file contains all the information about where to find other components of the ODT file. Any program that reads ODT files starts with this file to locate everything else. For example, the manifest.xml file for my ODT document contains this line that defines where to find the main content:

<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>

4. The content.xml file contains the actual content of the document.

5. My document includes a single screenshot, which is contained in the Pictures directory.
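The first two points in that list can be checked from a script. The sketch below uses Python's standard zipfile module to confirm that mimetype is the first entry, that it is stored uncompressed, and that it holds the ODT MIME type. The check_mimetype function and the ODT_MIMETYPE constant are names I made up for illustration, and the check is only as strict as these three tests.

```python
import zipfile

# MIME type for OpenDocument Text files (39 bytes, matching the listing above)
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def check_mimetype(odt_path):
    """Return a list of problems with the mimetype entry (empty if none)."""
    problems = []
    with zipfile.ZipFile(odt_path) as odt:
        entries = odt.infolist()  # entries in archive order
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
            return problems
        first = entries[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        value = odt.read(first).decode("ascii", errors="replace")
        if value != ODT_MIMETYPE:
            problems.append(f"unexpected MIME type: {value!r}")
    return problems


# Usage (assumes Ducks.odt is in the current directory):
# print(check_mimetype("Ducks.odt") or "mimetype looks fine")
```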

Extracting files from an ODT file

Because the ODT document is just a zip file with a specific structure to it, you can extract files from it. You can start by unzipping the entire ODT file, such as with this unzip command:

$ unzip -q Ducks.odt -d Ducks

To do this in a file manager, rename the file by changing the .odt extension to a .zip extension. Most file managers will recognize the .zip extension and allow you to extract it with a menu action when you right-click on the file.
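A script can do the same extraction without renaming anything, because Python's zipfile module looks at the file contents and not the extension. Here is a small sketch; extract_odt is a hypothetical helper name, and the destination directory is whatever you pass in.

```python
import zipfile


def extract_odt(odt_path, dest_dir):
    """Unpack every member of the ODT archive into dest_dir; return the names."""
    with zipfile.ZipFile(odt_path) as odt:
        odt.extractall(dest_dir)  # creates dest_dir and subdirectories as needed
        return odt.namelist()


# Usage (assumes Ducks.odt is in the current directory):
# extract_odt("Ducks.odt", "Ducks")
```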

A colleague recently asked for a copy of the image that I included in my article. I was able to find the exact location of any embedded image by looking in the META-INF/manifest.xml file. The grep command can display any lines that describe an image:

$ cd Ducks
$ grep image META-INF/manifest.xml 
<manifest:file-entry manifest:full-path="Thumbnails/thumbnail.png" manifest:media-type="image/png"/>
<manifest:file-entry manifest:full-path="Pictures/10000201000004500000026DBF6636B0B9352031.png" manifest:media-type="image/png"/>

This tells me the image I’m looking for is saved in the Pictures folder. The small thumbnail image of the document is stored in Thumbnails/thumbnail.png. The full size image that I included in my article is saved in Pictures/10000201000004500000026DBF6636B0B9352031.png.
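The grep search treats the manifest as plain text. As an alternative, the sketch below parses the manifest as XML straight from the archive, without unzipping first, and returns every entry whose media type starts with image/. The find_images name is my own, and the namespace URI is the ODF manifest namespace, which does not appear in the excerpts above.

```python
import xml.etree.ElementTree as ET
import zipfile

# ODF manifest XML namespace (assumption: not shown in the excerpts above)
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def find_images(odt_path):
    """Return the full-path of every manifest entry with an image/* media type."""
    with zipfile.ZipFile(odt_path) as odt:
        root = ET.fromstring(odt.read("META-INF/manifest.xml"))
    images = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        media_type = entry.get(f"{{{MANIFEST_NS}}}media-type", "")
        if media_type.startswith("image/"):
            images.append(entry.get(f"{{{MANIFEST_NS}}}full-path"))
    return images


# Usage (assumes Ducks.odt is in the current directory):
# for path in find_images("Ducks.odt"):
#     print(path)
```

Each returned path can then be passed to ZipFile.extract to pull out just that one file.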

OpenDocument Format (ODF) files are an open file format that can describe word processing files (ODT), spreadsheet files (ODS), presentations (ODP), and other file types. Because ODF files are based on open standards, you can use other tools to examine them and even extract data from them. You just need to know where to start. All ODF files start with the META-INF/manifest.xml file, which is the “root” or “bootstrap” file for the rest of the ODF file format. Once you know where to look, you can find the rest of the content.

This article is adapted from “A look inside: ODT files” by Jim Hall, and is republished with permission from the author.

About the Author

Jim Hall is an open source software advocate and developer, best known for usability testing in GNOME and as the founder + project coordinator of FreeDOS. At work, Jim is CEO of Hallmentum, an IT executive consulting company that provides hands-on IT Leadership training, workshops, and coaching.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.
