
Document preparation

Your AI Assistant performs best when source material is clear and structured. Well-organized files with clear headings, readable text, and simple tables let the system index content precisely — so answers stay relevant and grounded in your actual material. This guide covers which file formats work best, what good documents look like, and how to prepare common problem areas — complex tables, charts, and diagrams — before uploading. Most documents need only minor adjustments; preparation effort scales with content complexity.


Preferred document types

These formats preserve text structure and give the most consistent results.

Format | Best for
PDF | Policies, whitepapers, case studies, long-form guides
DOCX | Product documentation, SOPs, FAQs, internal playbooks
PPTX | Decks with clear slide titles and bullet points
TXT | Transcripts, call notes, exported knowledge base articles
JPG / PNG | Screenshots and images of text-heavy content (reports, slides, signs)

Documents that work well with AI systems

Use these patterns to keep extraction clean and responses accurate.

Characteristics of a clean document

  • Headings create a clear hierarchy (H1 → H2 → H3)
  • Each section has a single topic and short paragraphs
  • Lists use bullets or numbers instead of dense blocks of text
  • Tables are simple and have one header row
  • Text is readable with strong contrast and minimal background noise

Clean, structured examples are the easiest for the AI to index. You can keep branding — just make sure the core text is clear and consistent.

✅ Good example — Clean report page with clear sections
Report page with a large headline and three-column sections

Large heading, consistent subheads, and short paragraphs make the structure easy to extract.

✅ Good example — Report page with a clean table
Report page showing a greenhouse gas emissions table with column headers

The table has a single header row and clear labels, with supporting text in short sections.

✅ Good example — Slide with one idea per slide
Presentation slide with a title and bullet list

One title and a short bullet list keep the content readable and easy to index.

✅ Good example — Structured Q&A with consistent formatting
Dark-themed page listing numbered questions and answers with sources

Consistent question-answer formatting and readable contrast help extraction even with a dark theme.


Documents that need preparation

The documents below need minor preparation before upload. Follow the steps to make them AI-ready.

Tables

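Tables lose structure when flattened into text. How you prepare a table depends on which problem it has: a complex layout (merged cells, multi-level headers) or a structured dataset that is too large, too wide, or formula-driven.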

Layout issues

Tables with merged cells or multi-level headers need reshaping to keep the meaning intact.

⚠️ Needs preparation — Merged cells and multi-row headings
Oracle event agenda with merged time cells spanning multiple session rows

Merged cells and nested rows can break time-to-session alignment. Flatten the structure first.

🔧 Preparation steps — complex tables

  1. Convert to bullet points when possible — list each row as a short sentence (e.g. "Session: Keynote | Time: 9:00 AM | Venue: Hall A").
  2. Write out dates and venues in full on every row — never rely on merged cells or implied context from a row above.
  3. Make every row self-contained — a reader (or the AI) should understand each row without reading the others.
  4. Remove extraneous content such as side navigation columns, banners, or decorative panels that can confuse extraction.
  5. If the table spans multiple pages, repeat the header row at the top of each page so column context is never lost.
  6. Add a short plain-text summary above or below the table describing what it covers (e.g. "This table lists all session times, speakers, and room assignments for Day 1 of the event.").
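If the table is available as a spreadsheet export, the flattening steps above can be scripted. The following is a minimal Python sketch, not a product feature: the column names and agenda rows are hypothetical sample data, and it assumes that a blank cell always means "same as the row above" (the usual result of unmerging a merged cell). Check the output by hand if blanks in your table can also mean "no value".

```python
# Hypothetical agenda rows as they might come out of a spreadsheet export.
# None marks a cell left blank because it sat under a merged cell.
HEADERS = ["Date", "Time", "Session", "Venue"]
RAW_ROWS = [
    ["12 March 2025", "9:00 AM", "Keynote", "Hall A"],
    [None, None, "Product roadmap", "Hall B"],
    [None, "11:00 AM", "Customer panel", None],
]


def fill_merged_cells(rows):
    """Copy the value from the row above into every blank cell."""
    filled = []
    previous = [None] * len(rows[0]) if rows else []
    for row in rows:
        current = [
            cell if cell is not None else previous[i]
            for i, cell in enumerate(row)
        ]
        filled.append(current)
        previous = current
    return filled


def row_to_sentence(headers, row):
    """Render one row as a self-contained 'Header: value | ...' line."""
    return " | ".join(f"{header}: {value}" for header, value in zip(headers, row))


flat_lines = [row_to_sentence(HEADERS, row) for row in fill_merged_cells(RAW_ROWS)]
for line in flat_lines:
    print("• " + line)
```

Each printed line repeats the date, time, and venue in full, so it can be read without the rows around it.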

Structured data tables

Structured data tables with embedded visuals, formula-driven values, or Gantt-style layouts are not suitable for direct upload. Export only the task data as a clean, flat table.

⚠️ Needs preparation — Gantt chart with merged cells and embedded visuals
Excel Gantt chart project planner with merged date columns and embedded bar charts

Embedded bar charts, merged date columns, and formula-driven progress bars do not extract cleanly. Export only the task data as a clean table.

✅ Good example — Clean flat table with consistent columns
CSAT data table with ID, Customer Name, Sentiment, CSAT Score, and Call Timestamp columns

A flat table with clear headers and consistent row data works well. Row-level details like IDs, names, and scores can be retrieved accurately.

  • Always pre-aggregate totals, averages, counts, and derived values before uploading. Do not rely on the AI to compute these from raw rows — results may be unreliable.
  • Limit rows and columns. Avoid wide tables or dense matrices.
  • Row-level lookups for specific records (IDs, names, values) are generally reliable, but validate them in the sandbox before using them in production.
  • Summarize key outcomes (totals, trends, decisions) in plain text alongside the table — this gives the AI a reliable reference without requiring calculation.
  • Split large workbooks into smaller, topic-specific summaries.
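One way to pre-aggregate before uploading is to compute the totals and averages yourself and paste the result next to the table as plain text. The sketch below uses only the Python standard library. The rows are hypothetical sample data loosely modeled on the CSAT table above, and the "out of 5" scale is an assumption for illustration.

```python
from statistics import mean

# Hypothetical CSAT rows; replace with your own exported data.
ROWS = [
    {"id": "C-001", "sentiment": "Positive", "csat_score": 5},
    {"id": "C-002", "sentiment": "Negative", "csat_score": 2},
    {"id": "C-003", "sentiment": "Positive", "csat_score": 4},
    {"id": "C-004", "sentiment": "Neutral", "csat_score": 3},
]


def summarize(rows):
    """Return plain-text summary lines with counts and averages precomputed."""
    scores = [row["csat_score"] for row in rows]

    # Count how many responses fall under each sentiment label.
    counts = {}
    for row in rows:
        counts[row["sentiment"]] = counts.get(row["sentiment"], 0) + 1

    lines = [
        f"Total responses: {len(rows)}",
        # The 5-point scale is an assumption of this sample data.
        f"Average CSAT score: {mean(scores):.2f} out of 5",
    ]
    for sentiment in sorted(counts):
        lines.append(f"{sentiment} responses: {counts[sentiment]}")
    return lines


summary_lines = summarize(ROWS)
print("\n".join(summary_lines))
```

Placing these lines above the table gives the AI a stated total and average to quote, rather than leaving it to derive them from raw rows.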

Charts and data visualizations

Charts are visual by nature. The AI needs the underlying numbers in text to interpret the chart correctly.

✅ Good example — Chart with values printed
Horizontal bar chart with values printed on each bar

Printing the numeric value on each bar helps, but a plain-text list is still best.

🔧 Preparation steps — charts

  1. Extract values into a short table or bullet list.
  2. Include units and time periods in the text (e.g. 2024 Q3, USD).
  3. Keep the chart image as optional visual context, but lead with the data list.
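If you have the numbers behind a chart, a few lines of Python can produce the data list described in the steps above. This is an illustrative sketch only: the chart title, unit, and values are made up, and the output format is one reasonable choice rather than a required one.

```python
# Hypothetical values read off a bar chart; replace with your own figures.
CHART_TITLE = "Revenue by quarter"
UNIT = "USD"
SERIES = [
    ("2024 Q1", 120000),
    ("2024 Q2", 135500),
    ("2024 Q3", 150250),
]


def chart_to_text(title, unit, series):
    """Return a titled bullet list that states period, value, and unit on every line."""
    lines = [f"{title} ({unit})"]
    for period, value in series:
        # Thousands separators keep large numbers readable in plain text.
        lines.append(f"• {period}: {value:,} {unit}")
    return lines


chart_lines = chart_to_text(CHART_TITLE, UNIT, SERIES)
print("\n".join(chart_lines))
```

Repeating the unit and time period on every line keeps each value meaningful even if the list is split during indexing.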

Technical diagrams and process flows

Diagrams encode meaning through layout and arrows. Provide a text version so the AI can follow the relationships.

⚠️ Needs preparation — Architecture diagram with icons and arrows
Network architecture diagram with icons and colored zones

Icons and arrows are visual relationships. Convert them into text steps or connections.

🔧 Preparation steps — diagrams

  1. List the main components and what each one does.
  2. Describe connections in a simple A -> B format.
  3. Group components by layer or zone (e.g. Edge, Core, Data).
  4. Add a short legend for icons or colors used in the diagram.
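For larger diagrams, it can help to keep the components and connections in a small data structure and generate the text description from it. The sketch below is a hypothetical example: the component names, zones, and roles are invented, and the layout of the output is only one way to apply the steps above.

```python
# Hypothetical components: name -> (zone, what it does).
COMPONENTS = {
    "Load balancer": ("Edge", "Distributes incoming requests"),
    "App server": ("Core", "Runs the application logic"),
    "Database": ("Data", "Stores customer records"),
}

# Hypothetical connections, each read as "source -> target".
CONNECTIONS = [
    ("Load balancer", "App server"),
    ("App server", "Database"),
]


def describe_diagram(components, connections):
    """Return text lines listing components grouped by zone, then connections."""
    # Group components by zone, keeping the order in which zones first appear.
    zones = {}
    for name, (zone, role) in components.items():
        zones.setdefault(zone, []).append(f"{name}: {role}")

    lines = ["Components by zone:"]
    for zone, entries in zones.items():
        lines.append(f"{zone}")
        for entry in entries:
            lines.append(f"  • {entry}")

    lines.append("Connections:")
    for source, target in connections:
        lines.append(f"  {source} -> {target}")
    return lines


diagram_lines = describe_diagram(COMPONENTS, CONNECTIONS)
print("\n".join(diagram_lines))
```

The generated text can be pasted directly below the diagram image, together with a short legend for any icons or colors.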

Pre-upload checklist

A summary of the preparation steps covered in this guide.

  1. Use headings and bullets — Keep each section short and focused
  2. Remove noise — Delete irrelevant pages, duplicate content, or filler text
  3. Flatten complex tables — One header row, no merged cells
  4. Turn charts into text — Provide the data values in a list or table
  5. Describe diagrams — Add a text summary of components and flows
  6. Check image quality — Ensure text is sharp, high-contrast, and readable
  7. Keep one topic per file — Split large multi-topic documents when possible
  8. Sanity check readability — If you can scan it quickly, the AI can too

Quick reference

✅ Great as-is | ⚠️ Needs preparation
Digital PDFs with clear headings | Scanned PDFs with skewed or faint text
DOCX with short sections and lists | Tables with merged cells or multi-row headers
PPTX with one idea per slide | Charts without printed values
Plain text transcripts | Technical diagrams without a text summary
Readable images with high-contrast text | Low-contrast images or patterned backgrounds

Validate in the sandbox

Use the ZipTier Sandbox (testing environment) to verify response quality after uploading your documents. If responses are not accurate, refine the source content using this guide and reupload the documents.