Evidence Submission and Mapping

What it is

Evidence submission is the process of uploading documents that demonstrate compliance with requirements. Evidence mapping is how the platform associates uploaded documents with specific requirements through AI-powered semantic analysis.

This guide covers best practices for preparing evidence, understanding the upload process, and maximizing the accuracy of automatic mapping.

When to use

Use this guide when:

  • Preparing documents for upload
  • Improving assessment accuracy
  • Understanding why certain requirements show as MISSING
  • Optimizing evidence organization

Prerequisites

Before starting, ensure you have:

  • Access to the portal interface
  • Evidence documents in supported formats
  • An understanding of the requirements you're addressing

Step-by-step

Step 1: Understand supported formats

The platform accepts these file types:

Format   | Extension          | Notes
---------|--------------------|------------------------------------------------------------------------------
PDF      | .pdf               | Best for formatted documents. Ensure text is selectable (not scanned images).
Word     | .docx              | Converted to text with formatting preserved.
Excel    | .xlsx              | Each sheet processed separately. Good for checklists and matrices.
CSV      | .csv               | Simple tabular data.
Text     | .txt               | Plain text files.
Images   | .png, .jpg, .jpeg  | Limited text extraction. Best for diagrams with OCR.
Archives | .zip               | Contents extracted and processed individually.

Step 2: Prepare your documents

For best results:

  1. Use descriptive filenames: Access-Control-Policy-v2.1.pdf is better than doc1.pdf
  2. Ensure text is extractable: For PDFs, verify you can select and copy text
  3. Remove password protection: Protected files cannot be processed
  4. Keep files under the size limit: The default upload limit is 100 MB (deployment-configured). (✅ Verified: server/routes.ts:88)
  5. Use native formats: Export from the source application where possible, and run OCR on scanned documents before upload
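
The checklist above can be partly automated before upload. The following is a minimal sketch, not part of the platform: the function name, the 8-character filename heuristic, and the reading of 100 MB as 100 × 1024 × 1024 bytes are assumptions made for illustration. It cannot tell whether a PDF has selectable text or is password protected, so those checks remain manual.

```python
from pathlib import Path

# Extensions taken from the supported-formats table in Step 1.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".csv", ".txt", ".png", ".jpg", ".jpeg", ".zip"}

# Default limit quoted in this guide; a deployment may configure a different value.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def preflight_issues(filename: str, size_bytes: int) -> list[str]:
    """Return human-readable problems for one candidate upload (empty list = looks fine)."""
    issues = []
    path = Path(filename)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        issues.append(f"unsupported extension: {path.suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        issues.append("file exceeds the assumed 100 MB upload limit")
    # Hypothetical heuristic: very short stems such as "doc1" say nothing about the content.
    if len(path.stem) < 8:
        issues.append("filename is not very descriptive")
    return issues
```

For example, preflight_issues("doc1.pdf", 10) flags only the filename, while a descriptive PDF under the limit returns an empty list.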

Step 3: Organize by topic

Structure your evidence strategically:

/Evidence Bundle/
├── Access-Management/
│   ├── Access-Control-Policy.pdf
│   ├── Privileged-Access-Procedure.docx
│   └── Access-Review-Records.xlsx
├── Security-Operations/
│   ├── Incident-Response-Plan.pdf
│   └── Security-Monitoring-Procedure.docx
└── Governance/
    ├── Information-Security-Policy.pdf
    └── Risk-Assessment-Report.pdf

You can upload the entire folder as a ZIP file, and the platform preserves the structure.
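
One way to build such a ZIP with the folder structure intact is sketched below using Python's standard zipfile module. The function name is hypothetical, and the sketch assumes the archive is written outside the folder being zipped (otherwise the archive would try to include itself).

```python
import zipfile
from pathlib import Path


def zip_evidence_folder(folder: Path, zip_path: Path) -> list[str]:
    """Zip every file under `folder`, keeping paths relative to the folder root."""
    written = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(folder.rglob("*")):
            if file_path.is_file():
                # as_posix() keeps forward slashes in entry names on every OS.
                arcname = file_path.relative_to(folder).as_posix()
                archive.write(file_path, arcname)
                written.append(arcname)
    return written
```

The returned list shows the entry names as stored in the archive, for example Governance/Information-Security-Policy.pdf, which is a quick way to confirm the topic folders survived.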

Step 4: Upload documents

  1. Navigate to the Portal (served at /cm in this repo)
  2. Click Upload or drag files to the document panel
  3. Monitor processing status for each file
  4. Wait for all documents to reach READY status

Step 5: Understand the processing pipeline

Each document goes through these stages:

Stage     | What happens
----------|--------------------------------------------------------------------------------
UPLOADED  | File received and stored securely
EXTRACTED | Text content pulled from file format
PARSED    | Content split into chunks (🧩 Template: chunk size is deployment-configured)
INDEXED   | Vector embeddings created for semantic search
READY     | Available for assessment runs
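
The stage order can be modeled as a small ordered enumeration, which is handy when scripting a "wait until everything is READY" check. This is an illustrative sketch only: the class and function names are hypothetical, the FAILED status described under Troubleshooting is left out, and nothing here reflects how the platform stores status internally.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Processing stages in the order listed in the table above."""
    UPLOADED = 1
    EXTRACTED = 2
    PARSED = 3
    INDEXED = 4
    READY = 5


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None once the document is READY."""
    return None if current is Stage.READY else Stage(current + 1)


def all_ready(statuses: dict[str, Stage]) -> bool:
    """True when there is at least one document and every document is READY."""
    return bool(statuses) and all(stage is Stage.READY for stage in statuses.values())
```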

Step 6: Verify successful processing

Check each document:

  1. Status shows READY (green indicator)
  2. Click the document to see extracted content preview
  3. Verify text was extracted correctly
  4. Note any extraction issues for re-upload

Step 7: Start an assessment

Once documents are READY:

  1. Click Run Assessment
  2. The AI evaluates each requirement against all indexed evidence
  3. For each requirement, the AI:
    • Searches for semantically relevant chunks
    • Evaluates if the evidence satisfies the requirement
    • Assigns a status (COMPLETE, PARTIAL, MISSING)
    • Provides citations and reasoning
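
To make the per-requirement loop concrete, here is a toy sketch of "search, evaluate, assign a status, cite". It is not the platform's method: word counts stand in for vector embeddings, and the 0.6 and 0.3 cutoffs are hypothetical placeholders for what is really an AI judgment. All names are invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def assess(requirement: str, chunks: dict[str, str], top_k: int = 2) -> dict:
    """Rank chunks against one requirement and derive a status from the best score."""
    query = vectorize(requirement)
    ranked = sorted(((cosine(query, vectorize(text)), cid) for cid, text in chunks.items()), reverse=True)
    best = ranked[0][0] if ranked else 0.0
    # Hypothetical thresholds; the platform's evaluation is an AI judgment, not a cutoff.
    status = "COMPLETE" if best >= 0.6 else "PARTIAL" if best >= 0.3 else "MISSING"
    citations = [cid for score, cid in ranked[:top_k] if score > 0]
    return {"status": status, "citations": citations}
```

Even in this simplified form, a requirement that shares no vocabulary with any chunk scores zero and comes back MISSING with no citations, which mirrors the terminology-mismatch case described under Troubleshooting.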

Example

Scenario: Uploading a ZIP bundle containing access management evidence.

Input:

AccessManagement-Bundle.zip/
├── Access-Control-Policy.pdf (15 pages)
├── Privileged-Access-Procedure.docx (8 pages)
└── Quarterly-Access-Review.xlsx (3 sheets)

Process:

  1. Upload ZIP → Three child documents created
  2. Each document processes independently → All reach READY (🧩 Template: timing varies by deployment)
  3. Run assessment → 10 access-related requirements evaluated

Result:

  • Access-Control-Policy.pdf cited for 6 requirements
  • Privileged-Access-Procedure.docx cited for 3 requirements
  • Quarterly-Access-Review.xlsx cited for 2 requirements (overlapping: these requirements are also cited by the documents above)
  • 1 requirement still MISSING (requires password policy documentation)

Troubleshooting

Document stuck in EXTRACTED status

Symptom: Processing stops after text extraction.

Causes and fixes:

  • Very large document: Split into smaller files
  • Unusual encoding: Re-export as UTF-8
  • System timeout: Wait and check if it eventually completes
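
For plain-text evidence (.txt or .csv), re-exporting as UTF-8 can be scripted. The sketch below assumes you know the file's current encoding; the function name is hypothetical, and a wrong guess may either raise UnicodeDecodeError or silently produce garbled characters, so check the output before uploading.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Rewrite a text file as UTF-8; `source_encoding` must be the file's real encoding."""
    # Decoding may raise UnicodeDecodeError if the encoding guess is wrong.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

A typical call would be reencode_to_utf8(Path("review.csv"), Path("review-utf8.csv"), "cp1252") for a file exported from an older Windows tool.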

No text extracted from PDF

Symptom: Document processes but shows empty content.

Causes and fixes:

  • Scanned image without OCR: Use Adobe Acrobat or similar to add text layer
  • PDF is image-only: Convert to text first using OCR software
  • Corrupted file: Re-export the PDF

Excel sheets not indexed

Symptom: Spreadsheet reaches READY but content not found in searches.

Causes and fixes:

  • Empty cells dominate: Ensure sheets have meaningful text content
  • Data is numeric only: Add text descriptions or headers
  • Hidden sheets: Unhide all sheets before upload
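
Hidden sheets can be spotted before upload without opening Excel, because an .xlsx file is a ZIP archive whose xl/workbook.xml part lists each sheet and its visibility state. The following sketch uses only the Python standard library; the function name is hypothetical, and it assumes the common (transitional) workbook namespace that Excel writes by default.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML workbook parts (transitional format).
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def hidden_sheets(xlsx_file) -> list[str]:
    """Return names of sheets marked hidden or veryHidden in an .xlsx workbook."""
    with zipfile.ZipFile(xlsx_file) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [
        sheet.get("name", "")
        for sheet in root.iter(f"{MAIN_NS}sheet")
        # A missing state attribute means the sheet is visible.
        if sheet.get("state", "visible") != "visible"
    ]
```

An empty result means every sheet is visible; any names returned are the sheets to unhide before upload.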

ZIP extraction fails

Symptom: ZIP upload fails or creates no child documents.

Causes and fixes:

  • Archive corrupted: Re-create the ZIP file
  • Nested too deeply: Maximum 5 levels of nesting
  • Too many files: Maximum 1000 files per ZIP
  • Size limit: Maximum 500 MB total extracted size
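
These limits can be checked locally before uploading. The sketch below reads only the archive's directory, so nothing is extracted. Two assumptions are made for illustration: "nesting" is interpreted as folder depth inside the archive, and 500 MB is read as 500 × 1024 × 1024 bytes. The function name is hypothetical. A corrupted archive makes zipfile.ZipFile raise zipfile.BadZipFile, which covers the first cause in the list.

```python
import zipfile

# Limits quoted in the list above.
MAX_FILES = 1000
MAX_EXTRACTED_BYTES = 500 * 1024 * 1024
MAX_DEPTH = 5


def zip_limit_issues(zip_file) -> list[str]:
    """Check an archive against the documented limits without extracting it."""
    with zipfile.ZipFile(zip_file) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]
    issues = []
    if len(files) > MAX_FILES:
        issues.append(f"{len(files)} files exceeds {MAX_FILES}")
    total = sum(info.file_size for info in files)  # uncompressed sizes
    if total > MAX_EXTRACTED_BYTES:
        issues.append(f"{total} extracted bytes exceeds {MAX_EXTRACTED_BYTES}")
    # Assumption: "nesting" means folder depth, counted as directories above the file.
    deepest = max((info.filename.count("/") for info in files), default=0)
    if deepest > MAX_DEPTH:
        issues.append(f"folder depth {deepest} exceeds {MAX_DEPTH}")
    return issues
```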

Evidence not found during assessment

Symptom: AI marks requirements as MISSING despite relevant evidence.

Causes and fixes:

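  • Terminology mismatch: Evidence uses different words than the requirement. Where possible, align the document's wording with the requirement's terms or add a short glossary.
  • Implicit compliance: Evidence implies rather than explicitly states compliance. Add an explicit statement of the control to the document.
  • Low similarity: Content is too tangential to match. Upload a document that addresses the requirement directly.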

Duplicate document warning

Symptom: Upload shows "duplicate detected" message.

Causes and fixes:

  • Exact duplicate: A file with the same SHA-256 hash already exists. This is expected.
  • Near-duplicate: A different file with the same content has a different hash, so the platform processes it as new.
  • Intentional re-upload: No action needed. The original remains indexed.
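
To predict whether an upload will be flagged, you can compare SHA-256 digests locally. This sketch uses Python's hashlib; it assumes the platform hashes the raw file bytes, which this guide implies but does not spell out, and the function name is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in blocks so large evidence files are not loaded into memory at once.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Two files with equal digests are byte-for-byte identical for practical purposes; changing even one byte produces a different digest.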

Document shows FAILED status

Symptom: Document processing failed with error.

Causes and fixes:

  • Unsupported format: Convert to a supported format
  • File corrupted: Re-export or re-download the original
  • Password protected: Remove protection before upload

Large document processing timeout

Symptom: Processing never completes for very large files.

Causes and fixes:

  • Split large documents into sections
  • Remove unnecessary pages (cover pages, blanks)
  • Use a more efficient format (DOCX instead of large PDF)

Gotchas and edge cases

  1. Duplicate detection uses SHA-256: Two files with identical content share the same hash, so the second one won't be re-indexed. Modify the file (even adding a space) to force reprocessing.

  2. Parent-child relationships: Files extracted from ZIPs have a parent. Deleting the parent ZIP deletes all extracted children.

  3. Image files have limited value: PNG/JPG files yield minimal text unless they contain text that OCR can extract.

  4. Assessment is point-in-time: Documents uploaded after a run completes are not included in that run's results. Start a new run to include new evidence.

  5. Chunk size affects matching: Very short documents may become a single chunk, while long documents create many. This affects how precisely the AI can cite evidence.

  6. Filename is metadata: The original filename is preserved and displayed, but content is what matters for matching.

  7. XLSX formulas are not evaluated: Only text values are extracted, so formula results may be missing from the extracted content when cells display formulas.