Projects are multi-file workspaces. Attach documents, spreadsheets, and slide decks to a project, and the backend runs the same analysis pipeline the Overten web app uses: fact extraction, reconciliation across files, and agentic verification of escalated findings. You poll the results via GET endpoints and (optionally) generate a .docx report that summarizes everything. There is no separate “start analysis” call. Uploading a file to a project triggers ingestion + analysis automatically.

What you get

  • Fact extraction per file — entities, numerical values, dates, named parties, claims.
  • Cross-file reconciliation — the pipeline compares new facts against the project’s existing ledger and surfaces discrepancies (conflicting values across files) and insights (notable single-file findings).
  • Verified decisions — high-confidence escalated items pass through an agentic verification stage with citations and recommendations.
  • A structured insights feed — one document per alert, filterable by severity, kind, source file, or resolution state.
  • Optional .docx report — a polished, human-readable rollup generated on demand via the same Word agent /word/generate uses.

Lifecycle

  1. POST /projects — create a project (name, description, tags).
  2. POST /projects/{id}/files — upload each file. Ingestion and analysis kick off automatically.
  3. GET /projects/{id}/analysis/status — poll until the pipeline reaches a terminal status (completed or failed).
  4. GET /projects/{id}/insights — read the results. Filter as needed.
  5. PATCH /projects/{id}/insights/{insight_id} — mark items resolved.
  6. POST /projects/{id}/reports — (optional) generate a .docx insights report.

Quickstart

import requests, time

API = "https://backend.overtenai.com/api/v1"
KEY = "sk_live_..."
H = {"X-API-Key": KEY}

# 1. Create a project
project = requests.post(
    f"{API}/projects",
    headers=H,
    json={"name": "Acme Q3 Due Diligence", "tags": ["dd", "acme"]},
).json()
pid = project["project_id"]

# 2. Attach files. Every upload triggers ingestion + analysis.
for path in ["report.pdf", "financials.xlsx", "contract.docx"]:
    with open(path, "rb") as f:
        requests.post(
            f"{API}/projects/{pid}/files",
            headers=H,
            files={"file": f},
            data={"role": "primary"},
        ).raise_for_status()

# 3. Poll status until a terminal phase (completed or failed)
while True:
    s = requests.get(f"{API}/projects/{pid}/analysis/status", headers=H).json()
    print(f"phase={s['pipeline']['phase']} progress={s['progress']}%")
    if s["pipeline"]["status"] in ("completed", "failed"):
        break
    time.sleep(10)
if s["pipeline"]["status"] == "failed":
    raise RuntimeError(s["pipeline"].get("error_message", "analysis failed"))

# 4. Read insights
insights = requests.get(
    f"{API}/projects/{pid}/insights",
    headers=H,
    params={"severity": "critical,high", "limit": 50},
).json()
for item in insights["items"]:
    print(item["alert"]["description"])

# 5. Generate a .docx report
report = requests.post(
    f"{API}/projects/{pid}/reports",
    headers=H,
    json={"title": "Acme Due Diligence — Key Findings"},
).json()
print(report["download_url"])

Files and roles

POST /projects/{id}/files takes one file per call (multipart):
  • file (required): the binary payload.
  • role (optional, default context): primary, context, or reference.
  • category (optional): one of document, spreadsheet, presentation, image, cv_resume, contract, report, research, other.
  • description (optional): up to 1000 characters.
  • tags (optional): a JSON array string, e.g. '["audit", "q3"]'.
The role is a hint to the analysis pipeline: primary files carry the authoritative claims; context adds supporting material; reference is for appendices / lookups. Allowed MIME types: PDF, DOCX, XLSX, PPTX, CSV, TSV, TXT, JSON, XML, legacy Office formats, and common image types (PNG, JPEG, WebP).
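
The tags field trips people up because it must be a JSON array encoded as a string, not a repeated form field. A small helper can assemble the optional fields before the upload call (the build_file_fields name is ours, not part of the API):

```python
import json

def build_file_fields(role="context", category=None, description=None, tags=None):
    """Assemble the optional multipart form fields for POST /projects/{id}/files."""
    data = {"role": role}
    if category is not None:
        data["category"] = category
    if description is not None:
        data["description"] = description[:1000]  # server caps this at 1000 chars
    if tags is not None:
        data["tags"] = json.dumps(tags)  # must be a JSON array *string*
    return data

fields = build_file_fields(
    role="primary",
    category="spreadsheet",
    description="Q3 consolidated financials",
    tags=["audit", "q3"],
)
# requests.post(f"{API}/projects/{pid}/files", headers=H,
#               files={"file": open("financials.xlsx", "rb")}, data=fields)
```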

Polling status

r = requests.get(f"{API}/projects/{pid}/analysis/status", headers=H).json()
Response shape:
{
  "project_id": "proj_abc...",
  "pipeline": {
    "status": "running",
    "phase": "reconciling_facts",
    "analysis_id": "an_...",
    "updated_at": "2026-04-19T18:24:00+00:00",
    "reports_created": 14,
    "disparities_found": 3,
    "insights_found": 11
  },
  "progress": 62,
  "files": {
    "total": 3,
    "indexed": 3,
    "analyzed": 2,
    "failed": 0,
    "in_progress": 1,
    "phase_counts": {"verifying_decisions": 1}
  },
  "insights": {
    "total": 14,
    "by_severity": {"critical": 1, "high": 4, "medium": 7, "low": 2},
    "by_kind": {"disparity": 3, "insight": 11},
    "resolved": 0
  }
}

Phases

The pipeline moves through these phases (pipeline-level, surfaced in pipeline.phase):
  • extracting_facts — per-file fact extraction (LLM + NER).
  • reconciling_facts — comparing against the project’s fact ledger.
  • verifying_decisions — agentic verification of escalated items.
  • completed — terminal success.
  • failed — terminal failure (see pipeline.error_message).
Per-file phases in the same set appear in files.phase_counts.
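
Since failed is terminal too, any polling loop should treat both completed and failed as exit conditions. A minimal sketch, assuming the status shape documented above (wait_for_analysis is our helper; it takes a callable so it is easy to test, and the comment shows how to wire it to requests):

```python
import time

def wait_for_analysis(get_status, poll_s=10, max_polls=360):
    """Poll until the pipeline reaches a terminal status; raise on failure.

    `get_status` should return the parsed JSON from
    GET /projects/{id}/analysis/status.
    """
    for _ in range(max_polls):
        s = get_status()
        if s["pipeline"]["status"] == "completed":
            return s
        if s["pipeline"]["status"] == "failed":
            raise RuntimeError(s["pipeline"].get("error_message") or "analysis failed")
        time.sleep(poll_s)
    raise TimeoutError("analysis did not reach a terminal state")

# Usage with requests:
# status = wait_for_analysis(
#     lambda: requests.get(f"{API}/projects/{pid}/analysis/status", headers=H).json()
# )
```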

Filtering insights

Query parameters on GET /projects/{id}/insights:
  • limit (int): 1–200, default 50.
  • cursor (string): opaque pagination cursor, returned as next_cursor.
  • severity (string): comma-separated list of critical,high,medium,low.
  • alert_kind (string): comma-separated list of disparity,insight,similarity.
  • file_id (string): comma-separated list of source file_ids to restrict to.
  • is_resolved (bool): true or false.
Example: only unresolved critical or high findings —
GET /projects/{id}/insights?severity=critical,high&is_resolved=false
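
Because results are paginated via cursor / next_cursor, pulling everything means looping until the cursor runs out. A minimal pagination sketch (the iter_insights name is ours; it takes a page-fetching callable so the loop logic stays testable):

```python
def iter_insights(get_page, **filters):
    """Yield every matching insight, following next_cursor until exhausted."""
    cursor = None
    while True:
        params = {**filters}
        if cursor:
            params["cursor"] = cursor
        page = get_page(params)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            return

# Usage with requests:
# items = list(iter_insights(
#     lambda p: requests.get(f"{API}/projects/{pid}/insights",
#                            headers=H, params=p).json(),
#     severity="critical,high", is_resolved="false", limit=200,
# ))
```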

Insight shape

{
  "insight_id": "ins_...",
  "alert_kind": "disparity",
  "is_resolved": false,
  "provisional": false,
  "created_at": "2026-04-19T18:20:00+00:00",
  "alert": {
    "description": "Quarterly revenue differs across two sources.",
    "type": "numerical_disparity",
    "field": "q3_revenue",
    "severity": "critical",
    "current_value": "12,400,000",
    "conflicting_value": "11,800,000",
    "explanation": "...",
    "citations": [...],
    "recommendations": ["Reconcile with finance before publishing."]
  },
  "source": {
    "file_id": "file_...",
    "file_name": "financials.xlsx",
    "sheet_name": "Summary",
    "page_number": null,
    "slide_number": null,
    "chunk_id": "..."
  }
}
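
The source block tells you where a finding came from; which of sheet_name, page_number, or slide_number is set depends on the file type. A small formatter illustrating how to read it (source_label is our name, and the precedence order is an assumption):

```python
def source_label(source):
    """Human-readable location string for an insight's `source` block."""
    name = source["file_name"]
    if source.get("sheet_name"):
        return f'{name}, sheet "{source["sheet_name"]}"'
    if source.get("page_number") is not None:
        return f"{name}, page {source['page_number']}"
    if source.get("slide_number") is not None:
        return f"{name}, slide {source['slide_number']}"
    return name

src = {"file_id": "file_1", "file_name": "financials.xlsx",
       "sheet_name": "Summary", "page_number": None, "slide_number": None}
print(source_label(src))  # financials.xlsx, sheet "Summary"
```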

Marking insights resolved

requests.patch(
    f"{API}/projects/{pid}/insights/{insight_id}",
    headers=H,
    json={"is_resolved": True, "resolution_note": "Confirmed by finance."},
).raise_for_status()
Resolved state is shared with the Overten web app — if someone resolves an insight in the UI, it’s resolved for your API calls too, and vice versa.
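
If you triage in batches, e.g. after a review meeting, you can sweep unresolved items in one pass. A sketch (resolve_all is our helper, not an endpoint; the patch callable wraps the PATCH call above):

```python
def resolve_all(items, patch, note):
    """Resolve every unresolved insight in `items` via a patch callable."""
    done = []
    for item in items:
        if item["is_resolved"]:
            continue  # leave already-resolved items untouched
        patch(item["insight_id"], {"is_resolved": True, "resolution_note": note})
        done.append(item["insight_id"])
    return done

# Usage with requests:
# resolve_all(
#     insights["items"],
#     lambda iid, body: requests.patch(
#         f"{API}/projects/{pid}/insights/{iid}", headers=H, json=body
#     ).raise_for_status(),
#     "Reviewed in weekly triage.",
# )
```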

Generating a report

POST /projects/{id}/reports produces a .docx built from the project’s insights, using the same Word agent /word/generate uses. By default the call runs synchronously and returns the download URL inline.
report = requests.post(
    f"{API}/projects/{pid}/reports",
    headers=H,
    json={
        "title": "Acme Q3 Due Diligence — Findings",
        "include_severity": ["critical", "high", "medium"],
        "include_resolved": False,
    },
).json()
# → {"run_id": "...", "status": "completed", "download_url": "...", "file_name": "insights_report.docx", ...}
Request body fields:
  • format (default "docx"): only .docx is supported today.
  • title (default "<project name> — Insights Report"): overrides the header shown in the doc.
  • include_severity (default all): array of critical|high|medium|low.
  • include_alert_kind (default all): array of disparity|insight|similarity.
  • include_resolved (default false): whether to include items already marked resolved.
  • file_ids (default all): restrict to insights whose source file is in this list.
  • async (default false): enqueue and return 202 instead of blocking.
  • webhook_url (no default): called on completion when async: true.
The response shape matches POST /word/generate — same run_id, download_url, edit_url, summary, credits_used, etc. — so you can reuse your existing Word-generation handling. 409 no_insights is returned if the filter produces zero matching insights.
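
For large projects the async flow avoids holding a connection open: POST with async: true, then poll GET /projects/{id}/reports/{run_id} until the run is terminal. A sketch (generate_report is our wrapper; the two callables let you plug in requests):

```python
import time

def generate_report(post_report, get_run, body, poll_s=5, max_polls=120):
    """Enqueue an async report, then poll the run until it finishes."""
    run = post_report({**body, "async": True})
    for _ in range(max_polls):
        if run["status"] in ("completed", "failed"):
            return run
        time.sleep(poll_s)
        run = get_run(run["run_id"])
    raise TimeoutError("report generation did not finish")

# Usage with requests:
# run = generate_report(
#     lambda b: requests.post(f"{API}/projects/{pid}/reports",
#                             headers=H, json=b).json(),
#     lambda rid: requests.get(f"{API}/projects/{pid}/reports/{rid}",
#                              headers=H).json(),
#     {"title": "Findings", "include_severity": ["critical", "high"]},
# )
```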

Listing past reports

requests.get(f"{API}/projects/{pid}/reports", headers=H).json()
Returns reports previously generated for this project, newest first, paginated via next_cursor.

Fetching a specific report

requests.get(f"{API}/projects/{pid}/reports/{run_id}", headers=H).json()
Always returns a fresh 24-hour signed download_url, so you don’t need to store it — re-fetch when the URL expires.
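
Since the signed URL expires after 24 hours, re-fetch the run right before downloading rather than caching the URL. A sketch (fresh_download_url is our helper):

```python
def fresh_download_url(get_run, run_id):
    """Re-fetch the report record to get a currently valid signed URL."""
    return get_run(run_id)["download_url"]

# Usage with requests:
# url = fresh_download_url(
#     lambda rid: requests.get(f"{API}/projects/{pid}/reports/{rid}",
#                              headers=H).json(),
#     run_id,
# )
# with open("insights_report.docx", "wb") as f:
#     f.write(requests.get(url).content)
```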

Archiving

DELETE /projects/{id} soft-archives the project. Files and insights are preserved; the project just stops appearing in default listings. Pass ?include_archived=true on GET /projects to see archived ones.

Scope and visibility

The same scope rules as the rest of the API apply:
  • Personal keys see projects whose owner matches the key’s user.
  • Workspace keys with canViewAllFiles see every project in the org. Without that permission, members only see projects they own.
If you try to read or mutate a project that’s out of scope, you’ll get 404 not_found — we don’t leak existence across tenants.

Credits

Report generation charges Word credits on the org’s balance, identical to a POST /word/generate call. The per-file extraction and reconciliation stages run on the internal pipeline and are logged through the same LLM usage tracking the web app uses, so org-level usage rolls up automatically. Every attach call and every GET is free.