Projects are multi-file workspaces. Attach documents, spreadsheets, and slide decks to a project, and the backend runs the same analysis pipeline the Overten web app uses: fact extraction, reconciliation across files, and agentic verification of escalated findings. You poll the results via GET endpoints and (optionally) generate a .docx report that summarizes everything. There is no separate “start analysis” call. Uploading a file to a project triggers ingestion + analysis automatically.

What you get

  • Fact extraction per file — entities, numerical values, dates, named parties, claims.
  • Cross-file reconciliation — the pipeline compares new facts against the project’s existing ledger and surfaces discrepancies (conflicting values across files) and insights (notable single-file findings).
  • Verified decisions — high-confidence escalated items pass through an agentic verification stage with citations and recommendations.
  • A structured insights feed — one document per alert, filterable by severity, kind, source file, or resolution state.
  • Optional .docx report — a polished, human-readable rollup generated on demand via the same Word agent /word/generate uses.

Lifecycle

  1. POST /projects — create a project (name, description, tags).
  2. POST /projects/{id}/files — upload each file. Ingestion and analysis kick off automatically.
  3. GET /projects/{id}/analysis/status — poll until the pipeline reaches a terminal status (completed or failed).
  4. GET /projects/{id}/insights — read the results. Filter as needed.
  5. PATCH /projects/{id}/insights/{insight_id} — mark items resolved.
  6. POST /projects/{id}/reports — (optional) generate a .docx insights report.

Quickstart

import requests, time

API = "https://backend.overtenai.com/api/v1"
KEY = "sk_live_..."
H = {"X-API-Key": KEY}

# 1. Create a project
project = requests.post(
    f"{API}/projects",
    headers=H,
    json={"name": "Acme Q3 Due Diligence", "tags": ["dd", "acme"]},
).json()
pid = project["project_id"]

# 2. Attach files. Every upload triggers ingestion + analysis.
for path in ["report.pdf", "financials.xlsx", "contract.docx"]:
    with open(path, "rb") as f:
        requests.post(
            f"{API}/projects/{pid}/files",
            headers=H,
            files={"file": f},
            data={"role": "primary"},
        ).raise_for_status()

# 3. Poll status until a terminal phase (completed or failed)
while True:
    s = requests.get(f"{API}/projects/{pid}/analysis/status", headers=H).json()
    print(f"phase={s['pipeline']['phase']} progress={s['progress']}%")
    if s["pipeline"]["status"] in ("completed", "failed"):
        break
    time.sleep(10)
if s["pipeline"]["status"] == "failed":
    raise RuntimeError(s["pipeline"].get("error_message", "analysis failed"))

# 4. Read insights
insights = requests.get(
    f"{API}/projects/{pid}/insights",
    headers=H,
    params={"severity": "critical,high", "limit": 50},
).json()
for item in insights["items"]:
    print(item["alert"]["description"])

# 5. Generate a .docx report
report = requests.post(
    f"{API}/projects/{pid}/reports",
    headers=H,
    json={"title": "Acme Due Diligence — Key Findings"},
).json()
print(report["download_url"])

Files and roles

POST /projects/{id}/files takes one file per call (multipart):
  • file (required): the binary payload.
  • role (optional, default context): primary, context, or reference.
  • category (optional): one of document, spreadsheet, presentation, image, cv_resume, contract, report, research, other.
  • description (optional): up to 1000 characters.
  • tags (optional): a JSON array string, e.g. '["audit", "q3"]'.
The role is a hint to the analysis pipeline: primary files carry the authoritative claims; context adds supporting material; reference is for appendices / lookups. Allowed MIME types: PDF, DOCX, XLSX, PPTX, CSV, TSV, TXT, JSON, XML, legacy Office formats, and common image types (PNG, JPEG, WebP).
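
The tags field trips people up because it must be a JSON array encoded as a string, not a repeated form field. A small helper can assemble the optional fields before the upload call (the build_file_fields name is ours, not part of the API):

```python
import json

def build_file_fields(role="context", category=None, description=None, tags=None):
    """Assemble the optional multipart form fields for POST /projects/{id}/files."""
    data = {"role": role}
    if category is not None:
        data["category"] = category
    if description is not None:
        data["description"] = description[:1000]  # server caps this at 1000 chars
    if tags is not None:
        data["tags"] = json.dumps(tags)  # must be a JSON array *string*
    return data

fields = build_file_fields(
    role="primary",
    category="spreadsheet",
    description="Q3 consolidated financials",
    tags=["audit", "q3"],
)
# requests.post(f"{API}/projects/{pid}/files", headers=H,
#               files={"file": open("financials.xlsx", "rb")}, data=fields)
```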

Polling status

r = requests.get(f"{API}/projects/{pid}/analysis/status", headers=H).json()
Response shape:
{
  "project_id": "proj_abc...",
  "pipeline": {
    "status": "running",
    "phase": "reconciling_facts",
    "analysis_id": "an_...",
    "updated_at": "2026-04-19T18:24:00+00:00",
    "reports_created": 14,
    "disparities_found": 3,
    "insights_found": 11
  },
  "progress": 62,
  "files": {
    "total": 3,
    "indexed": 3,
    "analyzed": 2,
    "failed": 0,
    "in_progress": 1,
    "phase_counts": {"verifying_decisions": 1}
  },
  "insights": {
    "total": 14,
    "by_severity": {"critical": 1, "high": 4, "medium": 7, "low": 2},
    "by_kind": {"disparity": 3, "insight": 11},
    "resolved": 0
  }
}

Phases

The pipeline moves through these phases (pipeline-level, surfaced in pipeline.phase):
  • extracting_facts — per-file fact extraction (LLM + NER).
  • reconciling_facts — comparing against the project’s fact ledger.
  • verifying_decisions — agentic verification of escalated items.
  • completed — terminal success.
  • failed — terminal failure (see pipeline.error_message).
Per-file phases in the same set appear in files.phase_counts.
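
Since failed is terminal too, any polling loop should treat both completed and failed as exit conditions. A minimal sketch, assuming the status shape documented above (wait_for_analysis is our helper; it takes a callable so it is easy to test, and the comment shows how to wire it to requests):

```python
import time

def wait_for_analysis(get_status, poll_s=10, max_polls=360):
    """Poll until the pipeline reaches a terminal status; raise on failure.

    `get_status` should return the parsed JSON from
    GET /projects/{id}/analysis/status.
    """
    for _ in range(max_polls):
        s = get_status()
        if s["pipeline"]["status"] == "completed":
            return s
        if s["pipeline"]["status"] == "failed":
            raise RuntimeError(s["pipeline"].get("error_message") or "analysis failed")
        time.sleep(poll_s)
    raise TimeoutError("analysis did not reach a terminal state")

# Usage with requests:
# status = wait_for_analysis(
#     lambda: requests.get(f"{API}/projects/{pid}/analysis/status", headers=H).json()
# )
```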

Filtering insights

Query parameters on GET /projects/{id}/insights:
  • limit (int): 1–200, default 50.
  • cursor (string): opaque pagination cursor, returned as next_cursor.
  • severity (string): comma-separated list of critical,high,medium,low.
  • alert_kind (string): comma-separated list of disparity,insight,similarity.
  • file_id (string): comma-separated list of source file_ids to restrict to.
  • is_resolved (bool): true or false.
Example: only unresolved critical or high findings —
GET /projects/{id}/insights?severity=critical,high&is_resolved=false
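
Because results are paginated via cursor / next_cursor, pulling everything means looping until the cursor runs out. A minimal pagination sketch (the iter_insights name is ours; it takes a page-fetching callable so the loop logic stays testable):

```python
def iter_insights(get_page, **filters):
    """Yield every matching insight, following next_cursor until exhausted."""
    cursor = None
    while True:
        params = {**filters}
        if cursor:
            params["cursor"] = cursor
        page = get_page(params)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            return

# Usage with requests:
# items = list(iter_insights(
#     lambda p: requests.get(f"{API}/projects/{pid}/insights",
#                            headers=H, params=p).json(),
#     severity="critical,high", is_resolved="false", limit=200,
# ))
```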

Insight shape

{
  "insight_id": "ins_...",
  "alert_kind": "disparity",
  "is_resolved": false,
  "provisional": false,
  "created_at": "2026-04-19T18:20:00+00:00",
  "alert": {
    "description": "Quarterly revenue differs across two sources.",
    "type": "numerical_disparity",
    "field": "q3_revenue",
    "severity": "critical",
    "current_value": "12,400,000",
    "conflicting_value": "11,800,000",
    "explanation": "...",
    "citations": [...],
    "recommendations": ["Reconcile with finance before publishing."]
  },
  "source": {
    "file_id": "file_...",
    "file_name": "financials.xlsx",
    "sheet_name": "Summary",
    "page_number": null,
    "slide_number": null,
    "chunk_id": "..."
  }
}
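
The source block tells you where a finding came from; which of sheet_name, page_number, or slide_number is set depends on the file type. A small formatter illustrating how to read it (source_label is our name, and the precedence order is an assumption):

```python
def source_label(source):
    """Human-readable location string for an insight's `source` block."""
    name = source["file_name"]
    if source.get("sheet_name"):
        return f'{name}, sheet "{source["sheet_name"]}"'
    if source.get("page_number") is not None:
        return f"{name}, page {source['page_number']}"
    if source.get("slide_number") is not None:
        return f"{name}, slide {source['slide_number']}"
    return name

src = {"file_id": "file_1", "file_name": "financials.xlsx",
       "sheet_name": "Summary", "page_number": None, "slide_number": None}
print(source_label(src))  # financials.xlsx, sheet "Summary"
```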

Marking insights resolved

requests.patch(
    f"{API}/projects/{pid}/insights/{insight_id}",
    headers=H,
    json={"is_resolved": True, "resolution_note": "Confirmed by finance."},
).raise_for_status()
Resolved state is shared with the Overten web app — if someone resolves an insight in the UI, it’s resolved for your API calls too, and vice versa.
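
If you triage in batches, e.g. after a review meeting, you can sweep unresolved items in one pass. A sketch (resolve_all is our helper, not an endpoint; the patch callable wraps the PATCH call above):

```python
def resolve_all(items, patch, note):
    """Resolve every unresolved insight in `items` via a patch callable."""
    done = []
    for item in items:
        if item["is_resolved"]:
            continue  # leave already-resolved items untouched
        patch(item["insight_id"], {"is_resolved": True, "resolution_note": note})
        done.append(item["insight_id"])
    return done

# Usage with requests:
# resolve_all(
#     insights["items"],
#     lambda iid, body: requests.patch(
#         f"{API}/projects/{pid}/insights/{iid}", headers=H, json=body
#     ).raise_for_status(),
#     "Reviewed in weekly triage.",
# )
```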

Generating a report

POST /projects/{id}/reports produces a .docx built from the project’s insights, using the same Word agent /word/generate uses. By default the call runs synchronously and returns the download URL inline.
report = requests.post(
    f"{API}/projects/{pid}/reports",
    headers=H,
    json={
        "title": "Acme Q3 Due Diligence — Findings",
        "include_severity": ["critical", "high", "medium"],
        "include_resolved": False,
    },
).json()
# → {"run_id": "...", "status": "completed", "download_url": "...", "file_name": "insights_report.docx", ...}
Request body fields:
  • format (default "docx"): only .docx is supported today.
  • title (default "<project name> — Insights Report"): overrides the header shown in the doc.
  • include_severity (default all): array of critical|high|medium|low.
  • include_alert_kind (default all): array of disparity|insight|similarity.
  • include_resolved (default false): whether to include items already marked resolved.
  • file_ids (default all): restrict to insights whose source file is in this list.
  • async (default false): enqueue and return 202 instead of blocking.
  • webhook_url (no default): called on completion when async: true.
The response shape matches POST /word/generate — same run_id, download_url, edit_url, summary, credits_used, etc. — so you can reuse your existing Word-generation handling. 409 no_insights is returned if the filter produces zero matching insights.
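
For large projects the async flow avoids holding a connection open: POST with async: true, then poll GET /projects/{id}/reports/{run_id} until the run is terminal. A sketch (generate_report is our wrapper; the two callables let you plug in requests):

```python
import time

def generate_report(post_report, get_run, body, poll_s=5, max_polls=120):
    """Enqueue an async report, then poll the run until it finishes."""
    run = post_report({**body, "async": True})
    for _ in range(max_polls):
        if run["status"] in ("completed", "failed"):
            return run
        time.sleep(poll_s)
        run = get_run(run["run_id"])
    raise TimeoutError("report generation did not finish")

# Usage with requests:
# run = generate_report(
#     lambda b: requests.post(f"{API}/projects/{pid}/reports",
#                             headers=H, json=b).json(),
#     lambda rid: requests.get(f"{API}/projects/{pid}/reports/{rid}",
#                              headers=H).json(),
#     {"title": "Findings", "include_severity": ["critical", "high"]},
# )
```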

Listing past reports

requests.get(f"{API}/projects/{pid}/reports", headers=H).json()
Returns reports previously generated for this project, newest first, paginated via next_cursor.

Fetching a specific report

requests.get(f"{API}/projects/{pid}/reports/{run_id}", headers=H).json()
Always returns a fresh 24-hour signed download_url, so you don’t need to store it — re-fetch when the URL expires.
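
Since the signed URL expires after 24 hours, re-fetch the run right before downloading rather than caching the URL. A sketch (fresh_download_url is our helper):

```python
def fresh_download_url(get_run, run_id):
    """Re-fetch the report record to get a currently valid signed URL."""
    return get_run(run_id)["download_url"]

# Usage with requests:
# url = fresh_download_url(
#     lambda rid: requests.get(f"{API}/projects/{pid}/reports/{rid}",
#                              headers=H).json(),
#     run_id,
# )
# with open("insights_report.docx", "wb") as f:
#     f.write(requests.get(url).content)
```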

Archiving

DELETE /projects/{id} soft-archives the project. Files and insights are preserved; the project just stops appearing in default listings. Pass ?include_archived=true on GET /projects to see archived ones.

Scope and visibility

The same scope rules as the rest of the API apply:
  • Personal keys see projects whose owner matches the key’s user.
  • Workspace keys with canViewAllFiles see every project in the org. Without that permission, members only see projects they own.
If you try to read or mutate a project that’s out of scope, you’ll get 404 not_found — we don’t leak existence across tenants.

Credits

Report generation charges Word credits on the org’s balance, identical to a POST /word/generate call. The per-file extraction and reconciliation stages run on the internal pipeline and are logged through the same LLM usage tracking the web app uses, so org-level usage rolls up automatically. Every attach call and every GET is free.