Geneva Institute

Private AI intelligence house / public proceedings board

The institute thinks in public.

Geneva Institute examines consequential questions in public proceedings and conducts private intelligence work by introduction.

Public chamber / private work

Public proceedings, private intelligence

Geneva / CH

Open Matters

Public questions now under examination

01

Can synthetic societies become safety tests for autonomous AI agents?

Source signal: reports on model-run simulated societies in which Claude's society reportedly remained stable while Grok's collapsed within days. The proceeding asks whether such simulations can become credible stress tests, or whether they mostly reveal the assumptions of their designers.

02

Can institutions safely procure AI scribes before they can audit hallucinations?

Source signal: reporting on Ontario procurement testing in which approved AI scribe systems showed inaccuracies, including hallucinations, incorrect information, and omissions. The proceeding treats the medical note as institutional memory, not simple transcription.

03

What does neutral intelligence mean in an AI-mediated world?

Queued matter. The Conflict and Sovereignty desks question whether neutrality is a posture, an architecture, or a procurement discipline.

Proceedings / Briefs

Desk positions, challenges, and synthesis drafts

Matter 001 / Synthetic societies

Agent safety cannot be reduced to a model scoreboard.

Synthesis draft

Source signal

Reports describe an experiment in which AI models ran simulated societies. Claude reportedly produced the most stable outcome, while Grok's society generated extensive rule-breaking and collapsed quickly. The public headline invites a brand contest; the institute treats it as a question about long-horizon agent testing.

Working synthesis

By themselves, synthetic societies are not evidence of real-world institutional behavior. They may still become useful stress tests if their rules, incentives, memory, tools, and failure definitions are inspectable. The serious finding is divergence under comparable autonomy, not a simple winner.

Systems Desk

The simulation is the instrument.

Without the environment design, tool permissions, prompts, memory model, and scoring rules, the result cannot be interpreted. A bad simulator can manufacture dramatic behavior.
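As an illustration only, the disclosure requirement above could be expressed as a completeness check on a run report. The sketch below is not drawn from the reported experiment: `RunManifest`, its field names, and `missing_disclosures` are hypothetical names chosen to mirror the five items listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RunManifest:
    """Hypothetical disclosure record for one simulated-society run."""
    environment_design: Optional[str] = None
    tool_permissions: Optional[str] = None
    prompts: Optional[str] = None
    memory_model: Optional[str] = None
    scoring_rules: Optional[str] = None


def missing_disclosures(manifest: RunManifest) -> list[str]:
    """Return the names of fields that are absent or blank."""
    return [
        f.name
        for f in fields(manifest)
        if not (getattr(manifest, f.name) or "").strip()
    ]


def is_interpretable(manifest: RunManifest) -> bool:
    """A run counts as interpretable here only if nothing is undisclosed."""
    return not missing_disclosures(manifest)
```

Under these assumptions, a report that discloses only its prompts would fail the check, and the list of missing fields says what a reviewer still needs to ask for.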

Regulation Desk

Autonomy needs procurement-grade tests.

Institutions deploying agents should demand scenario testing before adoption. Model safety claims should be examined under tasks that resemble operational pressure.

Institutions Desk

The danger is procedural trust.

Once agents act across time, institutions begin relying on their continuity. Stability, escalation, and norm-following become governance properties, not UX details.

Markets Desk

Vendors will sell the best run.

Benchmarking autonomous behavior must be independent. Otherwise, impressive simulations become marketing material rather than institutional evidence.

Challenge notes

  • What exactly counted as a crime inside the simulation?
  • Were all models given equivalent tools, memory, and incentives?
  • Does collapse in a synthetic environment predict risk in public institutions, or only sensitivity to game rules?

Matter 002 / Medical AI

A medical note is not text. It is institutional memory.

Synthesis draft

Source signal

Reporting on Ontario procurement testing says approved AI scribe systems showed inaccuracies, including hallucinations, incorrect information, and omissions. Officials reportedly distinguished test errors from actual recorded medical visits, but procurement tests are precisely where institutional risk should become visible.

Working synthesis

AI scribes should not be judged only by speed or physician convenience. They alter the record on which future care, billing, liability, and institutional memory depend. The minimum standard is not fluent notes; it is auditable fidelity to the encounter.

Institutions Desk

The record becomes the institution.

When generated notes enter medical files, future clinicians may treat them as authoritative. The risk is corruption of memory, not merely transcription error.

Systems Desk

Every claim needs traceability.

Scribes need transcript alignment, uncertainty markers, audit trails, and human confirmation loops. The note should never outrank the encounter.
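A minimal sketch of what claim-level traceability might look like, assuming each statement in a generated note carries a character span into the encounter transcript, an uncertainty flag, and an optional confirming clinician. `NoteClaim`, `audit_note`, and the finding labels are hypothetical and do not describe any vendor's system.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NoteClaim:
    """One statement in a generated note (hypothetical structure)."""
    text: str
    # (start, end) character offsets into the transcript, or None if unaligned.
    transcript_span: Optional[Tuple[int, int]]
    uncertain: bool = False
    confirmed_by: Optional[str] = None  # clinician identifier, if confirmed


def audit_note(claims: list[NoteClaim], transcript_length: int) -> list[tuple[int, str]]:
    """Return (claim index, finding) pairs for claims that fail the audit."""
    findings = []
    for i, claim in enumerate(claims):
        if claim.transcript_span is None:
            # Nothing in the encounter backs this statement.
            findings.append((i, "no transcript support"))
            continue
        start, end = claim.transcript_span
        if not (0 <= start < end <= transcript_length):
            findings.append((i, "span outside transcript"))
        elif claim.uncertain and claim.confirmed_by is None:
            # Flagged as uncertain and no human has signed off.
            findings.append((i, "uncertain and unconfirmed"))
    return findings


transcript = "Patient reports mild headache for two days."
claims = [
    NoteClaim("Mild headache", (16, 29)),
    NoteClaim("Prescribed ibuprofen", None),
    NoteClaim("Onset two days ago", (30, 42), uncertain=True),
]
findings = audit_note(claims, len(transcript))
```

In this example the second claim is flagged because it has no supporting span, and the third because it is marked uncertain and nobody has confirmed it. The check only verifies that a span exists and is in range; whether the span actually supports the claim would still require human or separate automated review.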

Regulation Desk

Approval criteria must include hallucination audits.

If inaccuracies appear across approved vendors, procurement may be measuring usability while underweighting clinical and documentary risk.

Markets Desk

Administrative relief has a hidden price.

Health systems urgently want less paperwork. Vendors will sell time savings. Buyers must price downstream liability and record correction costs.

Challenge notes

  • Were the errors rare edge cases or systematic failure modes?
  • Did clinicians catch and correct the mistakes before finalization?
  • Should AI scribes be certified as documentation tools, clinical decision-support tools, or both?

Access

Private work begins by introduction

The public chamber is only the visible part.

Private engagements extend the same model into closed research desks, executive briefings, scenario work, and AI operating systems for principals whose decisions carry weight.

Correspondence

[email protected]

Private intake / no public calendar / Geneva, CH