RESEARCH

Active work in multilingual archives, source-grounded retrieval, and AI infrastructure security.

Archival Methods in Practice

Rasin is where much of the archival method is being built in public: cross-lingual search, citation verification, entity graphs, claim provenance, evaluation, and review workflows. As the nonprofit structure is planned, studio1804 remains the technical home for the work and continues to publish reusable methods where they are ready.

Demonstrated Contributions

01

Cross-Lingual Search

A multilingual archive system should let a question in one language surface relevant passages in another. Rasin is the working implementation: it supports search in French, Haitian Kreyòl, English, and Spanish across 401,608 indexed text chunks from 104 source collections.

BGE-M3 · Hybrid retrieval · Qdrant · Full-text search · Query expansion
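
The source names hybrid retrieval but does not say how dense and full-text results are combined. As a minimal sketch, assuming reciprocal rank fusion (one common option, not necessarily what Rasin uses), two ranked lists of chunk IDs could be merged as follows. The chunk IDs and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks
    ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs, standing in for results from a vector search
# and a full-text search over the same index.
dense_hits = ["c3", "c1", "c7"]
lexical_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([dense_hits, lexical_hits])
```

Rank-based fusion avoids having to calibrate dense similarity scores against full-text scores, which live on different scales.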

Open Questions
  • Benchmark expansion for Haitian Kreyòl retrieval quality beyond the current 59-query golden set
  • Cross-lingual query performance when query and document languages differ

02

Citation Verification for AI Answers

Citations should be evidence, not decoration. Generated answers are mapped back to retrieved passages, and a separate verification step checks whether each cited passage supports the claim it is attached to.

Citation extraction · NLI entailment · Quote matching · Confidence labels
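
Of the steps listed, quote matching is the simplest to illustrate. The sketch below assumes a quote counts as matched when it appears in the cited passage after Unicode, quote-mark, whitespace, and case normalization. The function names and the sample passage are invented for illustration; the entailment step would be handled separately by an NLI model and is not shown.

```python
import re
import unicodedata


def normalize(text):
    """Apply NFKC, unify curly quotes, collapse whitespace, and casefold."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().casefold()


def quote_is_supported(quote, passage):
    """Return True if the normalized quote occurs verbatim in the passage."""
    q = normalize(quote)
    return bool(q) and q in normalize(passage)


# Invented sample passage with the line breaks and spacing OCR tends to leave.
passage = "The harbor register lists three ships   arriving in\nMarch."
found = quote_is_supported("harbor register lists three ships arriving in March", passage)
missing = quote_is_supported("the register lists four ships", passage)
```

A strict substring test like this cannot confirm paraphrased claims, which is why it complements rather than replaces the entailment check.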

Open Questions
  • Answer-faithfulness scoring across factual, synthesis, and adversarial queries
  • How to expose low-confidence or unsupported claims without overwhelming researchers

03

Knowledge Graph and Entity Resolution

Historical archives need entities that survive spelling variation, language shifts, and fragmented source records. The graph treats people, places, events, concepts, sources, and claims as connected research objects.

GLiNER · Neo4j · Entity resolution · Authority IDs · Relationship traversal
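
To make the spelling-variation problem concrete, here is a minimal sketch of candidate matching for entity names, assuming a simple normalize-then-compare approach built on the Python standard library. It is not a description of Rasin's resolver: the function names and the 0.9 threshold are assumptions, and a real pipeline would confirm candidates against authority IDs and expert review.

```python
import unicodedata
from difflib import SequenceMatcher


def name_key(name):
    """Strip diacritics and punctuation, casefold, and collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_marks if c.isalnum() or c.isspace())
    return " ".join(kept.casefold().split())


def name_similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, name_key(a), name_key(b)).ratio()


def same_entity_candidate(a, b, threshold=0.9):
    """Flag a pair for review; the threshold is an illustrative assumption."""
    return name_similarity(a, b) >= threshold


variant_match = same_entity_candidate("Toussaint Louverture", "Toussaint L'Ouverture")
different = same_entity_candidate("Toussaint Louverture", "Henri Christophe")
```

String similarity only proposes candidates. Two distinct people can share a name, so the final merge decision belongs to the graph's other evidence: dates, places, relationships, and source records.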

Open Questions
  • Named expert review for priority-domain entities and relationships
  • Formal precision and recall evaluation by entity type and source category

04

OCR and Source Processing for Historical Documents

Archive search begins before retrieval: documents have to be collected, read, structured, and described. The pipeline processes newspapers, legal codes, primary documents, maps, and scholarly materials while tracking OCR quality and source metadata.

GPU OCR · Source registry · Page versioning · OCR profiles · Structured artifacts

Open Questions
  • Character and word error benchmarks by century, language, and document condition
  • How to represent OCR uncertainty directly in source metadata and user-facing citations
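
Character and word error rates, as named in the first open question, are conventionally computed as edit distance divided by the length of a ground-truth transcription. The sketch below follows that standard definition; the function names are illustrative, and how Rasin tracks OCR quality internally is not stated in the source.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(reference, ocr_output):
    """Character error rate of OCR output against a reference transcription."""
    return error_rate(list(reference), list(ocr_output))


def wer(reference, ocr_output):
    """Word error rate, splitting both strings on whitespace."""
    return error_rate(reference.split(), ocr_output.split())
```

For example, an OCR pass that drops a single accent, reading "liberte" for "liberté", scores a CER of 1/7 on that word. Reporting such figures by century, language, and document condition only requires grouping the page-level results.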

Active Directions

05 · In Progress

Claim Provenance for Priority Domains

The research agenda moves beyond cited passages toward structured claim provenance: subject, predicate, object, source, confidence, contestation, language of record, and scope. The method is corpus-wide; early depth starts where Rasin already has the strongest source coverage.

Claim schema · Confidence levels · Contested-by links · Priority domains
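
The fields listed above map naturally onto a record type. The following is a hypothetical sketch of such a schema, not Rasin's actual data model: the class names, the three confidence levels, and all identifier values are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    # Assumed levels; the source mentions confidence levels without naming them.
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class Claim:
    claim_id: str
    subject: str             # entity the claim is about
    predicate: str           # relationship or property asserted
    object_: str             # value or entity asserted (trailing underscore avoids the builtin name)
    source: str              # identifier of the source that makes the claim
    confidence: Confidence
    language_of_record: str  # language of the source, e.g. "fr" or "ht"
    scope: str               # time, place, or domain the claim is limited to
    contested_by: tuple = () # IDs of claims that dispute this one


def is_contested(claim):
    """A claim is contested if at least one other claim disputes it."""
    return bool(claim.contested_by)


# Placeholder identifiers only; no real archival claim is represented here.
first = Claim("claim-1", "person:0001", "born_in", "place:0042",
              "source:0007", Confidence.MEDIUM, "fr", "biographical")
second = Claim("claim-2", "person:0001", "born_in", "place:0099",
               "source:0012", Confidence.LOW, "ht", "biographical",
               contested_by=("claim-1",))
```

Keeping contestation as explicit links between claims, rather than overwriting one value with another, lets conflicting sources stay visible side by side.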

06 · In Progress

Perspective-Aware Retrieval

Colonial archives over-represent the people and institutions that produced records. The reusable method is source authority: authorial perspective, quality tiers, and warning surfaces that show when results skew toward one kind of voice.

Source authority · Perspective labels · Bias warnings · Quality tiers

07 · In Progress

Scholar-Ready Evaluation and Export

If archive AI is going to be used by scholars, it needs reviewable outputs. The next layer is golden-set evaluation, stable permalinks, citation export, public correction logs, and institutional review packets.

Golden sets · Chicago / BibTeX / RIS · Stable permalinks · Correction logs

Rasin — Evaluation & Review

How we know the system is right.

Technical evaluation

Golden-set retrieval scoring, citation entailment checks, and OCR quality tracking. Current recall@10 (R@10) is 0.75 on a 59-query golden evaluation set.
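
The source does not define how R@10 is computed. As a sketch, assuming it is the mean over queries of the fraction of relevant chunks that appear in the top 10 results, a golden-set scorer could look like this. The query IDs, chunk IDs, and function names are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("each golden query needs at least one relevant ID")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs, golden, k=10):
    """Average recall@k over every query in the golden set.

    runs:   query ID -> ranked list of retrieved chunk IDs
    golden: query ID -> collection of relevant chunk IDs
    A query with no recorded run scores 0.
    """
    scores = [recall_at_k(runs.get(qid, []), rel, k) for qid, rel in golden.items()]
    return sum(scores) / len(scores)


# Toy data with k=2 so the cutoff is visible.
golden = {"q1": {"a", "b"}, "q2": {"c"}}
runs = {"q1": ["a", "x", "b"], "q2": ["z", "y", "c"]}
score = mean_recall_at_k(runs, golden, k=2)
```

Other definitions exist, such as counting a query as a hit when any relevant chunk appears in the top k, so published figures are only comparable when the definition is stated alongside them.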

Scholarly review

Source-domain experts review claim-provenance work and contested entries before they enter the public index.

Community correction

A public path for source institutions, scholars, and Haitian-American readers to flag errors, contest entries, and request additions, with the correction history kept in the open.

Lab Research in Formation

Some work lives at studio1804 before it has a project home. The current published example is infrastructure security for autonomous AI systems, separate from Rasin's humanities mission.

08

Security of Autonomous AI Infrastructure

This work examines attack surfaces at the agent-tool boundary, where autonomous systems invoke external tools and services. The Capability-Container Pattern proposes infrastructure-level isolation: agents never access tools directly. Instead, invocations flow through a mediation gateway into containers provisioned with minimal capabilities.

Capability containment · Mediation gateway · MCP · Infrastructure isolation
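
The control flow described above can be illustrated with a toy model. This is a sketch under heavy assumptions, not the design from the published paper: the class and method names are hypothetical, and in-process Python objects provide no real isolation, which in practice would come from the infrastructure layer.

```python
from dataclasses import dataclass


class CapabilityError(PermissionError):
    """Raised when an agent requests a tool it was never granted."""


@dataclass
class Container:
    """Hypothetical stand-in for an isolated runtime holding only granted tools."""
    tools: dict  # tool name -> callable

    def run(self, tool_name, **kwargs):
        return self.tools[tool_name](**kwargs)


class MediationGateway:
    """Single entry point: agents hold no direct references to tools."""

    def __init__(self):
        self._containers = {}
        self.audit_log = []

    def provision(self, agent_id, tools):
        # Grant the minimal set of capabilities this agent needs.
        self._containers[agent_id] = Container(dict(tools))

    def invoke(self, agent_id, tool_name, **kwargs):
        container = self._containers.get(agent_id)
        allowed = container is not None and tool_name in container.tools
        self.audit_log.append((agent_id, tool_name, allowed))
        if not allowed:
            raise CapabilityError(f"{agent_id!r} has no capability {tool_name!r}")
        return container.run(tool_name, **kwargs)


gateway = MediationGateway()
gateway.provision("agent-a", {"word_count": lambda text: len(text.split())})
result = gateway.invoke("agent-a", "word_count", text="one two three")
```

Routing every call through one gateway means denial is the default, and each invocation, allowed or refused, leaves an audit record.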

Published

The Capability-Container Pattern (Zenodo, 2026). DOI: 10.5281/zenodo.18614503