Active work in multilingual archives, source-grounded retrieval, and AI infrastructure security.
Archival Methods in Practice
Rasin is where much of the archival method is being built in public: cross-lingual search, citation verification, entity graphs, claim provenance, evaluation, and review workflows. While the nonprofit structure is being planned, studio1804 remains the technical home for the work and continues to publish reusable methods as they become ready.
Demonstrated Contributions
Cross-Lingual Search
A multilingual archive system should let a question in one language surface relevant passages in another. Rasin is the working implementation: it searches French, Haitian Kreyòl, English, and Spanish text across 401,608 indexed chunks from 104 source collections.
BGE-M3 · Hybrid retrieval · Qdrant · Full-text search · Query expansion
- Benchmark expansion for Haitian Kreyòl retrieval quality beyond the current 59-query golden set
- Retrieval performance when query and document languages differ
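The page does not say how Rasin combines dense and full-text results in its hybrid retrieval. One common option is reciprocal rank fusion, sketched below under that assumption. The function name, the constant k=60, and the chunk IDs are illustrative, not taken from Rasin.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranked list.

    Assumption: each ranking is ordered best-first. k=60 is a commonly
    used smoothing constant, not a value reported for Rasin.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists from a dense retriever and a full-text index.
dense_hits = ["c3", "c1", "c7"]
fulltext_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([dense_hits, fulltext_hits])
```

Chunks that rank well in both lists rise to the top, which is why a passage matched both semantically and lexically outranks one found by only a single retriever.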
Citation Verification for AI Answers
Citations should be evidence, not decoration. Generated answers are mapped back to retrieved passages, and a separate verification step checks whether each cited passage supports the claim it is attached to.
Citation extraction · NLI entailment · Quote matching · Confidence labels
- Answer-faithfulness scoring across factual, synthesis, and adversarial queries
- How to expose low-confidence or unsupported claims without overwhelming researchers
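Of the verification steps listed above, quote matching is the simplest to illustrate. The sketch below checks whether a quoted span appears in a retrieved passage after light normalization. It is an assumption about how such a check could work, not Rasin's implementation, and it does not cover the NLI entailment step. The label strings are hypothetical.

```python
import re
import unicodedata


def normalize(text):
    """Fold case, unify apostrophes, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = text.replace("\u2019", "'")  # typographic apostrophe to ASCII
    return re.sub(r"\s+", " ", text).strip()


def label_quote(quote, passage):
    """Return a hypothetical confidence label for a quoted span."""
    if normalize(quote) in normalize(passage):
        return "supported"
    return "unverified"


passage = "L'Union  fait\nla force."
label = label_quote("l\u2019union fait la force", passage)
```

A failed match only means the exact wording was not found; a paraphrased claim would still need the entailment check.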
Knowledge Graph and Entity Resolution
Historical archives need entities that survive spelling variation, language shifts, and fragmented source records. The graph treats people, places, events, concepts, sources, and claims as connected research objects.
GLiNER · Neo4j · Entity resolution · Authority IDs · Relationship traversal
- Named expert review for priority-domain entities and relationships
- Formal precision and recall evaluation by entity type and source category
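As a minimal sketch of how spelling variation can be bridged, the code below builds a normalized key that groups candidate mentions of the same entity. This is an assumed first-pass blocking step, not the GLiNER or Neo4j pipeline itself. The function names are hypothetical, and candidates grouped this way would still need review before being merged.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Strip accents, case, spaces, and punctuation from a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in stripped.casefold() if ch.isalnum())


def group_mentions(mentions):
    """Group surface forms that share the same normalized key."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[name_key(mention)].append(mention)
    return dict(groups)


groups = group_mentions(
    ["Toussaint Louverture", "Toussaint L'Ouverture", "Dessalines"]
)
```

A key this aggressive will also merge distinct people with near-identical names, which is one reason authority IDs and expert review remain part of the workflow.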
OCR and Source Processing for Historical Documents
Archive search begins before retrieval: documents have to be collected, read, structured, and described. The pipeline processes newspapers, legal codes, primary documents, maps, and scholarly materials while tracking OCR quality and source metadata.
GPU OCR · Source registry · Page versioning · OCR profiles · Structured artifacts
- Character and word error benchmarks by century, language, and document condition
- How to represent OCR uncertainty directly in source metadata and user-facing citations
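Character and word error rates are conventionally computed as edit distance divided by reference length. The sketch below shows that standard definition. The function names and sample strings are illustrative, and nothing here reflects benchmark results from the pipeline.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the length of the reference."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(ref_text, ocr_text):
    """Character error rate."""
    return error_rate(ref_text, ocr_text)


def wer(ref_text, ocr_text):
    """Word error rate, splitting on whitespace."""
    return error_rate(ref_text.split(), ocr_text.split())
```

Reporting these rates per century, language, and document condition would mean grouping reference pages by those attributes and averaging within each group.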
Active Directions
Claim Provenance for Priority Domains
The research agenda moves beyond cited passages toward structured claim provenance: subject, predicate, object, source, confidence, contestation, language of record, and scope. The method is corpus-wide; early depth starts where Rasin already has the strongest source coverage.
Claim schema · Confidence levels · Contested-by links · Priority domains
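The claim fields named above could be represented as a small record type. The sketch below is one possible shape, not the project's actual schema. The confidence labels, the claim_id field, and the placeholder values are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical confidence labels; the page does not define the levels.
CONFIDENCE_LEVELS = ("low", "medium", "high")


@dataclass
class Claim:
    claim_id: str            # assumed identifier, used for contested-by links
    subject: str
    predicate: str
    object: str
    source: str
    confidence: str
    language_of_record: str
    scope: str
    contested_by: list = field(default_factory=list)  # IDs of opposing claims

    def __post_init__(self):
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence level: {self.confidence}")

    @property
    def is_contested(self):
        return bool(self.contested_by)


# Placeholder values only; these are not real archive entries.
claim_a = Claim("claim-1", "entity:person-a", "born_in", "entity:place-b",
                "source:doc-1", "medium", "fr", "single document")
claim_b = Claim("claim-2", "entity:person-a", "born_in", "entity:place-c",
                "source:doc-2", "low", "ht", "single document")
claim_a.contested_by.append(claim_b.claim_id)
```

Storing contestation as links between claims, rather than overwriting one with the other, keeps both records visible to a reviewer.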
Perspective-Aware Retrieval
Colonial archives over-represent the people and institutions that produced records. The reusable method is source authority: authorial perspective, quality tiers, and warning surfaces that show when results skew toward one kind of voice.
Source authority · Perspective labels · Bias warnings · Quality tiers
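One simple way to detect skew is to measure how much of a result list shares a single perspective label. The sketch below assumes each result already carries such a label. The label names, the 0.7 threshold, and the warning text are hypothetical.

```python
from collections import Counter


def perspective_warning(labels, threshold=0.7):
    """Return a warning string if one perspective dominates, else None.

    Assumption: `labels` holds one perspective label per retrieved result.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    share = count / len(labels)
    if share >= threshold:
        return f"{share:.0%} of results come from '{label}' sources"
    return None


# Hypothetical labels for five retrieved passages.
skewed = ["colonial_administration"] * 4 + ["oral_testimony"]
warning = perspective_warning(skewed)
```

A share-based check is coarse: it flags imbalance in what was retrieved, not imbalance in what the archive holds.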
Scholar-Ready Evaluation and Export
If archive AI is going to be used by scholars, it needs reviewable outputs. The next layer is golden-set evaluation, stable permalinks, citation export, public correction logs, and institutional review packets.
Golden sets · Chicago / BibTeX / RIS · Stable permalinks · Correction logs
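Citation export can be as simple as serializing a record's fields into a standard format. The sketch below emits a BibTeX @misc entry, using the one publication this page cites as sample data. The function name and citation key are hypothetical, and the code does not escape special characters in field values.

```python
def to_bibtex(key, fields):
    """Serialize a flat dict of fields as a BibTeX @misc entry.

    Assumption: values contain no characters that need BibTeX escaping.
    """
    lines = [f"@misc{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = to_bibtex(
    "capability_container_2026",  # hypothetical citation key
    {
        "title": "The Capability-Container Pattern",
        "year": "2026",
        "publisher": "Zenodo",
        "doi": "10.5281/zenodo.18614503",
    },
)
```

Chicago and RIS exports would draw on the same field dictionary with different templates.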
Rasin — Evaluation & Review
How we know the system is right.
Evaluation combines golden-set retrieval scoring, citation entailment checks, and OCR quality tracking. Current recall at 10 (R@10) is 0.75 on the 59-query golden evaluation set.
Source-domain experts review claim-provenance work and contested entries before they enter the public index.
A public path for source institutions, scholars, and Haitian-American readers to flag errors, contest entries, and request additions, with the correction history kept in the open.
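The page reports R@10 without defining how it is computed. One common definition is the fraction of a query's relevant passages found in the top 10 results, averaged over all golden-set queries. The sketch below implements that definition as an assumption, with made-up query and passage IDs.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("each golden query needs at least one relevant ID")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs, golden, k=10):
    """Average recall@k over every query in the golden set."""
    return sum(recall_at_k(runs[q], golden[q], k) for q in golden) / len(golden)


# Hypothetical two-query golden set and system output.
golden = {"q1": {"a", "b"}, "q2": {"c"}}
runs = {"q1": ["a", "x", "y"], "q2": ["x", "y"]}
score = mean_recall_at_k(runs, golden)
```

Other conventions exist, such as counting a query as a hit if any relevant passage appears in the top 10, so a reported figure is only comparable once the definition is stated.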
Lab Research in Formation
Some work lives at studio1804 before it has a project home. The current published example is infrastructure security for autonomous AI systems, separate from Rasin's humanities mission.
Security of Autonomous AI Infrastructure
This work examines attack surfaces at the agent-tool boundary, where autonomous systems invoke external tools and services. The Capability-Container Pattern proposes infrastructure-level isolation: agents never access tools directly. Instead, invocations flow through a mediation gateway into containers provisioned with minimal capabilities.
Capability containment · Mediation gateway · MCP · Infrastructure isolation
The Capability-Container Pattern (Zenodo, 2026). DOI: 10.5281/zenodo.18614503
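The mediation idea described above can be illustrated with a small in-process stand-in. This is a sketch under stated assumptions, not the design from the cited publication. The class and method names and the capability strings are hypothetical, and a plain Python callable stands in for a real container.

```python
class CapabilityError(PermissionError):
    """Raised when an invocation asks for more than was provisioned."""


class MediationGateway:
    """Agents call invoke(); they never hold a reference to a tool handler."""

    def __init__(self):
        self._tools = {}

    def register(self, name, handler, capabilities):
        # Each tool is provisioned with a fixed, minimal capability set.
        self._tools[name] = (handler, frozenset(capabilities))

    def invoke(self, tool, requested, **kwargs):
        if tool not in self._tools:
            raise CapabilityError(f"unknown tool: {tool}")
        handler, granted = self._tools[tool]
        missing = set(requested) - granted
        if missing:
            raise CapabilityError(f"{tool} lacks capabilities: {sorted(missing)}")
        return handler(**kwargs)


gateway = MediationGateway()
gateway.register("search", lambda query: f"results for {query}", {"net:read"})
result = gateway.invoke("search", {"net:read"}, query="archives")
```

In a real deployment the enforcement would sit at the infrastructure layer, for example in container and network policy, rather than in application code that shares a process with the agent.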