
rasin
/ra.zɛ̃/ Kreyòl n.
Root. Origin. Foundation.
Explore Haiti's history through the sources themselves. Search across languages, follow the evidence, and read the original page behind each answer.
A revolution that defeated Napoleon, ended slavery, and doubled the size of the United States almost vanished from the historical record. The documents survived, scattered across archives in France, the United States, and the Caribbean. But for two centuries, nothing connected them. This is the story of why they were separated, and what it took to make them speak to each other again.



Preserved, but never connected.
They won. And then the world pretended it hadn't happened. The Haitian historian Michel-Rolph Trouillot called it “an unthinkable history.” The entire intellectual framework of the Enlightenment, the system that produced the Declaration of the Rights of Man, ranked humanity on a ladder with Europeans at the top and Africans at the bottom. Enslaved people defeating Napoleon's army didn't just challenge a political order. It broke the categories through which the Western world understood who could be a political actor, who could wage war, who could govern. For thirteen years, every Western power cycled through denial, minimization, and prediction of collapse, until there was nothing left to deny. Hobsbawm's The Age of Revolution: 1789–1848 gives it barely a mention. The Penguin Dictionary of Modern History doesn't include an entry for Haiti at all.
The records survived: vast, public, and scattered across institutions on three continents. But most of them are photographs: scanned pages, microfilm transfers, ink fading on paper that spent two centuries in tropical humidity. Each collection lives on its own site, in its own language, behind its own interface. A plantation listed in a French indemnity claim might also appear in a fugitive advertisement and a gazette decree about the same parish, but nothing ties them together. The documents survived. The structure connecting them never existed.
Trouillot wrote that silences enter history at four moments — when sources are made, when archives are assembled, when narratives are constructed, when significance is assigned. The sources survived. The archives were assembled. What was missing was a layer that made them legible, searchable, and connected across institutions and languages. Rasin builds that layer — reading pages, unifying 104 source collections into a single index, and matching meaning across languages.

Seven stages from scan to citation.
Every answer on Rasin traces back through a reproducible pipeline. Each stage is documented at rasin.ai/methodology.
Collect
Custom scrapers pull from Gallica, LoC, DLOC, Internet Archive
Read
OCR turns scanned documents into searchable source text
Chunk
Semantic segmentation preserves context and readability
Embed
1024-dim BGE-M3 vectors, including multi-granularity indexes
Extract
Entity extraction identifies people, places, events, dates across sources
Connect
Knowledge graph links people, events, places, concepts, and sources
Answer
Hybrid retrieval fused with citation and claim verification
The entire pipeline — inference, embeddings, vector search, knowledge graph — runs on portable, self-hostable infrastructure. Partner institutions can export source files, indexes, and embeddings and operate the system independently.
Collect
This is Article Premier of the 1805 Constitution of Haiti — the first national constitution to permanently abolish slavery. It sat in the Bibliothèque nationale de France for two centuries, digitized but buried in a catalog of millions.
Every archive has its own API, its own rate limits, its own format. Gallica's IIIF endpoint allows five requests per minute at full resolution — with circuit breakers and 90-second backoffs when it pushes back. Collection-specific ingestion jobs handle the differences, each tracking provenance back to the original institution.
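A minimal sketch of how a rate limit and a circuit breaker can sit in front of a fetch call. The five-requests-per-minute limit and the 90-second backoff come from the text; the class name, the failure threshold of three, and the `fetch` callable are assumptions made for illustration, not Rasin's actual code.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is attempted while the breaker is cooling down."""


class PoliteClient:
    """Spaces out requests and pauses after repeated failures (hypothetical)."""

    def __init__(self, fetch, per_minute=5, max_failures=3, backoff_s=90.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch                      # assumed callable: url -> bytes
        self.min_interval = 60.0 / per_minute   # 12 s between calls at 5/min
        self.max_failures = max_failures        # assumed threshold
        self.backoff_s = backoff_s
        self.clock, self.sleep = clock, sleep
        self.failures = 0
        self.open_until = 0.0
        self.last_call = None

    def get(self, url):
        now = self.clock()
        if now < self.open_until:
            raise CircuitOpenError(f"retry in {self.open_until - now:.0f}s")
        # Rate limit: wait until the minimum interval has elapsed.
        if self.last_call is not None:
            wait = self.min_interval - (now - self.last_call)
            if wait > 0:
                self.sleep(wait)
        self.last_call = self.clock()
        try:
            result = self.fetch(url)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Trip the breaker: refuse calls for the backoff window.
                self.open_until = self.clock() + self.backoff_s
                self.failures = 0
            raise
        self.failures = 0
        return result
```

The clock and sleep functions are injectable so the behavior can be checked without waiting in real time.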
The corpus spans scanned pages, metadata, source text, embeddings, and provenance records. A PostgreSQL queue coordinates parallel downloads with resource-aware batching — heavy sources like Gallica run two workers; lighter APIs run five. Every download is resumable. The collection phase alone took weeks.
Python · httpx · Playwright · circuit breakers · PostgreSQL queue · source-specific ingest


Le peuple habitant l'isle ci-devant appelée St. Domingue, convient ici de se former en état libre, souverain et indépendant de toute autre puissance de l'univers, sous le nom d'Empire d'Hayti.
Read
A search engine cannot read a photograph. This page is a scan — aging paper, faded ink, eighteenth-century typefaces that modern software wasn't built for. Until it's converted to text, it's invisible to any search.
Before OCR runs, every image passes through a preprocessing pipeline — deskewing rotated scans, denoising damaged pages, enhancing contrast on faded ink, sharpening text edges blurred by two centuries of tropical storage. Then docTR reads what's left, tracking per-word confidence so pages that fail can be retried automatically.
Multiple GPU workers process the corpus in parallel, coordinated through PostgreSQL row locks — no external queue, no Redis. If a worker hits an out-of-memory error, it halves its batch size and retries. The system recovers without human intervention.
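The halve-and-retry behavior can be sketched in a few lines. This is an illustration under assumptions: `OutOfMemory` stands in for whatever error the GPU runtime raises, `process_batch` is a hypothetical OCR call, and the PostgreSQL row-lock coordination between workers is left out.

```python
class OutOfMemory(RuntimeError):
    """Stand-in for a GPU out-of-memory error (hypothetical)."""


def run_adaptive(pages, process_batch, batch_size=32, min_batch=1):
    """Process pages in batches, halving the batch size after each OOM."""
    results = []
    i = 0
    while i < len(pages):
        batch = pages[i:i + batch_size]
        try:
            results.extend(process_batch(batch))
        except OutOfMemory:
            if batch_size <= min_batch:
                raise  # even the smallest batch does not fit; give up
            batch_size = max(min_batch, batch_size // 2)
            continue  # retry the same pages with a smaller batch
        i += len(batch)
    return results
```

The smaller batch size is kept for the rest of the run, so one bad batch does not trigger the same failure again on the next one.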
docTR · PyMuPDF · adaptive batching · PostgreSQL coordination · CUDA + MPS
Chunk
A 200-page constitution can't be searched as a single block. The chunker splits text at semantic boundaries — paragraph breaks first, then sentence boundaries, then hard token limits as a fallback. Each passage lands between 512 and 1,024 tokens, with 128 tokens of overlap so context is never lost at a split.
Every passage carries its provenance: which document it came from, which page, which section. When an answer cites a passage, the chain traces all the way back to a specific page in a specific archive. The citation chain starts here.
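A rough sketch of the fallback order described above: paragraphs first, then sentences, then a hard token cut, with an overlap carried between passages. It counts whitespace-separated words as a stand-in for tiktoken tokens, does not enforce the 512-token lower bound, and uses hypothetical field names for the provenance record.

```python
import re


def split_units(text, max_tokens):
    """Split into paragraphs, then sentences, then hard token windows."""
    units = []
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if len(para.split()) <= max_tokens:
            units.append(para)
            continue
        for sent in re.split(r"(?<=[.!?])\s+", para):
            words = sent.split()
            # Hard fallback: an over-long sentence is cut by token count.
            for j in range(0, len(words), max_tokens):
                units.append(" ".join(words[j:j + max_tokens]))
    return units


def chunk(text, doc_id, page, max_tokens=1024, overlap=128):
    """Pack units into passages of at most max_tokens, carrying an overlap."""
    chunks, current = [], []
    for unit in split_units(text, max_tokens - overlap):
        words = unit.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(current)
            # Start the next passage with the tail of the previous one.
            current = current[-overlap:] if overlap else []
        current = current + words
    if current:
        chunks.append(current)
    # Provenance travels with every passage (field names are assumptions).
    return [{"doc_id": doc_id, "page": page, "index": n, "text": " ".join(c)}
            for n, c in enumerate(chunks)]
```

Units are capped at `max_tokens - overlap` so that a carried-over tail plus one new unit can never exceed the passage limit.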
tiktoken · semantic boundary detection · 128-token overlap · provenance metadata
Art. 1 — Le peuple habitant l'isle ci-devant appelée St. Domingue, convient ici de se former en état libre, souverain et indépendant...
Art. 2 — L'esclavage est à jamais aboli.
Art. 12 — Aucun blanc, quelle que soit sa nation, ne mettra le pied sur ce territoire, à titre de maître ou de propriétaire...
Art. 14 — Toute acception de couleur parmi les enfans d'une seule et même famille, dont le chef de l'État est le père, devant...
Embed
Each passage is converted into a 1024-dimensional representation of its meaning — not its words. A Kreyòl question about abolition and this French decree land in the same region of vector space, even though they share zero vocabulary. Queries are asymmetrically prefixed so the model distinguishes questions from documents, and every vector is stored twice — in Qdrant for low-latency search, in PostgreSQL for durability and crash recovery.
L'esclavage est à jamais aboli.
Ki konstitisyon ki te aboli esklavaj pou tout tan?
Zero shared words. Same meaning. Same vector space.
The language of the question should never limit the reach of the answer.
Kreyòl is a first-class search language in Rasin today. Native answer generation in Kreyòl is the next milestone — via fine-tuning.
BGE-M3 · 1024-dim · Qdrant HNSW · PostgreSQL backup · multilingual BM25
Extract
The constitution names the men who signed it — Christophe, Pétion, Clervaux, Geffrard, Gabart. GLiNER, a zero-shot NER model, reads every passage in the corpus and identifies every person, place, organization, event, date, document, and ship it mentions — seven entity types chosen to stay neutral rather than impose interpretive categories on historical figures.
Names that appear differently across centuries and languages — “Toussaint Louverture” and “Toussaint L'Ouverture” — are resolved to a single canonical identity. The model processes roughly a thousand documents per minute on CPU alone.
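One simple way to catch spelling variants like these is to compare names on a normalized key. The sketch below folds case, accents, apostrophes, and hyphens; it is an assumption about how such a step could work, and real resolution would likely need more than string normalization (aliases, titles, context).

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name):
    """Fold case, accents, apostrophes, and spacing into a comparison key."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed
                         if not unicodedata.combining(ch))
    spaced = no_accents.lower().replace("-", " ")
    letters_only = re.sub(r"[^a-z ]", "", spaced)  # drops apostrophes, dots
    return " ".join(letters_only.split())


def group_variants(names):
    """Group surface forms that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)
```

With this key, "Toussaint Louverture" and "Toussaint L'Ouverture" fall into the same group, which can then be assigned one canonical identity.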
GLiNER2 zero-shot · 7 entity types · ~1,000 docs/min · deduplication + resolution
Nous H. Christophe, Clervaux, Vernet, Gabart, Pétion, Geffrard, Toussaint Brave... en notre nom particulier, qu'en celui du peuple d'Hayti...
Connect
Christophe signed this constitution. He also appears in an American diplomatic dispatch, a Moniteur decree from his own kingdom, and two nineteenth-century histories by Ardouin and Madiou. Five archives that never referenced each other — now linked through one person.
A second stage uses Qwen3 via structured output to extract relationships between entities — who participated in which event, who was located where, who authored which document. The current graph holds 797 curated nodes and 28,722 relationships, each tied back to the source text that evidences it.
At search time, the graph doesn't just find documents that match your query; it expands the query. Search for “Vodou” and the graph injects related terms like “vodouisant” and “Legba” into the text search, surfacing passages that no keyword match alone would find.
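Query expansion of this kind can be sketched with a plain dictionary standing in for the graph. The adjacency data and the function name are hypothetical; in the real system the neighbors would come from the knowledge graph.

```python
# Hypothetical adjacency: query term -> related terms from the graph.
RELATED = {
    "vodou": ["vodouisant", "Legba"],
}


def expand_query(query, graph, limit=5):
    """Append up to `limit` graph neighbors of the query terms."""
    terms = query.split()
    seen = {t.lower() for t in terms}
    extra = []
    for term in terms:
        for neighbor in graph.get(term.lower(), []):
            if neighbor.lower() not in seen and len(extra) < limit:
                seen.add(neighbor.lower())
                extra.append(neighbor)
    return terms + extra


print(expand_query("Vodou ceremony", RELATED))
```

The `limit` keeps expansion from drowning the original query when a term has many neighbors.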
Neo4j · Qwen3 + Instructor · 797 curated nodes · 28,722 relationships
Six steps turn a photograph of a deteriorating page into a node in a multilingual knowledge system. The seventh is where it matters — when someone asks a question.
Answer
The Bois Caïman ceremony of August 1791 launched the Haitian Revolution. A Vodou priest named Boukman led the gathering that would ignite thirteen years of war and end with the founding of a nation. Try searching for it.
Suppose the question is asked in Kreyòl. The documents that answer it are in French and English: Ardouin's nineteenth-century history describing the ceremony on the Lenormand de Mézy plantation, a Vodou ethnography recording oral traditions about that night, and C.L.R. James analyzing its significance a century and a half later. They sit in different archives, catalogued under different systems. No keyword search connects them.
Description of the ceremony at Lenormand de Mézy plantation, August 1791
Oral tradition recording of the Bois Caïman gathering and Boukman's invocation
Analysis of Bois Caïman as the catalyst for the general insurrection of August 22
The query is first translated into all four corpus languages by an LLM, then embedded. Vector search and keyword search run in parallel — results merged through reciprocal rank fusion with source diversity caps so no single archive dominates the results. A cross-encoder reranks the top candidates.
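Reciprocal rank fusion scores each document as the sum of 1 / (k + rank) over the result lists it appears in. The sketch below uses k = 20, the value named in the stack line for this stage; the per-source cap of two and the function names are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=20):
    """Fuse ranked lists of doc ids; higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def cap_by_source(ranked, source_of, per_source=2, top_n=10):
    """Keep at most `per_source` results from any one archive."""
    kept, counts = [], defaultdict(int)
    for doc_id in ranked:
        src = source_of[doc_id]
        if counts[src] < per_source:
            counts[src] += 1
            kept.append(doc_id)
        if len(kept) == top_n:
            break
    return kept
```

Because RRF uses ranks and not raw scores, vector similarities and keyword scores never have to be put on the same scale.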
Before the answer reaches you, a DeBERTa NLI model checks entailment between every claim and its cited passage. If a citation contradicts or doesn't support its claim, it's flagged. Quote verification confirms that any direct quotes actually appear in the source text. Evidence, not guesses.
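The quote check is the simpler of the two verification steps and can be sketched without a model. The version below normalizes whitespace, case, and curly apostrophes, then looks for each quoted span in the cited text. The function names and normalization rules are assumptions, and the NLI entailment check is not shown.

```python
import re
import unicodedata


def normalize(text):
    """Fold Unicode forms, apostrophes, case, and runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    return " ".join(text.lower().split())


def extract_quotes(answer):
    """Return spans wrapped in straight or curly double quotes."""
    return re.findall(r"[\"\u201c]([^\"\u201c\u201d]+)[\"\u201d]", answer)


def unverified_quotes(answer, source_text):
    """List quoted spans that do not appear in the source text."""
    haystack = normalize(source_text)
    return [q for q in extract_quotes(answer)
            if normalize(q) not in haystack]
```

An empty list means every direct quote in the answer was found in the passage it cites; anything returned would be flagged.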
Nemotron-3-Nano via TRT-LLM · RRF fusion (k=20) · BGE reranker · DeBERTa NLI · quote verification
Every answer traces back to a specific passage in a specific document. The evidence speaks for itself.
What the pipeline reads.
104 indexed source collections spanning three centuries and four languages. The full catalog is browsable at rasin.ai/sources.
Archives & Digital Collections
Gallica, Library of Congress, DLOC, Internet Archive
31+ BnF documents (1492–1850), 9,000+ DLOC newspaper issues, 552 Island Luminous pages in 3 languages
Databases & Structured Records
SlaveVoyages, CNRS Indemnités, Marronnage.info
100K+ enslaved-person records across marronnage ads, indemnity claims, and SlaveVoyages; 3,581 Saint-Domingue voyages
Periodicals & Newspapers
Le Moniteur Haïtien, L'Abeille Haytienne, La Gazette Royale
9,411 Moniteur issues (1845–1983), 5,076 Moniteur Universel issues (1789–1810)
Primary Sources
Founders Online, Boisrond-Tonnerre, US Senate hearings
1,152 Founders Online documents, 9 Senate hearing transcripts, Kreyòl proclamation of 1793
Legal Documents
Constitutions, legal codes, Linstant de Pradine
9 Haitian legal codes, 4 Linstant de Pradine volumes (1804–1876), constitutions from 1801–1889
Scholarship & Analysis
80+ monographs, Human Rights Watch, Frederick Douglass
Saint-Rémy, Bellegarde, Ardouin, Madiou, C.L.R. James, 18 HRW reports (1993–2025)



104 collections. More than 401K indexed chunks. Four languages. All of it runs on one machine.
Resource-efficient by design.
A system that connects 401,608 indexed text chunks across 104 source collections, handles 265,217 OCR pages in four languages, and runs multilingual AI search — built to be portable, exportable, and operable by the institutions it serves.
Measured, not promised.
Every claim about retrieval quality is backed by a golden test set — 59 hand-curated queries spanning factual lookups, entity searches, cross-lingual questions, and multi-source synthesis. The numbers below are from the latest stable evaluation run.
In plain terms: 75% of the time, when a researcher poses a query, Rasin surfaces the most relevant historical passage in the top 10 results. The current run was completed on April 1, 2026.
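A figure like this is typically computed as a hit rate at k: the share of test queries with at least one relevant passage in the top k results. The sketch below assumes that reading of the metric; the data shapes and names are hypothetical.

```python
def hit_rate_at_k(results, relevant, k=10):
    """Fraction of queries with at least one relevant id in the top k.

    results:  query -> ranked list of passage ids returned by the system
    relevant: query -> set of passage ids judged relevant by curators
    """
    hits = sum(
        1 for query, ranked in results.items()
        if any(doc_id in relevant[query] for doc_id in ranked[:k])
    )
    return hits / len(results)
```

With four test queries of which three return a relevant passage in the top 10, the function returns 0.75.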
Researchers
Historians, graduate students, and digital humanists studying the Haitian Revolution, the Atlantic world, or the history of slavery. Every answer is cited back to a specific passage in a specific archive — cross-referenced evidence, not summaries.
Haitian diaspora readers
Diaspora families tracing ancestry, cultural organizations preserving heritage, and anyone who wants to search Haitian history in its own languages. Kreyòl is a first-class search language — not an afterthought.
Educators
Teachers and professors building courses on Haitian history, Caribbean studies, or the Age of Revolutions. A single query surfaces primary sources from multiple archives — the kind of cross-referencing that used to take a semester of research.

Haiti is where it starts.
The archival problems Rasin confronts in Haiti — scattered collections, colonial languages, institutional walls — appear across many histories shaped by slavery, colonialism, and diaspora. The pipeline Rasin builds for Haiti is not specific to Haiti.
Jamaica, Martinique, Guadeloupe, Cuba — colonial archives in English, French, Spanish, and Dutch that have never been cross-referenced.
Brazil's slavery archives, Mexico's Afro-descendant communities, the plantation records of the Spanish colonies — millions of pages in institutional silence.
SlaveVoyages documents 36,000+ transatlantic crossings — Rasin already indexes 3,581 Saint-Domingue voyage records. The same pipeline can connect every port record to the people who were taken.
Pan-African movements from Accra to Harlem, Négritude in Paris, the Windrush generation in London — scattered across archives on four continents.
The roots of one history are tangled with the roots of many others. The same pipeline that connects documents across Haitian archives can connect them across any archive, any language, any continent.
Every citation traces back to a specific page in a specific archive. Every connection between documents was earned — scraped, read, chunked, embedded, extracted, linked, verified. A question can now begin the cross-referencing: across collections, languages, and pages that used to sit apart.

