تـخريــــــط الـشوّامــــ mapping al-shawwām

At a Glance

This pipeline takes scattered historical materials (scans, PDFs, gazetteers, court records, tribal dictionaries) and turns them into structured, time-aware geographic data. Each phase below corresponds to a real layer in the stack: inputs, extraction, structuring and validation, database schemas, orchestration, and the interactive app.

Note: Multiple historical sources feed into cross-linked schemas, enabling time-aware orchestration of map features and interactive exploration of contested geographic imaginaries.

What We Ingest

  • Gazetteers — Lists of historical place names with coordinates, administrative hierarchies, and transliteration variants.
  • Sources & Surnames — Structured exports distilled from historical documents (bibliography, tribal data, place references) with stable IDs.
  • Boundaries (shapefiles, GeoJSON) — Boundary files for provinces, districts, and other administrative units across different time periods.
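Because each gazetteer entry lists several transliteration variants, a lookup should match any of them. A minimal sketch of such a variant index follows, assuming a hypothetical `GazetteerEntry` record and a deliberately simple folding rule (Unicode NFKD, tatweel and combining marks such as Arabic harakat or Latin macrons removed, case folded). The field names, IDs, and folding rule are illustrative, not the project's actual schema. The original spellings stay untouched on the entry; only the lookup keys are folded.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Optional

TATWEEL = "\u0640"  # Arabic kashida, used only for visual stretching


@dataclass
class GazetteerEntry:
    """Hypothetical gazetteer record; field names are illustrative."""
    place_id: str
    name: str
    lat: Optional[float] = None
    lon: Optional[float] = None
    variants: list[str] = field(default_factory=list)


def normalize_name(text: str) -> str:
    """Fold a name into a lookup key; the stored spelling is left unchanged."""
    # NFKD splits letters from their diacritics and unfolds Arabic presentation forms.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(
        ch for ch in decomposed
        if ch != TATWEEL and not unicodedata.combining(ch)
    )
    # Case-fold Latin script and collapse runs of whitespace.
    return " ".join(kept.casefold().split())


def build_variant_index(entries: list[GazetteerEntry]) -> dict[str, list[str]]:
    """Map every folded name or variant to the IDs of the places that carry it."""
    index: dict[str, list[str]] = {}
    for entry in entries:
        for name in [entry.name, *entry.variants]:
            ids = index.setdefault(normalize_name(name), [])
            if entry.place_id not in ids:
                ids.append(entry.place_id)
    return index


def lookup(index: dict[str, list[str]], query: str) -> list[str]:
    """Return all matching place IDs; more than one means the name is ambiguous."""
    return index.get(normalize_name(query), [])


# Illustrative usage with made-up IDs.
entries = [
    GazetteerEntry("loc-001", "Dimashq", 33.51, 36.29, ["Damascus", "دمشق", "Dimašq"]),
]
index = build_variant_index(entries)
print(lookup(index, "DAMASCUS"))  # ['loc-001']
```

Returning a list rather than a single ID keeps homonyms visible: when two places share a folded name, the ambiguity can be routed to human review instead of being resolved silently.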

Step by Step

  1. Image — Scans or photographs of manuscripts, books, registers, or maps.
  2. OCR/HTR — Software turns images into raw text (OCR for printed text, HTR for handwriting), with vision-capable LLMs assisting on complex layouts.
  3. Cleanup — We remove scanning noise, fix line breaks, normalize characters (including Arabic script), and keep original spellings where relevant.
  4. Structuring — The cleaned text is organized into arrays of structured objects (places, people, citations, dates, relations) using named-entity recognition (NER) and LLM-based extraction.
  5. ETL & Validation — Structured objects are mapped into the core schemas (SOURCE, SURNAME, LOCATIONS, BOUNDARIES), checked for consistency, enriched with coordinates and temporal tags, and flagged if they need human review.
  6. SQL Database — Validated records are loaded, through controlled updates, into a secure, spatially enabled database with cross-linked tables and indexes for search and mapping; the public side can only read from it.
  7. App API — A safe, read-only gateway exposes curated views and domain objects (e.g., "surname with journeys", "place with sources") to the website.
  8. App — The site displays maps, timelines, journeys, and search results built from those curated datasets.
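Step 5 (ETL & Validation) can be pictured as a function that maps one extracted object onto a LOCATIONS row and collects problems instead of rejecting the record. The sketch below assumes hypothetical field names (`place_id`, `lat`, `lon`, `year_from`, `year_to`, `source_id`) and a small set of illustrative checks; the project's actual schema and rules are not described on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LocationRow:
    """Hypothetical row shape for the LOCATIONS schema; fields are illustrative."""
    place_id: str
    name: str
    lat: Optional[float]
    lon: Optional[float]
    year_from: Optional[int]
    year_to: Optional[int]
    source_id: Optional[str]
    issues: list[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        # Any recorded issue sends the row to a human reviewer.
        return bool(self.issues)


def to_location_row(obj: dict[str, Any]) -> LocationRow:
    """Map one extracted object to a row, recording issues rather than raising."""
    row = LocationRow(
        place_id=str(obj.get("place_id") or ""),
        name=str(obj.get("name") or "").strip(),
        lat=obj.get("lat"),
        lon=obj.get("lon"),
        year_from=obj.get("year_from"),
        year_to=obj.get("year_to"),
        source_id=obj.get("source_id"),
    )
    if not row.place_id:
        row.issues.append("missing stable ID")
    if not row.name:
        row.issues.append("missing name")
    if row.source_id is None:
        row.issues.append("missing source citation")
    if row.lat is None or row.lon is None:
        row.issues.append("no coordinates")
    elif not (-90 <= row.lat <= 90 and -180 <= row.lon <= 180):
        row.issues.append("coordinates out of range")
    if (row.year_from is not None and row.year_to is not None
            and row.year_from > row.year_to):
        row.issues.append("inverted date range")
    return row


def partition(objs: list[dict[str, Any]]) -> tuple[list[LocationRow], list[LocationRow]]:
    """Split extracted objects into rows ready to load and rows flagged for review."""
    rows = [to_location_row(o) for o in objs]
    ready = [r for r in rows if not r.needs_review]
    flagged = [r for r in rows if r.needs_review]
    return ready, flagged


# Illustrative usage with made-up records.
ready, flagged = partition([
    {"place_id": "loc-001", "name": "Dimashq", "lat": 33.51, "lon": 36.29,
     "year_from": 1890, "year_to": 1918, "source_id": "src-12"},
    {"place_id": "loc-002", "name": "Unlocated village",
     "year_from": 1918, "year_to": 1890},
])
print([r.place_id for r in ready])  # ['loc-001']
print(flagged[0].issues)  # ['missing source citation', 'no coordinates', 'inverted date range']
```

Collecting issues on the row, rather than raising on the first failure, means a reviewer sees every problem with a record at once, and records without precise coordinates are kept and flagged rather than dropped.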

Time-Aware Boundaries

Administrative borders shift over time. To avoid anachronism, our boundary files are temporally contingent:

  • Periodized layers — Boundaries are tagged with a year or date range (e.g., 1890–1918) and selected based on the time in view.
  • Multiple sources — We compare historical atlases, official gazetteers, and archival maps; disagreements are flagged for review.
  • Best-available fit — When exact dates are uncertain, we choose the most defensible time slice and mark it as such.
  • User experience — When you move through time on the map, boundaries and labels update to match the chosen period.
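The selection rule for periodized layers can be sketched in a few lines. This assumes a hypothetical `BoundaryLayer` record with an inclusive year range and an `approximate` flag for best-available-fit slices, plus a tie-breaking rule chosen for illustration (prefer exactly dated layers, then the narrowest range); the project's actual rule may differ, and geometry is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundaryLayer:
    """Hypothetical periodized boundary layer; geometry omitted for brevity."""
    layer_id: str
    unit: str                  # administrative unit, e.g. a province or district
    year_from: int             # first year covered (inclusive)
    year_to: int               # last year covered (inclusive)
    approximate: bool = False  # True when the range is a best-available fit


def select_layer(layers: list[BoundaryLayer], unit: str, year: int) -> Optional[BoundaryLayer]:
    """Pick the layer for one unit that covers the year in view, if any."""
    candidates = [
        layer for layer in layers
        if layer.unit == unit and layer.year_from <= year <= layer.year_to
    ]
    if not candidates:
        return None  # no layer covers this year for this unit
    # Prefer exactly dated layers, then the narrowest (most specific) range.
    return min(candidates, key=lambda layer: (layer.approximate, layer.year_to - layer.year_from))


def layers_for_year(layers: list[BoundaryLayer], year: int) -> dict[str, BoundaryLayer]:
    """Resolve every unit for the year in view, as a time slider would."""
    selected = {}
    for unit in sorted({layer.unit for layer in layers}):
        layer = select_layer(layers, unit, year)
        if layer is not None:
            selected[unit] = layer
    return selected


# Illustrative layers with made-up IDs and years.
layers = [
    BoundaryLayer("prov-a-1890", "province-A", 1890, 1918),
    BoundaryLayer("prov-a-1880s", "province-A", 1880, 1895, approximate=True),
    BoundaryLayer("prov-a-1919", "province-A", 1919, 1939),
]
print(select_layer(layers, "province-A", 1893).layer_id)  # prov-a-1890
print(select_layer(layers, "province-A", 1885).layer_id)  # prov-a-1880s
```

Keeping `approximate` on the layer itself lets the map label a best-fit slice as such whenever it is the one selected.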

Quality & Trust

  • Provenance preserved — Records keep citations, notes, and (when relevant) confidence scores.
  • Coordinate checks — If a place cannot be precisely located, the system records the issue for human review.
  • Read-only by design — Public pages cannot alter the database; changes happen through controlled updates.
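Read-only access can be enforced at the connection level rather than left to application code. This page does not name the database engine, so the sketch below uses Python's built-in `sqlite3` purely as a stand-in: an ordinary connection plays the role of the controlled update process, and a second connection opened with SQLite's `mode=ro` URI parameter plays the role of the public side. The table and column names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write attempt raises an error."""
    # uri=True lets sqlite3 parse the file: URI; mode=ro opens it read-only.
    return sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)


def demo() -> tuple[list[str], str]:
    """Load one row through a writer, then show the reader can query but not modify."""
    names: list[str] = []
    blocked = ""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "atlas.db"

        # Controlled update: an ordinary read-write connection loads the data.
        writer = sqlite3.connect(db_path)
        writer.execute("CREATE TABLE locations (place_id TEXT PRIMARY KEY, name TEXT)")
        writer.execute("INSERT INTO locations VALUES ('loc-001', 'Dimashq')")
        writer.commit()
        writer.close()

        # Public side: reads succeed, writes are rejected by SQLite itself.
        reader = open_read_only(db_path)
        try:
            names = [row[0] for row in reader.execute("SELECT name FROM locations")]
            try:
                reader.execute("DELETE FROM locations")
            except sqlite3.OperationalError as exc:
                blocked = str(exc)
        finally:
            reader.close()
    return names, blocked


names, blocked = demo()
print(names)    # ['Dimashq']
print(blocked)  # attempt to write a readonly database
```

With a server database the same idea would typically be expressed through a role or user that holds only read privileges; the point of the sketch is that the restriction lives in the connection, not in the page code.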