تـخريــــــط الـشوّامــــ mapping al-shawwām

At a Glance

The pipeline is part of the archive, not separate from it. Scans, court registers, gazetteers, genealogical compendia, memoirs, maps, and administrative lists do not arrive on the map as facts. They move through transcription, normalization, extraction, validation, geocoding, and temporal qualification, carrying the mediations that produced them: the registrar's hand, the cartographer's projection, the route a name traveled before it reached a page.

The pipeline does not resolve those mediations into a single authoritative layer. It holds open the fracture between attestation and claim, between place and coordinate, between the geography a source assumes and the geography the interface can show. What appears on the map is the visible end of that refusal: not data lifted out of the past, but historical claims in circulation, kept legible as they pass through archive, database, and interface.

Each phase below names a stage in this chain.

Note: The diagram is a guide, not a flowchart of facts. What moves between phases are claims about the past, each tied to the source that made them and the conventions that produced it.

What We Ingest

The pipeline does not treat its inputs as a uniform pool of data. It distinguishes between three kinds of material, because each kind enters the chain with different commitments attached.

Documentary traces. Court registers, gazetteers, census returns, genealogical compendia, memoirs, travel accounts, administrative correspondence, and historical maps. These arrive with their own conventions of authority — what counts as a place, what counts as a person, what counts as a boundary — and the pipeline preserves those conventions rather than translating them into a single house style.

Named entities and relations. People, surnames, lineages, places, administrative units, dates, journeys, and the citations that bind them. These are not extracted as isolated facts. They are extracted as claims about who was where, when, and on whose attestation.

Spatial frameworks. Coordinates, place hierarchies, polity systems, boundary files, and temporal validity ranges. These are the scaffolds against which the other two kinds of material become legible on a map. They are also the scaffolds most likely to impose anachronism if used carelessly, and they are versioned and dated for that reason.

Step by Step

  1. Trace — A scan, page, map, table, register, or bibliographic entry enters the system with source metadata attached. The first judgment is not what the material “means,” but what kind of source-form it is and where its authority comes from.
  2. Transcription — OCR, HTR, or LLM-assisted vision converts images into text. Noise, layout, damaged text, marginalia, and uncertain readings are kept visible; they remain part of the record’s condition.
  3. Normalization — Names, dates, scripts, and transliterations are cleaned enough to be searched and compared. Original forms remain attached where they carry historical, linguistic, or evidentiary significance.
  4. Extraction — NER and LLM-assisted extraction identify people, surnames, places, dates, affiliations, routes, citations, and administrative terms. These are extracted as claims made by a source, not as facts detached from it.
  5. Structuring — Extracted claims become schema-shaped records with stable identifiers. The schema keeps a person, the names attached to that person, a place, and the coordinates attached to that place as separate objects, so the relationships between them remain inspectable rather than fused.
  6. Validation — Records are checked against required fields, source links, coordinate plausibility, hierarchy alignment, and temporal ranges. Conflicts and gaps are flagged rather than silently repaired.
  7. Enrichment — Validated records are linked outward to coordinates, place hierarchies, polity systems, boundary layers, temporal metadata, and confidence notes. Enrichment makes comparison possible, but it does not erase the path by which the claim arrived.
  8. Publication — Curated, read-only views expose the database to the site through the API. The interface receives claims with their provenance and uncertainty still attached.
  9. Interface — Maps, timelines, panels, labels, and journeys make the claims visible. The goal is not to settle the geography, but to let users follow how places, names, and routes circulate across sources.
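As a rough illustration of steps 3, 5, and 6, the sketch below keeps a place, the names attached to it, and the coordinates attached to it as separate objects, normalizes a name while retaining its original form, and flags validation problems instead of repairing them. Every class name, field name, and flag label here is an assumption made for the example, not the platform's actual schema, and the sample values are illustrative rather than project data.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass
from typing import Optional

TATWEEL = "\u0640"  # Arabic elongation mark, purely typographic


def normalize_name(original: str) -> str:
    """Lossy search key: drops tatweel, diacritics, case, extra spaces."""
    decomposed = unicodedata.normalize("NFKD", original.replace(TATWEEL, ""))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


@dataclass(frozen=True)
class Place:
    place_id: str  # stable identifier; carries no name or coordinate itself


@dataclass(frozen=True)
class NameClaim:
    place_id: str
    original_form: str    # exactly as the source writes it
    normalized_form: str  # derived key for search and comparison
    source_id: Optional[str]


@dataclass(frozen=True)
class CoordinateClaim:
    place_id: str
    lat: float
    lon: float
    source_id: Optional[str]
    valid_from: Optional[int] = None  # year, inclusive
    valid_to: Optional[int] = None    # year, inclusive


def validate_coordinate(claim: CoordinateClaim) -> list[str]:
    """Return flags describing problems; the claim is never modified."""
    flags = []
    if not claim.source_id:
        flags.append("missing_source")
    if not (-90.0 <= claim.lat <= 90.0 and -180.0 <= claim.lon <= 180.0):
        flags.append("implausible_coordinate")
    if (claim.valid_from is not None and claim.valid_to is not None
            and claim.valid_from > claim.valid_to):
        flags.append("inverted_temporal_range")
    return flags


# Hypothetical records for one place.
place = Place("p1")
name = NameClaim(place.place_id, "al-Shawwām", normalize_name("al-Shawwām"), "s1")
good = CoordinateClaim(place.place_id, 33.5, 36.3, "s1", 1880, 1918)
bad = CoordinateClaim(place.place_id, 133.5, 36.3, None, 1920, 1900)

print(name.original_form, "->", name.normalized_form)  # al-Shawwām -> al-shawwam
print(validate_coordinate(good))  # []
print(validate_coordinate(bad))
# ['missing_source', 'implausible_coordinate', 'inverted_temporal_range']
```

Because the records are frozen and the validator only returns flags, a failed check leaves the claim exactly as the source produced it; any repair would have to be a new, separately attributed record.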

Time-Aware Boundaries

Boundaries on the platform are not timeless containers. A district line drawn under late Ottoman administration, a sub-district reorganized under Mandate rule, a frontier hardened after 1948, and a locally remembered region that crosses all three are not the same kind of object, and the platform does not pretend they are. Each boundary carries the date of its source, the administrative vocabulary that produced it, and the degree of fit between that vocabulary and the territory it claimed to describe.

Where boundaries overlap, the platform shows the overlap. Where a region in one source has no equivalent in another, the platform does not invent one. Where a boundary’s exact line is uncertain, that uncertainty is preserved rather than smoothed into a confident polygon. The geography of Bilad al-Sham is not the residue of one administrative regime; it is the layered, sometimes contradictory record of many, and the platform’s job is to keep those layers distinguishable.
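One way to picture a time-aware boundary lookup is sketched below. It is a minimal example under stated assumptions: the `BoundaryRecord` fields, the certainty labels, and all sample records are hypothetical, and geometry is left out entirely. The point it illustrates is that a query for a given year returns every boundary in force, overlapping or not, with its source and vocabulary still attached.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BoundaryRecord:
    boundary_id: str
    label: str
    vocabulary: str            # administrative term the source itself uses
    source_id: str
    source_year: int
    valid_from: Optional[int]  # None means the start is not attested
    valid_to: Optional[int]    # None means the end is not attested
    line_certainty: str        # assumed labels: "surveyed", "approximate", "unknown"


def in_force(record: BoundaryRecord, year: int) -> bool:
    """True if the year falls inside the record's attested range.

    An unattested end of the range is treated as open, not as a mismatch.
    """
    starts_ok = record.valid_from is None or record.valid_from <= year
    ends_ok = record.valid_to is None or year <= record.valid_to
    return starts_ok and ends_ok


def boundaries_for_year(records: List[BoundaryRecord], year: int) -> List[BoundaryRecord]:
    """Return all boundaries in force; overlaps are kept, never merged."""
    return [r for r in records if in_force(r, year)]


# Hypothetical records, not historical data.
records = [
    BoundaryRecord("b1", "Example district", "district", "s1", 1900, 1880, 1918, "approximate"),
    BoundaryRecord("b2", "Example sub-district", "sub-district", "s2", 1930, 1920, 1948, "surveyed"),
    BoundaryRecord("b3", "Remembered region", "local usage", "s3", 1960, None, None, "unknown"),
]

for year in (1910, 1919, 1930):
    ids = [r.boundary_id for r in boundaries_for_year(records, year)]
    print(year, ids)
# 1910 ['b1', 'b3']
# 1919 ['b3']
# 1930 ['b2', 'b3']
```

The query for 1919 returns only the remembered region: where no administrative boundary is attested, the sketch returns nothing for that layer instead of stretching a neighboring range to fill the gap.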

Accountability, Not Certainty

The platform does not promise that its geographies are correct. It promises that the path by which a place became a coordinate remains attached to the coordinate, and that disagreement between sources is preserved rather than adjudicated. A village named differently in an Ottoman sijill, a Mandate gazetteer, and a post-1948 map is not three errors to reconcile; it is three attestations to hold side by side, each with its date, its source, and its administrative vocabulary intact.
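Holding attestations side by side can be sketched as a grouping step that never picks a winner. The example below is illustrative only: the `Attestation` fields, the source identifiers, and the placeholder names are assumptions, and ordering by source year is a display convenience, not a ranking.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attestation:
    place_id: str
    name_as_written: str
    source_id: str
    source_year: int
    vocabulary: str  # administrative vocabulary of the source


def side_by_side(attestations: List[Attestation]) -> Dict[str, List[Attestation]]:
    """Group attestations by place, ordered by source year.

    No attestation is dropped, merged, or marked canonical.
    """
    grouped = defaultdict(list)
    for att in attestations:
        grouped[att.place_id].append(att)
    return {
        place_id: sorted(items, key=lambda a: a.source_year)
        for place_id, items in grouped.items()
    }


# Hypothetical attestations for one village; names are placeholders.
attestations = [
    Attestation("p1", "Name C", "map-01", 1955, "post-1948 map"),
    Attestation("p1", "Name A", "sijill-01", 1890, "Ottoman sijill"),
    Attestation("p1", "Name B", "gazetteer-01", 1930, "Mandate gazetteer"),
]

for att in side_by_side(attestations)["p1"]:
    print(att.source_year, att.name_as_written, "|", att.vocabulary)
# 1890 Name A | Ottoman sijill
# 1930 Name B | Mandate gazetteer
# 1955 Name C | post-1948 map
```

The result keeps three records for one place, each with its own date, source, and vocabulary, which is the shape an interface would need in order to show the variants together without reconciling them.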

Where a location is uncertain, the record says so. Where coordinates are weak, where a hierarchy is contested, where a name has variants the platform cannot rank, those gaps are marked rather than smoothed. Public access is read-only — not because the archive is finished, but because the work of revising it should leave a trace. The pipeline is built on the assumption that a geography held accountable to its sources is more useful and more honest than one that has been resolved.