Urban Semantic Search

Revision as of 18:25, 11 December 2025

Introduction

This project is designed to address the dual challenges of dispersed data and semantic gaps in historical research.

Traditional urban history studies often struggle with "data silos": maps (visual) and archival records (text) are not only scattered across different systems and time periods, but also lack machine-readable connections. This project implements a multimodal, parallel processing architecture that integrates 17th–19th century Venetian historical maps and cadastral records into a unified vector index, enabling retrieval across visual (texture features), textual (cross-lingual semantics), and spatial (shared coordinate system) dimensions.

The project delivers more than a search tool: it provides a sustainable and extensible data pipeline. Researchers can use standardized scripts to continuously incorporate newly discovered historical data into the system, progressively expanding the knowledge base.

Motivation

Project Plan and Milestones

Deliverables

This project delivers not only a website but also a reusable digital humanities research infrastructure, encompassing toolchains, platforms, and data assets.

A Data Pipeline

The pipeline is a fully automated, modular ETL workflow that turns raw inputs (TIFF map scans and CSV archival records) into vector indices. It covers four stages: slicing, feature extraction, spatial registration, and database ingestion. Its open design allows researchers to integrate new maps or archival records into the system without modifying the core code.
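The modular design described above can be illustrated with a minimal sketch. It is not the project's actual code: the names `Patch`, `slice_scan`, `to_records`, and `run_pipeline`, the file name `map.tif`, and the patch size are all assumptions chosen for illustration. The sketch covers only the slicing stage and how stages might be chained. Feature extraction and spatial registration are left as a placeholder field.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Patch:
    """One square tile cut from a scanned map (hypothetical structure)."""
    source: str  # file name of the scan the tile came from
    x: int       # left edge of the tile, in pixels
    y: int       # top edge of the tile, in pixels
    size: int    # side length of the tile, in pixels


def slice_scan(source: str, width: int, height: int, size: int) -> List[Patch]:
    """Tile a scan into non-overlapping squares, dropping partial edge tiles."""
    return [
        Patch(source, x, y, size)
        for y in range(0, height - size + 1, size)
        for x in range(0, width - size + 1, size)
    ]


def to_records(patches: List[Patch]) -> List[dict]:
    """Turn patches into ingestion-ready records; the vector is filled in later."""
    return [
        {"id": f"{p.source}:{p.x}:{p.y}", "modality": "image", "vector": None}
        for p in patches
    ]


# A stage is any function that takes a list of items and returns a new list.
Stage = Callable[[list], list]


def run_pipeline(items: list, stages: List[Stage]) -> list:
    """Apply each stage in order; adding a stage does not change this function."""
    for stage in stages:
        items = stage(items)
    return items


patches = slice_scan("map.tif", width=1000, height=600, size=256)
records = run_pipeline(patches, [to_records])
```

Because each stage is just a function over a list, a new step (for example, a different feature extractor) could be appended to the stage list without touching `run_pipeline`, which is one way to read "without modifying the core code".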

A Search Platform

A user-facing, browser-based visualization platform built with Next.js (frontend) and FastAPI (backend). It supports four hybrid search modes (text-to-image, image-to-text, image-to-image, and text-to-text) and provides 3D dynamic heatmaps with historical map overlays, enabling near real-time cross-modal exploration.
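One way to picture the four search modes is as a single nearest-neighbour lookup in a shared embedding space, where the mode is determined by the modality of the query and the modality of the entries being searched. The sketch below assumes this setup. The toy two-dimensional vectors, the entry ids, and the `search` function are hypothetical and stand in for real embeddings and the project's actual index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    index: List[Dict],
    query_vector: Sequence[float],
    target_modality: str,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k entries of the target modality most similar to the query."""
    scored = [
        (entry["id"], cosine(query_vector, entry["vector"]))
        for entry in index
        if entry["modality"] == target_modality
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: image patches and text records share one vector space.
index = [
    {"id": "patch_001", "modality": "image", "vector": [0.9, 0.1]},
    {"id": "patch_002", "modality": "image", "vector": [0.1, 0.9]},
    {"id": "record_001", "modality": "text", "vector": [0.8, 0.2]},
]

# Text-to-image: the query vector would come from a text encoder (assumed here).
text_to_image = search(index, [1.0, 0.0], target_modality="image", k=1)

# Image-to-text: reuse a patch's vector as the query and search text entries.
image_to_text = search(index, index[0]["vector"], target_modality="text", k=1)
```

Under this reading, image-to-image and text-to-text are the same call with the query and target modalities set to the same value.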

A Semantic Dataset

A fully processed collection of digital assets, cleaned, registered, and vectorized. It contains thousands of historical map patches and text entries represented as embedding vectors. By precomputing the most time-consuming feature engineering steps, the dataset allows researchers to immediately perform clustering analyses, study urban morphology evolution, or train downstream models.
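As a rough illustration of the clustering use case, the sketch below runs a small k-means over precomputed vectors. It is an assumption about how a researcher might use the dataset, not part of the project: the four toy vectors are invented, and the initialization (taking the first `k` points as centroids) is a simplification chosen to keep the example deterministic.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, iterations: int = 10) -> List[int]:
    """Cluster points with plain k-means and return one label per point."""
    # Simplified, deterministic initialization: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            distances = [squared_distance(point, c) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy stand-ins for precomputed patch embeddings (two obvious groups).
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = kmeans(embeddings, k=2)
```

With real embeddings, a library implementation with better initialization would normally be preferable; the point here is only that no feature extraction has to be rerun before the analysis starts.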