Urban Semantic Search: Difference between revisions
Revision as of 18:25, 11 December 2025
Introduction
This project is designed to address the dual challenges of dispersed data and semantic gaps in historical research.
Traditional urban history studies often struggle with "data silos": maps (visual) and archival records (text) are not only scattered across different systems and time periods, but also lack machine-readable connections. This project implements a multimodal, parallel processing architecture that integrates 17th–19th century Venetian historical maps and cadastral records into a unified vector index, enabling retrieval across visual (texture features), textual (cross-lingual semantics), and spatial (shared coordinate system) dimensions.
The project delivers more than a search tool: it provides a sustainable and extensible data pipeline. Researchers can use standardized scripts to continuously incorporate newly discovered historical data into the system, progressively expanding the knowledge base.
Motivation
Project Plan and Milestones
Deliverables
This project delivers not only a website but also a reusable digital humanities research infrastructure, encompassing toolchains, platforms, and data assets.
A Data Pipeline
The pipeline is a fully automated, modular ETL workflow that processes raw scans (TIFF/CSV) into vector indices, covering slicing, feature extraction, spatial registration, and database ingestion. Its open design allows researchers to seamlessly integrate new maps or archival records into the system without modifying the core code.
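The four stages named above (slicing, feature extraction, spatial registration, database ingestion) can be sketched as small, independent functions chained by a driver. This is a minimal illustration, not the project's actual code: every name here (`Patch`, `slice_map`, `extract_features`, `register`, `ingest`, `run_pipeline`) is hypothetical, the raster is a toy nested list rather than a TIFF, the "embedding" is a placeholder statistic instead of a model output, and the index is a plain dictionary standing in for a vector database.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Grid = List[List[int]]  # toy stand-in for a scanned map raster


@dataclass
class Patch:
    patch_id: str
    row: int  # pixel offset of the patch's top-left corner
    col: int
    pixels: Grid
    vector: Tuple[float, ...] = ()
    coords: Tuple[float, float] = (0.0, 0.0)


def slice_map(map_id: str, raster: Grid, size: int) -> List[Patch]:
    """Stage 1: cut the raster into non-overlapping size x size patches."""
    patches = []
    for r in range(0, len(raster) - size + 1, size):
        for c in range(0, len(raster[0]) - size + 1, size):
            tile = [line[c:c + size] for line in raster[r:r + size]]
            patches.append(Patch(f"{map_id}_{r}_{c}", r, c, tile))
    return patches


def extract_features(patch: Patch) -> Patch:
    """Stage 2: placeholder embedding (mean and max intensity).

    Assumption: a real pipeline would call a vision model here.
    """
    flat = [value for line in patch.pixels for value in line]
    patch.vector = (sum(flat) / len(flat), float(max(flat)))
    return patch


def register(patch: Patch, origin: Tuple[float, float], scale: float) -> Patch:
    """Stage 3: map pixel offsets into a shared coordinate system.

    Assumption: a simple translate-and-scale transform, for illustration only.
    """
    patch.coords = (origin[0] + patch.col * scale, origin[1] + patch.row * scale)
    return patch


def ingest(patches: List[Patch], index: Dict[str, Patch]) -> None:
    """Stage 4: store patches in the index (a dict stands in for the database)."""
    for patch in patches:
        index[patch.patch_id] = patch


def run_pipeline(map_id: str, raster: Grid, index: Dict[str, Patch],
                 size: int = 2, origin: Tuple[float, float] = (0.0, 0.0),
                 scale: float = 1.0) -> int:
    """Run all four stages on one map and return the number of patches ingested."""
    patches = slice_map(map_id, raster, size)
    patches = [register(extract_features(p), origin, scale) for p in patches]
    ingest(patches, index)
    return len(patches)
```

Because each stage only consumes and returns `Patch` objects, adding a new map amounts to one more `run_pipeline` call, and swapping the placeholder embedding for a real model touches only `extract_features`.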
A Search Platform
The platform is a user-facing, browser-based visualization tool built with Next.js (frontend) and FastAPI (backend). It supports four hybrid search modes (text-to-image, image-to-text, image-to-image, and text-to-text) and provides 3D dynamic heatmaps with historical map overlays, enabling near real-time cross-modal exploration.
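One way to read the four search modes is as a single ranking routine over a shared embedding space, where the mode is determined by the modality of the query and the modality of the results. The sketch below assumes that image patches and text records have already been embedded into comparable vectors and that cosine similarity is the ranking metric. Neither assumption is confirmed by the project description, and the names `Entry`, `cosine`, and `search` are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class Entry(NamedTuple):
    entry_id: str
    modality: str  # "image" (map patch) or "text" (cadastral record)
    vector: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: List[Entry], query_vector: Sequence[float],
           target_modality: str, k: int = 3) -> List[str]:
    """Return the ids of the k entries of the target modality closest to the query.

    The query vector may come from a text or an image encoder, so the same
    function covers text-to-image, image-to-text, image-to-image, and
    text-to-text retrieval.
    """
    candidates = [entry for entry in index if entry.modality == target_modality]
    ranked = sorted(candidates,
                    key=lambda entry: cosine(query_vector, entry.vector),
                    reverse=True)
    return [entry.entry_id for entry in ranked[:k]]
```

For example, a text-to-image query would pass the embedding of the query string as `query_vector` and `"image"` as `target_modality`. A production system would typically replace the linear scan with an approximate nearest-neighbour index.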
A Semantic Dataset
The dataset is a fully processed collection of digital assets that have been cleaned, registered, and vectorized. It contains thousands of historical map patches and text entries represented as embedding vectors. Because the most time-consuming feature engineering steps are precomputed, researchers can immediately perform clustering analyses, study the evolution of urban morphology, or train downstream models.
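As an illustration of the clustering use case, the following sketch groups precomputed embedding vectors with a bare-bones k-means loop. It is not part of the project: the function names are hypothetical, the vectors are assumed to be already loaded into memory as sequences of floats, and the initialization (first `k` vectors as centroids) is deliberately naive, so results depend on input order. In practice a library implementation such as scikit-learn's `KMeans` would be the more robust choice.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def kmeans(vectors: List[Vector], k: int, iterations: int = 10) -> List[int]:
    """Assign each vector a cluster label in range(k).

    Naive initialization: the first k vectors serve as starting centroids.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels
```

Applied to patch embeddings, the resulting labels could be joined back to each patch's spatial coordinates to see whether visually similar patches also cluster geographically.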