Urban Semantic Search

Revision as of 18:17, 11 December 2025

Deliverables

This project delivers not only a website but also a reusable digital humanities research infrastructure, encompassing toolchains, platforms, and data assets.

A Data Pipeline

The pipeline is a fully automated, modular ETL workflow that turns raw source files (TIFF/CSV) into vector indices. It covers slicing, feature extraction, spatial registration, and database ingestion. Its open design lets researchers integrate new maps or archival records into the system without modifying the core code.
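The modular design described above can be illustrated with a minimal sketch in which each ETL step is an independent function and the pipeline is just an ordered list of such steps. Everything here is hypothetical and for illustration only: the `Pipeline` class, the stage names, the in-memory `store`, and the toy two-number "embedding" are assumptions, not the project's actual code, which works on real TIFF/CSV inputs and a real database.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[List[Record]], List[Record]]

PATCH_SIZE = 4  # assumed patch length for this toy example


@dataclass
class Pipeline:
    """Runs records through an ordered list of stages."""
    stages: List[Stage] = field(default_factory=list)

    def add(self, stage: Stage) -> "Pipeline":
        # New stages are appended from outside; the core class never changes.
        self.stages.append(stage)
        return self

    def run(self, records: List[Record]) -> List[Record]:
        for stage in self.stages:
            records = stage(records)
        return records


def slice_patches(records: List[Record]) -> List[Record]:
    """Slicing: cut each source into fixed-size patches (a flat pixel list here)."""
    patches = []
    for rec in records:
        pixels = rec["pixels"]
        for start in range(0, len(pixels), PATCH_SIZE):
            patches.append({
                "source": rec["source"],
                "patch_id": start // PATCH_SIZE,
                "pixels": pixels[start:start + PATCH_SIZE],
            })
    return patches


def extract_features(records: List[Record]) -> List[Record]:
    """Feature extraction: placeholder vector (mean, max) standing in for a model embedding."""
    for rec in records:
        px = rec["pixels"]
        rec["vector"] = [sum(px) / len(px), float(max(px))]
    return records


def register_spatially(records: List[Record]) -> List[Record]:
    """Spatial registration: toy offset derived from the patch index."""
    for rec in records:
        rec["x"] = rec["patch_id"] * PATCH_SIZE
    return records


def make_ingest(store: List[Record]) -> Stage:
    """Database ingestion: append to an in-memory list standing in for a database."""
    def ingest(records: List[Record]) -> List[Record]:
        store.extend(records)
        return records
    return ingest


store: List[Record] = []
pipeline = (
    Pipeline()
    .add(slice_patches)
    .add(extract_features)
    .add(register_spatially)
    .add(make_ingest(store))
)
pipeline.run([{"source": "map_a.tif", "pixels": [1, 2, 3, 4, 5, 6, 7, 8]}])
```

In a layout like this, supporting a new kind of source would mean writing one more stage function and adding it to the list, which is one way to read the claim that the core code stays untouched.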

A Search Platform

The platform is a user-facing, browser-based visualization tool built with Next.js (frontend) and FastAPI (backend). It supports four hybrid search modes (text-to-image, image-to-text, image-to-image, and text-to-text) and provides 3D dynamic heatmaps with historical map overlays, enabling near real-time cross-modal exploration.
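One common way to support all four modes with a single index is to embed text and images into a shared vector space and rank candidates by cosine similarity, filtering on the target modality. The sketch below assumes that setup; the page does not state which embedding model or vector store the platform uses, and the `Item` type, the `search` function, and the sample vectors are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Item:
    item_id: str
    modality: str              # "text" or "image"
    vector: Tuple[float, ...]  # assumed to live in a shared embedding space


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index: List[Item], query_vector: Sequence[float],
           target_modality: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k items of the target modality most similar to the query.

    The search mode is the pair (modality of the query, target_modality):
    a text query against "image" items is text-to-image, and so on.
    """
    scored = [
        (cosine(query_vector, item.vector), item.item_id)
        for item in index
        if item.modality == target_modality
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


# Made-up two-dimensional vectors, purely for illustration.
index = [
    Item("patch_canal", "image", (1.0, 0.0)),
    Item("patch_church", "image", (0.0, 1.0)),
    Item("entry_canal", "text", (0.9, 0.1)),
]
text_query = (1.0, 0.1)  # stands in for the embedding of a text query
text_to_image = search(index, text_query, "image", k=1)
text_to_text = search(index, text_query, "text", k=1)
```

Because the query is only ever a vector, the same function serves all four modes; what changes is which encoder produced the query and which modality is requested.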

A Semantic Dataset

The dataset is a fully processed collection of digital assets that have been cleaned, registered, and vectorized. It contains thousands of historical map patches and text entries represented as embedding vectors. Because the most time-consuming feature engineering steps are precomputed, researchers can immediately run clustering analyses, study the evolution of urban morphology, or train downstream models.
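As an illustration of the clustering use case, the following is a minimal k-means sketch that operates directly on precomputed vectors. It is a toy written with the standard library only; the sample vectors are made up, centroids are seeded with the first `k` vectors for determinism, and a real analysis would more likely use an established library implementation on the actual embeddings.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means: returns a cluster label per vector and the final centroids."""
    # Assumption for determinism: seed centroids with the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids


# Made-up "embeddings": two patches near the origin, two far away.
patch_vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(patch_vectors, k=2)
```

With the vectors already stored in the dataset, a step like this needs no image processing at all, which is the practical benefit of precomputing the embeddings.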
