Urban Semantic Search
Revision as of 18:17, 11 December 2025
Deliverables
This project delivers not only a website but also a reusable digital humanities research infrastructure, encompassing toolchains, platforms, and data assets.
A Data Pipeline
The pipeline is a fully automated, modular ETL workflow that turns raw inputs (TIFF scans and CSV records) into vector indices through four stages: slicing, feature extraction, spatial registration, and database ingestion. Its open design lets researchers integrate new maps or archival records into the system without modifying the core code.
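The "no changes to core code" property can be illustrated with a loader registry. The sketch below is only an assumption about how such a design might look, not the project's actual implementation: `register_loader`, `load_csv_stub`, `extract_features`, `mark_ingested`, and `run_pipeline` are hypothetical names, and the stages are placeholders for the real slicing, feature extraction, registration, and ingestion steps.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Stage = Callable[[Record], Record]
Loader = Callable[[str], Iterable[Record]]

# Hypothetical registry mapping a source type to the loader that reads it.
LOADERS: Dict[str, Loader] = {}


def register_loader(kind: str) -> Callable[[Loader], Loader]:
    """Register a loader for a new source type without editing run_pipeline."""
    def decorator(fn: Loader) -> Loader:
        LOADERS[kind] = fn
        return fn
    return decorator


@register_loader("csv")
def load_csv_stub(path: str) -> Iterable[Record]:
    # Placeholder: a real loader would parse the file at `path`.
    yield {"source": path, "text": "example archival entry"}


def extract_features(record: Record) -> Record:
    # Placeholder feature: text length stands in for a real embedding.
    text = str(record.get("text", ""))
    return {**record, "vector": [float(len(text))]}


def mark_ingested(record: Record) -> Record:
    # Placeholder for database ingestion.
    return {**record, "ingested": True}


def run_pipeline(kind: str, path: str, stages: List[Stage]) -> List[Record]:
    """Pass every record from the chosen loader through each stage in order."""
    results: List[Record] = []
    for record in LOADERS[kind](path):
        for stage in stages:
            record = stage(record)
        results.append(record)
    return results


processed = run_pipeline("csv", "records.csv", [extract_features, mark_ingested])
```

Under this assumed design, supporting a new source type means registering one more loader; `run_pipeline` itself stays unchanged.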
A Search Platform
A user-facing, browser-based visualization platform built with Next.js (frontend) and FastAPI (backend). It supports four hybrid search modes (text-to-image, image-to-text, image-to-image, and text-to-text) and provides 3D dynamic heatmaps with historical map overlays, enabling near real-time cross-modal exploration.
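One common way to support all four modes with a single index is to embed text and images into a shared vector space and rank candidates by cosine similarity. The page does not state which model or index the platform uses, so the sketch below is an assumption: the `search` function, the toy `INDEX`, and its two-dimensional vectors are hypothetical and purely illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
IndexEntry = Tuple[str, str, Vector]  # (item id, modality, embedding)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: Vector, index: List[IndexEntry],
           target_modality: str, k: int = 3) -> List[Tuple[str, float]]:
    """Rank items of the target modality by similarity to the query vector."""
    scored = [
        (item_id, cosine(query, vector))
        for item_id, modality, vector in index
        if modality == target_modality
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: assumes text and image embeddings share one vector space.
INDEX: List[IndexEntry] = [
    ("map_patch_1", "image", [1.0, 0.0]),
    ("map_patch_2", "image", [0.0, 1.0]),
    ("entry_1", "text", [0.9, 0.1]),
    ("entry_2", "text", [0.1, 0.9]),
]

# A text-to-image query: the query vector would come from a text encoder.
top_images = search([1.0, 0.0], INDEX, target_modality="image", k=1)
```

The mode is determined only by which encoder produced the query vector and which modality is requested, so the same ranking function covers text-to-image, image-to-text, image-to-image, and text-to-text.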
A Semantic Dataset
A fully processed collection of digital assets, cleaned, registered, and vectorized. It contains thousands of historical map patches and text entries represented as embedding vectors. By precomputing the most time-consuming feature engineering steps, the dataset allows researchers to immediately perform clustering analyses, study urban morphology evolution, or train downstream models.
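Because the embeddings are precomputed, a clustering analysis can start directly from the vectors. The sketch below runs a small k-means on toy data; it assumes the embeddings can be loaded as plain lists of floats, and `EMBEDDINGS` and `kmeans` are hypothetical stand-ins rather than part of the dataset's tooling. For real analyses an established library implementation would be preferable.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Minimal k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-in for precomputed embedding vectors (hypothetical values).
EMBEDDINGS = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [5.1, 5.0]]
cluster_labels = kmeans(EMBEDDINGS, k=2)
```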