Marino Sanudo's Diary
Revision as of 17:36, 14 December 2024
Introduction
The project focused on analyzing the diaries of Marino Sanudo, a key historical source for understanding the Renaissance period. The primary goal was to create an index of people and places mentioned in the diaries, pair these entities, and analyze the potential relationships between them.
Historical Context
Who Was Marino Sanudo?
Marino Sanudo (1466–1536) was a Venetian historian, diarist, and politician whose extensive diaries, Diarii, provide a meticulous chronicle of daily life, politics, and events in Renaissance Venice. Sanudo devoted much of his life to recording the intricacies of Venetian society, governance, and international relations, making him one of the most significant chroniclers of his era.
The Importance of His Diaries
Sanudo’s Diarii span nearly four decades, comprising 58 volumes of detailed observations. These writings offer invaluable insights into the political maneuvers of the Venetian Republic, social customs, and the geographical scope of Renaissance trade and diplomacy. His work captures not only significant historical events but also the daily rhythms of Venetian life, painting a vivid picture of one of the most influential states of the time.
Relevance Today
Studying Marino Sanudo’s diaries remains highly relevant for modern historians, linguists, and data analysts. They provide a primary source for understanding Renaissance politics, diplomacy, and social hierarchies. Furthermore, the diaries’ exhaustive detail lends itself to contemporary methods of analysis, such as network mapping and data visualization, enabling new interpretations and uncovering hidden patterns in historical relationships. By examining the interconnectedness of individuals and places, Sanudo’s work sheds light on the broader dynamics of Renaissance Europe, offering lessons that resonate even in today’s globalized world.
Project Plan and Milestones
The project was organized on a weekly basis to ensure steady progress and a balanced workload. Each phase was carefully planned with clearly defined objectives and milestones, promoting effective collaboration and equitable division of tasks among team members.
The first milestone (07.10 - 13.10) involved deciding on the project's focus. After thorough discussions, we collectively chose to analyze Marino Sanudo’s diaries, given their historical significance and potential for data-driven exploration. This phase established a shared understanding of the project and laid the foundation for subsequent work.
The second milestone (14.10) focused on optimizing the extraction of indexes for names and places from the diaries. This required refining our methods for data extraction and ensuring accuracy in capturing and categorizing entities. Alongside this, we worked on identifying the geolocations of the places mentioned in the index, using historical and modern mapping tools to ensure precise identification.
The final stages of the project (19.12) marked the transition to analyzing relationships between the extracted names and places. Using the indexed data, we explored potential connections, identifying patterns and trends that revealed insights into Renaissance Venice's social, political, and geographic networks. We collaboratively built a Wikipedia page to document our research and created a dedicated website to present our results in an accessible and visually engaging manner. This phase also included preparing for the final presentation, ensuring that every team member contributed to summarizing and showcasing the work.
By adhering to this structured approach and dividing tasks equitably, we achieved a comprehensive analysis of Sanudo’s diaries, combining historical research with modern digital tools to uncover new insights into his world.
Week | Task |
---|---|
07.10 - 13.10 | Define project and structure work |
14.10 - 20.10 | Manually write a place index |
21.10 - 27.10 | Autumn vacation |
28.10 - 03.11 | Work on the name dataset |
04.11 - 10.11 | Finish the geolocation |
11.11 - 17.11 | Midterm presentation on 14.11 |
18.11 - 24.11 | Find naive relationship |
25.11 - 01.12 | Standardization of the text |
02.12 - 08.12 | Find relationship based on the distance |
09.12 - 15.12 | Finish writing the wiki |
16.12 - 22.12 | Deliver GitHub + wiki on 18.12 |
Methodology
Data preparation
The first step was to obtain a text version of the diaries. After exploring several sources, including [1], we identified three potential websites for downloading the text. Ultimately, we selected the version available on [2] because it offered a more comprehensive set of tools for our analysis. We downloaded the text from this source and compared it to versions from Google Books and HathiTrust, confirming that it best suited our needs.
We then decided to focus our analysis on the indices included in each volume, which listed the names of people and places alongside the corresponding column numbers.
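Because both indices point to column numbers, one simple way to relate a person to a place is to check whether they are listed for the same column of the same volume. The sketch below illustrates that idea only; the tuple layout `(name, volume, column)` and the function name `pair_by_column` are assumptions made for this example and are not taken from the project's code.

```python
from collections import defaultdict


def pair_by_column(people, places):
    """Pair people and places that share a (volume, column) reference.

    Both arguments are iterables of (name, volume, column) tuples.
    This layout is hypothetical and chosen for illustration.
    """
    # Group place names by the column they are indexed under.
    by_location = defaultdict(list)
    for place, volume, column in places:
        by_location[(volume, column)].append(place)

    # Emit one pair for every person/place found in the same column.
    pairs = []
    for person, volume, column in people:
        for place in by_location.get((volume, column), []):
            pairs.append((person, place, volume, column))
    return pairs
```

Sharing a column only shows that two entities appear close together in the text, so pairs produced this way are candidates for a relationship rather than confirmed links.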
Indices of places
Our primary focus was on the places in Venice mentioned in the diaries. The index of places was significantly shorter than the index of names, allowing us to analyze it manually. Entries in the index were often indented under a heading, which indicated a hierarchical relationship: an indented location belonged to the broader area named by the preceding, less-indented heading.
The generated dataset consists of the following features:
id: A unique identifier assigned to each entry.
place: The primary name of the location mentioned in the index.
alias: Any alternative names or variations associated with the place.
volume: The specific volume of the book in which the place is mentioned.
column: The column number within the index where the place is listed.
parents: The broader location or hierarchical category to which the place belongs, derived from the indentation structure in the index.
This structured dataset captures both the explicit details from the index and the inferred hierarchical relationships, making it suitable for further analysis and exploration.
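The place index was transcribed by hand, but the same record layout and the indentation rule for `parents` can be expressed in code. The following is a minimal sketch under stated assumptions: the line format `Name (alias; alias), col, col`, the two-space indent, and the names `PlaceEntry` and `build_entries` are all hypothetical, and `column` is stored as a list because a place may be listed under several columns.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PlaceEntry:
    id: int
    place: str
    alias: list[str]
    volume: int
    column: list[int]
    parents: list[str] = field(default_factory=list)


def build_entries(lines, volume, indent_width=2):
    """Turn indented index lines into PlaceEntry records.

    Assumed (hypothetical) line format: "<indent>Name (alias; alias), 12, 45".
    """
    entries = []
    stack = []  # stack[d] holds the name of the open heading at depth d
    for raw in lines:
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent_width
        parts = [p.strip() for p in raw.strip().split(",")]
        columns = [int(p) for p in parts[1:] if p.isdigit()]
        # Split "Name (alias; alias)" into the name and its aliases.
        match = re.match(r"^(.*?)\s*(?:\((.*)\))?$", parts[0])
        place = match.group(1)
        alias = [a.strip() for a in match.group(2).split(";")] if match.group(2) else []
        del stack[depth:]  # close headings at this depth or deeper
        entries.append(
            PlaceEntry(len(entries) + 1, place, alias, volume, columns, list(stack))
        )
        stack.append(place)
    return entries
```

With this rule, an entry indented twice receives both enclosing headings as its `parents`, ordered from the broadest area to the nearest one.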
Indices of names
The index of names proved to be significantly more challenging to analyze than the index of places, as it spanned approximately 80 pages. To tackle this, we first provided examples of the desired output and then used an Italian-trained OCR model to process the text and generate a preliminary table of names. This approach differed from traditional OCR methods, allowing for a more accurate extraction tailored to our project.
After the automated extraction, we manually reviewed and corrected the output. One common issue we encountered was that page numbers were sometimes misinterpreted as names, leading to cascading errors that required cleaning.
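A first cleaning pass for the page-number problem can be automated before manual review. The sketch below flags rows of the preliminary table whose "name" contains digits but no letters; the function names are hypothetical, and the rule is an assumption about what a misread page number looks like, not the procedure the team actually followed.

```python
def looks_like_page_number(text):
    """Return True for values made of digits and punctuation only."""
    text = text.strip()
    has_digit = any(c.isdigit() for c in text)
    has_letter = any(c.isalpha() for c in text)
    return has_digit and not has_letter


def split_rows(names):
    """Separate plausible names from values that look like page numbers."""
    kept, flagged = [], []
    for name in names:
        if looks_like_page_number(name):
            flagged.append(name)
        else:
            kept.append(name)
    return kept, flagged
```

Requiring at least one digit keeps entries that consist only of an ellipsis, which are legitimate placeholders in the index. Flagged rows are returned rather than dropped so they can still be checked by hand.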
During the analysis, we made several observations:
Family structures: The first surname listed under a heading often applied to the subsequent names that followed the same indentation. This pattern provided insight into familial groupings.
Ellipses in place of names: In some cases, the author used ellipses (...) instead of a name. Our research revealed that this practice was historically used for names that were unknown at the time of writing, with the intention that they could be added later if discovered. Alternatively, the ellipses were used to deliberately anonymize certain individuals.
These findings enriched our understanding of the index and its historical and structural context.
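Both observations can be turned into a small post-processing step. The sketch below assumes a simplified layout in which a non-indented line opens a family heading (`Surname` or `Surname, Given`) and indented lines carry given names that inherit that surname; an ellipsis is stored as `None` to mark an unknown or withheld name. The layout and the name `expand_family_entries` are assumptions for illustration.

```python
ELLIPSIS_MARKERS = {"...", "…", ". . ."}


def expand_family_entries(lines):
    """Return (surname, given_name) pairs from an indented name index.

    given_name is None when the index shows an ellipsis instead of a name.
    """
    records = []
    surname = None
    for raw in lines:
        if not raw.strip():
            continue
        text = raw.strip()
        if raw[0].isspace():
            # Indented line: inherits the surname of the current heading.
            given = text
        else:
            # New heading: "Surname" or "Surname, Given".
            surname, _, given = text.partition(",")
            surname, given = surname.strip(), given.strip()
            if not given:
                continue  # heading without a person of its own
        if given in ELLIPSIS_MARKERS:
            given = None
        records.append((surname, given))
    return records
```

Keeping ellipsis entries as explicit `None` values preserves the fact that a person was recorded, even though the name itself is missing.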