Platform management and development: Quantitative analysis of performance
GeoParsingBot
For a first trial, we chose to scan the 40’679 pulses already posted by the account @secondary_sources_bot [1]. As a first step towards geocoding, a Python program retrieved all 40k pulses from the data source and stored them in an intermediate JSON file. This step took 7 minutes and 60 MB of memory, which amounts to approximately 1.5 kB per pulse.
A second part of the project then handled the geoparsing, which took 6 minutes but used 4.7 GB of RAM. The geoparsing library used, geoparsepy, spends a lot of time preprocessing the data and building an inverted index of locations and other utilities, which then allows almost instantaneous geoparsing of text. The RAM and time required grow with the amount of data loaded in the PostgreSQL database; in this run, only 137 MB of European places was loaded. If the data loaded in PostgreSQL is close to 5 GB, the preprocessing alone takes up to 45 minutes and can require more than 12 GB of RAM.
Once approximately 50% of the pulses (19’812 of 40’679) had been geoparsed, it was time to put them online. However, only 81 pulses were posted at first, because of an error raised by the Mastodon API: the API only allows posts of up to 500 characters, and the 82nd pulse found was longer. We therefore chose not to post geopulses longer than 500 characters.
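The 500-character rule described above can be applied before any request is sent, so that one over-long pulse does not interrupt the whole batch. The following is a minimal sketch, not the project's actual code: the function name `split_by_length` and the `"text"` key of each pulse are assumptions made for illustration, and `len()` counts Unicode code points, which may differ from the way the server counts characters.

```python
MAX_POST_LENGTH = 500  # character limit reported in the text above


def split_by_length(pulses, limit=MAX_POST_LENGTH):
    """Separate pulses that fit within the limit from those that do not.

    Each pulse is assumed (hypothetically) to be a dict with a "text" key.
    """
    postable, too_long = [], []
    for pulse in pulses:
        if len(pulse["text"]) <= limit:
            postable.append(pulse)
        else:
            too_long.append(pulse)
    return postable, too_long


if __name__ == "__main__":
    sample = [
        {"id": 1, "text": "Short geopulse about Lausanne."},
        {"id": 2, "text": "x" * 501},  # one character over the limit
    ]
    postable, too_long = split_by_length(sample)
    print(len(postable), "postable,", len(too_long), "skipped")
```

Keeping the skipped pulses in a separate list, rather than dropping them silently, makes it possible to report how many geopulses were left out.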
The JSON file containing the information about all the pulses read was 10 MB in size and held 40’679 scanned pulses. The JSON file containing the information about the geocoded pulses was 7 MB in size and held 19’812 geocoded pulses. Each geocoded pulse is heavier because geocoding adds information to it, for instance the coordinates. We counted 3’229 pulses posted in 27 minutes, from which we estimate that it would take around 3 hours to post all 19’812 geocoded pulses. The Overpass API is the limiting factor in terms of performance, because each geocoded pulse requires an online request to obtain its longitude and latitude from the corresponding OSMID.
For now, the geopulses are visible here: [2]
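The three-hour estimate above is a linear extrapolation of the measured posting rate. The sketch below reproduces that arithmetic; the function name `estimate_total_minutes` is hypothetical, and the calculation assumes the rate observed during the first 27 minutes stays constant for the rest of the run.

```python
def estimate_total_minutes(posted, elapsed_minutes, total):
    """Extrapolate the total posting time from a measured sample.

    Assumes a constant posting rate over the whole run.
    """
    rate_per_minute = posted / elapsed_minutes
    return total / rate_per_minute


if __name__ == "__main__":
    # Figures taken from the measurements reported above.
    minutes = estimate_total_minutes(posted=3229, elapsed_minutes=27, total=19812)
    print(f"{minutes:.0f} min, i.e. about {minutes / 60:.1f} h")
    # prints "166 min, i.e. about 2.8 h"
```

The measured rate is close to 120 pulses per minute, or about two Overpass requests per second.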
MapApp and MapBot
On the one hand, the map correctly displayed the 3’219 pulses with their associated markers, and the URLs contained in the popups worked. On the other hand, the display was slow and lag was noticeable. The search bar, surprisingly, was not slow and remained responsive.