Gede: To acquire the full JSON I need to run a query on MaGog. I don't know if that causes any significant stress on your server, but static files are much lighter. Would you consider storing the last full JSON file alongside the CSV-like file at a fixed URL (maybe compressed to save on bandwidth too)?
I will consider doing that, but it will probably take a while, since the code that generates the CSV-like file is part of one application (the database updater that runs once every six hours), while the JSON export is part of another (the search engine). Until then, don't worry about the bandwidth and server load, but while testing your code you might want to work with a local file.
That said, the whole concept of the JSON export as part of the search engine is that it allows you to leverage all the existing capabilities of MaGog's search engine. If you want a list of all the games that have a manual, you don't need to get the entire database and code your own search engine; you just set the MaGog filter and get the results you want. Similarly, if you're interested in only, say, five fields, there's no need to get the entire database; just set the display checkboxes to whatever you want and you have your results.
And, of course, you don't need the MaGog GUI to run the search as all the information is present in the resulting search URL.
mrkgnao: And the reason they are strings is that I believe it is a more useful data representation than a list for these cases. If you wish, I can explain why this is more useful. And trust me, nothing is trivial.
Gede: I would appreciate it very much if you could spare the time to share your opinion on this. I am always ready to learn.
mrkgnao: If you wish, as an exercise, try to write the pseudocode for your trivial routine for finding manuals, assuming a list, and post it here. And we'll discuss it.
Gede: Here is the full code in Python (Replace the __ with whitespace. This forum is not very Python-friendly):
import json
with open("magog.json") as magog:
__games_with_manual = [game["title"] for game in json.load(magog)["magog_search_results"] if "manual" in game["bonuses"]]
OK, so this code works on your JSON file "as is" (string) as well as on a list of strings, but if I wanted to do more complicated calculations, such as "what genres are more popular" or "what genre combination is more or less common", I'll have to decompose the string myself. The speed issue is not the most relevant here.
You say "OK, so this code works on your JSON file "as is"", but it doesn't.
It will not find, for example, either of the Anno 1404 manuals because they are called:
- Anno 1404 manual (EN, FR, ITA, SPA)
- Anno 1404: Venice manual (EN, FR, ITA, SPA)
Actually it will not find quite a lot of the manuals in the catalogue. By a rough estimate, it will find 443 manuals and miss more than 480.
So you will end up having to implement some more or less sophisticated form of string search, which you will use within your iterator, so what do I (as a search engine writer) gain from the iterator? Why not simply do a single string search, which is what MaGog does?
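To make the difference concrete, here is a minimal sketch. It assumes hypothetical sample data: the two Anno 1404 bonus names from above plus "avatars", and a "; " separator that is purely illustrative and not MaGog's actual format.

```python
# Hypothetical sample: the same bonuses as a list and as a single string.
# The "; " separator is an assumption made for illustration only.
bonuses_list = [
    "Anno 1404 manual (EN, FR, ITA, SPA)",
    "Anno 1404: Venice manual (EN, FR, ITA, SPA)",
    "avatars",
]
bonuses_string = "; ".join(bonuses_list)

# On a list, `in` looks for an element equal to "manual": no match here.
found_in_list = "manual" in bonuses_list

# On a string, `in` is a substring search: it matches.
found_in_string = "manual" in bonuses_string

# To get the same answer from the list, every element must be searched too.
found_by_element_search = any("manual" in bonus for bonus in bonuses_list)

print(found_in_list, found_in_string, found_by_element_search)
# prints: False True True
```

With a list, the string search does not go away; it just moves inside the loop.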
And what will happen when you come across an element called "manual avatars" because someone at GOG forgot to put the separator between the two bonuses (a real-life example, which has since been fixed)?
Parsing these strings into individual bonuses, you will learn, is not that trivial, unless you assume the original data is perfect, which it is anything but. The trick is to write a resilient application that will yield useful information even with faulty data.
I agree with you that if you want to generate statistics (which is what you hint at) rather than search, you need to parse the strings into their separate elements, and in that respect MaGog's output is less useful than the original GOG data. For example, bonuses in the GOG data now have HTML separators, which MaGog ignores; when MaGog was written they did not. However, MaGog was designed as a search engine and it shows its design.
P.S. Do note that the "bonuses" string is not a comma-separated list. As you can see from the Anno 1404 example above, a single bonus may contain commas itself.
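As an illustration of what "resilient" could mean here, the following is only a sketch, not how MaGog works. The keyword tuple and the function name bonus_kinds are hypothetical. It looks for known keywords as whole words inside an element instead of expecting the element to equal one clean bonus name.

```python
import re

# Hypothetical list of bonus keywords; a real catalogue would need many more.
KEYWORDS = ("manual", "avatars", "wallpapers")


def bonus_kinds(element):
    """Return every known keyword that occurs as a whole word in `element`."""
    lowered = element.lower()
    return [
        keyword
        for keyword in KEYWORDS
        if re.search(rf"\b{re.escape(keyword)}\b", lowered)
    ]


# A faulty element with a missing separator still yields both bonuses.
print(bonus_kinds("manual avatars"))
# prints: ['manual', 'avatars']

# A bonus with a longer name is still recognised as a manual.
print(bonus_kinds("Anno 1404 manual (EN, FR, ITA, SPA)"))
# prints: ['manual']
```

Even this sketch has gaps: the whole-word match would miss a plural such as "manuals", which is the kind of detail that makes the parsing less trivial than it looks.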
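A quick check, using only the Anno 1404 bonus name quoted above, shows what a naive split on commas would do to it:

```python
# One single bonus, exactly as named in the Anno 1404 example.
bonus = "Anno 1404 manual (EN, FR, ITA, SPA)"

# Naive comma split: the one bonus falls apart into four fragments.
parts = [part.strip() for part in bonus.split(",")]

print(parts)
# prints: ['Anno 1404 manual (EN', 'FR', 'ITA', 'SPA)']
```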