Yepoleb: gogrepo.py can be used to back up your collection, if you're looking for an existing solution. Otherwise, use the products API to get the file sizes and check whether they match the ones you already have. The version is very unreliable, and hashes would probably get you banned if you scan all of them at once. There's no list of which games have been updated since you last downloaded them. The full checksum can be obtained using the chunklist for a download URL; check the docs for an example.
I guess I can use the filesize AND the version to determine whether to update. Not a bad idea.
Thanks for the pointer concerning the checksum.
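To make the size-plus-version idea concrete, here is a rough Python sketch. `RemoteFile` and `needs_update` are names I made up for illustration, and the sketch assumes you have already pulled the size and version out of the products API response yourself. Since the version is reported to be unreliable, it only counts when both sides actually have one.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RemoteFile:
    # Hypothetical record; fill it from whatever the products API returns.
    name: str
    size: int                # size in bytes as reported by the server
    version: Optional[str]   # may be missing or unreliable


def needs_update(local_path: Path, local_version: Optional[str], remote: RemoteFile) -> bool:
    """Return True if the local copy should be downloaded again."""
    # A missing file always needs downloading.
    if not local_path.is_file():
        return True
    # A size mismatch is the primary signal.
    if local_path.stat().st_size != remote.size:
        return True
    # The version is only compared when both sides report one.
    if remote.version and local_version and remote.version != local_version:
        return True
    return False
```

This never hashes anything, so it stays cheap enough to run over a whole collection.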
Yepoleb: Databases aren't made to store big chunks of binary data. MongoDB is probably even worse than PostgreSQL, because it splits them into a massive number of tiny chunks. If you want to store files with a unique key, use a filesystem; that's what they're designed for. Metadata can be kept in JSON files, which are much easier to manage than database columns. Whatever you do, please just don't store files in a database.
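As a rough illustration of the filesystem approach suggested above, the sketch below keeps each file in a directory named after its key, with a `metadata.json` next to it. The layout and the function names `store_file` and `load_metadata` are assumptions made for the example, not an established convention.

```python
import json
import shutil
from pathlib import Path


def store_file(root: Path, key: str, source: Path, metadata: dict) -> Path:
    """Copy source into root/key/ and write its metadata alongside it."""
    target_dir = root / key
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copies content and timestamps
    meta_path = target_dir / "metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target


def load_metadata(root: Path, key: str) -> dict:
    """Read back the metadata stored for a key."""
    meta_path = root / key / "metadata.json"
    return json.loads(meta_path.read_text(encoding="utf-8"))
```

The key has to be safe to use as a directory name; a numeric product ID would work, an arbitrary title might not.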
In my experience, filesystems do much less content-integrity checking than databases: files on a filesystem get corrupted all the time without the storage engine knowing about it until you try to access that particular file. Databases are of course not immune to this (in most scenarios they too keep their files on a regular filesystem), but by their nature they are a lot more proactive about letting you know that your content was corrupted, at which point you can take action rather than unknowingly sit on corrupted data.
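For what it's worth, the kind of proactive check I mean looks roughly like the sketch below when done by hand on a plain directory. It assumes a hypothetical manifest file, a JSON object mapping relative paths to SHA-256 hex digests recorded at download time; `sha256_of` and `find_corrupted` are names invented for the example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_corrupted(root: Path, manifest_path: Path) -> list[str]:
    """Return manifest entries that are missing or whose hash has changed."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    bad = []
    for rel_path, expected in manifest.items():
        file_path = root / rel_path
        if not file_path.is_file() or sha256_of(file_path) != expected:
            bad.append(rel_path)
    return bad
```

Nothing runs this for you on a plain filesystem; you have to schedule it yourself, which is the extra work I was getting at.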
Also, I already have a lot of experience with MongoDB (I took a bunch of online courses and have a double certification). It's relatively easy for me to add or remove replicas for storage redundancy, or just to retire or add storage media.
For me, putting and retrieving files from a remote database is a lot less hassle and more portable than having to manage, replicate and move around a filesystem directory structure.
In a previous place of employment, we originally stored media files in a directory structure and that proved to be a mess to maintain. Moving it to a database proved to be a net gain for us (from there, you could remotely access files, replicate them, shard them and overall handle them just like other database documents).
Now, they were not 1GB+ files, and you are probably right that if you want to do a lot of manipulation on the entire file, the chunks may be a drawback. But for storage and retrieval? Why not?