Posted November 08, 2020
WinterSnowfall: That's quite the paradigm shift. Is it for performance reasons or just more optimal coding?
Writing an explanation for my decision took quite a few attempts because I wasn't sure myself what the most important factors were. I'd say relational databases are not really made for large-scale bulk updates of data but rather for smaller incremental updates where consistency is important. This means a lot of synchronization between the application and the DB to keep relationships intact. The overhead of the constant select and update statements is massive for the current updater. Keeping things in memory is not really an option either because it doesn't scale well. What I really want to do is fetch all the new data, merge it with the old data, compare the two versions, dump everything out again and not worry about any previous state. NoSQL lets me do that. Writing code to abstract the data in a way that a relational database can deal with also wasn't much fun. I'd have a Pythonic data representation and one that the DB could deal with, plus a lot of code to convert between the two. With the new architecture very little data conversion needs to happen.
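To make the fetch / merge / compare / dump cycle a bit more concrete, here is a rough sketch. It is not the actual updater code: the function names, the `id` key and the one-JSON-file-per-record layout are all assumptions made up for illustration, and a real document store would replace the file handling.

```python
import json
from pathlib import Path


def merge_records(old, new):
    """Return a merged copy: keys from `new` win, untouched old keys survive."""
    merged = dict(old)
    merged.update(new)
    return merged


def diff_records(old, new):
    """Map each changed key to an (old_value, new_value) pair."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


def update_store(store_dir, fetched):
    """Merge freshly fetched records into the store; return per-record changes.

    Assumes every record is a flat, JSON-serializable dict with an "id" key
    (a hypothetical layout, chosen only for this example).
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    changes = {}
    for record in fetched:
        path = store / f"{record['id']}.json"
        old = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        merged = merge_records(old, record)
        delta = diff_records(old, merged)
        if delta:
            changes[record["id"]] = delta
            # Overwrite the whole document; no previous state is tracked.
            path.write_text(json.dumps(merged, indent=2, sort_keys=True), encoding="utf-8")
    return changes
```

The point is that the stored document is already a plain dict, so there is no conversion layer between the Python representation and the storage format, and each record is written at most once per run instead of going through a series of select and update statements.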
Of course, querying becomes a problem when there's no database engine taking care of the optimization. My current plan is to either generate index-like data structures manually and store them as JSON, or insert the relevant values into a disposable SQLite DB that runs the query and references the full version in the NoSQL storage. It depends on whether it's really worth pulling in another dependency.
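The disposable SQLite variant could look roughly like the sketch below, using Python's standard-library `sqlite3` module. The table name `idx`, the helper names and the idea of passing the documents as a dict keyed by document key are assumptions for illustration only; the index holds just the queried fields plus the key, and the full documents stay wherever they are stored.

```python
import sqlite3


def build_index(documents, fields):
    """Build a throwaway in-memory SQLite index over selected fields.

    `documents` maps a document key to its full dict; only `fields` are copied.
    Field names are interpolated into SQL, so they must come from trusted code,
    and the indexed values must be scalars SQLite can store (int, float, str, None).
    """
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}"' for name in fields)
    conn.execute(f"CREATE TABLE idx (doc_key TEXT PRIMARY KEY, {columns})")
    placeholders = ", ".join("?" for _ in range(len(fields) + 1))
    rows = [(key, *(doc.get(name) for name in fields)) for key, doc in documents.items()]
    conn.executemany(f"INSERT INTO idx VALUES ({placeholders})", rows)
    return conn


def query_keys(conn, where, params=()):
    """Run a WHERE clause against the index and return matching document keys."""
    cursor = conn.execute(f"SELECT doc_key FROM idx WHERE {where} ORDER BY doc_key", params)
    return [row[0] for row in cursor]
```

Usage would be something like `query_keys(conn, "genre = ? AND price < ?", ("rpg", 10))`, followed by looking up the returned keys in the document storage to get the full records. Since the index is rebuilt from the documents whenever needed, it never has to be kept in sync and can simply be thrown away.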