clarry: So what do you think these files store if not textures or data? Why don't you look inside?
rtcvb32: Well taking a raw look, there's an awful lot of zeros. About what i expected.
That doesn't answer the question. What are the zeros there for, what do they represent? In all likelihood they are a part of something and not just "empty unused space." Zero is valid data too! And it happens to be very common data.
Anyway, turns out I own the game and decided to take a look inside. While reverse engineering that binary format in 15 minutes is beyond my capabilities, I can make this observation: the file contains lots of recurring references to things like Chandelier, WallTorch, LightEffect, Teleporter, Destination, SetOfTiles*, SpikesSet*. Thousands of these. Which suggests to me that the file contains the entire scene with all its objects losslessly serialized.
Level15 appears to contain approximately 10k such objects. I took a quick peek at a playthrough of the game on YouTube, and while it is not easy to get a sense of the levels' scale, I can tell that they are quite large and could easily consist of that many tiles (and the objects & effects contained within said tiles).
If all objects with all their properties and relations and references to assets are serialized in binary, the size looks quite sensible. A few kilobytes per object times 10k is how you get to a few tens of megabytes.
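To put rough numbers on that, here is a back-of-the-envelope sketch. The per-object sizes are my guesses, not something measured from the file, and `estimate_mib` is just a helper name I made up:

```python
def estimate_mib(object_count: int, kib_per_object: float) -> float:
    """Total size in MiB for object_count objects of kib_per_object KiB each."""
    return object_count * kib_per_object * 1024 / (1024 * 1024)

# Guessed per-object sizes; the real figure depends on the game's format.
for kib in (1, 2, 4):
    print(f"{kib} KiB/object x 10,000 objects = {estimate_mib(10_000, kib):.1f} MiB")
```

With 1 to 4 KiB per object that lands at roughly 10 to 40 MiB, which is the "few tens of megabytes" ballpark.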
While I personally would not implement a game this way, I don't see anything wrong with it per se. The zeroes would be part of these serializations -- possibly including any padding bytes, if the goal is to make these files fast and easy to read & write without any marshalling (i.e. just copy the blob -- possibly even mmap the file and start using the memory as-is!).
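As a rough illustration of the "just copy the blob" idea, here is a minimal sketch using Python's `ctypes`. The `SceneObject` struct and its fields are entirely hypothetical; I have not worked out the game's actual layout. It shows where padding comes from and why most of the bytes end up zero:

```python
import ctypes

class SceneObject(ctypes.Structure):
    # Hypothetical layout, invented for illustration only.
    _fields_ = [
        ("kind", ctypes.c_uint8),       # 1 byte, followed by alignment padding
        ("x", ctypes.c_float),
        ("y", ctypes.c_float),
        ("name", ctypes.c_char * 32),   # fixed-size name, zero-filled tail
        ("target", ctypes.c_uint64),    # e.g. a reference to another object
    ]

obj = SceneObject(kind=3, x=1.5, y=-2.0, name=b"WallTorch", target=0)

# "Serialize" by copying the in-memory bytes as-is, padding included.
blob = bytes(obj)

# "Deserialize" by copying the bytes straight back into a struct.
restored = SceneObject.from_buffer_copy(blob)

field_bytes = 1 + 4 + 4 + 32 + 8
print("struct size:", ctypes.sizeof(SceneObject), "bytes; fields alone:", field_bytes)
print("zero bytes in blob:", blob.count(0), "of", len(blob))
print("restored name:", restored.name)
```

No parsing and no per-field marshalling, which is the whole appeal. The cost is that padding and unused fixed-size fields get written out as zeros.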
Disk space is cheap and losslessly dumping objects in and out is fast and simple. Developer time is *not* cheap and I can see why someone would not spend it optimizing the data files, especially given how well entropy coding compresses it for distribution anyway.
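A quick way to see how well this kind of data compresses, sketched with Python's `zlib` (DEFLATE, i.e. LZ77 matching plus Huffman coding). The record layout below is fabricated and far more regular than real level data, so treat the ratio as an upper-end illustration rather than a measurement of the game's files:

```python
import struct
import zlib

RECORD_SIZE = 2048  # assumed size of one serialized object, in bytes

def fake_record(index: int) -> bytes:
    """Build a made-up record: a small header, then zero fill."""
    header = struct.pack("<I32s", index, b"WallTorch")  # 4-byte id + 32-byte name
    return header.ljust(RECORD_SIZE, b"\0")

blob = b"".join(fake_record(i) for i in range(2000))
packed = zlib.compress(blob, 9)

print(f"raw: {len(blob)} bytes, compressed: {len(packed)} bytes")
print(f"ratio: {len(blob) / len(packed):.0f}x")
```

The zeros cost almost nothing once compressed, so the bloat only exists on the installed copy, not in the download.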
Ever written your own huffman code, store and load it and compress/decompress data on the fly?
Nope. But I have reverse engineered data files in games, including written a decompressor for the RLE compression in Killing Time and used that to uncompress & extract all the data files out of the archive.
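For anyone who hasn't seen RLE, this is the general idea in a minimal sketch. Note that this is a generic (count, value) byte-pair scheme I'm using for illustration, not the actual format Killing Time uses:

```python
def rle_encode(raw: bytes) -> bytes:
    """Encode as (count, value) pairs, with runs capped at 255 bytes."""
    out = bytearray()
    i = 0
    while i < len(raw):
        value = raw[i]
        run = 1
        while i + run < len(raw) and raw[i + run] == value and run < 255:
            run += 1
        out += bytes([run, value])
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    if len(data) % 2 != 0:
        raise ValueError("truncated RLE stream: expected (count, value) pairs")
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        out += bytes([value]) * count
    return bytes(out)

sample = b"\x00" * 300 + b"WallTorch" + b"\x00" * 40
encoded = rle_encode(sample)
print(len(sample), "->", len(encoded), "bytes")
assert rle_decode(encoded) == sample
```

Long runs of zeros collapse to a couple of bytes each, while text with no repeats actually grows, which is why real formats usually add a way to mark literal runs.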
Don't see the need to write my own huffman implementation given that there are plenty of free & open source implementations out there (including easy to use libraries). Also, if I wanted entropy coding, I would not waste time on old huffman. Arithmetic coding is much better.
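The reason it matters for zero-heavy data: a plain per-symbol Huffman code can never spend less than 1 bit per symbol, while an arithmetic coder can get close to the entropy. A small sketch with a made-up two-symbol source where 95% of the symbols are zero (the probabilities are an assumption for illustration):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.95, 0.05]          # assumed: 95% zeros, 5% everything else
h = entropy_bits(probs)
huffman_bits = 1.0            # two symbols -> codes "0" and "1", 1 bit each

print(f"entropy:              {h:.3f} bits/symbol")
print(f"per-symbol Huffman:   {huffman_bits:.3f} bits/symbol")
print(f"Huffman overhead:     {huffman_bits / h:.1f}x")
```

Huffman can narrow the gap by coding blocks of symbols at once, but arithmetic coding gets there without that workaround.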