A Question About Compressing Files

Discussion in 'Audio Hardware' started by bzfgt, Nov 11, 2017.

  1. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    So I mean there are programming languages where people don't have to actually program in binary code, right? And these have more characters, whatever other differences there may be, than a binary code. So you can communicate with a computer with a richer alphabet than (1,0), so a music file should be able to, also. Isn't that right?
     
  2. Chris DeVoe

    Chris DeVoe Forum Resident

    Unless things have changed significantly since the last time I looked into it, lossless compression means getting out exactly the same bits you put in. Checking the Wikipedia entry for FLAC, it is still limited to "40% to 50%" of the size of the original uncompressed file, which is another way of saying 2:1 or slightly better.

    This is for audio. Different types of images can be more compressible, being better suited to techniques like Run Length Encoding.
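For the curious, here is a minimal sketch of the Run Length Encoding technique mentioned above (plain Python, purely illustrative, not any real codec). It shows why flat image regions compress well while ever-varying audio samples gain little:

```python
# Run Length Encoding sketch: runs of identical values collapse to
# (value, count) pairs. Flat image regions have long runs; raw audio
# samples almost never repeat exactly, so RLE barely helps there.
def rle_encode(data):
    out = []
    for value in data:
        if out and out[-1][0] == value:
            out[-1][1] += 1
        else:
            out.append([value, 1])
    return out

def rle_decode(pairs):
    return [value for value, count in pairs for _ in range(count)]

flat_region = [255] * 8 + [0] * 4          # a white strip, then a black strip
encoded = rle_encode(flat_region)
assert encoded == [[255, 8], [0, 4]]       # 12 values stored as 2 pairs
assert rle_decode(encoded) == flat_region  # lossless: exact values back
```

Note the second assert: decoding reproduces the input exactly, which is what "lossless" means in the post above.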
     
  3. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Yes my question isn't about the current limit, but about why this cannot be surpassed.
     
  4. David G.

    David G. Forum Resident

    Location:
    Austin, TX
    You're confusing "language" and "data." Those are two completely separate and unrelated concepts. Computers can currently only understand ones and zeros (data). There are plenty of computer "languages" (ways of programming) but the data storage required for each one is about the same.
     
    Grant and Plan9 like this.
  5. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Crap, OK. So my suggestion is impracticable unless computers are reinvented, basically?
     
    Grant and Mr. Explorer like this.
  6. Chris DeVoe

    Chris DeVoe Forum Resident

    Math.
     
    Grant likes this.
  7. Chris DeVoe

    Chris DeVoe Forum Resident

Not even then. It wouldn't matter at all if the symbols were stored in a ternary format; the amount of information would remain the same.

    I'm trying to remember the name of the book I read, but it was by the director of Bell Labs.

Myself, I don't care about lossless versus lossy; I only care about the quality of the lossy compression algorithm. That bridge was crossed a long time ago. Nobody but professional photographers deals with uncompressed still images, and nobody but the cinematographer sees uncompressed movie images. Only in audio do people seem to care, mostly because the relatively small file sizes make it possible to care. Nobody is going to download a six-terabyte uncompressed copy of a film.
     
    Last edited: Nov 11, 2017
  8. Jesus Jeronimo

    Jesus Jeronimo Member

    Location:
    Madrid
    This is a most entertaining thread!
Probably the answer to your question is very close to what you are proposing, since current compression algorithms already use what you call a dictionary in order to compress. That is, they analyse all the data in a file, look for patterns, and turn those patterns into little "words" that take up less space.

But the limit here relates to binary codes and math: as of today, there is only so much compression you can apply. New algorithms are being developed as we speak, but it's also true that since bandwidth and storage costs have been steadily improving over the last 20 years, I don't think we've had the need to really push the envelope on this subject.
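The "dictionary" idea can be made concrete with Python's standard zlib module (a DEFLATE/LZ77-style dictionary coder, used here purely as an illustration, not an audio codec): repetitive data shrinks dramatically, while random-looking data doesn't shrink at all.

```python
import os
import zlib

# A repetitive "file": the LZ-style dictionary finds the pattern and
# replaces each repeat with a short back-reference.
patterned = b"drumloop" * 1000                  # 8000 bytes of pure repetition
compressed = zlib.compress(patterned, level=9)
assert len(compressed) < len(patterned) // 10   # huge win on patterned data

# Random-looking data: no patterns for the dictionary to exploit, so the
# output is no smaller (usually slightly larger, from format overhead).
noise = os.urandom(8000)
assert len(zlib.compress(noise, level=9)) >= len(noise)
```

This is the limit the post describes: a dictionary coder can only remove redundancy that is actually present in the data.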

    Please, people post more about this. Very interesting.

    J
     
    Mr. Explorer and bzfgt like this.
  9. qwerty

    qwerty Forum Resident

And remember that mp3 makes the audio file smaller by discarding audio information.

Given the size of the audio industry and the number of super-intelligent boffins out there, I'm sure that if there were a way to get lossless compression down to the level of mp3 compression, we would know about it. And I'm also sure that there are lots of people out there thinking about this problem. I think we are stuck with the computing systems we have. Maybe when we move to biological computing, or some of the nano-technology stuff gets implemented into standard computer hardware (I'm a bit out of touch with these developments now), we will be able to get better compression. But by that stage we probably won't need it either, as huge storage space will be cheap and plentiful.
     
    Mr. Explorer likes this.
  10. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
I care because my iPod gets full.
     
  11. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    So basically what you're saying is, no one is going to read this and say "Eureka! You've solved the problem! It was exactly an untutored eye that we needed to truly see the solution! I am off to work up a prototype, and if you partner with me we will both shortly be millionaires!"

    I mean, that's what I was hoping to read...
     
    qwerty likes this.
  12. David G.

    David G. Forum Resident

    Location:
    Austin, TX
Yes. The entire world is on a quest for quantum computing. Quantum computers don't use plain ones and zeros; each data "switch" is a qubit, which can be not just 1 or 0 but a superposition of 1 and 0 at the same time. In principle this could dramatically increase processing power for certain problems, and some hope it would also mean that tremendous amounts of data could be stored in hard drives that we would consider tiny.

    Think of it this way: it would no longer be about "compressing" data. It would be about having the room to store tens of thousands of times the amount of data on a drive 1/100 the size of the drives we have now. You no longer need to compress data for storage when storage space is no longer an issue.
     
  13. Plan9

    Plan9 Mastering Engineer

    Location:
    Toulouse, France
    FLAC is absolutely lossless, I assure you. :)
Try encoding one track (preferably the biggest and highest-resolution you have: 24-bit/something) at FLAC level 7 or 8 and see if your computer can play it without stuttering. If it stutters, reduce the encode to level 5 or so and test again until your computer can process it without problems.
    Indeed if you do this you will save around 40% of the space your WAV files used to take.
     
    bzfgt likes this.
  14. Chris DeVoe

    Chris DeVoe Forum Resident

    Yes, but the difference between 50% and 40% is not worth talking about when compared to the size difference with lossy techniques at high bitrates like 256k.

    Rip your own music and experiment with different compression rates on different material. Listen for compression artifacts on crash cymbals and applause. Use an A/B/X comparator. Don't blindly accept the orthodoxy that the only good quality playback sources are uncompressed.

Again, every single second of HD video you've ever seen has been compressed - seriously, unless you're a broadcast camera operator or cinematographer, you have never seen uncompressed HD video. And audio is vastly simpler to compress than video.

    Get over it.

    Edit to add: Having kicked the hornet's nest, I'm going to sleep.
     
    Veni Vidi Vici and Tsomi like this.
  15. Randoms

    Randoms Forum Resident

    Location:
    UK
    As the saying goes, lossless is lossless.

dBpoweramp now has a setting beyond 1-8, which is uncompressed lossless FLAC, for the more paranoid. Funnily enough this decodes to audio identical to levels 1-8.

I use the default of 5. The difference in file size between 5 and 8 is extremely small (8 takes more processing power, and hence slightly more time to encode), but the file quality is identical - it is lossless.

    Apart from file size and identical quality, there are other advantages to using FLAC over WAV, especially when converting to a different file type.
     
    Last edited: Nov 11, 2017
    garymc likes this.
  16. coffeetime

    coffeetime Forum Resident

    Location:
    Lancs, UK
Assuming your iPod's CPU can cope with higher levels of compression, the trade-off would be a drop in battery life, as the 'cost' of decompressing more highly compressed lossless data is greater than the 'cost' of decompressing less compressed lossless data.

With iOS 11, iOS devices can now handle HEIF for images and HEVC (aka H.265) for video. Some devices that support iOS 11 have hardware decompression support (fast & battery efficient), the rest have software decompression support (slower, CPU intensive and therefore battery inefficient), and only newer devices have both hardware encode and decode support.

    So with greater data compression and therefore 'more music in the same space' on your iPod, you either need a newer device that supports the more compute intensive decompression in hardware or (maybe) you have your existing device but with a shorter battery life per charge.

As @Chris DeVoe says, the advantage gained in smaller file sizes isn't worth the trade-off of needing more compute power or dedicated decode hardware, and/or a drop in battery life. For what many manufacturers and customers deem best, the sweet spot of file size vs playback quality vs battery life vs device production cost has been reached.

    The problem of capacity given these file sizes is being more cost effectively solved by increasing device storage capacities (128GB iPods, 256GB phones, SD expansion on higher end dedicated portable digital audio players), than would be by supporting ever more compute demanding compression ratios.

    This is before we even get to streaming, either from a paid provider or from a 'digital locker', as a means of solving device capacity issues.

My own personal sweet spot for file size vs playback quality has been reached with 256 kbps AAC for iTunes purchases & Apple Music (including much classical, which is not forgiving of poor lossy compression), and 320 kbps AAC for my own CD rips. I find good production and mastering of the music in the first place goes much further than the choice of codec; the current, commonly sold and used codecs, when used correctly, are pretty transparent and already highly space efficient.
     
    Last edited: Nov 11, 2017
  17. Plan9

    Plan9 Mastering Engineer

    Location:
    Toulouse, France
    Where did that come from? :laugh:
    I was addressing a specific issue for another member, not discussing the merits of lossy vs lossless compression, which is beyond the intended scope of this thread, I think.
    Lossy has its uses, lossless has its uses. They're different and both valid. I use both on a daily basis.
    What's more, we have already more than enough of these discussions, like the "analog vs digital" ones.
     
    Last edited: Nov 11, 2017
  18. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Got it. But what I really want is a device the size of a playing card which holds the equivalent of 1 TB or more of music; more room or smaller files, I care little...

    ...but then again, I suppose my ideas are too "futuristic" and visionary for the suits to accept. The same people laughed at Edison when he said the world is round....
     
  19. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    OK, sorry, one more question about binary vs. lengthy alphabet before I give up. Can't the computer have a way of reading a file where it knows that "D2jA" = "11010001100010101000111011111101111000101000111010111110001?" So that the file could be small, but it could still communicate a variety of complex information to the computer/reader which then translates it into binary code for itself?
     
  20. coffeetime

    coffeetime Forum Resident

    Location:
    Lancs, UK
It’s not a matter of insufficient imagination. Any device is part of an ecosystem, which in turn needs support from everyone involved in the chain. So for highly compressed files and a device built primarily for storage and battery life above all else, the manufacturer needs to design and build the device. It will need software. It will need to interface with some sort of store and delivery method (download to device? Read from SD cards?). The labels themselves will need to support the format and be easily able to produce and submit files in that format to the stores. For all that to happen, there needs to be a proven market for it for everyone in the chain to go to all the expense and bother.

Neil Young’s Pono project attempted to do what you propose, albeit with playback quality as the prime concern. Those who bought and used the player spoke highly of the device itself, less so of the store, and the entire endeavour arguably never achieved the critical mass commercially to become a self-sustaining enterprise. Ultimately the iOS/iTunes combo, alongside Amazon Music, Google, Android phones and DAPs, not to mention Spotify, Pandora etc., collectively won out - with established codecs, playback components etc.

    MQA as a new ‘format’ hasn’t gained any traction despite its touted benefits - launching anything new at this point isn’t impossible but faces an uphill struggle against the incumbent formats and ecosystems. Pono struggled and that was built around existing, established codecs and hardware standards.

When many are happy with a relatively low-capacity device and either pay for streaming (Spotify Premium, AM, Tidal) or stream for free (Spotify free, YouTube), you've already little to no chance of selling to these people. A little over a decade ago, these were the people who were happy to carry on buying CDs rather than buying new hardware and media for SACD or DVD-A.

All that said, Astell & Kern make a very well-regarded small digital music player with expandable capacity and a better DAC than most phones and dedicated players. It might be worth looking to see if it supports some of the more highly compressed FLAC files possible. Might fit the bill of reducing file size whilst maintaining quality, thus increasing capacity?

The longer the pieces of discrete data you want to encode, the bigger the substitution dictionary you need.

    "D2jA" might well = "11010001100010101000111011111101111000101000111010111110001" But change one bit of data in and you need something other than D2jA to represent it. Change one bit more and you need another. Work out the list of 4 characters substitutions you need for all permutations of 1s and 0s in your binary ‘word’ and it starts getting pretty lengthy indeed. The list of ‘shortened substitutions’ gets to be so long that your compression ratio isn’t so great, not to mention you then need a massive look up table for the sheer number of substitutions possible.
     
    Grant, Mr. Explorer and Chris DeVoe like this.
  21. RomanZ

    RomanZ Well-Known Member

    Location:
    Minsk
  22. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Right, if you'll remember though that was my premise to begin with--a massively long code language, so that a huge file could be shrunk to a tiny one. That's the whole idea.

    I don't think I understand how that would negatively affect compression ratio though. I thought it would put all the burden on the device reading the file.

    But I hadn't considered your other point, that there isn't a market for it.

iPods that they now sell hold something like, I think, 16 GB. Whatever the market is for, it's out of step with me... so I believe you, and I guess the question is just whether this is technically possible - hypothetically affordable and doable in a possible world like this one in every respect, except that people there want really small files.
     
  23. Tsomi

    Tsomi Well-Known Member

    Location:
    Lille, France
    This might be of some help, for the curious ones:
    https://ese.wustl.edu/ContentFiles/...bPages/su10/AlexBenjamin_AudioCompression.pdf

FLAC doesn't have the most aggressive compression algorithm. It trades a bit of compression efficiency for a few things that matter in everyday usage: not too much CPU processing for decoding, the ability to seek anywhere in the file, streaming, constant speed, RAM usage on small devices, etc. The FLAC compression settings (which go from 0 to 8, from memory) let you trade a bit of resource usage for a bit more compression.

    And that's probably good enough for the vast majority of people, really ("it is optimized for decoding speed at the expense of encoding speed, because it makes it easier to decode on low-powered hardware, and because you only encode once but you decode many times").

    This doesn't mean that Google (or whoever else) will not invent a new format that compresses even better, though.
     
    Last edited: Nov 11, 2017
  24. Andreas

    Andreas Forum Resident

    Location:
    Frankfurt, Germany
    If the code had more different characters, the compressed file wouldn't be any smaller.

    Scenario 1: Binary code, 16 bits of information ==> compressed file is 16 bits long
    Scenario 2: Code with 16 different characters, same file as above ==> compressed file is only 4 characters long, but each character takes up 4 bits ==> compressed file is still 16 bits long
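The two scenarios can be checked in a few lines of Python (the specific bit pattern is just an arbitrary example):

```python
import math

# Andreas's scenarios, made concrete: the same 16 bits of information,
# written first in binary and then with a 16-character (hex) alphabet.
bits = "1101000110001010"                # Scenario 1: 16 binary digits
hex_form = format(int(bits, 2), "04x")   # Scenario 2: 4 hex characters

assert hex_form == "d18a"                # fewer characters on the page...
# ...but each of the 16 possible characters costs log2(16) = 4 bits to store:
assert len(hex_form) * math.log2(16) == len(bits)   # 4 * 4 = 16 bits either way
```

A richer alphabet shortens the written string but makes each symbol cost more bits, so the totals cancel exactly.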
     
    Grant and Mr. Explorer like this.
  25. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Why would each character take up more bits? I am taking it that the code is the information.
     
    Mr. Explorer likes this.