A Question About Compressing Files

Discussion in 'Audio Hardware' started by bzfgt, Nov 11, 2017.

  1. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Here is a completely naive question. I'm hesitant to ask because if someone gives me a technical answer there's a 100% chance I won't understand it, and because I'm not sure I'll be able to make myself understood.

Why can't lossless files be majorly compressed (i.e. made much smaller) if you used a code that had, so to speak, a zillion characters? That is, if there were one character for virtually any piece of information (not literally, but as an ideal). Common sense suggests the shorter the language, the larger the file, and vice versa. Computers, however, can compute really fast and well, so couldn't a music player be programmed to understand a really, really long code, and a lossless mp3-type file be made?

    In other words, my idea is that music files could be made really small if the available units of code were really, really numerous. Is it really not feasible to make a reasonably sized machine that could process all that?

    Again, sorry if my lingo is wrong or confusing.
     
    Kyhl likes this.
  2. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    I'm listening to Dave 17 now. I suspect that it plays fast. I'm going to switch to Dave 24.
     
    GerryO likes this.
  3. fatwad666

    fatwad666 Well-Known Member

    Location:
    Fort Worth, TX
     
    Galley and eric777 like this.
  4. InStepWithTheStars

    InStepWithTheStars Forum Resident

    I've always been more confused about how lossless files actually can have their size compressed. Doesn't that indicate some form of data loss?
     
  5. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Oops, wrong thread, sorry for that one!
     
    ianuaditis and Kiss73 like this.
  6. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
This is what I mean. Let's say there's a code (which will ultimately refer to electrical pulses or something on the physical level, I think) that the computer has to read, which goes "Horn squonk = 2aT3jjB1." Of course "horn squonk" is too vague to really be something, but again, I have to think of this in simplified terms, not knowing anything on a technical level.

If the code only contains the characters a, B, j, T, 2, 3, and 1, it will have to generate really long strings of characters to differentiate between all the different sounds it encodes--remember, it has to be really specific to get the sound exact, we are talking about a level where you can differentiate Charlie Parker's tone from Cannonball Adderley's, or tell when John Lennon has a cold--in short, what we call (I think) "fidelity." With only seven characters, the sheer number of novel combinations needed will require longer stretches of code for shorter stretches of music--i.e., the files will be large.

    Now imagine there are 3,000 to the 456,000,000th power characters in our code. One or two characters could be assigned to a millisecond of a particular kind of sound, or however it is done, and we still won't run out of novel combinations. In short, the files can be smaller.

    Isn't that basically how it works? So now we just need a machine with lots of computing power to recognize all these characters.
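As a sanity check on this idea, here is a minimal sketch (the numbers and the `storage_bits` helper are made up for illustration): each symbol of a K-letter alphabet itself costs about log2(K) bits to store, so a bigger alphabet makes the strings shorter, but the total bit count stays roughly the same:

```python
import math

# Hypothetical sketch: store one of `num_sounds` distinct "sounds" as a string
# over an alphabet of `alphabet_size` symbols. The string gets shorter as the
# alphabet grows, but each symbol costs more bits, so total bits barely move.
def storage_bits(num_sounds, alphabet_size):
    string_length = math.ceil(math.log(num_sounds, alphabet_size))
    bits_per_symbol = math.ceil(math.log2(alphabet_size))
    return string_length * bits_per_symbol

for k in (2, 7, 256, 65536):
    print(k, storage_bits(10**9, k))   # all land near 30 bits
```

Whatever the alphabet size, distinguishing a billion different sounds always takes about 30 bits; the "zillion characters" just get repackaged.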
     
  7. swedgin

    swedgin Forum Resident

    Nope, think of it like zipping and unzipping documents etc. No loss of data when you do that.
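The zip analogy can be tried directly with Python's standard zlib module; the round trip restores the data bit for bit (the byte string here is just a stand-in for audio samples):

```python
import zlib

# Repetitive data (like a steady tone or silence) compresses well.
original = b"squonk " * 1000
packed = zlib.compress(original)
restored = zlib.decompress(packed)

assert restored == original          # bit-for-bit identical: nothing was lost
print(len(original), len(packed))    # packed is far smaller than the original
```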
     
  8. eric777

    eric777 Forum Resident

    Location:
    Tennessee
I really wish I understood all this stuff.
     
  9. Cherrycherry

    Cherrycherry Forum Resident

    Location:
    USA
    My number line bows down to your number line.
    One zillion.:yikes:
     
    bzfgt likes this.
  10. screechmartin

    screechmartin Well-Known Member

    Location:
    British Columbia
    You either understand a lot more about digital data and compression than most of us, or a lot, lot less. I can't figure out which.
     
    Encore, drgn95, RBtl and 5 others like this.
  11. InStepWithTheStars

    InStepWithTheStars Forum Resident

    Don't claim to understand it but I'll believe you.

    Why then are there different quality levels for FLAC? When I export files in Audacity it offers eight levels of quality. Does Audacity just have a bad encoder?
     
  12. Chris DeVoe

    Chris DeVoe Forum Resident

Lossless compression can't usually do much better than about 2 to 1. Attempts to make a bigger "dictionary" of symbols just mean that you lose whatever you gained by coding the material in the first place: eventually you hit "maximum entropy," where a file is indistinguishable from random mathematical noise.

    I read one book about the subject twenty years ago, so I'm not the best person to ask.
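The "maximum entropy" point can be illustrated with zlib (used here as a stand-in for any lossless compressor): structured data shrinks a lot, while random bytes, which are already at maximum entropy, come out no smaller than they went in:

```python
import os
import zlib

structured = bytes(range(256)) * 64   # 16 KiB with an obvious repeating pattern
noise = os.urandom(256 * 64)          # 16 KiB indistinguishable from random noise

print(len(zlib.compress(structured)))  # far below 16384
print(len(zlib.compress(noise)))       # at or slightly above 16384
```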
     
    Kyhl, Vinyl Socks and coffeetime like this.
  13. Chris DeVoe

    Chris DeVoe Forum Resident

Quality of FLAC? There should be no quality difference. On the other hand, Audacity might be writing out files at different sample rates, but that just determines what type of WAV file it will be when un-FLAC'd.
     
  14. David G.

    David G. Forum Resident

    Location:
    Austin, TX
    Simple answer?

    Really long code is the same thing as really large files. It's that simple. The size of the data file is equal to the amount of code (data) in it. The more data, the more space it takes to store it.

    This is why a 16-bit recording takes up less disk space than a 24-bit recording. It's why a 3-minute song takes up less space than a 10-minute song.

    Until someone comes up with digital storage systems that use something other than binary data (ones and zeros), we'll never be able to losslessly compress data to a point much smaller than we have now. This is really the holy grail of data storage; a ternary data system (three possible positions for each bit of data instead of two) would shrink data to sizes that are only the tiniest fraction of what they are today. Whoever can come up with a cost-effective means of mass-producing this will change the world.
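For scale, the saving from a wider base can be counted directly with integer arithmetic (the `digits_needed` helper below is just for illustration); each ternary digit carries log2 3 ≈ 1.58 bits, so the digit counts work out like this:

```python
def digits_needed(n, base):
    """How many base-`base` digits it takes to write down n."""
    count = 0
    while n > 0:
        n //= base
        count += 1
    return count

N = 2**40 - 1                 # largest value a 40-bit binary word can hold
print(digits_needed(N, 2))    # 40 binary digits
print(digits_needed(N, 3))    # 26 ternary digits (a factor of log2(3), about 1.58)
```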
     
    JimmyCool, Grant, ianuaditis and 3 others like this.
  15. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Well, OK, what part of what I said doesn't match how you think it works?
     
  16. subtr

    subtr Forum Resident

Because higher compression (FLAC 8) needs more processing to achieve. If you haven't got a lot of processing power, you can encode at 5, or even 1, and reduce the strain on your processor at the expense of final file size. It's much less of a concern now than when FLAC started, as everyone's phone now has the power to decode FLAC 8, but back then, domestic computers would have been put under a bit more pressure when decoding on the fly.

There is no quality difference in the final file; it is just more or less compressed, taking more or less processing power to decode to a wave file for playback.
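The same trade-off is easy to see with zlib's compression levels, used here only as an analogy for FLAC's 0-8 (this is not FLAC's actual algorithm): every level decompresses to identical data; higher levels just spend more effort squeezing it:

```python
import zlib

data = b"la la la, horn squonk " * 2000

for level in (1, 5, 9):
    packed = zlib.compress(data, level)
    assert zlib.decompress(packed) == data   # identical output at every level
    print(level, len(packed))                # size and effort vary, quality doesn't
```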
     
    Grant, Tsomi, gregorya and 3 others like this.
  17. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
That's what I'm saying--a really long "alphabet" would make for much shorter code. The more characters available, the shorter each "statement" in code would have to be, and vice versa. The trade-off is you'd need a machine that could understand a whole lot of characters, but the files would be smaller. My question is why isn't something like this possible? Would you need a computer as big as a house to read files whose code uses tons and tons of characters? If you could get a small or simple enough machine to do it, then the file size could be majorly reduced.

    You may have answered this with the "ternary" bit but I'm not sure if it's the same question or not because I don't know what it implies.
     
  18. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Right, I get that. I guess my question is why the amount of processing power required is prohibitive for making FLAC files, say, one third the size. Thank you, I think you have understood my question.
     
    subtr likes this.
  19. Randoms

    Randoms Forum Resident

    Location:
    UK
    Excellent answer, and the truth a lot of people struggle with.
     
    subtr likes this.
  20. David G.

    David G. Forum Resident

    Location:
    Austin, TX
Except no such longer "alphabet" exists right now. Binary is all we have.
     
    RomanZ and bzfgt like this.
  21. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
This reply I don't understand. Unless the length of a chunk of code were de facto infinite, this shouldn't be the case (and I guess "de facto infinite" here would mean "too long for the computer to read," so we're back to the question of why you can't get a computer that understands a language with lots of characters).
     
  22. stetsonic

    stetsonic Well-Known Member

    Location:
    Kouvola, Finland
    IIRC, from level 3 upwards or so the savings in file size become negligible while encoding time goes steeply up. I haven't really bothered to think about it too much, I go with whatever the default is. In most software I use I think it's level 5. Like you said, nowadays the processing power is generally not an issue.

    Does the compression level affect the processing power needed for decoding though? Again from memory, I'd say it doesn't. I could be wrong though. :)
     
    subtr and bzfgt like this.
  23. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    I see. That is exactly the answer I think. So the problem is the alphabet is a simple binary code and you have to string it out really far.

    If that's the case, then it would be nearly impossible to do what I am saying without reinventing the computer.

    But--now I'm on shaky ground and might get an answer I don't understand--I wonder if a third machine could be added to the process, a kind of "translator"--it can read a more diverse code and output binary code for the computer? Or would that really be as complicated as inventing another computer? Crap.
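For what it's worth, every real codec already contains a "translator" of roughly this kind: an entropy coder assigns short binary codes to common patterns and long codes to rare ones, then everything is shipped as plain bits. A heavily simplified, Huffman-style toy sketch (the function name and data are invented for illustration):

```python
import heapq
from collections import Counter

def code_lengths(data):
    """Toy Huffman-style coder: returns the number of bits assigned to each
    byte value, giving common bytes short codes and rare bytes long ones."""
    heap = [(count, i, {sym: 0})
            for i, (sym, count) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tie = len(heap)                       # tie-breaker so dicts are never compared
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: bits + 1 for s, bits in {**left, **right}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = code_lengths(b"aaaaaaaabbbc")
print(lengths)   # frequent 'a' gets a 1-bit code; rarer 'b' and 'c' get 2 bits each
```

The catch, as noted above, is that no translator of this kind can squeeze a file below its entropy; it only removes redundancy that is actually there.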
     
    Mr. Explorer likes this.
  24. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Aren't there intermediary languages that can be unpacked into binary form by the computer? Or doesn't that question make sense? I can't imagine how a music file could contain every sound on a record rendered in binary code--that already seems too large to be practical--but maybe here is where the limitedness of my understanding is too great.
     
    Mr. Explorer likes this.
  25. InStepWithTheStars

    InStepWithTheStars Forum Resident

    Thanks. Having different levels on a lossless codec made me question the actual validity of its losslessness (if that's a word).

    So in order to save space on my horrible computers, I should be fine if I set each FLAC file at level 1? The processors can barely handle the operating system, let alone more than two programs at once. I've been saving everything as WAV. If FLAC will save a ton of space, then I'd better start doing that.
     
