A Question About Compressing Files

Discussion in 'Audio Hardware' started by bzfgt, Nov 11, 2017.

  1. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Here is a completely naive question. I'm hesitant to ask because if someone gives me a technical answer there's a 100% chance I won't understand it, and because I'm not sure I'll be able to make myself understood.

Why can't lossless files be majorly compressed (i.e. made much smaller) if you used a code that had, so to speak, a zillion characters? That is, if there were one character for virtually any piece of information (not literally, but as an ideal). Common sense suggests the shorter the language, the larger the file, and vice versa. Computers, however, can compute really fast and well, so couldn't a music player be programmed to understand a really, really long code, and a lossless mp3-type file be made?

    In other words, my idea is that music files could be made really small if the available units of code were really, really numerous. Is it really not feasible to make a reasonably sized machine that could process all that?

    Again, sorry if my lingo is wrong or confusing.
     
    Kyhl likes this.
  2. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    I'm listening to Dave 17 now. I suspect that it plays fast. I'm going to switch to Dave 24.
     
    GerryO likes this.
  3. fatwad666

    fatwad666 Well-Known Member

    Location:
    Fort Worth, TX
     
    Galley and eric777 like this.
  4. InStepWithTheStars

    InStepWithTheStars Forum Resident

    I've always been more confused about how lossless files actually can have their size compressed. Doesn't that indicate some form of data loss?
     
  5. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Oops, wrong thread, sorry for that one!
     
    ianuaditis and Kiss73 like this.
  6. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
This is what I mean. Let's say there's a code (which will ultimately refer to electrical pulses or something on the physical level, I think) that the computer has to read, which goes "Horn squonk = 2aT3jjB1." Of course "horn squonk" is too vague to really be something, but again, I have to think of this in simplified terms, not knowing anything on a technical level.

If the code only contains the characters a, B, j, T, 2, 3, and 1, it will have to generate really long strings of characters to differentiate between all the different sounds it encodes--remember, it has to be really specific to get the sound exact, we are talking about a level where you can differentiate Charlie Parker's tone from Cannonball Adderley's, or tell when John Lennon has a cold--in short, what we call (I think) "fidelity." With only seven characters, the sheer number of novel combinations needed will require longer stretches of code for shorter stretches of music--i.e., the files will be large.

    Now imagine there are 3,000 to the 456,000,000th power characters in our code. One or two characters could be assigned to a millisecond of a particular kind of sound, or however it is done, and we still won't run out of novel combinations. In short, the files can be smaller.

    Isn't that basically how it works? So now we just need a machine with lots of computing power to recognize all these characters.
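As a sanity check on this idea, here is a minimal sketch (the numbers and the `storage_bits` helper are made up for illustration): each symbol of a K-letter alphabet itself costs about log2(K) bits to store, so a bigger alphabet makes the strings shorter, but the total bit count stays roughly the same:

```python
import math

# Hypothetical sketch: store one of `num_sounds` distinct "sounds" as a string
# over an alphabet of `alphabet_size` symbols. The string gets shorter as the
# alphabet grows, but each symbol costs more bits, so total bits barely move.
def storage_bits(num_sounds, alphabet_size):
    string_length = math.ceil(math.log(num_sounds, alphabet_size))
    bits_per_symbol = math.ceil(math.log2(alphabet_size))
    return string_length * bits_per_symbol

for k in (2, 7, 256, 65536):
    print(k, storage_bits(10**9, k))   # all land near 30 bits
```

Whatever the alphabet size, distinguishing a billion different sounds always takes about 30 bits; the "zillion characters" just get repackaged.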
     
  7. swedgin

    swedgin Forum Resident

    Nope, think of it like zipping and unzipping documents etc. No loss of data when you do that.
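The zip analogy can be tried directly with Python's standard zlib module; the round trip restores the data bit for bit (the byte string here is just a stand-in for audio samples):

```python
import zlib

# Repetitive data (like a steady tone or silence) compresses well.
original = b"squonk " * 1000
packed = zlib.compress(original)
restored = zlib.decompress(packed)

assert restored == original          # bit-for-bit identical: nothing was lost
print(len(original), len(packed))    # packed is far smaller than the original
```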
     
  8. eric777

    eric777 Forum Resident

    Location:
    Tennessee
I really wish I understood all this stuff.
     
  9. Cherrycherry

    Cherrycherry Forum Resident

    Location:
    USA
    My number line bows down to your number line.
    One zillion.:yikes:
     
    bzfgt likes this.
  10. screechmartin

    screechmartin Well-Known Member

    Location:
    British Columbia
    You either understand a lot more about digital data and compression than most of us, or a lot, lot less. I can't figure out which.
     
    Encore, drgn95, RBtl and 5 others like this.
  11. InStepWithTheStars

    InStepWithTheStars Forum Resident

    Don't claim to understand it but I'll believe you.

    Why then are there different quality levels for FLAC? When I export files in Audacity it offers eight levels of quality. Does Audacity just have a bad encoder?
     
  12. Chris DeVoe

    Chris DeVoe Forum Resident

Lossless compression can't usually do much better than about 2 to 1. Attempts to make a bigger "dictionary" of symbols just mean that you lose whatever you gained by coding the material in the first place: eventually you hit "maximum entropy," where a file is indistinguishable from random mathematical noise.

    I read one book about the subject twenty years ago, so I'm not the best person to ask.
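The "maximum entropy" point can be illustrated with zlib (used here as a stand-in for any lossless compressor): structured data shrinks a lot, while random bytes, which are already at maximum entropy, come out no smaller than they went in:

```python
import os
import zlib

structured = bytes(range(256)) * 64   # 16 KiB with an obvious repeating pattern
noise = os.urandom(256 * 64)          # 16 KiB indistinguishable from random noise

print(len(zlib.compress(structured)))  # far below 16384
print(len(zlib.compress(noise)))       # at or slightly above 16384
```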
     
    Kyhl, Vinyl Socks and coffeetime like this.
  13. Chris DeVoe

    Chris DeVoe Forum Resident

Quality of FLAC? There should be no quality difference. On the other hand, Audacity might be writing out files at different sample rates, but that just determines what type of WAV file it will be when un-FLAC'd.
     
  14. David G.

    David G. Forum Resident

    Location:
    Austin, TX
    Simple answer?

    Really long code is the same thing as really large files. It's that simple. The size of the data file is equal to the amount of code (data) in it. The more data, the more space it takes to store it.

    This is why a 16-bit recording takes up less disk space than a 24-bit recording. It's why a 3-minute song takes up less space than a 10-minute song.

    Until someone comes up with digital storage systems that use something other than binary data (ones and zeros), we'll never be able to losslessly compress data to a point much smaller than we have now. This is really the holy grail of data storage; a ternary data system (three possible positions for each bit of data instead of two) would shrink data to sizes that are only the tiniest fraction of what they are today. Whoever can come up with a cost-effective means of mass-producing this will change the world.
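For scale, the saving from a wider base can be counted directly with integer arithmetic (the `digits_needed` helper below is just for illustration); each ternary digit carries log2 3 ≈ 1.58 bits, so the digit counts work out like this:

```python
def digits_needed(n, base):
    """How many base-`base` digits it takes to write down n."""
    count = 0
    while n > 0:
        n //= base
        count += 1
    return count

N = 2**40 - 1                 # largest value a 40-bit binary word can hold
print(digits_needed(N, 2))    # 40 binary digits
print(digits_needed(N, 3))    # 26 ternary digits (a factor of log2(3), about 1.58)
```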
     
    JimmyCool, Grant, ianuaditis and 3 others like this.
  15. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Well, OK, what part of what I said doesn't match how you think it works?
     
  16. subtr

    subtr Forum Resident

Because higher compression (FLAC 8) needs more processing to achieve. If you haven't got a lot of processing power, you can encode at 5, or even 1, and reduce the strain on your processor at the expense of final file size. It's much less of a concern now than when FLAC started, as everyone's phone now has the power to decode FLAC 8, but back then, domestic computers would have been put under a bit more pressure when decoding on the fly.

There is no quality difference in the final file; it is just more or less compressed, taking more or less processing power to decode to a wave file for playback.
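The same trade-off is easy to see with zlib's compression levels, used here only as an analogy for FLAC's 0-8 (this is not FLAC's actual algorithm): every level decompresses to identical data; higher levels just spend more effort squeezing it:

```python
import zlib

data = b"la la la, horn squonk " * 2000

for level in (1, 5, 9):
    packed = zlib.compress(data, level)
    assert zlib.decompress(packed) == data   # identical output at every level
    print(level, len(packed))                # size and effort vary, quality doesn't
```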
     
    Grant, Tsomi, gregorya and 3 others like this.
  17. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
That's what I'm saying--a really long "alphabet" would make for much shorter code. The more characters available, the shorter each "statement" in code would have to be, and vice versa. The trade-off is you'd need a machine that could understand a whole lot of characters, but the files would be smaller. My question is why isn't something like this possible? Would you need a computer as big as a house to read files whose code uses tons and tons of characters? If you could get a small or simple enough machine to do it, then the file size could be majorly reduced.

    You may have answered this with the "ternary" bit but I'm not sure if it's the same question or not because I don't know what it implies.
     
  18. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Right, I get that. I guess my question is why the amount of processing power required is prohibitive for making FLAC files, say, one third the size. Thank you, I think you have understood my question.
     
    subtr likes this.
  19. Randoms

    Randoms Forum Resident

    Location:
    UK
    Excellent answer, and the truth a lot of people struggle with.
     
    subtr likes this.
  20. David G.

    David G. Forum Resident

    Location:
    Austin, TX
Except no such longer "alphabet" exists right now. Binary is all we have.
     
    RomanZ and bzfgt like this.
  21. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
This reply I don't understand. Unless the length of a chunk of code were de facto infinite, this shouldn't be the case (and I guess "de facto infinite" here would mean "too long for the computer to read," so we're back to the question of why you can't get a computer that understands a language with lots of characters).
     
  22. stetsonic

    stetsonic Well-Known Member

    Location:
    Kouvola, Finland
    IIRC, from level 3 upwards or so the savings in file size become negligible while encoding time goes steeply up. I haven't really bothered to think about it too much, I go with whatever the default is. In most software I use I think it's level 5. Like you said, nowadays the processing power is generally not an issue.

    Does the compression level affect the processing power needed for decoding though? Again from memory, I'd say it doesn't. I could be wrong though. :)
     
    subtr and bzfgt like this.
  23. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    I see. That is exactly the answer I think. So the problem is the alphabet is a simple binary code and you have to string it out really far.

    If that's the case, then it would be nearly impossible to do what I am saying without reinventing the computer.

    But--now I'm on shaky ground and might get an answer I don't understand--I wonder if a third machine could be added to the process, a kind of "translator"--it can read a more diverse code and output binary code for the computer? Or would that really be as complicated as inventing another computer? Crap.
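For what it's worth, every real codec already contains a "translator" of roughly this kind: an entropy coder assigns short binary codes to common patterns and long codes to rare ones, then everything is shipped as plain bits. A heavily simplified, Huffman-style toy sketch (the function name and data are invented for illustration):

```python
import heapq
from collections import Counter

def code_lengths(data):
    """Toy Huffman-style coder: returns the number of bits assigned to each
    byte value, giving common bytes short codes and rare bytes long ones."""
    heap = [(count, i, {sym: 0})
            for i, (sym, count) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tie = len(heap)                       # tie-breaker so dicts are never compared
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: bits + 1 for s, bits in {**left, **right}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = code_lengths(b"aaaaaaaabbbc")
print(lengths)   # frequent 'a' gets a 1-bit code; rarer 'b' and 'c' get 2 bits each
```

The catch, as noted above, is that no translator of this kind can squeeze a file below its entropy; it only removes redundancy that is actually there.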
     
    Mr. Explorer likes this.
  24. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Aren't there intermediary languages that can be unpacked into binary form by the computer? Or doesn't that question make sense? I can't imagine how a music file could contain every sound on a record rendered in binary code--that already seems too large to be practical--but maybe here is where the limitedness of my understanding is too great.
     
    Mr. Explorer likes this.
  25. InStepWithTheStars

    InStepWithTheStars Forum Resident

    Thanks. Having different levels on a lossless codec made me question the actual validity of its losslessness (if that's a word).

    So in order to save space on my horrible computers, I should be fine if I set each FLAC file at level 1? The processors can barely handle the operating system, let alone more than two programs at once. I've been saving everything as WAV. If FLAC will save a ton of space, then I'd better start doing that.
     
