A Question About Compressing Files

Discussion in 'Audio Hardware' started by bzfgt, Nov 11, 2017.

  1. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    So I mean there are programming languages where people don't have to actually program in binary code, right? And these have more characters, whatever other differences there may be, than a binary code. So you can communicate with a computer with a richer alphabet than (1,0), so a music file should be able to, also. Isn't that right?
     
  2. Chris DeVoe

    Chris DeVoe Forum Resident

    Unless things have changed significantly since the last time I looked into it, lossless compression means getting out exactly the same bits you put in. Checking the Wikipedia entry for FLAC, it is still limited to "40% to 50%" of the size of the original uncompressed file, which is another way of saying 2:1 or slightly better.

    This is for audio. Different types of images can be more compressible, being better suited to techniques like Run Length Encoding.
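For the curious, here is a minimal sketch of the Run Length Encoding technique mentioned above (plain Python, purely illustrative, not any real codec). It shows why flat image regions compress well while ever-varying audio samples gain little:

```python
# Run Length Encoding sketch: runs of identical values collapse to
# (value, count) pairs. Flat image regions have long runs; raw audio
# samples almost never repeat exactly, so RLE barely helps there.
def rle_encode(data):
    out = []
    for value in data:
        if out and out[-1][0] == value:
            out[-1][1] += 1
        else:
            out.append([value, 1])
    return out

def rle_decode(pairs):
    return [value for value, count in pairs for _ in range(count)]

flat_region = [255] * 8 + [0] * 4          # a white strip, then a black strip
encoded = rle_encode(flat_region)
assert encoded == [[255, 8], [0, 4]]       # 12 values stored as 2 pairs
assert rle_decode(encoded) == flat_region  # lossless: exact values back
```

Note the second assert: decoding reproduces the input exactly, which is what "lossless" means in the post above.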
     
  3. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Yes my question isn't about the current limit, but about why this cannot be surpassed.
     
  4. David G.

    David G. Forum Resident

    Location:
    Austin, TX
    You're confusing "language" and "data." Those are two completely separate and unrelated concepts. Computers can currently only understand ones and zeros (data). There are plenty of computer "languages" (ways of programming) but the data storage required for each one is about the same.
     
    Grant and Plan9 like this.
  5. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Crap, OK. So my suggestion is impracticable unless computers are reinvented, basically?
     
    Grant and Mr. Explorer like this.
  6. Chris DeVoe

    Chris DeVoe Forum Resident

    Math.
     
    Grant likes this.
  7. Chris DeVoe

    Chris DeVoe Forum Resident

Not even then. It wouldn't matter at all if the symbols were stored in a ternary format; the amount of information would remain the same.

    I'm trying to remember the name of the book I read, but it was by the director of Bell Labs.

Myself, I don't care about lossless versus lossy; I only care about the quality of the lossy compression algorithm. That bridge was crossed a long time ago. Nobody but professional photographers deals with uncompressed still images, and nobody but the cinematographer sees uncompressed movie images. Only in audio do people seem to care, mostly because the relatively small file sizes make it possible to care. Nobody is going to download a six-terabyte uncompressed copy of a film.
     
    Last edited: Nov 11, 2017
  8. Jesus Jeronimo

    Jesus Jeronimo Member

    Location:
    Madrid
    This is a most entertaining thread!
Probably the answer to your question is very close to what you are proposing, since current compression algorithms already use what you call a dictionary in order to compress. That is, they analyse all the data in a file, look for patterns, and turn those patterns into little "words" that take up less space.

But the limit here relates to binary codes and math: as of today, there is only so much compression you can apply. New algorithms are being developed as we speak, but it's also true that since bandwidth and storage costs have been steadily improving over the last 20 years, I don't think we've had the need to really push the envelope on this subject.
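The "dictionary" idea can be made concrete with Python's standard zlib module (a DEFLATE/LZ77-style dictionary coder, used here purely as an illustration, not an audio codec): repetitive data shrinks dramatically, while random-looking data doesn't shrink at all.

```python
import os
import zlib

# A repetitive "file": the LZ-style dictionary finds the pattern and
# replaces each repeat with a short back-reference.
patterned = b"drumloop" * 1000                  # 8000 bytes of pure repetition
compressed = zlib.compress(patterned, level=9)
assert len(compressed) < len(patterned) // 10   # huge win on patterned data

# Random-looking data: no patterns for the dictionary to exploit, so the
# output is no smaller (usually slightly larger, from format overhead).
noise = os.urandom(8000)
assert len(zlib.compress(noise, level=9)) >= len(noise)
```

This is the limit the post describes: a dictionary coder can only remove redundancy that is actually present in the data.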

    Please, people post more about this. Very interesting.

    J
     
    Mr. Explorer and bzfgt like this.
  9. qwerty

    qwerty Forum Resident

And remember that mp3 makes the audio file smaller by discarding audio information.

Given the size of the audio industry and the number of super-intelligent boffins out there, I'm sure that if there were a way to get lossless compression down to the level of mp3 compression, we would know about it. And I'm also sure that there are lots of people out there thinking about this problem. I think we are stuck with the computing systems we have. Maybe when we move to biological computing, or some of the nano-technology stuff gets implemented into standard computer hardware (I'm a bit out of touch with these developments now), we will be able to get better compression. But by that stage we probably won't need it either, as huge storage space will be cheap and plentiful.
     
    Mr. Explorer likes this.
  10. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
I care because my iPod gets full.
     
  11. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    So basically what you're saying is, no one is going to read this and say "Eureka! You've solved the problem! It was exactly an untutored eye that we needed to truly see the solution! I am off to work up a prototype, and if you partner with me we will both shortly be millionaires!"

    I mean, that's what I was hoping to read...
     
    qwerty likes this.
  12. David G.

    David G. Forum Resident

    Location:
    Austin, TX
Yes. The entire world is on a quest for quantum computing. Quantum computers don't use plain ones and zeros; each data "switch" is a qubit, which can be not just 1 or 0 but a superposition of 1 and 0 at the same time. In principle this could dramatically increase processing power for certain problems, and some hope it would also mean that tremendous amounts of data could be stored in hard drives that we would consider tiny.

    Think of it this way: it would no longer be about "compressing" data. It would be about having the room to store tens of thousands of times the amount of data on a drive 1/100 the size of the drives we have now. You no longer need to compress data for storage when storage space is no longer an issue.
     
  13. Plan9

    Plan9 Mastering Engineer

    Location:
    Toulouse, France
    FLAC is absolutely lossless, I assure you. :)
Try encoding one track (preferably the biggest and highest-resolution you have: 24-bit/something) at FLAC level 7 or 8 and see if your computer can play it without stuttering. If it stutters, reduce the encode to level 5 or so and test again until your computer can process it without problems.
    Indeed if you do this you will save around 40% of the space your WAV files used to take.
     
    bzfgt likes this.
  14. Chris DeVoe

    Chris DeVoe Forum Resident

    Yes, but the difference between 50% and 40% is not worth talking about when compared to the size difference with lossy techniques at high bitrates like 256k.

    Rip your own music and experiment with different compression rates on different material. Listen for compression artifacts on crash cymbals and applause. Use an A/B/X comparator. Don't blindly accept the orthodoxy that the only good quality playback sources are uncompressed.

Again, every single second of HD video you've ever seen has been compressed - seriously, unless you're a broadcast camera operator or cinematographer, you have never seen uncompressed HD video. And audio is vastly simpler to compress than video.

    Get over it.

    Edit to add: Having kicked the hornet's nest, I'm going to sleep.
     
    Veni Vidi Vici and Tsomi like this.
  15. Randoms

    Randoms Forum Resident

    Location:
    UK
    As the saying goes, lossless is lossless.

dBpoweramp now has a setting beyond 1-8, which is uncompressed lossless FLAC, for the more paranoid. Funnily enough this decodes to audio identical to levels 1-8.

I use the default of 5. The difference in file size between 5 and 8 is extremely small (8 takes more processing power, and hence slightly more time to encode), but the file quality is identical - it is lossless.

    Apart from file size and identical quality, there are other advantages to using FLAC over WAV, especially when converting to a different file type.
     
    Last edited: Nov 11, 2017
    garymc likes this.
  16. coffeetime

    coffeetime Forum Resident

    Location:
    Lancs, UK
Assuming your iPod's CPU can cope with higher levels of compression, the trade-off would be a drop in battery life, as the 'cost' of decompressing more highly compressed lossless data is greater than the 'cost' of decompressing less compressed lossless data.

With iOS 11, iOS devices can now handle HEIF for images and HEVC (aka H.265) for video. Some devices that support iOS 11 have hardware decompression support (fast & battery efficient), the rest have software decompression support (slower, CPU intensive and therefore battery inefficient), and only newer devices have both hardware encode and decode support.

    So with greater data compression and therefore 'more music in the same space' on your iPod, you either need a newer device that supports the more compute intensive decompression in hardware or (maybe) you have your existing device but with a shorter battery life per charge.

As @Chris DeVoe says, the advantage gained in smaller file sizes isn't worth the trade-off of needing more compute power or dedicated decode hardware, and/or a drop in battery life. For what many manufacturers and customers deem best, the sweet spot of file size vs playback quality vs battery life vs device production cost has been reached.

    The problem of capacity given these file sizes is being more cost effectively solved by increasing device storage capacities (128GB iPods, 256GB phones, SD expansion on higher end dedicated portable digital audio players), than would be by supporting ever more compute demanding compression ratios.

    This is before we even get to streaming, either from a paid provider or from a 'digital locker', as a means of solving device capacity issues.

My own personal sweet spot for file size vs playback quality has been reached with 256 kbps AAC for iTunes purchases & Apple Music (including much classical, which is not forgiving of poor lossy compression), and 320 kbps AAC for my own CD rips. I find good production and mastering of the music in the first place goes much further than the choice of codec; the current, commonly sold and used codecs, when used correctly, are pretty transparent and already highly space efficient.
     
    Last edited: Nov 11, 2017
  17. Plan9

    Plan9 Mastering Engineer

    Location:
    Toulouse, France
    Where did that come from? :laugh:
    I was addressing a specific issue for another member, not discussing the merits of lossy vs lossless compression, which is beyond the intended scope of this thread, I think.
    Lossy has its uses, lossless has its uses. They're different and both valid. I use both on a daily basis.
    What's more, we have already more than enough of these discussions, like the "analog vs digital" ones.
     
    Last edited: Nov 11, 2017
  18. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
Got it. But what I really want is a device the size of a playing card which holds the equivalent of 1 TB or more of music; more room or smaller files, I care little...

    ...but then again, I suppose my ideas are too "futuristic" and visionary for the suits to accept. The same people laughed at Edison when he said the world is round....
     
  19. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    OK, sorry, one more question about binary vs. lengthy alphabet before I give up. Can't the computer have a way of reading a file where it knows that "D2jA" = "11010001100010101000111011111101111000101000111010111110001?" So that the file could be small, but it could still communicate a variety of complex information to the computer/reader which then translates it into binary code for itself?
     
  20. coffeetime

    coffeetime Forum Resident

    Location:
    Lancs, UK
It’s not a matter of insufficient imagination. Any device is part of an ecosystem, which in turn needs support from everyone involved in the chain. So for highly compressed files and a device built primarily for storage and battery life above all else, the manufacturer needs to design and build the device. It will need software. It will need to interface with some sort of store and delivery method (download to device? Read from SD cards?). The labels themselves will need to support the format and be easily able to produce and submit files in that format to the stores. For all that to happen, there needs to be a proven market for it for everyone in the chain to go to all the expense and bother.

Neil Young’s Pono project attempted to do what you propose, albeit with playback quality as the prime concern. Those who bought and used the player spoke highly of the device itself, less so of the store, and the entire endeavour arguably never achieved the critical mass commercially to become a self-sustaining enterprise. Ultimately the iOS/iTunes combo, alongside Amazon Music, Google, Android phones and DAPs, not to mention Spotify, Pandora etc., collectively won out - with established codecs, playback components etc.

    MQA as a new ‘format’ hasn’t gained any traction despite its touted benefits - launching anything new at this point isn’t impossible but faces an uphill struggle against the incumbent formats and ecosystems. Pono struggled and that was built around existing, established codecs and hardware standards.

When many are happy with a relatively low-capacity device and either pay for streaming (Spotify Premium, AM, Tidal) or stream for free (Spotify free, YouTube), you've already little to no chance of selling to these people. A little over a decade ago, these were the people who were happy to carry on buying CDs rather than buying new hardware and media for SACD or DVD-A.

All that said, Astell & Kern make a very well-regarded small digital music player with expandable capacity and a better DAC than most phones and dedicated players. It might be worth looking to see if it supports some of the more highly compressed FLAC files possible. Might fit the bill of reducing file size whilst maintaining quality, thus increasing capacity?

The longer the pieces of discrete data you want to encode, the bigger the substitution dictionary you need.

    "D2jA" might well = "11010001100010101000111011111101111000101000111010111110001" But change one bit of data in and you need something other than D2jA to represent it. Change one bit more and you need another. Work out the list of 4 characters substitutions you need for all permutations of 1s and 0s in your binary ‘word’ and it starts getting pretty lengthy indeed. The list of ‘shortened substitutions’ gets to be so long that your compression ratio isn’t so great, not to mention you then need a massive look up table for the sheer number of substitutions possible.
     
    Grant, Mr. Explorer and Chris DeVoe like this.
  21. RomanZ

    RomanZ Well-Known Member

    Location:
    Minsk
  22. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Right, if you'll remember though that was my premise to begin with--a massively long code language, so that a huge file could be shrunk to a tiny one. That's the whole idea.

    I don't think I understand how that would negatively affect compression ratio though. I thought it would put all the burden on the device reading the file.

    But I hadn't considered your other point, that there isn't a market for it.

iPods that they now sell hold something like, I think, 16 GB. Whatever the market is for, it's out of step with me... so I believe you, and I guess the question is just whether this is technically possible - hypothetically affordable and doable in a possible world like this one in every respect, except that people there want really small files.
     
  23. Tsomi

    Tsomi Well-Known Member

    Location:
    Lille, France
    This might be of some help, for the curious ones:
    https://ese.wustl.edu/ContentFiles/...bPages/su10/AlexBenjamin_AudioCompression.pdf

FLAC doesn't have the most aggressive compression algorithm. It trades a bit of compression efficiency for a few things that matter in everyday usage: not too much CPU processing for decoding, the ability to seek anywhere in the file, streaming, constant speed, RAM usage on small devices, etc. The FLAC compression settings (which go from 0 to 8, from memory) let you trade a bit of resource usage for a bit more compression.

    And that's probably good enough for the vast majority of people, really ("it is optimized for decoding speed at the expense of encoding speed, because it makes it easier to decode on low-powered hardware, and because you only encode once but you decode many times").

    This doesn't mean that Google (or whoever else) will not invent a new format that compresses even better, though.
     
    Last edited: Nov 11, 2017
  24. Andreas

    Andreas Forum Resident

    Location:
    Frankfurt, Germany
    If the code had more different characters, the compressed file wouldn't be any smaller.

    Scenario 1: Binary code, 16 bits of information ==> compressed file is 16 bits long
    Scenario 2: Code with 16 different characters, same file as above ==> compressed file is only 4 characters long, but each character takes up 4 bits ==> compressed file is still 16 bits long
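The two scenarios can be checked in a few lines of Python (the specific bit pattern is just an arbitrary example):

```python
import math

# Andreas's scenarios, made concrete: the same 16 bits of information,
# written first in binary and then with a 16-character (hex) alphabet.
bits = "1101000110001010"                # Scenario 1: 16 binary digits
hex_form = format(int(bits, 2), "04x")   # Scenario 2: 4 hex characters

assert hex_form == "d18a"                # fewer characters on the page...
# ...but each of the 16 possible characters costs log2(16) = 4 bits to store:
assert len(hex_form) * math.log2(16) == len(bits)   # 4 * 4 = 16 bits either way
```

A richer alphabet shortens the written string but makes each symbol cost more bits, so the totals cancel exactly.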
     
    Grant and Mr. Explorer like this.
  25. bzfgt

    bzfgt Forum Resident Thread Starter

    Location:
    New Jersey
    Why would each character take up more bits? I am taking it that the code is the information.
     
    Mr. Explorer likes this.