A Question About Compressing Files

Discussion in 'Audio Hardware' started by bzfgt, Nov 11, 2017.

Thread Status:
Not open for further replies.
  1. InStepWithTheStars

    InStepWithTheStars It's a miracle, let it alter you

    Location:
    North Carolina
    8 is "best", 1 is "smallest". 5 is the default and/or recommended.
     
  2. Shaddam IV

    Shaddam IV Forum Resident

    Location:
    Ca
    Reading this, I'd like to make a point. When we say "lossless" audio, what we really mean is "the same as Compact Disc standard", or "2 channels of LPCM audio, each signed 16-bit values sampled at 44100 Hz".

    Compact Disc standard was itself an arbitrary standard for encoding sound, which is analog (continuous). So what we refer to as "lossless" is actually also "lossy" (compared to analog).

    Realizing this helped me to drop my audiophile pretensions and not get so hung up on lossy compression (where I honestly can't hear the difference between it and "lossless"). "Lossless" isn't some gold standard meaning "there is no loss of information (from what you hear in nature)". It's just an arbitrary (and lossy!) standard put in place at the dawn of the CD era. It's a bit of a misnomer in that regard.
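    To put a number on how "lossy" that arbitrary standard actually is, here is a small illustrative Python sketch (not anyone's production code) of the rounding a 16-bit quantizer performs on a continuous signal:

```python
import math

SAMPLE_RATE = 44100      # CD standard: 44,100 samples per second
FULL_SCALE = 32767       # signed 16-bit: integers from -32768 to 32767

def quantize(x):
    """Round a continuous value in [-1.0, 1.0] to the nearest 16-bit level."""
    return max(-32768, min(32767, round(x * FULL_SCALE)))

# Sample one millisecond of a 1 kHz sine wave and measure the rounding error.
worst_error = 0.0
for n in range(SAMPLE_RATE // 1000):
    t = n / SAMPLE_RATE
    analog = math.sin(2 * math.pi * 1000 * t)    # the "continuous" value
    stored = quantize(analog) / FULL_SCALE       # what the CD actually keeps
    worst_error = max(worst_error, abs(analog - stored))

print(f"worst quantization error: {worst_error:.2e}")
```

    The error is real but bounded by half of one 16-bit step, roughly 0.0015% of full scale. That is the kind of "loss" we are talking about at the digitization stage.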

    Or perhaps I'm wrong here!
     
    Last edited: Nov 11, 2017
    Balthazar, Tsomi and bzfgt like this.
  3. GerryO

    GerryO Senior Member

    Location:
    Bodega Bay, CA
    It would seem to me that digital and reduced file sizes have everything to do with expressing points along what were originally curved lines of a wave form as points on straight lines (curve fitting), increasing the distance between and reducing the number of points to make file sizes progressively smaller. And since Pi can only be estimated, something is lost right from the get-go.
     
  4. Chris DeVoe

    Chris DeVoe RIP Vickie Mapes Williams (aka Equipoise)

    I want to make a button to wear at the Consumer Electronics Show that says "Nyquist was right!". There is a place for greater bit depth and higher sampling rates, and that place is the recording studio where you have to manipulate audio. As it is, I doubt that I, or anyone else closing in on sixty, can hear much above 9 kHz, so neither of us would ever benefit from either in an end product. As long as idiots don't compress the dynamic range, 16/44.1 CD audio is a perfectly good standard.
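    A quick way to see what Nyquist promises (and forbids) is aliasing: sampled at 44.1 kHz, a tone above 22,050 Hz produces exactly the same samples as a tone folded back below the limit. An illustrative Python sketch (the frequencies are just examples):

```python
import math

FS = 44100  # CD sample rate; Nyquist limit is FS / 2 = 22,050 Hz

def sample_cosine(freq_hz, n_samples):
    """Return n_samples of a cosine at freq_hz, sampled at FS."""
    return [math.cos(2 * math.pi * freq_hz * n / FS) for n in range(n_samples)]

above_nyquist = sample_cosine(30000, 100)        # 30 kHz: above the limit
alias         = sample_cosine(FS - 30000, 100)   # folds down to 14,100 Hz

# Sample for sample, the two signals are indistinguishable once digitized:
identical = all(abs(a - b) < 1e-9 for a, b in zip(above_nyquist, alias))
print(identical)  # True
```

    Below the Nyquist limit, by contrast, the samples determine the bandlimited signal uniquely, which is why 44.1 kHz covers the audible band.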
     
    Veni Vidi Vici, Tsomi and Shaddam IV like this.
  5. Vinyl Socks

    Vinyl Socks The Buzz Driver

    Location:
    DuBois, PA
    1s and 0s.

    If you take 3/4 of a zillion 1s and 0s out of a 24/192 .flac file...then you have a skeleton...basically, a compressed mp3 or m4a or some other lossy, low-quality audio file.

    I'm speaking metaphorically, mind you. I'm not saying that by simply removing the ones and zeros it magically transforms into an mp3; I'm using a bit of imagery here.
     
  6. Carl Swanson

    Carl Swanson Senior Member

    What software? I'm using dBpoweramp, and 0 is "fastest" (which I take to mean least compressed), 5 is recommended, and 8 is "highest".

    Here is the verbatim from their website (their emphasis):

    "dBpoweramp FLAC Audio Codec

    Being a lossless codec there are not many options to set when compressing to FLAC: Compression affects how much effort goes into compressing the audio, all compression modes give the same decoded audio (it is lossless after all), the higher compression levels will give a small % file size saving, but will require more time to compress and decompress. Compression Level 0 requires the least compression time, whilst Compression Level 8 the most. Uncompressed is a special compression mode which stores 16 bit audio in an uncompressed state."
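    That time-versus-size trade-off with identical decoded output is easy to demonstrate with any general-purpose compressor. This Python sketch uses zlib's levels purely as an analogy for FLAC's 0-8 (a real FLAC encoder works differently inside, modeling the waveform rather than matching byte patterns):

```python
import zlib

# A compressible stand-in for audio data (analogy only, not real PCM).
data = b"the same decoded audio " * 2000

fast = zlib.compress(data, level=1)   # least effort, biggest output
best = zlib.compress(data, level=9)   # most effort, smallest output

# Every level decodes to the identical original bytes; that is what
# "lossless" means:
print(zlib.decompress(fast) == data)  # True
print(zlib.decompress(best) == data)  # True
print(len(best) <= len(fast))         # True: more effort, smaller file
```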
     
    Chris DeVoe likes this.
  7. InStepWithTheStars

    InStepWithTheStars It's a miracle, let it alter you

    Location:
    North Carolina
    That's probably the same as the one that comes stock with Audacity, then. I don't remember there being a zero level - although upon further inspection, indeed there is. And I did in fact have those backwards. So 8 is the smallest file size but takes the longest to load. 0 is basically WAV, maybe with a little bit of compression added. Thanks for clearing that up.
     
  8. I always think of "Ant Man" when it comes to compressing files. Don't let them be compressed too long or they will never be full size again! ;)
     
    bzfgt likes this.
  9. Shaddam IV

    Shaddam IV Forum Resident

    Location:
    Ca
    The thing is it's not an image, it's audio.

    And if you listen to that audio in lossless and then at 320 kbps, on the equipment of your choosing, guess which is which, and repeat this 10 times, then, if you're like everyone else, you're going to be embarrassed by the results if you think you can tell the difference.
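    For what it's worth, that "repeat this 10 times" test can be scored properly. A small Python sketch of the binomial arithmetic behind it (a standard calculation, not tied to any particular ABX tool):

```python
from math import comb

def p_by_chance(correct, trials=10):
    """Probability of getting at least `correct` right out of `trials`
    by coin-flip guessing (each trial is 50/50)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

# To claim you can hear a difference, you want this probability to be low:
print(f"7/10 right: p = {p_by_chance(7):.3f}")   # 0.172 -- not convincing
print(f"9/10 right: p = {p_by_chance(9):.3f}")   # 0.011 -- convincing
```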

    So the image of a skeleton is a little less than useless as a metaphor here. In fact, it's misleading. Anyone can immediately tell a skeleton from flesh and blood. Not so with audio.

    Or perhaps I've misunderstood what you're trying to say :)
     
    Last edited: Nov 11, 2017
    Balthazar and Chris DeVoe like this.
  10. Vinyl Socks

    Vinyl Socks The Buzz Driver

    Location:
    DuBois, PA
    Many forum members here can. Your argument is just as pointless as my metaphor.
    In fact...it's your opinion whether or not other people can hear the difference.
    For me, it's fact.

    Keep trying, my friend...you will get there someday! (P.S. Yeah...you kinda misunderstood what I was saying...but it's all good. It kinda has to be realized rather than taught.)
    :laugh::righton::agree:
     
    Grant likes this.
  11. Chris DeVoe

    Chris DeVoe RIP Vickie Mapes Williams (aka Equipoise)

    Exactly! I'm an audio engineer, not an audiophile. I know what MPEG artifacts from inadequate bit rates sound like, and as someone pointed out on an earlier thread, one gets the impression that a large number of people heard some bad 128k MP3 files from the Kazaa era and have decided that's what all compressed files sound like. But they ignore the fact that every single broadcast, cable, DVD, Blu-ray and streaming video program they have watched (with the exception of a handful of concert programs) has been compressed in the exact same way.
     
  12. Shaddam IV

    Shaddam IV Forum Resident

    Location:
    Ca
    All of my music is lossless, always has been. I've been using FLAC for... 15 years? But recently I looked into Google Play Music because they let you upload 50,000 songs for free, and I want easy online access. Slight caveat: the files you upload are converted to 320 kbps. I told people for years that I could tell an MP3 from lossless. As you say, I must have been basing this on 120 kbps iTunes files from years ago, because I tested and tested and the truth is I can't tell a 320 kbps file from a lossless file. I'm holding on to my lossless files of course, but I'll probably be accessing them from Google much of the time.
     
  13. JohnO

    JohnO Senior Member

    Location:
    Washington, DC
    You might find this page interesting reading, "Data Compression Explained". The copyright for this paper is held by Dell but it is freely distributable. Just read or scan through it, ignore anything you don't understand, and pick up the concepts you can understand. It discusses lossless and lossy compression types, text or binary data, audio, and still image and video.

    I just want to point out that generally lossless compression types use a 64K "dictionary", which is built on the fly from 0 (1-pass), or can be 2-pass or 3-pass or more to pre-build a 64k tree and start compressing with that optimized "dictionary tree". The dictionary tree can hold up to 65536 tokens, each representing multiple bits of data, and it is those tokens which are written out (in bits not bytes) to the lossless compressed file.

    But in most types of lossless compression, the 64K dictionary tree, when built on the fly, is discarded when it is full !!!!!! When one more token needs to be added (when it needs the 65537th entry in the dictionary tree), then the tree is rebuilt, again from scratch. That might seem counterintuitive, but this has been found to be the most efficient. The dictionary tree does not need to be discarded - you could just keep it filled "complete" at 64K tokens and you couldn't add more tokens while running through the rest of the file, but it has been found to be optimal to discard it - clear it - and start over, for most types of data. For certain types of data, it could be milliseconds faster to just keep it.
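    What's being described here is LZW-style dictionary coding (the family used in ZIP-type compressors; FLAC itself is built on prediction instead). A toy Python encoder, purely illustrative, showing the "discard the dictionary when full" behaviour; the names and parameters are made up for the sketch:

```python
def lzw_encode(data: bytes, max_entries: int = 65536):
    """Toy LZW: emit integer codes; when the dictionary table fills up,
    discard it and rebuild from scratch, as described above."""
    def fresh_dict():
        # Seed the table with all 256 single-byte strings.
        return {bytes([i]): i for i in range(256)}

    table = fresh_dict()
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate          # keep extending the match
        else:
            codes.append(table[current]) # emit the longest known match
            if len(table) >= max_entries:
                table = fresh_dict()     # full: discard and start over
            else:
                table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

codes = lzw_encode(b"TO BE OR NOT TO BE " * 50)
print(len(codes), "codes for", 19 * 50, "bytes")  # far fewer codes than bytes
```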

    Data Compression Explained

    Search through the file for the word "discard" - three relevant parts about this are here: (there are more instances of the word "discard" used in other ways on the page, look for these passages):

    "In the original DMC, both thresholds are 2. Normally when the state table is full, the model is discarded and re-initialized. Setting higher thresholds can delay this from happening."

    "Statistics are stored in a tree which grows during modeling. When the memory limit is reached, the tree is discarded and rebuilt from scratch. Optionally, the statistics associated with each context are scaled down and those with zero counts are pruned until the tree size becomes smaller than some threshold (25%). This improves compression but takes longer."

    "Dictionary codes grow in length as it becomes larger. When the size is 257 to 512, each code has 9 bits. When it is 513 to 1024, each code is 10 bits, and so on. When the dictionary is full (64K = 16 bits), it is discarded and re-initialized."

    ZIP and various other general-purpose lossless compression types use one or more of these dictionary strategies to obtain smaller files that are still lossless when decompressed. (FLAC itself actually gets its lossless compression a different way, by predicting the waveform and storing the small prediction errors, but the principle of a smaller, perfectly recoverable file is the same.)

    It's been found empirically and mathematically shown that a 64K dictionary tree/table is optimal in the vast majority of cases. You could have any size tree/table you want, (and the researchers have tried) 16k, 256k, 1024k, or that zillion. But those virtually always result in larger lossless compressed file sizes, not smaller, and longer processing time!

    There's so much more that goes into lossless compression, some that I understand and some that I don't. I just wanted to point out that all testing for 70 years of research about it has shown that a 64K dictionary tree/table has been found to be the most efficient. The incremental compression improvements have come in building that table in the most optimal way, choosing when to clear the table (it could be cleared before 64k is full if a read ahead predictor shows it will be better to clear it), pre and post processing, recompressing in whole or in part, etc. Etc. Etc. Etc. Etc. Etc.
     
  14. Kyhl

    Kyhl On break

    Location:
    Savage
    It could but at some point the key that explains to the computer what D2jA means would become bigger than the compressed data. Somewhere the computer needs to know the map that turns D2jA back into 11010001100010101000111011111101111000101000111010111110001.

    Compression works by recognizing when 11010001100010101000111011111101111000101000111010111110001 is a repeatable sequence that can be recorded as D2jA multiple times to save space. D2ja would have a different value to be mapped. Add up all that mapping plus the compressed code and at some point the key that maps the code to a real value becomes bigger than the shortcut. When that happens you reach maximum compression.
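    You can watch that break-even point with any real compressor: data full of repeats shrinks dramatically, while patternless data has nothing to map and actually grows by the size of the bookkeeping. An illustrative Python sketch using zlib:

```python
import os
import zlib

repetitive = b"11010001100010101000111011111101" * 100   # lots of repeats
random_ish = os.urandom(len(repetitive))                  # no patterns to map

comp_rep = zlib.compress(repetitive, 9)
comp_rnd = zlib.compress(random_ish, 9)

print(len(repetitive), "->", len(comp_rep))  # repeats shrink dramatically
print(len(random_ish), "->", len(comp_rnd))  # random data does not shrink;
                                             # the container overhead makes
                                             # it slightly bigger
```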

    The solution is that data storage is always getting larger and cheaper. So don't worry about it. Compress your .flacs to where you feel comfortable. By the time your HD crashes the replacement will be smaller, faster, and will store more data.
     
  15. Chris DeVoe

    Chris DeVoe RIP Vickie Mapes Williams (aka Equipoise)

    Like everyone but tech writers and a few folks grandfathered in, I pay for my phone data by the gigabyte. So I've decided to carry my music with me in my phone. I use a mix of 256k and 320k depending on the material. I've compressed my music myself using LAME. Given a choice between a lot less music using lossless, and a lot more music using lossy compression at high enough quality that I cannot hear a difference even doing A/B/X blind comparison, I'm going to go with the latter.

    Again, every single second of audio you hear from every broadcast, every streaming service, and virtually every physical video format available to you is compressed using the same techniques.

    I've mentioned this several times in this thread, and so far nobody who objects to compression has answered it. I really think we're dealing with a religious issue rather than a scientific one.
     
  16. bzfgt

    bzfgt The Grand High Exalted Mystic Ruler Thread Starter

    Yeah you're probably right, the solution is to get more storage.
     
    Shaddam IV likes this.
  17. bzfgt

    bzfgt The Grand High Exalted Mystic Ruler Thread Starter

    Has anyone objected to compression on this thread?
     
    Shaddam IV likes this.
  18. Chris DeVoe

    Chris DeVoe RIP Vickie Mapes Williams (aka Equipoise)

    I'd have to re-read the thread. You may be right. But there are folks on this forum who claim to always be able to tell, that any MPEG compression causes them to start bleeding out their ears.
     
    Shaddam IV and bzfgt like this.
  19. Shaddam IV

    Shaddam IV Forum Resident

    Location:
    Ca
    I think the guy who was ticked off at me did !
     
    bzfgt likes this.
  20. screechmartin

    screechmartin Senior Member

    Location:
    British Columbia
    Thank you for clarifying that. My other takeaway, though, is that many lossless audio formats like FLAC and ALAC do not actually have the digital data of a "Compact Disc Standard." They just have a damn sight more digital data than an MP3 file. So, to call them lossless even by the "Compact Disc Standard" is misleading. Is that correct?
     
    Shaddam IV likes this.
  21. Shaddam IV

    Shaddam IV Forum Resident

    Location:
    Ca
    A FLAC file converted from CD Standard has the same data as the CD Standard when decoded.
     
    Chris DeVoe likes this.
  22. Tsomi

    Tsomi Forum Resident

    Location:
    Lille, France
    FWIW: I personally have an xDuoo X3. The DAC could be better, but it has 2 micro-SD card slots (you can put at least 128 GB on each) and it can be switched to Rockbox pretty easily. I use Opus at 96k (supported by Rockbox), and I can't tell any difference on this particular DAC. Most of my albums weigh in at under 40 MB thanks to Opus, so just imagine how many songs I can fit on there.

    I do keep FLAC files on an external hard drive, mostly as a way to archive my CDs, and to make it easier to switch to new lossy formats in the future (or to re-EQ stuff or whatever). But 2017 AAC/Opus encoders aren't 1990s MP3 encoders. If it ends up on an iPod, I'm pretty sure you're not going to notice the difference, really. A/B test it.
     
    Chris DeVoe likes this.
  23. boiledbeans

    boiledbeans Forum Resident

    Location:
    UK
    In fact, I would argue that the FLAC could have a 'damn sight' more digital data than the WAVE. With FLAC, you can add tags with the artist and song title. ;)

    I understand your point, where the sound from a CD is an approximation of real life sounds.

    However, when talking about digital data, (from wikipedia) "Lossless compression is a class of data compression algorithms that allows the original data to be perfectly reconstructed from the compressed data."

    So in the context of CDs, lossless would mean the original data (CD) can be perfectly reconstructed from the FLAC.
    This means, you could rip a CD, compress to FLAC, and in future, decompress it and burn a bit-identical CD.
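    That round trip is easy to verify yourself by fingerprinting the data before and after. An illustrative Python sketch; zlib stands in for a FLAC encoder here, since any truly lossless codec gives the same guarantee:

```python
import hashlib
import zlib

# Stand-in for raw CD audio (real PCM would come from ripping the disc;
# zlib stands in for the FLAC encoder in this sketch).
pcm = bytes(range(256)) * 1000

fingerprint_before = hashlib.sha256(pcm).hexdigest()

archived = zlib.compress(pcm, 9)          # "rip to FLAC"
restored = zlib.decompress(archived)      # "decode it back to burn a CD"

fingerprint_after = hashlib.sha256(restored).hexdigest()
print(fingerprint_before == fingerprint_after)  # True: bit-identical
```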

    In fact, I usually do more than this. When burning a duplicate CD, I add CD-TEXT using CUE sheets & EAC, so you could say the duplicate is improved. :D
     
  24. Shaddam IV

    Shaddam IV Forum Resident

    Location:
    Ca
    The Wikipedia quotation you cite is exactly correct. However, it is not at odds with what I said, which you seem to be implying.

    "The original data (CD) can be perfectly reconstructed from the FLAC". Yes. True. What I'm saying is that the original CD data itself is a lossy approximation of an analog waveform.
     
    Last edited: Nov 12, 2017
  25. Veni Vidi Vici

    Veni Vidi Vici Forum Resident

    Location:
    Chicago, IL
    That’s not right. The original signal can be reconstructed *exactly* from the CD-DA samples, in theory, as long as its frequency content is below the Nyquist limit. And a digital-to-analogue converter outputs a continuous analogue signal.

    By contrast, the cartridge of a turntable cannot exactly reproduce the original sounds which led to the cutting of the groove. There’s some significant loss there also.

    In this regard, digital sampling has a theoretical advantage. However, in practice, every sound recording and reproduction system is “lossy” and always will be, because there is no perfection in nature. The components upstream and downstream, and the human ear itself cause some degradation of the signal as it undergoes various transformations before ultimately becoming that wonderful music you enjoy in your brain.
     