
| Key: | CAA-17 |
| Type: | Improvement |
| Status: | Closed |
| Resolution: | Fixed |
| Priority: | Normal |
| Assignee: | Robert Kaye |
| Reporter: | Ian McEwen |
| Votes: | 3 |
| Watchers: | 1 |
|
|
|
|
The archive.org thumbnailer bot appears to be running with a low JPEG quality setting, and possibly some blurring as well. As a result, thumbnails end up with substantial JPEG artifacts even when the original images are high quality. This is especially bad for text and for vector-graphics-style blocks (generally, anything with sharp edges), as the examples below show. A lot of text ends up unreadable or close to it, especially in the 250px thumbnails.
I'm not sure what the archive is using, but whatever it is, they should raise the quality: less blurring, a higher JPEG quality setting, or, if they have a file size limit, a larger limit so their process can use higher-quality compression.
My own experimentation on the topic follows, with some examples at the bottom:
If they're using ImageMagick's mogrify, the option they need is likely '-quality': http://www.imagemagick.org/script/command-line-options.php?ImageMagick=t820u50n41e6sb4blf1phj31j1#quality. Their thumbnails don't look the same as the ones from my experimentation with mogrify, though, so I'd guess they aren't using it. Their artifact level seems to fall between my images at quality 25 and quality 50 (out of 100). mogrify's default is "whatever it can detect, or 92".
The resulting file sizes at quality 25, 50, 75, and the default are 8K, 8K, 12K, and 60K for the first example below (9.7M to start), and 32K, 36K, 44K, and 60K for the second example below (396K to start). The archive's files are around 10K, so I'm impressed with the small sizes they're getting, but the artifacts are quite bad, especially since we use thumbnails in most places where we display anything. They may also be applying some sort of blur first; the black text appears much more gray in their images.
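For anyone who wants to repeat a comparison like this, here is a rough sketch in Python. It is not the script used for the numbers above, and it says nothing about what the archive actually runs. The helper names, the `250x250` geometry, and the per-quality output directories are assumptions made for illustration; only mogrify's `-path`, `-thumbnail`, and `-quality` options are taken from ImageMagick itself.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def mogrify_thumbnail_cmd(src: str, out_dir: str,
                          quality: Optional[int] = None,
                          size: int = 250) -> List[str]:
    """Build a mogrify command that writes a thumbnail of `src` into `out_dir`.

    Hypothetical helper: quality=None leaves out -quality entirely, so
    mogrify falls back to its own default.
    """
    if quality is not None and not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    # -path makes mogrify write to out_dir instead of overwriting src.
    cmd = ["mogrify", "-path", out_dir, "-thumbnail", f"{size}x{size}"]
    if quality is not None:
        cmd += ["-quality", str(quality)]
    cmd.append(src)
    return cmd


def thumbnail_sizes(src: str, work_dir: str,
                    qualities: Sequence[Optional[int]] = (25, 50, 75, None)
                    ) -> Dict[Optional[int], int]:
    """Run mogrify once per quality setting and return output sizes in bytes.

    Requires ImageMagick's mogrify on the PATH.
    """
    sizes: Dict[Optional[int], int] = {}
    for q in qualities:
        out_dir = Path(work_dir) / ("default" if q is None else f"q{q}")
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(mogrify_thumbnail_cmd(src, str(out_dir), q), check=True)
        # With -path, the output keeps the source file name.
        sizes[q] = (out_dir / Path(src).name).stat().st_size
    return sizes
```

Calling `thumbnail_sizes("cover.jpg", "thumbs")` (with a hypothetical local `cover.jpg`) would leave one thumbnail per setting under `thumbs/`, so the artifacts can be compared by eye alongside the byte counts.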
Examples:
600dpi scan, with text: http://ia601204.s3dns.us.archive.org/mbid-18d9bb0c-2cba-47b3-b9ce-da770c6f0cc9/mbid-18d9bb0c-2cba-47b3-b9ce-da770c6f0cc9-834309549_thumb250.jpg
Original: http://ia601204.s3dns.us.archive.org/mbid-18d9bb0c-2cba-47b3-b9ce-da770c6f0cc9/mbid-18d9bb0c-2cba-47b3-b9ce-da770c6f0cc9-834309549.jpg
Vector graphics + rendered CG: http://ia601207.s3dns.us.archive.org/mbid-081ff32e-7f3a-4b0f-85a0-cf247dd54f5d/mbid-081ff32e-7f3a-4b0f-85a0-cf247dd54f5d-838290502_thumb250.jpg
Original: http://ia601207.s3dns.us.archive.org/mbid-081ff32e-7f3a-4b0f-85a0-cf247dd54f5d/mbid-081ff32e-7f3a-4b0f-85a0-cf247dd54f5d-838290502.jpg
|
|
|
This issue has been raised with IA. I also suggested that if they couldn't find time to work on this, one of us could work on it. We'll have to see what they say.