I ended my photo cleanup post by noting that I should review how I resize images and handle their Exif tags.
That process took over 30 hours to resize the 5,000+ photos on the site and still caused some issues. Over the last few weeks, I switched to another toolset for the resizing. It now takes less than 1 hour to resize the images, and so far I have not found any issues with the results.
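To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes exactly 5,000 photos; the site has a few more, which lowers both per-photo figures by the same factor but does not change the ratio between them. The helper name seconds_per_photo is purely illustrative.

```ruby
# Back-of-the-envelope cost per photo, assuming exactly 5,000 photos.
PHOTO_COUNT = 5_000

# Average wall-clock seconds spent on each photo for a full run.
def seconds_per_photo(hours, photos)
  hours * 3600.0 / photos
end

old_cost = seconds_per_photo(30, PHOTO_COUNT) # "over 30 hours" with the old wrappers
new_cost = seconds_per_photo(1, PHOTO_COUNT)  # "less than 1 hour" with libvips

# Lower bound on the speed-up: the old run took more than 30 h, the new one less than 1 h.
speed_up = (old_cost / new_cost).round

puts format("old: %.2f s/photo, new: %.2f s/photo, speed-up: at least %dx",
            old_cost, new_cost, speed_up)
# prints "old: 21.60 s/photo, new: 0.72 s/photo, speed-up: at least 30x"
```

So the switch took the average cost from roughly 20 seconds per photo to under a second.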
Until now, I used MiniMagick, MiniExiftool and MultiExiftool for the resizing and Exif processing. All three are Ruby wrappers around ImageMagick and ExifTool, which do the actual work.
ImageMagick and ExifTool are powerful tools that provide lots of functionality this website does not use. The wrappers do not add any functionality to these tools, and because they spawn a new process every time they need ImageMagick or ExifTool, performance suffers. Another downside of this approach was that I had to run all three wrappers on each photo in a particular order. Each of them would open the image file produced by the previous step, do its own little bit of magic, and then update the file or write a new one. This resulted in lots of IO and overhead, and overall the performance of this approach was poor.
I started to search for other ways of processing the photos and eventually found libvips.
According to the benchmarks and the speed and memory-use figures on the libvips wiki, it should be much faster than ImageMagick. libvips also uses libexif for Exif tag processing, so there is no need for multiple tools: this one does it all and should be much faster as well.
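As far as I can tell, libvips exposes each Exif tag that libexif reads as a separate metadata field named "exif-ifdN-TagName", next to an "exif-data" field that holds the raw Exif block. Those are the names the plugin code works with. The small sketch below splits such a name into its parts. The helper parse_exif_field and the IFD_NAMES labels are hypothetical, and the numbering is an assumption based on libexif's IFD order (0 = main image, 1 = thumbnail, 2 = Exif, 3 = GPS, 4 = interoperability), which matches names such as "exif-ifd2-DateTimeOriginal" and "exif-ifd3-GPSLatitude".

```ruby
# Hypothetical labels for the IFD numbers in libvips field names
# (assumed to follow libexif's IFD ordering).
IFD_NAMES = {
  0 => "IFD0 (main image)",
  1 => "IFD1 (thumbnail)",
  2 => "Exif",
  3 => "GPS",
  4 => "Interoperability"
}.freeze

ExifField = Struct.new(:ifd, :tag) do
  # Human-readable name of the IFD this tag lives in.
  def ifd_name
    IFD_NAMES.fetch(ifd, "unknown")
  end
end

# Splits "exif-ifd3-GPSLatitude" into IFD 3 and tag "GPSLatitude".
# Returns nil for fields that are not per-tag Exif fields, such as
# "exif-data" (the raw Exif block).
def parse_exif_field(name)
  match = /\Aexif-ifd(\d+)-(.+)\z/.match(name)
  return nil unless match

  ExifField.new(match[1].to_i, match[2])
end

%w[exif-ifd0-Artist exif-ifd2-DateTimeOriginal exif-ifd3-GPSLatitude exif-data].each do |name|
  field = parse_exif_field(name)
  puts field ? "#{name}: #{field.ifd_name} / #{field.tag}" : "#{name}: not a per-tag Exif field"
end
```

Knowing which IFD a tag belongs to makes it easier to read keep lists like the one in the plugin: everything under ifd3 is GPS data.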
The next step was to get it working. Fortunately, there is a nice Ruby gem for libvips: ruby-vips. There is plenty of documentation as well, although you will need to switch between the ruby-vips and libvips documentation every now and then to fully grasp how to get going.
I created an application to process some individual photos and was amazed by the speed increase. Based on that, I updated my plugin. Here’s a snippet of the code that does the actual work:
require "vips"
# By now, we know what the size of the image should be: @width and @height
buffer = File.binread(path)
vips_image = Vips::Image.thumbnail_buffer buffer, @width, height: @height, size: 'down', linear: false
# Clean up the metadata/tags.
#
# We have a list of tags that we do not want to remove.
# Start by getting all current tags, then build a list of tags that we don't need anymore.
fields = vips_image.get_fields
to_remove = fields.difference([
  "exif-data",
  "exif-ifd0-Artist",
  "exif-ifd0-DateTime",
  "exif-ifd2-DateTimeDigitized",
  "exif-ifd2-DateTimeOriginal",
  "exif-ifd2-TimeZoneOffset",
  "exif-ifd3-GPSAltitude",
  "exif-ifd3-GPSAltitudeRef",
  "exif-ifd3-GPSDateStamp",
  "exif-ifd3-GPSLatitude",
  "exif-ifd3-GPSLatitudeRef",
  "exif-ifd3-GPSLongitude",
  "exif-ifd3-GPSLongitudeRef",
  "exif-ifd3-GPSMapDatum",
  "exif-ifd3-GPSSatellites",
  "exif-ifd3-GPSSpeed",
  "exif-ifd3-GPSSpeedRef",
  "exif-ifd3-GPSTimeStamp"
])
# Remove all tags that we don't need.
to_remove.each do |field_name|
  vips_image.remove field_name
end
# Add a copyright and artist Exif tag to the image.
#
# We already have @owner, which holds the name of the person who took the picture,
# but we need to find out when the picture was taken.
#
# There are several Exif tags that we can use for that.
# Once we have found one, we can construct the actual copyright and artist tags
# and add them to the image.
# GPSDateStamp holds the date ("YYYY:MM:DD"); GPSTimeStamp only holds the
# UTC time of day, so it cannot provide a year.
date_fields = [
  "exif-ifd2-DateTimeOriginal",
  "exif-ifd2-DateTimeDigitized",
  "exif-ifd3-GPSDateStamp",
  "exif-ifd0-DateTime"
]
date_fields.each do |field_name|
  # get_typeof returns 0 when the field does not exist on the image
  next unless vips_image.get_typeof(field_name) > 0
  # Exif dates start with "YYYY:MM:DD", so the first four characters are the year
  year = vips_image.get(field_name)[0, 4]
  copyright = "Copyright #{year} by #{@owner}"
  vips_image.set_type Vips::REFSTR_TYPE, "exif-ifd0-Artist", @owner
  vips_image.set_type Vips::REFSTR_TYPE, "exif-ifd0-Copyright", copyright
  break
end
# Write the final result to the destination path.
# strip: false keeps the remaining metadata; Q: 95 sets the save quality (e.g. for JPEG).
vips_image.write_to_file dest_path, strip: false, Q: 95
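The tag-handling logic in the snippet does not depend on libvips itself, so it can be tried out without any images. The sketch below models the image metadata as a plain Hash of field name to value string, an assumption standing in for a real Vips::Image. The helpers fields_to_remove and copyright_tags are hypothetical, the keep list is shortened, and "Jane Doe" is a made-up owner. It uses GPSDateStamp as the GPS fallback because, per the Exif specification, GPSTimeStamp only holds the UTC time of day.

```ruby
# A minimal sketch: image metadata is modelled as a plain Hash
# (field name => value string) instead of a real Vips::Image.
# fields_to_remove and copyright_tags are hypothetical helpers, not ruby-vips API.

# Shortened keep list (the plugin keeps more fields).
KEEP = %w[
  exif-data
  exif-ifd0-Artist
  exif-ifd0-DateTime
  exif-ifd2-DateTimeDigitized
  exif-ifd2-DateTimeOriginal
  exif-ifd3-GPSDateStamp
].freeze

# Fields to try, in order, when looking for the date the photo was taken.
DATE_FIELDS = %w[
  exif-ifd2-DateTimeOriginal
  exif-ifd2-DateTimeDigitized
  exif-ifd3-GPSDateStamp
  exif-ifd0-DateTime
].freeze

# Every field that is not on the keep list (same idea as get_fields.difference(...)).
def fields_to_remove(fields, keep = KEEP)
  fields.difference(keep)
end

# Artist and Copyright values based on the first date field present.
# Exif dates start with "YYYY:MM:DD", so the first four characters are the year.
def copyright_tags(metadata, owner)
  date_field = DATE_FIELDS.find { |name| metadata.key?(name) }
  return {} unless date_field

  year = metadata[date_field][0, 4]
  {
    "exif-ifd0-Artist" => owner,
    "exif-ifd0-Copyright" => "Copyright #{year} by #{owner}"
  }
end

# Example metadata; the values mimic the strings libvips reports (assumption).
metadata = {
  "exif-data" => "(raw Exif block)",
  "exif-ifd0-Model" => "Example camera",
  "exif-ifd2-DateTimeDigitized" => "2019:05:12 10:20:30 (2019:05:12 10:20:30, ASCII, 20 components, 20 bytes)",
  "exif-ifd0-DateTime" => "2020:01:01 09:00:00 (2020:01:01 09:00:00, ASCII, 20 components, 20 bytes)"
}

p fields_to_remove(metadata.keys)                             # => ["exif-ifd0-Model"]
p copyright_tags(metadata, "Jane Doe")["exif-ifd0-Copyright"] # => "Copyright 2019 by Jane Doe"
```

Because DateTimeOriginal is missing in this example, the year comes from DateTimeDigitized, which is checked before the less reliable ifd0 DateTime. Keeping this logic in plain functions also makes the keep list and the fallback order easy to unit-test separately from the image processing.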
Then it was time to go for it: resize all photos. It took less than 1 hour to do all of them. At first I did not trust what I saw; this was much too quick. After a lot of checking, everything turned out to be fine.
So, here we are. All photos have been resized and all Exif tags have been cleaned up.
Happy with yet another improvement. Time to move on to the next item on the to-do list.