Batch File Renaming from CSV and Image Resizing

Renaming the files

A colleague in a different section (Software Solutions) of IT Services asked if I could help rename and process some images provided by the Bodleian Library. The library had supplied huge TIFF files of manuscripts of Wycliffite Bibles, and he had a spreadsheet in which column A was a filename (without extension) and column B was the image ID in the website he is building for the project.

That is, it looked something like:

Image File Name    Image ID Number
abc0014            123

with many, many more entries, of course!

The script I wrote to rename the files was:


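# Set up variables (the original values were not preserved; these are placeholders)
CSV_FILE="filenames.csv"
INPUT_DIR="input"
OUTPUT_DIR="output"
PROCESSED_DIR="processed"
EXT=".tif"

# separate on newlines only
IFS=$'\n'

# Loop all lines in CSV
CSV_LINES=($(cat "$CSV_FILE"))
for CSV_LINE in "${CSV_LINES[@]}"
do
    # first and last double-quoted fields on the line (columns A and B)
    OLD_NAME=$(echo "$CSV_LINE" | grep -o '^"[^"][^"]*"' | sed 's/"//g')$EXT
    NEW_NAME=$(echo "$CSV_LINE" | grep -o '"[^"][^"]*"$' | sed 's/"//g')

    # output message if $OLD_NAME doesn't exist
    if ! [ -f "$INPUT_DIR/$OLD_NAME" ]; then
        echo "No $OLD_NAME"
    fi

    # continue only if old file actually exists
    if [ -f "$INPUT_DIR/$OLD_NAME" ]; then
        # continue only if new filename given ($EXT is added at copy time so an empty column B is detected)
        if [ -n "$NEW_NAME" ]; then
            # copy file to new name (reconstructed from the prose description)
            cp "$INPUT_DIR/$OLD_NAME" "$OUTPUT_DIR/$NEW_NAME$EXT"
            # move the input file out of the way to note that it has been processed
            mv "$INPUT_DIR/$OLD_NAME" "$PROCESSED_DIR/$OLD_NAME"
        fi
    fi
done

# restore the default field separators
IFS=$' \t\n'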

The script went through the CSV and, wherever a row had both an old name (column A) and a new name (column B), copied the matching file into a new output directory under its new name. It also moved a copy of the input file out of the way into a ‘processed’ directory just to note that it had been processed. There are a lot of other ways one could do this, in bash and using other technologies, but you use the hammer you happen to have to hand.
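
As one such alternative, here is a minimal sketch of the same idea using a `while read` loop, which avoids changing IFS for the whole shell. It assumes a two-column CSV in which every field is double-quoted and contains no commas. The function name `plan_renames` and the dry-run behaviour (printing the cp commands rather than running them) are my own additions for illustration, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: print the copy commands a CSV would produce (dry run).
# Usage: plan_renames CSV_FILE INPUT_DIR OUTPUT_DIR EXT
plan_renames() {
    local csv_file=$1 input_dir=$2 output_dir=$3 ext=$4
    local old_name new_name
    # IFS=, applies to this read only, so the shell's own IFS is left untouched;
    # the "|| [ -n ... ]" part picks up a last line with no trailing newline
    while IFS=, read -r old_name new_name || [ -n "$old_name" ]; do
        # strip the double quotes (and a trailing carriage return, if any)
        old_name=${old_name//\"/}
        new_name=${new_name//\"/}
        new_name=${new_name%$'\r'}
        # skip rows with a missing name in either column
        [ -n "$old_name" ] && [ -n "$new_name" ] || continue
        if [ -f "$input_dir/$old_name$ext" ]; then
            echo "cp $input_dir/$old_name$ext $output_dir/$new_name$ext"
        else
            echo "No $old_name$ext"
        fi
    done < "$csv_file"
}
```

Something like `plan_renames filenames.csv input output .tif` (all four arguments are placeholders) lets you check the list before copying anything; piping the output to `sh` would then run the copies, provided none of the names contain spaces.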

Bulk resizing of images

Over the years I’ve done lots of image processing: cropping, resizing, rotating, extracting metadata, etc. My tool of choice for this is ImageMagick, which is cross-platform and incredibly powerful. It can do very complicated things, but simple things like scaling images, cropping them or making montages are all fairly easy. The more difficult things take a bit of trial and error but really reward study; monotonous, repetitive jobs like this one are quite easy.

Because I thought I might run this command many times, I created a Makefile with a ‘resizeImages’ target.

resizeImages:
	cd output; \
	echo "Converting full sized"; \
	for file in *.tif; do convert $$file[0] ../converted-images/full/`basename $$file .tif`.jpg; done; \
	cd ../converted-images/full/; \
	echo "Doing large / medium / small / thumbs / tiny"; \
	for file in *.jpg; do echo "Doing $$file"; \
	convert -scale 1000 $$file ../large/$$file; \
	convert -scale 500 $$file ../medium/$$file; \
	convert -scale 250 $$file ../small/$$file; \
	convert -scale 150 $$file ../thumb/$$file; \
	convert -scale 50 $$file ../tiny/$$file; done; \
	echo "Done"

To begin with, this goes into the output directory (from the renaming of files above) and, using a standard bash for loop, takes each TIFF file and runs:

convert $file[0] ../converted-images/full/`basename $file .tif`.jpg

This uses ImageMagick’s ‘convert’ utility to convert one of the TIFF files, say 123.tif, to a JPEG. The $file ($$file in the Makefile above, because make needs the dollar sign doubled to pass it through to the shell) has a [0] after it because we only want the first image embedded in the TIFF file. (The second is an embedded thumbnail.) It puts the output in the directory ../converted-images/full/ and names the file using the ‘basename’ command. This Linux command strips the extension from ‘123.tif’ to leave ‘123’, to which we append ‘.jpg’ to tell ‘convert’ that we want the output to be a JPEG. If we were converting just one file this might be:

convert 123.tif[0] ../converted-images/full/123.jpg

which is a really, really easy way of converting image file formats. Wrapping it in a for loop is just a convenient way to get the right filenames; ImageMagick can cope with wildcards in various ways as well.
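
To make the filename handling concrete, the following sketch shows the `basename` call from the loop alongside bash's built-in `${file%.tif}` suffix removal, which gives the same result for a bare filename without starting another process. Both function names are hypothetical, and nothing here calls ImageMagick; it only builds the output name.

```shell
#!/bin/bash
# Hypothetical helper: derive the JPEG name for a TIFF using basename.
jpg_name_for() {
    local file=$1
    local stem
    # basename strips any leading directory and the given suffix
    stem=$(basename "$file" .tif)
    echo "$stem.jpg"
}

# The same idea using parameter expansion only (no external command).
# Unlike basename, this keeps any directory part of the path.
jpg_name_for_builtin() {
    local file=$1
    echo "${file%.tif}.jpg"
}

jpg_name_for 123.tif            # prints 123.jpg
jpg_name_for_builtin 123.tif    # prints 123.jpg
```

For the loop in the Makefile, where the filenames come from `*.tif` in the current directory, either form would do.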

After this Makefile target has converted all of the TIFFs to JPEGs, it changes directory to converted-images/full/ and tells us that it is now converting them to the large/medium/small/thumbs/tiny sizes. This uses another simple bash for loop, just saying

for file in *.jpg; do

and then a list of commands before ending with ‘done’.

In this case the commands use ‘convert’ to scale the full-sized JPEGs to some agreed widths: large (1000px), medium (500px), small (250px), thumb (150px) and tiny (50px). Only the width is given, so the height automagically scales to whatever it needs to be to maintain the aspect ratio. There is a lot more about ImageMagick geometry that could be said.
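
As a worked example of that scaling, the sketch below computes the height that goes with each agreed width for a hypothetical source image of 4000x6000 pixels. The dimensions and the `scaled_height` function are invented for illustration, and it uses integer arithmetic rounded to the nearest pixel; ImageMagick's own rounding may differ by a pixel.

```shell
#!/bin/bash
# Hypothetical helper: height that preserves the aspect ratio at a new width.
# Usage: scaled_height ORIG_WIDTH ORIG_HEIGHT NEW_WIDTH
scaled_height() {
    local orig_w=$1 orig_h=$2 new_w=$3
    # new_h = orig_h * new_w / orig_w, rounded to the nearest integer
    echo $(( (orig_h * new_w + orig_w / 2) / orig_w ))
}

# Assumed example source: 4000 pixels wide, 6000 pixels high
for width in 1000 500 250 150 50; do
    echo "${width}x$(scaled_height 4000 6000 "$width")"
done
# prints 1000x1500, 500x750, 250x375, 150x225 and 50x75, one per line
```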

This ends up with files under converted-images in full, large, medium, small, thumb and tiny directories, with one image file per ID in the CSV file in each of those directories (which the for loops assume you’ve already made).
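
Since the loops assume those directories already exist, a one-off setup step along these lines would create them. This is a sketch rather than part of the Makefile; the function name `make_image_dirs` is hypothetical, and the base directory defaults to `converted-images` to match the paths used above.

```shell
#!/bin/bash
# Hypothetical helper: create one directory per image size.
# Usage: make_image_dirs [BASE_DIR]
make_image_dirs() {
    local base=${1:-converted-images}
    local size
    for size in full large medium small thumb tiny; do
        # -p creates missing parent directories and does nothing
        # if the directory is already there
        mkdir -p "$base/$size"
    done
}
```

Calling `make_image_dirs` from the directory that holds the Makefile, before running `make resizeImages`, should be enough; because of `-p` it is safe to run more than once.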