Cataloging CDs and DVDs in Linux

The first problem is that CDs and DVDs don't have a unique volume ID. It is common to have more than one disc with the same volume name, either because two or more copies were made or because the volume name is the default supplied by the CD writing program. There is, however, an ISO-style timestamp near the start of every data CD and DVD. (DVDs count as data discs here; audio CDs are raw and do not have this header.)

I started the search by copying the first 100 "blocks" from a CD.

dd if=/dev/cdrom1 of=cdpart count=100

Next I dumped cdpart using "hexdump".

hexdump -C cdpart | less

Examining the output, I found:

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00008000  01 43 44 30 30 31 01 00  4c 49 4e 55 58 20 20 20  |.CD001..LINUX   |
00008010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00008020  20 20 20 20 20 20 20 20  56 65 6e 75 73 2d 32 30  |        Venus-20|
00008030  30 37 31 31 32 37 20 20  20 20 20 20 20 20 20 20  |071127          |
00008040  20 20 20 20 20 20 20 20  00 00 00 00 00 00 00 00  |        ........|
00008050  d5 a0 02 00 00 02 a0 d5  00 00 00 00 00 00 00 00  |................|
00008060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00008070  00 00 00 00 00 00 00 00  01 00 00 01 01 00 00 01  |................|
00008080  00 08 08 00 08 04 00 00  00 00 04 08 14 00 00 00  |................|
00008090  00 00 00 00 00 00 00 16  00 00 00 00 22 00 1c 00  |............"...|
000080a0  00 00 00 00 00 1c 00 38  00 00 00 00 38 00 45 0c  |.......8....8.E.|
000080b0  1f 13 00 00 ec 02 00 00  01 00 00 01 01 00 20 20  |..............  |
000080c0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00008230  20 20 20 20 20 20 20 20  20 20 20 20 20 20 47 45  |              GE|
00008240  4e 49 53 4f 49 4d 41 47  45 20 49 53 4f 20 39 36  |NISOIMAGE ISO 96|
00008250  36 30 2f 48 46 53 20 46  49 4c 45 53 59 53 54 45  |60/HFS FILESYSTE|
00008260  4d 20 43 52 45 41 54 4f  52 20 28 43 29 20 31 39  |M CREATOR (C) 19|
00008270  39 33 20 45 2e 59 4f 55  4e 47 44 41 4c 45 20 28  |93 E.YOUNGDALE (|
00008280  43 29 20 31 39 39 37 2d  32 30 30 36 20 4a 2e 50  |C) 1997-2006 J.P|
00008290  45 41 52 53 4f 4e 2f 4a  2e 53 43 48 49 4c 4c 49  |EARSON/J.SCHILLI|
000082a0  4e 47 20 28 43 29 20 32  30 30 36 2d 32 30 30 37  |NG (C) 2006-2007|
000082b0  20 43 44 52 4b 49 54 20  54 45 41 4d 20 20 20 20  | CDRKIT TEAM    |
000082c0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00008320  20 20 20 20 20 20 20 20  20 20 20 20 20 32 30 30  |             200|
00008330  37 31 31 32 37 31 30 35  31 34 32 30 30 ec 32 30  |7112710514200.20|
00008340  30 37 31 31 32 37 31 30  35 31 34 32 30 30 ec 30  |07112710514200.0|
00008350  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 00  |000000000000000.|
00008360  32 30 30 37 31 31 32 37  31 30 35 31 34 32 30 30  |2007112710514200|
00008370  ec 01 00 20 20 20 20 20  20 20 20 20 20 20 20 20  |...             |
00008380  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00008570  20 20 20 00 00 00 00 00  00 00 00 00 00 00 00 00  |   .............|
00008580  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

The string beginning on line 00008020 is the volume name. That is useful too, but there is an easier way to get it. The 16-character string beginning on line 00008320 is the ISO date the CD was created. The format is yyyymmddhhmmssxx, where xx is hundredths of a second and is usually 00.
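As a rough sketch, the fixed-width layout described above can be pulled apart with bash substring expansion. The function name format_isodate is made up for this illustration, and the output style is just one readable choice.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): turn a 16-digit
# timestamp in the form yyyymmddhhmmssxx into something easier to read.
format_isodate() {
    local ts=$1
    # Accept exactly 16 digits; report anything else as an error.
    if [[ ! $ts =~ ^[0-9]{16}$ ]]; then
        echo "not a 16-digit timestamp: $ts" >&2
        return 1
    fi
    # Slice the fields out by position: year, month, day, hour, minute,
    # second, and the two trailing digits (xx).
    echo "${ts:0:4}-${ts:4:2}-${ts:6:2} ${ts:8:2}:${ts:10:2}:${ts:12:2}.${ts:14:2}"
}

format_isodate 2007112710514200   # prints 2007-11-27 10:51:42.00
```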

There exists a Linux command volname which will extract the volume name.

Armed with this information I can write a simple "bash" (Bourne Again Shell) script to output the ISO timestamp and volume name.


#!/bin/bash
echo -e `dd if=/dev/cdrom1 bs=1 skip=33581 count=16 2>/dev/null`'\t'`volname /dev/cdrom1` |tee -a cds.tdf

We start with a "comment" that tells the system which "interpreter" to use when executing the script.

"echo" prints the stuff produced by the script.

"-e" says we are going to embed special print formatting characters in the command (e.g. '\t').

The subcommands are enclosed in back ticks (quote marks that slope the other way). That is the key above the tab key on most keyboards.

The first subcommand is "dd" - dump data. The input file is /dev/cdrom1, which is actually the CD/DVD-RW drive in my system. The output is going to be fed to "echo". Our block size is one byte (the default is 512). The ISO date is 33,581 bytes into the CD/DVD. I confess I did a little "hunting" to find the correct decimal equivalent to the address in the hexdump; it works out to hexadecimal 832d, which is 13 bytes into line 00008320. The ISO timestamp is 16 bytes long. The command will output some other info, and "2>/dev/null" sends that to the bit bucket. Now we have the reason for the "-e": I want to print a tab character next. Tab-delimited files import nicely into most programs, such as spreadsheets, where I intend to elaborate on the entries in some cases.
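The "hunting" for the decimal offset can be left to the shell, which understands hexadecimal in arithmetic. The sketch below computes the offset from the hexdump address and wraps the dd call in a function so it can be tried on an image file as well as on the drive. The name read_isodate is invented for this example.

```shell
#!/bin/bash
# The timestamp starts 13 bytes into hexdump line 00008320.
offset=$(( 0x8320 + 13 ))
echo "$offset"   # prints 33581

# Hypothetical helper: read the 16-byte timestamp from a device or an
# image file given as the first argument.
read_isodate() {
    dd if="$1" bs=1 skip="$offset" count=16 2>/dev/null
}
```

With a disc in the drive, "read_isodate /dev/cdrom1" should print the same 16 characters as the dd subcommand in the script.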

Next we invoke the second subcommand, "volname", which prints the volume name. I run the whole line through "tee" with the -a option so this script both prints on the screen and appends to the file "cds.tdf".
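If "volname" is not available, the same dd trick should work for the volume name. This is only a sketch: it assumes the name sits in the 32 bytes starting at hexadecimal 8028 and is padded with spaces, as it is in the hexdump of the sample disc. The name read_volname is made up for the example.

```shell
#!/bin/bash
# Hypothetical stand-in for volname: read the 32-byte volume name field
# from a device or image file and strip the trailing space padding.
# Assumption: the field starts at 0x8028, as in the sample hexdump.
read_volname() {
    dd if="$1" bs=1 skip=$(( 0x8028 )) count=32 2>/dev/null | sed 's/ *$//'
}
```

Called as "read_volname /dev/cdrom1", this would take the place of the volname subcommand in the script.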

Now I can put a CD in cdrom drive #1, wait for the little green light to go out and run the script which I have named "whatis". Eject that disk and repeat with as many as I have tolerance for at the moment. The result looks like this:


2009042120022100	Ubuntu 9.04 i386 
2007010401573900	KNOPPIX 
2009033009330400	GLOBAL_ONENESS_PROJECT_VOL_2 
2008061111275600	UBCD4Windows 
2007112710514200	Venus-20071127 
2009033009330400	GLOBAL_ONENESS_PROJECT_VOL_2 
2009033009330400	GLOBAL_ONENESS_PROJECT_VOL_2 
2008112621292500	tosh_m30 
1998121414022900	SPSS90CD 
2009013016044700	NB65_DVD_0109 

I can open that file directly in Gnumeric (Excel and OpenOffice are clumsier with it) and add additional information, such as where I am going to put the CDs and a quick note when the volume name is not adequate.
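The sample listing contains the same timestamp and volume name three times, so it may be handy to spot repeats before opening the spreadsheet. A minimal sketch, using the standard sort and uniq commands; the function name list_duplicates is invented here.

```shell
#!/bin/bash
# Hypothetical helper: print each catalog line that occurs more than once,
# prefixed by the number of times it was seen.
list_duplicates() {
    # uniq only compares adjacent lines, so the file is sorted first.
    # -c adds the count, -d keeps only lines that are repeated.
    sort "$1" | uniq -cd
}
```

Running "list_duplicates cds.tdf" on the listing above would report the GLOBAL_ONENESS_PROJECT_VOL_2 entry with a count of 3.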

If you made it this far, you should have learned a little "geek speak" and maybe even found a reason to explore Linux.