(((This whole section is draft)))

Parsing ID3 Tags From MP3 Files

This is a sample of code that was originally developed to parse ID3 tags from the MP3s in my music collection so they could be indexed by my desktop search engine (using Lucene).

The ID3 tag is actually very easy to parse. It has a fixed length and, if it exists, occurs at the end of the file. There is more information about ID3 tags at the ID3 Website.

I am only covering the first version of the tag (ID3v1) here because it has the greatest support. The fields available in the ID3 tag, in the order they exist in the file, follow:

Field     Size (bytes)
Header    3 (always equal to 'TAG')
Title     30
Artist    30
Album     30
Year      4
Comment   30
Genre     1
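
The field sizes above add up to 128 bytes, which is the length of the whole tag. As a small sketch (the class and constant names here are my own, not part of any library), the offset of each field within the tag can be derived from the sizes:

```java
/** Hypothetical helper: field layout of the tag, derived from the table above. */
final class Id3v1Layout {
    // Field names and sizes in bytes, in the order they appear in the tag.
    static final String[] NAMES = {"Header", "Title", "Artist", "Album", "Year", "Comment", "Genre"};
    static final int[] SIZES = {3, 30, 30, 30, 4, 30, 1};

    /** Returns the byte offset of each field from the start of the tag. */
    static int[] offsets() {
        int[] offsets = new int[SIZES.length];
        int pos = 0;
        for (int i = 0; i < SIZES.length; i++) {
            offsets[i] = pos;
            pos += SIZES[i];
        }
        return offsets;
    }

    /** Returns the total tag length, the sum of all field sizes. */
    static int totalSize() {
        int total = 0;
        for (int size : SIZES) {
            total += size;
        }
        return total;
    }
}
```

With these sizes, the title starts at offset 3, the year at offset 93, and the genre is the final byte at offset 127.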

Parsing the ID3 Tag Data

Parsing out the fixed size data becomes very simple. The following code shows how to parse the ID3 data into a Tag value object.


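import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Reads the seven fixed-size fields, in file order, from a 128-byte buffer.
private Tag populateTag(ByteBuffer bBuf) {
        byte[] tag = new byte[3];
        byte[] tagTitle = new byte[30];
        byte[] tagArtist = new byte[30];
        byte[] tagAlbum = new byte[30];
        byte[] tagYear = new byte[4];
        byte[] tagComment = new byte[30];
        byte[] tagGenre = new byte[1];
        // Each relative get() fills one array and advances the buffer position.
        bBuf.get(tag).get(tagTitle).get(tagArtist).get(tagAlbum)
                        .get(tagYear).get(tagComment).get(tagGenre);
        if(!"TAG".equals(text(tag))){
                throw new IllegalArgumentException(
                        "ByteBuffer does not contain ID3 tag data"
                );
        }
        Tag tagOut = new Tag();
        tagOut.setTitle(text(tagTitle));
        tagOut.setArtist(text(tagArtist));
        tagOut.setAlbum(text(tagAlbum));
        tagOut.setYear(text(tagYear));
        tagOut.setComment(text(tagComment));
        tagOut.setGenre(tagGenre[0]);
        return tagOut;
}

// Decodes a field and trims its padding. The tag does not declare an
// encoding; ISO-8859-1 is an assumption made here instead of relying on
// the platform default charset.
private static String text(byte[] field) {
        return new String(field, StandardCharsets.ISO_8859_1).trim();
}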

There should be more checks on the data being parsed. The example is kept short for readability.
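
As a rough idea of what some of those extra checks could look like, here is a minimal sketch. The class and method names are hypothetical, and the ISO-8859-1 charset is an assumption (the tag itself does not declare an encoding). It verifies that enough bytes are available before reading, and it cuts each field at the first zero byte so that stray data after the terminator is ignored.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing a few extra checks before trusting tag bytes. */
final class TagChecks {
    static final int TAG_SIZE = 128;

    /** Returns true if the buffer holds at least 128 bytes starting with "TAG". */
    static boolean looksLikeTag(ByteBuffer buf) {
        if (buf == null || buf.remaining() < TAG_SIZE) {
            return false;
        }
        int p = buf.position();
        // Absolute gets leave the buffer position untouched.
        return buf.get(p) == 'T' && buf.get(p + 1) == 'A' && buf.get(p + 2) == 'G';
    }

    /** Decodes a fixed-size field, stopping at the first zero byte and trimming padding. */
    static String decodeField(byte[] raw) {
        int len = 0;
        while (len < raw.length && raw[len] != 0) {
            len++;
        }
        // Assumption: ISO-8859-1, since the tag carries no encoding information.
        return new String(raw, 0, len, StandardCharsets.ISO_8859_1).trim();
    }
}
```

Calling looksLikeTag before populateTag would avoid a BufferUnderflowException on short input, since the chained get calls assume all 128 bytes are present.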

The Tag object is just a simple bean style value object (getters and setters left out for brevity).


public class Tag {
        private String title;
        private String artist;
        private String album;
        private String year;
        private String comment;
        private byte genre;
        ...
        // getters / setters
}

The fields for the tag should be pretty clear. The only issue could be converting the byte for genre into a usable value. The ID3 documentation (though it covers version two of the tag) has a list of genres and their corresponding numerical values. Ideally, developers could use these to create a localized display of the genre.
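
A minimal sketch of such a conversion follows. The class name is hypothetical and the table is deliberately partial, covering only the first few entries of the genre list; a real lookup would carry the full list, and the names could just as well be keys into a resource bundle for localization.

```java
/** Hypothetical lookup for the genre byte; only the first few genres are listed. */
final class Genres {
    // Partial table: the published genre list has many more entries.
    private static final String[] NAMES = {
        "Blues", "Classic Rock", "Country", "Dance", "Disco", "Funk",
        "Grunge", "Hip-Hop", "Jazz", "Metal", "New Age", "Oldies", "Other", "Pop"
    };

    /** Maps a genre byte to a name, or "Unknown" if it is outside this partial table. */
    static String nameOf(byte genre) {
        int index = genre & 0xFF; // Java bytes are signed; treat the value as 0..255
        return index < NAMES.length ? NAMES[index] : "Unknown";
    }
}
```

The mask matters because a raw byte above 127 would otherwise become a negative array index.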

Locating the ID3 Tag

So, the difficult part is actually finding the tag. There are several ways to do this. They basically involve either searching for the tag header value or jumping to the end of the MP3 and reading the last 128 bytes. The ones I will cover are:

1. Using the RandomAccessFile object
2. Using an InputStream and skipping
3. Using an InputStream and seeking the tag
4. Using a circular array

I've listed these in order of preference, least preferred first. The first two are very dependent upon using files, which leaves them in a non-preferred state for me (very leaky abstractions). Reading until the header is found could also be useful when reading MP3 streams (like web radio), though it could produce a false match if the same combination of bytes happened to appear in the music data. The last is good for treating MP3s as self-contained entities, like files, without depending on an actual file.

Using the RandomAccessFile object

This technique is the most dependent upon the MP3 being in a file. Thus, it has the greatest number of limitations. On the positive side, the code is pretty short and simple.


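int tagSize = 128;

public Tag readTag(File file) throws IOException {
	// try-with-resources closes the file even if reading fails
	try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
		if (raf.length() < tagSize) {
			throw new IOException("File is too short to contain an ID3 tag");
		}
		byte[] tagData = new byte[tagSize];
		raf.seek(raf.length() - tagSize);
		// readFully keeps reading until the array is full; read may stop early
		raf.readFully(tagData);
		return populateTag(ByteBuffer.wrap(tagData));
	}
}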

There isn't too much to say about this. It allocates space for the tag data, skips to the end of the file (minus the tag data size) and reads the last 128 bytes. The data is then parsed using the method described earlier.
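
The same seek-to-the-end idea can also be used just to test whether a file carries a tag at all, before trying to parse it. This is a minimal sketch; the TagProbe class and hasTag method are hypothetical names of my own.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Hypothetical helper: checks for a tag header without parsing the fields. */
final class TagProbe {
    static final int TAG_SIZE = 128;

    /** Returns true if the file's last 128 bytes start with "TAG". */
    static boolean hasTag(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < TAG_SIZE) {
                return false; // too short to hold a tag
            }
            raf.seek(raf.length() - TAG_SIZE);
            byte[] header = new byte[3];
            raf.readFully(header); // throws EOFException rather than reading short
            return header[0] == 'T' && header[1] == 'A' && header[2] == 'G';
        }
    }
}
```

A caller could use hasTag to skip untagged files quietly instead of catching the IllegalArgumentException thrown during parsing.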

Using an InputStream and Skipping

This method is very similar to how the RandomAccessFile is used. The difference is that the number of bytes to skip is passed as a parameter; the caller needs to know it in advance (normally the length of the data minus 128).


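int tagSize = 128;
public Tag readTag(InputStream in, long start) throws IOException {
	BufferedInputStream bIn = new BufferedInputStream(in);
	// skip may skip fewer bytes than requested, so loop until done
	long toSkip = start;
	while (toSkip > 0) {
		long skipped = bIn.skip(toSkip);
		if (skipped <= 0) {
			// no progress: consume one byte, or fail at end of stream
			if (bIn.read() < 0) {
				throw new EOFException("Stream ended before the tag position");
			}
			skipped = 1;
		}
		toSkip -= skipped;
	}
	byte[] tagData = new byte[tagSize];
	// readFully fills the whole array or throws EOFException
	new DataInputStream(bIn).readFully(tagData);
	return populateTag(ByteBuffer.wrap(tagData));
}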
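
One caveat with this approach is that InputStream.skip is allowed to skip fewer bytes than requested, so a single call cannot be relied on to land at the tag. A standalone sketch of a helper that keeps skipping until the requested count is reached (StreamSkips and skipFully are hypothetical names):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: skips an exact number of bytes or fails loudly. */
final class StreamSkips {
    /** Skips exactly n bytes or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                remaining--; // skip() made no progress; consume one byte instead
            } else {
                throw new EOFException("Stream ended with " + remaining + " bytes left to skip");
            }
        }
    }
}
```

The fallback to read() is there because a return value of zero from skip does not by itself mean the end of the stream has been reached.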

Using an InputStream and Seeking the Tag

With this method, we scan the bytes in the InputStream looking for the first appearance of the byte sequence 'T', 'A', 'G' and assume it is the header of the ID3 tag.


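int tagSize = 128;
public Tag readTag(InputStream in) throws IOException {
        BufferedInputStream bIn = new BufferedInputStream(in);
        int match = 'T' << 16 | 'A' << 8 | 'G';
        int tmp = 0;
        // slide a three-byte window over the stream until it equals "TAG"
        while ((tmp & 0x00ffffff) != match) {
                int next = bIn.read();
                if (next < 0) {
                        // without this check, -1 would fill tmp and loop forever
                        throw new EOFException("No ID3 tag header found");
                }
                tmp = (tmp << 8) | next;
        }
        // the header is already consumed, so put it back in front of the
        // remaining 125 bytes of the tag
        byte[] tagData = new byte[tagSize];
        tagData[0] = 'T';
        tagData[1] = 'A';
        tagData[2] = 'G';
        // readFully fills the requested range or throws EOFException
        new DataInputStream(bIn).readFully(tagData, 3, tagSize - 3);
        return populateTag(ByteBuffer.wrap(tagData));
}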

To keep the byte scanning fast, the three bytes of the pattern are packed into a single integer. A second integer is then built from the stream: for each byte read, the current value is shifted left by eight bits and the new byte is placed in the low bits. Its lowest 24 bits are compared against the pattern, and once they match, the Tag object is built as in the other methods.
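
To make the shifting easier to follow, here is the same comparison as a standalone sketch over a byte array instead of a stream. TagScan and indexOfTag are hypothetical names used only for illustration.

```java
/** Hypothetical helper: the rolling three-byte comparison, applied to an array. */
final class TagScan {
    /** Returns the index of the first "TAG" byte sequence in data, or -1 if absent. */
    static int indexOfTag(byte[] data) {
        final int match = 'T' << 16 | 'A' << 8 | 'G';
        int window = 0;
        for (int i = 0; i < data.length; i++) {
            // Shift the window left one byte and append the new byte (masked to 0..255).
            window = (window << 8) | (data[i] & 0xFF);
            // The mask keeps only the three most recently read bytes.
            if ((window & 0x00FFFFFF) == match) {
                return i - 2; // index of the 'T'
            }
        }
        return -1;
    }
}
```

Because the window slides one byte at a time, an input such as "TATAG" is still matched at index 2. The same property means any "TAG" sequence in the audio data is reported too, which is the false-match risk of this technique.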

Using a Circular Array

The circular array works by creating an array the size of the ID3 tag. The array is filled with each byte read, and when the end of the array is reached, writing starts again at the beginning. The position is maintained so that, when the end of the InputStream is reached, the ByteBuffer can be filled from the position variable to the end of the array, followed by the beginning of the array up to the position variable.


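int tagSize = 128;

public Tag readTag(InputStream in) throws IOException {
	byte[] tagData = new byte[tagSize];
	int pos = 0;
	long total = 0;
	int curVal;
	while ((curVal = in.read()) >= 0) {
		tagData[pos++] = (byte)curVal;
		if(pos==tagSize){
			pos=0; // wrap around and overwrite the oldest byte
		}
		total++;
	}
	if (total < tagSize) {
		throw new IOException("Stream is too short to contain an ID3 tag");
	}
	ByteBuffer bBuf = ByteBuffer.allocate(tagSize);
	// pos now points at the oldest byte, so copy pos..end and then 0..pos
	bBuf.put(tagData,pos,tagSize-pos);
	bBuf.put(tagData,0,pos);
	bBuf.rewind();
	return populateTag(bBuf);
}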
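
The same circular-array idea works for any tail length, not just 128 bytes. As a sketch under that generalization (StreamTail and lastBytes are hypothetical names), the helper below returns the last n bytes of a stream in their original order, which makes the wrap-around logic easy to test on small inputs:

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: keeps only the most recent n bytes of a stream. */
final class StreamTail {
    /** Returns the last n bytes of the stream, or fewer if the stream is shorter than n. */
    static byte[] lastBytes(InputStream in, int n) throws IOException {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        byte[] ring = new byte[n];
        long total = 0;
        int value;
        while ((value = in.read()) >= 0) {
            ring[(int) (total % n)] = (byte) value; // overwrite the oldest byte
            total++;
        }
        int count = (int) Math.min(total, n);
        // Once the ring has wrapped, the oldest surviving byte sits at total % n.
        int start = total <= n ? 0 : (int) (total % n);
        byte[] out = new byte[count];
        for (int i = 0; i < count; i++) {
            out[i] = ring[(start + i) % n];
        }
        return out;
    }
}
```

For a ten-byte stream holding the values 1 through 10 and n set to 4, the result is the bytes 7, 8, 9, 10.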

