Tuesday, September 14, 2010

The Database

I needed a database that would give the Music Server access to information about Artists, Albums and Songs/Tracks in a simple, easy to access format. As already mentioned in the previous post, the Web UI is written in JavaScript, and JSON (http://json.org/) is probably the most straightforward data format to work with there. If the database itself stores its data as JSON, life is made much simpler because there is essentially no marshaling to other formats required.

The core of the database is the songs/tracks file, which holds the roughly 8000 tracks in my music collection. Loading this whole file into memory is really not feasible. The solution I came up with was to use a basic form of indexing and access the file via the java.io.RandomAccessFile class. A track index is simply a length (the length of the JSON track record) and an offset pointing to where the record resides in the file. To access a track quickly, all that is required is a simple data structure consisting of these two values.
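A minimal sketch of such a lookup is shown here. The class names TrackIndex and TrackReader are hypothetical, and the sketch assumes the length and offset are counted in bytes and that the file is UTF-8 encoded; the original code may differ on all of these points.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical index entry: where one JSON track record lives in the tracks file. */
final class TrackIndex {
    final long offset; // byte position of the record's first byte (assumed to be bytes)
    final int length;  // size of the record in bytes (assumed to be bytes)

    TrackIndex(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

final class TrackReader {
    /** Reads exactly one record and returns it as a JSON string (UTF-8 assumed). */
    static String readTrack(String tracksFile, TrackIndex index) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(tracksFile, "r")) {
            byte[] buffer = new byte[index.length];
            file.seek(index.offset); // jump straight to the record
            file.readFully(buffer);  // throws EOFException if the file is too short
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }
}
```

Only the bytes of the requested record are read, so memory use stays proportional to one track rather than to the whole file.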

The song/track JSON structure itself mimics the ID3 tags that are read from the MP3 files. Here is an example of one:

{
  "album": "Legion of Boom",
  "author": "The Crystal Method",
  "date": "2004",
  "duration": 268,
  "mp3.bitrate.nominal.bps": 128000,
  "mp3.channels": 2,
  "mp3.copyright": false,
  "mp3.crc": false,
  "mp3.frequency.hz": 44100,
  "mp3.id3tag.genre": "Electronica",
  "mp3.id3tag.track": 4,
  "mp3.mode": "Joint Stereo",
  "mp3.original": false,
  "mp3.vbr": false,
  "mp3.version.layer": "Layer 3",
  "mp3.version.mpeg": "MPEG1",
  "path": "/Volumes/mp3/music/The Crystal Method/Legion of Boom/04 - The Crystal Method - The American Way - Legion of Boom.mp3",
  "title": "The American Way",
  "type": "mp3"
}

In addition to the songs/tracks file there are a number of index files that associate track indexes with Artists and Albums.

There is an "Artists to Songs" index file. Each Artist is associated with an array of song indexes.

  "alice in chains": [
    {
      "length": 542,
      "offset": 1500107
    },
    {
      "length": 544,
      "offset": 1500649
    },
.....
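In the sample above the second offset equals the first offset plus the first length (1500107 + 542 = 1500649), which is consistent with records being written back to back. A sketch of how such an index could be built under that assumption follows. ArtistIndexBuilder and Entry are hypothetical names, an in-memory buffer stands in for the tracks file, and lower-casing the artist key is only inferred from the sample keys.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ArtistIndexBuilder {
    /** Hypothetical index entry: offset and length of one record, in bytes. */
    static final class Entry {
        final long offset;
        final int length;

        Entry(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // In-memory stand-in for the tracks file (an assumption made to keep the sketch small).
    private final ByteArrayOutputStream tracks = new ByteArrayOutputStream();
    private final Map<String, List<Entry>> artistToSongs = new LinkedHashMap<>();

    /** Appends one JSON record and remembers where it landed under the artist's key. */
    void add(String artist, String trackJson) {
        byte[] bytes = trackJson.getBytes(StandardCharsets.UTF_8);
        long offset = tracks.size(); // the next record starts where the last one ended
        tracks.write(bytes, 0, bytes.length);
        String key = artist.toLowerCase(Locale.ROOT); // assumed: keys are lower case
        artistToSongs.computeIfAbsent(key, k -> new ArrayList<>())
                     .add(new Entry(offset, bytes.length));
    }

    /** Returns the index entries for an artist, or an empty list if unknown. */
    List<Entry> songsFor(String artist) {
        return artistToSongs.getOrDefault(artist.toLowerCase(Locale.ROOT), new ArrayList<>());
    }
}
```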

There is an "Albums to Songs" index file. Each Album is associated with an array of song indexes.

  "...And Justice For All (mp3)": [
    {
      "length": 600,
      "offset": 3101248
    },
    {
      "length": 568,
      "offset": 3101848
    },

Also there is an "Artists to Albums" index file that handles finding the artist/album associations quickly.

  "arctic monkeys": [
    "Favourite Worst Nightmare (mp3)",
    "Whatever People Say I Am, That's What I'm Not (mp3)",
    "Humbug (mp3)"
  ]

These 4 files made up the database for the initial version. However, as development progressed I found that I needed to add 2 additional files:
  • An "ArtistId To Artists" file
  • A "Directory Information" file
The "ArtistId To Artists" file was required because of the discrepancies I found in the Artist names in my music collection. There would be a variety of combinations that ultimately refer to the same artist, for example "The Cure", "Cure" and "Cure, The". This id file ties the combinations together.
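The post does not say how an artist id is derived, so the following is only one possible sketch. The ArtistIds class and its normalisation rule (lower-case the name, then drop a leading "the " or a trailing ", the") are assumptions chosen to cover the three spellings mentioned above.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ArtistIds {
    /** Hypothetical normalisation: one possible way to derive a shared id. */
    static String artistId(String name) {
        String id = name.trim().toLowerCase(Locale.ROOT);
        if (id.endsWith(", the")) {
            id = id.substring(0, id.length() - ", the".length());
        } else if (id.startsWith("the ")) {
            id = id.substring("the ".length());
        }
        return id.trim();
    }

    /** Groups every spelling found in the collection under its derived id. */
    static Map<String, Set<String>> groupById(List<String> names) {
        Map<String, Set<String>> byId = new TreeMap<>();
        for (String name : names) {
            byId.computeIfAbsent(artistId(name), k -> new TreeSet<>()).add(name);
        }
        return byId;
    }
}
```

A rule this simple will mis-handle some names (a band actually called "The The", for instance), which is one reason to store the mapping in a file where it can be corrected.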

The "Directory Information" file came about when I wanted to add new music files to an existing database without starting from scratch. This file provides a snapshot of the directories that hold the music files, which can be compared against what is being scanned off disk.
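What the snapshot records is not described here, so this sketch assumes the simplest useful form: a map from directory path to its last-modified time. DirectoryDiff and directoriesToRescan are hypothetical names, and the comparison reports any directory that is new or whose timestamp differs from the snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DirectoryDiff {
    /**
     * Returns the directories seen in the current scan that are either missing
     * from the snapshot or carry a different last-modified time (assumed format).
     */
    static List<String> directoriesToRescan(Map<String, Long> snapshot,
                                            Map<String, Long> current) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Long> dir : current.entrySet()) {
            Long previous = snapshot.get(dir.getKey());
            if (previous == null || !previous.equals(dir.getValue())) {
                changed.add(dir.getKey());
            }
        }
        return changed;
    }
}
```

Only the directories returned would need their MP3 files read again; everything else can be left as it is in the existing database.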

    1 comment:

    1. Hey Richard, I was looking into creating something very similar to this! I've noticed you haven't updated the code in a while, and I wonder if you still are interested in it and if you'd be happy for me to contribute.
      I've been a Java dev for 10 years, so with your permission I'll clone your project and make some changes to move it to my liking, plus update it with some of the latest technologies.

      Is there an email I can write to so we can communicate? :) Let me know,
      Cheers,
      Sam
