How to Read Remote MP3 File Info Without Downloading Entire File

I'd like to share a technique I came up with when I needed to index a large number of MP3 podcasts hosted on remote servers. With tens of thousands of files, downloading each one in full would have taken weeks.

The example is in Python, but any other programming language could be used to perform the same steps.

In a nutshell, this is what needed to be done:

  1. Read ID3 tags
  2. Get podcast duration
  3. Get file size
  4. Get MP3 bitrate

I managed to solve this by reading only the first 40 KB of each file and, if required, the last 128 bytes for ID3v1 tags.
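The last 128 bytes matter because an ID3v1 tag, when present, is a fixed-size block at the very end of the file that starts with the marker "TAG". Below is a minimal sketch of reading such a block by hand; the function name `parse_id3v1` is made up for illustration, and decoding the text fields as Latin-1 is an assumption, since ID3v1 does not declare an encoding.

```python
import struct

def parse_id3v1(tail):
    """Parse a 128-byte ID3v1 block. Returns None if no tag is present."""
    if len(tail) != 128 or tail[:3] != b"TAG":
        return None
    # Layout after the 3-byte marker: title, artist, album (30 bytes each),
    # year (4 bytes), comment (30 bytes), genre index (1 byte).
    title, artist, album, year, comment, genre = struct.unpack(
        "<30s30s30s4s30sB", tail[3:]
    )

    def clean(raw):
        # Fields are padded with NUL bytes or spaces; Latin-1 is assumed here.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": clean(title),
        "artist": clean(artist),
        "album": clean(album),
        "year": clean(year),
        "comment": clean(comment),
        "genre": genre,
    }
```

In the approach described here Mutagen does this parsing, so the sketch only shows why 128 bytes are enough.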

The process is divided into 6 logical steps:

  1. Download the first 40 KB
  2. Save them to a temporary file
  3. Read the real file length from the HTTP response
  4. Resize the temporary file to that length
  5. Move the file pointer to 128 bytes before the end and fetch the last 128 bytes using a Range request
  6. Finally, read all the information: ID3 info, bitrate, duration
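Steps 2, 4 and 5 boil down to building a file that has the real size but only real data at both ends. Here is a small sketch of just that part, with no networking involved; `build_stub` and its parameters are hypothetical names used only for this illustration.

```python
def build_stub(path, head, total_size, tail=b""):
    """Write `head` at the start and `tail` at the end of a file of `total_size` bytes."""
    with open(path, "wb") as f:
        f.write(head)
        # Growing a file with truncate() fills the gap with zero bytes.
        f.truncate(total_size)
        if tail:
            f.seek(total_size - len(tail))
            f.write(tail)
```

For example, `build_stub("temp.mp3", first_40kb, real_length, last_128_bytes)` would produce a file that reports the real length even though only a little over 40 KB of it came from the server.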

To read the ID3 info and calculate the bitrate and song duration, I used a nice library called Mutagen.

First I will illustrate how to download the info we need and create a temporary MP3 file containing just that data.
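The reason the temporary file has to be resized to the real length is that, for a constant-bitrate file without a VBR header, duration is typically estimated from file size and bitrate. The sketch below shows that arithmetic only; it is not Mutagen's code, and `estimate_duration` and `header_bytes` are illustrative names.

```python
def estimate_duration(file_size_bytes, bitrate_bps, header_bytes=0):
    """Estimate duration in seconds for a constant-bitrate stream."""
    # Audio bytes times 8 gives bits; dividing by bits per second gives seconds.
    return (file_size_bytes - header_bytes) * 8 / bitrate_bps

# A 3,796,992-byte file at 128 kbit/s comes out at roughly 237 seconds.
seconds = estimate_duration(3796992, 128000)
```

With only the first 40 KB on disk, the same formula would give about 2.5 seconds, which is why the real length from the HTTP response is needed.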

import http.client
from urllib.parse import urljoin, urlparse

def downloadMp3(mp3url, redirects=5):
    parts = urlparse(mp3url)
    useragent = 'podcast-fetcher-example'
    # We are asking for audio, not HTML
    allowedmimes = ['audio/mpeg', '*/*']
    maxcontentbytes = 40000

    headers = {
        'User-agent': useragent,
        'Accept': ','.join(allowedmimes),
        'Range': 'bytes=0-%d' % maxcontentbytes
    }

    # Pick the connection class that matches the URL scheme
    if parts.scheme == 'https':
        connect = http.client.HTTPSConnection
    else:
        connect = http.client.HTTPConnection
    # Keep the query string, if any, as part of the request path
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query

    conn = None
    try:
        conn = connect(parts.netloc)
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        # To read real file size
        contentLen = 0

        # If Range is not supported, just download first max bytes
        if response.status == 200:
            if response.getheader("content-length") is not None:
                contentLen = int(response.getheader("content-length"))

        # Handle redirect: follow it and return the result of that attempt
        elif response.status in (301, 302, 303, 307, 308):
            newurl = response.getheader("location")
            conn.close()
            if not newurl or redirects <= 0:
                return False
            return downloadMp3(urljoin(mp3url, newurl), redirects - 1)

        # Range response
        elif response.status == 206:
            # Response example: 'content-range', 'bytes 0-40000/3796992'
            field = response.getheader("content-range")
            contentLen = int(field[field.find("/") + 1:])

        # We are not handling errors here
        if response.status > 299:
            conn.close()
            return False

        # Size is not big enough to calculate bitrate
        if contentLen < maxcontentbytes:
            conn.close()
            return False

        partial = response.status == 206
        data = response.read(maxcontentbytes)
        response.close()
        conn.close()

        # create the temporary file ('wb' truncates it if it exists already)
        with open("temp.mp3", 'wb') as file:
            file.write(data)
            # resize to the real length; the gap is filled with zero bytes
            file.truncate(contentLen)

            # if partial requests are supported, read last 128 bytes for ID3v1
            if partial:
                conn = connect(parts.netloc)
                # Range: bytes=-128 asks for the last 128 bytes
                headers2 = {
                    'User-agent': useragent,
                    'Accept': ','.join(allowedmimes),
                    'Range': 'bytes=-128'
                }
                conn.request("GET", path, headers=headers2)
                response2 = conn.getresponse()
                if response2.status == 206:
                    file.seek(contentLen - 128, 0)
                    file.write(response2.read(128))
                response2.close()
                conn.close()

    except Exception as msg:
        if conn is not None:
            conn.close()
        print("Error downloading info: " + str(msg))
        return False

    return True
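The real file size in a partial (206) response comes from the part of the Content-Range header after the slash. Slicing on "/" works for the common case, but the total may also be "*" when the server does not know it. Here is a slightly more defensive sketch; `total_from_content_range` is a hypothetical helper name, not part of any library.

```python
import re

def total_from_content_range(value):
    """Return the total size from a 'bytes first-last/total' header, or None."""
    match = re.match(r"\s*bytes\s+(\d+)-(\d+)/(\d+|\*)\s*$", value or "")
    if match is None or match.group(3) == "*":
        # Header missing, malformed, or total length unknown to the server.
        return None
    return int(match.group(3))
```

Returning None instead of raising lets the caller decide whether to skip the file or fall back to a full download.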

With all the data we need in place, we read the meta info with Mutagen.

Note that Mutagen returns tag values as Unicode strings.

import os
from mutagen.easyid3 import EasyID3
from mutagen.mp3 import MP3

if downloadMp3(url):
    audio = MP3("temp.mp3", ID3=EasyID3)
    title = audio.get("title", [""])[0].encode("ISO-8859-1")
    artist = audio.get("artist", [""])[0].encode("ISO-8859-1")
    album = audio.get("album", [""])[0].encode("ISO-8859-1")
    bitrate = audio.info.bitrate            # bits per second
    duration = round(audio.info.length)     # seconds
    # the temporary file was resized to the real length of the remote file
    filesize = os.path.getsize("temp.mp3")
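One caveat with encoding to ISO-8859-1: a title containing characters outside Latin-1 raises UnicodeEncodeError. If the target really has to be Latin-1, one option is to replace unencodable characters instead of failing. This is a sketch of that option; `to_latin1` is an illustrative helper name.

```python
def to_latin1(text):
    """Encode text as ISO-8859-1, replacing unencodable characters with '?'."""
    return text.encode("ISO-8859-1", errors="replace")
```

If nothing downstream requires Latin-1, keeping the Unicode strings or encoding them as UTF-8 avoids the data loss entirely.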

Et voilà, we have all the info we needed, having spent only a little more than 40 KB of bandwidth per file.
