Tuesday, October 2, 2012

Using Python and wget to download certain file types from a web-page

Recently I wanted to download all the movies in MacKay's information theory course. That was a lot of files, and I thought it would be too tedious to do it through the browser or even with manual wget calls. So, digging around, I found this decent tutorial on Python HTML processing. I also dug up some information on parsing URLs. I combined that with my knowledge of Python's subprocess module and put together this nice piece of code.

 

#!/usr/bin/python

import sgmllib

class MyParser(sgmllib.SGMLParser):
    "A simple parser class."

    def parse(self, s):
        "Parse the given string 's'."
        self.feed(s)
        self.close()

    def __init__(self, verbose=0):
        "Initialise an object, passing 'verbose' to the superclass."

        sgmllib.SGMLParser.__init__(self, verbose)
        self.hyperlinks = []

    def start_a(self, attributes):
        "Process a hyperlink and its 'attributes'."

        for name, value in attributes:
            if name == "href":
                self.hyperlinks.append(value)

    def get_hyperlinks(self):
        "Return the list of hyperlinks."

        return self.hyperlinks

import urllib

# Get something to work with.
webPage="http://www.inference.phy.cam.ac.uk/itprnn_lectures/"
f = urllib.urlopen(webPage)
s = f.read()

# Try and process the page.
# The class should have been defined first, remember.
myparser = MyParser()
myparser.parse(s)

# Get the hyperlinks.
links=myparser.get_hyperlinks()
print links

movies=[x for x in links if x.endswith('.mp4')]
print movies

import urlparse
movieURLs=[urlparse.urljoin(webPage,x) for x in movies]
print movieURLs

from subprocess import call

for movieURL in movieURLs:
    # Pass the arguments as a list so the shell never interprets the URL.
    call(["wget", "-c", movieURL])
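
A note for anyone on Python 3: sgmllib, urllib.urlopen and urlparse only exist in Python 2. A minimal sketch of the same link-extraction step for Python 3 might look like the following. It swaps in html.parser.HTMLParser and urllib.parse.urljoin from the standard library. The names LinkParser and find_movie_urls, and the sample HTML, are my own illustrations and not part of the script above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hyperlinks.append(value)


def find_movie_urls(html, base_url, extension=".mp4"):
    """Return absolute URLs for all links in 'html' ending in 'extension'."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, link)
            for link in parser.hyperlinks
            if link.endswith(extension)]


# Hypothetical page content, standing in for the downloaded index page.
sample = '<a href="01.mp4">Lecture 1</a> <a href="notes.pdf">Notes</a>'
print(find_movie_urls(sample, "http://www.inference.phy.cam.ac.uk/itprnn_lectures/"))
```

Each URL in the returned list can then be handed to wget through subprocess.call, one at a time, exactly as in the loop above.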

Sunday, September 30, 2012

Mutating your MAC address

Recently I was at an airport that offered free WiFi for the first 15 minutes; after that, the system asked you to pay. I figured they must be identifying users by MAC address, so I wrote this nifty little Python script that changes it to a new random value each time you call it. It only works on Linux (or maybe on a Mac).
I am not suggesting that you violate any laws; I put this up for educational purposes. Use it at your own risk.
To obtain the MAC address from Python, I used synthesizerpatel's code snippet. Alternatively, the MAC address could be obtained by calling ifconfig through "subprocess.Popen", as described here.
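
Before the script itself, here is a rough sketch of what the ifconfig alternative could look like, written so that it also runs on Python 3. The helper names find_mac and get_hw_addr and the sample output line are hypothetical, and the regular expression simply takes the first thing that looks like a MAC address, which is an assumption about how ifconfig prints its output on your system.

```python
import re
import subprocess

# Six two-digit hex groups separated by colons.
MAC_PATTERN = re.compile(r"(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}")


def find_mac(text):
    """Return the first MAC-address-looking string in 'text', or None."""
    match = MAC_PATTERN.search(text)
    return match.group(0).lower() if match else None


def get_hw_addr(ifname):
    """Run 'ifconfig <ifname>' and pull the MAC address out of its output."""
    proc = subprocess.Popen(["ifconfig", ifname], stdout=subprocess.PIPE)
    output = proc.communicate()[0].decode("utf-8", "replace")
    return find_mac(output)


# Hypothetical ifconfig output line, used here instead of a real interface.
sample = "wlan0     Link encap:Ethernet  HWaddr 00:1F:3C:4D:5E:6F"
print(find_mac(sample))
```

On a real machine you would call get_hw_addr('wlan0') instead of parsing the sample string.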

#!/usr/bin/python

import fcntl, socket, struct, random
from subprocess import call

def getHwAddr(ifname):
    # ioctl request 0x8927 (SIOCGIFHWADDR) returns the interface's hardware address.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    info = fcntl.ioctl(s.fileno(), 0x8927, struct.pack('256s', ifname[:15]))
    return ':'.join(['%02x' % ord(char) for char in info[18:24]])


a = getHwAddr('wlan0')
# Pick one of the last four octets and shift it by a random non-zero amount.
i = random.randint(2, 5)
# The octets are hexadecimal, so parse them with base 16 and wrap around at 256.
newOctet = (int(a.split(':')[i], 16) + random.randint(1, 255)) % 256
b = a[:3*i] + '%02x' % newOctet + a[(i+1)*3-1:]

print "Changing from "+a+" to "+b
call(["sudo", "ifconfig", "wlan0", "down"])
call(["sudo", "ifconfig", "wlan0", "hw", "ether", b])
call(["sudo", "ifconfig", "wlan0", "up"])
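
The octet arithmetic is the fiddly part of the script, so here is a small standalone sketch of just that step, written so that it runs on Python 3 and can be tried without touching any network interface. The function name mutate_mac and the sample address are my own, not part of the script above. It treats each octet as hexadecimal and, like the script, only touches one of the last four octets.

```python
import random


def mutate_mac(mac, rng=random):
    """Return 'mac' with one of its last four octets changed at random."""
    octets = mac.split(":")
    i = rng.randint(2, 5)
    # Adding 1..255 modulo 256 guarantees the octet actually changes.
    new_value = (int(octets[i], 16) + rng.randint(1, 255)) % 256
    octets[i] = "%02x" % new_value
    return ":".join(octets)


# Hypothetical starting address, used instead of reading a real interface.
old = "00:1f:3c:4d:5e:6f"
new = mutate_mac(old)
print("Changing from " + old + " to " + new)
```

Splitting on the colons and joining again avoids the string slicing, at the cost of building a short list.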