03 November 2013

Web-Media Ripping 101

In this blog post I want to give step-by-step instructions and explanations of how I solved a problem I ran into. This may interest you if you want to learn how to get media from websites that don't allow downloads and serve "view only" content. You'll also get an idea of how programmers think, and of what cool things you can do if you know the basics of informatics.

The problem

There was an epic Irish cover video on YouTube of Daft Punk's "Get Lucky". But a few days ago, when I searched for "get lucky irish cover" on YouTube, I couldn't find it. OK, OK, two days ago someone resubmitted it, but it was not there when I was searching. It really sucks when you want to start your day with an epic video and can't.

Searching

Nowadays Google can help us a lot with finding things on the web, so use it as much as you can. I remembered that the pianist in that video was Scott Bradlee, so first I looked on his channel to make sure the video really wasn't there. As I didn't find anything on Scott's channel, I tried a Google video search with the query "get lucky scott bradlee". And bingo! You can see the video at Jukebox.

Finding source

Yes, I found it. But how can I be sure that it won't be removed from Jukebox tomorrow? To be able to watch the video every morning, I wanted to download it to my computer. Yes, there is a "Download" button, but it didn't work for me, and I doubt it will work for you. Now a few important statements that you should memorise:

  • If you are able to see (or hear) the clip, then your computer has to know where to get the video in order to show it to you.
  • A computer is a very stupid thing. It can execute orders fast, but it has very limited knowledge of what the orders should look like. This means that someone has to give it exact orders, following strict rules, about how to show that video to the user.
  • If you understand how your computer gets the video, you can get it yourself.

There is a tool called the web inspector which shows the HTML structure of a page. I used it to see how the video element is defined.
There is an object of type application/x-shockwave-flash, and it has params inside. To make it easier to follow, I'll copy the contents here:
<object type="application/x-shockwave-flash" data="http://www.ultimedia.com/swf/ultimedia-player.swf?v=2.0.2.5987" width="100%" height="100%" bgcolor="#000000" id="player" name="player" tabindex="0">
<param name="allowfullscreen" value="true">
<param name="allowscriptaccess" value="always">
<param name="seamlesstabbing" value="true">
<param name="wmode" value="opaque">
<param name="flashvars" value="netstreambasepath=http%3A%2F%2Fwww.ultimedia.com%2Fdeliver%2Fmusique%2Fiframe%2Fmdtk%2F05137618%2Fzone%2F2%2Farticle%2Fxl85zp%2Fautoplay%2Fyes%2Fmember%2F33m03%2F&amp;id=player&amp;className=player_roll&amp;provider=http&amp;startparam=start&amp;http.startparam=start&amp;image=http%3A%2F%2Fimg.ultimedia.com%2Fa168357%2Farticles%2Fib770889.jpg&amp;plugins=http%3A%2F%2Fwww.ultimedia.com%2Fswf%2Fgapro-1h.swf%3Fv%3D2.0.2.5987%2Chttp%3A%2F%2Fwww.ultimedia.com%2Fswf%2Fova-jw.swf%3Fv%3D2.0.2.1669&amp;autostart=true&amp;skin=http%3A%2F%2Fwww.ultimedia.com%2Fskin%2Fjukebo%2Fjukebo.zip%3Fv%3D2.0.2.5987&amp;file=http%3A%2F%2Fstream13.ultimedia.com%2Ffdd6675389641944323c10e08e99a607%2Fc3BlZWQ9MzAwO3VzZXI9anVrZWJvO2V4cGlyZT01Mjc1YTQ0Nw%2C%2C%2F770videos%2Fvb770889-1004638.flv%3Fmdtk%3D05137618&amp;stretching=uniform&amp;gapro.accountid=UA-1399276-5&amp;gapro.pluginmode=FLASH&amp;ova.title=Ultimedia%20Videos&amp;ova.pluginmode=HYBRID&amp;controlbar.position=bottom&amp;logo.file=http%3A%2F%2Fwww.ultimedia.com%2Fimg%2Fdeliver%2Flogos%2Fjukebode_33zkl.png&amp;logo.position=bottom-right&amp;logo.hide=false&amp;logo.link=http%3A%2F%2Fwww.jukebo.de%2F&amp;logo.linktarget=_blank&amp;logo.over=0.8&amp;logo.out=0.5">
</object>
One thing that catches your eye is the data attribute with a URI as its value. But that's just the player used to play the video (you could guess this from the URI itself). One of the param elements has the name flashvars and is particularly interesting because of its value. If you know a bit of URI query string syntax, or google for "flashvars", you'll figure out that the value of this param is a set of key=value pairs separated by ampersands. Here are the vars with additional whitespace to make them easier to read:
netstreambasepath=http%3A%2F%2Fwww.ultimedia.com%2Fdeliver%2Fmusique%2Fiframe%2Fmdtk%2F05137618%2Fzone%2F2%2Farticle%2Fxl85zp%2Fautoplay%2Fyes%2Fmember%2F33m03%2F&amp;
id=player&amp;
className=player_roll&amp;
provider=http&amp;
startparam=start&amp;
http.startparam=start&amp;
image=http%3A%2F%2Fimg.ultimedia.com%2Fa168357%2Farticles%2Fib770889.jpg&amp;
plugins=http%3A%2F%2Fwww.ultimedia.com%2Fswf%2Fgapro-1h.swf%3Fv%3D2.0.2.5987%2Chttp%3A%2F%2Fwww.ultimedia.com%2Fswf%2Fova-jw.swf%3Fv%3D2.0.2.1669&amp;
autostart=true&amp;
skin=http%3A%2F%2Fwww.ultimedia.com%2Fskin%2Fjukebo%2Fjukebo.zip%3Fv%3D2.0.2.5987&amp;
file=http%3A%2F%2Fstream13.ultimedia.com%2Fd6ed0b43e2bd43971b52407a7ed93b76%2Fc3BlZWQ9MzAwO3VzZXI9anVrZWJvO2V4cGlyZT01MjczOGRjNA%2C%2C%2F770videos%2Fvb770889-1004638.flv%3Fmdtk%3D05137618&amp;
stretching=uniform&amp;
gapro.accountid=UA-1399276-5&amp;
gapro.pluginmode=FLASH&amp;
ova.title=Ultimedia%20Videos&amp;
ova.pluginmode=HYBRID&amp;
controlbar.position=bottom&amp;
logo.file=http%3A%2F%2Fwww.ultimedia.com%2Fimg%2Fdeliver%2Flogos%2Fjukebode_33zkl.png&amp;
logo.position=bottom-right&amp;
logo.hide=false&amp;
logo.link=http%3A%2F%2Fwww.jukebo.de%2F&amp;
logo.linktarget=_blank&amp;
logo.over=0.8&amp;
logo.out=0.5
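
If you'd rather let the computer do the splitting, here is a minimal Python sketch of the same idea. It assumes you have copied the flashvars value out of the inspector as text; raw_flashvars below is a shortened copy with only a few of the keys, and parse_flashvars is just a name made up for this example.

```python
from html import unescape
from urllib.parse import parse_qsl

# A shortened copy of the flashvars value from the page (only a few keys kept).
raw_flashvars = (
    "id=player&amp;"
    "provider=http&amp;"
    "image=http%3A%2F%2Fimg.ultimedia.com%2Fa168357%2Farticles%2Fib770889.jpg&amp;"
    "ova.title=Ultimedia%20Videos&amp;"
    "logo.link=http%3A%2F%2Fwww.jukebo.de%2F"
)


def parse_flashvars(value):
    """Turn a flashvars string into a dict of decoded key/value pairs."""
    # In the page source '&' is written as the HTML entity '&amp;', so undo that first.
    query = unescape(value)
    # parse_qsl splits on '&' and '=', and also decodes the %XX escapes.
    return dict(parse_qsl(query, keep_blank_values=True))


flashvars = parse_flashvars(raw_flashvars)
for key, val in flashvars.items():
    print(f"{key:12} {val}")
```

With the full value from the page you would look up flashvars["file"] the same way.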
Now let's think about what could lead to the location of the video. netstreambasepath sounds promising: something with "stream" in it may identify the place the video is streamed from. Note that the URI value looks strange. Instead of starting with "http://" it starts with "http%3A%2F%2F". These URIs are percent-encoded so that their special characters won't clash with the special characters of the surrounding code. You can use a web resource like URL Decoder/Encoder to decode the URIs into a more readable format :). Now if you follow the netstreambasepath URI, you'll get the same flash video that you see at jukebo.de. Bad guess.

Let's look for the next candidate. Oh, near the middle of the code we have a file key. And indeed, this is the video. I have no idea how you download things in your browser: maybe paste the URI into the downloads window, or just open the URI in the browser, or open it and save the result. If you don't know how to download a file by URI in your browser, google for it.
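
If you don't want to fight with the browser, the decoding and the download can also be sketched in a few lines of Python. This is only an illustration under some assumptions: save_uri and the file name get_lucky.flv are invented for the example, and the link from this post has expired long ago, so the real download call is commented out.

```python
import shutil
from urllib.parse import unquote
from urllib.request import urlopen

# Value of the "file" key as it appears in flashvars (still percent-encoded).
encoded = (
    "http%3A%2F%2Fstream13.ultimedia.com%2Fd6ed0b43e2bd43971b52407a7ed93b76"
    "%2Fc3BlZWQ9MzAwO3VzZXI9anVrZWJvO2V4cGlyZT01MjczOGRjNA%2C%2C"
    "%2F770videos%2Fvb770889-1004638.flv%3Fmdtk%3D05137618"
)

# unquote turns %3A into ':', %2F into '/', %2C into ',' and so on.
video_uri = unquote(encoded)
print(video_uri)


def save_uri(uri, dest):
    """Fetch whatever the URI points at and write it to the file dest."""
    with urlopen(uri) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all at once


# The link in this post is no longer valid, so this stays commented out:
# save_uri(video_uri, "get_lucky.flv")
```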
  1. If you are picky, you have probably noticed that at the end of the video's URI there is something like "mdtk=05137618". It's there because these bastards want to make the open web closed. So when you open the video on their page, they store in their database that whoever provides the key "05137618" can get the video. The key expires after a short time, but until it expires everyone with that URI can get the video file.
  2. If you've already downloaded the video or examined the URI, you probably know that the video is in the nasty FLV format.
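
A side note for the curious: the long path segment in the file URI that ends with ",," looks a lot like base64 text with the "=" padding swapped for commas. That is only a guess, and so is the reading of the expire field as a hexadecimal Unix timestamp, but a quick Python sketch shows what comes out if you decode it that way. The name decode_token is made up for this example.

```python
import base64
from datetime import datetime, timezone

# The path segment just before "/770videos/" in the file URI above.
token = "c3BlZWQ9MzAwO3VzZXI9anVrZWJvO2V4cGlyZT01MjczOGRjNA,,"


def decode_token(segment):
    """Decode the segment, assuming it is base64 with ',' standing in for '='."""
    padded = segment.replace(",", "=")
    text = base64.b64decode(padded).decode("ascii")
    # The decoded text has the shape "key=value;key=value;...".
    return dict(pair.split("=", 1) for pair in text.split(";"))


fields = decode_token(token)
print(fields)

# Assumption: "expire" is a Unix timestamp written in hexadecimal.
expires_at = datetime.fromtimestamp(int(fields["expire"], 16), tz=timezone.utc)
print(expires_at.isoformat())
```

The decoded text is speed=300;user=jukebo;expire=52738dc4. If the guess about expire is right, this would fit the observation that the link only works for a short time.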

Converting video

Most players don't play FLV, so you have two choices: find a player or find a converter. I didn't want to install anything new, so I just searched for "online flv to mp4 converter". Online video converter to MP4 was first in the list. And here is the best part: you can just give it the URI of the video to convert. So while your access token is still active, just paste the URI and drink some Guinness while the converter downloads and converts the video; the result is then downloaded to your computer.
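
This is not what I did, but if you happen to have ffmpeg installed already, a local conversion is an alternative to the online converter. Here is a rough Python sketch under that assumption; build_ffmpeg_command, convert and the file names are invented for the example.

```python
import shutil
import subprocess


def build_ffmpeg_command(src, dest):
    """Build an ffmpeg command that repackages src into dest without re-encoding."""
    # -i names the input file; "-c copy" copies the audio and video streams as they are.
    return ["ffmpeg", "-i", src, "-c", "copy", dest]


def convert(src, dest):
    """Run ffmpeg on src, failing early if ffmpeg cannot be found."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    subprocess.run(build_ffmpeg_command(src, dest), check=True)


# Show the command that would be run for the downloaded file.
print(" ".join(build_ffmpeg_command("get_lucky.flv", "get_lucky.mp4")))
```

Copying the streams only works when the codecs inside the FLV are ones that MP4 can hold. If ffmpeg complains, dropping "-c", "copy" from the list lets it re-encode instead, which is slower.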

You don't have to be a genius. You just need to know the basics of informatics and understand the simple rules computers follow, and you'll be able to make your life better. The digital world was made for us. Don't let others decide your future for you.

