How to create a new fetch class?

The fetch classes are the part of VCD-db that automatically fetches data from remote websites containing movie listings, such as imdb.com, Yahoo Movies and adultdvdempire.com.
To create a new fetch class you need some knowledge of PHP and regular expressions.
All the fetch classes are located in the folder vcddb/classes/fetch/, and they are dynamically loaded when used.
All fetch classes extend the class VCDFetch. That class does most of the work; the inherited fetch classes just tell VCDFetch how to do it and how to handle the fetched data.

The example below, based on the imdb.com fetch class, shows what a fetch class looks like. It includes all the variables needed and all the functions that have to be implemented, although the function implementations are not finished. After the example, I'll walk through the variables, what data they store, which functions need to be implemented and what they do.

<?php
class VCDFetch_imdb extends VCDFetch {

    protected $regexArray = array(
        'title'     => '<h1>([^\<]*)<span>',
        'year'      => '(<a href="/Sections/Years/([0-9]{4})">([0-9]{4})</a>)',
        'poster'    => '<a name="poster"([^<]*)><img([^<]*)([^<]*)src="([^<]*)" height="([0-9]{2,3})" width="([0-9]{2,3})"></a>',
        'director'  => '#Director.*\n[^<]*<a href="/Name?[^"]*">([^<]*)</a>#i',
        'genre'     => '<A HREF=\"/Sections/Genres/[a-zA-Z\\-]*/\">([a-zA-Z\\-]*)</A>',
        'rating'    => '<b>([0-9]).([0-9])/10</b>',
        'cast'      => '<td class="nm"><a href="/name/nm([^"]+)/">([^<]*)</a></td><td class="ddd"> ... </td><td class="char">([^<]*)</td></tr>',
        'runtime'   => '([0-9]+) min',
        'akas'      => 'Also Known As</b>:</b><br>(.*)<b class="ch"><a href="/mpaa">MPAA</a>',
        'country'   => '<a href=\"/Sections/Countries/([^>]*)>([^<]*)</a>',
        'plot'      => '<h5>Plot Outline:</h5>([^\<]*)<'
        );

    protected $multiArray = array(
        'genre', 'cast', 'akas', 'country'
    );

    private $servername = 'akas.imdb.com';
    private $searchpath = '/find?s=tt&q=[$]';
    private $itempath   = '/title/tt[$]/';

    public function __construct() {
        $this->setSiteName("imdb");
        $this->setFetchUrls($this->servername, $this->searchpath, $this->itempath);
        $this->useSnoopy();
    }

    public function search($title) {
        return parent::search($title);
    }

    public function showSearchResults() {
        $this->setMaxSearchResults(50);
        $regx = '<a href=\"\/title\/tt([0-9]+)\/([^\<]*)\">([^\<]*)</a>[^(]*\(([0-9]{4}(/I+)?)\)';
        $results = parent::generateSimpleSearchResults($regx, 1, 3, 4);
        return parent::generateSearchSelection($results);
    }

    protected function processResults() {
        if (!is_array($this->workerArray) || sizeof($this->workerArray) == 0) {
            $this->setErrorMsg("No results to process.");
            return;
        }

        $obj = new imdbObj();
        $obj->setIMDB($this->getItemID());

        foreach ($this->workerArray as $key => $data) {

            $entry = $data[0];
            $arrData = $data[1];

            switch ($entry) {
                case 'title':
                    $title = $arrData[1];
                    $obj->setTitle($title);
                    break;

                case 'year':
                    $year = $arrData[2];
                    $obj->setYear($year);
                    break;

                case 'poster':
                    $poster = $arrData[4];
                    $obj->setImage($poster);
                    break;

                case 'director':
                    $director = $arrData[1];
                    $obj->setDirector($director);
                    break;

                case 'genre':

                    $arr = array();
                    foreach ($arrData as $item) {
                        array_push($arr, $item[1]);
                    }
                    $obj->setGenre($arr);
                    break;

                case 'rating':
                    $rating = $arrData[1].$arrData[2];
                    $rating = $rating/10;
                    $obj->setRating($rating);
                    break;

                case 'cast':
                    $arr = array();
                    foreach ($arrData as $itemArr) {
                        $actor = $itemArr[2];
                        $role = $itemArr[3];
                        $result = $actor." .... " .$role;
                        array_push($arr, $result);
                    }
                    $obj->setCast($arr);
                    break;

                case 'runtime':
                    $runtime = $arrData[1];
                    $obj->setRuntime($runtime);
                    break;

                case 'akas':
                    $akaTitles = implode(',', $arrData);
                    $obj->setAltTitle($akaTitles);
                    break;

                case 'plot':
                    // Default to an empty plot so $plot is always defined
                    $plot = '';
                    if (is_array($arrData)) {
                        $plot = trim($arrData[1]);
                    } elseif (is_string($arrData)) {
                        $plot = trim($arrData);
                    }
                    
                    $obj->setPlot($plot);
                    
                    break;

                case 'country':
                    if (sizeof($arrData) > 0) {
                        $arrCountries = array();
                        foreach ($arrData as $itemArr) {
                            array_push($arrCountries, $itemArr[2]);
                        }
                        $obj->setCountry($arrCountries);
                    }

                    break;

                default:
                    break;
            }

        }

        $this->fetchedObj = $obj;
    }

    protected function fetchDeeper($entry) {

        switch ($entry) {

            case 'akas':
                $ret = array();
                $contents = $this->getContents();
                // preg_match() is used here because eregi() was removed in PHP 7
                if (preg_match('#Also Known As:</b><br>(.*)<b class="ch"><a href="/mpaa">MPAA</a>#is', $contents, $y)) {
                    $contents = $y[0];
                    while (preg_match('#<br>([^<]*)#i', $contents, $x)) {
                        if (isset($x[1]) && strcmp(trim($x[1]),"") != 0) {
                            $ret[] = trim($x[1]);
                        }
                        $contents = substr($contents,strpos($contents,$x[0])+strlen($x[0]));
                    }
                }
                array_push($this->workerArray, array($entry, $ret));

                break;

            default:
                break;
        }
    }
}
?>

Class members and functions.

  1. Class variables
    1. $regexArray
    2. $multiArray
    3. $servername
    4. $searchpath
    5. $itempath
  2. Class functions
    1. __construct()
    2. search($title)
    3. showSearchResults()
    4. processResults()
    5. fetchDeeper($entry)

Class variables

$regexArray

This variable is an associative array.
The key is a unique name for an entry, and the value is a regular expression used to get the desired data for that key. For example, this entry retrieves the title of a movie item on imdb.com:

'title' => '<h1>([^\<]*)<span>'

The regular expression above works like this: find <h1>, then greedily match any run of characters that are not <, and require that the next characters are <span>. The text captured by the parentheses is the title.
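
To see the capture group in action, here is a minimal sketch that runs the title pattern against a made-up HTML fragment. The sample HTML and the # delimiters are assumptions for illustration; how VCDFetch itself applies the patterns is not shown in this document.

```php
<?php
// Hypothetical HTML fragment, standing in for a fetched movie page.
$html = '<h1>The Matrix <span>(1999)</span></h1>';

// The pattern from $regexArray['title'], wrapped in # delimiters for preg_match().
$pattern = '#<h1>([^\<]*)<span>#';

$title = null;
if (preg_match($pattern, $html, $matches)) {
    // $matches[0] is the whole match, $matches[1] is the first capture group.
    $title = trim($matches[1]);
}

echo $title, "\n";
```

Index 1 of the match array is the same index that the 'title' case in processResults() reads.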

$multiArray

This variable is an array.
It tells the VCDFetch engine that multiple results are expected for the values in this array.
Entries in this array must already be defined in the $regexArray.
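
The difference between a single result and multiple results can be sketched with PHP's own preg functions. This is only an illustration: the sample HTML is made up, the genre pattern is simplified, and the use of preg_match_all with PREG_SET_ORDER is an assumption inferred from how processResults() indexes its data, not a description of VCDFetch internals.

```php
<?php
// Hypothetical HTML fragment with two genre links.
$html = '<A HREF="/Sections/Genres/Action/">Action</A> | '
      . '<A HREF="/Sections/Genres/Sci-Fi/">Sci-Fi</A>';

// Simplified form of the 'genre' pattern, wrapped in # delimiters.
$pattern = '#<A HREF="/Sections/Genres/[a-zA-Z\-]*/">([a-zA-Z\-]*)</A>#';

// Single result: only the first match is returned; $single[1] is the genre.
preg_match($pattern, $html, $single);

// Multiple results: one sub-array per match, as iterated in processResults().
preg_match_all($pattern, $html, $multi, PREG_SET_ORDER);

$genres = array();
foreach ($multi as $item) {
    $genres[] = $item[1];
}

print_r($genres);
```

This mirrors why the 'genre' case loops over $arrData and reads $item[1], while the 'title' case reads $arrData[1] directly.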

$servername

This variable is a string.
The server name that the fetch class connects to.
Do not include http://. For example, the imdb class above uses akas.imdb.com.

$searchpath

This variable is a string.
The relative URL on $servername where the remote server displays its search results.
For example, imdb.com uses /find?s=tt&q=[$]. The [$] token is replaced with the search string when the fetch class is used from VCD-db.

$itempath

This variable is a string.
The relative URL where the entry for a movie resides.
For example, imdb.com uses /title/tt[$]/. Again, the [$] token is replaced with the movie's id on the remote server that the data is fetched from.
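
The token substitution for both paths can be sketched as follows. The helper fillToken() is hypothetical and not part of VCDFetch, and URL-encoding the value is an assumption; the movie id is only sample data.

```php
<?php
// Hypothetical helper, not part of VCDFetch: fills the [$] token in a path.
function fillToken($path, $value) {
    return str_replace('[$]', urlencode($value), $path);
}

$servername = 'akas.imdb.com';
$searchpath = '/find?s=tt&q=[$]';
$itempath   = '/title/tt[$]/';

// Build the search URL from a title and the item URL from a movie id.
$searchUrl = 'http://' . $servername . fillToken($searchpath, 'The Matrix');
$itemUrl   = 'http://' . $servername . fillToken($itempath, '0133093');

echo $searchUrl, "\n";
echo $itemUrl, "\n";
```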

Class functions

__construct()

This is the class constructor.
This function has no parameters and returns nothing. It is called automatically when the class is instantiated.
In the imdb fetch class, the function contains the following lines:

$this->setSiteName("imdb");
$this->setFetchUrls($this->servername, $this->searchpath, $this->itempath);
$this->useSnoopy();

The first line gives the class data and the cache manager a unique name.
The second line calls a parent function with the servername, searchpath and itempath, which should already have been defined.
The third line tells the parent class to use the Snoopy library to fetch data. If this parent function is not called, the parent class uses the PHP function fsockopen to retrieve data from the remote site.

search($title)

This function performs the search for the requested title.
It takes one string parameter, the title being searched for, and returns an array of the search results. In most cases, calling the parent function as in the example above should be sufficient.

showSearchResults()

This function uses the results gathered by the search() function to generate the search results displayed in the VCD-db UI when a title search has been called.
This function has no parameters and returns an associative array of the search results.
The associative array has the keys id, title and year. The year key is optional.
In the imdb fetch class, the function contains the following lines:

$this->setMaxSearchResults(50);
$regx = '<a href=\"\/title\/tt([0-9]+)\/([^\<]*)\">([^\<]*)</a>[^(]*\(([0-9]{4}(/I+)?)\)';
$results = parent::generateSimpleSearchResults($regx, 1, 3, 4);
return parent::generateSearchSelection($results);

The first line calls a parent function to set the maximum number of search results to display.
The second line defines the regular expression used to gather the search results from the remote search result page.
The third line calls the parent function generateSimpleSearchResults with the regular expression as the first parameter. The remaining parameters tell the parent function at which indexes in each match the id, title and year can be found. The year index is optional.
The parent function then returns an associative array with the keys id, title and year, whose values are the search results fetched from the remote site.
The fourth line calls the parent function generateSearchSelection with the results from generateSimpleSearchResults as its parameter. generateSearchSelection prepares the search results for use in the VCD-db UI when a title has been searched.

Calling these parent functions is optional. If your data needs special handling, you can process the search results yourself in this function. Just be sure to end the function by returning the result of parent::generateSearchSelection called with your search results.
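
To check a search pattern and its index parameters before wiring them into a fetch class, it can help to run the pattern on a saved copy of the result page. The sketch below does this for the imdb pattern with a made-up HTML fragment. The # delimiters, the sample HTML and the shape of $results are assumptions for illustration; the real generateSimpleSearchResults may build its array differently.

```php
<?php
// Hypothetical search-result fragment; real pages will differ.
$html = '<a href="/title/tt0133093/">The Matrix</a> (1999)<br>'
      . '<a href="/title/tt0234215/">The Matrix Reloaded</a> (2003)';

// The search pattern from showSearchResults(), wrapped in # delimiters.
$regx = '#<a href=\"\/title\/tt([0-9]+)\/([^\<]*)\">([^\<]*)</a>[^(]*\(([0-9]{4}(/I+)?)\)#';

preg_match_all($regx, $html, $matches, PREG_SET_ORDER);

$results = array();
foreach ($matches as $m) {
    // Indexes 1, 3 and 4 are the ones passed to generateSimpleSearchResults().
    $results[] = array('id' => $m[1], 'title' => $m[3], 'year' => $m[4]);
}

print_r($results);
```

If the id, title or year comes out wrong here, adjust the capture groups or the index parameters before testing inside VCD-db.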

processResults()

This function iterates through the items that were fetched by the regular expressions defined in $regexArray. It uses a switch statement to match the keys defined in $regexArray.
In each case you pick out the results and add them to the fetched object you are populating.
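
The shape of the data that processResults() walks over can be sketched like this. The contents of $workerArray below are made up, and a plain array stands in for the fetched object, so treat it as an illustration of the indexing rather than the actual VCDFetch data.

```php
<?php
// Hypothetical contents of the worker array: array(key, match data) pairs.
$workerArray = array(
    array('title',  array('<h1>The Matrix <span>', 'The Matrix ')),
    array('rating', array('<b>8.7/10</b>', '8', '7')),
    array('genre',  array(
        array('<A HREF="/Sections/Genres/Action/">Action</A>', 'Action'),
        array('<A HREF="/Sections/Genres/Sci-Fi/">Sci-Fi</A>', 'Sci-Fi'),
    )),
);

// A plain array stands in for the fetched object here.
$movie = array();
foreach ($workerArray as $data) {
    list($entry, $arrData) = $data;
    switch ($entry) {
        case 'title':
            $movie['title'] = trim($arrData[1]);
            break;
        case 'rating':
            // The two digit groups are joined ("87") and scaled to 8.7.
            $movie['rating'] = ($arrData[1] . $arrData[2]) / 10;
            break;
        case 'genre':
            // Entries listed in $multiArray carry one sub-array per match.
            $movie['genre'] = array();
            foreach ($arrData as $item) {
                $movie['genre'][] = $item[1];
            }
            break;
    }
}

print_r($movie);
```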

fetchDeeper($entry)

This function is called when a regular expression defined in $regexArray does not return any results.
It gives you a chance to do some extra checks to populate the data for the specified entry.
Again, a switch statement is used to match the key.
The function call array_push($this->workerArray, array('key', $data)); then stores the data under that key.
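
A fallback of this kind can be developed and tested outside the class first. The sketch below is a hypothetical standalone version of the 'akas' fallback. It uses the preg functions, since eregi() was removed in PHP 7, and runs against a made-up page fragment. The function name extractAkas() and the local $workerArray are assumptions for illustration.

```php
<?php
// Hypothetical standalone version of a fetchDeeper() fallback for 'akas'.
function extractAkas($contents) {
    $ret = array();
    // First narrow the page down to the "Also Known As" block.
    $block = '#Also Known As:</b><br>(.*)<b class="ch"><a href="/mpaa">MPAA</a>#is';
    if (preg_match($block, $contents, $y)) {
        // Then collect every non-empty text run that follows a <br>.
        preg_match_all('#<br>([^<]*)#i', $y[0], $lines);
        foreach ($lines[1] as $line) {
            if (trim($line) !== '') {
                $ret[] = trim($line);
            }
        }
    }
    return $ret;
}

// Made-up page fragment for testing.
$html = 'Also Known As:</b><br>Matrix (France)<br>Matriks (Turkey)<br>'
      . '<b class="ch"><a href="/mpaa">MPAA</a>';

// Inside a fetch class this would be pushed onto $this->workerArray.
$workerArray = array();
array_push($workerArray, array('akas', extractAkas($html)));

print_r($workerArray);
```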

Activating the fetch class

To activate the fetch class within VCD-db so it becomes available in the dropdown list when adding movies, you will need to create a new entry in the Source sites view in the admin console.
Below is an example of what the imdb entry looks like:

Key            Value                          Description
Name:          Internet Movie Database        The name of the site
Alias:         imdb                           The site alias; must match the alias used in the constructor, $this->setSiteName("imdb");
Homepage:      http://imdb.com                The site's homepage
Fetch Url:     http://www.imdb.com/title/tt#  URL of a movie item; # is replaced by the actual ID
Class name:    VCDFetch_imdb                  The name of the fetch class that does the work; case-sensitive!
Image path:    imdb.gif                       The name of the logo for the site; must reside in the folder images/logos/
Is fetchable:  checked                        Tells VCD-db that this class can fetch content

After this entry has been created for your fetch class, it will become available in the list of fetch classes on the "add new movie" page.