Rss Parse Class ver0.1

Keywords: rss parse php class rss1.0 rss2.0 RSS parsing, PHP source code

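This is a class for parsing RSS that I wrote last year. It supports RSS 1.0 and 2.0 (as long as the feed itself is valid). It returns the parsed result as arrays: channel, items and images. It detects the feed's encoding and converts everything to UTF-8.
Because RSS formats are not very consistent, the returned results may be incomplete. I also wrote it in a hurry, so there are probably some oversights. !-.-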
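The encoding handling described above comes down to three steps: read the charset from the XML declaration, convert the text with iconv(), and update the declaration so a parser does not decode the bytes a second time. A minimal sketch, assuming the feed declares its charset in the <?xml ... ?> line and that PHP's iconv extension is available. to_utf8_feed() is a hypothetical helper, not part of the class below, and unlike the class it only looks inside the XML declaration:

```php
<?php
// Hypothetical helper: detect a feed's declared charset and convert to UTF-8.
// Returns array(detected_encoding, utf8_text).
function to_utf8_feed($xml)
{
    // XML's default encoding is UTF-8 when none is declared
    $encoding = "utf-8";
    if (preg_match('/<\?xml[^>]*encoding=["\']([^"\']+)["\']/i', $xml, $m)) {
        $encoding = strtolower($m[1]);
    }

    if ($encoding !== "utf-8") {
        $xml = iconv($encoding, "UTF-8", $xml);
        // The text is UTF-8 now, so make the declaration say so
        $xml = preg_replace('/(<\?xml[^>]*encoding=)["\'][^"\']+["\']/i', '$1"utf-8"', $xml, 1);
    }

    return array($encoding, $xml);
}
```

The returned text can then go straight to xml_parser_create("utf-8"), which is what the class does after its own conversion.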

Original article; please credit the source when reposting.

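Main methods:
parseRss($url): fetches and parses the RSS feed at the given URL (returns false if the feed cannot be opened)
getAll(): returns all of the results, as an array like this: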

Array
(
    [channel] => Array
        (
            [title] => UGiA.CN (site title)
            [link] => http://www.ugia.cn (site URL)
            [description] => … description
            [pubdate] => 1105939204 (Unix timestamp)
        )

    [items] => Array
        (
            [0] => Array
                (
                    [title] => … post title
                    [link] => http:// … post URL
                    [description] => … description
                    [pubdate] => 1105939204 (Unix timestamp)
                )

        )

    [images] => Array
        (
            [0] => image URL
        )

)

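getChannel(): returns the channel information (title, description, link, last-modified date), i.e. the channel array above.
getItems(): returns the post entries (title, link, description, last-modified date), i.e. the items array above.
getImages(): returns the image URLs, i.e. the images array above.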
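Because getAll() returns plain nested arrays, using the result needs nothing beyond ordinary array access. A small sketch, assuming the array shape shown above; summarize_feed() and the sample data are made up for illustration and are not output from a real feed. pubdate is a Unix timestamp, so it can be formatted with gmdate() (or date() for the server's time zone):

```php
<?php
// Hypothetical consumer of the getAll() array: one line per channel/item/image.
function summarize_feed(array $result)
{
    $lines = array();
    $lines[] = $result['channel']['title'] . " <" . $result['channel']['link'] . ">";

    foreach ($result['items'] as $item) {
        // pubdate is a Unix timestamp, or false if the date could not be parsed
        $when = $item['pubdate'] ? gmdate("Y-m-d H:i", $item['pubdate']) . " UTC" : "unknown date";
        $lines[] = "- " . $item['title'] . " (" . $when . ") " . $item['link'];
    }

    foreach ($result['images'] as $url) {
        $lines[] = "image: " . $url;
    }

    return implode("\n", $lines);
}

// Sample data shaped like the print_r output above (values are illustrative)
$sample = array(
    'channel' => array('title' => 'UGiA.CN', 'link' => 'http://www.ugia.cn',
                       'description' => '', 'pubdate' => 1105939204),
    'items'   => array(
        array('title' => 'Example post', 'link' => 'http://www.ugia.cn/?p=42',
              'description' => '', 'pubdate' => 1105939204),
    ),
    'images'  => array(),
);

echo summarize_feed($sample), "\n";
```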

Example:

<?php
require_once("rssparse.php");
header('Content-Type: text/html; charset=utf-8');

$RP = new RssParse();
if (!$RP->parseRss("http://www.ugia.cn/wp-rss2.php")) {
    die("Could not open the feed");
}

$result = $RP->getAll();

//$channel = $RP->getChannel();
//$items   = $RP->getItems();
//$images  = $RP->getImages();

print_r($result);
?>

The source code is as follows:

<?php
/**
 * Rss Parse Class ver0.1
 * 
 * @link http://www.ugia.cn/?p=42
 * @author: legend (PASiOcn@msn.com)
 * @version 0.1
 */

class RssParse {

    var $encoding      = "utf-8";
    var $rssurl        = "http://www.ugia.cn/wp-rss2.php";

    var $resource      = "";
    var $tag           = "";

    var $insidechannel = false;
    var $insideitem    = false;
    var $insideimage   = false;

    var $item          = array();
    var $channel       = array();
    var $image         = "";

    var $items         = array();
    var $images        = array();

    
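    // Clear the parse state so the same object can parse another feed
    function rssReset()
    {
        $this->tag           = "";
        $this->insidechannel = false;
        $this->insideitem    = false;
        $this->insideimage   = false;
        $this->item          = array();
        $this->channel       = array();
        $this->image         = "";
        $this->items         = array();
        $this->images        = array();
    }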

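    // Read the whole feed into $this->resource (fopen() on a URL needs allow_url_fopen)
    function getResource()
    {
        $fp = @fopen($this->rssurl, "rb");

        if (is_resource($fp)) {
            $ipd = "";
            // fread() returns "" at end of stream and false on error
            while (($data = fread($fp, 4096)) !== false && $data !== "") {
                $ipd .= $data;
            }
            $this->resource = $ipd;
            @fclose($fp);

            return true;
        }

        return false;
    }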

    // Take the charset from the encoding="..." attribute; XML defaults to UTF-8
    function getEncoding()
    {
        if (preg_match('| encoding=["\']([^"\']*)["\']|', $this->resource, $result))
        {
            $this->encoding = strtolower($result[1]);
        }
        else
        {
            $this->encoding = "utf-8";
        }
    }
   
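    function parseRss($rssurl = '')
    {
        if (!empty($rssurl))
        {
            $this->rssurl = $rssurl;
        }

        // Start from a clean state so a second call does not append to old results
        $this->rssReset();

        if (!$this->getResource())
        {
            return false;
        }

        $this->getEncoding();

        if ($this->encoding != "utf-8")
        {
            $this->resource = iconv($this->encoding, "UTF-8", $this->resource);
            // The text is UTF-8 now, so update the declaration to match;
            // otherwise the parser would decode it a second time
            $this->resource = preg_replace('| encoding=["\'][^"\']*["\']|', ' encoding="utf-8"', $this->resource, 1);
        }

        $xml_parser = xml_parser_create("utf-8");

        // Keep element names as written instead of upper-casing them
        xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
        // Handlers are passed as array($object, "method") callbacks
        xml_set_element_handler($xml_parser, array($this, "startElement"), array($this, "endElement"));
        xml_set_character_data_handler($xml_parser, array($this, "characterData"));

        xml_parse($xml_parser, $this->resource, true);
        xml_parser_free($xml_parser);

        if (count($this->channel) > 1)
        {
            $pubdate = isset($this->channel['pubdate']) ? $this->channel['pubdate'] : "";
            $this->channel['pubdate'] = $this->mystrtotime($pubdate);
            // No usable channel date: fall back to the first item's date
            if ($this->channel['pubdate'] <= 0 && !empty($this->items))
            {
                $this->channel['pubdate'] = $this->items[0]['pubdate'];
            }
        }
        return true;
    }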
    
    function getAll()
    {
        return array(
                     'channel' => $this->channel,
                     'items'   => $this->items,
                     'images'  => $this->images
                    );
    }

    function getChannel()
    {
        return $this->channel;
    }

    function getItems()
    {
        return $this->items;
    }

    function getImages()
    {
        return $this->images;
    }

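    function startElement($parser, $name, $attrs)
    {
        $name = strtolower($name);

        if ($this->insideitem || $this->insideimage || $this->insidechannel)
        {
            $this->tag = $name;
        }

        // A feed may carry several date elements; keep only the last one
        // instead of concatenating them into an unparsable string
        if (in_array($name, array("dc:date", "pubdate", "modified", "lastbuilddate")))
        {
            if ($this->insideitem)        $this->item['pubdate']    = '';
            elseif ($this->insidechannel) $this->channel['pubdate'] = '';
        }

        switch ($name)
        {
            case "channel":
                $this->insidechannel = true;
                // Pre-create the fields so characterData() can append with .=
                $this->channel = array('title' => '', 'link' => '', 'description' => '', 'pubdate' => '');
                break;
            case "item":
                $this->insideitem = true;
                $this->item = array('title' => '', 'link' => '', 'description' => '', 'pubdate' => '');
                break;
            case "image":
                $this->insideimage = true;
                break;
        }
    }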

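    function endElement($parser, $name)
    {
        $name = strtolower($name);

        if ($name == "channel")
        {
            $this->insidechannel = false;
        }
        else if ($name == "url" && $this->insideimage)
        {
            $this->images[] = trim($this->image);
            $this->image    = "";
        }
        else if ($name == "image")
        {
            $this->insideimage = false;
        }
        else if ($name == "item")
        {
            // Make sure every field exists even if the feed omitted it
            $this->item += array('title' => '', 'link' => '', 'description' => '', 'pubdate' => '');

            $this->item['pubdate']     = $this->mystrtotime($this->item['pubdate']);
            $this->item['description'] = trim(strip_tags($this->item['description']));
            // strip_tags() leaves entities such as &nbsp; behind; drop them
            $this->item['description'] = str_replace("&nbsp;", "", $this->item['description']);

            /**
            if (strlen($this->item['description']) > 700)
            {
                $this->item['description'] = substr($this->item['description'], 0, 697) . "...";
            }
            */

            $this->items[]    = $this->item;
            $this->item       = array();
            $this->insideitem = false;
        }

        // Forget the current element so whitespace between elements is not
        // appended to whichever field was read last
        $this->tag = "";
    }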

    function characterData($parser, $data)
    {
        if ($this->insideitem)
        {
            switch ($this->tag)
            {
                case "title":       $this->item['title']       .= $data; break;
                case "description": $this->item['description'] .= $data; break;
                case "link":        $this->item['link']        .= $data; break;
                case "dc:date":
                case "pubdate":
                case "modified":    $this->item['pubdate']     .= $data; break;
            }
        }
        elseif ($this->insideimage)
        {
            // Keep only the image URL; the image's own <title> and <link>
            // must not leak into the channel fields
            if ($this->tag == "url")
            {
                $this->image .= $data;
            }
        }
        elseif ($this->insidechannel)
        {
            switch ($this->tag)
            {
                case "title":         $this->channel['title']       .= $data; break;
                case "description":   $this->channel['description'] .= $data; break;
                case "link":          $this->channel['link']        .= $data; break;
                case "dc:date":
                case "pubdate":
                case "lastbuilddate":
                case "modified":      $this->channel['pubdate']     .= $data; break;
            }
        }
    }
    
    
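    /**
     * There are too many date formats. Besides those PHP's strtotime() can
     * convert, I added recognition for one more format; the others are not handled.
     */
    function mystrtotime($time)
    {
        $time    = trim($time);
        // strtotime() returns false on failure (-1 before PHP 5.1)
        $curtime = strtotime($time);
        if ($curtime <= 0)
        {
            // W3C-DTF, e.g. 2005-01-17T13:20:04+08:00 -> 2005-01-17 13:20:04 +0800
            if (preg_match('|^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2}$|', $time))
            {
                $time = str_replace(array("T", "+"), array(" ", " +"), $time);
                // Drop the colon in the zone offset (character 23 after the replace)
                $time = substr_replace($time, "", 23, 1);
            }

            // if (.........

            $curtime = strtotime($time);
        }

        return $curtime;
    }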

    function getError($msg)
    {
        die($msg);
    }
}
?>
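The class above is an event-driven (expat, SAX-style) parser: instead of building a tree, PHP calls startElement(), characterData() and endElement() as it walks the feed, and the class keeps flags such as insideitem to know where each piece of text belongs. The sketch below shows that pattern in isolation on an in-memory RSS 2.0 string, so it runs without network access. MiniRssReader is a hypothetical, cut-down class written for illustration; it only collects item titles, links and dates:

```php
<?php
// Hypothetical, simplified reader showing the same handler/state technique.
class MiniRssReader
{
    private $tag = "";
    private $insideItem = false;
    private $item = array();
    public  $items = array();

    public function parse($xml)
    {
        $parser = xml_parser_create("UTF-8");
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($parser, array($this, "start"), array($this, "end"));
        xml_set_character_data_handler($parser, array($this, "text"));
        $ok = xml_parse($parser, $xml, true);
        xml_parser_free($parser);
        return (bool) $ok;
    }

    public function start($parser, $name, $attrs)
    {
        $this->tag = strtolower($name);
        if ($this->tag === "item") {
            $this->insideItem = true;
            $this->item = array("title" => "", "link" => "", "pubdate" => "");
        }
    }

    public function end($parser, $name)
    {
        if (strtolower($name) === "item") {
            $this->item = array_map("trim", $this->item);
            $this->item["pubdate"] = strtotime($this->item["pubdate"]);
            $this->items[] = $this->item;
            $this->insideItem = false;
        }
        // Reset so whitespace between elements is not appended to the last field
        $this->tag = "";
    }

    public function text($parser, $data)
    {
        // Text for one element may arrive in several chunks, so always append
        if ($this->insideItem && isset($this->item[$this->tag])) {
            $this->item[$this->tag] .= $data;
        }
    }
}

$feed = '<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>Demo</title>
  <item><title>First &amp; second</title><link>http://example.com/1</link>
    <pubDate>Mon, 17 Jan 2005 05:20:04 GMT</pubDate></item>
</channel></rss>';

$reader = new MiniRssReader();
$reader->parse($feed);
print_r($reader->items);
```

The character-data handler can be called several times for a single element (entities such as &amp; commonly cause a split), which is why both the sketch and the class append with .= rather than assign.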
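mystrtotime() tries strtotime() first and only rewrites the date when that fails. A sketch of the same fallback idea, assuming W3C-DTF dates like those found in dc:date; normalize_feed_date() is a hypothetical name, and unlike the original it also accepts a trailing Z and negative offsets. Recent PHP versions usually parse these strings directly, so the fallback mostly matters for older releases or unusual variants:

```php
<?php
// Hypothetical date helper: strtotime() first, W3C-DTF rewrite as a fallback.
// Returns a Unix timestamp, or false if the date cannot be understood.
function normalize_feed_date($time)
{
    $time = trim($time);
    $ts = strtotime($time);
    if ($ts !== false && $ts > 0) {
        return $ts;
    }

    // 2005-01-17T13:20:04+08:00  ->  2005-01-17 13:20:04 +0800
    if (preg_match('/^(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|([+-])(\d{2}):(\d{2}))$/', $time, $m)) {
        $zone = ($m[3] === "Z") ? "+0000" : $m[4] . $m[5] . $m[6];
        $ts = strtotime($m[1] . " " . $m[2] . " " . $zone);
    }

    return ($ts !== false && $ts > 0) ? $ts : false;
}
```

The same instant written three ways (RFC 822 as in RSS 2.0 pubDate, W3C-DTF with an offset, and W3C-DTF in UTC) should all map to 1105939204, the timestamp used in the sample output above.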

5 Comments

  1. rainbowsoft said,

    January 23, 2005 @ 9:23 am

    I have an RSS output class; I could also try writing a reverse class that parses it.

  2. liuchen said,

    March 10, 2006 @ 4:49 pm

    Thanks a lot! There are two errors in it: one is the ” quotation-mark error, and the other is

  3. legend said,

    March 10, 2006 @ 5:39 pm

    This class is not complete; I recommend using magpierss instead:
    http://magpierss.sourceforge.net

  4. xiulin.liu said,

    June 28, 2006 @ 9:42 pm

    I really like this coding style, but after trying it I found it a bit slow.

  5. Omaha said,

    January 11, 2007 @ 5:30 am

    Excellent Web Site! Very professional and full of great information. I am greatly enjoying it. Your enthusiasm is wonderful!!!

