

Adding feeds with CG-FeedRead - a tutorial

Posted in Computers & Internet on 19 Jan 2006.
32 Comments

Someone recently asked for a recommended method for pulling feeds from news sources into a PHP-based website.

Update, 27 Feb 2007: There were some problems with the feeds from Bangladesh and the $dLimit variable. I have rewritten the tutorial where necessary.

Background

I use CG-Feedread by David Chait on my Interactive Mathematics site to pull posts from the squareCircleZ blog into the homepage (you can see the links under the heading “Mathematics Blog” in the right column of the math site).

First step

Download the latest CG-FeedRead. It is called something like “WP 1.5.1.x Compatible…”. It’s free, but Chait appreciates donations.

The script works fine, but the documentation is rather hard to figure out if you are a newbie.

CG FeedRead is designed to work with the Wordpress blog engine, but here I am going to assume that we are independent of Wordpress.

Installation

After you have downloaded the zip file, extract the files. You get lots of files, but will only need 4, from the \plugins\cg-plugins\ directory.

We’ll assume the PHP page that you want to pull the feeds into is in the root directory of http://www.mysite.com/.

Create a directory /cg in your root directory (so it will be http://www.mysite.com/cg/). Upload the following 4 files from the zip into that /cg directory:

  • cg-feedread.php
  • helper_fns.php
  • uni_fns.php
  • XMLParser.php

[Update 14 Nov 2006: Please see David Chait's comment below, about uploading all of the files, and my comment following.]

Inside the cg/ directory, create a directory called /cache_feedread (so it will be http://www.mysite.com/cg/cache_feedread/). This holds a small cache of the feeds, which is much better than fetching the feed every time a user goes to your PHP page. CHMOD the permissions for the cache directory to 764, because the script needs to be able to write the cached files there. If 764 does not work, try giving more write permissions.
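
As a rough sketch of that step, the cache directory could also be created and checked from PHP instead of by hand. The function name ensureCacheDir and the 0775 mode are assumptions for illustration only; they are not part of CG-FeedRead, and the mode your host needs may differ.

```php
<?php
// Hypothetical helper (not part of CG-FeedRead): make sure the cache
// directory exists and that PHP can write to it.
function ensureCacheDir($dir, $mode = 0775)
{
    if (!is_dir($dir)) {
        // Third argument true = also create missing parent directories.
        if (!mkdir($dir, $mode, true)) {
            return false;
        }
    }
    if (!is_writable($dir)) {
        // Try to loosen the permissions; this can fail if PHP does not
        // own the directory, in which case use your FTP client instead.
        @chmod($dir, $mode);
        clearstatcache();
    }
    return is_writable($dir);
}
```

Calling ensureCacheDir(dirname(__FILE__) . "/cg/cache_feedread") once from a page in the root directory, and printing a message when it returns false, is one way to spot a permissions problem before the feed script reports it.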

That’s it for installing CG FeedRead.

The PHP page to display feeds (single feed only)

1. Start with the shell of an HTML document (you need all this so it displays properly in a browser), something like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01
      Transitional//EN">
<html>
  <head>
  <meta http-equiv="content-type"
      content="text/html; charset=utf-8">
  <title></title>
  </head>
  <body>
  </body>
</html>

2. Now, before the <!DOCTYPE HTML PUBLIC … stuff at the very top, put the following:

<?php
  require_once("cg/cg-feedread.php");
?>

This tells the current page to load the cg-feedread script.

3. In the <body> section, put the following (this tells the cg-feedread script to go get the feed, process it and display it on this page):

<?php
$feedUrl = "http://www.squarecirclez.com/blog/feed";
$maxItemsPerFeed = 4;
$showDetails = true;
$cacheName = "blog";
$filterCat='';
$tLimit = -1;
$dLimit = 10;
$noHTML = true;
$showTime = false;
$feedStyle = false;
$noTitle = true;
$showTimeGMT = true;
$titleImages = false;
$multiSiteTitle = false;
$makeRSS = false;
$rssLink="";

$feedOut = getSomeFeed($feedUrl, $maxItemsPerFeed,
  $showDetails, $cacheName, $filterCat, $tLimit, $dLimit,
  $noHTML, $showTime, $feedStyle, $noTitle, $showTimeGMT,
  $titleImages, $multiSiteTitle, $makeRSS, $rssLink);

if ($feedOut)
  echo $feedOut;
?>

Note 1: I found the listing of these variables in the original files was quite confusing until I rewrote them like this. Takes space, but at least you know what is going on.

Note 2: The line, $dLimit = 10; gives you the first 10 characters of the post. (But see the comments below - this needed an extra tweak to work.)
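
The named-variable layout from Note 1 can also be kept in one place as an associative array that is turned into the positional argument list. This is only a sketch: feedArgs is a hypothetical helper, not part of CG-FeedRead, the defaults are simply the values used in step 3, and the argument order is assumed to be the one in the getSomeFeed call shown there.

```php
<?php
// Hypothetical helper: build the positional argument list for getSomeFeed()
// from named options. The order follows the call shown in step 3.
function feedArgs($feedUrl, $overrides = array())
{
    $defaults = array(
        'maxItemsPerFeed' => 4,
        'showDetails'     => true,
        'cacheName'       => "blog",
        'filterCat'       => '',
        'tLimit'          => -1,
        'dLimit'          => 10,
        'noHTML'          => true,
        'showTime'        => false,
        'feedStyle'       => false,
        'noTitle'         => true,
        'showTimeGMT'     => true,
        'titleImages'     => false,
        'multiSiteTitle'  => false,
        'makeRSS'         => false,
        'rssLink'         => "",
    );
    // Drop unknown keys so a typo cannot shift the argument positions.
    $known = array_intersect_key($overrides, $defaults);
    // For string keys, array_merge keeps the key order of $defaults.
    $options = array_merge($defaults, $known);
    return array_merge(array($feedUrl), array_values($options));
}

// Usage, once cg/cg-feedread.php has been loaded:
// $args = feedArgs("http://www.squarecirclez.com/blog/feed", array('dLimit' => 25));
// $feedOut = call_user_func_array('getSomeFeed', $args);
```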

Example Results Page

You can see the results of the previous step at the top of: CG FeedRead Examples. Also on that page, following the squareCircleZ feed are the feeds from the 2 news sites (The New Nation and The ABC, Australia) that we will combine into one feed in the section below.

4. Save your page with a PHP extension (something like feedread.php will do), upload it to your server (in the root directory - see above) and then call it in your browser (it will be www.mysite.com/feedread.php). All should work. (Good luck!)

The PHP page to display multiple feeds

This now answers the original request, for a way to pull multiple feeds into one page. The following is similar to what we had above, but allows for multiple feeds.

1. In the <body> section, this time use the following code:

<?php
// first, make an array of all the feeds you want mixed
$feeds = array (
"http://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms",
"http://abc.net.au/news/syndicate/offbeatrss.xml"
);

// decide how many total entries you want,
//sampled from how many PER FEED
$count = array(10, 5);
// 10 max output, 5 max sourced from each feed.
$showDetails = true;
$cacheName = "arrayOfFeeds";
$filterCat="";
$tLimit = -1;
$dLimit = 10;
$noHTML = true;
$showTime = false;
$feedStyle = false;
$noTitle = true;
$showTimeGMT = true;
$titleImages = false;
$multiSiteTitle = false;
$makeRSS = false;
$rssLink="";

$feedOut = getSomeFeed($feeds, $count, $showDetails,
 $cacheName, $filterCat, $tLimit, $dLimit, $noHTML,
 $showTime, $feedStyle, $noTitle, $showTimeGMT,
 $titleImages, $multiSiteTitle, $makeRSS, $rssLink);

if ($feedOut)
 echo $feedOut;
?>

Notice the differences between this and the single feed example above. This time you need to supply an array of the feeds you want and the variables for the function are a bit different.
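
To make the $count = array(10, 5) idea concrete, here is a small sketch of "at most 5 items from each feed, at most 10 overall". It is an illustration only and not CG-FeedRead's own code: mixFeeds and compareByTimeDesc are hypothetical names, each item is assumed to be an array with 'title' and 'time' keys, and sorting newest-first is an assumption about how the mix is ordered.

```php
<?php
// Hypothetical comparison callback: newer items sort first.
function compareByTimeDesc($a, $b)
{
    if ($a['time'] == $b['time']) {
        return 0;
    }
    return ($a['time'] > $b['time']) ? -1 : 1;
}

// Hypothetical illustration of the (total, per-feed) limits.
// $feedsItems is an array of feeds; each feed is an array of items.
function mixFeeds($feedsItems, $maxTotal, $maxPerFeed)
{
    $pool = array();
    foreach ($feedsItems as $items) {
        // Take at most $maxPerFeed items from this feed.
        $pool = array_merge($pool, array_slice($items, 0, $maxPerFeed));
    }
    // Newest first (an assumption), then cap the combined list.
    usort($pool, 'compareByTimeDesc');
    return array_slice($pool, 0, $maxTotal);
}
```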

You can see the result of this step at the bottom of the example page under the heading “Mega Array”: CG FeedRead Examples.

2. Once again, save your page with a PHP extension (something like feedread.php), load it on your server (in the root directory - see above) and then call it in your browser. All should work. (Good luck!)

Tips

  • The first time you try the multifeed version you may get all sorts of messages in your browser window - this just indicates that the caching is going on. Refresh a few times and it should be neat.
  • If you get stray messages you don’t want (like “CGFR: MultiFeed (2) processing…”) just comment those lines out of the cg-feedread.php file (they are for error checking and it’s okay to kill them). They look like:
    dbglog("CGFR: MultiFeed (2) processing...");
  • I have styled the example - see the HTML output to see how it can be done.
  • You can play with the variables in the function - refer to the feedreadReadme.htm page that comes with the download.
  • I have found that not all feeds work. For example, the xml feed from The New Nation site (http://nation.ittefaq.com/artman/publish/rss.xml) did not work, but appears to be properly formed XML. Maybe a setting I need to tweak. Update: I realise now it didn’t work because it is Windows formatted, not UTF-8. That’s why I changed to the IndiaTimes feed.
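
For the encoding problem in the last tip, one possible approach is to convert the feed text to UTF-8 before it is parsed. The sketch below is not part of CG-FeedRead; toUtf8 is a hypothetical helper, and treating every non-UTF-8 feed as Windows-1252 is a guess rather than real detection.

```php
<?php
// Hypothetical helper: return $xml as UTF-8. Anything that is not valid
// UTF-8 is ASSUMED to be Windows-1252, which may be wrong for some feeds.
function toUtf8($xml, $assumedSource = 'Windows-1252')
{
    // preg_match with the /u modifier fails on invalid UTF-8 input.
    if (@preg_match('//u', $xml) === 1) {
        return $xml; // already valid UTF-8, leave it alone
    }
    $converted = @iconv($assumedSource, 'UTF-8', $xml);
    // If the conversion fails, fall back to the original text.
    return ($converted === false) ? $xml : $converted;
}
```

The encoding named in the feed's XML declaration would presumably need to agree with the converted text as well, so treat this as a starting point only.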

Good luck. Hope it works for you.

32 Comments »

  1. syedmahm said,

    January 24, 2006 at 4:04 pm

    Oh! wonderful, I am through!
    No one before you could make this whole thing so easy for me. Thank you so much for all of this.

  2. Prashanth Narayanan said,

    March 18, 2006 at 2:33 am

    very nice tutorial!
    had this running in 5 minutes! keep up the good work!
    -prash.

  3. stefan asemota said,

    April 5, 2006 at 1:43 am

    thanks for the tutorial… CG Feedread seems to have a problem with special character encoding… is there a workaround for this?

  4. zac said,

    April 12, 2006 at 2:39 pm

    Stefan - you’ll notice that the feeds from squareCircleZ (at the top of the examples page) are reading special characters just fine, but the ones from the news feeds lower down on the page are having a problem.

    A lot of blog engines (and obviously news ones too) replace “&” with “&amp;” and this messes up the feeds and makes it appear that CG Feedread cannot handle special characters. If you have control of the output, try experimenting with any code that replaces “&” with “&amp;”. If you are pulling from outside sources, you may need to play with the functions in the uni_fns.php file in CG Feedread.

    A less stressful approach may be to ask Chait, the author of CG-Feedread, in the discussion at the bottom of his forum.
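
As a sketch of the kind of experiment described in this comment, the function below undoes one level of double escaping in a piece of feed text. undoDoubleEscape is a hypothetical name, it is not one of the functions in uni_fns.php, and whether a given feed is actually double-escaped is an assumption you would need to check first.

```php
<?php
// Hypothetical helper: undo one level of double escaping, for example
// "&amp;amp;" becomes "&amp;" and "&amp;#8217;" becomes "&#8217;".
function undoDoubleEscape($text)
{
    // Only touch "&amp;" when an entity name or number follows it, so a
    // correctly escaped lone ampersand ("A &amp; B") is left unchanged.
    return preg_replace('/&amp;(#?[A-Za-z0-9]+;)/', '&$1', $text);
}
```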

  5. Larry Eeles said,

    July 10, 2006 at 2:30 pm

    This is a great tutorial… that’s the first I have got to work.

    The only thing is that I would like this to update when a new blog post is published, and it doesn’t seem to do this. Is there a way of doing this?

    Thanks Larry

  6. zac said,

    July 11, 2006 at 1:03 am

    Hi Larry. I’m glad that you found the tutorial useful. Maybe by now you have already seen your updated post in the page containing CGFeedread.

    This script caches the feed details on the server. This is so that the original blog is not suffering from unnecessary hits and expensive bandwidth. (Or worse, the owner of the blog may block you because you are hogging the pipe). I can’t remember the original cache period (I think it is one hour) but you can change this setting in the script, near the top.

    When testing, I will drop this setting down to maybe 2 minutes, publish a new post in the blog, wait a few minutes, refresh the page where the feeds are displaying and if I see the details of the new post, I know all is working well. Then I set it back to 1 hour.
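
The idea behind such a time-based cache can be sketched as follows. This is a hypothetical illustration, not CG-FeedRead's actual logic: cacheIsFresh is an invented name, and the only point is that the cached file is reused until it is older than the chosen period.

```php
<?php
// Hypothetical illustration of a time-based cache check: reuse the cached
// file until it is older than $maxAgeSeconds, then fetch the feed again.
function cacheIsFresh($cacheFile, $maxAgeSeconds, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    if (!file_exists($cacheFile)) {
        return false; // nothing cached yet, so the feed must be fetched
    }
    // filemtime() gives the time the cache file was last written.
    return ($now - filemtime($cacheFile)) < $maxAgeSeconds;
}
```

With a period of 3600 seconds a new post can take up to an hour to appear; with 120 seconds it shows up within a couple of minutes, at the cost of many more requests to the source site.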

  7. Dave said,

    September 6, 2006 at 9:45 pm

    Zac - have you been able to get this script to work on a new page? I can get it to work, but setting $noHTML to true or to false does not display the HTML of the original post. Also, when I try to limit the length that is displayed ($dLimit) nothing happens. In your example you use “10”, which is ignored.

  8. zac said,

    September 7, 2006 at 3:37 pm

    Hi Dave

    How are things in Canada?

    The trick is to reduce the caching time so that whatever changes you make to the variables will show up more quickly. In the cg-feedread.php file, go to the line near the top and change it to something like
    $XML_CACHE_TIME = 20;
    Now after 20 seconds you should see the changes you have made to $noHTML (refresh after 20 seconds - it should display in all its HTML glory).

    As for $dLimit, the original function in Chait’s file has several variables fixed, which means any changes in your own file that calls those function variables will be ignored.

    In cg-feedread.php, change the function getSomeFeed by removing the equals bits so it looks something like…
    getSomeFeed($InUrl, $maxItemsPerFeed, $showDetails, $cacheName, $filterCat,…

    Now, when you feed values from your own feedread.php page to the function in cg-feedread.php, they should be effective.

    Don’t forget to change that $XML_CACHE_TIME variable back, or you may get blacklisted from the site(s) you are feeding from…

    Let me know how it goes.

    Update: This does not appear to work any more. See a later comment on this issue.

  9. Krazyglu Blog » How quickly does this update? said,

    September 24, 2006 at 6:42 am

    [...] July 11, 2006 at 1:03 am [...]

  10. David said,

    November 14, 2006 at 2:23 am

    Just to note, you can make things a tiny bit simpler by uploading the entire cg-plugins folder (and not messing around with which files to upload!), and whichever of the cg-ZZZZZ-plugin.php files you want in order to let WP control activation of the plugins (I normally have folks upload the entire thing, and then just activate the ones they want). So upload cg-feedread-plugin.php (or whatever it is called), and then you don’t have to modify your theme to activate it.

    BTW, uploading the entire folder will become more and more important as my plugins share a lot of code, and I continue to ‘factor’ shared code out into individual files. It also allows you to play with other plugins without trying to figure out what relies on what… ;)

    -d

  11. zac said,

    November 14, 2006 at 5:55 am

    Thanks, David -I appreciate your input (and your coding!).

    Actually, I separated out the files I needed because I was having a conceptual problem understanding which files I needed for Wordpress (as it turned out, none) and which ones I needed for my feedread situation (the four I mentioned).

    My usual approach with such situations is to strip away all that I don’t need and then get the subset of files to work. But I see your point about uploading the lot, especially now that you are using code libraries.

  12. Alex Dichev said,

    December 26, 2006 at 7:24 pm

    Thank you!

    This was very nice tutorial.
    I had a few pages reading feeds up and running in 10 minutes.

    Thanks again!

  13. zac said,

    December 27, 2006 at 2:49 am

    Hi Alex. I’m glad that you found the tutorial useful.

    Zac

  14. Dan Gargus said,

    January 14, 2007 at 1:56 am

    First of all, thx for feedread… it works great!

    Does anyone know of a way to add some cellpadding to the feeds that I’m bringing in from Yahoo news using cg feedread?

    Checkout this link http://www.gargusgazette.com and note how the photos and news descriptions are too close and in need of some padding.

  15. zac said,

    January 15, 2007 at 4:20 am

    You’re welcome, Dan. I’m glad you found it useful.

    As for your page, I have several comments.

    (1) To get padding around the photos, add this to the style sheet:

    .storycontent img {
      padding:0 10px 10px 0;
    }
    

    This targets the images in the “storycontent” classes only.

    (2) Your page does not work well in Firefox, and it should. The “From the editor” section is okay in IE, but overlaps the header image in FF.

    There are many errors in the page, the first of which is the doctype. You should have (because of the content in your page) at the top:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head profile="http://gmpg.org/xfn/11">

    I suggest you run your page through the W3C validator at http://validator.w3.org/ and work on the problems one by one. It is a good learning experience!

    (3) You have a mix of DIVs and tables on the page, which is not a good idea and adding to your problems. Let me know if you need help with this.

    Sometime soon I will write a tutorial on how to edit the CSS on a Wordpress blog. The first step will be to remove all existing styles and add things one at a time. Watch this space.

  16. Dan Gargus said,

    January 15, 2007 at 10:53 pm

    Update: This comment is not about CG feedread. Ignore if you are just starting.

    Thx VERY much Zac!,

    The .storycontent addition to the style sheet works great!

    I added the suggested doctype code…

    Don’t know where to begin in modifying the css for the WordPress Kubrick template in order to make my header navigation work in Firefox? It took me forever to get what I have to look right in IE…

    Here’s a link to a screenshot of what the header looks like on my system and what I’d like to look like in Firefox: http://www.gargusgazette.com/images/header_screenshot.gif

    Thx Again!,

    Danny

  17. Dan Gargus said,

    January 16, 2007 at 2:24 am

    Update: This comment is also not about CG feedread. Ignore if you are just starting.

    Zac,

    Just a sidenote: I don’t know if the following would be a correct and valid thing to do but here’s my reasoning…

    The index.php page is really just a static HTML page made from just enough elements from the Kubrick template to make the page(s) look as though it were part of the blog template. They are really somewhat separate from the blog itself… I renamed it to the .php extension so as to be able to use php snippets like the timestamp as well as cg-feedread.

    I made the page, and others, so that I could use them more traditionally so third-party scripts can be used that wouldn’t normally be possible if they were within the blog but yet they retain the Kubrick look and feel.

    The upcoming Historical Value pages/application are using HTML header and footer pages via SSI built around a standard MySQL FAQ/knowledge base script: http://www.gargusgazette.com/appiesnet/appieskb/kb1/

    Now I can follow suit for adding a photo gallery, shopping system etc…

    My plan was to make other static HTML pages out of and for other WordPress templates… my concern is that I’m messing with proper validation that could hurt me in the search engines.

    Thx As Always!,

    Danny

  18. squareCircleZ » Tidying CSS and XHTML - a case study tutorial said,

    January 16, 2007 at 9:47 am

    [...] One of my readers used my Adding Feeds With CGFeedread tutorial and got it to work with some Yahoo! news and other feeds. Dan is using one of the Wordpress templates for consistency across his site. [...]

  19. zac said,

    January 16, 2007 at 9:53 am

    Update: This comment is not about CG feedread. Ignore if you are just starting.

    Hi Dan

    I had a play around with your problem page.

    I wrote a new post with the explanation of what I did and why at: Tidying CSS and XHTML - a case study tutorial.

    My suggestions:
    1. Learn some CSS (from scratch) - there are tonnes of good tutorials out there.
    2. Forget tables - they add too many complications to what you are trying to do. Just stick with DIVs.
    3. Check everything you do in the W3C validator - I learned mountains from the feedback it gives.
    4. Check in as many browsers as you can, but certainly IE6, IE7 and FF.

    Good luck.

  20. mickeyb said,

    February 25, 2007 at 12:32 pm

    Hi

    I think I followed your instructions to the letter. I have wordpress and made the link to the blog inside your piece of script http://www.londonsaints.com/wordpress.

    This is what I got. Only the title of the most recent blog appeared at the end of a lot of errors

    CGFR: Recaching blog (http://www.londonsaints.com/wordpress?feed=rss2)… CGFR: dealing with singlular-entry case…
    Warning: fopen(/home/channon/public_html/cg/cache_feedread/blog.DAT) [function.fopen]: failed to open stream: No such file or directory in /home/channon/public_html/cg/cg-feedread.php on line 809

    Warning: flock() expects parameter 1 to be resource, boolean given in /home/channon/public_html/cg/cg-feedread.php on line 810

    Warning: fwrite(): supplied argument is not a valid stream resource in /home/channon/public_html/cg/cg-feedread.php on line 812

    Warning: flock() expects parameter 1 to be resource, boolean given in /home/channon/public_html/cg/cg-feedread.php on line 813

    Warning: fclose(): supplied argument is not a valid stream resource in /home/channon/public_html/cg/cg-feedread.php on line 814

    Warning: fopen(/home/channon/public_html/cg/cache_feedread/blog.html) [function.fopen]: failed to open stream: No such file or directory in /home/channon/public_html/cg/cg-feedread.php on line 835
    CG-Feedread failed to save feed to disk — couldn’t write to the cache_feedread directory.
    Warning: fclose(): supplied argument is not a valid stream resource in /home/channon/public_html/cg/cg-feedread.php on line 847

    Well blogger me….
    OK.…

    can you help??

  21. mickeyb said,

    February 25, 2007 at 6:42 pm

    sorted that one out now

  22. zac said,

    February 26, 2007 at 2:24 am

    Hi Mickeyb
    Looks like you had a file permissions problem on the blog.DAT file. I’m glad you got it sorted out.

  23. mickeyb said,

    February 26, 2007 at 6:56 pm

    Hi again

    Am almost there. The only trouble I have now is that my CG-Feedreader links on my html home page won’t update when new blogs come in no matter how much I refresh the page.

    The only way I can make it happen is to go into the cache-feedread folder and delete the blog.DAT and blog.html files.

    any idea what’s going wrong?

  24. mickeyb said,

    February 26, 2007 at 7:02 pm

    ooops… didn’t read the comments above. will try that solution

  25. mickeyb said,

    February 26, 2007 at 7:30 pm

    AAAAAaaarrrrrrrgh!

    now it has totally and utterly stopped working. It is putting the WHOLE blogs on my html page instead of just the head and the first few characters!!!!!

    what can have gone wrong? I’ve reloaded everything, killed off the blog DAT and html files, but it still keeps coming back

    what can have gone wrong!!!!

  26. zac said,

    February 26, 2007 at 10:16 pm

    Hi Mickey

    If you set (for example)
    $dLimit = 100;
    it will give you the first 100 characters of the post.

    Keep smiling!

    Update: Actually, this does not work! I’m not sure why. I have a workaround, though, which is not so bad.

    Open cg-feedread.php and paste this function into the top (underneath the comments, near where it has the feedread version number):

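    // Return the first $limit words of $string. Words are split on
    // spaces, newlines and tabs.
    function limit_words($string, $limit)
    {
      $numWords = 0;
      $result = "";
      // $limit must be a positive integer, otherwise return nothing
      if ($limit < 1 || !is_int($limit)) return $result;
      // note: single backslashes, so "\n" and "\t" are newline and tab
      $word = strtok($string, " \n\t");
      $result .= $word;
      while ($word !== false && (++$numWords < $limit)) {
        $word = strtok(" \n\t");
        // strtok returns false when there are no more words
        if ($word !== false) $result .= " $word";
      }
      return $result;
    }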
    

    Now, about 2/3 of the way down the cg-feedread.php script, find the line that says:

    $itemDescription = cleanBadChars($itemDescription); // just in case...

    After that line, add these 2 lines (the first is a call to the function and the second adds the “…” at the end of each truncated post):

    $itemDescription = limit_words( $itemDescription, $dLimit );
    $itemDescription .= "...";
    

    Now you can go back to your feedread.php file and set the $dLimit to whatever you like.

    It should work okay. Good luck.

    I have not extensively tested this - let me know if it breaks.
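
If a character limit rather than a word limit is wanted, a similar helper could look like the sketch below. limit_chars is a hypothetical name and not part of CG-FeedRead, and the code assumes the description is UTF-8 text. Unlike the two lines above, it adds the “...” only when something was actually cut off.

```php
<?php
// Hypothetical helper: return at most $limit characters of $string,
// followed by "..." when the text was shortened. Assumes UTF-8 input.
function limit_chars($string, $limit)
{
    if (!is_int($limit) || $limit < 1) {
        return "";
    }
    // The /u modifier makes "." match whole UTF-8 characters rather than
    // bytes; the /s modifier lets "." match newlines as well.
    if (!preg_match('/^.{0,' . $limit . '}/us', $string, $m)) {
        return ""; // preg_match fails on invalid UTF-8
    }
    $cut = $m[0];
    return ($cut === $string) ? $cut : $cut . "...";
}
```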

  27. mickeyb said,

    February 27, 2007 at 12:04 pm

    Hi zac

    that bit of script worked ok once I fiddled with it. I replaced

    ($string, " nt");

    with

    ($string, " %");

    because it was delivering the dlimit but with all the n and t letters missing from the text. so I thought I’d use a character that rarely appears in the intro of blogs

    the script had been working fine until last night. strange how it suddenly went wrong. all I had done was change themes on my blog but that can’t have done the damage.

    cheers

    mickey

  28. zac said,

    February 28, 2007 at 1:25 am

    Wak!

    Wordpress changed my “\n\t” into “nt” (because it was in a <pre> tag) and I hadn’t noticed it.

    This removes new line and tab characters from the post when counting the number of words (and puts them back in when giving the output).

    I have modified the code in my comment so it looks correct and can be copied and pasted.

    I see your example is working fine now.

  29. mickeyb said,

    March 23, 2007 at 2:20 pm

    Hi Zac

    Is it possible to get the blog on the home page to show the name of the blogger too apart from the first ten words

    also can latest comments be shown on the front?

    cheers

    Mickey

  30. zac said,

    March 24, 2007 at 5:52 am

    Hi Mickey

    Latest comments is easy. You just give to the function the feed for comments.

    See Comments feed for an example.

    To achieve this, I just gave it

    /comments/feed/

    instead of

    /feed/

    In your case, you would give it.

    http://londonsaints.com/wordpress/?feed=comments-rss2

    You will get the name of the person who commented.

    I’m afraid the name of the blogger will require tweaking Chait’s script. Maybe you could contact him at his blog.

    Another option is the following…

    Feedread is great if you are pulling feeds from outside sources. But if you are pulling your own feeds, it really is overkill. On my mathematics site, I pull 3 posts from my blog on each page (see here for an example towards the bottom of the left column). I am not using Feedread at all there, I am accessing the database directly.

    Are you interested in that option?

  31. Phoenix said,

    April 6, 2008 at 3:10 am

    I know this thread is like a year old. I don’t know if anyone even looks at this anymore, but for the sake of keeping sanity I will post my question anyway.

    I have tried to limit the number of words that appear; really I only want the title to show up and nothing else. I read through the comments and did what zac said, and I made the change that mickeyb pointed out.

    My site shows lots of the post. I only want 15 characters; however, when I refresh nothing happens.

    Anyone care to help? I know it’s old but it’s worth a try… thanks

  32. zac said,

    April 6, 2008 at 7:36 am

    Hi Phoenix

    If you check it again now, maybe it will look how you want. The caching system of this script can be a trap - you make a change to the script and then nothing seems to change on the HTML page.

    When modifying, set the timeout ($XML_CACHE_TIME) to a very low value so changes appear immediately.

    Then remember to set it back!

    If it is not OK still, send the link and I’ll have a look.
