Perl - Google Reader


pdupreez (Technical User)
I use Google Reader (an RSS aggregator) a lot to capture interesting links and store them as "starred" items for later processing. What I am looking for is a way to get all the links in the starred items, create a new HTML page (if necessary), spider through all the links within that page, and extract the links to specific hosts into a text or HTML file.

How can this be done, and is Perl the right tool for the job?
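
(For illustration, the first part of this, pulling the entry links out of a public starred-items feed, only takes a few lines of Perl. This is a minimal sketch, assuming the XML::Feed module is installed (it uses LWP to fetch URLs) and that you pass your public starred-feed URL on the command line; the output file name is just an example.)

#!/usr/bin/perl
# Minimal sketch: collect the entry links from a public Google Reader
# starred-items feed given on the command line.
use strict;
use warnings;
use URI;
use XML::Feed;

my $feed_url = shift @ARGV
    or die "usage: $0 <public starred-items feed URL>\n";

my $feed = XML::Feed->parse( URI->new($feed_url) )
    or die "Could not parse feed: " . XML::Feed->errstr . "\n";

# Append one starred-item URL per line for later processing.
open my $out, '>>', 'GoogleReaderStarredLinks.txt' or die $!;
print {$out} $_->link, "\n" for $feed->entries;
close $out;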
 
This is probably not going to help you very much, but there is already a module written to interface with Google Reader:


You can peek into the source code and see how the author has written the script.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thanks, Kevin, for responding. That did not give me what I was looking for, but further searching on Google did the trick. I had to mash a number of different scripts together, but I can now efficiently process hundreds of Google Reader starred links and dump the selected links to a text file for further processing. It is in Ruby, and you may be able to convert it to perl for us?

Your Google starred items must be set to Public access. You can find the [bold]xxxxxxxxxxxxxxxxxxxxx[/bold] Google Reader account number in the public web link once you have changed the starred view to public (under Settings -> Folder & Tags in Google Reader).

I start the script with:

[bold]ruby GRS.rb [skip][/bold]

[skip] is optional: use it if you have already parsed the Google Reader links and only want to redo the download-link extraction.

#############################
# START OF SCRIPT
#############################


require 'net/http'
require 'uri'
require 'open-uri'
require 'rubygems'
require 'hpricot'
require 'simple-rss'

STARRED_FILE  = 'GoogleReaderStarredLinks.txt'
DOWNLOAD_FILE = 'DownloadLinks.txt'

# The online file storage hosts I have accounts with; links to these are collected.
WANTED_HOSTS = ['abc.com/files', 'pqr.com', 'xyz.com']

# Parse the public Google Reader starred-items feed with SimpleRSS and append
# every entry link to a text file for further processing.
def parse_google_reader
  feed = "[bold]xxxxxxxxxxxxxxxxxxxxx[/bold]/state/com.google/starred?n=500"
  rss  = SimpleRSS.parse open(feed)
  open(STARRED_FILE, 'a') do |f|
    rss.entries.each { |item| f.puts item.link }
  end
end

# Passing "skip" on the command line reuses the previously saved starred links
# (if the file exists) instead of parsing Google Reader again.
if ARGV[0] == 'skip' && File.exist?(STARRED_FILE)
  puts 'Skipping Google Reader parsing'
else
  parse_google_reader
end

# Push all the URLs in the file into an array. This could have been done directly,
# but keeping the file allows a re-run without parsing Google Reader again, which
# takes time and bandwidth.
urls = File.readlines(STARRED_FILE).map { |line| line.chomp }

# Loop through each of the starred-item URLs.
urls.each do |url|
  puts 'Google Reader Link : ' + url

  # Open the URL and check for errors (timeouts and HTTP).
  # If any occur, skip to the next URL.
  begin
    url_object = open(url)
  rescue Timeout::Error
    puts "The request for a page at #{url} timed out...skipping."
    next
  rescue OpenURI::HTTPError
    puts "The request for a page at #{url} returned an error...skipping."
    next
  end
  next if url_object.nil?

  # Parse the page with Hpricot, i.e. read the linked webpage into doc,
  # which holds the equivalent of the webpage source code.
  doc = Hpricot(url_object)

  # Look at every link (href) in the page. If it points to one of the file
  # storage hosts above, append it to the download list for further processing.
  doc.search('a[@href]').each do |x|
    new_url = x['href'].split('#')[0]
    next if new_url.nil?

    if WANTED_HOSTS.any? { |host| new_url.include?(host) }
      open(DOWNLOAD_FILE, 'a') { |f| f.puts new_url }
      puts ' Download link : ' + new_url
    end
  end
end

#############################
# END OF SCRIPT
#############################
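
(For reference, not a full conversion: the link-extraction half of the script above could be sketched in Perl roughly as follows, assuming LWP::UserAgent and HTML::LinkExtor are installed. The host names are the same placeholders used in the Ruby version, and the file names match the ones above.)

#!/usr/bin/perl
# Rough sketch of the second half of the Ruby script: read the saved
# starred-item URLs, fetch each page, and append any href that points
# at one of the wanted hosts to DownloadLinks.txt.
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

my @wanted_hosts = ( 'abc.com/files', 'pqr.com', 'xyz.com' );   # placeholder hosts

open my $in,  '<',  'GoogleReaderStarredLinks.txt' or die $!;
open my $out, '>>', 'DownloadLinks.txt'            or die $!;

my $ua = LWP::UserAgent->new( timeout => 15 );

while ( my $url = <$in> ) {
    chomp $url;
    print "Google Reader Link : $url\n";

    # Fetch the page; on any failure just move on to the next URL.
    my $res = $ua->get($url);
    unless ( $res->is_success ) {
        print "The request for a page at $url failed...skipping.\n";
        next;
    }

    # Collect every link found in the page (href values as written).
    my $extractor = HTML::LinkExtor->new;
    $extractor->parse( $res->decoded_content );
    $extractor->eof;

    for my $link ( $extractor->links ) {
        my ( $tag, %attr ) = @$link;
        next unless $tag eq 'a' and defined $attr{href};

        my ($href) = split /#/, $attr{href};   # drop any #fragment
        next unless defined $href;

        # Keep only links that point at one of the wanted hosts.
        if ( grep { index( $href, $_ ) >= 0 } @wanted_hosts ) {
            print {$out} "$href\n";
            print "  Download link : $href\n";
        }
    }
}

close $in;
close $out;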
 
It is in Ruby, and you may be able to convert it to perl for us?

Sorry, I can't. Maybe someone else can.

Kevin



------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Just shopping around. But no, seriously, there is no need to convert; I just know this is a Perl forum, so I thought people might want it in Perl. It works great in Ruby, which on Windows seems to have less overhead (no Cygwin) and integrates better into SciTE in any case.
 
You don't need Cygwin unless you want to use Unix commands on Windows. You can run ActivePerl or Strawberry Perl directly on Windows.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Besides, anyone who can actually understand Ruby well enough to convert it will probably declare that it is a far superior language anyway, so why would you want to?? :)

Annihilannic.
 
Thanks for the input. I have only started using Perl and Ruby in the last few days, and I am learning a lot. You are right, I can run Perl outside of Cygwin, which is nice to know. Time to go fix the path!
 