January 26, 2009
Feeding The Monster
I have many different Web 2.0 personas, and I'd love to pull together all the various postings and updates I make into one stream and display that. Of course, this already has a name: Lifestreaming. Personally, I think it's a dumb name. I like the web, and I do a lot of things on the web, but trust me, it ain't my life. I prefer to call it "Webstreaming".
For the most part, Webstreaming is built on RSS feeds. But it's a complicated thing to do, as nearly every feed needs some customization to display correctly: there are links, text, microblog posts, audio, video, etc. There are a few sites that will do it for you, but of course, for us geeks, it has to be self-hosted.
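At its core, Webstreaming means merging several per-site feeds into one list ordered by date. Here's a minimal Python sketch of just that step. It assumes each feed has already been parsed into simple entries with timezone-aware dates; the Entry class, the merge_streams function, and the sample titles are hypothetical and not part of any of the tools mentioned in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import chain
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    """One item from one site's feed (hypothetical, for illustration)."""
    source: str          # which site the item came from, e.g. "tumblr"
    title: str
    published: datetime  # timezone-aware, so items from different feeds compare correctly


def merge_streams(*feeds: Iterable[Entry]) -> List[Entry]:
    """Combine any number of per-site feeds into one newest-first stream."""
    return sorted(chain(*feeds), key=lambda e: e.published, reverse=True)


if __name__ == "__main__":
    tumblr = [Entry("tumblr", "A post", datetime(2009, 1, 26, 12, 0, tzinfo=timezone.utc))]
    music = [
        Entry("blip.fm", "A song", datetime(2009, 1, 26, 13, 0, tzinfo=timezone.utc)),
        Entry("blip.fm", "An older song", datetime(2009, 1, 25, 9, 0, tzinfo=timezone.utc)),
    ]
    for entry in merge_streams(tumblr, music):
        print(entry.published.isoformat(), entry.source, entry.title)
```

The merge itself is the easy part; the real work is getting every site's feed into a common shape and then displaying each kind of item sensibly.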
I started by downloading and installing Sweetcron, a PHP-based lifestream blogging package. There are some really interesting implementations of it, most notably the one by Tom Beardshaw, who does a lot of work on it. Unfortunately, it requires a lot of customization, all of it done in PHP, which is no longer a favorite language of mine. I used to like it, but now I find it too idiosyncratic and prone to code breakage. I do have a rough version of it running on my new domain, IHieronym.us (I use the nom de plume Hieronymus on a few Web 2.0 sites).
As I'd really rather do something in Python, or even better with Django, I went on a quest. As I said, most social media sites report activity via an RSS feed, so I needed a Python-based RSS parser, and the most common one seems to be the Universal Feed Parser, which can deal with nearly any kind of RSS or Atom feed.
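To give a feel for what a "universal" parser has to smooth over, here's a rough, standard-library-only sketch that pulls the title, link, and date out of both RSS 2.0 and Atom 1.0. This is NOT the Universal Feed Parser's API (with that library you just call feedparser.parse() and get normalized entries back); parse_entries and _atom_link are hypothetical names, and the sketch ignores the many older and malformed feed variants the real library copes with.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace, as ElementTree spells it


def _atom_link(entry):
    """Prefer rel="alternate" (the default when rel is absent), else the first link."""
    links = entry.findall(ATOM + "link")
    for link in links:
        if link.get("rel", "alternate") == "alternate":
            return link.get("href", "")
    return links[0].get("href", "") if links else ""


def parse_entries(xml_text):
    """Normalize RSS 2.0 <item>s or Atom 1.0 <entry>s into title/link/published dicts."""
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == "rss":
        for item in root.iter("item"):
            date = item.findtext("pubDate")  # RFC 822 style, e.g. "Mon, 26 Jan 2009 13:05:31 +0000"
            entries.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": parsedate_to_datetime(date) if date else None,
            })
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            date = entry.findtext(ATOM + "updated")  # RFC 3339, e.g. "2009-01-26T14:43:49Z"
            entries.append({
                "title": entry.findtext(ATOM + "title", default=""),
                "link": _atom_link(entry),
                # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
                "published": datetime.fromisoformat(date.replace("Z", "+00:00")) if date else None,
            })
    else:
        raise ValueError("unsupported feed type: " + root.tag)
    return entries
```

Even this toy version shows why every feed needs a little special handling: the two formats disagree on element names, date formats, and where the link lives.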
But on top of that, I found Feedjack, a full-blown feed "aggregator" built with both Django and Feedparser! Pretty cool stuff, although the installation is a bit strange. I'm not intimately familiar with Python or Django installation issues, and I found it odd that the Feedjack page talks about "your Django", as Django is a library, not an installation. What they really mean is "your Django project". So, after getting Feedparser and Feedjack installed (on openSUSE, I installed Feedparser via YaST and Feedjack via the usual python setup.py install process, while on my FreeBSD box I found both of them in the ports), I did the following basic steps to get a Feedjack site up and running:
$ django-admin.py startproject ihieronymus
$ cd ihieronymus
$ ls
__init__.py   manage.py   settings.py   urls.py
__init__.pyo  manage.pyo  settings.pyo  urls.pyo
[ after editing settings.py and urls.py as suggested here to add the admin site ]
$ python manage.py syncdb
Creating table auth_permission
Creating table auth_group
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table django_admin_log
You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (Leave blank to use 'jdarnold'):
E-mail address: jdarnold@buddydog.org
Password:
Password (again):
Superuser created successfully.
Installing index for auth.Permission model
Installing index for auth.Message model
Installing index for admin.LogEntry model
$ sudo python manage.py runserver 207.22.41.217:80
Password:
Validating models...
0 errors found
Django version 1.1 pre-alpha, using settings 'ihieronymus.settings'
Development server is running at http://207.22.41.217:80/
Quit the server with CONTROL-C.
[26/Jan/2009 13:05:31] "GET / HTTP/1.1" 404 1921
^C
$
So at this point, we have a very basic Django site with the admin part all set up. Now we need to add the Feedjack values to the settings.py file:
MEDIA_ROOT = '/www/data/'
MEDIA_URL = 'http://www.myserver.com'
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.admin',
    'feedjack',
)
Now I need to make Feedjack's static files available through my 'regular' web server. Remember, the built-in Django development server doesn't serve static images and pages out of the box, so you need to host them somewhere else. Also, many Apache setups won't serve files from outside the Apache folders, so you may have to copy the folder into the Apache data folder rather than just using a symbolic link.
$ sudo ln -s /usr/local/lib/python2.5/site-packages/Feedjack-0.9.16-py2.5.egg/feedjack/static/feedjack /www/data/feedjack
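If you'd rather script this step, here's a small Python sketch that either symlinks or copies the static folder, the copy option being for Apache setups that won't follow the link. The publish_static name is my own invention, not part of Feedjack or Django.

```python
import os
import shutil
import tempfile


def publish_static(src, dest, copy=False):
    """Expose Feedjack's static folder at dest, by symlink (default) or by copying.

    Copying is for web servers configured not to follow symlinks that point
    outside their document root.
    """
    if os.path.lexists(dest):
        # Refuse to clobber an existing file, folder, or (possibly broken) link.
        raise FileExistsError(dest)
    if copy:
        shutil.copytree(src, dest)
    else:
        os.symlink(src, dest)
    return dest


if __name__ == "__main__":
    # Self-contained demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "feedjack")
        os.makedirs(src)
        print(os.path.islink(publish_static(src, os.path.join(tmp, "linked"))))
        print(os.path.isdir(publish_static(src, os.path.join(tmp, "copied"), copy=True)))
```

On the box above, that would be publish_static("/usr/local/lib/python2.5/site-packages/Feedjack-0.9.16-py2.5.egg/feedjack/static/feedjack", "/www/data/feedjack"), run as root like the sudo command, adding copy=True if Apache refuses to follow the link. Your egg path and version will likely differ.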
Now we should be able to run syncdb to add Feedjack's database tables:
$ python manage.py syncdb
Traceback (most recent call last):
  File "manage.py", line 11, in <module>
    execute_manager(settings)
  File "/usr/local/lib/python2.5/site-packages/django/core/management/__init__.py", line 340, in execute_manager
    utility.execute()
  File "/usr/local/lib/python2.5/site-packages/django/core/management/__init__.py", line 295, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.5/site-packages/django/core/management/base.py", line 195, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/usr/local/lib/python2.5/site-packages/django/core/management/base.py", line 221, in execute
    self.validate()
  File "/usr/local/lib/python2.5/site-packages/django/core/management/base.py", line 249, in validate
    num_errors = get_validation_errors(s, app)
  File "/usr/local/lib/python2.5/site-packages/django/core/management/validation.py", line 28, in get_validation_errors
    for (app_name, error) in get_app_errors().items():
  File "/usr/local/lib/python2.5/site-packages/django/db/models/loading.py", line 128, in get_app_errors
    self._populate()
  File "/usr/local/lib/python2.5/site-packages/django/db/models/loading.py", line 57, in _populate
    self.load_app(app_name, True)
  File "/usr/local/lib/python2.5/site-packages/django/db/models/loading.py", line 72, in load_app
    mod = __import__(app_name, {}, {}, ['models'])
  File "/usr/local/lib/python2.5/site-packages/Feedjack-0.9.7-py2.5.egg/feedjack/models.py", line 19, in <module>
    class Link(models.Model):
  File "/usr/local/lib/python2.5/site-packages/Feedjack-0.9.7-py2.5.egg/feedjack/models.py", line 20, in Link
    name = models.CharField(maxlength=100, unique=True)
TypeError: __init__() got an unexpected keyword argument 'maxlength'
$
Darn, not quite there. Django 1.0 renamed the model field argument 'maxlength' to 'max_length', so this error usually indicates that an app and your Django version don't match. And yup, the FreeBSD ports version of Feedjack is 0.9.7, while my Django is 1.1 pre-alpha. So I grabbed and installed Feedjack v0.9.16, and now:
$ python manage.py syncdb
Creating table feedjack_link
Creating table feedjack_site
Creating table feedjack_feed
Creating table feedjack_tag
Creating table feedjack_post
Creating table feedjack_subscriber
Installing index for feedjack.Post model
Installing index for feedjack.Subscriber model
$
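Since the maxlength keyword is gone in Django 1.0 and later, any third-party app still using it will fail the same way. Here's a hypothetical helper (written for a current Python 3, not anything Django provides) that scans an app's source for the old spelling before you install it. It's a plain text search, so it would also flag a comment containing "maxlength=".

```python
import re
from pathlib import Path

# Matches the pre-1.0 keyword argument, e.g. CharField(maxlength=100).
OLD_ARG = re.compile(r"\bmaxlength\s*=")


def find_old_maxlength(source):
    """Return the 1-based line numbers in source that still use maxlength=."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if OLD_ARG.search(line)]


def scan_app(app_dir):
    """Map each .py file under app_dir to the lines that need max_length instead."""
    hits = {}
    for path in sorted(Path(app_dir).rglob("*.py")):
        lines = find_old_maxlength(path.read_text(errors="replace"))
        if lines:
            hits[str(path)] = lines
    return hits
```

Pointed at the Feedjack 0.9.7 egg's feedjack folder, this should report models.py, including line 20 from the traceback above.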
Spot on! Now when I run the server and go to the admin web page, I see Feedjack's new entries. Next, I had to decipher the rather obscure documentation on how to add new feeds, as Feedjack is pretty tied to its paradigm of aggregating feeds from other people. First you add a "Site", which basically defines what your site is going to look like. Then you add some feeds, giving each one its RSS URL. Unfortunately, Feedjack isn't nearly as good as Sweetcron at figuring out the correct RSS URL, and many Web 2.0 sites make it far too hard to track down. Then you need to add a "Subscriber", which links the Site to the Feed. All of this is because you can host multiple "Planets" with a single Feedjack installation, but that seems like overkill to me, as most sites, including all the ones I looked at from their list of links, only have one.
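To keep the Site / Feed / Subscriber relationship straight, here's a rough plain-Python model of it as I understand it. These dataclasses are a hypothetical simplification, not Feedjack's actual Django models (which carry many more fields); the point is that a Site only shows posts from the feeds it's linked to through a Subscriber, which is what lets one installation host several "Planets".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feed:
    feed_url: str
    posts: List[str] = field(default_factory=list)  # post titles, for brevity


@dataclass
class Site:
    name: str  # one "Planet"


@dataclass
class Subscriber:
    """Join record: subscribes one Site to one Feed."""
    site: Site
    feed: Feed


def posts_for_site(site: Site, subscribers: List[Subscriber]) -> List[str]:
    """Gather posts from every Feed that this Site subscribes to."""
    return [post
            for sub in subscribers if sub.site is site
            for post in sub.feed.posts]
```

With only one Site, every Feed gets exactly one Subscriber, which is why the extra step feels like overkill.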
Now that you have a few feeds set up, you run the feedjack_update.py script to fetch the RSS feeds and pull them in. This requires a little environment setup first, as otherwise the script can't find the Django project and settings it needs:
$ pwd
/home/jdarnold/django/ihieronymus
$ export PYTHONPATH=/home/jdarnold/django
$ export DJANGO_SETTINGS_MODULE=ihieronymus.settings
$ feedjack_update.py
* BEGIN: 2009-01-26 14:43:49.014353
[2] Processing feed http://anaze.tumblr.com/rss
[2] Processed http://anaze.tumblr.com/rss in 0:00:00.341290 [ok] [new=0 updated=0 same=20 error=0]
[1] Processing feed http://linuxlove.tumblr.com/rss
[1] Processed http://linuxlove.tumblr.com/rss in 0:00:00.528515 [ok] [new=0 updated=0 same=20 error=0]
* END: 2009-01-26 14:43:49.941449 (no threadpool module available, no parallel fetching)
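Since you'll want feedjack_update.py to run regularly (from cron, say), it's handy to wrap that environment setup. Here's a small Python sketch, assuming feedjack_update.py is on your PATH as in the transcript above; feedjack_env and run_update are my own hypothetical names. It derives both variables from the project directory: PYTHONPATH is the folder containing the project, and DJANGO_SETTINGS_MODULE is the project name plus ".settings".

```python
import os
import subprocess
from pathlib import PurePosixPath


def feedjack_env(project_dir, base_env=None):
    """Return a copy of the environment with what feedjack_update.py needs.

    PYTHONPATH is set to the directory *containing* the project, so that
    DJANGO_SETTINGS_MODULE ("<project>.settings") is importable.
    """
    project = PurePosixPath(project_dir)
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONPATH"] = str(project.parent)
    env["DJANGO_SETTINGS_MODULE"] = project.name + ".settings"
    return env


def run_update(project_dir):
    """Run feedjack_update.py (assumed to be on PATH); raises if it exits non-zero."""
    return subprocess.run(["feedjack_update.py"], env=feedjack_env(project_dir), check=True)
```

For the setup above, that's run_update("/home/jdarnold/django/ihieronymus"). Like the export commands, it replaces any existing PYTHONPATH rather than appending to it.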
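Oh, and you should probably get memcached running, as it really helps cut down on database access, and Feedjack makes good use of the cache. You can also install threadpool, as discussed on the Feedjack web site, which lets feedjack_update.py fetch feeds in parallel (hence the "no parallel fetching" note in the output above), but I haven't tried that yet.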
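For Django versions of this era, pointing Django at memcached is a one-line setting in settings.py. A sketch, assuming memcached is running locally on its default port (11211) and that a Python memcached client such as python-memcached is installed; later Django releases replaced this setting with the CACHES dictionary.

```python
# settings.py: Django 1.0/1.1-style cache setting.
# Assumes a local memcached on its default port, 11211.
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
```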
Next, of course, is the never-ending customization battle. First, though, I need to decide whether this is the path I want to go down. Feedjack comes with two "themes", neither of which is as expressive as I want, especially for my music feeds like blip.fm and last.fm. That goes double now that I've come across soup.io, which looks like it might already do all I need. Check out my page here: hieronymus.soup.io.
Posted by jdarnold at 09:36 PM