Advisare-0.02 is out.
Please get it at SourceForge as usual. I’ve completely rewritten the source code and added better documentation and demonstration files. The architecture is now object-based and, hopefully, the next release will be fully object-oriented.
Eddy Young says
Why not subclass SGMLParser for parsing the HTML code? Also, urllib allows you to grab the HTML directly from the website.
I’ve posted some code that extracts links from Google News. I would contribute on SourceForge, but I’m busy with another project. You can get an idea from my GoogleNewsParser code, though.
http://coding.mu/archives/2004/03/27/google_news_parser_in_python
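For readers who want to try the subclassing idea, here is a minimal sketch of a link extractor. It is not the GoogleNewsParser code linked above. It assumes Python 3, where sgmllib (and with it SGMLParser) no longer exists, so it subclasses html.parser.HTMLParser instead. The names LinkParser and extract_links are made up for illustration.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the list of link targets found in an HTML string."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

The HTML itself could be fetched with urllib.request.urlopen(url).read() and decoded to a string before being handed to extract_links.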
avinash says
I am not too keen on using SGMLParser because the HTML I get from Canal Satellite is really badly formed, and it would be a pain to process if it is not tidied beforehand.
And, of course, as soon as it is tidied, I can use a proper XML parser (like expat) and it makes my code easier to write and to maintain.
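As a rough illustration of the expat step, the sketch below reads already-tidied, well-formed XML with Python's xml.parsers.expat module. The element name programme and its attributes are assumptions made up for this example; the real structure depends on what the tidied Canal Satellite pages look like.

```python
import xml.parsers.expat


def parse_programmes(xml_text):
    """Return one dict of attributes per <programme> element (hypothetical layout)."""
    programmes = []

    def start_element(name, attrs):
        # expat passes the attributes as a plain dict by default.
        if name == "programme":
            programmes.append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return programmes
```

Because expat raises an error on malformed input, this only works once the HTML has been tidied into well-formed XML, which is the point made above.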
The difficult part is the second part of the program. I am thinking about reading the XML produced and building a web of objects with semantic links between them (like same category, same duration, same channel…).
Then, using a proper graph traversal algorithm, the application should be able to propose a TV program based on what the user has seen before.
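One possible shape for this idea, sketched under several assumptions: each programme is a dict with a title plus category, duration and channel keys, two programmes are linked when they share any of those values, and the traversal is a plain breadth-first search starting from what the user has already seen. This is not how Advisare works today; the names build_graph and recommend are hypothetical.

```python
from collections import deque
from itertools import combinations

# Attributes that create a semantic link when two programmes share a value.
LINK_KEYS = ("category", "duration", "channel")


def build_graph(programmes):
    """Return {title: set of titles} linking programmes that share a LINK_KEYS value."""
    graph = {p["title"]: set() for p in programmes}
    for a, b in combinations(programmes, 2):
        if any(a.get(k) is not None and a.get(k) == b.get(k) for k in LINK_KEYS):
            graph[a["title"]].add(b["title"])
            graph[b["title"]].add(a["title"])
    return graph


def recommend(graph, seen, limit=3):
    """Breadth-first walk from the seen titles; return unseen titles, nearest first."""
    visited = set(seen)
    queue = deque(t for t in seen if t in graph)
    suggestions = []
    while queue and len(suggestions) < limit:
        current = queue.popleft()
        for neighbour in sorted(graph[current]):  # sorted for a stable order
            if neighbour not in visited:
                visited.add(neighbour)
                suggestions.append(neighbour)
                queue.append(neighbour)
    return suggestions[:limit]
```

Breadth-first order means directly linked programmes are proposed before programmes that are only related through an intermediate one. Weighting the links (for example, counting how many attributes two programmes share) would be a natural refinement.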
Eddy Young says
Uhm, this is getting interesting. Maybe I should contribute some code after all :-)