Chad Orzel writes about The Journal of Stuff I Like. Basically, the idea is that you're reading papers anyway; just make a list of the ones you found worthwhile and put it where people can read.
This certainly seems like it could be a useful thing. If you liked one paper I liked, you might like other papers I like.
The dual, of course, would be to compile a list of "people who liked this paper". From there it's a short step to "people who liked this paper also liked...", and, if there's some centralized system keeping track of things, to amazon.com-style recommendations, which have on occasion found me books that I liked but wouldn't have known about. (It's a short step conceptually; I'll admit that I don't know how big a step it is computationally.) But even without these extra steps, a simple list could still be worthwhile.
I might start a list. Like I said, it wouldn't be hard. I already have a list of papers I've read, which is reasonably complete over the time period I've been keeping it.
17 September 2008
9 comments:
Keep it all in a database. It shouldn't be too hard computationally to establish a list of "People who liked this also liked...". A short little SQL query should get you the list. When the dataset gets very large it might take some time, but I can't see that being much of a problem.
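As a rough sketch of what such a query might look like, here is one way to do it with Python's built-in sqlite3 module. The table name `likes`, its columns `user_id` and `paper_id`, and the sample users are all made up for illustration; nothing in the thread specifies a schema.

```python
import sqlite3

def also_liked(conn, paper_id, limit=10):
    """Return (paper_id, count) pairs for papers liked by people who liked paper_id."""
    # Self-join: `mine` picks out the people who liked the given paper,
    # `other` picks out everything else those same people liked.
    query = """
        SELECT other.paper_id, COUNT(*) AS n
        FROM likes AS mine
        JOIN likes AS other ON other.user_id = mine.user_id
        WHERE mine.paper_id = ? AND other.paper_id <> ?
        GROUP BY other.paper_id
        ORDER BY n DESC, other.paper_id ASC
        LIMIT ?
    """
    return conn.execute(query, (paper_id, paper_id, limit)).fetchall()

# Hypothetical schema and data, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likes (user_id TEXT, paper_id INTEGER, "
    "PRIMARY KEY (user_id, paper_id))"
)
conn.executemany("INSERT INTO likes VALUES (?, ?)", [
    ("alice", 3), ("alice", 37), ("alice", 43),
    ("bob", 37), ("bob", 43),
    ("carol", 3), ("carol", 375),
])

print(also_liked(conn, 43))  # [(37, 2), (3, 1)]
```

The primary key on (user_id, paper_id) also keeps a user from liking the same paper twice; an additional index on paper_id would probably help once the table gets large.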
OR, give each paper an ID#. When someone adds a paper to their favorites, use the papers they already like to update that paper's "also liked" list. For example, say I like papers 3, 37, and 375. When I add paper 43 to my favorites list, check the ones I liked (3, 37, 375) against the list of papers that people who like #43 liked. If one isn't on the list yet, add it. Then it's just a matter of pulling up the list. Using the ID numbers will make it much quicker to test whether a paper is already in the list, because, if the list is kept sorted, we can do a binary search (about log2(N) comparisons) instead of basically having to go all through it looking for the title.
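A minimal sketch of that bookkeeping, assuming everything lives in memory and the lists are kept sorted so that Python's bisect module can do the binary search. The names `favorites`, `also_liked_index`, and `add_favorite` are invented for this example, and updating the lists in both directions (43's list gets 3, and 3's list gets 43) is an assumption that goes slightly beyond what the comment describes.

```python
from bisect import bisect_left
from collections import defaultdict

favorites = defaultdict(list)         # user -> sorted list of paper IDs
also_liked_index = defaultdict(list)  # paper ID -> sorted list of paper IDs

def _add_sorted(ids, paper_id):
    """Insert paper_id into the sorted list ids unless it is already there."""
    i = bisect_left(ids, paper_id)    # binary search for the position
    if i == len(ids) or ids[i] != paper_id:
        ids.insert(i, paper_id)

def add_favorite(user, paper_id):
    """Record that user likes paper_id and update the 'also liked' lists."""
    for other in favorites[user]:
        if other != paper_id:
            _add_sorted(also_liked_index[paper_id], other)
            _add_sorted(also_liked_index[other], paper_id)
    _add_sorted(favorites[user], paper_id)

for paper in (3, 37, 375, 43):
    add_favorite("me", paper)

print(also_liked_index[43])  # [3, 37, 375]
```

The membership test is logarithmic, but inserting into the middle of a Python list still shifts the elements after it, so a set or a database index may be the simpler choice in practice.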
I hope that made sense. This is an interesting concept.
Sean,
yes, that sounds reasonable. I'm used to thinking about algorithms, but not so much to actually implementing them.
As it is, if I'm going to do this I'd just start a simple web page; if someone else comes up with a decent way to connect lists like this I'd use it, but I'm not going to invent that wheel.
Yah, I like to think about processes, but I like to implement them too, in some cases. This one doesn't seem too difficult. The hard thing would be somehow getting the papers into the database so that you can effectively refer to them.
And it seems like a short digression from this to Reddit/Google PageRank-style (or even Amazon-style "x stars") rating of papers.
All of it could be made available through Google Scholar (just as an example).
I guess, however, that my record of papers I've enjoyed (or used/liked/whatever) is basically represented by the (somewhat ordered) LIFO stack I keep on my desk :-).
andy,
unfortunately the rest of us don't have access to the stack of papers on your desk.
As one of the comments on the article you link to notes: CiteULike has much of this functionality already.
I've been using it for a while now to collect and organise papers I've read. Although it doesn't do any collaborative filtering, you can easily find other people who have read the same articles as you and then search through their bibliographies.
There is a rating that can be given to papers but it seems to be used as a measure of priority rather than how much the article was liked. A list of papers a CiteULike user liked could be implemented by creating a "Stuff I Like" group and adding articles to that or by using a "Stuff I Like" tag.
Rather than trying to build the infrastructure for a new site it might be better to ask CiteULike if they would be interested in supporting this functionality more directly. Alternatively, one could build a site that uses the data from CiteULike as a basis for the collaborative filtering.
On the contrary, doing recommendations is absurdly hard to scale. Listing people that liked something is O(n) at best, listing everything _they_ liked is O(n^2). If you want to go one step further and do recommendations for a person rather than an article, you're doing things-(people-who-liked-(things-you-liked)-liked), which is O(n^3)!
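The three nested steps (things you liked, people who liked them, things those people liked) can be sketched as three nested loops. This is only an illustration under assumed names (`build_indexes`, `recommend`, and the sample data are all made up); the actual work done depends on how many likes each person and each paper has, so the cubic growth is a worst case rather than a given.

```python
from collections import Counter, defaultdict

def build_indexes(likes):
    """Build user -> papers and paper -> users lookups from (user, paper) pairs."""
    liked_by_user = defaultdict(set)
    users_by_paper = defaultdict(set)
    for user, paper in likes:
        liked_by_user[user].add(paper)
        users_by_paper[paper].add(user)
    return liked_by_user, users_by_paper

def recommend(user, liked_by_user, users_by_paper, top=5):
    """Score papers by how often they co-occur with the user's own likes."""
    mine = liked_by_user.get(user, set())
    scores = Counter()
    for paper in mine:                            # things you liked
        for other in users_by_paper[paper]:       # people who liked them
            if other == user:
                continue
            for candidate in liked_by_user[other]:  # things they liked
                if candidate not in mine:
                    scores[candidate] += 1
    # Highest score first; ties broken by paper ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

# Hypothetical data.
likes = [
    ("me", 3), ("me", 37),
    ("alice", 3), ("alice", 37), ("alice", 43),
    ("bob", 37), ("bob", 375),
    ("carol", 99),
]
liked_by_user, users_by_paper = build_indexes(likes)
print(recommend("me", liked_by_user, users_by_paper))  # [(43, 2), (375, 1)]
```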
I would second Mark for CiteULike. I've been using it more for listing my books lately than posting links to physics/maths preprints.
Well, I agree, but I think the list should have more info than it has.