What do you want in an ideal Newsing system? Submit your ideas here.
forum
posted by lxpk Tue, 2007-08-21 19:02
Groups: DrupalDev Project: Newsing Project Tags:
What do you want in an ideal Newsing system? Submit your ideas here.






Taxonomy of Article Categories
Redundant!
We need at least the top tier of categories for the news system so we can tag stories well. I recommend we start with just the top level, without breaking it down further yet, to help us get moving.
People
Business
Government
Technology
Maybe candy should be its own category.
Media
Nature
Internet
Here's what I mean by posting it as a single comment and editing
I think business and
You can edit your original
You can edit your original comment instead of adding more.
Each comment can be like a mini web page onto itself.
Individualized Karma
"Basically, you could register with the system that you like or dislike a certain post (article, comment, etc), and you could register that you like or dislike a certain person. Your ratings on stories/comments count more for people who like you and less for people who dislike you, and the ratings of people you like count more for you, and those of those you dislike count less for you.
So posts by people who are my friends, or friends of my friends, or friends of their friends, and so on, appear "modded up" to me (less so for greater degrees of separation); as do the posts which those people mod up (and again, a friend of a friend of a friend's mod points count less, for my view of the site, than my close friends' mod points, which count still less than my own). Likewise, my own down-mods make things practically vanish from my view of the site, but the down-mods of my friends' friends' friends only slightly lower a post's score. And conversely, my enemies' ratings count for little; my friends' enemies' ratings count for even less, and so on.
Now that I think about it, I'm not sure that there should be any down-modding at all. Let all users and posts have a default score of 0; if you like it, it gets a +1 in your view, if someone you like likes it it gets +0.5 in your view, if someone they like likes it it gets a +0.25 in your view, etc; and then individual users can set their preferred signal-to-noise threshold anywhere above 0. Maybe weight and normalize it somehow so that all scores fall in the 0-1 range (that way things you personally like always show up).
For users who aren't logged in (if such people will be able to see anything at all), I'd recommend having a Guest user account which is automatically friends with everyone, thus the overall most popular stuff shows up by default. If you register and log in, you can tailor the kind of news and comments you want to see. (It just occurred to me that this would also allow one site to be completely free-range in its topics, and for readers to still see only the news that interests them; so tech people don't have to be bothered by sports news and vice versa).
The only real downside to this I can see is that it would tend to encourage groupthink for most people. However I like to mark intelligent, interesting people as my friends, even when I don't agree with them. If I have an interesting debate with them, they're on the friends list. So, people who want to avoid groupthink will still see dissenting viewpoints. Also, people who like people like that will tend to see dissenting viewpoints as well; I may be popular with a bunch of libertarians and popular with a bunch of socialists, and even though those two groups might otherwise never mark each other as friends, since they're both friends with me, I'll bridge the gap between the two groups, and some socialists will see some things that my libertarian friends like, and some libertarians will see some things that my socialist friends like."
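For anyone who wants to poke at the numbers, here is a minimal sketch of the up-mod-only variant from the quote above (+1 for your own like, +0.5 for someone you like, +0.25 for someone they like, and so on). The names friend_weights, personal_score and max_depth are made up for illustration, and the friend graph is a plain dict rather than anything Drupal provides. Normalizing to the 0-1 range is left out.

```python
from collections import deque

def friend_weights(likes, viewer, max_depth=3):
    """Breadth-first walk of the 'likes' graph starting at the viewer.

    The viewer gets weight 1.0, people the viewer likes get 0.5,
    people they like get 0.25, and so on, halving per degree of separation.
    max_depth is an assumed cut-off so the walk stays bounded.
    """
    weights = {viewer: 1.0}
    queue = deque([(viewer, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue
        for friend in likes.get(user, ()):
            if friend not in weights:  # first visit is the shortest path
                weights[friend] = 0.5 ** (depth + 1)
                queue.append((friend, depth + 1))
    return weights

def personal_score(upvoters, weights):
    """Sum the weights of everyone who liked the post; strangers add 0."""
    return sum(weights.get(user, 0.0) for user in upvoters)

# Hypothetical data: "me" likes ann, ann likes bob, bob likes cat.
likes = {"me": ["ann"], "ann": ["bob"], "bob": ["cat"]}
weights = friend_weights(likes, "me")
score = personal_score({"me", "bob"}, weights)  # my own like plus bob's
```

Each viewer needs their own walk of the graph, which is where the cost discussed further down the thread comes from.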
That's pretty awesome
Do you have the skill to implement something like this? Friend distance and the like? What kinda specialist skills will you need to get going on it?
Also, as a fan of invasions and trolling I'd recommend against giving the guest account sociometric notations, 'cause it opens the entire site up for manipulation without requiring a login. All you'd have to do is hammer a few nodes from a few dozen IP addys and suddenly those nodes would be important and at the top of every list for everyone.
But me is noob so maybe not?
Update/Revision
When comment vote ups are
When comment vote ups are worth gold stars, you guys are going to make bank on these.
making friends by modding: StumbleUpon and Last.fm make friends lists for you based on your shared site/music taste, that's a great idea.
Threads: Drupal already has unlimited threading. One way to improve it would be threading of nodes under other nodes in a way that's more transparent and easy-to-use than book outlining. (Long story, but that's how the menu navigation works for the main site content.)
Performance nightmare
Even in the simplest scenario, the queries required to do that are going to be pretty crazy. Adding features like non-linear rating and normalization is going to make it much worse, because all the data that needs to be fetched and all the calculations made will have to be done once for each user, and many of them will be done again to get the final result. Ratings from people not on my friends list will be a must, and that can't even be optimized with an index.
Drupal already runs slow enough, and that type of rating system will run slowly regardless of the software used.
Users who like this $FOO also liked...
Basically, all articles start out with a score of 1, and can be modded up or down exactly once by each user. Each user has a relationship value (by default 1) with every other user. When two users mod an article the same way, their relationship value goes up by some factor, and when they mod it differently their relationship value goes down by the same factor (e.g. like mods double your relationship value, unlike mods halve it). The final moderation value a given user sees is not the raw score of that article (upmods minus downmods), but the sum of upmods and downmods times the relationship values of the moderators. And every user of course has a personally chosen threshold, where they only see posts with a relative score (that is, the score as they see it, factoring in their relationships) higher than their threshold. The end result being, you can have one giant forum covering every topic under the sun, with as many trolls and crap as you want, and your moderation of articles as good or bad will influence how likely you are to see or not see more articles like that, calculated on the basis that people who liked articles you liked also liked these articles.
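A rough sketch of the relationship update described above, assuming the doubling/halving example and a symmetric relationship between each pair of users. FACTOR, pair and record_mod are illustrative names, and the dicts stand in for whatever storage the site would really use.

```python
FACTOR = 2.0  # from the example: like mods double, unlike mods halve

def pair(a, b):
    """Order-independent key, assuming relationships are symmetric."""
    return tuple(sorted((a, b)))

def record_mod(relationships, votes, voter, vote):
    """Store voter's +1/-1 on one article and update their relationship
    with everyone who modded that article before them."""
    if vote not in (1, -1):
        raise ValueError("vote must be +1 or -1")
    if voter in votes:
        raise ValueError("each user may mod an article only once")
    for other, other_vote in votes.items():
        key = pair(voter, other)
        current = relationships.get(key, 1.0)  # default relationship is 1
        if other_vote == vote:
            relationships[key] = current * FACTOR
        else:
            relationships[key] = current / FACTOR
    votes[voter] = vote

# Hypothetical users modding a single article.
relationships = {}
article_votes = {}
record_mod(relationships, article_votes, "ann", 1)
record_mod(relationships, article_votes, "bob", 1)   # agrees with ann
record_mod(relationships, article_votes, "cat", -1)  # disagrees with both
```

After those three mods, ann and bob have a relationship of 2.0, and cat has 0.5 with each of them.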
[my friend asked about handling of topics]
Well I was mostly thinking, like, people on Slashdot sometimes complain that the site has so much politics now, getting all up in their tech news. With my moderation system, they'd see all the politics topics as modded below their threshold. If this were implemented in a general news "blog" (ugh I hate that word), then the news topics you see would be the ones you've shown the most interest in in the past; and the guest-user would see the news topics that people in general find most interesting (i.e. things modded by their raw scores). And related topics would automatically sort of group together. Like, if you're only really interested in computer tech news, so you only upmod computer tech articles... but most people interested in computer tech are also interested in general science articles... then you'll see some science creep into your news, since most tech people like science news too. And if you in fact don't like science news and mod it down, that will weaken your relationship to the tech people who also like science news, and you'll see less science news.
End of transcript.
I was thinking earlier about your concerns about efficiency, and wondering how this might be implemented more efficiently, and this idea came to me:
Rather than just a numeric value associated with each article being its raw score, each article has a table associated with it, with a list of the people who've modded it in one column and a +1 or a -1 in the other. The sum of those +1s and -1s (plus the initial 1) is the article's raw score. Then, each user has associated with it a table of people who've modded the same articles as him, and their associated relationship values. The final, relative score a user sees is the sum of the raw moderations times any relationship values with matching usernames (ergo if nobody I have any non-neutral relationship with has modded this article, I just see its raw score).
Would that be efficient enough? It gets rid of the friends-of-friends-of-friends issue, has an underlying raw score similar to how I understand Digg works (which would be what guest users would see), and still accomplishes filtering the articles a user sees by his past taste in articles.
And if it's not efficient enough, I'll just ask again: how do Amazon et al do it?
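The two tables from this comment can be sketched in a few lines, assuming each article's mods and each viewer's relationships fit in plain dicts. raw_score and relative_score are hypothetical names, and anyone missing from the viewer's table is treated as a neutral 1.

```python
def raw_score(article_votes):
    """Initial score of 1 plus the sum of all +1/-1 mods (what a guest sees)."""
    return 1 + sum(article_votes.values())

def relative_score(article_votes, viewer_relationships):
    """Each mod is multiplied by the viewer's relationship value with the
    moderator; moderators absent from the viewer's table count as 1."""
    return 1 + sum(
        vote * viewer_relationships.get(user, 1.0)
        for user, vote in article_votes.items()
    )

# Hypothetical article table: user -> +1 or -1.
article_votes = {"ann": 1, "bob": 1, "cat": -1}

# Hypothetical viewer table: the viewer tends to agree with ann, not with cat.
viewer_relationships = {"ann": 2.0, "cat": 0.5}

guest_view = raw_score(article_votes)
my_view = relative_score(article_votes, viewer_relationships)
```

With an empty relationship table the relative score falls back to the raw score, which is the behaviour described for guests. The work per article is one pass over its mod list, with no friends-of-friends walk.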
How Amazon does it
There are several reasons why it is much easier for Amazon to do this than for us to do it.
1. Amazon's system isn't very helpful. I always get better results by searching.
2. As you mentioned, much smaller amount of data per user. Few people write reviews on Amazon. Perhaps a better example would be Newegg. Their recommendations appear to be based on shopping carts only, and their ratings are a simple average of all ratings.
3. Vastly superior hardware.
4. Their entire system is much more streamlined. A basic Drupal site, fresh out of the box, will lag on my fastest computer (2 GHz, 3 GB RAM), even viewed locally.
Basically, the performance issue is a "nightmare", not "impossible." Once our budget per user gets anywhere near Amazon's, it will be doable, but still not particularly fast. As a comparison, I can open 10 Amazon windows in under 20 seconds. Opening 5 empowerment windows takes about 2 minutes.
Since the desired result is better usability of the site through increasing the quality of viewable information, we are better off at this point just watching what we post and making sure we don't drown in our own spam. For a site with 10 million users and hundreds of thousands, if not millions, of dollars in funding, what you're talking about is pretty much dead on, but for 10 users and no funding, it isn't going to happen. Yet.
implementation
"Rather than just a numeric value associated with each article being its raw score, each article has table associated with it, with a list of the people who've modded it in one column and a +1 or a -1 in the other..."
That's pretty much what I was thinking, only you could do without the -1s. That's what I was calling the simplest case. Performance could be helped by moving the weighting calculation off site. That is, another server grabs the viewer's weights and all the ratings for that comment, then does the math, and Drupal gets the results, thus bypassing the massive overhead of Drupal. Then for each comment, the second server does 2 queries and Drupal does one (to the other server). That wouldn't break anything, but it would also be linear in nature, meaning that it would still need a little math done to make the ratings useful, though that could also be done on a different server.
What I don't know is how well Drupal can utilize that kind of information when it isn't built into the node/comment's weight.
Logarithmic Scoring
That is simple to
That is simple to implement.
The data required are
x=the total positive ratings
m=the total positive ratings of the highest rated item
f=the groupthink factor, from 1 to 0 with 1 being linear and smaller numbers favoring items with few ratings
w=the weight to apply to the item with the highest rating. 20 is the highest that makes any sense in relation to the Drupal weighting system. This is the "cap"
the formula is
w(x/m)^f
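As a quick sanity check, the formula above can be written as a small function. scaled_weight is a made-up name, and the default cap of 20 is taken from the note on w. As written the formula is a power of x/m rather than a literal logarithm, so f=1 gives a straight linear scale and smaller f lifts items with few ratings.

```python
def scaled_weight(x, m, f, w=20):
    """Compute w * (x / m) ** f.

    x: total positive ratings of this item
    m: total positive ratings of the highest rated item
    f: groupthink factor, between 0 and 1 (1 is linear)
    w: weight given to the highest rated item (the cap)
    """
    if m <= 0:
        raise ValueError("m must be positive")
    if not 0 <= f <= 1:
        raise ValueError("f must be between 0 and 1")
    return w * (x / m) ** f

# An item with 25 positive ratings when the top item has 100.
linear = scaled_weight(25, 100, f=1)      # 20 * 0.25
softened = scaled_weight(25, 100, f=0.5)  # 20 * sqrt(0.25)
top = scaled_weight(100, 100, f=0.5)      # the top item always gets the cap
```

Here the linear weight is 5.0, the softened one is 10.0, and the top item gets 20.0 whatever f is.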
Jellybeans should have their own category
In a word--Filters
My ideal newsing system would allow me to filter things for search and storage by a wide variety of filters.
And if we're gonna get really IDEAL about it, maybe the filters could include regular expressions.
It would just help if you could sort out mainstream media vs. indy, left vs. right, categorize links to blogs, forums, wikis, etc. separately, so they could be included in the results or omitted, as determined by your filter preferences.
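To make that concrete, here is a small sketch of such a filter, assuming each news item is a dict with a "kind" (blog, forum, wiki, mainstream, ...) and a "title". The field names and make_filter are invented for the example; the regular expression matching uses Python's standard re module.

```python
import re

def make_filter(include_kinds=None, exclude_kinds=(), title_pattern=None):
    """Build a predicate from a user's filter preferences.

    include_kinds: if given, only these kinds pass
    exclude_kinds: kinds that never pass
    title_pattern: optional regular expression the title must match
    """
    regex = re.compile(title_pattern, re.IGNORECASE) if title_pattern else None

    def keep(item):
        if item["kind"] in exclude_kinds:
            return False
        if include_kinds is not None and item["kind"] not in include_kinds:
            return False
        if regex is not None and not regex.search(item["title"]):
            return False
        return True

    return keep

# Hypothetical items and preferences.
items = [
    {"title": "Drupal performance tips", "kind": "blog"},
    {"title": "Election coverage roundup", "kind": "mainstream"},
    {"title": "Drupal threading question", "kind": "forum"},
]
keep = make_filter(exclude_kinds={"forum"}, title_pattern=r"drupal")
matches = [item["title"] for item in items if keep(item)]
```

With these preferences only the blog item survives: the forum post is excluded by kind and the mainstream story fails the pattern.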
Just my two cents. When it comes to news, I like to hear all the sides I can, but there are frequently an overwhelming number of sides--thus, any tools and tricks we can develop to sort, compare, and rate our information and its sources are good news in my book.
-cid