What do you want in an ideal Newsing system? Submit your ideas here.


Groups: DrupalDev Project: Newsing Project Tags:

Taxonomy of Article Categories

Redundant! 

 

We need at least the top tier of categories for the news system so we can tag stories well.  I recommend we start with just the top level, without breaking it down yet, to help us get moving.

 

 

People

News articles about individual people.

Business

News items that have to do with a business directly.  Press releases, standards and mergers.

Government

News stories about new and old laws, embargos, political comings and goings, and other things having to do with federal entities.

Technology

News about technology and its implementation.  Also, candy.

Maybe candy should be its own category.

A Wider Perspective On Flavor

Media

News articles about artists, publishers, distro agents, and any other media.

Nature

Hurricanes, earthquakes, comets, and solar flares.  Also, birds and trees and other things that should be purged to ensure homo sapien superiority.

Internet

News about the comings and goings of the social networks that make up the internet.

Here's what I mean by posting it as a single comment and editing

  • Taxonomy of article categories
    • We need at least the top tier of categories for the news system so we can tag stories well.  I recommend we start with just the top level, without breaking it down yet, to help us get moving.
    • Media
      • News articles about artists, publishers, distro agents, and any other media.
    • Technology
      • News about technology and its implementation.  Also, candy.
    • Government
      • News stories about new and old laws, embargos, political comings and goings, and other things having to do with federal entities.
    • Business
      • News items that have to do with a business directly.  Press releases, standards and mergers.
    • People
      • News articles about individual people.

I think business and

I think business and government should be the same thing amirite?

You can edit your original

You can edit your original comment instead of adding more.

Each comment can be like a mini web page onto itself.

Individualized Karma

Hello everyone,

Alex and I were just chatting on IM about this newsing system here, and the relative merits of existing sites like Slashdot and Digg, and I mentioned an idea I had a while back that he asked me to post here. Here's a slightly cleaned up and elaborated version of what I said:

"Basically, you could register with the system that you like or dislike a certain post (article, comment, etc), and you could register that you like or dislike a certain person. Your ratings on stories/comments count more for people who like you and less for people who dislike you, and the ratings of people you like count more for you, and those of those you dislike count less for you.


So posts by people who are my friends, or friends of my friends, or friends of their friends, and so on, appear "modded up" to me (less so for greater degrees of separation); as do the posts which those people mod up (and again, a friend of a friend of a friend's mod points count less, for my view of the site, than my close friends' mod points, which count still less than my own). Likewise, my own down-mods make things practically vanish from my view of the site, but the down-mods of my friends' friends' friends only slightly lower a post's score. And conversely, my enemies' ratings count for little; my friends' enemies' ratings count for even less, and so on.


Now that I think about it, I'm not sure that there should be any down-modding at all. Let all users and posts have a default score of 0; if you like it, it gets a +1 in your view, if someone you like likes it it gets +0.5 in your view, if someone they like likes it it gets a +0.25 in your view, etc; and then individual users can set their preferred signal to noise threshold anywhere above 0. Maybe weight and normalize it somehow so that all scores fall in the 0-1 range (that way things you personally like always show up).


For users who aren't logged in (if such people will be able to see anything at all), I'd recommend having a Guest user account which is automatically friends with everyone, thus the overall most popular stuff shows up by default. If you register and log in, you can tailor the kind of news and comments you want to see. (It just occurred to me that this would also allow one site to be completely free-range in its topics, and for readers to still see only the news that interests them; so tech people don't have to be bothered by sports news and vice versa).


The only real downside to this I can see is that it would tend to encourage groupthink for most people. However I like to mark intelligent, interesting people as my friends, even when I don't agree with them. If I have an interesting debate with them, they're on the friends list. So, people who want to avoid groupthink will still see dissenting viewpoints. Also, people who like people like that will tend to see dissenting viewpoints as well; I may be popular with a bunch of libertarians and popular with a bunch of socialists, and even though those two groups might otherwise never mark each other as friends, since they're both friends with me, I'll bridge the gap between the two groups, and some socialists will see some things that my libertarian friends like, and some libertarians will see some things that my socialist friends like."
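For anyone who wants to poke at the numbers, here is a rough Python sketch of the up-mod-only variant quoted above: my own like counts +1, a friend's like +0.5, a friend-of-a-friend's +0.25, and so on. The names (`likes`, `hop_distances`, `personal_score`) are made up for illustration, and taking the shortest chain of friends as the "degree of separation" is an assumption; the quote doesn't say how to handle multiple paths.

```python
from collections import deque

def hop_distances(likes, viewer):
    """Breadth-first search: fewest 'likes' hops from viewer to each reachable user."""
    dist = {viewer: 0}
    queue = deque([viewer])
    while queue:
        user = queue.popleft()
        for friend in likes.get(user, ()):
            if friend not in dist:
                dist[friend] = dist[user] + 1
                queue.append(friend)
    return dist

def personal_score(post_likers, likes, viewer):
    """Sum 0.5**hops over everyone who liked the post; strangers contribute nothing."""
    dist = hop_distances(likes, viewer)
    return sum(0.5 ** dist[user] for user in post_likers if user in dist)

# Hypothetical friend graph: me -> ann -> bob
likes = {"me": ["ann"], "ann": ["bob"], "bob": []}
print(personal_score({"me", "ann", "bob"}, likes, "me"))
```

Note that this walks the whole friend graph for every viewer, which is exactly the cost raised later in the thread.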

That's pretty awesome

Do you have the skill to implement something like this?  Friend distance and the like?  What kinda specialist skills will you need to get going on it?

 

Also, as a fan of invasions and trolling I'd recommend against giving the guest account sociometric notations, 'cause it opens the entire site up for manipulation without requiring a login.  All you'd have to do is hammer a few nodes from a few dozen IP addys and suddenly those nodes would be important and at the top of every list for everyone.

 

But me is noob so maybe not? 

Update/Revision

I couldn't code my way out of Hello World (though I can do predicate logic in my sleep). So, I can't help at all with the implementation of this, sorry. (Story of my life.. all design, no implementation).

As for the Guest account, I wasn't suggesting that it should have any ability to post or to rate other posts or other accounts. Rather, I was just saying that guest users on the main page, rather than just seeing all posts with the exact same score (which would be a very low signal-to-noise ratio for a large popular site; which, interestingly, would begin to naturally limit its popularity), should see the "average" moderation of all posts, and an easy way to accomplish this is to have every user on the system automatically considered a friend by the guest account (thus, the opinions of all users are accounted for in how Guest sees things modded, and the opinions of users who are liked by lots of other users weigh even heavier). The converse of course should not be the case, though, because if it were then everybody would be at most two hops away from everybody else, being a friend of their friend "Guest".

Hmm... now that I think about it, something like this would probably have to carry over to registered users too, otherwise as soon as you registered an account, but hadn't yet marked anyone as your friend, all posts on the system would have the exact same score, and your signal-to-noise ratio would plummet. If you just made all new accounts automatically friends with everyone, then the all-positive system becomes a much more negative one, as you customize your view by marking people as no longer your friends. Though I suppose your s:n ratio would quickly go back up as you up-modded posts that you like. Ah, hmm... maybe having all users automatically friends with Guest is a good idea (the "converse" I just argued against at the end of the last paragraph), that way new accounts still see a 50% diluted effect of popular opinion on the moderation in their view (since they're not directly friends with everybody, but one of their friends, Guest, is). People who don't want popular opinion to have any sway over how things get moderated in their view can always make Guest no longer their friend.

This somehow reminds me of the second half of the idea, which I had forgotten earlier. I'm even less sure how to go about implementing this (and I'm not thinking very clearly right now since it sounds like something is exploding in my living room), but I was also thinking... what if people's relationship with you didn't just influence how much their moderation impacts you, but their moderation automatically impacted their friendship status with you?

Ah yes, now I remember this... instead of explicitly marking someone as a Friend or Foe, you only mod articles. People who modded an article the same direction as you get their relationship value to you strengthened. So if I found this article interesting and up-mod it, then everyone who agrees with me (by also up-modding it) becomes slightly more my Friend in the eyes of the system; and thus, articles that they also like get up-modded more in my view. I imagine that this is much how sites like Amazon.com do recommendations: "people who bought this product also bought...". (Anecdotally, I recently received a special email from Amazon about some special collection of artsy gay/lesbian movies they have, which I presume I received because I bought both the Rocky Horror Picture Show and Hedwig And The Angry Inch from them, which of course must mean I'm gay since they're both psychedelic musicals starring a transvestite or transsexual, and only gay men like musicals ;) ). So, you're more likely to see articles that people who share your interests like.

And on an entirely different subject, something that goes without mention to me but apparently others disagree: it should be completely threaded, with no arbitrary depth limit. Replies to posts just get nested under whoever they're replying to, let threads trail out as far off as they want to. (And collapse all threads which are entirely below the user's designated threshold, so they don't take up page space). Slashcode and WebBBS are good examples (here's a WebBBS forum that I admin for an example: http://forums.bungie.org/story/), though both of them get some funny formatting issues when threads get way too long. Slashcode I think handles it better.
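To make the "collapse threads that are entirely below threshold" rule concrete, here's a small Python sketch. It is only an illustration: the `Comment` class and the `best_score` / `render` helpers are invented for this example and have nothing to do with how Slashcode, WebBBS or Drupal actually store comments. A thread is collapsed to a single stub line only when neither the comment nor anything nested under it reaches the reader's threshold.

```python
from dataclasses import dataclass, field

@dataclass
class Comment:
    author: str
    score: float
    replies: list = field(default_factory=list)

def best_score(comment):
    """Highest score anywhere in this comment's thread (itself included)."""
    return max([comment.score] + [best_score(r) for r in comment.replies])

def render(comment, threshold, depth=0):
    """Return indented lines; a thread entirely below threshold becomes one stub line."""
    indent = "  " * depth
    if best_score(comment) < threshold:
        return [f"{indent}[collapsed thread by {comment.author}]"]
    lines = [f"{indent}{comment.author} ({comment.score})"]
    for reply in comment.replies:
        lines.extend(render(reply, threshold, depth + 1))
    return lines

# Hypothetical thread: "b" and its reply are all low-scored, "d" has one good reply.
thread = Comment("a", 3, [
    Comment("b", 0, [Comment("c", 0)]),
    Comment("d", 0, [Comment("e", 5)]),
])
print("\n".join(render(thread, threshold=1)))
```

A low-scored comment with a good reply under it ("d" here) stays visible so the reply keeps its context. The recursion has no depth limit of its own, and `best_score` is recomputed at every level, which is fine for a sketch but would want caching on long threads.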

When comment vote ups are

When comment vote ups are worth gold stars, you guys are going to make bank on these.

making friends by modding: StumbleUpon and Last.fm make friends lists for you based on your shared site/music taste, that's a great idea. 

Threads: Drupal is already unlimited threading. One way to improve it would be threading of nodes under other nodes in a way that's more transparent and easy-to-use than book outlining. (long story but that's how the menu navigation works for the main site content). 

 

Performance nightmare

Even in the simplest scenario, the queries required to do that are going to be pretty crazy. Adding features like non-linear rating and normalization is going to make it much worse, because all the data that needs to be fetched and all the calculations made will have to be done once for each user, and many of them will be done again to get the final result. Ratings from people not on my friends list will be a must, and that can't even be optimized with an index.

 

Drupal already runs slow enough, and that type of rating system will run slowly regardless of the software used.

 

Users who like this $FOO also liked...

I'm curious, how do places like Amazon do their recommendations, then? (That is, their "users who bought this product also bought..."). Or is that just more feasible because there are fewer purchases per user on a store than there are posts per user on a forum/blog?

I was just describing this idea to another friend of mine, and I think I'm thinking a little more clearly today and described it a bit more straightforwardly than last time. Here is a transcript:

Basically, all articles start out with a score of 1, and can be modded up or down exactly once by each user. Each user has a relationship value (by default 1) with every other user. When two users mod an article the same way, their relationship value goes up by some factor, and when they mod it differently their relationship value goes down by the same factor (e.g. like mods double your relationship value, unlike mods halve it). The final moderation value a given user sees is not the raw score of that article (upmods minus downmods), but the sum of upmods and downmods times the relationship values of the moderators. And every user of course has a personally chosen threshold, where they only see posts with a relative score (that is, the score as they see it, factoring in their relationships) higher than their threshold. The end result being, you can have one giant forum covering every topic under the sun, with as many trolls and crap as you want, and your moderation of articles as good or bad will influence how likely you are to see or not see more articles like that, calculated on the basis that people who liked articles you liked also liked these articles.


[my friend asked about handling of topics]


Well I was mostly thinking, like, people on Slashdot sometimes complain that the site has so much politics now, getting all up in their tech news. With my moderation system, they'd see all the politics topics as modded below their threshold. If this were implemented in a general news "blog" (ugh I hate that word), then the news topics you see would be the ones you've shown the most interest in in the past; and the guest-user would see the news topics that people in general find most interesting (i.e. things modded by their raw scores). And related topics would automatically sort of group together. Like, if you're only really interested in computer tech news, so you only upmod computer tech articles... but most people interested in computer tech are also interested in general science articles... then you'll see some science creep into your news, since most tech people like science news too. And if you in fact don't like science news and mod it down, that will weaken your relationship to the tech people who also like science news, and you'll see less science news.


End of transcript.
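Here's a rough Python sketch of the rules in the transcript, in case it helps anyone reason about them. Everything named here (`Moderation`, `mod`, `relative_score`, `FACTOR`) is invented for illustration; the factor of 2 is the doubling/halving example from the transcript, and counting the viewer's own mod at weight 1 is an assumption.

```python
from collections import defaultdict

FACTOR = 2.0  # like mods double the relationship value, unlike mods halve it

class Moderation:
    def __init__(self):
        # article -> {user: +1 or -1}
        self.votes = defaultdict(dict)
        # viewer -> other user -> relationship value (default 1)
        self.relation = defaultdict(lambda: defaultdict(lambda: 1.0))

    def mod(self, user, article, direction):
        """Record one up (+1) or down (-1) mod and update relationships."""
        if direction not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        if user in self.votes[article]:
            raise ValueError("each user may mod an article only once")
        for other, other_direction in self.votes[article].items():
            factor = FACTOR if other_direction == direction else 1 / FACTOR
            self.relation[user][other] *= factor
            self.relation[other][user] *= factor
        self.votes[article][user] = direction

    def relative_score(self, viewer, article):
        """Starting score of 1 plus each mod weighted by the viewer's relationship."""
        return 1 + sum(direction * self.relation[viewer][user]
                       for user, direction in self.votes[article].items())

m = Moderation()
m.mod("a", "x", 1)
m.mod("b", "x", 1)    # a and b agree: relationship 1 -> 2
m.mod("c", "x", -1)   # c disagrees with both: relationships 1 -> 0.5
m.mod("b", "y", 1)
print(m.relative_score("a", "y"), m.relative_score("c", "y"))
```

A brand-new user has relationship 1 with everybody, so they just see the raw score until they start modding.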


I was thinking earlier about your concerns about efficiency, and wondering how this might be implemented more efficiently, and this idea came to me:


Rather than just a numeric value associated with each article being its raw score, each article has a table associated with it, with a list of the people who've modded it in one column and a +1 or a -1 in the other. The sum of those +1s and -1s (plus the initial 1) is the article's raw score. Then, each user has associated with it a table of people who've modded the same articles as him, and their associated relationship values. The final, relative score a user sees is the sum of the raw moderations times any relationship values with matching usernames (ergo if nobody I have any non-neutral relationship with has modded this article, I just see its raw score).
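As a way of seeing what the queries would look like, here is a sketch of that two-table layout using Python's built-in sqlite3 module. This is an assumption-heavy illustration, not a Drupal schema: the table and column names (`mods`, `relations`, `moderator`, `viewer`, `other`) are made up, and one shared table keyed by article stands in for "a table per article". Moderators with no stored relationship fall back to the neutral value 1, so a viewer with no relationships sees the raw score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mods (
    article   TEXT,
    moderator TEXT,
    value     INTEGER,          -- +1 or -1
    PRIMARY KEY (article, moderator)
);
CREATE TABLE relations (
    viewer TEXT,
    other  TEXT,
    value  REAL,                -- only non-neutral relationships are stored
    PRIMARY KEY (viewer, other)
);
""")
conn.executemany("INSERT INTO mods VALUES (?, ?, ?)",
                 [("a1", "ann", 1), ("a1", "bob", 1), ("a1", "cat", -1)])
conn.executemany("INSERT INTO relations VALUES (?, ?, ?)",
                 [("me", "ann", 2.0), ("me", "cat", 0.5)])

def raw_score(article):
    """Initial 1 plus the plain sum of +1s and -1s (what a guest would see)."""
    row = conn.execute(
        "SELECT 1 + COALESCE(SUM(value), 0) FROM mods WHERE article = ?",
        (article,)).fetchone()
    return row[0]

def relative_score(viewer, article):
    """Initial 1 plus each mod times the viewer's relationship (1 if none stored)."""
    row = conn.execute("""
        SELECT 1 + COALESCE(SUM(m.value * COALESCE(r.value, 1.0)), 0)
        FROM mods AS m
        LEFT JOIN relations AS r
               ON r.viewer = ? AND r.other = m.moderator
        WHERE m.article = ?""", (viewer, article)).fetchone()
    return row[0]

print(raw_score("a1"), relative_score("me", "a1"))
```

That is one join per article per viewer, with both tables indexed by their primary keys. Whether that is fast enough on a real site is a separate question this sketch doesn't answer.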


Would that be efficient enough? It gets rid of the friends-of-friends-of-friends issue, has an underlying raw score similar to how I understand Digg works (which would be what guest users would see), and still accomplishes filtering the articles a user sees by his past taste in articles.


And if it's not efficient enough, I'll just ask again: how do Amazon et al do it?

How amazon does it

There are several reasons why it is much easier for amazon to do this than for us to do it. 1. Amazon's system isn't very helpful. I always get better results by searching. 2. As you mentioned, much smaller amount of data per user. Few people write reviews on amazon. Perhaps a better example would be newegg. Their recommendations appear to be based on shopping carts only, and their ratings are a simple average of all ratings. 3. Vastly superior hardware. 4. Their entire system is much more streamlined. A basic drupal site, fresh out of the box, will lag on my fastest computer (2 GHz, 3 GB RAM), even viewed locally.

 

Basically, the performance issue is a "nightmare" not "impossible." Once our budget per user gets anywhere near amazon's, it will be doable, but still not particularly fast. As a comparison, I can open 10 amazon windows in under 20 seconds. Opening 5 empowerment windows takes about 2 minutes.

 

Since the desired result is better usability of the site through increasing the quality of viewable information, we are better off at this point just watching what we post and making sure we don't drown in our own spam. For a site with 10 million users and hundreds of thousands of dollars in funding, if not millions, what you're talking about is pretty much dead on, but for 10 users and no funding, it isn't going to happen. Yet.

implementation

"Rather than just a numeric value associated with each article being its raw score, each article has table associated with it, with a list of the people who've modded it in one column and a +1 or a -1 in the other..."

That's pretty much what I was thinking, only you could do without the  -1s. That's what I was calling the simplest case. Performance could be helped by moving the weighting calculation off site. That is, another server grabs the viewer's weights and all the ratings for that comment, then does the math, and drupal gets the results, thus bypassing the massive overhead of drupal. Then for each comment, a second server does 2 queries and drupal does one (to the other server). That wouldn't break anything, but it would also be linear in nature, meaning that it would still have to have a little math done to make the ratings useful, though that could also be done on a different server.

What I don't know is how well drupal can utilize that kind of information when it isn't built into the node/comment's weight. 

Logarithmic Scoring

Slashcode limits comments to a score of at most +5. I thought there was something about this in their FAQ, but failing to find it now, I'll just cite from memory that it was something to do with keeping groupthink from modding some posts through the roof, and also to keep people from wasting mod points on things that are already highly rated.

I'd like to suggest that since the system I've been promoting presumes that anybody can upmod any post exactly once (and can upmod as many posts as they feel like), if the things that prompted Slashdot's 5-point cap are of concern, then the final score used to determine whether a post is above a user's threshold should be a logarithmic function of the actual, raw number of points that post has. That is, adding more to the raw score of something which is already scored very highly has less impact than adding more to the raw score of a less highly rated post. For example, if you have posts A, B, and C, with C having 10x the raw score of B which in turn has 10x the raw score of A, then while B should appear nearly 10x as highly rated as A, C shouldn't appear as 10x the score of B.

Of course, if all posts are weighted or normalized somehow to a 0-1 range (which I'm still not quite sure on how to do; things are still exploding here), then this may not be necessary at all, but it seems a better solution than an arbitrary cap like Slashdot has.
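A tiny Python sketch of the logarithmic idea, purely as an illustration: `displayed_score` is a made-up name, and base 10 with a +1 offset (so that zero raw points maps to zero) is one arbitrary choice among many. With a plain logarithm every tenfold jump in raw points adds the same fixed amount to the displayed score, so piling more up-mods onto an already popular post moves it less and less.

```python
import math

def displayed_score(raw_points):
    """Compress a raw up-mod count logarithmically; 0 raw points stays at 0."""
    return math.log10(1 + raw_points)

def visible(raw_points, threshold):
    """A post shows up when its compressed score reaches the reader's threshold."""
    return displayed_score(raw_points) >= threshold

for raw in (9, 99, 999):
    print(raw, round(displayed_score(raw), 3))
```

Unlike a hard cap, nothing is ever pinned at a maximum; a post with more raw points always scores at least a little higher.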

That is simple to

That is simple to implement.

The data required are

x=the total positive ratings

m=the total positive ratings of the highest rated item

f=the groupthink factor, from 1 to 0 with 1 being linear and smaller numbers favoring items with few ratings

w=the weight to apply to the item with the highest rating. 20 is the highest that makes any sense in relation to the drupal weighting system. This is the "cap"

 

the formula is

w(x/m)^f
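For reference, the formula above written out as a Python function. The function name and the default values (f = 0.5, w = 20) are just illustrative picks, and returning 0 when nothing has been rated yet is an assumption to avoid dividing by zero.

```python
def capped_weight(x, m, f=0.5, w=20):
    """w * (x / m) ** f: x = item's positive ratings, m = ratings of the top item,
    f = groupthink factor (1 is linear, smaller favors lightly rated items),
    w = weight given to the top item (the "cap")."""
    if m <= 0:
        return 0.0  # assumed: no ratings anywhere yet
    return w * (x / m) ** f

print(capped_weight(100, 100))       # top item gets the full cap
print(capped_weight(25, 100))        # f = 0.5
print(capped_weight(25, 100, f=1))   # linear
```

With f = 0.5 an item with a quarter of the top item's ratings still gets half of the cap, where the linear version gives it only a quarter.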

Jellybeans should have their own category

Because of their duality as a candy and a bean, they deserve their own special placement. Not many foodstuffs can make this claim, much less back it up!

In a word--Filters

My ideal newsing system would allow me to filter things for search and storage by a wide variety of filters.

 

And if we're gonna get really IDEAL about it, maybe the filters could include regular expressions.

 

It would just help if you could sort out mainstream media vs. indy, left vs. right, categorize links to blogs, forums, wikis, etc. separately, so they could be included in the results or omitted, as determined by your filter preferences.
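Something like the following Python sketch is the kind of filter I take this to mean, strictly as an illustration: the `Item` fields, the source-type labels and the `make_filter` helper are all invented here, and a real system would need the articles tagged with a source type in the first place.

```python
import re
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    source_type: str  # e.g. "mainstream", "indy", "blog", "forum", "wiki"

def make_filter(include_types=None, exclude_types=(), title_pattern=None):
    """Build a predicate from filter preferences; title_pattern is a regular expression."""
    regex = re.compile(title_pattern, re.IGNORECASE) if title_pattern else None

    def keep(item):
        if include_types is not None and item.source_type not in include_types:
            return False
        if item.source_type in exclude_types:
            return False
        if regex is not None and not regex.search(item.title):
            return False
        return True

    return keep

items = [
    Item("Drupal 6 released", "blog"),
    Item("Senate passes bill", "mainstream"),
    Item("drupal modules thread", "forum"),
]
keep = make_filter(exclude_types={"forum"}, title_pattern=r"\bdrupal\b")
print([item.title for item in items if keep(item)])
```

The same predicate could be saved per user and applied to both search results and stored items.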

 

Just my two cents. When it comes to news, I like to hear all the sides I can, but there are frequently an overwhelming number of sides--thus, any tools and tricks we can develop to sort, compare, and rate our information and its sources are good news in my book.

 

-cid
