Developing knowledge base metrics (this is long!)
The support.mozilla.com knowledge base is visited every week by hundreds of thousands of users. Following where these users go on the site and which articles they find helpful gives us very good data about the issues our users are facing, as well as where we could improve articles to help more users. Towards that end, each troubleshooting article asks the following question, which users answer yes or no: "Did this article solve a problem you had with Firefox?"

Historical background

At first blush, it seems that counting the number of users who answer "yes" to this question for a given article would indicate how many people have a given problem. However, this gives the following top 10 for this past week:

Article                                     Yes votes
For Internet Explorer Users                 833
Installing Firefox On Windows               240
Keyboard Shortcuts                          205
Clearing Private Data                       167
Clearing Location Bar History               153
How To Make Firefox The Default Browser     133
Options Window                              127
Cookies                                     93
How To Set The Home Page                    69
Using Firefox                               68

As you can see, it appears that the vast majority of our users just want to read what we have to say about Internet Explorer or installing Firefox on Windows, even though forum posts and Hendrix posts suggest that users are really having trouble with things like the changes to the location bar and bookmarks UI, as well as bookmarks not being saved. A quick look at how people get to these pages explains why the stats are so skewed: the For Internet Explorer Users article is linked directly from the Firefox Help menu, and most of the other articles in the top 10 are linked from the in-product help page. Traffic to these pages is therefore much higher than to other KB articles, so they naturally collect more votes.

Development of current approach

To compensate for vote skewing based on how many users see a page, we would have to divide the number of votes by the number of pageviews.
Also, users who voted "No, this article didn't solve a problem" probably had the issue in question, but our support article wasn't able to address it. So to build a metric of which problems are most common, we should include those votes as well. We can build a votes metric that is Yes votes plus No votes. Dividing this total by pageviews gives the following list:

Installer Firefox Sur Windows
Firefox Unter Linux Installieren
Zoom De La Page
Installare Firefox Su Linux
Bloqueador De Popups
Informazioni Su JavaScript
Importation
Mistet Bokmerker
Innstillinger
Suggestions De Recherche

These are essentially all pages that were mis-linked from the English-language search results or weren't properly filed. They have so few pageviews that their numbers are artificially inflated.

We're going to have to factor the number of pageviews back into the equation somehow. Too many pages have next to no pageviews, and that completely skews the numbers. We seem to have come full circle.

However, Omniture is more powerful than this. It can also tell us where people go after they search or visit our front page or in-product front page. We can combine this data into a single statistic which I'll (for lack of a better term) call significant pageviews:

- Someone searching for a problem and landing on a KB article is worth 1 point.
- Someone clicking a link from the start page is worth 0.2 points, because they find that page when they go looking for Firefox help.
- Someone clicking through from the in-product page is worth only 0.1 points, since it's an easier page to get to and people may find it even when they don't have a specific problem.
- Basic pageviews are still important (links from outside the site are included here), so we factor those in at 0.01 points.

All of these factors can be adjusted; for now, these numbers serve for illustrative purposes.
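The significant-pageviews weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual SUMO reporting code: the `ArticleTraffic` record and its field names are hypothetical stand-ins for the per-channel counts pulled from Omniture, and it assumes the 0.01 term applies to an article's total pageviews (which would include outside links).

```python
from dataclasses import dataclass

# Channel weights from the post; they are illustrative and meant to be tuned.
SEARCH_WEIGHT = 1.0       # landed on the article from a search
START_PAGE_WEIGHT = 0.2   # clicked through from the support start page
IN_PRODUCT_WEIGHT = 0.1   # clicked through from the in-product help page
PAGEVIEW_WEIGHT = 0.01    # assumption: applied to all pageviews, incl. outside links


@dataclass
class ArticleTraffic:
    """Hypothetical per-article counts for one week (names are illustrative)."""
    title: str
    from_search: int
    from_start_page: int
    from_in_product: int
    total_pageviews: int


def significant_pageviews(a: ArticleTraffic) -> float:
    """Weighted sum of arrivals by channel, plus a small share of all pageviews."""
    return (SEARCH_WEIGHT * a.from_search
            + START_PAGE_WEIGHT * a.from_start_page
            + IN_PRODUCT_WEIGHT * a.from_in_product
            + PAGEVIEW_WEIGHT * a.total_pageviews)


# Made-up numbers, for illustration only.
example = ArticleTraffic("Example article", from_search=100, from_start_page=50,
                         from_in_product=200, total_pageviews=1000)
print(significant_pageviews(example))  # 100 + 10 + 20 + 10 = 140.0
```

Keeping the weights as named constants makes it easy to rerun the same counts with different factors.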
We can then build our final metric (score):

Article score = (Significant pageviews) * (Total votes) / (Total pageviews)

Ranking by article score gives the following:

Article                                        Score
Clearing Private Data                          270.8
No Sound In Firefox                            249.34
How To Clear Search Bar History                138.54
Clearing Location Bar History                  134.8
Hiding Bookmarks In The Smart Location Bar     125.2
Bookmarks                                      120.85
Bookmarks Not Saved                            118.73
Options Window                                 113.24
For Internet Explorer Users                    111.82
Organizing Bookmarks                           99.22

This is a lot better, although it's also apparent that it could use a lot more tweaking, since articles like For Internet Explorer Users are still overrepresented.

Distinguishing yes and no votes

Now, in the above scenarios we've merged yes votes and no votes to figure out which problems users face most. What if we wanted to answer a different question: which articles are most likely to solve the user's issue, and which are most likely to be a waste of users' time? With our new "significant pageviews" number, we can look at yes votes and no votes separately. Since a yes vote should count against a no vote and vice versa (an article with a large number of yes votes and a correspondingly large number of no votes shouldn't count more than an article with predominantly yes votes), we build the following metrics:

Most helpful articles: (Significant pageviews) * (Yes votes - No votes * 0.5) / (Total pageviews)
Least helpful articles: (Significant pageviews) * (No votes - Yes votes * 0.5) / (Total pageviews)

Astute observers may wonder why we don't just take the ratio of yes votes to no votes. The answer is that a ratio removes the significance of vote count entirely: an article with just one vote in one direction will dominate one with hundreds in both. This way of balancing votes is more informative without skewing pages with few votes too heavily, and it also has a nice symmetry with the previous scoring method.

This is the current status of the KB metrics collection.
Possible improvements

The primary improvement I would like to see is factoring in how people get to a certain page if they're not coming via search or a front page. If they're linking from other knowledge base articles or the forums, I'd like to score that as high as or higher than search results, whereas if they're coming from outside the site, I'd score that low. Unfortunately, while it is technically possible to distinguish between these cases using Omniture data, it is extremely labor intensive, because it involves downloading a report for every single knowledge base article. Until there's a simple way to get a dump of the entire Omniture data set each week, it'll be very difficult to produce data of this quality.

Another improvement is splitting the large, unwieldy knowledge base articles so that each one addresses a single issue. This way, we can pinpoint exactly which solutions are useful and which should be dropped. The Bookmarks article may rank highly because people aren't familiar with bookmarks, because they have a problem with some aspect of the bookmarks UI, or because they have lost their bookmarks. It's impossible to tell right now, but if we could split those up or somehow collect more detailed statistics, our data would be a lot better.

Adjusting factors. Since there is no conclusive independent ranking of knowledge base articles, it's impossible to know how accurately this dataset predicts which issues users will have. The factors may need to be adjusted to make the "significant pageviews" number more useful. The numbers used above were based on "best guess" estimates of the relative number of people coming via each channel who have the indicated issue with Firefox. Without hard facts or independent confirmation, that's the best we can do at this time. However, if you think the numbers should be significantly different, please say so. We can try all sorts of combinations.
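Trying different combinations mostly means re-ranking the same counts under a different set of weights. The sketch below assumes each article's counts are available as a simple tuple; the tuple layout, `rank_articles` and the two example articles are all hypothetical, and the scoring is the article score described above (significant pageviews times total votes, divided by total pageviews).

```python
from typing import Dict, List, Tuple

# Hypothetical layout of one article's weekly counts:
# (from_search, from_start_page, from_in_product, total_pageviews, total_votes)
Counts = Tuple[int, int, int, int, int]
# Weights in the order (search, start page, in-product, any pageview).
Weights = Tuple[float, float, float, float]


def rank_articles(data: Dict[str, Counts], weights: Weights) -> List[str]:
    """Return article titles sorted by article score, highest first."""
    w_search, w_start, w_in_product, w_pageview = weights

    def score(counts: Counts) -> float:
        search, start, in_product, total_pv, votes = counts
        sig_pv = (w_search * search + w_start * start
                  + w_in_product * in_product + w_pageview * total_pv)
        return sig_pv * votes / total_pv if total_pv else 0.0

    return sorted(data, key=lambda title: score(data[title]), reverse=True)


# Made-up data: A gets most traffic from the in-product page, B from search.
example = {
    "Article A": (10, 0, 500, 1000, 100),
    "Article B": (200, 50, 0, 400, 40),
}
print(rank_articles(example, (1.0, 0.2, 0.1, 0.01)))  # ['Article B', 'Article A']
print(rank_articles(example, (1.0, 0.2, 0.5, 0.01)))  # ['Article A', 'Article B']
```

Raising the in-product weight from 0.1 to 0.5 is enough to flip the order of these two made-up articles, which gives a quick feel for how sensitive a ranking is to any one factor.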
> Also, users who voted "No, this article didn't solve a problem" probably had the issue in question, but our support article wasn't able to address it. So to build a metric of which problems are most common, we should include those votes as well. We can build a votes metric that is Yes votes plus No votes.

This part could be clarified a bit. Basically, there are three main reasons why someone would vote No on the question "Did this article solve the problem you had with Firefox?":

a) The article correctly describes the user's problem, but the suggested solution(s) didn't solve it.
b) The article does not describe the user's problem
Hi, first of all, it's great to see someone look into metrics on SUMO. Could you explain a bit what questions you're actually trying to answer with those metrics? I probably just missed the memo.

I have a few technical issues, I have to admit.

- troubleshooting article
What's the definition of that? For example, For Internet Explorer Users seems to both qualify and not qualify in your analysis: on the one hand, it has the Yes/No buttons, but on the other hand you're unhappy when it pops up in the analysis. Not sure why it shouldn't.

- translated articles
There is one section where you come up with a bunch of article names that look like translations. For some metrics, translations and their en-US original should probably be counted as one, unless you're trying to evaluate translation quality for a particular page.

- statistical relevance
You're introducing a complex metric with adjustable weights, and one of the reasons you cited was to work around the noise of unpopular pages. There are statistical measures like variance for that. How about cutting off the data by the variance at some point and using a simple measure?

- weights in the complex measure
I didn't get those at all. For example, why would a user who finds the page he's looking for because we've set up a good navigation path be less significant than one coming from a search result?

- no answers
I think that "no" answers are different from "yes" answers. There are various steps from the problem to a troubleshooting article, and if you're looking at the wrong page, you might say "no". That is more an answer about search results or other navigation, though, and not so much an answer about whether the article gives a good answer to the problem it's about. Yeah, sorry, loads of "no"s.

Depending on which question you're trying to answer, it might be a nice separate project to analyse what users are searching for when they hit a troubleshooting article, or, even more, whether there are peaks in searches we don't have answers for, and the like.

Axel
Axel, thanks for your feedback. As I've said, there's a ton of room for improvement. If you have any suggestions for other data we can look at and how we can incorporate it, I'd really appreciate hearing about them.
> Hi, first of all, it's great to see someone look into metrics on SUMO. Could you explain a bit what questions you're actually trying to answer with those metrics? I probably just missed the memo.

The most basic question: which issues/problems with Firefox are our users facing most commonly? More specifically, we want data we can take to the greater Mozilla community as feedback on which issues are most prevalent among USERS (not technical folk who understand Bugzilla). There was no memo; it's my fault for not making the purpose of this project clear at the outset.

> I have a few technical issues, I have to admit.
> - troubleshooting article
> What's the definition of that? For example, For Internet Explorer Users seems to both qualify and not qualify in your analysis: on the one hand, it has the Yes/No buttons, but on the other hand you're unhappy when it pops up in the analysis. Not sure why it shouldn't.

I don't think it should qualify, because it doesn't fix a problem but provides information. In that sense, it doesn't actually help us identify issues with Firefox. More importantly, I highly doubt that many users come to support because they have a problem migrating from IE. It's more likely that they click on Help for IE users in the Firefox Help menu on Windows, see the poll at the bottom and just vote. Based on support questions in the forums and Live Chat, almost no one is looking for a feature from IE that they don't know how to get in Firefox.

> - translated articles
> There is one section where you come up with a bunch of article names that look like translations. For some metrics, translations and their en-US original should probably be counted as one, unless you're trying to evaluate translation quality for a particular page.

I'm working off article names, and none of my raw numbers track which English article each foreign-language article corresponds to.
It's much easier to filter on URL and drop hits on foreign-language pages altogether. However, this lets some article hits slip through because of various TikiWiki bugs, which is why you see a lot of articles with very few pageviews. More importantly, the point I was making in that section is that articles with pageviews = 1 but more than one vote (basically articles where a TikiWiki bug caused Omniture to get the locale in the URL tracking wrong) are going to dominate if we only consider votes per pageview, so we have to adjust for it. At that point, the statistic no longer measures how common certain issues are, but rather how _infrequently_ a given page is seen.

> - statistical relevance
> You're introducing a complex metric with adjustable weights, and one of the reasons you cited was to work around the noise of unpopular pages. There are statistical measures like variance for that. How about cutting off the data by the variance at some point and using a simple measure?

I'm not sure how variance accounts for this. On what metric would I calculate variance? This isn't an issue of statistical or random noise; it's a problem with user flow. For example, the Options Window article and the For Internet Explorer Users article can be accessed directly from Firefox, without the user having to go via the main support site or having a specific problem in mind. While we assume that users who vote on a page have attempted the solution on that page, that doesn't necessarily apply to pages with no explicit problem or explicit solutions. We have to divide out these huge anomalies. As for statistical relevance, this is data mining, and it's always possible to over-interpret data and assign it more relevance than it actually has. The most concrete conclusion you can draw is which articles represent the problems users are most likely to face or look for, in a rough, ranked sense.
Just because one article is ranked 12 and another 13 doesn't mean another metric wouldn't rank them the other way around, but the issue addressed in either of them is probably less common than the one behind the article ranked 2 and more common than the one behind the article ranked 50.

> - weights in the complex measure
> I didn't get those at all. For example, why would a user who finds the page he's looking for because we've set up a good navigation path be less significant than one coming from a search result?

Because users aren't that good at picking the right article. Without doing a search, they're not exposed to the full list of possible articles. Users who just click one of the ten articles linked from the front page are less likely to have picked, from the pool of hundreds, the article that best describes their problem. They may have picked the best match from a list of ten, but that's not as informative. Users who go to the in-product help page are less likely to need help than users who went to Google or the Mozilla site, searched for Firefox support and got to our main support site, so in-product users are weighted even lower.

> - no answers
> I think that "no" answers are different from "yes" answers. There are various steps from the problem to a troubleshooting article, and if you're looking at the wrong page, you might say "no". That is more an answer about search results or other navigation, though, and not so much an answer about whether the article gives a good answer to the problem it's about. Yeah, sorry, loads of "no"s.

Djst answered this question above. If the article didn't solve the problem but the page is correct, or at least the user strongly suspects the page will be helpful, they probably have the issue in the title or some close variant, and they should be counted. Our KB is not comprehensive, and we often don't include solutions for specific issues if they're deemed too technical or if another solution is much more likely to work.
People who have no better article for their problem and vote no _should_ be counted. People for whom there is a better article but who found the wrong one _should not_ be counted. We hope that with redirection inside articles ("if you have XYZ, see this article instead"), users are more likely to find and vote on the article that best describes their problem, so no votes should carry significant weight (perhaps not as much as a yes vote, but definitely more than half).

> Depending on which question you're trying to answer, it might be a nice separate project to analyse what users are searching for when they hit a troubleshooting article, or, even more, whether there are peaks in searches we don't have answers for, and the like.

I agree; looking at search terms and the paths users take to find things was actually my first idea for collecting metrics. Unfortunately, Omniture doesn't track that (it chops off all the parameters passed to a page in the URL), and the Google search-term tracking doesn't provide much data: just a list of the top ten search terms, with no counts, timeline or target pages. Looking at where users go after they search is currently the best hard statistic we've got, which is why I gave it the strongest weight.

> Axel

Thanks for your feedback and, as I've said, there's a ton of room for improvement. If you have any suggestions for other data we can look at and how we can incorporate it, I'd really appreciate hearing about them.

_______________________________________________
support-planning mailing list
https://lists.mozilla.org/listinfo/support-planning

This might be a bit off from the information we're currently trying to gather, but I'd love to see stats like "users who visited article X were most likely to click yes on article Y". This is along the lines of studying the paths users take. In theory, X should always equal Y, but if lots of people are ending up on one article and then following it to another, we'd want to look at why and see about getting people to the correct article first.
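One way to get at "visited X, voted yes on Y" is to count, within each visit, which other articles a user had already viewed before casting a yes vote. This is only a sketch: it assumes per-session click paths with votes attached are available, which the current Omniture reports may not readily provide. The event format, `redirect_counts` and the example sessions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical event: (article_title, vote), where vote is "yes", "no" or None
# if the user viewed the article without voting.
Event = Tuple[str, Optional[str]]


def redirect_counts(sessions: List[List[Event]]) -> Dict[str, Counter]:
    """For each article X, count articles Y != X that users voted "yes" on
    after viewing X earlier in the same session."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for events in sessions:
        seen = set()  # titles viewed so far in this session
        for title, vote in events:
            if vote == "yes":
                for earlier in seen:
                    if earlier != title:
                        counts[earlier][title] += 1
            seen.add(title)
    return dict(counts)


# Made-up sessions, for illustration only.
sessions = [
    [("Bookmarks", None), ("Bookmarks Not Saved", "yes")],
    [("Bookmarks", "no"), ("Bookmarks Not Saved", "yes")],
    [("Bookmarks", "yes")],
]
result = redirect_counts(sessions)
print(result["Bookmarks"].most_common(1))  # [('Bookmarks Not Saved', 2)]
```

Large counts for a pair (X, Y) would point at articles where users tend to land first but end up being helped elsewhere.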