User:Econterms/Report from WikiSym/OpenSym 2013

From Wikimedia District of Columbia

Draft blog about WikiSym/OpenSym 2013

WikiSym is an annual conference on academic research about wikis and other kinds of open collaboration. As in past years, some of the research was fascinating. This time I happily identified myself as a member of Wikimedia DC on my name badge. Here are some topics and findings I found interesting. Most of the full papers are linked from the conference proceedings, online here.

Sources referred to by online encyclopedias
  • We saw an analysis of the sources cited in footnotes on English Wikipedia. Scholarly publications are cited less than in a traditional encyclopedia. Large fractions of the references are to primary sources, and to sources from "alternative" publishers, governments, and nonprofits. The authors also commented on global South geography. Heather Ford, David R. Musicant, Shilad Sen, Nathaniel Miller: (the paper, online)
  • The giant Chinese online encyclopedia, Baidu Baike (BB), is interestingly similar to the Chinese Wikipedia, but there is a spectrum of differences. For example, submissions to Baidu Baike are reviewed by Baidu employees before they appear, and different content from the two sites has been blocked, censored, and removed at different times. Han-Teng Liao is writing a dissertation comparing the two. He showed tables of the sources each one cites. BB seems to include a lot of text copied from Wikipedia. Both have a lot of copyright violations. Here is an abstract: [1]. Earlier findings from these comparisons: [2] and [3]. Inspired by learning about BB, I asked students in our Wikimania-arranged dorm about the two and got interesting and different answers. This is a rich subject.
Data versioning and open access
  • Think about what a version control system for datasets should be like. It differs from a source code version control system because, for example, data sets may be very large and may change in so many places from version to version that they are too hard to compare realistically. Sowe and Zettsu implemented a way of "curating" data sets with a wiki that points to the data. Here "data curation" means collecting, tending, organizing, validating, annotating, and preserving data for reuse and sharing. They implement their "model" on a MediaWiki in which a description of the data ("metadata") is on the wiki and links to the data itself, and the individuals doing this have wiki-histories and reputations. They've implemented this for their laboratory's disaster-response research, which can use diverse kinds of data sets: weather, industry, geospatial, satellite, population, media, and others.
  • Computational biologist Philip Bourne spoke in a plenary session on the challenges of open science. He described an experiment in making a PLOS publication that also went straight to Wikipedia. He discussed how a scientific paper could or should be associated with easy access to its data and to executable versions of its statistical analysis and graphs. This subject came up at other times during the conference. It implies a set of steps beyond open data, toward open and reusable data and analysis. We're not close to making this easy to implement; it's a bit like making a movie for each scientific paper, which also includes its footnotes. He is the co-founder and founding Editor-in-Chief of the open access journal PLOS Computational Biology which, he said, is publishing 30,000 articles this year and is by this measure the largest academic journal in the world. (Bourne's slides)
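To make the wiki-that-points-to-data idea from Sowe and Zettsu's talk a bit more concrete, here is a minimal sketch of what one metadata record might hold. This is my own illustration, not their implementation: the `DatasetRecord` fields, the `example.org` URL, and the use of a SHA-256 checksum to notice that a large file has changed (without diffing it) are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical wiki-side description ("metadata") of an external data set."""
    title: str
    data_url: str   # where the data itself lives (illustrative URL below)
    checksum: str   # fingerprint of the version the curator described
    curator: str    # wiki user who gains history and reputation from the edit
    notes: list = field(default_factory=list)


def checksum_bytes(payload: bytes) -> str:
    """Fingerprint a data file's bytes so versions can be told apart cheaply."""
    return hashlib.sha256(payload).hexdigest()


def has_changed(record: DatasetRecord, payload: bytes) -> bool:
    """True if the data no longer matches the version described on the wiki."""
    return record.checksum != checksum_bytes(payload)


original = b"station,rain_mm\nA,12\nB,7\n"
record = DatasetRecord(
    title="Rainfall by station",
    data_url="https://example.org/data/rainfall.csv",  # assumption, not a real source
    checksum=checksum_bytes(original),
    curator="ExampleCurator",
)

updated = original + b"C,3\n"
print(has_changed(record, original))  # same bytes as described
print(has_changed(record, updated))   # a row was appended
```

The point of the sketch is only that the wiki page stays small and human-editable while the data stays wherever it is; the checksum is one possible way to tie a description to a specific version.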
"Reverts" on Wikipedia -- these are edits on Wikipedia that undo a string of previous edits
  • Geiger and Halfaker analyze the sources of "reverts" on Wikipedia. Most reverts are designed to maintain quality against vandalism and errors. The authors show that ClueBot NG is the quickest and most active mechanism -- usually acting against vandalism within 20 seconds if it will act at all -- and discuss the spectrum of other bots, tools, and human behaviors that cause reverts. ClueBot NG was down several times for days in 2011, and they analyze how many reverts occurred in those periods. They conclude in essence that the same quality control was exercised in those periods, but more slowly, and they discuss how slowly. http://opensym.org/wsos2013/proceedings/p0200-geiger.pdf
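For readers wondering how a revert can be spotted mechanically, one common approach is to look for an "identity revert": a revision whose text is exactly the same as some earlier revision, which means everything in between was undone. The sketch below is my own illustration of that idea and is not taken from the paper; the function name `find_identity_reverts` and the `radius` look-back limit are assumptions.

```python
import hashlib


def find_identity_reverts(revision_texts, radius=15):
    """Return (reverting_index, restored_index) pairs for a page history.

    A revision counts as a revert here if its text is identical to an
    earlier revision at most `radius` revisions back, with at least one
    other revision in between (the edits that were undone).
    """
    digests = [hashlib.sha1(text.encode("utf-8")).hexdigest()
               for text in revision_texts]
    reverts = []
    for i, digest in enumerate(digests):
        # Walk backwards, skipping the immediate predecessor: matching it
        # would be a null edit that undoes nothing.
        for j in range(i - 2, max(i - radius, 0) - 1, -1):
            if digests[j] == digest:
                reverts.append((i, j))
                break
    return reverts


history = [
    "The cat sat.",
    "The cat sat on the mat.",
    "LOL VANDALISM",
    "The cat sat on the mat.",   # restores revision 1, undoing revision 2
    "The cat sat on the mat. It purred.",
]
print(find_identity_reverts(history))
```

Comparing checksums rather than full text keeps this cheap even for long pages. It will miss partial reverts, where someone removes vandalism by hand without restoring an exact earlier version.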