User:Econterms/Report from WikiSym/OpenSym 2013
Draft blog about WikiSym/OpenSym 2013
WikiSym is an annual conference on academic research about wikis and other kinds of open collaboration. As in past years, some of the research was fascinating. This time I happily identified myself on my name badge as a member of Wikimedia DC. Here are some topics and findings I found interesting. Most of the full papers are linked from the conference proceedings, online here.
- Sources referred to by online encyclopedias
- We saw an analysis of the sources cited in footnotes in English Wikipedia. Scholarly publications are cited less often than in a traditional encyclopedia. Large fractions of references are to primary sources, and to "alternative" publishers, governments, and nonprofits. The authors also commented on the geography of the sources with respect to the global South. Heather Ford, David R. Musicant, Shilad Sen, Nathaniel Miller: (the paper, online)
- The giant Chinese online encyclopedia Baidu Baike (BB) is interestingly similar to the Chinese Wikipedia, but there is a spectrum of differences. For example, submissions to Baidu Baike are reviewed by Baidu employees before they appear, and different content from the two sites has been blocked, censored, or removed at different times. Han-Teng Liao is writing a dissertation comparing the two. He showed tables of the sources each one cites. BB seems to include a lot of text copied from Wikipedia, and both have many copyright violations. An abstract of his work is available, as are earlier findings from these comparisons. Inspired by learning about BB, I asked students in our Wikimania-arranged dorm about the two and got interesting and varied answers. This is a rich subject.
More notes: users of the Chinese Wikipedia (CW) often openly declare their political leanings on their user pages, ranging from pro-communist to pro-Falun Gong to Han supremacist to favoring Turkmen independence. BB has no power users from Hong Kong or Taiwan; CW has many. BB uses a single (simplified) character set, I believe he said, while CW allows several. On the most-cited sources: he can list the most-cited websites for each encyclopedia. Many are "book review" sites and spam websites ("book review" seems to mean a kind of spam site; this was not clear). CW cites "bioinfo.cn" very frequently; that is a biology site, not a biographical one. Judging by the numbers of citations, BB seems to be much larger.
- Data versioning and open access
- Think about what a version control system for datasets should be like. It differs from a source code version control system because, for example, data sets may be very large and may change in so many places from version to version that comparing versions is impractical. Sowe and Zettsu implemented a way of "curating" data sets with a wiki that points to the data. Here "data curation" means collecting, tending, organizing, validating, annotating, and preserving data for reuse and sharing. They implement their "model" on a MediaWiki installation: a description of the data (the "metadata") lives on the wiki and links to the data itself, and the individuals doing the curation have wiki histories and reputations. They have implemented this for their laboratory's disaster-response research, which can use diverse kinds of data sets: weather, industry, geospatial, satellite, population, media, and others.
- Computational biologist Philip Bourne spoke in a plenary session on the challenges of open science. He described an experiment in which a PLOS publication also went straight into Wikipedia. He discussed how a scientific paper could or should be associated with easy access to its data and to executable versions of its statistical analysis and graphs. This subject came up at other times during the conference. It implies a set of steps beyond open data, toward open and reusable data and analysis. We're not close to making this easy to implement; it's a bit like making a movie for each scientific paper, one that also includes its footnotes. He is the co-founder and founding Editor-in-Chief of the open access journal PLOS Computational Biology, which, he said, is publishing 30,000 articles this year and is by this measure the largest academic journal in the world. (Bourne's slides)
- Australian law professor Anne Fitzgerald explained recently adopted licensing rules for the data and publications of the Australian government's statistics and geography agencies. After careful review, she and others on a committee recommended against adopting a public-domain rule (like the U.S. government's) and in favor of a Creative Commons Attribution-NonCommercial license (CC BY-NC). If I understood correctly, this was meant to help the government control commercial sale of its publications. Open access and copyright on government work were also actively discussed and debated at WikiSym and at Wikimania. For more, follow up here.
- "Reverts" on Wikipedia: these are edits on Wikipedia that undo a string of previous edits
- Geiger and Halfaker analyze the sources of reverts on Wikipedia. Most reverts are meant to maintain quality against vandalism and errors. The authors show that ClueBotNG is the quickest and most active mechanism, usually acting against vandalism within 20 seconds if it acts at all, and they discuss the spectrum of other bots, tools, and human behaviors that cause reverts. ClueBotNG was down several times for days in 2011, and they analyze how many reverts occurred in those periods. In essence, they conclude that the same quality control was exercised during those periods, but more slowly, and they discuss how much more slowly. Paper: http://opensym.org/wsos2013/proceedings/p0200-geiger.pdf
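To make the definition of a revert concrete, here is a minimal Python sketch of one common way to operationalize it: an "identity revert", where an edit restores a page's text to exactly match an earlier revision, and every edit in between counts as reverted. This is an illustration only, not necessarily the method Geiger and Halfaker used. The function name `find_identity_reverts`, the use of SHA-1 checksums, and the 15-revision search window are assumptions chosen for the example.

```python
import hashlib
from typing import List, Tuple


def sha1(text: str) -> str:
    """Checksum of one revision's full text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_identity_reverts(
    revisions: List[str], radius: int = 15
) -> List[Tuple[int, int, List[int]]]:
    """Find identity reverts in a page history (oldest revision first).

    Returns (reverting, reverted_to, reverted) tuples: `reverting` is the
    index of an edit whose text exactly matches the earlier revision
    `reverted_to`, and `reverted` lists the edits in between that it undid.
    The default radius (how far back to look) is an illustrative assumption.
    """
    checksums = [sha1(t) for t in revisions]
    reverts = []
    for i, current in enumerate(checksums):
        start = max(0, i - radius)
        # Scan backwards, skipping i-1: matching the immediately preceding
        # revision would be a no-op edit, not a revert.
        for j in range(i - 2, start - 1, -1):
            if checksums[j] == current:
                reverts.append((i, j, list(range(j + 1, i))))
                break  # use the most recent matching revision
    return reverts


history = ["A", "A B", "A B vandalism", "A B"]
print(find_identity_reverts(history))
```

Running it prints `[(3, 1, [2])]`: revision 3 restores revision 1, undoing the vandalism in revision 2. Comparing checksums rather than full texts keeps the comparison cheap on long page histories.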