User:James Hare/Wikiparty

Wikiparty—a workflow for edit-a-thons and editor training metrics

A request for proposal.

Background
Since October 2013, Wikimedia DC has collected minimal information from those attending its edit-a-thons—workshops dedicated to editing Wikipedia and learning how it works. Only Wikipedia usernames are collected; no other piece of personally identifiable information is collected. We collect this information to determine the outcome of a particular edit-a-thon, but also for long-term analysis of who shows up at our edit-a-thons and whether a participant continues to edit after our edit-a-thons. This information collection apparatus is new, and only basic analysis of event outcomes and attendance has been done, but even this basic information has been illuminating and has potential for future research.

The most effective way to collect usernames is to ask each participant directly, since not everyone signs up on the event page beforehand and a posted link is unlikely to be clicked. In practice, a volunteer passes around an iPad with a form loaded on the screen and prompts each participant to fill it out. The form asks for the participant's username and whether he or she created an account in the past week, along with other statistical questions; at various events these have asked about affiliation with the host institution, gender, whether the participant was a student in a class working with Wikimedia DC (i.e., Andrew Lih's class at American University), and where they heard about the event. For compliance purposes, the form also asks whether the participant consents to having their Wikipedia activity analyzed.

Behind the form is a back-end of separate components that do not communicate with each other, so making the data useful requires significant human intervention. Each event's form is associated with its own spreadsheet in Google Drive. These form responses are then integrated into a central spreadsheet, which includes fields for a unique ID, the participant's username, whether the participant was a newcomer at a Wikimedia DC event, whether they consented to having their edits analyzed, their record of attendance at events, and specific buckets (such as whether the participant was a student in Andrew Lih's class). This central spreadsheet is prepared by hand. Participant usernames are analyzed with tools such as Wikimetrics and Herding Sheep, and entering data into each of these tools is also done by hand.

There is, simply, no reason for these functions to be separate tools. Wikimedia DC requests the creation of a unified web app that brings them together, including data collection, Wikipedia API calls, and basic statistical analysis. This app would be coupled with a streamlined edit-a-thon workflow that would simplify research at edit-a-thons and make it possible for almost anyone to carry out this kind of research.

Accounts and groups
This tool is intended to store data for Wikimedia DC and for any other groups and organizations that use it. Data should be associated with groups (e.g. Wikimedia DC), and groups should be associated with users (e.g. James Hare). That is, the data belongs to the group, and users belong to groups. This gives each user individual login credentials while allowing other members to take over a project should any individual user depart.
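A minimal sketch of the ownership rule, assuming a simple in-memory model: `Group`, `members`, and `event_records` are hypothetical names, and `event_records` merely stands in for whatever data the group has collected. Because records hang off the group rather than off a user, removing a user revokes access without removing any data.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Group:
    """A group (e.g. a chapter) that owns everything collected at its events."""
    name: str
    members: Set[str] = field(default_factory=set)           # Wikimedia usernames
    event_records: List[str] = field(default_factory=list)   # stand-in for collected data

    def can_access(self, username: str) -> bool:
        # Access comes from group membership, not from having collected the data.
        return username in self.members

    def remove_member(self, username: str) -> None:
        # Revokes this user's access only; the group's records stay in place.
        self.members.discard(username)
```

If one organizer leaves, calling `remove_member` with their username cuts off their access while the remaining members keep working with the same records.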

Individual login credentials should be based on Wikimedia accounts, using OAuth if possible. Users are added to a group by its existing members; the first member of a group is the person who creates it. As an optional feature, there could be group administrators responsible for adding and removing group members.
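The membership rules can be sketched as follows. This is an illustration under stated assumptions, not a design: the OAuth login itself is not shown (each username is assumed to have already been verified through a Wikimedia OAuth login), and `admins_only` is a hypothetical switch for the optional administrator feature. With it off, any member may add or remove members; with it on, only administrators may.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Group:
    name: str
    creator: str                    # Wikimedia username, assumed verified via OAuth login
    admins_only: bool = False       # hypothetical toggle for the optional admin feature
    members: Set[str] = field(default_factory=set)
    admins: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Whoever creates the group becomes its first member and first administrator.
        self.members.add(self.creator)
        self.admins.add(self.creator)

    def _check(self, actor: str) -> None:
        allowed = self.admins if self.admins_only else self.members
        if actor not in allowed:
            raise PermissionError(f"{actor} may not change the membership of {self.name}")

    def add_member(self, actor: str, username: str) -> None:
        self._check(actor)
        self.members.add(username)

    def remove_member(self, actor: str, username: str) -> None:
        self._check(actor)
        self.members.discard(username)
        self.admins.discard(username)
```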

Group members can organize events (e.g. edit-a-thons) on behalf of the group and generate forms for those events. Group members have access to the data collected by their group and can create custom analysis buckets.
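One way custom analysis buckets could work is as named rules evaluated against each participant's form answers. The sketch below assumes that representation; `Event`, `assign_buckets`, and the `au_student` field are hypothetical. Every respondent lands in the event's own bucket, plus any custom bucket whose rule matches.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Response = Dict[str, object]        # one participant's answers on an event form
Rule = Callable[[Response], bool]   # hypothetical form of a custom bucket definition


@dataclass
class Event:
    name: str
    responses: List[Response] = field(default_factory=list)


def assign_buckets(event: Event, custom: Dict[str, Rule]) -> Dict[str, Set[str]]:
    """Place every respondent in the event's own bucket and in each matching custom bucket."""
    buckets: Dict[str, Set[str]] = {event.name: set()}
    buckets.update({name: set() for name in custom})
    for response in event.responses:
        user = str(response["username"])
        buckets[event.name].add(user)
        for name, rule in custom.items():
            if rule(response):
                buckets[name].add(user)
    return buckets
```

For example, `assign_buckets(event, {"AU students": lambda r: bool(r.get("au_student"))})` would add a bucket for students in a partner class alongside the event's own bucket.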

Data collection
Each participant will fill out a form, either on arrival or during the edit-a-thon. If a participant has not yet created an account, a volunteer will help them create one. The form is intended to be completed by those who already have Wikipedia accounts.

A form is generated for each particular event. The form at minimum asks for the participant's username and whether they consent to having their Wikipedia activity analyzed (see "consent for analysis" below). The form should also allow the creation of custom form fields according to the group's needs.
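A per-event form with fixed required fields and group-defined extras might look like the sketch below. `EventForm`, `REQUIRED_FIELDS`, and the `heard_about_event` field are hypothetical names. The username normalization reflects how Wikipedia handles user names (underscores are equivalent to spaces and the first letter is capitalized), which helps one person's entries at different events match the same profile; it does not cover every MediaWiki normalization rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List

REQUIRED_FIELDS = ["username", "consent"]   # the minimum every event form asks for


def normalize_username(raw: str) -> str:
    # Wikipedia treats underscores as spaces and capitalizes the first letter of user names.
    name = raw.strip().replace("_", " ")
    return name[:1].upper() + name[1:]


@dataclass
class EventForm:
    event: str
    custom_fields: List[str] = field(default_factory=list)   # group-defined extras

    @property
    def fields(self) -> List[str]:
        return REQUIRED_FIELDS + self.custom_fields

    def validate(self, answers: Dict[str, object]) -> Dict[str, object]:
        """Check the required answers and keep only fields defined on this form."""
        username = normalize_username(str(answers.get("username", "")))
        if not username:
            raise ValueError("a Wikipedia username is required")
        if not isinstance(answers.get("consent"), bool):
            raise ValueError("the consent question must be answered yes or no")
        cleaned = {key: answers[key] for key in self.fields if key in answers}
        cleaned["username"] = username
        return cleaned
```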

Each username has its own profile in Wikiparty and is added to the group-wide database, which includes:
 * The username (plus a unique identifier)
 * Whether the person was a newcomer at the event (defined as having created an account on the day of the event or in the prior week)
 * A record of attendance for each event hosted by the group
 * Other pieces of data collected from them

The participant is also assigned to an analysis bucket for the event they attended and can be assigned to additional buckets according to the group's needs (e.g. newcomer vs. experienced editor). The group-wide database allows the group to conduct long-term analyses of participant behavior, including editor retention and repeat attendance at events. Event-scoped analysis buckets, which contain the data collected at a single event, allow the group to determine that event's outcome. Custom buckets are for studying other trends, such as participation by gender.
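The newcomer rule could be checked against the account's registration date rather than relying only on the participant's own answer. The standard MediaWiki API query `action=query&list=users&usprop=registration` returns that timestamp; the sketch below only parses such a response (no network call) and then applies the rule. It assumes both dates are compared in UTC and reads "the prior week" as the seven days before the event; `registration_date` and `is_newcomer` are hypothetical helper names.

```python
from datetime import date, datetime, timedelta
from typing import Optional

# A standard MediaWiki API request that returns an account's registration timestamp:
#   https://en.wikipedia.org/w/api.php?action=query&list=users&ususers=Example&usprop=registration&format=json


def registration_date(api_response: dict) -> Optional[date]:
    """Pull the account-creation date (UTC) out of a list=users response."""
    users = api_response.get("query", {}).get("users", [])
    if not users or "missing" in users[0]:
        return None                        # no such account
    stamp = users[0].get("registration")   # may be empty for some very old accounts
    if not stamp:
        return None
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").date()


def is_newcomer(registered: Optional[date], event_day: date, window_days: int = 7) -> bool:
    """Created on the event day or within the preceding window_days days."""
    if registered is None:
        return False
    return event_day - timedelta(days=window_days) <= registered <= event_day
```

Accounts with no recorded registration date are treated as experienced editors here, which seems the safer default for very old accounts.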

Consent for analysis
For legal purposes, the phrasing of the consent question must include this text:
 * Sign up here to allow us to learn more about how you use Wikipedia during and after this event, so that we can work towards improving it. By signing up, you agree to let us use this information for these purposes. See the Notice for Opt-In.

The "notice for opt-in" should link to this longer statement:
 * Thank you for participating in today's edit-a-thon! The Wikimedia Foundation and Wikimedia District of Columbia are always trying to learn more about how users interact with Wikipedia and its sister projects, so that we can make the projects better and more fun for people like you. One way you can help us improve the projects and events like this one is to permit us to follow how you (through your username) use our sites during and after this event. For example, by seeing whether you continue to use your account after this event, we can better measure the effectiveness of this event and similar programs.
 * If you choose to help us in this manner, we will group you together with others attending this event who have also elected to help us and then follow your publicly available activity (like your public contributions) to see if you are still enjoying editing the Wikimedia projects after this event. We will not share with third parties any personally identifying information about you, such as your real name, address, email address, date of birth, or phone number, but we may share information about your use of the projects in aggregated or anonymized forms. Please note that your publicly available activity and the information you share with us during the course of this event may be collected, stored, used, modified, communicated, archived, destroyed, or otherwise processed by the Wikimedia Foundation or Wikimedia District of Columbia and may be transmitted to or from the United States and other countries that may not have the same level of privacy regulation that your country does.
 * If you would like to help us improve the projects, just sign your name below in the column entitled “Signature” to indicate your agreement to this policy.

For organizations other than Wikimedia District of Columbia that use Wikiparty, every instance of "Wikimedia District of Columbia" in the text above should be replaced with the name of the group.
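Both requirements above lend themselves to a simple check when a group builds its form: the consent question must contain the required wording verbatim, and the longer notice gets the host group's name substituted in. A sketch, with `consent_prompt_ok` and `localize_notice` as hypothetical helpers; it substitutes only the exact phrase "Wikimedia District of Columbia" and leaves mentions of the Wikimedia Foundation alone.

```python
DEFAULT_ORG = "Wikimedia District of Columbia"

REQUIRED_CONSENT_TEXT = (
    "Sign up here to allow us to learn more about how you use Wikipedia during and "
    "after this event, so that we can work towards improving it. By signing up, you "
    "agree to let us use this information for these purposes. See the Notice for Opt-In."
)


def consent_prompt_ok(prompt: str) -> bool:
    """The consent question may add text, but must contain the required wording verbatim."""
    return REQUIRED_CONSENT_TEXT in prompt


def localize_notice(notice: str, group_name: str) -> str:
    """Swap the host group's name in for every mention of the default organization.

    References to the Wikimedia Foundation are left unchanged.
    """
    name = group_name.strip()
    if not name:
        raise ValueError("a group name is required")
    return notice.replace(DEFAULT_ORG, name)
```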

If a participant does not consent to analysis, then the participant's attendance will be recorded in the database but no analysis of Wikipedia edits will occur for that participant.
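The split between attendance and analysis can be sketched as two separate views of the same records. This assumes consent is recorded separately for each event a person attends (the proposal does not say whether consent carries over between events); `Attendance`, `attendance_counts`, and `analysis_cohorts` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Attendance:
    username: str
    event: str
    consented: bool   # assumed to be recorded separately for each event attended


def attendance_counts(records: List[Attendance]) -> Dict[str, int]:
    """Everyone counts toward attendance, whether or not they consented."""
    return dict(Counter(r.event for r in records))


def analysis_cohorts(records: List[Attendance]) -> Dict[str, Set[str]]:
    """Only usernames that consented at an event are passed on for edit analysis."""
    cohorts: Dict[str, Set[str]] = {}
    for r in records:
        cohort = cohorts.setdefault(r.event, set())
        if r.consented:
            cohort.add(r.username)
    return cohorts
```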

Analysis
Analysis serves two purposes: to measure repeat attendance of individual participants at events and to measure activity of analysis buckets on Wikimedia projects.
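The repeat-attendance side of the analysis needs only the attendance records. A minimal sketch, assuming one record per participant per event and that each record stores the newcomer status determined at that event; `Visit` and `attendance_summary` are hypothetical names. A participant counts as a newcomer if they were one at their first recorded event, and as a returning newcomer if they then came back.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple


class Visit(NamedTuple):
    username: str
    event_day: date
    newcomer: bool   # newcomer status as recorded at that event


def attendance_summary(visits: List[Visit]) -> Dict[str, int]:
    """Assumes one Visit per participant per event."""
    by_user: Dict[str, List[Visit]] = defaultdict(list)
    for v in visits:
        by_user[v.username].append(v)

    newcomers = repeat = returning_newcomers = 0
    for history in by_user.values():
        history.sort(key=lambda v: v.event_day)
        first_was_newcomer = history[0].newcomer   # status when they entered the system
        newcomers += first_was_newcomer
        if len(history) > 1:
            repeat += 1
            returning_newcomers += first_was_newcomer
    return {
        "participants": len(by_user),
        "newcomers": newcomers,
        "repeat_attendees": repeat,
        "returning_newcomers": returning_newcomers,
    }
```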

The measurements we are interested in include:
 * Attendance figures. This includes total participation in events, including the number of participants who were new to Wikipedia when they entered the Wikiparty system. It also includes repeat attendance, such as the number of participants who came to subsequent events after attending their first event as newcomers.
 * Content contributions, measured per event and per quarter, in positive bytes added, number of individual edits, number of articles created and improved, and number of media files uploaded to Wikimedia Commons. A list of articles affected, along with traffic data for those articles, would also be helpful.
 * Cohort comparisons. We are interested in comparing contributions by newcomers with those of experienced editors. Note that, because of how "newcomer" is defined, a person who attends their first event as a newcomer will likely be counted as an experienced editor at their next event. It should also be possible to compare custom cohorts, as described above.
 * Longitudinal measures of participation. This includes sustained editing following the events, measured in three- and six-month intervals.
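Several of these measures can be computed from each user's contribution list. The sketch below assumes items shaped like the MediaWiki API's `list=usercontribs` output requested with `ucprop=timestamp|sizediff|flags`: positive bytes are the sum of positive `sizediff` values, and the `new` flag marks page creations. The retention rule (at least one edit in the 30 days ending 90 or 180 days after the event, as a stand-in for three and six months) is a hypothetical definition, not one the proposal specifies. Commons uploads and article traffic would need separate queries and are not covered.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# One item of list=usercontribs output, requested with ucprop=timestamp|sizediff|flags.
Contribution = Dict[str, object]


def _when(c: Contribution) -> datetime:
    return datetime.strptime(str(c["timestamp"]), "%Y-%m-%dT%H:%M:%SZ")


def _is_creation(c: Contribution) -> bool:
    # Accepts both the empty-string flag style (formatversion=1) and booleans (formatversion=2).
    return "new" in c and c["new"] is not False


def content_metrics(contribs: List[Contribution], start: datetime, end: datetime) -> Dict[str, int]:
    """Edits, positive bytes, and page creations in the half-open window [start, end)."""
    window = [c for c in contribs if start <= _when(c) < end]
    return {
        "edits": len(window),
        "positive_bytes": sum(max(int(c.get("sizediff", 0)), 0) for c in window),
        "pages_created": sum(1 for c in window if _is_creation(c)),
    }


def retained(contribs: List[Contribution], event_day: datetime,
             days_after: int, window_days: int = 30) -> bool:
    """Hypothetical rule: at least one edit in the window_days ending days_after the event."""
    end = event_day + timedelta(days=days_after)
    start = end - timedelta(days=window_days)
    return any(start <= _when(c) < end for c in contribs)


def cohort_summary(cohort: Dict[str, List[Contribution]], start: datetime,
                   end: datetime, event_day: datetime) -> Dict[str, float]:
    """Totals for one bucket (username -> contributions), plus 90/180-day retention rates."""
    totals: Dict[str, float] = {"edits": 0, "positive_bytes": 0, "pages_created": 0}
    for contribs in cohort.values():
        for key, value in content_metrics(contribs, start, end).items():
            totals[key] += value
    size = len(cohort) or 1   # avoid dividing by zero for an empty bucket
    for days in (90, 180):
        kept = sum(retained(c, event_day, days) for c in cohort.values())
        totals[f"retained_{days}d"] = kept / size
    return totals
```

Running `cohort_summary` once for a newcomer bucket and once for an experienced-editor bucket gives the side-by-side numbers a cohort comparison needs.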

For a demonstration of these analyses (except for the longitudinal measures), see our content programs report for the second quarter of Fiscal Year 2013–14.