Experimentation


Definition
A user study based on experimentation is fundamentally different from most research methods. Instead of passively assembling data, researchers in experimental research are active participants in the process. An experiment is a research approach in which the researcher manipulates one thing (the independent variable) to observe the effect that change has on something else (the dependent variable), in order to determine a causal relationship between the two phenomena (Jones).
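The manipulate-then-measure logic above can be sketched in a few lines. The sketch below is a hypothetical illustration, not from the source: participants are randomly assigned to two levels of an independent variable (which tool they use), and a dependent variable (task time) is measured in each group; all names and numbers are invented.

```python
import random
import statistics

# Hypothetical randomized experiment (invented data, for illustration only).
random.seed(42)

participants = list(range(16))
random.shuffle(participants)                  # random assignment controls for confounds
group_a, group_b = participants[:8], participants[8:]

# Simulated dependent-variable measurements (task time in minutes):
# we pretend tool A tends to yield shorter task times than tool B.
times_a = [random.gauss(30, 5) for _ in group_a]
times_b = [random.gauss(40, 5) for _ in group_b]

# The observed effect of the manipulation is the difference in group means.
effect = statistics.mean(times_b) - statistics.mean(times_a)
print(f"Mean difference (B - A): {effect:.1f} minutes")
```

Because assignment to groups is random, a systematic difference between the group means can be attributed to the manipulated variable rather than to pre-existing differences between participants.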

Benefits of Experimentation
 * The test results should highlight direct causal weaknesses of the product --> very easy to narrow down design issues.
 * They are most useful when the testing variables are objective in nature and the data collected are quantifiable, which is hard to achieve through more subjective means (surveys, interviews), e.g. testing algorithms for inclusive software.

Limitations of Experimentation
 * High costs: creating these controlled environments is expensive, and planning and conducting the research takes a significant amount of time.
 * High failure rate: there is very little room for changes after the experiments are conducted. If variables are not selected carefully, the results are essentially meaningless. This differs from methods such as surveys and observations, where changes usually cost less (just changing survey printouts or observation charts).
 * Fake environments: results derived from manipulated environments might not be as helpful as those observed in natural contexts (Jones).
 * Events within the market happen due to many influences within and outside the market (political, social, technological, cultural). Many observed relationships could merely be correlation, not causal effects.

**Experiment Example: Parallel Programming System**
In 1996, two researchers at the University of Alberta performed a usability experiment comparing two PPS (parallel programming system) tools, to determine which features of a PPS are useful and beneficial to its users (Szafron).

 * **Systems used for testing**: Enterprise (a more user-friendly program that involves less code writing) and NMP (Network Multiprocessor Package, a traditional code-driven programming package)
 * **Subjects**: 15 graduate students in the CMPUT 507 Parallel Programming class at the University of Alberta, none of whom had parallel programming experience before joining the course. They were randomly assigned to 2 groups: 7 students using Enterprise + 8 students using NMP
 * **Procedure**:
 * Two 50-minute lectures given to the entire class on both systems + two 20-minute lab demonstrations, one of each PPS
 * The students had access to the systems for 2 weeks.
 * **Data Collection**:
 * Each student was to submit a 2-page essay commenting on what they liked and disliked about the system they were assigned to work on.
 * The number of hours each student was logged onto the system was carefully recorded
 * The number of lines of code in the solution programs was carefully recorded
 * The number of editing sessions
 * The number of compiles
 * The number of times the students tested their program by running it
 * The execution time of each program on data set 2
 * **Usability Metrics**:
 * 1) **Learning Curve**: the researchers wanted to measure **how long** it takes an expert or novice user of a PPS to learn to use the product efficiently.
 * 2) **Programming Errors**: the researchers wanted to find out whether the parallelism-inhibition function in some systems is indeed helpful for preventing errors.
 * 3) **Deterministic Performance**: the researchers wanted to know whether the **deterministic execution feature** in some systems helps reduce overhead in the debugging process.
 * 4) **Compatibility with Existing Software**: the researchers wanted to test whether PPS systems support **integration** with existing software.
 * 5) **Integration with Other Tools**: the researchers wanted to test whether the feature for **integrating** with other development tools (e.g. debugging, performance evaluation) is useful for users.
 * **Results**:
 * NMP students performed 206 compiles --> an incredibly large number.
 * Because Enterprise requires less code writing, it helps users save time during coding.
 * However, as Enterprise is equipped with many hidden mechanisms for avoiding deadlock and synchronization errors, it had the worst run-time performance. It also introduced a new integration problem along the way: the overlapping-futures problem. --> NMP affected the accuracy of the code, while Enterprise affected efficiency.
 * **Usability issues identified**:
 * Probability of programming errors
 * The deterministic execution feature and the animation feature are helpful for the debugging process.
 * Conceptual difficulty of certain concepts:
 * What really happens during "process startup, process termination, and passing pointers between processes"?
 * What needs to be done when "a process is called the first time, versus when the process is called subsequently"?
 * "How are termination conditions checked and how should the processes (assets) exit gracefully?"
 * These important questions should be fully clarified in future documentation for all PPS systems.
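The logged measures in the data-collection list (hours, lines of code, compiles, and so on) lend themselves to simple per-group aggregation when comparing the two conditions. A minimal sketch of that kind of comparison, with entirely invented numbers (not the study's data):

```python
from statistics import mean

# Hypothetical log records illustrating per-group comparison of logged metrics;
# the values below are invented for this sketch, not taken from the study.
logs = [
    {"group": "Enterprise", "hours": 21, "loc": 180, "compiles": 95},
    {"group": "Enterprise", "hours": 18, "loc": 150, "compiles": 80},
    {"group": "NMP",        "hours": 25, "loc": 420, "compiles": 210},
    {"group": "NMP",        "hours": 27, "loc": 390, "compiles": 200},
]

def summarize(records, metric):
    """Mean of one logged metric for each experimental group."""
    groups = {}
    for r in records:
        groups.setdefault(r["group"], []).append(r[metric])
    return {g: mean(vals) for g, vals in groups.items()}

for metric in ("hours", "loc", "compiles"):
    print(metric, summarize(logs, metric))
```

Aggregating each metric by group in this way is what lets the researchers contrast the two tools on quantifiable, objectively measured dimensions rather than on opinion alone.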

==> **Personal thoughts on this experiment**:

By and large, choosing an experiment as the primary usability testing method is the best fit for this design issue, as researchers can measure and aggregate the variables objectively. The complexity of the issue would be rather hard to measure through other means.

As briefly noted in the experiment, one limitation of the collected results is that they did not record users' reactions while interacting with the system. Although some of this information can be inferred from the data (e.g. the number of hours spent), the researchers did not get to record the mental model that users had in mind when completing tasks. As a TA was assigned to monitor the lab, the researchers could have asked the TA to observe how students worked on those tasks, with important metrics such as: what questions do they usually ask? What problems do they usually encounter? What descriptions have they made about the system?

With that being said, as PPS systems are more of an inclusive-design nature, created to serve a select group of experienced programmers, perhaps overall usability attributes such as memorability and feedback are not as important as metrics central to utility, effectiveness, and efficiency, such as error prevention and error tolerance.