Status of this document: Under construction, with additions as they come to mind

Overview

The goal of this general Assessment package is to deliver generic mechanisms that can be inserted into other packages and applications any time they need to collect and manage data from users. Whether we can really pull this off will be interesting to explore. Our current discussion of educational and clinical trials applications provides a great test case to see how well our needs overlap.

Conceptual Issues

  1. A typology of Assessments

    Malte et al. distinguish between "surveys" and "tests" based on whether user data is simply stuffed in the database or whether it is graded first. Structurally, a "survey" Assessment could look identical to a "test" Assessment; what differs is the optional processes that operate on the submitted user data.

    This is a helpful distinction, and it is in fact one of the enhancements we built into our Questionnaire package: a data model and UI that allow Questionnaire authors to map questions to "scales" whose scores are calculated from raw user responses. In our case, the "grading" consisted of processes involving 1-n questions; grading in the educational sense would more typically involve a single question. Thus scale-based scoring is a superset of single-question grading.
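
    To make the scale mapping concrete, here is a minimal sketch (PostgreSQL-style SQL; the table and column names are illustrative assumptions, not our actual Questionnaire schema): scales are defined once, and a mapping table ties 1-n questions to each scale, optionally with a weight.

        -- scales whose scores are computed from raw responses
        create table as_scales (
            scale_id        integer primary key,
            scale_name      varchar(200) not null,
            scoring_method  varchar(20) default 'sum'   -- sum, mean, custom, ...
        );

        -- each scale draws on 1-n questions; a question can feed several scales
        create table as_scale_question_map (
            scale_id        integer references as_scales,
            question_id     integer,   -- references the (hypothetical) questions table
            weight          numeric default 1,
            primary key (scale_id, question_id)
        );

    The single-question educational "grading" case is then just a scale with one row in the mapping table.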

    There is a third phase that we think is important to make optionally available to Assessment authors: data validation. While it might be acceptable for a "test" to take a wildly wrong answer from a student, for clinical trial data a user who has typed in a "wrong" answer needs to be notified immediately so that it can be corrected before any "grading" is done or the database gets stuffed. This feature needs to be optional, though.

    Thus, it appears that the overall structure we're advocating is a simple pipeline -- user input, followed by optional validation, optional grading/scoring, and finally storage -- where the various routes through that directed graph determine the nature of the Assessment.

  2. When there is or is not a "right" answer

    Reading through the detailed description of educational question types (and remembering our long-ago academic days), a crucial fact is clear: for many or most educational questions, there is a "right" answer. To construct a question, the Assessment author must not only define the question (and optionally question choices), but also designate one or more responses or choices as "correct". Then during Assessment administration, the user must be allowed to enter a potentially completely bogus answer -- which will simply be declared "wrong" when compared with the "right" one. If the user submits the "correct" answer, there is no reason to store that response in a question_response table, just the fact that the response was correct. If the user submits an "incorrect" answer, then we'd want to be able to store that answer, but it may be a different datatype from the "correct" answer. Thus, a truly stupid student should be allowed to enter a word instead of a number, and the system should accept and store that "wrong" answer. But a smart student who answers the question "correctly" merely needs a boolean record that they answered correctly.

    In contrast, in clinical trials (and probably most other non-academic contexts), questions have no "right" answer. To construct a question, the Assessment author must define the question, optionally question choices, and in most cases range checks or other input filtering checks (both intra-item and inter-item checks). Within those definitions of response acceptability there is no "correct" response. More importantly, the user must not be allowed to enter a potentially completely bogus answer. The whole point of the data collection process is to ensure that all data entered is "in expected range" (or "in tolerable range with explanation") even if there is no single "right" answer. Thus, a truly stupid study coordinator should be promptly notified that she entered a word instead of a number, and the system should refuse to accept or store that "wrong" answer. A smart study coordinator who sends in a number that is outside the expected range but still within a tolerable range should be required to input an explanation/justification for that value.

    The data model for the academic application must thus include a different form of "range checking" consisting of an "exact range", i.e. "only this value is the 'right' answer". This does seem like a special case of generic input validation, so it seems likely that a common mechanism can be devised to handle both cases.

    The data model for academic applications seems to require the ability to store arbitrarily wrong answers, though, including wrong datatypes in the answers. In contrast, in clinical trials (and presumably domains like finance), answers must be strongly typed. It seems one easy solution would be to include in the "long skinny" question_response table an "is_correct_p" boolean column that flags whether the row stores an academic-type question that was answered correctly. If "is_correct_p" is TRUE, then no other data need be stored. If "is_correct_p" is FALSE, then a clob column would probably be necessary so that we can stuff in whatever garbage the student sent.

    For the clinical trials setting, we can just ignore those columns in the table.
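
    A minimal sketch of what that "long skinny" table might look like (PostgreSQL-style SQL; names and the exact set of typed value columns are illustrative assumptions): the academic case uses is_correct_p and, only for wrong answers, a free-form text column standing in for the clob; the clinical case ignores both and uses the strongly typed value columns.

        create table as_question_responses (
            response_id      integer primary key,
            question_id      integer not null,   -- references the (hypothetical) questions table
            subject_id       integer not null,   -- the responding user
            -- academic case: TRUE means "answered correctly"; nothing else need be stored
            is_correct_p     boolean,
            -- academic case, wrong answers only: the raw submission, whatever its type
            wrong_answer     text,
            -- clinical case: strongly typed value columns, one per expected datatype
            numeric_value    numeric,
            varchar_value    varchar(4000),
            date_value       timestamptz,
            submitted_date   timestamptz default current_timestamp
        );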

Requirements Issues

  1. "Interactive grading" vs "annotations"

    In the educational setting, a teacher-user needs to be able to provide feedback to a student (e.g. "You're wrong for this reason") either via a textbox the teacher fills in while grading the student's Assessment responses, via pre-configured responses from which the teacher can choose while grading, or via pre-configured responses from which an automated comparison algorithm generates a response.

    In the clinical trials realm, a user must be able to add annotations to any piece of data -- basically, any question_response must be linkable with 0-n annotations. If a user says a lab value was "4.5" and later needs to change it to "4.6", then an annotation explaining why ("Fixed a typo", "Lab now reports calibration error", "I screwed up at first", etc.) needs to be attached to the value, along with a timestamp and the user's identity (maybe the whole thing hashed into a digital signature, etc.).

    Though quite different in purpose, these two requirements appear to be semantically equivalent, and the same engineering can accomplish both.
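
    A hedged sketch of that shared mechanism (PostgreSQL-style SQL; illustrative names): a single annotations table hangs 0-n annotations off any question_response, whether the annotation is a teacher's grading comment or a study coordinator's explanation for a changed value.

        create table as_response_annotations (
            annotation_id     integer primary key,
            response_id       integer not null,   -- references as_question_responses above
            -- free text ("You're wrong because ...", "Fixed a typo", ...)
            annotation_text   text,
            -- or a pre-configured comment chosen (or auto-selected) while grading
            canned_comment_id integer,
            author_id         integer not null,   -- teacher, study coordinator, ...
            annotation_date   timestamptz default current_timestamp
            -- a digital-signature hash over (response, text, author, date) could be added later
        );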

  2. Entity repositories

    The discussion of question catalogues is really important, but the same reasons that make repositories of questions valuable apply to other entities as well: question choices, sections, and assessments themselves.

    For instance, lots of questions get reused in many forms (e.g. "Gender" -> "M" | "F"). But obviously the "M" and "F" choices will get used in lots of questions as well. Rather than simply duplicating a question choice for each question, it would be better to reuse it (particularly if, after using it in 500 questions, you notice a typo that needs fixing). Similarly, lots of sections (or entire forms) may be incorporated in a new form. So IMHO a repository mechanism should be built in for all the components of an Assessment, not just questions.

    To jump briefly beyond requirements into design: rather than splitting questions into "regular" ones and "predefined" ones as the current version of "complex survey" does, appropriate use of mapping tables (or whatever the preferred technique is in OpenACS5 -- relational_segments? acs_rels? -- I'll get up to speed pretty soon!) to support many-to-many associations between questions and choices, sections and questions, sections and forms, etc. may accomplish what we want. A sketch follows below.
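
    As a minimal sketch of the mapping-table approach (PostgreSQL-style SQL; illustrative names only, not a proposal to bypass acs_rels or whatever OpenACS5 prefers): choices, questions, and sections each live once in their own tables, and mapping tables carry the many-to-many associations plus per-use attributes such as sort order.

        -- a choice like "M" or "F" is defined once ...
        create table as_choices (
            choice_id     integer primary key,
            choice_label  varchar(1000) not null
        );

        -- ... and reused by any number of questions
        create table as_question_choice_map (
            question_id   integer not null,   -- references the (hypothetical) questions table
            choice_id     integer not null references as_choices,
            sort_order    integer,
            primary key (question_id, choice_id)
        );

        -- the same pattern repeats for sections <-> questions and forms <-> sections
        create table as_section_question_map (
            section_id    integer not null,
            question_id   integer not null,
            sort_order    integer,
            primary key (section_id, question_id)
        );

    Fixing the typo in a choice label then fixes it everywhere the choice is used.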

  3. Versioning of Assessments

    One thing that entity repositories can provide, if created for all the component entities in an Assessment, is the ability to maintain multiple versions of the Assessment. Say a clinical trial needs to collect a form containing 20 elements (i.e. questions) from patients two months after they enroll in the trial. The trial starts, and six months into it, after 150 patients are enrolled and have their two-month data collections in the DB, the principal investigator (or safety committee or someone) decides that four additional elements need to be added and two of the current elements need to be changed. How can that be accommodated?

    And does this kind of scenario show up in educational settings? I presume it can: say Logic 101 gets taught every term, but over the summer the teacher decides that the final exam needs to be changed to include a few new questions and change a few others. If the teacher has to create an entirely new Assessment (perhaps by cloning and modifying the existing one), then all subsequently collected student data are not linked at all to that from prior terms. The teacher won't be able to use the Assessment system to answer the question "Has listening to Brittney Spears made our students more illogical over time?". To answer that, all data for that Assessment can be selected and ordered by metadataversion (the CDISC concept described below). How the teacher makes sense of that is then up to the teacher, but at least the DB can deliver all the relevant data. Without metadataversions, the select will have to involve two different Assessments. That might not seem important until the teachers have made, say, 15 different versions of the Assessment (say it's the required final exam for Econ 101 taught by 15 different teachers) and someone wants to see how Econ 101 students are doing over time throughout the college. A single Assessment's data sorted by metadataversion is easy; selecting over 15 different Assessments is not.

    The CDISC standard uses a central entity called "metadataversion". A given study/assessment can have multiple metadataversions, and all forms, sections, questions etc are tagged with the metadataversion under which they are valid. So to pull up a version of an Assessment, procedures just pull up everything with a particular metadataversion; same with response data.
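
    A rough sketch of how that tagging might look in our tables (PostgreSQL-style SQL; illustrative names, not the CDISC schema itself): a metadataversions table per Assessment, and a metadataversion_id on every component row so that one version of the whole structure -- and the response data collected under it -- can be pulled up with a single restriction.

        create table as_metadataversions (
            metadataversion_id  integer primary key,
            assessment_id       integer not null,    -- the study/assessment this version belongs to
            description         varchar(4000),
            status              varchar(50),         -- e.g. draft, live, retired
            creation_reason     varchar(4000),
            creation_date       timestamptz default current_timestamp
        );

        -- forms, sections, questions, choices etc. each carry the version they are valid under,
        -- e.g. a metadataversion_id column referencing as_metadataversions; then
        --   select ... where metadataversion_id = :version
        -- reassembles that version, and response data tagged the same way sorts cleanly by version.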

    To jump into a brief speculation about implementation, it seems that the OpenACS content repository (CR) may give us out of the box what we need here in its support for versioning and maintenance of a "live" version. One requirement that may be different from current CR behavior is that if an earlier version is marked "live", subsequent versions must not be deleted (as happens in ETP anyway -- I don't know if this is something that ETP does or CR does). We'd need to elaborate a bit on CR, though, since there are a variety of info bits we need to store about any given metadataversion (like a description, status -- beyond "live" or not, reason for adding it, etc). So a metadataversion table should itself be a cr_item I guess.

  4. Question formats

    The educational spec identifies several important formats (like matching questions) that don't appear in clinical settings; the Assessment package should accommodate every format we can think of. One that isn't in their docs but that is important in ours (and shows up in other non-clinical settings like the forms I have to complete to get free subscriptions to eWeek and the like) is a "grid" question format.

    See the demo Seattle Angina Questionnaire for an illustration. Here the first block of questions consists of multiple-choice radio-button questions. All pertain to a single question but constitute different "subquestions". The texts for all the question choices are identical, so rather than complicate the form layout by repeating all the texts, they need to be oriented in a grid of radio buttons. In essence, this is mostly a "presentation_style" issue, but one data model modification is required: the addition of a "question_subtext" column to the questions table to augment the existing "question_text" column.
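
    In sketch form (PostgreSQL-style SQL; "as_questions" is an illustrative name for the questions table, and 'grid' an assumed presentation_style value):

        -- add the subquestion text alongside the existing question_text
        alter table as_questions add column question_subtext varchar(4000);

        -- a block rendered with presentation_style = 'grid' would then print the shared
        -- choice labels once as column headers and one row of radio buttons per
        -- question_subtext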

  5. "Sequencing" aka navigation

    The IMS "simple sequencing" standard is rather complex and confusing (IMHO) and has been discussed in a couple of threads here and here.

    In clinical trials, contingent navigation based on submitted data is very important, so we view this requirement as a crucial one. For instance, a form that asks a patient's gender may branch into different locations within the form, and subsequently to different sets of forms, based on the answer to that question. Or more commonly, if the responses to a subset of questions in a form are one way, then a different navigation route is indicated than if the responses were another way. Thus a branch point may hinge not simply on a single question, but on a set of questions -- obviously more of a challenge to implement.

    The current "complex survey" has basic branching capabilities based on a single "hinge" question. In addition, Staffan Hansson has created a sequencing model for .LRN. This pertains to the notion of "curricula" and not specifically to Assessments, which are one type of activity within a curriculum. I haven't studied this model yet to the point of understanding how applicable it is to the specific needs here, but I presume it is quite germane.

    Anyway, although expressed in different fashion, all of our current requirements identify contingent navigation aka sequencing as a vital capability for the package to support.
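
    One hedged way to model such contingent navigation (PostgreSQL-style SQL; illustrative names, and explicitly not Staffan Hansson's model or the IMS spec): a branch rule names a target section, and 1-n criteria rows tie the rule to its "hinge" questions and the responses that trigger it.

        create table as_branch_rules (
            rule_id            integer primary key,
            from_section_id    integer not null,   -- where the user currently is
            target_section_id  integer not null,   -- where to go if the rule fires
            -- 'all' = every criterion must match; 'any' = at least one must match
            match_mode         varchar(3) default 'all' check (match_mode in ('all', 'any'))
        );

        -- a rule can hinge on a set of questions, not just one
        create table as_branch_criteria (
            rule_id       integer not null references as_branch_rules,
            question_id   integer not null,
            operator      varchar(10) not null,   -- '=', '<', 'between', 'in', ...
            match_value   varchar(4000),
            primary key (rule_id, question_id)
        );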

  6. Data validation

    In the prior section, we compared the issue of "grading" user input against some defined "right" answer to the process of "validating" user input against configurable parameters. It appears that these are probably semantically equivalent. The system must be able to evaluate exact and inexact-but-within-bounds user input.

    For clinical trials purposes at least (presumably other domains), these data validation steps are fairly complex because we need two layers of data validation checks:

    • Intra-item checks: the user input { exactly matches | falls within narrow "target" bounds | falls within broader "acceptable" bounds with explanation}
    • Inter-item checks: if { a user input for item a is A, item b is B, ... item n is N } then { user input for item z must be Z }

    Both levels involve stringing together multiple binary comparisons (e.g. 0 < input < 3 means checking that 0 < input and input < 3), so we need to express a grammar consisting of

    • comparison1 conjunction comparison2 conjunction ... comparison n
    • appropriate grouping to define precedence order (or simply agree to evaluate left to right)

    All this needs to be modeled in RDBMS tables so that during Assessment authoring, an admin user can define these validation checks and store them to be pulled out during data collection. This appears to me to be nontrivial, and I haven't yet spotted a pre-made solution to this; does it exist?
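
    To make the requirement concrete, here is one minimal sketch of how such a grammar could be stored (PostgreSQL-style SQL; illustrative names, and emphatically not a pre-made solution): each check is a chain of binary comparisons joined by conjunctions and evaluated left to right.

        create table as_validation_checks (
            check_id     integer primary key,
            -- intra-item checks involve one question; inter-item checks span several
            check_type   varchar(10) not null check (check_type in ('intra', 'inter')),
            -- what to do on failure: reject outright, or accept with a required explanation
            severity     varchar(20) default 'reject'
        );

        -- "0 < input and input < 3" becomes two rows chained with conjunction 'and'
        create table as_validation_comparisons (
            check_id       integer not null references as_validation_checks,
            eval_order     integer not null,       -- simple left-to-right evaluation order
            question_id    integer not null,       -- the item whose response is compared
            operator       varchar(10) not null,   -- '=', '<', '>', 'between', ...
            compare_value  varchar(4000),
            -- conjunction linking this comparison to the next one, if any
            conjunction    varchar(3) check (conjunction in ('and', 'or')),
            primary key (check_id, eval_order)
        );

    During Assessment authoring the admin UI would write rows into these tables; during data collection the validation code reads them back and applies the comparisons to the submitted responses.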