How many 3 digit numbers can be formed with the digits 12345 if there can be repetition

Ex 7.1, 1 How many 3-digit numbers can be formed from the digits 1, 2, 3, 4 and 5 assuming that [i] repetition of the digits is allowed? 3 digit number : Number of 3 digit numbers with repetition = 5 × 5 × 5 = 125


Solution : [i] When repetition of digits is allowed:
No. of ways of choosing first digit = 5
No. of ways of choosing second digit = 5
No. of ways of choosing third digit = 5
Therefore, total possible numbers = 5 × 5 × 5 = 125.
[ii] When repetition of digits is not allowed:
No. of ways of choosing first digit = 5
No. of ways of choosing second digit = 4
No. of ways of choosing third digit = 3
Therefore, total possible numbers = 5 × 4 × 3 = 60.
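
Both counts can be checked by brute-force enumeration. The Python sketch below is an illustration only, not part of the original solution; it uses itertools.product for arrangements with repetition and itertools.permutations for arrangements without repetition.

```python
from itertools import permutations, product

DIGITS = "12345"

# With repetition: each of the 3 positions may hold any of the 5 digits.
with_repetition = [int("".join(p)) for p in product(DIGITS, repeat=3)]

# Without repetition: ordered selections of 3 distinct digits.
without_repetition = [int("".join(p)) for p in permutations(DIGITS, 3)]

print(len(with_repetition))     # 125
print(len(without_repetition))  # 60
```

The enumeration agrees with the multiplication-principle results 5 × 5 × 5 and 5 × 4 × 3.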

This is the implementation guide for human clinical trials corresponding to version 1.7 of the CDISC Study Data Tabulation Model.

Revision History

Date        Version
2018-11-20  3.3 Final
2013-11-26  3.2 Final
2012-07-16  3.1.3 Final
2008-11-12  3.1.2 Final
2005-08-26  3.1.1 Final
2004-07-14  3.1

© 2018 Clinical Data Interchange Standards Consortium, Inc. All rights reserved.

Contents

  1. 1 Introduction
    1. 1.1 Purpose
    2. 1.2 Organization of this Document
    3. 1.3 Relationship to Prior CDISC Documents
    4. 1.4 How to Read this Implementation Guide
      1. 1.4.1 How to Read a Domain Specification
  2. 2 Fundamentals of the SDTM
    1. 2.1 Observations and Variables
    2. 2.2 Datasets and Domains
    3. 2.3 The General Observation Classes
    4. 2.4 Datasets Other Than General Observation Class Domains
    5. 2.5 The SDTM Standard Domain Models
    6. 2.6 Creating a New Domain
    7. 2.7 SDTM Variables Not Allowed in SDTMIG
  3. 3 Submitting Data in Standard Format
    1. 3.1 Standard Metadata for Dataset Contents and Attributes
    2. 3.2 Using the CDISC Domain Models in Regulatory Submissions — Dataset Metadata
      1. 3.2.1 Dataset-Level Metadata
        1. 3.2.1.1 Primary Keys
        2. 3.2.1.2 CDISC Submission Value-Level Metadata
      2. 3.2.2 Conformance
  4. 4 Assumptions for Domain Models
    1. 4.1 General Domain Assumptions
      1. 4.1.1 Review Study Data Tabulation and Implementation Guide
      2. 4.1.2 Relationship to Analysis Datasets
      3. 4.1.3 Additional Timing Variables
        1. 4.1.3.1 EPOCH Variable Guidance
      4. 4.1.4 Order of the Variables
      5. 4.1.5 SDTM Core Designations
      6. 4.1.6 Additional Guidance on Dataset Naming
      7. 4.1.7 Splitting Domains
        1. 4.1.7.1 Example of Splitting Questionnaires
      8. 4.1.8 Origin Metadata
        1. 4.1.8.1 Origin Metadata for Variables
        2. 4.1.8.2 Origin Metadata for Records
      9. 4.1.9 Assigning Natural Keys in the Metadata
    2. 4.2 General Variable Assumptions
      1. 4.2.1 Variable-Naming Conventions
      2. 4.2.2 Two-Character Domain Identifier
      3. 4.2.3 Use of "Subject" and USUBJID
      4. 4.2.4 Text Case in Submitted Data
      5. 4.2.5 Convention for Missing Values
      6. 4.2.6 Grouping Variables and Categorization
      7. 4.2.7 Submitting Free Text from the CRF
        1. 4.2.7.1 "Specify" Values for Non-Result Qualifier Variables
        2. 4.2.7.2 "Specify" Values for Result Qualifier Variables
        3. 4.2.7.3 "Specify" Values for Topic Variables
      8. 4.2.8 Multiple Values for a Variable
        1. 4.2.8.1 Multiple Values for an Intervention or Event Topic Variable
        2. 4.2.8.2 Multiple Values for a Findings Result Variable
        3. 4.2.8.3 Multiple Values for a Non-Result Qualifier Variable
      9. 4.2.9 Variable Lengths
    3. 4.3 Coding and Controlled Terminology Assumptions
      1. 4.3.1 Types of Controlled Terminology
      2. 4.3.2 Controlled Terminology Text Case
      3. 4.3.3 Controlled Terminology Values
      4. 4.3.4 Use of Controlled Terminology and Arbitrary Number Codes
      5. 4.3.5 Storing Controlled Terminology for Synonym Qualifier Variables
      6. 4.3.6 Storing Topic Variables for General Domain Models
      7. 4.3.7 Use of "Yes" and "No" Values
    4. 4.4 Actual and Relative Time Assumptions
      1. 4.4.1 Formats for Date/Time Variables
      2. 4.4.2 Date/Time Precision
      3. 4.4.3 Intervals of Time and Use of Duration for --DUR Variables
        1. 4.4.3.1 Intervals of Time and Use of Duration
        2. 4.4.3.2 Interval with Uncertainty
      4. 4.4.4 Use of the "Study Day" Variables
      5. 4.4.5 Clinical Encounters and Visits
      6. 4.4.6 Representing Additional Study Days
      7. 4.4.7 Use of Relative Timing Variables
      8. 4.4.8 Date and Time Reported in a Domain Based on Findings
      9. 4.4.9 Use of Dates as Result Variables
      10. 4.4.10 Representing Time Points
      11. 4.4.11 Disease Milestones and Disease Milestone Timing Variables
    5. 4.5 Other Assumptions
      1. 4.5.1 Original and Standardized Results of Findings and Tests Not Done
        1. 4.5.1.1 Original and Standardized Results
        2. 4.5.1.2 Tests Not Done
        3. 4.5.1.3 Examples of Original and Standard Units and Test Not Done
      2. 4.5.2 Linking of Multiple Observations
      3. 4.5.3 Text Strings That Exceed the Maximum Length for General-Observation-Class Domain Variables
        1. 4.5.3.1 Test Name [--TEST] Greater than 40 Characters
        2. 4.5.3.2 Text Strings Greater than 200 Characters in Other Variables
      4. 4.5.4 Evaluators in the Interventions and Events Observation Classes
      5. 4.5.5 Clinical Significance for Findings Observation Class Data
      6. 4.5.6 Supplemental Reason Variables
      7. 4.5.7 Presence or Absence of Pre-Specified Interventions and Events
      8. 4.5.8 Accounting for Long-Term Follow-up
      9. 4.5.9 Baseline Values
  5. 5 Models for Special Purpose Domains
    1. 5.1 Comments
    2. 5.2 Demographics
    3. 5.3 Subject Elements
    4. 5.4 Subject Disease Milestones
    5. 5.5 Subject Visits
  6. 6 Domain Models Based on the General Observation Classes
    1. 6.1 Models for Interventions Domains
      1. 6.1.1 Procedure Agents
      2. 6.1.2 Concomitant and Prior Medications
      3. 6.1.3 Exposure Domains
        1. 6.1.3.1 Exposure
        2. 6.1.3.2 Exposure as Collected
        3. 6.1.3.3 Exposure/Exposure as Collected Examples
      4. 6.1.4 Meal Data
      5. 6.1.5 Procedures
      6. 6.1.6 Substance Use
    2. 6.2 Models for Events Domains
      1. 6.2.1 Adverse Events
      2. 6.2.2 Clinical Events
      3. 6.2.3 Disposition
      4. 6.2.4 Protocol Deviations
      5. 6.2.5 Healthcare Encounters
      6. 6.2.6 Medical History
    3. 6.3 Models for Findings Domains
      1. 6.3.1 Drug Accountability
      2. 6.3.2 Death Details
      3. 6.3.3 ECG Test Results
      4. 6.3.4 Inclusion/Exclusion Criteria Not Met
      5. 6.3.5 Immunogenicity Specimen Assessments
      6. 6.3.6 Laboratory Test Results
      7. 6.3.7 Microbiology Domains
        1. 6.3.7.1 Microbiology Specimen
        2. 6.3.7.2 Microbiology Susceptibility
        3. 6.3.7.3 Microbiology Specimen/Microbiology Susceptibility Examples
      8. 6.3.8 Microscopic Findings
      9. 6.3.9 Morphology
      10. 6.3.10 Morphology/Physiology Domains
        1. 6.3.10.1 Generic Morphology/Physiology Specification
        2. 6.3.10.2 Cardiovascular System Findings
        3. 6.3.10.3 Musculoskeletal System Findings
        4. 6.3.10.4 Nervous System Findings
        5. 6.3.10.5 Ophthalmic Examinations
        6. 6.3.10.6 Reproductive System Findings
        7. 6.3.10.7 Respiratory System Findings
        8. 6.3.10.8 Urinary System Findings
      11. 6.3.11 Pharmacokinetics Domains
        1. 6.3.11.1 Pharmacokinetics Concentrations
        2. 6.3.11.2 Pharmacokinetics Parameters
        3. 6.3.11.3 Relating PP Records to PC Records
      12. 6.3.12 Physical Examination
      13. 6.3.13 Questionnaires, Ratings, and Scales [QRS] Domains
        1. 6.3.13.1 Functional Tests
        2. 6.3.13.2 Questionnaires
        3. 6.3.13.3 Disease Response and Clin Classification
      14. 6.3.14 Subject Characteristics
      15. 6.3.15 Subject Status
      16. 6.3.16 Tumor/Lesion Domains
        1. 6.3.16.1 Tumor/Lesion Identification
        2. 6.3.16.2 Tumor/Lesion Results
        3. 6.3.16.3 Tumor Identification/Tumor Results Examples
      17. 6.3.17 Vital Signs
    4. 6.4 Findings About Events or Interventions
      1. 6.4.1 When to Use Findings About
      2. 6.4.2 Naming Findings About Domains
      3. 6.4.3 Variables Unique to Findings About
      4. 6.4.4 Findings About
      5. 6.4.5 Skin Response
  7. 7 Trial Design Model Datasets
    1. 7.1 Introduction to Trial Design Model Datasets
      1. 7.1.1 Purpose of Trial Design Model
      2. 7.1.2 Definitions of Trial Design Concepts
      3. 7.1.3 Current and Future Contents of the Trial Design Model
    2. 7.2 Experimental Design [TA and TE]
      1. 7.2.1 Trial Arms
        1. 7.2.1.1 Trial Arms Issues
      2. 7.2.2 Trial Elements
        1. 7.2.2.1 Trial Elements Issues
    3. 7.3 Schedule for Assessments [TV, TD, and TM]
      1. 7.3.1 Trial Visits
        1. 7.3.1.1 Trial Visits Issues
      2. 7.3.2 Trial Disease Assessments
      3. 7.3.3 Trial Disease Milestones
    4. 7.4 Trial Summary and Eligibility [TI and TS]
      1. 7.4.1 Trial Inclusion/Exclusion Criteria
      2. 7.4.2 Trial Summary
        1. 7.4.2.1 Use of Null Flavor
    5. 7.5 How to Model the Design of a Clinical Trial
  8. 8 Representing Relationships and Data
    1. 8.1 Relating Groups of Records Within a Domain Using the --GRPID Variable
      1. 8.1.1 --GRPID Example
    2. 8.2 Relating Peer Records
      1. 8.2.1 RELREC Dataset
      2. 8.2.2 RELREC Dataset Examples
    3. 8.3 Relating Datasets
      1. 8.3.1 RELREC Dataset Relationship Example
    4. 8.4 Relating Non-Standard Variables Values to a Parent Domain
      1. 8.4.1 Supplemental Qualifiers – SUPP-- Datasets
      2. 8.4.2 Submitting Supplemental Qualifiers in Separate Datasets
      3. 8.4.3 SUPP-- Examples
      4. 8.4.4 When Not to Use Supplemental Qualifiers
    5. 8.5 Relating Comments to a Parent Domain
    6. 8.6 How to Determine Where Data Belong in SDTM-Compliant Data Tabulations
      1. 8.6.1 Guidelines for Determining the General Observation Class
      2. 8.6.2 Guidelines for Forming New Domains
      3. 8.6.3 Guidelines for Differentiating Between Events, Findings, and Findings About Events
    7. 8.7 Relating Study Subjects
  9. 9 Study References
    1. 9.1 Device Identifiers
    2. 9.2 Non-host Organism Identifiers
    3. 9.3 Pharmacogenomic/Genetic Biomarker Identifiers

  1. Appendices
    1. Appendix A: CDISC SDS Extended Leadership Team
    2. Appendix B: Glossary and Abbreviations
    3. Appendix C: Controlled Terminology
      1. Appendix C1: Trial Summary Codes
      2. Appendix C2: Supplemental Qualifiers Name Codes
    4. Appendix D: CDISC Variable-Naming Fragments
    5. Appendix E: Revision History
    6. Appendix F: Representations and Warranties, Limitations of Liability, and Disclaimers

1 Introduction

1.1 Purpose

This document comprises the CDISC Version 3.3 [v3.3] Study Data Tabulation Model Implementation Guide for Human Clinical Trials [SDTMIG], which has been prepared by the Submissions Data Standards [SDS] team of the Clinical Data Interchange Standards Consortium [CDISC]. Like its predecessors, v3.3 is intended to guide the organization, structure, and format of standard clinical trial tabulation datasets submitted to a regulatory authority. Version 3.3 supersedes all prior versions of the SDTMIG.

The SDTMIG should be used in close concert with version 1.7 of the CDISC Study Data Tabulation Model [SDTM, available at //www.cdisc.org/sdtm], which describes the general conceptual model for representing clinical study data that is submitted to regulatory authorities. The SDTM should be read before the SDTMIG. Version 3.3 provides specific domain models, assumptions, business rules, and examples for preparing standard tabulation datasets that are based on the SDTM.

This document is intended for companies and individuals involved in the collection, preparation, and analysis of clinical data that will be submitted to regulatory authorities.

1.2 Organization of this Document

This document is organized into the following sections:

  • Section 1, Introduction, provides an overall introduction to the v3.3 models and describes changes from prior versions.
  • Section 2, Fundamentals of the SDTM, recaps the basic concepts of the SDTM, and describes how this implementation guide should be used in concert with the SDTM.
  • Section 3, Submitting Data in Standard Format, explains how to describe metadata for regulatory submissions, and how to assess conformance with the standards.
  • Section 4, Assumptions for Domain Models, describes basic concepts, business rules, and assumptions that should be taken into consideration before applying the domain models.
  • Section 5, Models for Special Purpose Domains, describes special purpose domains, including Demographics, Comments, Subject Visits, and Subject Elements.
  • Section 6, Domain Models Based on the General Observation Classes, provides specific metadata models based on the three general observation classes, along with assumptions and example data.
  • Section 7, Trial Design Model Datasets, describes domains for trial-level data, with assumptions and examples.
  • Section 8, Representing Relationships and Data, describes how to represent relationships between separate domains, datasets, and/or records, and provides information to help sponsors determine where data belong in the SDTM.
  • Section 9, Study References, provides structures for representing study-specific terminology used in subject data.
  • Appendices provide additional background material and describe other supplemental material relevant to implementation.

1.3 Relationship to Prior CDISC Documents

This document, together with the SDTM, represents the most recent version of the CDISC Submission Data Domain Models. Since all updates are intended to be backward compatible, the term "v3.x" is used to refer to Version 3.3 and all subsequent versions. The most significant changes since the prior version, v3.2, include:

  • Preparation of the SDTMIG in the CDISC wiki environment.
  • Renumbering of sections in Section 4.3, Coding and Controlled Terminology Assumptions, to remove an unnecessary layer.
  • The following new domain in Section 5, Models for Special Purpose Domains:
    • Subject Disease Milestones [SM]
  • The following new domains in Section 6.1, Models for Interventions Domains:
    • Meal Data [ML]
    • Procedure Agents [AG]
  • The following new domain in Section 6.3, Models for Findings Domains:
    • Functional Tests [FT]
  • The following body system-based domains in Section 6.3.10, Morphology/Physiology Domains:
    • Cardiovascular System Findings [CV]
    • Musculoskeletal System Findings [MK]
    • Nervous System Findings [NV]
    • Ophthalmic Examinations [OE]
    • Respiratory System Findings [RE]
    • Urinary System Findings [UI]
  • The following new domain in Section 7, Trial Design Model Datasets:
    • Trial Disease Milestones [TM]
  • The new Section 9, Study References
  • The following new domains in Section 9, Study References:
    • Device Identifiers [DI]
    • Non-host Organism Identifiers [OI]
    • Pharmacogenomic/Genetic Biomarker Identifiers [PB]
  • Updated Controlled Terminology for applicable variables across all domains, if available.

A detailed list of changes between versions is provided in Appendix E, Revision History.

Version 3.1 was the first fully implementation-ready version of the CDISC Submission Data Standards that was directly referenced by the FDA for use in human clinical studies involving drug products. However, future improvements and enhancements will continue to be made as sponsors gain more experience submitting data in this format. Therefore, CDISC will be preparing regular updates to the implementation guide to provide corrections, clarifications, additional domain models, examples, business rules, and conventions for using the standard domain models. CDISC will produce further documentation for controlled terminology as separate publications, so sponsors are encouraged to check the CDISC website [//www.cdisc.org/terminology] frequently for additional information. See Section 4.3, Coding and Controlled Terminology Assumptions, for the most up-to-date information on applying Controlled Terminology.

1.4 How to Read this Implementation Guide

This SDTM Implementation Guide [SDTMIG] is best read online, so the reader can benefit from the many hyperlinks included to both internal and external references. The following guidelines may be helpful in reading this document:

  1. First, read the SDTM to gain a general understanding of SDTM concepts.
  2. Next, read Sections 1-3 of this document to review the key concepts for preparing domains and submitting data to regulatory authorities. Refer to Appendix B, Glossary and Abbreviations, as necessary.
  3. Read Section 4, Assumptions for Domain Models.
  4. Review Section 5, Models for Special Purpose Domains, and Section 6, Domain Models Based on the General Observation Classes, in detail, referring back to Section 4, Assumptions for Domain Models, as directed. See the implementation examples for each domain to gain an understanding of how to apply the domain models for specific types of data.
  5. Read Section 7, Trial Design Model Datasets, to understand the fundamentals of the Trial Design Model and consider how to apply the concepts for typical protocols.
  6. Review Section 8, Representing Relationships and Data, to learn advanced concepts of how to express relationships between datasets, records, and additional variables not specifically defined in the models.
  7. Review Section 9, Study References, to learn when it is necessary to establish study-specific references that will be used in accordance with subject data.
  8. Finally, review the Appendices as appropriate. Appendix C, Controlled Terminology, in particular, describes how CDISC Terminology is centrally managed by the CDISC Controlled Terminology Team. Efforts are made at publication time to ensure all SDTMIG domain/dataset specification tables and/or examples reflect the latest CDISC Terminology; users, however, should refer to //www.cancer.gov/research/resources/terminology/cdisc as the authoritative source of controlled terminology, as CDISC controlled terminology is updated on a quarterly basis.

This implementation guide covers most data collected in human clinical trials, but separate implementation guides provide information about certain data, and should be consulted when needed.

  • The SDTM Implementation Guide for Associated Persons [SDTMIG-AP] provides structures for representing data collected about persons who are not study subjects.
  • The SDTM Implementation Guide for Medical Devices [SDTMIG-MD] provides structures for data about devices.
  • The SDTM Implementation Guide for Pharmacogenomics/Genetics [SDTMIG-PGx] provides structures for pharmacogenetic/genomic data and for data about biospecimens.

1.4.1 How to Read a Domain Specification

A domain specification table includes rows for all required and expected variables for a domain and for a set of permissible variables. The permissible variables do not include all the variables that are allowed for the domain; they are a set of variables that the SDS team considered likely to be included. The columns of the table are:

  • Variable Name
    • For variables that do not include a domain prefix, this name is taken directly from the SDTM.
    • For variables that do include the domain prefix, this name is taken from the SDTM, but with the "--" placeholder in the SDTM variable name replaced by the domain prefix.
  • Variable Label: A longer name for the variable.
    • This may be the same as the label in the SDTM, or it may be customized for the domain.
    • If a sponsor includes in a dataset an allowable variable not in the domain specification, they will create an appropriate label.
  • Type: One of the two SAS datatypes, "Num" or "Char". These values are taken directly from the SDTM.
  • Controlled Terms, Codelist, or Format
    • Controlled Terms are represented as hyperlinked text. The domain code in the row for the DOMAIN variable is the most common kind of controlled term represented in domain specifications.
    • Codelist
      • An asterisk * indicates that the variable may be subject to controlled terminology.
        • The controlled terminology might be of a type that would inherently be sponsor defined.
        • The controlled terminology might be of a type that could be standardized, but has not yet been developed.
        • The controlled terminology might be terminology that would be specified in value-level metadata.
      • A hyperlinked codelist name in parentheses indicates that the variable is subject to the CDISC controlled terminology in the named codelist.
      • The name of an external code system [e.g., MedDRA, ISO 3166 Alpha-3] may be listed in plain text.
    • Format: "ISO8601" in plain text indicates that the variable values should be formatted in conformance with that standard.
  • Role: This is taken directly from the SDTM. Note that if a variable is either a Variable Qualifier or a Synonym Qualifier, the SDTM includes the qualified variable, but SDTMIG domain specifications do not.
  • CDISC Notes: The notes may include any of the following:
    • A description of what the variable means.
    • Information about how this variable relates to another variable.
    • Rules for when or how the variable should be populated, or how the contents should be formatted.
    • Examples of values that might appear in the variable. Such examples are only examples, and although they may be CDISC controlled terminology values, their presence in a CDISC note should not be construed as definitive. For authoritative information on CDISC controlled terminology, consult //www.cancer.gov/research/resources/terminology/cdisc.
  • Core: Contains one of the three values "Req", "Exp", or "Perm", which are explained further in Section 4.1.5, SDTM Core Designations.
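
As a rough illustration of the columns described above, one row of a domain specification table could be modeled as a small record type. The Python sketch below is an assumption made for illustration: the class name VariableSpec, its field names, and the example row values are hypothetical and are not defined by the SDTM or the SDTMIG. Only the allowed values for Type and Core come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_TYPES = {"Num", "Char"}        # the two SAS datatypes named above
ALLOWED_CORE = {"Req", "Exp", "Perm"}  # the three Core values named above


@dataclass(frozen=True)
class VariableSpec:
    """One row of a domain specification table (hypothetical structure)."""

    name: str                        # Variable Name
    label: str                       # Variable Label
    type: str                        # "Num" or "Char"
    controlled_terms: Optional[str]  # codelist name, format, or None
    role: str                        # Role, as given in the SDTM
    cdisc_notes: str                 # CDISC Notes
    core: str                        # "Req", "Exp", or "Perm"

    def __post_init__(self) -> None:
        # Reject values outside the sets listed in the column descriptions.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"Type must be one of {sorted(ALLOWED_TYPES)}")
        if self.core not in ALLOWED_CORE:
            raise ValueError(f"Core must be one of {sorted(ALLOWED_CORE)}")


# Illustrative row; the label and note wording are assumptions.
row = VariableSpec(
    name="STUDYID",
    label="Study Identifier",
    type="Char",
    controlled_terms=None,
    role="Identifier",
    cdisc_notes="Unique identifier for a study.",
    core="Req",
)
print(row.name, row.core)
```

Validating Type and Core at construction time is a design choice of this sketch; the guide itself does not prescribe how such metadata should be stored.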

2 Fundamentals of the SDTM

2.1 Observations and Variables

The SDTMIG for Human Clinical Trials is based on the SDTM's general framework for organizing clinical trials information that is to be submitted to regulatory authorities. The SDTM is built around the concept of observations collected about subjects who participated in a clinical study. Each observation can be described by a series of variables, corresponding to a row in a dataset. Each variable can be classified according to its Role. A Role determines the type of information conveyed by the variable about each distinct observation and how it can be used. Variables can be classified into five major roles:

  • Identifier variables, such as those that identify the study, subject, domain, and sequence number of the record
  • Topic variables, which specify the focus of the observation [such as the name of a lab test]
  • Timing variables, which describe the timing of the observation [such as start date and end date]
  • Qualifier variables, which include additional illustrative text or numeric values that describe the results or additional traits of the observation [such as units or descriptive adjectives]
  • Rule variables, which express an algorithm or executable method to define start, end, and branching or looping conditions in the Trial Design model

The set of Qualifier variables can be further categorized into five sub-classes:

  • Grouping Qualifiers are used to group together a collection of observations within the same domain. Examples include --CAT and --SCAT.
  • Result Qualifiers describe the specific results associated with the topic variable in a Findings dataset. They answer the question raised by the topic variable. Result Qualifiers are --ORRES, --STRESC, and --STRESN.
  • Synonym Qualifiers specify an alternative name for a particular variable in an observation. Examples include --MODIFY and --DECOD, which are equivalent terms for a --TRT or --TERM topic variable, and --TEST and --LOINC, which are equivalent terms for a --TESTCD.
  • Record Qualifiers define additional attributes of the observation record as a whole [rather than describing a particular variable within a record]. Examples include --REASND, AESLIFE, and all other SAE flag variables in the AE domain; AGE, SEX, and RACE in the DM domain; and --BLFL, --POS, --LOC, --SPEC and --NAM in a Findings domain.
  • Variable Qualifiers are used to further modify or describe a specific variable within an observation and are only meaningful in the context of the variable they qualify. Examples include --ORRESU, --ORNRHI, and --ORNRLO, all of which are Variable Qualifiers of --ORRES; and --DOSU, which is a Variable Qualifier of --DOSE.

For example, in the observation, "Subject 101 had mild nausea starting on Study Day 6," the Topic variable value is the term for the adverse event, "NAUSEA". The Identifier variable is the subject identifier, "101". The Timing variable is the study day of the start of the event, which captures the information, "starting on Study Day 6", while an example of a Record Qualifier is the severity, the value for which is "MILD". Additional Timing and Qualifier variables could be included to provide the necessary detail to adequately describe an observation.
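
The nausea example can be written out as a single record in which each variable carries its role. The Python sketch below is illustrative only: the variable names USUBJID, AETERM, AESTDY, and AESEV are assumed here to be the adverse event variables holding these values, since the paragraph above gives the values and roles but not the variable names, and the dictionary layout and helper function are hypothetical.

```python
# One observation: variable name -> (role, value).
# Variable names are assumptions; roles and values follow the example above.
observation = {
    "USUBJID": ("Identifier", "101"),
    "AETERM": ("Topic", "NAUSEA"),
    "AESTDY": ("Timing", 6),
    "AESEV": ("Record Qualifier", "MILD"),
}


def variables_with_role(obs: dict, role: str) -> list:
    """Return the names of the variables in obs that have the given role."""
    return [name for name, (var_role, _value) in obs.items() if var_role == role]


print(variables_with_role(observation, "Topic"))             # ['AETERM']
print(variables_with_role(observation, "Record Qualifier"))  # ['AESEV']
```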

2.2 Datasets and Domains

Observations about study subjects are normally collected for all subjects in a series of domains. A domain is defined as a collection of logically related observations with a common topic. The logic of the relationship may pertain to the scientific subject matter of the data or to its role in the trial. Each domain is represented by a single dataset.

Each domain dataset is distinguished by a unique, two-character code that should be used consistently throughout the submission. This code, which is stored in the SDTM variable named DOMAIN, is used in four ways: as the dataset name; as the value of the DOMAIN variable in that dataset; as a prefix for most variable names in that dataset; and as a value in the RDOMAIN variable in relationship tables [see Section 8, Representing Relationships and Data].
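
The use of the domain code as a variable-name prefix can be sketched as a simple substitution of the "--" placeholder. The Python function below is a hypothetical helper written for illustration, not part of any CDISC tooling; it assumes the placeholder appears only at the start of the SDTM variable name.

```python
def expand_variable_name(sdtm_name: str, domain_code: str) -> str:
    """Replace a leading "--" placeholder with the two-character domain code.

    Names without the placeholder (for example USUBJID) are returned unchanged.
    """
    if len(domain_code) != 2:
        raise ValueError("domain code must be exactly two characters")
    if sdtm_name.startswith("--"):
        return domain_code + sdtm_name[2:]
    return sdtm_name


print(expand_variable_name("--TERM", "AE"))   # AETERM
print(expand_variable_name("--SEQ", "AE"))    # AESEQ
print(expand_variable_name("USUBJID", "AE"))  # USUBJID
```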

All datasets are structured as flat files with rows representing observations and columns representing variables. Each dataset is described by metadata definitions that provide information about the variables used in the dataset. The metadata are described in a data definition document, a Define-XML document, that is submitted with the data to regulatory authorities. The Define-XML standard, available at //www.cdisc.org/standards/transport/define-xml, specifies metadata attributes to describe SDTM data.

Data stored in SDTM datasets include both raw [as originally collected] and derived values [e.g., converted into standard units, or computed on the basis of multiple values, such as an average]. The SDTM lists only the name, label, and type, with a set of brief CDISC guidelines that provide a general description for each variable.

The domain dataset models included in Section 5, Models for Special Purpose Domains and Section 6, Domain Models Based on the General Observation Classes of this document provide additional information about Controlled Terms or Format, notes on proper usage, and examples. See Section 1.4.1, How to Read a Domain Specification.

2.3 The General Observation Classes

Most subject-level observations collected during the study should be represented according to one of the three SDTM general observation classes: Interventions, Events, or Findings. The lists of variables allowed to be used in each of these can be found in the SDTM.

  • The Interventions class captures investigational, therapeutic, and other treatments that are administered to the subject [with some actual or expected physiological effect] either as specified by the study protocol [e.g., exposure to study drug], coincident with the study assessment period [e.g., concomitant medications], or self-administered by the subject [such as use of alcohol, tobacco, or caffeine].
  • The Events class captures planned protocol milestones such as randomization and study completion, and occurrences, conditions, or incidents independent of planned study evaluations occurring during the trial [e.g., adverse events] or prior to the trial [e.g., medical history].
  • The Findings class captures the observations resulting from planned evaluations to address specific tests or questions such as laboratory tests, ECG testing, and questions listed on questionnaires.

In most cases, the choice of observation class appropriate to a specific collection of data can be easily determined according to the descriptions provided above. The majority of data, which typically consists of measurements or responses to questions, usually at specific visits or time points, will fit the Findings general observation class. Additional guidance on choosing the appropriate general observation class is provided in Section 8.6.1, Guidelines for Determining the General Observation Class.

General assumptions for use with all domain models and custom domains based on the general observation classes are described in Section 4, Assumptions for Domain Models; specific assumptions for individual domains are included with the domain models.

2.4 Datasets Other Than General Observation Class Domains

The SDTM includes four types of datasets other than those based on the general observation classes:

  • Domain datasets, which include subject-level data that do not conform to one of the three general observation classes. These include Demographics [DM], Comments [CO], Subject Elements [SE], and Subject Visits [SV] [1], and are described in Section 5, Models for Special Purpose Domains.
  • Trial Design Model [TDM] datasets, which represent information about the study design but do not contain subject data. These include datasets such as Trial Arms [TA] and Trial Elements [TE] and are described in Section 7, Trial Design Model Datasets.
  • Relationship datasets, such as the RELREC and SUPP-- datasets. These are described in Section 8, Representing Relationships and Data.
  • Study Reference datasets, which include Device Identifiers [DI], Non-host Organism Identifiers [OI], and Pharmacogenomic/Genetic Biomarker Identifiers [PB]. These provide structures for representing study-specific terminology used in subject data. These are described in Section 9, Study References.

[1] SE and SV were included as part of the Trial Design Model in SDTMIG v3.1.1, but were moved in SDTMIG v3.1.2.

2.5 The SDTM Standard Domain Models

A sponsor should only submit domain datasets that were actually collected [or directly derived from the collected data] for a given study. Decisions on what data to collect should be based on the scientific objectives of the study, rather than the SDTM. Note that any data collected that will be submitted in an analysis dataset must also appear in a tabulation dataset.

The collected data for a given study may use standard domains from this and other SDTM Implementation Guides as well as additional custom domains based on the three general observation classes. A list of standard domains is provided in Section 3.2.1, Dataset-Level Metadata. Final domains will be published only in an SDTM Implementation Guide [the SDTMIG for human clinical trials or another implementation guide, such as the SDTMIG for Medical Devices]. Therapeutic area standards projects and other projects may develop proposals for additional domains. Draft versions of these domains may be made available in the CDISC wiki in the SDTM Draft Domains [//wiki.cdisc.org/x/s4Iv] area.

Starting with SDTMIG v3.3:

  • A new domain has version 1.0.
  • An existing domain that has changed since the last published version of the SDTMIG is up-versioned.
  • An existing domain that has not changed since the last published version of the SDTMIG is not up-versioned.

What constitutes a change for the purposes of deciding a domain version will be developed further, but for SDTMIG v3.3, a domain was assigned a version of v3.3 if there was a change to the specification and/or the assumptions from the domain as it appeared in SDTMIG v3.2.

These general rules apply when determining which variables to include in a domain:

  • The Identifier variables, STUDYID, USUBJID, DOMAIN, and --SEQ are required in all domains based on the general observation classes. Other Identifiers may be added as needed.
  • Any Timing variables are permissible for use in any submission dataset based on a general observation class except where restricted by specific domain assumptions.
  • Any additional Qualifier variables from the same general observation class may be added to a domain model except where restricted by specific domain assumptions.
  • Sponsors may not add any variables other than those described in the preceding three bullets. The addition of non-standard variables will compromise the FDA's ability to populate the data repository and to use standard tools. The SDTM allows for the inclusion of a sponsor's non-SDTM variables using the Supplemental Qualifiers special purpose dataset structure, described in Section 8.4, Relating Non-Standard Variables Values to a Parent Domain. As the SDTM continues to evolve over time, certain additional standard variables may be added to the general observation classes.
  • Standard variables must not be renamed or modified for novel usage. Their metadata should not be changed.
  • A Permissible variable should be used in an SDTM dataset wherever appropriate.  
    • If a study includes a data item that would be represented in a Permissible variable, then that variable must be included in the SDTM dataset, even if null. Indicate no data were available for that variable in the Define-XML document.
    • If a study did not include a data item that would be represented in a Permissible variable, then that variable should not be included in the SDTM dataset and should not be declared in the Define-XML document.
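
The first rule above can be checked mechanically. The following is a minimal Python sketch, not part of the SDTMIG; the function name and the representation of a dataset as a list of column names are assumptions made for illustration.

```python
# Required Identifier variables named in the rules above.
# "--" stands for the two-character domain code.
REQUIRED_IDENTIFIERS = ("STUDYID", "USUBJID", "DOMAIN", "--SEQ")


def missing_identifiers(domain_code, columns):
    """Return the required Identifier variables that are absent from `columns`.

    `domain_code` is the two-character domain code (e.g., "VS").
    `columns` is the list of variable names in the dataset (hypothetical input).
    """
    expected = [name.replace("--", domain_code) for name in REQUIRED_IDENTIFIERS]
    present = {c.upper() for c in columns}
    return [name for name in expected if name not in present]


# A Vital Signs dataset that lacks its sequence variable.
print(missing_identifiers("VS", ["STUDYID", "DOMAIN", "USUBJID", "VSTESTCD"]))
```

A check of this kind covers only the presence of the four Identifiers; the rules on Timing, Qualifier, and Permissible variables still require review against the domain assumptions.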

2.6 Creating a New Domain

This section describes the overall process for creating a custom domain, which must be based on one of the three SDTM general observation classes. The number of domains submitted should be based on the specific requirements of the study. Follow the process below to create a custom domain:

  1. Confirm that none of the existing published domains will fit the need. A custom domain may only be created if the data are different in nature and do not fit into an existing published domain.
    • Establish a domain of a common topic [i.e., where the nature of the data is the same], rather than by a specific method of collection [e.g., electrocardiogram, EG]. Group and separate data within the domain using --CAT, --SCAT, --METHOD, --SPEC, --LOC, etc. as appropriate. Examples of different topics are: microbiology, tumor measurements, pathology/histology, vital signs, and physical exam results.
    • Do not create separate domains based on time; rather, represent both prior and current observations in a domain [e.g., CM for all non-study medications]. Note that AE and MH are an exception to this best practice because of regulatory reporting needs.
    • How collected data are used [e.g., to support analyses and/or efficacy endpoints] must not result in the creation of a custom domain. For example, if blood pressure measurements are endpoints in a hypertension study, they must still be represented in the VS [Vital Signs] domain, as opposed to a custom "efficacy" domain. Similarly, if liver function test results are of special interest, they must still be represented in the LB [Laboratory Tests] domain.
    • Data that were collected on separate CRF modules or pages may fit into an existing domain [such as separate questionnaires into the QS domain, or prior and concomitant medications in the CM domain].
    • If it is necessary to represent relationships between data that are hierarchical in nature [e.g., a parent record must be observed before child records], then establish a domain pair [e.g., MB/MS, PC/PP]. Note, domain pairs have been modeled for microbiology data [MB/MS domains] and PK data [PC/PP domains] to enable dataset-level relationships to be described using RELREC. The domain pair uses DOMAIN as an Identifier to group parent records [e.g., MB] from child records [e.g., MS] and enables a dataset-level relationship to be described in RELREC. Without using DOMAIN to facilitate description of the data relationships, RELREC, as currently defined, could not be used without introducing a variable that would group data like DOMAIN.
  2. Check the SDTM Draft Domains area of the CDISC wiki [//wiki.cdisc.org/x/s4Iv] for proposed domains developed since the last published version of the SDTMIG. These proposed domains may be used as custom domains in a submission.
  3. Look for an existing, relevant domain model to serve as a prototype. If no existing model seems appropriate, choose the general observation class [Interventions, Events, or Findings] that best fits the data by considering the topic of the observation. The general approach for selecting variables for a custom domain is as follows [also see Figure 2.6, Creating a New Domain, below].
    1. Select and include the required identifier variables [e.g., STUDYID, DOMAIN, USUBJID, --SEQ] and any permissible Identifier variables from the SDTM.
    2. Include the topic variable from the identified general observation class [e.g., --TESTCD for Findings] in the SDTM.
    3. Select and include the relevant qualifier variables from the identified general observation class in the SDTM. Variables belonging to other general observation classes must not be added.
    4. Select and include the applicable timing variables in the SDTM.
    5. Determine the domain code, one that is not a domain code in the CDISC Controlled Terminology codelist "SDTM Domain Abbreviations" available at //www.cancer.gov/research/resources/terminology/cdisc. If it is desired to have this domain code as part of CDISC controlled terminology, then submit a request to //ncitermform.nci.nih.gov/ncitermform/?version=cdisc. The sponsor-selected, two-character domain code should be used consistently throughout the submission.
    6. Apply the two-character domain code to the appropriate variables in the domain. Replace all variable prefixes [shown in the models as two hyphens "--"] with the domain code.
    7. Set the order of variables consistent with the order defined in the SDTM for the general observation class.
    8. Adjust the labels of the variables only as appropriate to properly convey the meaning in the context of the data being submitted in the newly created domain. Use title case for all labels [title case means to capitalize the first letter of every word except for articles, prepositions, and conjunctions].
    9. Ensure that appropriate standard variables are being properly applied by comparing their use in the custom domain to their use in standard domains.
    10. Describe the dataset within the Define-XML document. See Section 3.2, Using the CDISC Domain Models in Regulatory Submissions — Dataset Metadata.
    11. Place any non-standard [SDTM] variables in a Supplemental Qualifier dataset. Mechanisms for representing additional non-standard qualifier variables not described in the general observation classes and for defining relationships between separate datasets or records are described in Section 8.4, Relating Non-Standard Variables Values to a Parent Domain.

Figure 2.6: Creating a New Domain
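
The prefix substitution described in the variable-selection steps [replacing "--" with the two-character domain code] can be sketched in Python as follows. This is an illustration only; the function name and the custom code "XY" are hypothetical, and the variable list is a small sample rather than a complete Findings domain.

```python
def apply_domain_code(domain_code, generic_names):
    """Replace the "--" prefix of generic SDTM variable names with a domain code.

    Variables without the "--" prefix (e.g., STUDYID) are returned unchanged.
    """
    if len(domain_code) != 2 or not domain_code.isalnum():
        raise ValueError("domain code must be two alphanumeric characters")
    code = domain_code.upper()
    return [code + name[2:] if name.startswith("--") else name
            for name in generic_names]


# Hypothetical custom Findings domain with the sponsor-selected code "XY".
generic = ["STUDYID", "DOMAIN", "USUBJID", "--SEQ", "--TESTCD", "--ORRES", "--DTC"]
print(apply_domain_code("XY", generic))
```

The sketch keeps the variables in the order supplied, so the caller remains responsible for ordering them as defined in the SDTM for the general observation class.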

2.7 SDTM Variables Not Allowed in SDTMIG

This section identifies those SDTM variables that either should not be used in SDTM-compliant data tabulations of clinical trials data or have not yet been evaluated for use in human clinical trials.

The following SDTM variables, defined for use in non-clinical studies [SEND], must NEVER be used in the submission of SDTM-based data for human clinical trials:

  • --USCHFL [Interventions, Events, Findings]
  • --DTHREL [Findings]
  • --EXCLFL [Findings]
  • --REASEX [Findings]
  • --IMPLBL [Findings]
  • FETUSID [Identifiers]
  • --DETECT [Timing Variables]
  • --NOMDY [Timing Variables]
  • --NOMLBL [Timing Variables]

The following variables can be used for non-clinical studies [SEND] but must NEVER be used in the Demographics domain for human clinical trials, where all subjects are human. See Section 9.2, Non-host Organism Identifiers [OI], for information about representing taxonomic information for non-host organisms such as bacteria and viruses.

  • SPECIES [Demographics]
  • STRAIN [Demographics]
  • SBSTRAIN [Demographics]

The following variables have not been evaluated for use in human clinical trials and must therefore be used with extreme caution:

  • --METHOD [Interventions]
  • --ANTREG [Findings]
  • --CHRON [Findings]
  • --DISTR [Findings]
  • SETCD [Demographics]

    The use of SETCD additionally requires the use of the Trial Sets domain.

The following identifier variable can be used for non-clinical studies [SEND], and may be used in human clinical trials when appropriate:

  • POOLID

    The use of POOLID additionally requires the use of the Pool Definition dataset.

Other variables defined in the SDTM are allowed for use as defined in this SDTMIG except when explicitly stated. Custom domains, created following the guidance in Section 2.6, Creating a New Domain, may utilize any appropriate Qualifier variables from the selected general observation class.
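
As an illustration of how the lists in this section might be applied, the Python sketch below flags variables that must never appear in human clinical trial data. The function name and the list-of-column-names input are assumptions. The sketch deliberately ignores the general observation class noted beside each variable and does not cover the "extreme caution" variables, which call for judgment rather than rejection.

```python
# SEND-only variables listed in this section; "--" stands for the domain code.
SEND_ONLY_PREFIXED = {"--USCHFL", "--DTHREL", "--EXCLFL", "--REASEX",
                      "--IMPLBL", "--DETECT", "--NOMDY", "--NOMLBL"}
SEND_ONLY_FIXED = {"FETUSID"}
# Variables that must never be used in the Demographics domain.
DM_DISALLOWED = {"SPECIES", "STRAIN", "SBSTRAIN"}


def disallowed_variables(domain_code, columns):
    """Return, sorted, the columns that this section says must never be used."""
    banned = {name.replace("--", domain_code) for name in SEND_ONLY_PREFIXED}
    banned |= SEND_ONLY_FIXED
    if domain_code == "DM":
        banned |= DM_DISALLOWED
    return sorted(c for c in columns if c in banned)


print(disallowed_variables("LB", ["STUDYID", "LBTESTCD", "LBEXCLFL", "LBNOMDY"]))
print(disallowed_variables("DM", ["STUDYID", "USUBJID", "SPECIES"]))
```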

3 Submitting Data in Standard Format

3.1 Standard Metadata for Dataset Contents and Attributes

The SDTMIG provides standard descriptions of some of the most commonly used data domains, with metadata attributes. These include descriptive metadata attributes that should be included in a Define-XML document. In addition, the CDISC domain models include two shaded columns that are not sent to the FDA. These columns assist sponsors in preparing their datasets:

  • "CDISC Notes" is for notes to the sponsor regarding the relevant use of each variable.
  • "Core" indicates how a variable is classified [see Section 4.1.5, SDTM Core Designations].

The domain models in Section 6, Domain Models Based on the General Observation Classes, illustrate how to apply the SDTM when creating a specific domain dataset. In particular, these models illustrate the selection of a subset of the variables offered in one of the general observation classes, along with applicable timing variables. The models also show how a standard variable from a general observation class should be adjusted to meet the specific content needs of a particular domain, including making the label more meaningful, specifying controlled terminology, and creating domain-specific notes and examples. Thus the domain models not only demonstrate how to apply the model for the most common domains, but also give insight on how to apply general model concepts to other domains not yet defined by CDISC.

3.2 Using the CDISC Domain Models in Regulatory Submissions — Dataset Metadata

The Define-XML document that accompanies a submission should also describe each dataset that is included in the submission and describe the natural key structure of each dataset. While most studies will include DM and a set of safety domains based on the three general observation classes [typically including EX, CM, AE, DS, MH, LB, and VS], the actual choice of which data to submit will depend on the protocol and the needs of the regulatory reviewer. Dataset definition metadata should include the dataset filenames, descriptions, locations, structures, class, purpose, and keys, as shown in Section 3.2.1, Dataset-Level Metadata. Comments can also be provided where needed.

In the event that no records are present in a dataset [e.g., a small PK study where no subjects took concomitant medications], the empty dataset should not be submitted and should not be described in the Define-XML document. The annotated CRF will show the data that would have been submitted had data been received; it need not be re-annotated to indicate that no records exist.

3.2.1 Dataset-Level Metadata

Note that the key variables shown in this table are examples only. A sponsor's actual key structure may be different.

| Dataset | Description | Class | Structure | Purpose | Keys | Location |
|---|---|---|---|---|---|---|
| CO | Comments | Special Purpose | One record per comment per subject | Tabulation | STUDYID, USUBJID, IDVAR, COREF, CODTC | co.xpt |
| DM | Demographics | Special Purpose | One record per subject | Tabulation | STUDYID, USUBJID | dm.xpt |
| SE | Subject Elements | Special Purpose | One record per actual Element per subject | Tabulation | STUDYID, USUBJID, ETCD, SESTDTC | se.xpt |
| SM | Subject Disease Milestones | Special Purpose | One record per Disease Milestone per subject | Tabulation | STUDYID, USUBJID, MIDS | sm.xpt |
| SV | Subject Visits | Special Purpose | One record per subject per actual visit | Tabulation | STUDYID, USUBJID, VISITNUM | sv.xpt |
| AG | Procedure Agents | Interventions | One record per recorded intervention occurrence per subject | Tabulation | STUDYID, USUBJID, AGTRT, AGSTDTC | ag.xpt |
| CM | Concomitant/Prior Medications | Interventions | One record per recorded intervention occurrence or constant-dosing interval per subject | Tabulation | STUDYID, USUBJID, CMTRT, CMSTDTC | cm.xpt |
| EC | Exposure as Collected | Interventions | One record per protocol-specified study treatment, collected-dosing interval, per subject, per mood | Tabulation | STUDYID, USUBJID, ECTRT, ECSTDTC, ECMOOD | ec.xpt |
| EX | Exposure | Interventions | One record per protocol-specified study treatment, constant-dosing interval, per subject | Tabulation | STUDYID, USUBJID, EXTRT, EXSTDTC | ex.xpt |
| ML | Meal Data | Interventions | One record per food product occurrence or constant intake interval per subject | Tabulation | STUDYID, USUBJID, MLTRT, MLSTDTC | ml.xpt |
| PR | Procedures | Interventions | One record per recorded procedure per occurrence per subject | Tabulation | STUDYID, USUBJID, PRTRT, PRSTDTC | pr.xpt |
| SU | Substance Use | Interventions | One record per substance type per reported occurrence per subject | Tabulation | STUDYID, USUBJID, SUTRT, SUSTDTC | su.xpt |
| AE | Adverse Events | Events | One record per adverse event per subject | Tabulation | STUDYID, USUBJID, AEDECOD, AESTDTC | ae.xpt |
| CE | Clinical Events | Events | One record per event per subject | Tabulation | STUDYID, USUBJID, CETERM, CESTDTC | ce.xpt |
| DS | Disposition | Events | One record per disposition status or protocol milestone per subject | Tabulation | STUDYID, USUBJID, DSDECOD, DSSTDTC | ds.xpt |
| DV | Protocol Deviations | Events | One record per protocol deviation per subject | Tabulation | STUDYID, USUBJID, DVTERM, DVSTDTC | dv.xpt |
| HO | Healthcare Encounters | Events | One record per healthcare encounter per subject | Tabulation | STUDYID, USUBJID, HOTERM, HOSTDTC | ho.xpt |
| MH | Medical History | Events | One record per medical history event per subject | Tabulation | STUDYID, USUBJID, MHDECOD | mh.xpt |
| CV | Cardiovascular System Findings | Findings | One record per finding or result per time point per visit per subject | Tabulation | STUDYID, USUBJID, VISITNUM, CVTESTCD, CVTPTREF, CVTPTNUM | cv.xpt |
| DA | Drug Accountability | Findings | One record per drug accountability finding per subject | Tabulation | STUDYID, USUBJID, DATESTCD, DADTC | da.xpt |
| DD | Death Details | Findings | One record per finding per subject | Tabulation | STUDYID, USUBJID, DDTESTCD, DDDTC | dd.xpt |
| EG | ECG Test Results | Findings | One record per ECG observation per replicate per time point or one record per ECG observation per beat per visit per subject | Tabulation | STUDYID, USUBJID, EGTESTCD, VISITNUM, EGTPTREF, EGTPTNUM | eg.xpt |
| FA | Findings About Events or Interventions | Findings | One record per finding, per object, per time point, per visit per subject | Tabulation | STUDYID, USUBJID, FATESTCD, FAOBJ, VISITNUM, FATPTREF, FATPTNUM | fa.xpt |
| FT | Functional Tests | Findings | One record per Functional Test finding per time point per visit per subject | Tabulation | STUDYID, USUBJID, TESTCD, VISITNUM, FTTPTREF, FTTPTNUM | ft.xpt |
| IE | Inclusion/Exclusion Criteria Not Met | Findings | One record per inclusion/exclusion criterion not met per subject | Tabulation | STUDYID, USUBJID, IETESTCD | ie.xpt |
| IS | Immunogenicity Specimen Assessments | Findings | One record per test per visit per subject | Tabulation | STUDYID, USUBJID, ISTESTCD, VISITNUM | is.xpt |
| LB | Laboratory Test Results | Findings | One record per lab test per time point per visit per subject | Tabulation | STUDYID, USUBJID, LBTESTCD, LBSPEC, VISITNUM, LBTPTREF, LBTPTNUM | lb.xpt |
| MB | Microbiology Specimen | Findings | One record per microbiology specimen finding per time point per visit per subject | Tabulation | STUDYID, USUBJID, MBTESTCD, VISITNUM, MBTPTREF, MBTPTNUM | mb.xpt |
| MI | Microscopic Findings | Findings | One record per finding per specimen per subject | Tabulation | STUDYID, USUBJID, MISPEC, MITESTCD | mi.xpt |
| MK | Musculoskeletal System Findings | Findings | One record per assessment per visit per subject | Tabulation | STUDYID, USUBJID, VISITNUM, MKTESTCD, MKLOC, MKLAT | mk.xpt |
| MO | Morphology | Findings | One record per Morphology finding per location per time point per visit per subject | Tabulation | STUDYID, USUBJID, VISITNUM, MOTESTCD, MOLOC, MOLAT | mo.xpt |
| MS | Microbiology Susceptibility | Findings | One record per microbiology susceptibility test [or other organism-related finding] per organism found in MB | Tabulation | STUDYID, USUBJID, MSTESTCD, VISITNUM, MSTPTREF, MSTPTNUM | ms.xpt |
| NV | Nervous System Findings | Findings | One record per finding per location per time point per visit per subject | Tabulation | STUDYID, USUBJID, VISITNUM, CVTPTNUM, CVLOC, NVTESTCD | nv.xpt |
| OE | Ophthalmic Examinations | Findings | One record per ophthalmic finding per method per location, per time point per visit per subject | Tabulation | STUDYID, USUBJID, FOCID, OETESTCD, OETSTDTL, OEMETHOD, OELOC, OELAT, OEDIR, VISITNUM, OEDTC, OETPTREF, OETPTNUM, OEREPNUM | oe.xpt |
| PC | Pharmacokinetics Concentrations | Findings | One record per sample characteristic or time-point concentration per reference time point or per analyte per subject | Tabulation | STUDYID, USUBJID, PCTESTCD, VISITNUM, PCTPTREF, PCTPTNUM | pc.xpt |
| PE | Physical Examination | Findings | One record per body system or abnormality per visit per subject | Tabulation | STUDYID, USUBJID, PETESTCD, VISITNUM | pe.xpt |
| PP | Pharmacokinetics Parameters | Findings | One record per PK parameter per time-concentration profile per modeling method per subject | Tabulation | STUDYID, USUBJID, PPTESTCD, PPCAT, VISITNUM, PPTPTREF | pp.xpt |
| QS | Questionnaires | Findings | One record per questionnaire per question per time point per visit per subject | Tabulation | STUDYID, USUBJID, QSCAT, QSSCAT, VISITNUM, QSTESTCD | qs.xpt |
| RE | Respiratory System Findings | Findings | One record per finding or result per time point per visit per subject | Tabulation | STUDYID, USUBJID, VISITNUM, RETESTCD, RETPTNUM, REREPNUM | re.xpt |
| RP | Reproductive System Findings | Findings | One record per finding or result per time point per visit per subject | Tabulation | STUDYID, DOMAIN, USUBJID, RPTESTCD, VISITNUM | rp.xpt |
| RS | Disease Response and Clin Classification | Findings | One record per response assessment or clinical classification assessment per time point per visit per subject per assessor per medical evaluator | Tabulation | STUDYID, USUBJID, RSTESTCD, VISITNUM, RSTPTREF, RSTPTNUM, RSEVAL, RSEVALID | rs.xpt |
| SC | Subject Characteristics | Findings | One record per characteristic per subject | Tabulation | STUDYID, USUBJID, SCTESTCD | sc.xpt |
| SR | Skin Response | Findings | One record per finding, per object, per time point, per visit per subject | Tabulation | STUDYID, USUBJID, SRTESTCD, SROBJ, VISITNUM, SRTPTREF, SRTPTNUM | sr.xpt |
| SS | Subject Status | Findings | One record per finding per visit per subject | Tabulation | STUDYID, USUBJID, SSTESTCD, VISITNUM | ss.xpt |
| TR | Tumor/Lesion Results | Findings | One record per tumor measurement/assessment per visit per subject per assessor | Tabulation | STUDYID, USUBJID, TRTESTCD, EVALID, VISITNUM | tr.xpt |
| TU | Tumor/Lesion Identification | Findings | One record per identified tumor per subject per assessor | Tabulation | STUDYID, USUBJID, EVALID, LNKID | tu.xpt |
| UR | Urinary System Findings | Findings | One record per finding per location per visit per subject | Tabulation | STUDYID, USUBJID, VISITNUM, URTESTCD, URLOC, URLAT, URDIR | ur.xpt |
| VS | Vital Signs | Findings | One record per vital sign measurement per time point per visit per subject | Tabulation | STUDYID, USUBJID, VSTESTCD, VISITNUM, VSTPTREF, VSTPTNUM | vs.xpt |
| TA | Trial Arms | Trial Design | One record per planned Element per Arm | Tabulation | STUDYID, ARMCD, TAETORD | ta.xpt |
| TD | Trial Disease Assessments | Trial Design | One record per planned constant assessment period | Tabulation | STUDYID, TDORDER | td.xpt |
| TE | Trial Elements | Trial Design | One record per planned Element | Tabulation | STUDYID, ETCD | te.xpt |
| TI | Trial Inclusion/Exclusion Criteria | Trial Design | One record per I/E criterion | Tabulation | STUDYID, IETESTCD | ti.xpt |
| TM | Trial Disease Milestones | Trial Design | One record per Disease Milestone type | Tabulation | STUDYID, MIDSTYPE | tm.xpt |
| TS | Trial Summary Information | Trial Design | One record per trial summary parameter value | Tabulation | STUDYID, TSPARMCD, TSSEQ | ts.xpt |
| TV | Trial Visits | Trial Design | One record per planned Visit per Arm | Tabulation | STUDYID, ARM, VISIT | tv.xpt |
| RELREC | Related Records | Relationships | One record per related record, group of records or dataset | Tabulation | STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL, RELID | relrec.xpt |
| RELSUB | Related Subjects | Relationships | One record per relationship per related subject per subject | Tabulation | STUDYID, USUBJID, RSUBJID, SREL | relsub.xpt |
| SUPP-- | Supplemental Qualifiers for [domain name] | Relationships | One record per IDVAR, IDVARVAL, and QNAM value per subject | Tabulation | STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL, QNAM | supp--.xpt |
| OI | Non-host Organism Identifiers | Study Reference | One record per taxon per non-host organism | Tabulation | NHOID, OISEQ | oi.xpt |

Separate Supplemental Qualifier datasets of the form supp--.xpt are required. See Section 8.4, Relating Non-Standard Variables Values to a Parent Domain.

3.2.1.1 Primary Keys

The table in Section 3.2.1, Dataset-Level Metadata, shows examples of what a sponsor might submit as variables that comprise the primary key for SDTM datasets. Since the purpose of this column is to aid reviewers in understanding the structure of a dataset, sponsors should list all of the natural keys [see definition below] for the dataset. These keys should define uniqueness for records within a dataset, and may define a record sort order. The identified keys for each dataset should be consistent with the description of the dataset structure as described in the Define-XML document. For all the general-observation-class domains [and for some special purpose domains], the --SEQ variable was created so that a unique record could be identified consistently across all of these domains via its use, along with STUDYID, USUBJID, and DOMAIN. In most domains, --SEQ will be a surrogate key [see definition below] for a set of variables that comprise the natural key. In certain instances, a Supplemental Qualifier [SUPP--] variable might also contribute to the natural key of a record for a particular domain. See Section 4.1.9, Assigning Natural Keys in the Metadata, for how this should be represented, and for additional information on keys.

A natural key is a set of data [one or more columns of an entity] that uniquely identifies that entity and distinguishes it from any other row in the table. The advantage of natural keys is that they exist already; one does not need to introduce a new, "unnatural" value to the data schema. One of the difficulties in choosing a natural key is that just about any natural key one can think of has the potential to change. Because they have business meaning, natural keys are effectively coupled to the business, and they may need to be reworked when business requirements change. An example of such a change in clinical trials data would be the addition of a position or location that becomes a key in a new study, but that wasn't collected in previous studies.

A surrogate key is a single-part, artificially established identifier for a record. Surrogate key assignment is a special case of derived data, one where a portion of the primary key is derived. A surrogate key is immune to changes in business needs. In addition, the key depends on only one field, so it's compact. A common way of deriving surrogate key values is to assign integer values sequentially. The --SEQ variable in the SDTM datasets is an example of a surrogate key for most datasets; in some instances, however, --SEQ might be a part of a natural key as a replacement for what might have been a key [e.g., a repeat sequence number] in the sponsor's database.
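
The distinction between the two kinds of key can be illustrated with a short Python sketch. It is not taken from the SDTMIG: the function names, the list-of-dictionaries record format, and the choice of USUBJID, VSTESTCD, and VISITNUM as the natural key are assumptions for the example. One function tests whether a candidate natural key is unique; the other derives a surrogate key by assigning integers sequentially within each subject, in the order the records are supplied.

```python
from collections import Counter, defaultdict


def duplicate_keys(records, key_vars):
    """Return the natural-key values that occur on more than one record."""
    counts = Counter(tuple(r.get(k) for k in key_vars) for r in records)
    return [key for key, n in counts.items() if n > 1]


def assign_seq(records, seq_var):
    """Return copies of the records with `seq_var` numbered 1..n per USUBJID."""
    counters = defaultdict(int)
    numbered = []
    for r in records:
        counters[r["USUBJID"]] += 1
        numbered.append({**r, seq_var: counters[r["USUBJID"]]})
    return numbered


vs = [
    {"USUBJID": "S1", "VSTESTCD": "SYSBP", "VISITNUM": 1},
    {"USUBJID": "S1", "VSTESTCD": "DIABP", "VISITNUM": 1},
    {"USUBJID": "S2", "VSTESTCD": "SYSBP", "VISITNUM": 1},
]
print(duplicate_keys(vs, ("USUBJID", "VSTESTCD", "VISITNUM")))
print([r["VSSEQ"] for r in assign_seq(vs, "VSSEQ")])
```

If `duplicate_keys` returns any values, the candidate natural key is incomplete for that dataset, and a further variable [for example, a time point or location] would be needed to define uniqueness.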

3.2.1.2 CDISC Submission Value-Level Metadata

In general, the SDTMIG v3.x Findings data models are closely related to normalized, relational data models in a vertical structure of one record per observation. Since the v3.x data structures are fixed, sometimes information that might have appeared as columns in a more horizontal [denormalized] structure in presentations and reports will instead be represented as rows in an SDTM Findings structure. Because many different types of observations are all presented in the same structure, there is a need to provide additional metadata to describe the expected properties that differentiate, for example, hematology lab results from serum chemistry lab results in terms of data type, standard units, and other attributes.

For example, the Vital Signs data domain could contain subject records related to diastolic and systolic blood pressure, height, weight, and body mass index [BMI]. These data are all submitted in the normalized SDTM Findings structure of one row per vital signs measurement. This means that there could be five records per subject [one for each test or measurement] for a single visit or time point, with the parameter names stored in the Test Code/Name variables, and the parameter values stored in result variables. Since the unique Test Code/Names could have different attributes [i.e., different origins, roles, and definitions] there would be a need to provide value-level metadata for this information.

The value-level metadata should be provided as a separate section of the Define-XML document. For details on the CDISC Define-XML standard, see //www.cdisc.org/standards/transport/define-xml.
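
To make the vertical structure concrete, the Python sketch below turns one horizontal [denormalized] vital signs row into one record per measurement. The function name, the input layout, and the sample values are hypothetical; the test codes and names are shown for illustration and should be confirmed against CDISC Controlled Terminology.

```python
# Illustrative test codes and names; confirm against controlled terminology.
VS_TESTS = {
    "SYSBP": "Systolic Blood Pressure",
    "DIABP": "Diastolic Blood Pressure",
    "HEIGHT": "Height",
    "WEIGHT": "Weight",
    "BMI": "Body Mass Index",
}


def to_vertical(wide_row, tests=VS_TESTS,
                id_vars=("STUDYID", "USUBJID", "VISITNUM")):
    """Convert one horizontal row into one Findings record per test."""
    ids = {k: wide_row[k] for k in id_vars}
    records = []
    for testcd, test in tests.items():
        if wide_row.get(testcd) is None:
            continue  # no result collected for this test
        records.append({**ids, "VSTESTCD": testcd, "VSTEST": test,
                        "VSORRES": str(wide_row[testcd])})
    return records


wide = {"STUDYID": "ABC", "USUBJID": "ABC-001", "VISITNUM": 1,
        "SYSBP": 120, "DIABP": 80, "HEIGHT": 170, "WEIGHT": 70, "BMI": 24.2}
for record in to_vertical(wide):
    print(record)
```

Each of the five resulting records shares the same result variable, which is why value-level metadata is needed to state, per test code, attributes such as data type and origin.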

3.2.2 Conformance

Conformance with the SDTMIG Domain Models is minimally indicated by:

  • Following the complete metadata structure for data domains
  • Following SDTMIG domain models wherever applicable
  • Using SDTM-specified standard domain names and prefixes where applicable
  • Using SDTM-specified standard variable names
  • Using SDTM-specified data types for all variables
  • Following SDTM-specified controlled terminology and format guidelines for variables, when provided
  • Including all collected and relevant derived data in one of the standard domains, special purpose datasets, or general-observation-class structures
  • Including all Required and Expected variables as columns in standard domains, and ensuring that all Required variables are populated
  • Ensuring that each record in a dataset includes the appropriate Identifier and Timing variables, as well as a Topic variable
  • Conforming to all business rules described in the CDISC Notes column and general and domain-specific assumptions

4 Assumptions for Domain Models

4.1 General Domain Assumptions

4.1.1 Review Study Data Tabulation and Implementation Guide

Review the Study Data Tabulation Model as well as this complete Implementation Guide before attempting to use any of the individual domain models.

4.1.2 Relationship to Analysis Datasets

Specific guidance on preparing analysis datasets can be found in the CDISC Analysis Data Model [ADaM] Implementation Guide and other ADaM documents, available at //www.cdisc.org/adam.

4.1.3 Additional Timing Variables

Additional Timing variables can be added as needed to a standard domain model based on the three general observation classes, except for the cases specified in Assumption 4.4.8, Date and Time Reported in a Domain Based on Findings. Timing variables can be added to special purpose domains only where specified in the SDTMIG domain model assumptions. Timing variables cannot be added to SUPPQUAL datasets or to RELREC [described in Section 8, Representing Relationships and Data].

4.1.3.1 EPOCH Variable Guidance

When EPOCH is included in a Findings class domain, it should be based on the --DTC variable, since this is the date/time of the test or, for tests performed on specimens, the date/time of specimen collection. For observations in Interventions or Events class domains, EPOCH should be based on the --STDTC variable, since this is the start of the Intervention or Event. A possible, though unlikely, exception would be a finding based on an interval specimen collection that started in one epoch but ended in another. --ENDTC might be a more appropriate basis for EPOCH in such a case.

Sponsors should not impute EPOCH values, but should, where possible, assign EPOCH values on the basis of CRF instructions and structure, even if EPOCH was not directly collected and date/time data was not collected with sufficient precision to permit assignment of an observation to an EPOCH on the basis of date/time data alone. If it is not possible to determine the EPOCH of an observation, then EPOCH should be null. Methods for assigning EPOCH values can be described in the Define-XML document.

Since EPOCH is a study-design construct, it is not applicable to Interventions or Events that started before the subject's participation in the study, nor to Findings performed before their participation in the study. For such records, EPOCH should be null. Note that a subject's participation in a study includes screening, which generally occurs before the reference start date, RFSTDTC, in the DM domain.
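
The date-based part of this guidance can be sketched in Python as below. This is one possible approach under stated assumptions, not a method prescribed by the SDTMIG: the function name is hypothetical, a subject's epochs are assumed to be available as [EPOCH, start date, end date] entries with the end date exclusive, and only complete ISO 8601 dates are used. Assignment from CRF instructions and structure, which the guidance also allows, is outside the sketch.

```python
from datetime import date

# Date variable that EPOCH is based on, by general observation class.
DATE_VAR_BY_CLASS = {
    "FINDINGS": "--DTC",
    "INTERVENTIONS": "--STDTC",
    "EVENTS": "--STDTC",
}


def assign_epoch(record, domain_code, obs_class, subject_epochs):
    """Return the EPOCH containing the record's date, or None (null).

    `subject_epochs` is an assumed list of (epoch, start, end) tuples with
    ISO 8601 dates; `end` is treated as exclusive.
    """
    var = DATE_VAR_BY_CLASS[obs_class.upper()].replace("--", domain_code)
    value = record.get(var) or ""
    if len(value) < 10:
        return None  # missing or partial date: do not impute
    day = date.fromisoformat(value[:10])
    for epoch, start, end in subject_epochs:
        if date.fromisoformat(start) <= day < date.fromisoformat(end):
            return epoch
    return None  # e.g., before the subject's participation in the study


epochs = [("SCREENING", "2020-01-01", "2020-01-15"),
          ("TREATMENT", "2020-01-15", "2020-03-01")]
print(assign_epoch({"LBDTC": "2020-01-20T08:30"}, "LB", "Findings", epochs))
print(assign_epoch({"AESTDTC": "2019-12-01"}, "AE", "Events", epochs))
```

The second call returns a null EPOCH because the event started before the subject's participation in the study.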

4.1.4 Order of the Variables

The order of variables in the Define-XML document must reflect the order of variables in the dataset. The order of variables in the CDISC domain models has been chosen to facilitate the review of the models and application of the models. Variables for the three general observation classes must be ordered with Identifiers first, followed by the Topic, Qualifier, and Timing variables. Within each role, variables must be ordered as shown in SDTM Tables 2.2.1, 2.2.2, 2.2.3, 2.2.3.1, 2.2.4, and 2.2.5.

4.1.5 SDTM Core Designations

Three categories are specified in the "Core" column in the domain models:

  • A Required variable is any variable that is basic to the identification of a data record [i.e., essential key variables and a topic variable] or is necessary to make the record meaningful. Required variables must always be included in the dataset and cannot be null for any record.
  • An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Expected variables may contain some null values, but in most cases will not contain null values for every record. When the study does not include the data item for an expected variable, however, a null column must still be included in the dataset, and a comment must be included in the Define-XML document to state that the study does not include the data item.
  • A Permissible variable should be used in an SDTM dataset wherever appropriate. Although domain specification tables list only some of the Identifier, Timing, and general observation class variables listed in the SDTM, all are permissible unless specifically restricted in this implementation guide [see Section 2.7, SDTM Variables Not Allowed in SDTMIG] or by specific domain assumptions.
    • Domain assumptions that say a Permissible variable is "generally not used" do not prohibit use of the variable.
    • If a study includes a data item that would be represented in a Permissible variable, then that variable must be included in the SDTM dataset, even if null. Indicate no data were available for that variable in the Define-XML document.
    • If a study did not include a data item that would be represented in a Permissible variable, then that variable should not be included in the SDTM dataset and should not be declared in the Define-XML document.
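
The Required and Expected rules lend themselves to a simple automated check. The Python sketch below is illustrative only; the function name, the record format, and the abbreviations "Req", "Exp", and "Perm" for the three designations are assumptions of the example.

```python
def check_core(records, columns, core):
    """Report violations of the Required and Expected rules.

    `core` maps a variable name to "Req", "Exp", or "Perm" (assumed labels).
    `columns` lists the variables present in the dataset.
    `records` is a list of dictionaries, one per record.
    """
    problems = []
    for var, designation in core.items():
        if designation in ("Req", "Exp") and var not in columns:
            # Required and Expected variables must always be present as columns.
            problems.append(f"{var}: {designation} variable missing from dataset")
        elif designation == "Req":
            # Required variables cannot be null for any record.
            if any(r.get(var) in (None, "") for r in records):
                problems.append(f"{var}: Req variable has null values")
    return problems


core = {"USUBJID": "Req", "VSTESTCD": "Req", "VSORRES": "Exp", "VSLOC": "Perm"}
columns = ["USUBJID", "VSTESTCD"]
records = [{"USUBJID": "S1", "VSTESTCD": "SYSBP"},
           {"USUBJID": "S1", "VSTESTCD": ""}]
for problem in check_core(records, columns, core):
    print(problem)
```

Permissible variables are not checked, because whether they belong in the dataset depends on whether the study included the data item, which cannot be determined from the dataset alone.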

4.1.6 Additional Guidance on Dataset Naming

SDTM datasets are normally named to be consistent with the domain code; for example, the Demographics dataset [DM] is named dm.xpt. [See the SDTM Domain Abbreviation codelist, C66734, in CDISC Controlled Terminology [//www.cancer.gov/research/resources/terminology/cdisc] for standard domain codes]. Exceptions to this rule are described in Section 4.1.7, Splitting Domains, for general-observation-class datasets and in Section 8, Representing Relationships and Data, for the RELREC and SUPP-- datasets.

In some cases, sponsors may need to define new custom domains and may be concerned that CDISC domain codes defined in the future will conflict with those they choose to use. To eliminate any risk of a sponsor using a name that CDISC later determines to have a different meaning, domain codes beginning with the letters X, Y, or Z have been reserved for the creation of custom domains. Any letter or number may be used in the second position. Note that the use of codes beginning with X, Y, or Z is optional; it is not required for custom domains.

4.1.7 Splitting Domains

Sponsors may choose to split a domain of topically related information into physically separate datasets.

  • A domain based on a general observation class may be split according to values in --CAT. When a domain is split on --CAT, --CAT must not be null.
  • The Findings About [FA] domain [Section 6.4.4, Findings About] may alternatively be split based on the domain of the value in --OBJ. For example, FACM would store Findings About CM records. See Section 6.4.2, Naming Findings About Domains, for more details.

The following rules must be adhered to when splitting a domain into separate datasets to ensure they can be appended back into one domain dataset:

  1. The value of DOMAIN must be consistent across the separate datasets as it would have been if they had not been split [e.g., QS, FA].
  2. All variables that require a domain prefix [e.g., --TESTCD, --LOC] must use the value of DOMAIN as the prefix value [e.g., QS, FA].
  3. --SEQ must be unique within USUBJID for all records across all the split datasets. If there are 1000 records for a USUBJID across the separate datasets, all 1000 records need unique values for --SEQ.
  4. When relationship datasets [e.g., SUPPxx, FAxx, CO, RELREC] relate back to split parent domains, IDVAR would generally be --SEQ. When IDVAR is a value other than --SEQ [e.g., --GRPID, --REFID, --SPID], care should be used to ensure that the parent records across the split datasets have unique values for the variable specified in IDVAR, so that related children records do not accidentally join back to incorrect parent records.
  5. Permissible variables included in one split dataset need not be included in all split datasets.
  6. For domains with two-letter domain codes [i.e., other than SUPP and RELREC], split dataset names can be up to four characters in length. For example, if splitting by --CAT, then dataset names would be the domain name plus up to two additional characters [e.g., QS36 for SF-36]. If splitting Findings About by parent domain, then the dataset name would be the domain code, "FA", plus the two-character domain code for parent domain code [e.g., "FACM"]. The four-character dataset-name limitation allows the use of a Supplemental Qualifier dataset associated with the split dataset.
  7. Supplemental Qualifier datasets for split domains would also be split. The nomenclature would include the additional one-to-two characters used to identify the split dataset [e.g., SUPPQS36, SUPPFACM]. The value of RDOMAIN in the SUPP-- datasets would be the two-character domain code [e.g., QS, FA].
  8. In RELREC, if a dataset-level relationship is defined for a split Findings About domain, then RDOMAIN may contain the four-character dataset name, rather than the domain name "FA", as shown in the following example:

    relrec.xpt

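    | Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | 1 | ABC | CM | | CMSPID | | ONE | 1 |
    | 2 | ABC | FACM | | FASPID | | MANY | 1 |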

  9. See the SDTM Implementation Guide for Associated Persons for the naming of split associated persons datasets.
  10. See the SDTM Define-XML specification for details regarding metadata representation when a domain is split into different datasets. Additional examples can be referenced in the Metadata Submission Guidelines [MSG] for SDTMIG.

Note that submission of split SDTM domains may be subject to additional dataset splitting conventions as defined by regulators via technical specifications and/or as negotiated with regulatory reviewers.
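Rules 1 and 3 above can be verified mechanically when split datasets are appended back into one domain dataset. The following is a minimal Python sketch under stated assumptions: records are held as plain dictionaries, and the function name `append_split_datasets` is hypothetical. It does not cover the remaining rules.

```python
from collections import Counter


def append_split_datasets(split_datasets, domain):
    """Concatenate split datasets and check rules 1 and 3 above.

    split_datasets: dict of dataset name (e.g. "QSCG") -> list of record dicts
    domain:         two-character domain code (e.g. "QS")
    """
    seq_var = domain + "SEQ"
    combined = []
    for name, records in split_datasets.items():
        for rec in records:
            # Rule 1: DOMAIN holds the domain code, not the split dataset name.
            if rec["DOMAIN"] != domain:
                raise ValueError(f"{name}: DOMAIN is {rec['DOMAIN']!r}, expected {domain!r}")
            combined.append(rec)
    # Rule 3: --SEQ must be unique within USUBJID across all split datasets.
    counts = Counter((r["USUBJID"], r[seq_var]) for r in combined)
    dupes = sorted(key for key, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate {seq_var} within USUBJID: {dupes}")
    return combined


qscg = [{"DOMAIN": "QS", "USUBJID": "CDISC01.100008", "QSSEQ": 1}]
qscs = [{"DOMAIN": "QS", "USUBJID": "CDISC01.100008", "QSSEQ": 3}]
qs = append_split_datasets({"QSCG": qscg, "QSCS": qscs}, "QS")
```

If a second split dataset reused QSSEQ = 1 for the same subject, the function would raise an error instead of returning a combined dataset.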

4.1.7.1 Example of Splitting Questionnaires

This example shows the QS domain data split into three datasets: Clinical Global Impression [QSCG], Cornell Scale for Depression in Dementia [QSCS] and Mini Mental State Examination [QSMM]. Each dataset represents a subset of the QS domain data and has only one value of QSCAT.

QS Domains

Dataset for Clinical Global Impressions

qscg.xpt

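| Row | STUDYID | DOMAIN | USUBJID | QSSEQ | QSSPID | QSTESTCD | QSTEST | QSCAT | QSORRES | QSSTRESC | QSSTRESN | QSBLFL | VISITNUM | VISIT | VISITDY | QSDTC | QSDY |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CDISC01 | QS | CDISC01.100008 | 1 | CGI-CGI-I | CGIGLOB | Global Improvement | Clinical Global Impressions | No change | 4 | 4 | | 3 | WEEK 2 | 15 | 2003-05-13 | 15 |
| 2 | CDISC01 | QS | CDISC01.100008 | 2 | CGI-CGI-I | CGIGLOB | Global Improvement | Clinical Global Impressions | Much Improved | 2 | 2 | | 10 | WEEK 24 | 169 | 2003-10-13 | 168 |
| 3 | CDISC01 | QS | CDISC01.100014 | 1 | CGI-CGI-I | CGIGLOB | Global Improvement | Clinical Global Impressions | Minimally Improved | 3 | 3 | | 3 | WEEK 2 | 15 | 2003-10-31 | 17 |
| 4 | CDISC01 | QS | CDISC01.100014 | 2 | CGI-CGI-I | CGIGLOB | Global Improvement | Clinical Global Impressions | Minimally Improved | 3 | 3 | | 10 | WEEK 24 | 169 | 2004-03-30 | 168 |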

Dataset for Cornell Scale for Depression in Dementia

qscs.xpt

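| Row | STUDYID | DOMAIN | USUBJID | QSSEQ | QSSPID | QSTESTCD | QSTEST | QSCAT | QSORRES | QSSTRESC | QSSTRESN | QSBLFL | VISITNUM | VISIT | VISITDY | QSDTC | QSDY |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CDISC01 | QS | CDISC01.100008 | 3 | CSDD-01 | CSDD01 | Anxiety | Cornell Scale for Depression in Dementia | Severe | 2 | 2 | | 1 | SCREEN | -13 | 2003-04-15 | -14 |
| 2 | CDISC01 | QS | CDISC01.100008 | 23 | CSDD-01 | CSDD01 | Anxiety | Cornell Scale for Depression in Dementia | Severe | 2 | 2 | Y | 2 | BASELINE | 1 | 2003-04-29 | 1 |
| 3 | CDISC01 | QS | CDISC01.100014 | 3 | CSDD-01 | CSDD01 | Anxiety | Cornell Scale for Depression in Dementia | Severe | 2 | 2 | | 1 | SCREEN | -13 | 2003-10-06 | -9 |
| 4 | CDISC01 | QS | CDISC01.100014 | 28 | CSDD-06 | CSDD06 | Retardation | Cornell Scale for Depression in Dementia | Mild | 1 | 1 | Y | 2 | BASELINE | 1 | 2003-10-15 | 1 |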

Dataset for Mini Mental State Examination

qsmm.xpt

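| Row | STUDYID | DOMAIN | USUBJID | QSSEQ | QSSPID | QSTESTCD | QSTEST | QSCAT | QSORRES | QSSTRESC | QSSTRESN | QSBLFL | VISITNUM | VISIT | VISITDY | QSDTC | QSDY |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CDISC01 | QS | CDISC01.100008 | 81 | MMSE-A.1 | MMSEA1 | Orientation Time Score | Mini Mental State Examination | 4 | 4 | 4 | | 1 | SCREEN | -13 | 2003-04-15 | -14 |
| 2 | CDISC01 | QS | CDISC01.100008 | 88 | MMSE-A.1 | MMSEA1 | Orientation Time Score | Mini Mental State Examination | 3 | 3 | 3 | Y | 2 | BASELINE | 1 | 2003-04-29 | 1 |
| 3 | CDISC01 | QS | CDISC01.100014 | 81 | MMSE-A.1 | MMSEA1 | Orientation Time score | Mini Mental State Examination | 2 | 2 | 2 | | 1 | SCREEN | -13 | 2003-10-06 | -9 |
| 4 | CDISC01 | QS | CDISC01.100014 | 88 | MMSE-A.1 | MMSEA1 | Orientation Time score | Mini Mental State Examination | 2 | 2 | 2 | Y | 2 | BASELINE | 1 | 2003-10-15 | 1 |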

SUPPQS Domains

Supplemental Qualifiers for QSCG

suppqscg.xpt

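| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CDISC01 | QS | CDISC01.100008 | QSCAT | Clinical Global Impressions | QSLANG | Questionnaire Language | GERMAN | CRF | |
| 2 | CDISC01 | QS | CDISC01.100014 | QSCAT | Clinical Global Impressions | QSLANG | Questionnaire Language | FRENCH | CRF | |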

Supplemental Qualifiers for QSCS

suppqscs.xpt

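| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CDISC01 | QS | CDISC01.100008 | QSCAT | Cornell Scale for Depression in Dementia | QSLANG | Questionnaire Language | GERMAN | CRF | |
| 2 | CDISC01 | QS | CDISC01.100014 | QSCAT | Cornell Scale for Depression in Dementia | QSLANG | Questionnaire Language | FRENCH | CRF | |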

Supplemental Qualifiers for QSMM

suppqsmm.xpt

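| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CDISC01 | QS | CDISC01.100008 | QSCAT | Mini Mental State Examination | QSLANG | Questionnaire Language | GERMAN | CRF | |
| 2 | CDISC01 | QS | CDISC01.100014 | QSCAT | Mini Mental State Examination | QSLANG | Questionnaire Language | FRENCH | CRF | |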
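The naming pattern used in this example (QSCG with SUPPQSCG, where RDOMAIN remains QS) follows rules 6 and 7 of Section 4.1.7. As an illustration only, the pattern can be written as a small Python helper; the function name `split_dataset_names` is hypothetical and the sketch handles two-letter domain codes only.

```python
def split_dataset_names(domain, suffix):
    """Return (dataset name, SUPP-- dataset name, RDOMAIN) for a split dataset.

    domain: two-character domain code, e.g. "QS" or "FA"
    suffix: one or two additional characters, e.g. "CG", "36", or a parent
            domain code such as "CM" when splitting Findings About
    """
    if len(domain) != 2:
        raise ValueError("this sketch handles two-character domain codes only")
    if not 1 <= len(suffix) <= 2:
        raise ValueError("suffix must be one or two characters")
    dataset = domain + suffix      # at most four characters
    supp = "SUPP" + dataset        # Supplemental Qualifier dataset is split too
    return dataset, supp, domain   # RDOMAIN stays the two-character domain code


names = split_dataset_names("QS", "CG")
```

For this example, `names` is `("QSCG", "SUPPQSCG", "QS")`, which matches the dataset names and RDOMAIN values shown in the tables above.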

4.1.8 Origin Metadata

4.1.8.1 Origin Metadata for Variables

The origin element in the Define-XML document file is used to indicate where the data originated. Its purpose is to unambiguously communicate to the reviewer the origin of the data source. For example, data could be on the CRF [and thus should be traceable to an annotated CRF], derived [and thus traceable to some derivation algorithm], or assigned by some subjective process [and thus traceable to some external evaluator]. The Define-XML specification is the definitive source of allowable origin values. Additional guidance and supporting examples can be referenced using the Metadata Submission Guidelines [MSG] for SDTMIG.

4.1.8.2 Origin Metadata for Records

Sponsors are cautioned to recognize that a derived origin means that all values for that variable were derived, and that collected on the CRF applies to all values as well. In some cases, both collected and derived values may be reported in the same field. For example, some records in a Findings dataset such as QS contain values collected from the CRF; other records may contain derived values, such as a total score. When both derived and collected values are reported in a variable, the origin is to be described using value-level metadata.

4.1.9 Assigning Natural Keys in the Metadata

Section 3.2, Using the CDISC Domain Models in Regulatory Submissions — Dataset Metadata, indicates that a sponsor should include in the metadata the variables that contribute to the natural key for a domain. In a case where a dataset includes a mix of records with different natural keys, the natural key that provides the most granularity is the one that should be provided. The following examples are illustrations of how to do this, and include a case where a Supplemental Qualifier variable is referenced because it forms part of the natural key.

Musculoskeletal System Findings [MK] domain example:

Sponsor A chooses the following natural key for the MK domain:

STUDYID, USUBJID, VISITNUM, MKTESTCD

Sponsor B collects data in such a way that the location [MKLOC and MKLAT] and method [MKMETHOD] variables need to be included in the natural key to identify a unique row. Sponsor B then defines the following natural key for the MK domain.

STUDYID, USUBJID, VISITNUM, MKTESTCD, MKLOC, MKLAT, MKMETHOD

In certain instances a Supplemental Qualifier variable [i.e., a QNAM value, see Section 8.4, Relating Non-Standard Variables Values to a Parent Domain] might also contribute to the natural key of a record, and therefore needs to be referenced as part of the natural key for a domain. The important concept here is that a domain is not limited by physical structure. A domain may comprise more than one physical dataset, for example the main domain dataset and its associated Supplemental Qualifiers dataset. Supplemental Qualifier variables should be referenced in the natural key by using a two-part name. The first part must be the word QNAM, which indicates that the contributing variable exists in a domain-specific SUPP-- dataset. The second part is the value of QNAM, which ultimately becomes a column reference [e.g., QNAM.XVAR when the SUPP-- record has a QNAM of "XVAR"] when the SUPPQUAL records are joined to the main domain dataset.

Continuing with the MK domain example above:

Sponsor B might have collected data that used different imaging methods, using imaging devices with different makes and models, and using different hand positions. The sponsor considers the make and model information and hand position to be essential data that contributes to the uniqueness of the test result, and so includes a device identifier [SPDEVID] in the data and creates a Supplemental Qualifier variable for hand position [QNAM = "MKHNDPOS"]. The natural key is then defined as follows:

STUDYID, USUBJID, SPDEVID, VISITNUM, MKTESTCD, MKLOC, MKLAT, MKMETHOD, QNAM.MKHNDPOS

Where the notation "QNAM.MKHNDPOS" means the Supplemental Qualifier whose QNAM is "MKHNDPOS".
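To illustrate how a QNAM.xxx reference becomes a column once the Supplemental Qualifier records are joined to the parent dataset, the following is a minimal Python sketch. It assumes IDVAR is MKSEQ, holds records as plain dictionaries, and uses hypothetical function names and sample values (the subject identifier and the "PALM DOWN" hand position are invented for the example).

```python
def join_supp(parent, supp, seq_var):
    """Attach SUPP-- values to parent rows as "QNAM.<name>" columns.

    Only relationships with IDVAR equal to seq_var are handled in this sketch.
    """
    rows = {(r["USUBJID"], str(r[seq_var])): dict(r) for r in parent}
    for s in supp:
        if s["IDVAR"] != seq_var:
            continue
        key = (s["USUBJID"], str(s["IDVARVAL"]))
        if key in rows:
            rows[key]["QNAM." + s["QNAM"]] = s["QVAL"]
    return list(rows.values())


def duplicate_keys(rows, key_vars):
    """Return key tuples that occur more than once for the given natural key."""
    seen, dupes = set(), []
    for r in rows:
        key = tuple(r.get(v) for v in key_vars)
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes


common = {"STUDYID": "ABC", "USUBJID": "ABC-001", "SPDEVID": "ACME3000",
          "VISITNUM": 1, "MKTESTCD": "SGBESCR",
          "MKLOC": "METACARPOPHALANGEAL JOINT 1", "MKLAT": "LEFT",
          "MKMETHOD": "X-RAY"}
parent = [dict(common, MKSEQ=1), dict(common, MKSEQ=2)]
supp = [
    {"USUBJID": "ABC-001", "IDVAR": "MKSEQ", "IDVARVAL": "1",
     "QNAM": "MKHNDPOS", "QVAL": "PALM UP"},
    {"USUBJID": "ABC-001", "IDVAR": "MKSEQ", "IDVARVAL": "2",
     "QNAM": "MKHNDPOS", "QVAL": "PALM DOWN"},
]
joined = join_supp(parent, supp, "MKSEQ")

base_key = ["STUDYID", "USUBJID", "SPDEVID", "VISITNUM",
            "MKTESTCD", "MKLOC", "MKLAT", "MKMETHOD"]
full_key = base_key + ["QNAM.MKHNDPOS"]
```

With these sample rows, `duplicate_keys(joined, base_key)` reports one duplicate, while `duplicate_keys(joined, full_key)` reports none, which is why the hand position has to be part of the natural key.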

This approach becomes very useful in a Findings domain when --TESTCD values are "generic" and rely on other variables to completely describe the test. The use of generic test codes helps to create distinct lists of manageable controlled terminology for --TESTCD. In studies where multiple repetitive tests or measurements are being made, for example in a rheumatoid arthritis study where repetitive measurements of bone erosion in the hands and wrists might be made using both X-ray and MRI equipment, the generic MKTEST "Sharp/Genant Bone Erosion Score" would be used in combination with other variables to fully identify the result.

Taking just the phalanges, a sponsor might want to express the following in a test in order to make it unique:

  • Left or Right hand
  • Phalangeal joint position [which finger, which joint]
  • Rotation of the hand
  • Method of measurement [X-ray or MRI]
  • Machine make and model

When CDISC controlled terminology for a test is not available, and a sponsor creates --TEST and --TESTCD values, trying to encapsulate all information about a test within a unique value of a --TESTCD is not a recommended approach for the following reasons:

  • It results in the creation of a potentially large number of test codes.
  • The eight-character values of --TESTCD become less intuitively meaningful.
  • Multiple test codes are essentially representing the same test or measurement simply to accommodate attributes of a test within the --TESTCD value itself [e.g., to represent a body location at which a measurement was taken].

As a result, the preferred approach would be to use a generic [or simple] test code that requires associated qualifier variables to fully express the test detail. This approach was used in creating the CDISC controlled terminology that would be used in the above example:

The MKTESTCD value "SGBESCR" is a "generic" test code, and additional information about the test is provided by separate qualifier variables. The variables that completely specify a test may include domain variables and supplemental qualifier variables. Expressing the natural key becomes very important in this situation in order to communicate the variables that contribute to the uniqueness of a test.

The following variables would be used to fully describe the test. The natural key for this domain includes both parent dataset variables and a supplemental qualifier variable that contribute to the natural key of each row and to describe the uniqueness of the test.

| SPDEVID | MKTESTCD | MKTEST | MKLOC | MKLAT | MKMETHOD | QNAM.MKHNDPOS |
| --- | --- | --- | --- | --- | --- | --- |
| ACME3000 | SGBESCR | Sharp/Genant Bone Erosion Score | METACARPOPHALANGEAL JOINT 1 | LEFT | X-RAY | PALM UP |

4.2 General Variable Assumptions

4.2.1 Variable-Naming Conventions

SDTM variables are named according to a set of conventions, using fragment names [listed in Appendix D, CDISC Variable-Naming Fragments]. Variables with names ending in "CD" are "short" versions of associated variables that do not include the "CD" suffix [e.g., --TESTCD is the short version of --TEST].

Values of --TESTCD must be limited to eight characters and cannot start with a number, nor can they contain characters other than letters, numbers, or underscores. This is to avoid possible incompatibility with SAS v5 Transport files. This limitation will be in effect until the use of other formats [such as Dataset-XML] becomes acceptable to regulatory authorities.

QNAM serves the same purpose as --TESTCD within supplemental qualifier datasets, and so values of QNAM are subject to the same restrictions as values of --TESTCD.
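The restrictions on --TESTCD and QNAM values can be expressed as a single pattern. The following is a minimal Python sketch; the function name is hypothetical, and accepting lower-case letters and a leading underscore is an assumption based on the wording above (the text only excludes a leading number and characters other than letters, numbers, and underscores).

```python
import re

# One leading letter or underscore, then up to seven letters, digits, or
# underscores: eight characters at most, never starting with a number.
_TESTCD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_testcd(value):
    """Return True if value satisfies the --TESTCD / QNAM restrictions."""
    return _TESTCD_PATTERN.fullmatch(value) is not None


checks = {code: is_valid_testcd(code)
          for code in ["SGBESCR", "MKHNDPOS", "1STTEST", "TOOLONGCD", "AB-C"]}
```

In `checks`, "SGBESCR" and "MKHNDPOS" pass, while "1STTEST" (leading digit), "TOOLONGCD" (nine characters), and "AB-C" (hyphen) fail.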

Values of other "CD" variables are not subject to the same restrictions as --TESTCD.

  • ETCD [the companion to ELEMENT] and TSPARMCD [the companion to TSPARM] are limited to eight characters and do not have the character restrictions that apply to --TESTCD. These values should be short for ease of use in programming, but it is not expected that they will need to serve as variable names.
  • ARMCD is limited to 20 characters and does not have the character restrictions that apply to --TESTCD. The maximum length of ARMCD is longer than for other "short" variables to accommodate the kind of values that are likely to be needed for crossover trials. For example, if ARMCD values for a seven-period crossover were constructed using two-character abbreviations for each treatment and separating hyphens, the length of ARMCD values would be 20. This same rule applies to the ACTARMCD variable also.
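The length arithmetic in the ARMCD example can be checked directly: seven two-character abbreviations plus six separating hyphens give 7 × 2 + 6 = 20 characters. A minimal Python sketch, using hypothetical treatment abbreviations:

```python
ARMCD_MAX_LENGTH = 20


def build_armcd(treatment_codes):
    """Join per-period treatment abbreviations with hyphens into an ARMCD value."""
    armcd = "-".join(treatment_codes)
    if len(armcd) > ARMCD_MAX_LENGTH:
        raise ValueError(f"ARMCD {armcd!r} exceeds {ARMCD_MAX_LENGTH} characters")
    return armcd


# Hypothetical two-character abbreviations for a seven-period crossover.
armcd = build_armcd(["AA", "BB", "CC", "DD", "EE", "FF", "GG"])
```

The resulting value, "AA-BB-CC-DD-EE-FF-GG", is exactly 20 characters long; an eighth period would exceed the limit.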

Variable descriptive names [labels], up to 40 characters, should be provided as data variable labels for all variables, including Supplemental Qualifier variables.

Use of variable names [other than domain prefixes], formats, decodes, terminology, and data types for the same type of data [even for custom domains and Supplemental Qualifiers] should be consistent within and across studies within a submission.

4.2.2 Two-Character Domain Identifier

In order to minimize the risk of difficulty when merging/joining domains for reporting purposes, the two-character Domain Identifier is used as a prefix in most variable names.

Variables in domain specification tables [see Section 5, Models for Special Purpose Domains, Section 6, Domain Models Based on the General Observation Classes, Section 7, Trial Design Model Datasets, Section 8, Representing Relationships and Data, and Section 9, Study References] already specify the complete variable names. When adding variables from the SDTM to standard domains or creating custom domains based on the General Observation Classes, sponsors must replace the -- [two hyphens] prefix in the SDTM tables of General Observation Class, Timing, and Identifier variables with the two-character Domain Identifier [DOMAIN] value for that domain/dataset. The two-character domain code is limited to A-Z for the first character, and A-Z, 0-9 for the second character. No other characters are allowed. This is for compatibility with SAS version 5 Transport files and with file naming for the Electronic Common Technical Document [eCTD].
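As an illustration of the prefix replacement and the domain-code character rule described above, the following is a minimal Python sketch. The function name `qualify_variable` and the custom domain code "XA" are hypothetical.

```python
import re

# First character A-Z; second character A-Z or 0-9; no other characters.
_DOMAIN_PATTERN = re.compile(r"[A-Z][A-Z0-9]")


def qualify_variable(sdtm_name, domain):
    """Replace a leading "--" in an SDTM variable name with the domain code."""
    if _DOMAIN_PATTERN.fullmatch(domain) is None:
        raise ValueError(f"invalid two-character domain code: {domain!r}")
    if sdtm_name.startswith("--"):
        return domain + sdtm_name[2:]
    # Names without the "--" prefix (e.g. USUBJID, VISITNUM) are left unchanged.
    return sdtm_name


names = [qualify_variable(v, "XA") for v in ["--TESTCD", "--SEQ", "USUBJID"]]
```

For the hypothetical custom domain XA, `names` is `["XATESTCD", "XASEQ", "USUBJID"]`.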

The following variables are exceptions to the philosophy that all variable names are prefixed with the Domain Identifier:

  • Required Identifiers [STUDYID, DOMAIN, USUBJID]
  • Commonly used grouping and merge Keys [e.g., VISIT, VISITNUM, VISITDY]
  • All Demographics domain [DM] variables other than DMDTC and DMDY
  • All variables in RELREC and SUPPQUAL, and some variables in Comments and Trial Design datasets.

Required Identifiers are not prefixed because they are usually used as keys when merging/joining observations. The --SEQ and the optional Identifiers --GRPID and --REFID are prefixed because they may be used as keys when relating observations across domains.

4.2.3 Use of "Subject" and USUBJID

"Subject" is used to generically refer to both "patients" and "healthy volunteers" in order to be consistent with the recommendation in FDA guidance. The term "Subject" should be used consistently in all labels and Define-XML document comments. To identify a subject uniquely across all studies for all applications or submissions involving the product, a unique identifier [USUBJID] should be assigned and included in all datasets.

The unique subject identifier [USUBJID] is required in all datasets containing subject-level data. USUBJID values must be unique for each trial participant [subject] across all trials in the submission. This means that no two [or more] subjects, across all trials in the submission, may have the same USUBJID. Additionally, the same person who participates in multiple clinical trials [when this is known] must be assigned the same USUBJID value in all trials.

The dm.xpt sample rows below illustrate a single subject who participates in two studies, first in ACME01 and later in ACME14. Note that this is only one example of possible USUBJID values. CDISC does not recommend any specific format for USUBJID values, only that they be unique for all subjects in the submission and across multiple submissions for the same compound. Many sponsors concatenate values for the Study, Site, and Subject into USUBJID, but this is not a requirement. Any format is acceptable for USUBJID, as long as the values are unique across all subjects per FDA guidance.

Study ACME01 dm.xpt

dm.xpt

| Row | STUDYID | DOMAIN | USUBJID | SUBJID | SITEID | INVNAM |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ACME01 | DM | ACME01-05-001 | 001 | 05 | John Doe |

Study ACME14 dm.xpt

dm.xpt

| Row | STUDYID | DOMAIN | USUBJID | SUBJID | SITEID | INVNAM |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ACME14 | DM | ACME01-05-001 | 017 | 14 | Mary Smith |
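The two USUBJID requirements (no two subjects share a value, and one person keeps the same value in every trial) amount to a one-to-one mapping between people and USUBJID values. The following is a minimal Python sketch of such a check. It assumes the sponsor can supply a lookup from (STUDYID, SUBJID) to an internal person identifier; that lookup, the identifier "P-0001", and the function name are hypothetical and are not part of the SDTM.

```python
def usubjid_conflicts(dm_rows, person_key):
    """Check that people and USUBJID values map one-to-one across studies.

    dm_rows:    DM records from all studies, as dicts
    person_key: dict of (STUDYID, SUBJID) -> sponsor-internal person id
                (hypothetical lookup, assumed to be known to the sponsor)
    """
    person_by_usubjid, usubjid_by_person = {}, {}
    problems = []
    for row in dm_rows:
        person = person_key[(row["STUDYID"], row["SUBJID"])]
        usubjid = row["USUBJID"]
        # The same USUBJID must never refer to two different people.
        if person_by_usubjid.setdefault(usubjid, person) != person:
            problems.append(f"{usubjid} is assigned to more than one person")
        # The same person must keep one USUBJID in all trials.
        if usubjid_by_person.setdefault(person, usubjid) != usubjid:
            problems.append(f"person {person} has more than one USUBJID")
    return problems


dm_rows = [
    {"STUDYID": "ACME01", "USUBJID": "ACME01-05-001", "SUBJID": "001"},
    {"STUDYID": "ACME14", "USUBJID": "ACME01-05-001", "SUBJID": "017"},
]
person_key = {("ACME01", "001"): "P-0001", ("ACME14", "017"): "P-0001"}
problems = usubjid_conflicts(dm_rows, person_key)
```

For the two sample rows above, `problems` is empty, because the same person carries the same USUBJID in both studies.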

4.2.4 Text Case in Submitted Data

It is recommended that text data be submitted in upper case text. Exceptions may include long text data [such as comment text] and values of --TEST in Findings datasets [which may be more readable in title case if used as labels in transposed views]. Values from CDISC controlled terminology or external code systems [e.g., MedDRA] or response values for QRS instruments specified by the instrument documentation should be in the case specified by those sources, which may be mixed case. The case used in the text data must match the case used in the Controlled Terminology provided in the Define-XML document.

4.2.5 Convention for Missing Values

Missing values for individual data items should be represented by nulls. Conventions for representing observations not done, using the SDTM --STAT and --REASND variables, are addressed in Section 4.5.1.2, Tests Not Done, and in the individual domain models.

4.2.6 Grouping Variables and Categorization

Grouping variables are Identifier and Qualifier variables, such as --CAT [Category] and --SCAT [Subcategory], that group records in the SDTM domains/datasets and can be assigned by sponsors to categorize topic-variable values. For example, a lab record with LBTEST = "SODIUM" might have LBCAT = "CHEMISTRY" and LBSCAT = "ELECTROLYTES". Values for --CAT and --SCAT should not be redundant with the domain name or dictionary classification provided by --DECOD and --BODSYS.

1. Hierarchy of Grouping Variables

STUDYID
  DOMAIN
    --CAT
      --SCAT
        USUBJID
          --GRPID
          --LNKID
          --LNKGRP

2. How Grouping Variables Group Data

A. For the subject

  1. All records with the same USUBJID value are a group of records that describe that subject.

B. Across subjects [records with different USUBJID values]

  1. All records with the same STUDYID value are a group of records that describe that study.
  2. All records with the same DOMAIN value are a group of records that describe that domain.
  3. --CAT [Category] and --SCAT [Sub-category] values further subset groups within the domain. Generally, --CAT/--SCAT values have meaning within a particular domain. However, it is possible to use the same values for --CAT/--SCAT in related domains [e.g., MH and AE]. When values are used across domains, the meanings should be the same. Examples of where --CAT/--SCAT may have meaning across domains/datasets:
    1. Cases where different domains in the same general observation class contain similar conceptual information. Adverse Events [AE], Medical History [MH], and Clinical Events [CE], for example, are conceptually the same data, the only differences being when the event started relative to the study start and whether the event is considered a regulatory reportable adverse event in the study. Neurotoxicities collected in Oncology trials both as separate Medical History CRFs [MH domain] and Adverse Event CRFs [AE domain] could both identify/collect "Paresthesia of the left Arm". In both domains, the --CAT variable could have the value of NEUROTOXICITY.
    2. Cases where multiple datasets are necessary to capture data about the same topic. As an example, perhaps the existence and start and stop date of "Paresthesia of the left Arm" is reported as an Adverse Event [AE domain], but the severity of the event is captured at multiple visits and recorded as Findings About [FA dataset]. In both cases the --CAT variable could have a value of NEUROTOXICITY.
    3. Cases where multiple domains are necessary to capture data that was collected together and have an implicit relationship, perhaps identified in the Related Records [RELREC] special purpose dataset.

      Stress Test data collection, for example, may capture the following:

      1. Information about the occurrence, start, stop, and duration of the test [in the PR domain]
      2. Vital Signs recorded during the stress test [VS domain]
      3. Treatments [e.g., oxygen] administered during the stress test [in an Interventions domain]

      In such cases, the data collected during the stress tests recorded in three separate domains may all have --CAT/--SCAT values [STRESS TEST] that identify that this data was collected during the stress test.

C. Within subjects [records with the same USUBJID values]

  1. --GRPID values further group [subset] records within USUBJID. All records in the same domain with the same --GRPID value are a group of records within USUBJID. Unlike --CAT and --SCAT, --GRPID values are not intended to have any meaning across subjects and are usually assigned during or after data collection.

D. Although --SPID and --REFID are Identifier variables, they may sometimes be used as grouping variables and may also have meaning across domains.

E. --LNKID and --LNKGRP express values that are used to link records in separate domains. As such, these variables are often used in IDVAR in a RELREC relationship when there is a dataset-to-dataset relationship.

  1. --LNKID is a grouping identifier used to identify a record in one domain that is related to records in another domain, often forming a one-to-many relationship.
  2. --LNKGRP is a grouping identifier used to identify a group of records in one domain that is related to a record in another domain, often forming a many-to-one relationship.

3. Differences between Grouping Variables

The primary distinctions between --CAT/--SCAT and --GRPID are:

  • --CAT/--SCAT are known [identified] about the data before it is collected.
  • --CAT/--SCAT values group data across subjects.
  • --CAT/--SCAT may have some controlled terminology.
  • --GRPID is usually assigned during or after data collection at the discretion of the sponsor.
  • --GRPID groups data only within a subject.
  • --GRPID values are sponsor-defined, and will not be subject to controlled terminology.

Therefore, data that would be the same across subjects is usually more appropriate in --CAT/--SCAT, and data that would vary across subjects is usually more appropriate in --GRPID. For example, a Concomitant Medication administered as part of a known combination therapy for all subjects ["Mayo Clinic Regimen", for example] would more appropriately use --CAT/--SCAT to identify the medication as part of that regimen. Groups of medications recorded on an SAE form as treatments for the SAE would more appropriately use --GRPID because the groupings are likely to differ across subjects.

In domains based on the Findings general observation class, the --RESCAT variable can be used to categorize results after the fact. --CAT and --SCAT by contrast, are generally pre-defined by the sponsor or used by the investigator at the point of collection, not after assessing the value of Findings results.

4.2.7 Submitting Free Text from the CRF

Sponsors often collect free text data on a CRF to supplement a standard field. This often occurs as part of a list of choices accompanied by "Other, specify." The manner in which these data are submitted will vary based on their role. The handling of verbatim text values for the --OBJ variable is discussed in Section 6.4.3, Variables Unique to Findings About.

4.2.7.1 "Specify" Values for Non-Result Qualifier Variables

When free-text information is collected to supplement a standard non-result Qualifier field, the free-text value should be placed in the SUPP-- dataset described in Section 8.4, Relating Non-Standard Variables Values to a Parent Domain. When applicable, controlled terminology should be used for SUPP-- field names [QNAM] and their associated labels [QLABEL] [see Section 8.4, Relating Non-Standard Variables Values to a Parent Domain and Appendix C2, Supplemental Qualifiers Name Codes].

For example, when a description of "Other Medically Important Serious Adverse Event" category is collected on a CRF, the free text description should be stored in the SUPPAE dataset.

  • AESMIE = "Y"
  • SUPPAE QNAM = "AESOSP", QLABEL = "Other Medically Important SAE", QVAL = "HIGH RISK FOR ADDITIONAL THROMBOSIS"

Another example is a CRF that collects reason for dose adjustment with additional free-text description:

| Reason for Dose Adjustment [EXADJ] | Describe |
| --- | --- |
| Adverse Event | |
| Insufficient Response | |
| Non-medical Reason | |

The free text description should be stored in the SUPPEX dataset.

  • EXADJ = "NONMEDICAL REASON"
  • SUPPEX QNAM = "EXADJDSC", QLABEL = "Reason For Dose Adjustment Description", QVAL = "PATIENT MISUNDERSTOOD INSTRUCTIONS"

    Note that QNAM references the "parent" variable name with the addition of "DSC". Likewise, the label is a modification of the parent variable label.

When the CRF includes a list of values for a qualifier field that includes "Other" and the "Other" is supplemented with a "Specify" free text field, then the manner in which the free text "Specify" value is submitted will vary based on the sponsor's coding practice and analysis requirements.

For example, consider a CRF that collects the indication for an analgesic concomitant medication [CMINDC] using a list of pre-specified values and an "Other, specify" field:

Indication for analgesic

  • Post-operative pain
  • Headache
  • Menstrual pain
  • Myalgia
  • Toothache
  • Other, specify: ________________

An investigator has selected "OTHER" and specified "Broken arm". Several options are available for submission of this data:

1] If the sponsor wishes to maintain controlled terminology for the CMINDC field and limit the terminology to the five pre-specified choices, then the free text is placed in SUPPCM.

| CMINDC |
| --- |
| OTHER |

| QNAM | QLABEL | QVAL |
| --- | --- | --- |
| CMINDOTH | Other Indication | BROKEN ARM |

2] If the sponsor wishes to maintain controlled terminology for CMINDC but will expand the terminology based on values seen in the specify field, then the value of CMINDC will reflect the sponsor's coding decision and SUPPCM could be used to store the verbatim text.

| CMINDC |
| --- |
| FRACTURE |

| QNAM | QLABEL | QVAL |
| --- | --- | --- |
| CMINDOTH | Other Indication | BROKEN ARM |

Note that the sponsor might choose a different value for CMINDC [e.g., "BONE FRACTURE"] depending on the sponsor's coding practice and analysis requirements.

3] If the sponsor does not require that controlled terminology be maintained and wishes for all responses to be stored in a single variable, then CMINDC will be used and SUPPCM is not required.

| CMINDC |
| --- |
| BROKEN ARM |
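The three options above differ only in what goes into CMINDC and whether a SUPPCM record is created. The following is a minimal Python sketch of that decision, for illustration only: the function name and the `option` and `coded_value` parameters are hypothetical, and converting the verbatim text to upper case is an assumption consistent with the values shown in the example.

```python
def represent_other_indication(specify_text, option, coded_value=None):
    """Return (CMINDC value, SUPPCM record or None) for an "Other, specify" entry.

    option 1: keep CMINDC limited to the pre-specified list
    option 2: expand the terminology using the sponsor's coded value
    option 3: no controlled terminology; store the response in CMINDC
    """
    verbatim = specify_text.upper()
    supp = {"QNAM": "CMINDOTH", "QLABEL": "Other Indication", "QVAL": verbatim}
    if option == 1:
        return "OTHER", supp
    if option == 2:
        if coded_value is None:
            raise ValueError("option 2 requires the sponsor's coded value")
        return coded_value, supp
    if option == 3:
        return verbatim, None
    raise ValueError("option must be 1, 2, or 3")


option1 = represent_other_indication("Broken arm", 1)
option2 = represent_other_indication("Broken arm", 2, coded_value="FRACTURE")
option3 = represent_other_indication("Broken arm", 3)
```

Under these assumptions, option 1 yields CMINDC = "OTHER", option 2 yields CMINDC = "FRACTURE", both with the verbatim text in SUPPCM, and option 3 yields CMINDC = "BROKEN ARM" with no SUPPCM record.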

4.2.7.2 "Specify" Values for Result Qualifier Variables

When the CRF includes a list of values for a result field that includes "Other" and the "Other" is supplemented with a "Specify" free text field, then the manner in which the free text "Specify" value is submitted will vary based on the sponsor's coding practice and analysis requirements.

For example, consider a CRF where the sponsor requests the subject's eye color:

Eye Color

  • Brown
  • Black
  • Blue
  • Green
  • Other, specify: ________________

An investigator has selected "OTHER" and specified "BLUEISH GRAY". As in the above discussion for non-result Qualifier values, the sponsor has several options for submission:

1] If the sponsor wishes to maintain controlled terminology in the standard result field and limit the terminology to the five pre-specified choices, then the free text is placed in --ORRES and the controlled terminology in --STRESC.

| SCTEST | SCORRES | SCSTRESC |
| --- | --- | --- |
| Eye Color | BLUEISH GRAY | OTHER |

2] If the sponsor wishes to maintain controlled terminology in the standard result field, but will expand the terminology based on values seen in the specify field, then the free text is placed in --ORRES and the value of --STRESC will reflect the sponsor's coding decision.

| SCTEST | SCORRES | SCSTRESC |
| --- | --- | --- |
| Eye Color | BLUEISH GRAY | GRAY |

3] If the sponsor does not require that controlled terminology be maintained, the verbatim value will be copied to --STRESC.

| SCTEST | SCORRES | SCSTRESC |
| --- | --- | --- |
| Eye Color | BLUEISH GRAY | BLUEISH GRAY |
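For result fields, the verbatim text always goes into --ORRES and only the value of --STRESC changes between the three options. A minimal Python sketch of that mapping, with a hypothetical function name and hypothetical `option` and `coded_value` parameters:

```python
def eye_color_record(specify_text, option, coded_value=None):
    """Build the SC result values for an "Other, specify" eye color entry."""
    verbatim = specify_text.upper()
    if option == 1:
        stresc = "OTHER"        # terminology limited to the pre-specified list
    elif option == 2:
        if coded_value is None:
            raise ValueError("option 2 requires the sponsor's coded value")
        stresc = coded_value    # terminology expanded by the sponsor
    elif option == 3:
        stresc = verbatim       # no controlled terminology maintained
    else:
        raise ValueError("option must be 1, 2, or 3")
    return {"SCTEST": "Eye Color", "SCORRES": verbatim, "SCSTRESC": stresc}


records = [
    eye_color_record("Blueish gray", 1),
    eye_color_record("Blueish gray", 2, coded_value="GRAY"),
    eye_color_record("Blueish gray", 3),
]
```

The three resulting records carry SCSTRESC values of "OTHER", "GRAY", and "BLUEISH GRAY", matching the three tables above.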

4.2.7.3 "Specify" Values for Topic Variables

Interventions: If a list of specific treatments is provided along with "Other, Specify", --TRT should be populated with the name of the treatment found in the specified text. If the sponsor wishes to distinguish between the pre-specified list of treatments and those recorded under "Other, Specify," the --PRESP variable could be used. For example:

Indicate which of the following concomitant medications
was used to treat the subject's headaches:

  • Acetaminophen
  • Aspirin
  • Ibuprofen
  • Naproxen
  • Other, specify: ________________

If ibuprofen and diclofenac were reported, the CM dataset would include the following:

| CMTRT | CMPRESP |
| --- | --- |
| IBUPROFEN | Y |
| DICLOFENAC | |
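A minimal sketch of how the records above might be derived from the CRF responses. The names `PRESPECIFIED` and `cm_records` are hypothetical, and the upper-casing of treatment names is an assumption based on the example values, not a requirement stated here.

```python
# Treatments printed on the CRF (from the example form above).
PRESPECIFIED = {"ACETAMINOPHEN", "ASPIRIN", "IBUPROFEN", "NAPROXEN"}

def cm_records(reported):
    """Build CMTRT/CMPRESP pairs for the treatments reported on the CRF."""
    records = []
    for name in reported:
        trt = name.strip().upper()
        records.append({
            "CMTRT": trt,
            # "Y" only for pre-specified treatments; null (empty) for
            # treatments written in under "Other, specify".
            "CMPRESP": "Y" if trt in PRESPECIFIED else "",
        })
    return records
```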

Events: "Other, Specify" for Events may be handled similarly to Interventions. --TERM should be populated with the description of the event found in the specified text and --PRESP could be used to distinguish between prespecified and free text responses.

Findings: "Other, Specify" for tests may be handled similarly to Interventions. --TESTCD and --TEST should be populated with the code and description of the test found in the specified text. If specific tests are not prespecified on the CRF and the investigator has the option of writing in tests, then the name of the test would have to be coded to ensure that all --TESTCD and --TEST values are consistent with the test controlled terminology.

For example, a lab CRF collected values for Hemoglobin, Hematocrit and "Other, specify". The value the investigator wrote for "Other, specify" was "Prothrombin time" with an associated result and units. The sponsor would submit the controlled terminology for this test, i.e., LBTESTCD would be "PT" and LBTEST would be "Prothrombin Time", rather than the verbatim term, "Prothrombin time" supplied by the investigator.

4.2.8 Multiple Values for a Variable

4.2.8.1 Multiple Values for an Intervention or Event Topic Variable

If multiple values are reported for a topic variable [i.e., --TRT in an Interventions general-observation-class dataset or --TERM in an Events general-observation-class dataset], it is expected that the sponsor will split the values into multiple records or otherwise resolve the multiplicity per the sponsor's standard data management procedures. For example, if an adverse event term of "Headache and Nausea" or a concomitant medication of "Tylenol and Benadryl" is reported, sponsors will often split the original report into separate records and/or query the site for clarification. By the time of submission, the datasets should be in conformance with the record structures described in the SDTMIG. Note that the Disposition dataset [DS] is an exception to the general rule of splitting multiple topic values into separate records. For DS, one record for each disposition or protocol milestone is permitted according to the domain structure. For cases of multiple reasons for discontinuation see Section 6.2.3, Disposition, Assumption 5 for additional information.

4.2.8.2 Multiple Values for a Findings Result Variable

If multiple result values [--ORRES] are reported for a test in a Findings class dataset, multiple records should be submitted for that --TESTCD.

For example,

  • EGTESTCD = "SPRTARRY", EGTEST = "Supraventricular Tachyarrhythmias", EGORRES = "ATRIAL FIBRILLATION"
  • EGTESTCD = "SPRTARRY", EGTEST = "Supraventricular Tachyarrhythmias", EGORRES = "ATRIAL FLUTTER"

When a finding can have multiple results, the key structure for the findings dataset must be adequate to distinguish between the multiple results. See Section 4.1.9 Assigning Natural Keys in the Metadata.

4.2.8.3 Multiple Values for a Non-Result Qualifier Variable

The SDTM permits one value for each Qualifier variable per record. If multiple values exist [e.g., due to a "Check all that apply" instruction on a CRF], then the value for the Qualifier variable should be "MULTIPLE" and SUPP-- should be used to store the individual responses. It is recommended that the SUPP-- QNAM value reference the corresponding standard domain variable with an appended number or letter. In some cases, the standard variable name will be shortened to meet the 8-character variable name requirement, or it may be clearer to append a meaningful character string as shown in the second AE example below, where the first three characters of the drug name are appended. Likewise the QLABEL value should be similar to the standard label. The values stored in QVAL should be consistent with the controlled terminology associated with the standard variable. See Section 8.4, Relating Non-Standard Variables Values to a Parent Domain for additional guidance on maintaining appropriately unique QNAM values.

The following example includes selected variables from the ae.xpt and suppae.xpt datasets for a rash whose locations are the face, neck, and chest.

AE Dataset

| AETERM | AELOC |
| --- | --- |
| RASH | MULTIPLE |

SUPPAE Dataset

| QNAM | QLABEL | QVAL |
| --- | --- | --- |
| AELOC1 | Location of the Reaction 1 | FACE |
| AELOC2 | Location of the Reaction 2 | NECK |
| AELOC3 | Location of the Reaction 3 | CHEST |
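The numbering convention in this example can be sketched as follows. The helper `split_multiple` is a hypothetical name, and the sketch assumes the simple "append a number" scheme; it does not attempt the shortening or meaningful-suffix variants described above.

```python
def split_multiple(variable, label, values):
    """Return (parent_value, supp_records) for a non-result Qualifier.

    variable: standard variable name used as the QNAM stem, e.g. "AELOC"
    label:    standard label used as the QLABEL stem
    values:   the individual responses collected for one record
    """
    if len(values) <= 1:
        # Zero or one response fits in the standard variable itself.
        return (values[0] if values else ""), []
    supp = []
    for i, value in enumerate(values, start=1):
        qnam = f"{variable}{i}"
        if len(qnam) > 8:
            # The stem would have to be shortened by the sponsor.
            raise ValueError(f"QNAM {qnam!r} exceeds 8 characters")
        supp.append({"QNAM": qnam, "QLABEL": f"{label} {i}", "QVAL": value})
    return "MULTIPLE", supp
```

For example, `split_multiple("AELOC", "Location of the Reaction", ["FACE", "NECK", "CHEST"])` yields the parent value "MULTIPLE" together with the three SUPPAE rows shown in the table.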

In some cases, values for QNAM and QLABEL more specific than those above may be needed.

For example, a sponsor might conduct a study with two study drugs [e.g., open-label study of Abcicin + Xyzamin], and may require the investigator assess causality and describe action taken for each drug for the rash:

AE Dataset

| AETERM | AEREL | AEACN |
| --- | --- | --- |
| RASH | MULTIPLE | MULTIPLE |

SUPPAE Dataset

| QNAM | QLABEL | QVAL |
| --- | --- | --- |
| AERELABC | Causality of Abcicin | POSSIBLY RELATED |
| AERELXYZ | Causality of Xyzamin | UNLIKELY RELATED |
| AEACNABC | Action Taken with Abcicin | DOSE REDUCED |
| AEACNXYZ | Action Taken with Xyzamin | DOSE NOT CHANGED |

In each of the above examples, the use of SUPPAE should be documented in the Define-XML document and the annotated CRF. The controlled terminology used should be documented as part of value-level metadata.

If the sponsor has clearly documented that one response is of primary interest [e.g., in the CRF, protocol, or analysis plan], the standard domain variable may be populated with the primary response and SUPP-- may be used to store the secondary response[s].

For example, if Abcicin is designated as the primary study drug in the example above:

AE Dataset

| AETERM | AEREL | AEACN |
| --- | --- | --- |
| RASH | POSSIBLY RELATED | DOSE REDUCED |

SUPPAE Dataset

| QNAM | QLABEL | QVAL |
| --- | --- | --- |
| AERELX | Causality of Xyzamin | UNLIKELY RELATED |
| AEACNX | Action Taken with Xyzamin | DOSE NOT CHANGED |

Note that in the latter case, the label for standard variables AEREL and AEACN will have no indication that they pertain to Abcicin. This association must be clearly documented in the metadata and annotated CRF.

4.2.9 Variable Lengths

Very large transport files have become an issue for FDA to process. One of the main contributors to the large file sizes has been sponsors using the maximum length of 200 for character variables. To help rectify this situation:

  • The maximum SAS Version 5 character variable length of 200 characters should not be used unless necessary.
  • Sponsors should consider the nature of the data and apply reasonable, appropriate lengths to variables. For example:
    • The length of flags will always be 1.
    • --TESTCD and IDVAR will never be more than 8, so the length can always be set to 8.
    • The length for variables that use controlled terminology can be set to the length of the longest term.
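One way to apply "reasonable, appropriate lengths" is to size each character variable to its longest observed value. The sketch below assumes the dataset is held as a list of dictionaries and that values are plain ASCII (so character count equals stored length); `suggested_lengths` and its `fixed` parameter are hypothetical names.

```python
def suggested_lengths(rows, fixed=None):
    """Map each character variable to the length of its longest value.

    rows:  list of dicts, one per record (variable name -> value)
    fixed: optional overrides for variables whose length is set by rule,
           e.g. {"LBTESTCD": 8}
    """
    lengths = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, str):
                # Use 1 as the floor so all-blank variables still get a length.
                lengths[name] = max(lengths.get(name, 1), len(value))
    for name, length in (fixed or {}).items():
        if name in lengths:
            lengths[name] = length
    return lengths
```

Numeric variables are skipped because the guidance above concerns character variable lengths only.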

4.3 Coding and Controlled Terminology Assumptions

Examples provided in the column "CDISC Notes" are only examples and not intended to imply controlled terminology. Check current controlled terminology at this link: //www.cancer.gov/cancertopics/cancerlibrary/terminologyresources/cdisc.

4.3.1 Types of Controlled Terminology

As of SDTMIG v3.3, controlled terminology is represented in one of the following ways:

  • A single asterisk, "*", when CDISC controlled terminology is not available at the current time, but the SDS Team expects that sponsors may have their own controlled terminology and/or the CDISC Controlled Terminology Team may develop controlled terminology in the future.
  • The single applicable value for the variable DOMAIN, e.g., "PR".
  • The name of a CDISC codelist, represented as a hyperlink in parentheses, e.g., "(NY)".
  • A short reference to an external terminology, such as "MedDRA" or "ISO 3166 Alpha-3".

In addition, the "Controlled Terms, Codelist or Format" column has been used to indicate variables that use an ISO 8601 format.

4.3.2 Controlled Terminology Text Case

Terms from controlled terminology should be in the case that appears in the source codelist or code system [e.g., CDISC codelist or external code system such as MedDRA]. See Section 4.2.4, Text Case in Submitted Data.

4.3.3 Controlled Terminology Values

The controlled terminology or a reference to the controlled terminology should be included in the Define-XML document file wherever applicable. All values in the permissible value set for the study should be included, whether they are represented in the submitted data or not. Note that a null value should not be included in the permissible value set. A null value is implied for any list of controlled terms unless the variable is "Required" [see Section 4.1.5, SDTM Core Designations].

When a domain or dataset specification includes a codelist for a variable, not every value in that codelist may have been part of planned data collection; only values that were part of planned data collection should be included in the Define-XML document. For example, --PRESP variables are associated with the NY codelist, but only the value "Y" is allowed in --PRESP variables. Future versions of the Define-XML Specification are expected to include information on representing subsets of controlled terminology.

4.3.4 Use of Controlled Terminology and Arbitrary Number Codes

Controlled terminology or human-readable text should be used instead of arbitrary number codes in order to reduce ambiguity for submission reviewers. For example, CMDECOD would contain human-readable dictionary text rather than a numeric code. Numeric code values may be submitted as Supplemental Qualifiers if necessary.

4.3.5 Storing Controlled Terminology for Synonym Qualifier Variables

  • For events such as AEs and Medical History, populate --DECOD with the dictionary's preferred term and populate --BODSYS with the preferred body system name. If a dictionary is multi-axial, the value in --BODSYS should represent the system organ class [SOC] used for the sponsor's analysis and summary tables, which may not necessarily be the primary SOC. Populate --SOC with the dictionary-derived primary SOC. In cases where the primary SOC was used for analysis, --BODSYS and --SOC are the same.
  • If the MedDRA dictionary was used to code events, the intermediate levels in the MedDRA hierarchy should also be represented in the dataset. A pair of variables has been defined for each of the levels of the hierarchy other than SOC and PT: one to represent the text description and the other to represent the code value associated with it. For example, --LLT should be used to represent the Lowest Level Term text description and --LLTCD should be used to represent the Lowest Level Term code value.
  • For concomitant medications, populate CMDECOD with the drug's generic name and populate CMCLAS with the drug class used for the sponsor's analysis and summary tables. If coding to multiple classes, follow Section 4.2.8.1, Multiple Values for an Intervention or Event Topic Variable, or omit CMCLAS.
  • For concomitant medications, supplemental qualifiers may be used to represent additional coding dictionary information, e.g., a drug's ATC codes from the WHO Drug dictionary [see Section 8.4, Relating Non-Standard Variables Values to a Parent Domain for more information].

The sponsor is expected to provide the dictionary name and version used to map the terms by utilizing the Define-XML external codelist attributes.

4.3.6 Storing Topic Variables for General Domain Models

The topic variable for the Interventions and Events general-observation-class models is often stored as verbatim text. For an Events domain, the topic variable is --TERM. For an Interventions domain, the topic variable is --TRT. For a Findings domain, the topic variable, --TESTCD, should use Controlled Terminology [e.g., "SYSBP" for Systolic Blood Pressure]. If CDISC standard controlled terminology exists, it should be used; otherwise, sponsors should define their own controlled list of terms. If the verbatim topic variable in an Interventions or Event domain is modified to facilitate coding, the modified text is stored in --MODIFY. In most cases [other than PE], the dictionary-coded text is derived into --DECOD. Since the PEORRES variable is modified instead of the topic variable for PE, the dictionary-derived text would be placed in PESTRESC. The variables used in each of the defined domains are:

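| Domain | Original Verbatim | Modified Verbatim | Standardized Value |
| --- | --- | --- | --- |
| AE | AETERM | AEMODIFY | AEDECOD |
| DS | DSTERM | | DSDECOD |
| CM | CMTRT | CMMODIFY | CMDECOD |
| MH | MHTERM | MHMODIFY | MHDECOD |
| PE | PEORRES | PEMODIFY | PESTRESC |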

4.3.7 Use of "Yes" and "No" Values

Variables where the response is "Yes" or "No" ["Y" or "N"] should normally be populated for both "Y" and "N" responses. This eliminates confusion regarding whether a blank response indicates "N" or is a missing value. However, some variables are collected or derived in a manner that allows only one response, such as when a single check box indicates "Yes". In situations such as these, where it is unambiguous to populate only the response of interest, it is permissible to populate only one value ["Y" or "N"] and leave the alternate value blank. An example of when it would be acceptable to use only a value of "Y" would be for Last Observation Before Exposure Flag [--LOBXFL] variables, where "N" is not necessary to indicate that a value is not the last observation before exposure.

Note: Permissible values for variables with controlled terms of "Y" or "N" may be extended to include "U" or "NA" if it is the sponsor's practice to explicitly collect or derive values indicating "Unknown" or "Not Applicable" for that variable.

4.4 Actual and Relative Time Assumptions

Timing variables [SDTM Table 2.2.5] are an essential component of all SDTM subject-level domain datasets. In general, all domains based on the three general observation classes should have at least one Timing variable. In the Events or Interventions general observation class, this could be the start date of the event or intervention. In the Findings observation class, where data are usually collected at multiple visits, at least one Timing variable must be used.

The SDTMIG requires dates and times of day to be stored according to the international standard ISO 8601 [//www.iso.org]. ISO 8601 provides a text-based representation of dates and/or times, intervals of time, and durations of time.

4.4.1 Formats for Date/Time Variables

An SDTM DTC variable may include data that is represented in ISO 8601 format as a complete date/time, a partial date/time, or an incomplete date/time.

The SDTMIG template uses ISO 8601 for calendar dates and times of day, which are expressed as follows:

  • YYYY-MM-DDThh:mm:ss[.n+]?[[[+|-]hh:mm]|Z]?

where:

  • [YYYY] = four-digit year
  • [MM] = two-digit representation of the month [01-12, 01=January, etc.]
  • [DD] = two-digit day of the month [01 through 31]
  • [T] = [time designator] indicates time information follows
  • [hh] = two digits of hour [00 through 23] [am/pm is NOT allowed]
  • [mm] = two digits of minute [00 through 59]
  • [ss] = two digits of second [00 through 59]
    The last two components, indicated in the format pattern with a question mark, are optional:
  • [[.n+]?] = optional fractions of seconds
  • [[[[+|-]hh:mm]|Z]?] = optional time zone

Other characters defined for use within the ISO 8601 standard are:

  • [-] [hyphen]: to separate the time Elements "year" from "month" and "month" from "day" and to represent missing date components.
  • [:] [colon]: to separate the time Elements "hour" from "minute" and "minute" from "second"
  • [/] [solidus]: to separate components in the representation of date/time intervals
  • [P] [duration designator]: precedes the components that represent the duration

    Spaces are not allowed in any ISO 8601 representations

Key aspects of the ISO 8601 standard are as follows:

  • ISO 8601 represents dates as a text string using the notation YYYY-MM-DD.
  • ISO 8601 represents times as a text string using the notation hh:mm:ss[.n+]?[[[+|-]hh:mm]|Z]?.
  • The SDTM and SDTMIG require use of the ISO 8601 Extended format, which requires hyphen delimiters for date components and colon delimiters for time components. The ISO 8601 basic format, which does not require delimiters, should not be used in SDTM datasets.
  • When a date is stored with a time in the same variable [as a date/time], the date is written in front of the time and the time is preceded with "T" using the notation YYYY-MM-DDThh:mm:ss [e.g. 2001-12-26T00:00:01].

Implementation of the ISO 8601 standard means that date/time variables are character/text data types. The SDTM fragment employed for date/time character variables is DTC.
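As an illustration, a complete date/time value in the extended format can be checked with a regular expression. This is a sketch only: `is_complete_dtc` is a hypothetical helper, it accepts complete values only (partial values are not covered), and it does not verify that the day is valid for the given month or that the time zone offset is within range.

```python
import re

COMPLETE_DTC = re.compile(
    r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"  # YYYY-MM-DD
    r"T([01]\d|2[0-3]):[0-5]\d:[0-5]\d"             # Thh:mm:ss
    r"(\.\d+)?"                                     # optional fractions of seconds
    r"(Z|[+-]\d{2}:\d{2})?"                         # optional time zone
)

def is_complete_dtc(value):
    """True if value is a complete extended-format ISO 8601 date/time."""
    return COMPLETE_DTC.fullmatch(value) is not None
```

Because the expression requires hyphen and colon delimiters and the "T" designator, basic-format values and values containing spaces are rejected.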

4.4.2 Date/Time Precision

The concept of representing date/time precision is handled through use of the ISO 8601 standard. According to ISO 8601, precision [also referred to by ISO 8601 as "completeness" or "representations with reduced accuracy"] can be inferred from the presence or absence of components in the date and/or time values. Missing components are represented by right truncation or a hyphen [for intermediate components that are missing]. If the date and time values are completely missing, the SDTM date field should be null. Every component except year is represented as two digits. Years are represented as four digits; for all other components, one-digit numbers are always padded with a leading zero.

The table below provides examples of ISO 8601 representations of complete and truncated date/time values using ISO 8601 "appropriate right truncations" of incomplete date/time representations. Note that if no time component is represented, the [T] time designator [in addition to the missing time] must be omitted in ISO 8601 representation.


| Row | Date and Time as Originally Recorded | Precision | ISO 8601 Date/Time |
| --- | --- | --- | --- |
| 1 | December 15, 2003 13:14:17.123 | Date/time, including fractional seconds | 2003-12-15T13:14:17.123 |
| 2 | December 15, 2003 13:14:17 | Date/time to the nearest second | 2003-12-15T13:14:17 |
| 3 | December 15, 2003 13:14 | Unknown seconds | 2003-12-15T13:14 |
| 4 | December 15, 2003 13 | Unknown minutes and seconds | 2003-12-15T13 |
| 5 | December 15, 2003 | Unknown time | 2003-12-15 |
| 6 | December, 2003 | Unknown day and time | 2003-12 |
| 7 | 2003 | Unknown month, day, and time | 2003 |
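Because precision is inferred from which components are present, it can be recovered from a right-truncated value programmatically. The sketch below is illustrative; `precision` is a hypothetical function, it handles right truncation only (no hyphen placeholders), and it does not accept time zone suffixes.

```python
import re

# Each nested optional group corresponds to one further level of precision.
_TRUNCATED = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2})"
    r"(?::(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?P<fraction>\.\d+)?"
    r")?)?)?)?)?"
)

def precision(dtc):
    """Name the finest component present in a right-truncated date/time."""
    match = _TRUNCATED.fullmatch(dtc)
    if match is None:
        raise ValueError(f"not a right-truncated ISO 8601 value: {dtc!r}")
    for name in ("fraction", "second", "minute", "hour", "day", "month", "year"):
        if match.group(name):
            return name
```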

This date and date/time model also provides for imprecise or estimated dates, such as those commonly seen in Medical History. To represent these intervals while applying the ISO 8601 standard, it is recommended that the sponsor concatenate the date/time values [using the most complete representation of the date/time known] that describe the beginning and the end of the interval of uncertainty and separate them with a solidus as shown in the table below:


| Row | Interval of Uncertainty | ISO 8601 Date/Time |
| --- | --- | --- |
| 1 | Between 10:00 and 10:30 on the morning of December 15, 2003 | 2003-12-15T10:00/2003-12-15T10:30 |
| 2 | Between the first of this year [2003] until "now" [February 15, 2003] | 2003-01-01/2003-02-15 |
| 3 | Between the first and the tenth of December, 2003 | 2003-12-01/2003-12-10 |
| 4 | Sometime in the first half of 2003 | 2003-01-01/2003-06-30 |

Other uncertainty intervals may be represented by the omission of components of the date when these components are unknown or missing. As mentioned above, ISO 8601 represents missing intermediate components through the use of a hyphen where the missing component would normally be represented. This may be used in addition to "appropriate right truncations" for incomplete date/time representations. When components are omitted, the expected delimiters must still be kept in place and only a single hyphen is to be used to indicate an omitted component. Examples of this method of omitted component representation are shown in the table below:

| Row | Date and Time as Originally Recorded | Level of Uncertainty | ISO 8601 Date/Time |
| --- | --- | --- | --- |
| 1 | December 15, 2003 13:15:17 | Date/time to the nearest second | 2003-12-15T13:15:17 |
| 2 | December 15, 2003 ??:15 | Unknown hour with known minutes | 2003-12-15T-:15 |
| 3 | December 15, 2003 13:??:17 | Unknown minutes with known date, hours, and seconds | 2003-12-15T13:-:17 |
| 4 | The 15th of some month in 2003, time not collected | Unknown month and time with known year and day | 2003---15 |
| 5 | December 15, but can't remember the year, time not collected | Unknown year with known month and day | --12-15 |
| 6 | 7:15 of some unknown date | Unknown date with known hour and minute | -----T07:15 |
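The combination of right truncation and single-hyphen placeholders can be sketched as a small formatting routine. `build_dtc` is a hypothetical helper; it takes integer components, treats `None` as unknown, and does not handle fractional seconds or time zones.

```python
def build_dtc(year=None, month=None, day=None, hour=None, minute=None, second=None):
    """Format known components as an ISO 8601 value; None means unknown."""
    date_parts = [
        f"{year:04d}" if year is not None else None,
        f"{month:02d}" if month is not None else None,
        f"{day:02d}" if day is not None else None,
    ]
    time_parts = [f"{v:02d}" if v is not None else None for v in (hour, minute, second)]

    def join(parts, sep):
        # Drop trailing unknowns (right truncation), then mark each
        # remaining unknown component with a single hyphen.
        while parts and parts[-1] is None:
            parts = parts[:-1]
        return sep.join(p if p is not None else "-" for p in parts)

    time_text = join(time_parts, ":")
    if time_text:
        # A time is present, so every date position must be kept in place.
        date_text = "-".join(p if p is not None else "-" for p in date_parts)
        return f"{date_text}T{time_text}"
    # No time: omit the "T" designator and truncate the date on the right.
    return join(date_parts, "-")
```

A call with no known components returns an empty string, corresponding to a null date field.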

Note that Row 6 above, where a time is reported with no date information, represents a very unusual situation. Since most data is collected as part of a visit, when only a time appears on a CRF, it is expected that the date of the visit would usually be used as the date of collection.

Using a character-based data type to implement the ISO 8601 date/time standard will ensure that the date/time information will be machine and human readable without the need for further manipulation, and will be platform and software independent.

4.4.3 Intervals of Time and Use of Duration for --DUR Variables

4.4.3.1 Intervals of Time and Use of Duration

As defined by ISO 8601, an interval of time is the part of a time axis, limited by two time "instants" such as the times represented in SDTM by the variables --STDTC and --ENDTC. These variables represent the two instants that bound an interval of time, while the duration is the quantity of time that is equal to the difference between these time points.

ISO 8601 allows an interval to be represented in multiple ways. One representation, shown below, uses two dates in the format:

YYYY-MM-DDThh:mm:ss/YYYY-MM-DDThh:mm:ss

While the above would represent the interval [by providing the start date/time and end date/time to bound the interval of time], it does not provide the value of the duration [the quantity of time].

Duration is frequently used during a review; however, the duration timing variable [--DUR] should generally be used in a domain if it was collected in lieu of a start date/time [--STDTC] and end date/time [--ENDTC]. If both --STDTC and --ENDTC are collected, durations can be calculated by the difference in these two values, and need not be in the submission dataset.
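When both --STDTC and --ENDTC are collected, a reviewer could derive the duration along the following lines. This is a sketch under several assumptions: both values are complete to at least the day, neither carries a time zone, the duration is the plain difference between the two instants, and the result is expressed in days and smaller units only. `duration_between` is a hypothetical name.

```python
from datetime import datetime

def duration_between(stdtc, endtc):
    """Return the ISO 8601 duration from --STDTC to --ENDTC."""
    delta = datetime.fromisoformat(endtc) - datetime.fromisoformat(stdtc)
    if delta.total_seconds() < 0:
        raise ValueError("--ENDTC precedes --STDTC")
    hours, rem = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{delta.days}D" if delta.days else ""
    time_part = "".join(
        f"{n}{unit}" for n, unit in ((hours, "H"), (minutes, "M"), (seconds, "S")) if n
    )
    if not date_part and not time_part:
        return "PT0S"  # identical instants
    # The "T" designator appears only when a time component follows it.
    return "P" + date_part + ("T" + time_part if time_part else "")
```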

Both duration and duration units can be provided in the single --DUR variable, in accordance with the ISO 8601 standard. The values provided in --DUR should follow one of the following ISO 8601 duration formats:

PnYnMnDTnHnMnS

- or -

PnW

where:

  • [P] [duration designator]: precedes the alphanumeric text string that represents the duration. Note that the use of the character P is based on the historical use of the term "period" for duration.
  • [n] represents a positive number or zero
  • [W] is used as week designator, preceding a data Element that represents the number of calendar weeks within the calendar year [e.g., P6W represents 6 weeks of calendar time].

The letter "P" must precede other values in the ISO 8601 representation of duration. The "n" preceding each letter represents the number of Years, Months, Days, Hours, Minutes, Seconds, or the number of Weeks. As with the date/time format, "T" is used to separate the date components from time components.

Note that weeks cannot be mixed with any other date/time components such as days or months in duration expressions.

As is the case with the date/time representation in --DTC, --STDTC, or --ENDTC, only the components of duration that are known or collected need to be represented. Also, as is the case with the date/time representation, if no time component is represented, the [T] time designator [in addition to the missing time] must be omitted in ISO 8601 representation.

ISO 8601 also allows that the "lowest-order components" of duration being represented may be represented in decimal format. This may be useful if data are collected in formats such as "one and one-half years", "two and one-half weeks", "one-half a week" or "one quarter of an hour" and the sponsor wishes to represent this "precision" [or lack of precision] in ISO 8601 representation. Remember that this is ONLY allowed in the lowest-order [right-most] component in any duration representation.

The table below provides some examples of ISO-8601-compliant representations of durations:

| Duration as originally recorded | ISO 8601 Duration |
| --- | --- |
| 2 Years | P2Y |
| 10 weeks | P10W |
| 3 Months 14 days | P3M14D |
| 3 Days | P3D |
| 6 Months 17 Days 3 Hours | P6M17DT3H |
| 14 Days 7 Hours 57 Minutes | P14DT7H57M |
| 42 Minutes 18 Seconds | PT42M18S |
| One-half hour | PT0.5H |
| 5 Days 12¼ Hours | P5DT12.25H |
| 4 ½ Weeks | P4.5W |

Note that a leading zero is required with decimal values less than one.
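The duration rules in this section (the leading "P", weeks not mixed with other components, "T" omitted when no time follows, decimals only in the right-most component, and a leading zero before the decimal point) can be checked with a short validator. `is_valid_dur` is a hypothetical helper offered as a sketch, not a complete ISO 8601 implementation.

```python
import re

_NUM = r"\d+(?:\.\d+)?"  # digits are required before any decimal point
_DURATION = re.compile(
    rf"P(?:(?P<W>{_NUM})W"
    rf"|(?:(?P<Y>{_NUM})Y)?(?:(?P<Mo>{_NUM})M)?(?:(?P<D>{_NUM})D)?"
    rf"(?:T(?:(?P<H>{_NUM})H)?(?:(?P<Mi>{_NUM})M)?(?:(?P<S>{_NUM})S)?)?)"
)

def is_valid_dur(value):
    """True if value follows the PnYnMnDTnHnMnS or PnW duration format."""
    match = _DURATION.fullmatch(value)
    if match is None:
        return False
    # Groups are returned left to right, so the last one present is the
    # lowest-order component.
    present = [v for v in match.groups() if v is not None]
    if not present:
        return False  # a bare "P" or "PT" carries no duration
    if value.endswith("T"):
        return False  # "T" must be omitted when no time component follows
    return all("." not in v for v in present[:-1])
```

All ten durations in the table above pass this check, while values such as "PT.5H", "P2W3D", and "P1.5Y6M" are rejected.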

4.4.3.2 Interval with Uncertainty

When an interval of time is an amount of time [duration] following an event whose start date/time is recorded [with some level of precision, i.e. when one knows the start date/time and the duration following the start date/time], the correct ISO 8601 usage to represent this interval is as follows:

YYYY-MM-DDThh:mm:ss/PnYnMnDTnHnMnS

where the start date/time is represented before the solidus [/], the "Pn…" following the solidus represents a "duration", and the entire representation is known as an "interval". Note that this is the recommended representation of elapsed time, given a start date/time and the duration elapsed.

When an interval of time is an amount of time [duration] measured prior to an event whose start date/time is recorded [with some level of precision, i.e., where one knows the end date/time and the duration preceding that end date/time], the syntax is:

PnYnMnDTnHnMnS/YYYY-MM-DDThh:mm:ss

where the duration, "Pn…", is represented before the solidus [/], the end date/time is represented following the solidus, and the entire representation is known as an "interval".
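Both interval forms are simple concatenations around the solidus, as the following sketch shows. The function names are hypothetical, and the checks are deliberately light: they confirm only the "P" designator and the absence of spaces.

```python
def _check(dtc, duration):
    if not duration.startswith("P"):
        raise ValueError("duration must begin with the designator 'P'")
    if " " in dtc or " " in duration:
        raise ValueError("spaces are not allowed in ISO 8601 representations")

def interval_after_start(start_dtc, duration):
    """Known start date/time followed by the elapsed duration."""
    _check(start_dtc, duration)
    return f"{start_dtc}/{duration}"

def interval_before_end(duration, end_dtc):
    """Duration measured prior to a known end date/time."""
    _check(end_dtc, duration)
    return f"{duration}/{end_dtc}"
```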

4.4.4 Use of the "Study Day" Variables

The permissible Study Day variables [--DY, --STDY, and --ENDY] describe the relative day of the observation starting with the reference date as Day 1. They are determined by comparing the date portion of the respective date/time variables [--DTC, --STDTC, and --ENDTC] to the date portion of the Subject Reference Start Date [RFSTDTC from the Demographics domain].

The Subject Reference Start Date [RFSTDTC] is designated as Study Day 1. The Study Day value is incremented by 1 for each date following RFSTDTC. Dates prior to RFSTDTC are decreased by 1, with the date preceding RFSTDTC designated as Study Day -1 [there is no Study Day 0]. This algorithm for determining Study Day is consistent with how people typically describe sequential days relative to a fixed reference point, but creates problems if used for mathematical calculations because it does not allow for a Day 0. As such, Study Day is not suited for use in subsequent numerical computations, such as calculating duration. The raw date values should be used rather than Study Day in those calculations.

All Study Day values are integers. Thus, to calculate Study Day:

--DY = [date portion of --DTC] - [date portion of RFSTDTC] + 1 if --DTC is on or after RFSTDTC 
--DY = [date portion of --DTC] - [date portion of RFSTDTC] if --DTC precedes RFSTDTC

This algorithm should be used across all domains.
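A minimal sketch of the calculation, assuming both values are ISO 8601 strings whose first ten characters form a complete date. The function name `study_day` and the choice to return `None` for partial or missing dates are assumptions made for illustration.

```python
from datetime import date

def study_day(dtc, rfstdtc):
    """Return --DY for dtc relative to rfstdtc, or None if not computable."""
    try:
        # Only the date portion (YYYY-MM-DD) takes part in the comparison.
        observed = date.fromisoformat(dtc[:10])
        reference = date.fromisoformat(rfstdtc[:10])
    except ValueError:
        return None  # partial or missing date
    diff = (observed - reference).days
    # Add 1 on or after the reference date so that there is no Day 0.
    return diff + 1 if diff >= 0 else diff
```

With RFSTDTC of 2003-12-15, an observation on 2003-12-15 falls on Study Day 1 and an observation on 2003-12-14 falls on Study Day -1.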

4.4.5 Clinical Encounters and Visits

All domains based on the three general observation classes should have at least one timing variable. For domains in the Events or Interventions observation classes, and for domains in the Findings observation class, for which data are collected only once during the study, the most appropriate timing variable may be a date [e.g., --DTC, --STDTC] or some other timing variable. For studies that are designed with a prospectively defined schedule of visit-based activities, domains for data that are to be collected more than once per subject [e.g., Labs, ECG, Vital Signs] are expected to include VISITNUM as a timing variable.

Clinical encounters are described by the CDISC Visit variables. For planned visits, values of VISIT, VISITNUM, and VISITDY must be those defined in the Trial Visits [TV] dataset [Section 7.3.1, Trial Visits]. For planned visits:

  • Values of VISITNUM are used for sorting and should, wherever possible, match the planned chronological order of visits. Occasionally, a protocol will define a planned visit whose timing is unpredictable [e.g., one planned in response to an adverse event, a threshold test value, or a disease event], and completely chronological values of VISITNUM may not be possible in such a case.
  • There should be a one-to-one relationship between values of VISIT and VISITNUM.
  • For visits that may last more than one calendar day, VISITDY should be the planned day of the start of the visit.

Sponsor practices for populating visit variables for unplanned visits may vary across sponsors.

  • VISITNUM should generally be populated, even for unplanned visits, as it is expected in many Findings domains, as described above. The easiest method of populating VISITNUM for unplanned visits is to assign the same value [e.g., 99] to all unplanned visits, but this method provides no differentiation between the unplanned visits and does not provide chronological sorting. Methods that provide a one-to-one relationship between visits and values of VISITNUM, that are consistent across domains, and that assign VISITNUM values that sort chronologically require more work and must be applied after all of a subject's unplanned visits are known.
  • VISIT may be left null or may be populated with a generic value [e.g., "Unscheduled"] for all unplanned visits, or individual values may be assigned to different unplanned visits.
  • VISITDY must not be populated for unplanned visits, since VISITDY is, by definition, the planned study day of visit, and since the actual study day of an unplanned visit belongs in a --DY variable.

The following table shows an example of how the visit identifiers might be used for lab data:

| USUBJID | VISIT | VISITNUM | VISITDY | LBDY |
| --- | --- | --- | --- | --- |
| 001 | Week 1 | 2 | 7 | 7 |
| 001 | Week 2 | 3 | 14 | 13 |
| 001 | Week 2 Unscheduled | 3.1 | | 17 |
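The decimal VISITNUM in this example (3.1) suggests one possible scheme for unplanned visits that sorts chronologically: number each unplanned visit as the preceding planned VISITNUM plus a tenth per unplanned visit. The sketch below illustrates that scheme only; it is not prescribed here, `assign_visitnum` is a hypothetical name, and the limit of nine unplanned visits per planned visit is an artifact of this particular numbering. It assumes complete visit dates, so that ISO 8601 strings sort chronologically, and it must be run after all of a subject's visits are known.

```python
from decimal import Decimal

def assign_visitnum(visits):
    """Assign VISITNUM values to one subject's visits in date order.

    visits: list of (iso_date, planned_visitnum_or_None); None marks an
            unplanned visit. Returns VISITNUM values in chronological order.
    """
    result = []
    last_planned = Decimal(0)
    offset = 0
    for _, planned in sorted(visits, key=lambda v: v[0]):
        if planned is not None:
            last_planned, offset = Decimal(str(planned)), 0
            result.append(last_planned)
        else:
            offset += 1
            if offset > 9:
                raise ValueError("more than 9 unplanned visits after one planned visit")
            # Decimal arithmetic keeps values such as 3.1 exact.
            result.append(last_planned + Decimal(offset) / 10)
    return result
```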

4.4.6 Representing Additional Study Days

The SDTM allows study days to be represented relative to the RFSTDTC reference start date variable in the DM dataset, using --DY variables, as described above in Section 4.4.4, Use of the "Study Day" Variables. The calculation of additional study days within subdivisions of time in a clinical trial may be based on one or more sponsor-defined reference dates not represented by RFSTDTC. In such cases, the sponsor may define Supplemental Qualifier variables and the Define-XML document should reflect the reference dates used to calculate such study days. If the sponsor wishes to define "day within element" or "day within epoch", the reference date/time will be an element start date/time in the Subject Elements [SE] dataset [Section 5.3, Subject Elements].

4.4.7 Use of Relative Timing Variables

--STRF and --ENRF

The variables --STRF and --ENRF represent the timing of an observation relative to the sponsor-defined Study Reference Period, when information such as "BEFORE", "PRIOR", "ONGOING", or "CONTINUING" is collected in lieu of a date and this collected information is in relation to the sponsor-defined Study Reference Period. The sponsor-defined Study Reference Period is the continuous period of time defined by the discrete starting point, RFSTDTC, and the discrete ending point, RFENDTC, for each subject in the Demographics dataset.

--STRF is used to identify the start of an observation relative to the sponsor-defined Study Reference Period.

--ENRF is used to identify the end of an observation relative to the sponsor-defined Study Reference Period.

Allowable values for --STRF are "BEFORE", "DURING", "DURING/AFTER", "AFTER", and "U" [for unknown]. Although "COINCIDENT" and "ONGOING" are in the STENRF codelist, they describe timing relative to a point in time rather than an interval of time, so are not appropriate for use with --STRF variables. It would be unusual for an event or intervention to be recorded as starting "AFTER" the Study Reference Period, but could be possible, depending on how the Study Reference Period is defined in a particular study.

Allowable values for --ENRF are "BEFORE", "DURING", "DURING/AFTER", "AFTER" and "U" [for unknown]. If --ENRF is used, then --ENRF = "AFTER" means that the event did not end before or during the Study Reference Period. Although "COINCIDENT" and "ONGOING" are in the STENRF codelist, they describe timing relative to a point in time rather than an interval of time, so are not appropriate for use with --ENRF variables.

As an example, a CRF checkbox that identifies concomitant medication use that began prior to the Study Reference Period would translate into CMSTRF = "BEFORE", if selected. Note that in this example, the information collected is with respect to the start of the concomitant medication use only, and therefore the collected data corresponds to variable CMSTRF, not CMENRF. Note also that the information collected is relative to the Study Reference Period, which meets the definition of CMSTRF.

Some sponsors may wish to derive --STRF and --ENRF for analysis or reporting purposes even when dates are collected. Sponsors are cautioned that doing so in conjunction with directly collecting or mapping data such as "BEFORE", "PRIOR", "ONGOING", etc., to --STRF and --ENRF will blur the distinction between collected and derived values within the domain. Sponsors wishing to do such derivations are instead encouraged to use analysis datasets for this derived data.

In general, sponsors are cautioned that representing information using variables --STRF and --ENRF may not be as precise as other methods, particularly because information is often collected relative to a point in time or to a period of time other than the one defined as the Study Reference Period. SDTMIG v3.1.2 attempted to address these limitations by the addition of four new relative timing variables, which are described in the following paragraph. Sponsors should use the set of variables that allows for accurate representation of the collected data. In many cases, this will mean using these new relative timing variables in place of --STRF and --ENRF.

--STRTPT, --STTPT, --ENRTPT, and --ENTPT

While the variables --STRF and --ENRF are useful in the case when relative timing assessments are made coincident with the start and end of the Study Reference Period, these may not be suitable for expressing relative timing assessments such as "Prior" or "Ongoing" that are collected at other times of the study. As a result, four new timing variables were added in v3.1.2 to express a similar concept at any point in time. The variables --STRTPT and --ENRTPT contain values similar to --STRF and --ENRF, but may be anchored with any timing description or date/time value expressed in the respective --STTPT and --ENTPT variables, and are not limited to the Study Reference Period. Unlike the variables --STRF and --ENRF, which for all domains are defined relative to one Study Reference Period, the timing variables --STRTPT, --STTPT, --ENRTPT, and --ENTPT are defined by each sponsor for each study. Allowable values for --STRTPT and --ENRTPT are as follows:

If the reference time point corresponds to the date of collection or assessment:

  • Start values: An observation can start BEFORE that time point, can start COINCIDENT with that time point, or it is unknown [U] when it started.
  • End values: An observation can end BEFORE that time point, can end COINCIDENT with that time point, can be known that it didn't end but was ONGOING, or it is unknown [U] when it ended or if it was ongoing.
  • AFTER is not a valid value in this case because it would represent an event after the date of collection.

If the reference time point is prior to the date of collection or assessment:

  • Start values: An observation can start BEFORE the reference point, can start COINCIDENT with the reference point, can start AFTER the reference point, or it may not be known [U] when it started.
  • End values: An observation can end BEFORE the reference point, can end COINCIDENT with the reference point, can end AFTER the reference point, can be known that it didn't end but was ONGOING, or it is unknown [U] when it ended or if it was ongoing.

Although "DURING" and "DURING/AFTER" are in the STENRF codelist, they describe timing relative to an interval of time rather than a point in time, so are not allowable for use with --STRTPT and --ENRTPT variables.
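The two cases above can be summarized in a short sketch that returns the allowable values for --STRTPT or --ENRTPT, depending on whether the reference time point is the date of collection or a point prior to it. The function name and its parameters are hypothetical.

```python
def allowed_rtpt_values(kind: str, ref_is_collection_date: bool) -> set:
    """Allowable values for --STRTPT (kind='start') or --ENRTPT (kind='end')."""
    if kind not in ("start", "end"):
        raise ValueError("kind must be 'start' or 'end'")
    values = {"BEFORE", "COINCIDENT", "U"}
    if kind == "end":
        # Only an end can be recorded as known not to have occurred yet.
        values.add("ONGOING")
    if not ref_is_collection_date:
        # AFTER is possible only when the reference point precedes collection.
        values.add("AFTER")
    return values


print(sorted(allowed_rtpt_values("start", ref_is_collection_date=True)))
print(sorted(allowed_rtpt_values("end", ref_is_collection_date=False)))
```

"DURING" and "DURING/AFTER" never appear in the returned sets, consistent with the statement that they are not allowable for these variables.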

Examples of --STRTPT, --STTPT, --ENRTPT, and --ENTPT

Example: Medical History

Assumptions:

  • CRF contains "Year Started" and check box for "Active"
  • "Date of Assessment" is collected

Example when "Active" is checked:

  • MHDTC = date of assessment value, e.g., "2006-11-02"
  • MHSTDTC = year of condition start, e.g., "2002"
  • MHENRTPT = "ONGOING"
  • MHENTPT = date of assessment value, e.g., "2006-11-02"

Figure 4.4.7: Example of --ENRTPT and --ENTPT for Medical History

Example: Prior and Concomitant Medications

Assumptions:

  • CRF includes collection of "Start Date" and "Stop Date", and check boxes for
    • "Prior" if start date was before the screening visit and was unknown or uncollected
    • "Continuing" if medication had not stopped as of the final study visit, so no end date was collected

Example when both "Prior" and "Continuing" are checked:

  • CMSTDTC is null
  • CMENDTC is null
  • CMSTRTPT = "BEFORE"
  • CMSTTPT is screening date, e.g., "2006-10-21"
  • CMENRTPT = "ONGOING"
  • CMENTPT is final study visit date, e.g., "2006-11-02"

Example: Adverse Events

Assumptions:

  • CRF contains "Start Date", "Stop Date"
  • Collection of "Outcome" includes check boxes for "Continuing" and "Unknown", to be used, if necessary, at the end of the subject's participation in the trial
  • No assessment date or visit information was collected

Example when "Unknown" is checked:

  • AESTDTC is start date, e.g., "2006-10-01"
  • AEENDTC is null
  • AEENRTPT = "U"
  • AEENTPT is final subject contact date, e.g., "2006-11-02"

4.4.8 Date and Time Reported in a Domain Based on Findings

When the date/time of collection is reported in any domain, the date/time should go into the --DTC field [e.g., EGDTC for Date/Time of ECG]. For any domain based on the Findings general observation class, such as lab tests which are based on a specimen, the collection date is likely to be tied to when the specimen or source of the finding was captured, not necessarily when the data were recorded. In order to ensure that the critical timing information is always represented in the same variable, the --DTC variable is used to represent the time of specimen collection. For example, in the LB domain the LBDTC variable would be used for all single-point blood collections or spot urine collections. For timed lab collections [e.g., 24-hour urine collections] the LBDTC variable would be used for the start date/time of the collection and LBENDTC for the end date/time of the collection. This approach will allow the single-point and interval collections to use the same date/time variables consistently across all datasets for the Findings general observation class. The table below illustrates the proper use of these variables. Note that --STDTC is not used for collection dates over an interval in the Findings general observation class and is therefore blank in the following table.

Collection Type | --DTC | --STDTC | --ENDTC
--- | --- | --- | ---
Single-Point Collection | X | |
Interval Collection | X | | X

4.4.9 Use of Dates as Result Variables

Dates are generally used only as timing variables to describe the timing of an event, intervention, or collection activity, but there may be occasions when it may be preferable to model a date as a result [--ORRES] in a Findings dataset. Note that using a date as a result to a Findings question is unusual and atypical, and should be approached with caution. This situation, however, may occasionally occur when a] a group of questions [each of which has a date response] is asked and analyzed together; or b] the Event[s] and Intervention[s] in question are not medically significant [often the case when included in questionnaires]. Consider the following cases:

  • Calculated due date
  • Date of last day on the job
  • Date of high school graduation

One approach to modeling these data would be to place the text of the question in --TEST and the response to the question, a date represented in ISO 8601 format, in --ORRES and --STRESC, as long as these date results do not contain the dates of medically significant events or interventions.

Again, use extreme caution when storing dates as the results of Findings. Remember, in most cases, these dates should be timing variables associated with a record in an Intervention or Events dataset.

4.4.10 Representing Time Points

Time points can be represented using the time point variables, --TPT, --TPTNUM, --ELTM, and the time point anchors, --TPTREF [text description] and --RFTDTC [the date/time]. Note that time-point data will usually have an associated --DTC value. The interrelationship of these variables is shown in Figure 4.4.10 below.

Figure 4.4.10: Relationships among Time Point Variables

Values for these variables for Vital Signs measurements taken at 30, 60, and 90 minutes after dosing would look like the following.

VSTPTNUM | VSTPT | VSELTM | VSTPTREF | VSRFTDTC | VSDTC
--- | --- | --- | --- | --- | ---
1 | 30 MIN | PT30M | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T08:30
2 | 60 MIN | PT1H | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T09:01
3 | 90 MIN | PT1H30M | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T09:32

Note that VSELTM is the planned elapsed time, not the actual elapsed time. The actual elapsed time could be derived in an analysis dataset, if desired, as VSDTC-VSRFTDTC.
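A derivation of the actual elapsed time, and its comparison with the planned value in VSELTM, might look like the following sketch. It uses the Vital Signs values shown above. The helper names are hypothetical, and the duration parser handles only simple hour/minute durations of the form PTnHnM, which is an assumption made to keep the example short.

```python
import re
from datetime import datetime, timedelta


def parse_hm_duration(iso: str) -> timedelta:
    """Parse a simple ISO 8601 duration such as PT30M, PT1H30M, or -PT15M."""
    m = re.fullmatch(r"(-)?PT(?:(\d+)H)?(?:(\d+)M)?", iso)
    if not m or (m.group(2) is None and m.group(3) is None):
        raise ValueError(f"unsupported duration: {iso!r}")
    td = timedelta(hours=int(m.group(2) or 0), minutes=int(m.group(3) or 0))
    return -td if m.group(1) else td


def actual_elapsed(dtc: str, rftdtc: str) -> timedelta:
    """Actual elapsed time, derived as --DTC minus --RFTDTC."""
    return datetime.fromisoformat(dtc) - datetime.fromisoformat(rftdtc)


# [VSELTM, VSRFTDTC, VSDTC] taken from the example rows above.
rows = [
    ("PT30M", "2006-08-01T08:00", "2006-08-01T08:30"),
    ("PT1H", "2006-08-01T08:00", "2006-08-01T09:01"),
    ("PT1H30M", "2006-08-01T08:00", "2006-08-01T09:32"),
]
for eltm, rftdtc, dtc in rows:
    actual = actual_elapsed(dtc, rftdtc)
    deviation = actual - parse_hm_duration(eltm)
    print(eltm, "actual:", actual, "deviation from plan:", deviation)
```

Such a derivation would belong in an analysis dataset rather than in the SDTM domain itself.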

Values for these variables for Urine Collections taken pre-dose, and from 0-12 hours and 12-24 hours after dosing would look like the following.

LBTPTNUM | LBTPT | LBELTM | LBTPTREF | LBRFTDTC | LBDTC
--- | --- | --- | --- | --- | ---
1 | 15 MIN PRE-DOSE | -PT15M | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T07:45
2 | 0-12 HOURS | PT12H | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T20:35
3 | 12-24 HOURS | PT24H | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-02T08:40

Note that the value in LBELTM represents the end of the specimen collection interval.

When time points are used, --TPTNUM is expected. Time points may or may not have an associated --TPTREF. Sometimes, --TPTNUM may be used as a key for multiple values collected for the same test within a visit; as such, there is no dependence upon an anchor such as --TPTREF, but there will be a dependency upon the VISITNUM. In such cases, VISITNUM will be required to confer uniqueness to values of --TPTNUM.

If the protocol describes the scheduling of a dose using a reference intervention or assessment, then --TPTREF should be populated, even if it does not contribute to uniqueness. The fact that time points are related to a reference time point, and what that reference time point is, are important for interpreting the data collected at the time point.

Not all time points will require all three variables to provide uniqueness. In fact, in some cases a time point may be uniquely identified without the use of VISITNUM, without the use of --TPTREF, or without the use of either one. For instance:

  • A trial might have time points only within one visit, so that the contribution of VISITNUM to uniqueness is trivial. [VISITNUM would be populated, but would not contribute to uniqueness.]
  • A trial might have time points that do not relate to any visit, such as time points relative to a dose of drug self-administered by the subject at home. [Visit variables would not be included, but --TPTREF and other time point variables would be populated.]
  • A trial may have only one reference time point per visit, and all reference time points may be similar, so that only one value of --TPTREF [e.g., "DOSE"] is needed. [--TPTREF would be populated, but would not contribute to uniqueness.]
  • A trial may have time points not related to a reference time point. For instance, --TPTNUM values could be used to distinguish first, second, and third repeats of a measurement scheduled without any relationship to dosing. [--TPTREF and --ELTM would not be included.] In this case, where the protocol calls for repeated measurements but does not specify timing of the measurements, the --REPNUM variable could be used instead of time point variables.

For trials with many time points, the requirement to provide uniqueness using only VISITNUM, --TPTREF, and --TPTNUM may lead to a scheme where multiple natural keys are combined into the values of one of these variables.

For instance, in a crossover trial with multiple doses on multiple days within each period, either of the following options could be used. VISITNUM might be used to designate period, --TPTREF might be used to designate the day and the dose, and --TPTNUM might be used to designate the timing relative to the reference time point. Alternatively, VISITNUM might be used to designate period and day within period, --TPTREF might be used to designate the dose within the day, and --TPTNUM might be used to designate the timing relative to the reference time point.

Option 1

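VISIT | VISITNUM | --TPT | --TPTNUM | --TPTREF
--- | --- | --- | --- | ---
PERIOD 1 | 3 | PRE-DOSE | 1 | DAY 1, AM DOSE
PERIOD 1 | 3 | 1H | 2 | DAY 1, AM DOSE
PERIOD 1 | 3 | 4H | 3 | DAY 1, AM DOSE
PERIOD 1 | 3 | PRE-DOSE | 1 | DAY 1, PM DOSE
PERIOD 1 | 3 | 1H | 2 | DAY 1, PM DOSE
PERIOD 1 | 3 | 4H | 3 | DAY 1, PM DOSE
PERIOD 1 | 3 | PRE-DOSE | 1 | DAY 5, AM DOSE
PERIOD 1 | 3 | 1H | 2 | DAY 5, AM DOSE
PERIOD 1 | 3 | 4H | 3 | DAY 5, AM DOSE
PERIOD 1 | 3 | PRE-DOSE | 1 | DAY 5, PM DOSE
PERIOD 1 | 3 | 1H | 2 | DAY 5, PM DOSE
PERIOD 1 | 3 | 4H | 3 | DAY 5, PM DOSE
PERIOD 2 | 4 | PRE-DOSE | 1 | DAY 1, AM DOSE
PERIOD 2 | 4 | 1H | 2 | DAY 1, AM DOSE
PERIOD 2 | 4 | 4H | 3 | DAY 1, AM DOSE
PERIOD 2 | 4 | PRE-DOSE | 1 | DAY 1, PM DOSE
PERIOD 2 | 4 | 1H | 2 | DAY 1, PM DOSE
PERIOD 2 | 4 | 4H | 3 | DAY 1, PM DOSE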

Option 2

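VISIT | VISITNUM | --TPT | --TPTNUM | --TPTREF
--- | --- | --- | --- | ---
PERIOD 1, DAY 1 | 3 | PRE-DOSE | 1 | AM DOSE
PERIOD 1, DAY 1 | 3 | 1H | 2 | AM DOSE
PERIOD 1, DAY 1 | 3 | 4H | 3 | AM DOSE
PERIOD 1, DAY 1 | 3 | PRE-DOSE | 1 | PM DOSE
PERIOD 1, DAY 1 | 3 | 1H | 2 | PM DOSE
PERIOD 1, DAY 1 | 3 | 4H | 3 | PM DOSE
PERIOD 1, DAY 5 | 4 | PRE-DOSE | 1 | AM DOSE
PERIOD 1, DAY 5 | 4 | 1H | 2 | AM DOSE
PERIOD 1, DAY 5 | 4 | 4H | 3 | AM DOSE
PERIOD 1, DAY 5 | 4 | PRE-DOSE | 1 | PM DOSE
PERIOD 1, DAY 5 | 4 | 1H | 2 | PM DOSE
PERIOD 1, DAY 5 | 4 | 4H | 3 | PM DOSE
PERIOD 2, DAY 1 | 5 | PRE-DOSE | 1 | AM DOSE
PERIOD 2, DAY 1 | 5 | 1H | 2 | AM DOSE
PERIOD 2, DAY 1 | 5 | 4H | 3 | AM DOSE
PERIOD 2, DAY 1 | 5 | PRE-DOSE | 1 | PM DOSE
PERIOD 2, DAY 1 | 5 | 1H | 2 | PM DOSE
PERIOD 2, DAY 1 | 5 | 4H | 3 | PM DOSE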

Within the context that defines uniqueness for a time point, which may include domain, visit, and reference time point, there must be a one-to-one relationship between values of --TPT and --TPTNUM. In other words, if domain, visit, and reference time point uniquely identify subject data, then if two subjects have records with the same values of DOMAIN, VISITNUM, --TPTREF, and --TPTNUM, then these records may not have different time point descriptions in --TPT.
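A check of this one-to-one relationship could be sketched as follows. The sketch assumes that the uniqueness context is DOMAIN, VISITNUM, and --TPTREF, and that records are held as dictionaries keyed by generic names without the domain prefix [TPTREF, TPTNUM, TPT]; both the representation and the function name are hypothetical.

```python
from collections import defaultdict


def tpt_conflicts(records):
    """List contexts where TPTNUM maps to several TPT values, or TPT to several TPTNUM."""
    num_to_tpt = defaultdict(set)
    tpt_to_num = defaultdict(set)
    for r in records:
        ctx = (r["DOMAIN"], r["VISITNUM"], r["TPTREF"])
        num_to_tpt[ctx + (r["TPTNUM"],)].add(r["TPT"])
        tpt_to_num[ctx + (r["TPT"],)].add(r["TPTNUM"])
    conflicts = []
    for key, tpts in num_to_tpt.items():
        if len(tpts) > 1:
            conflicts.append(("TPTNUM", key, sorted(tpts)))
    for key, nums in tpt_to_num.items():
        if len(nums) > 1:
            conflicts.append(("TPT", key, sorted(nums)))
    return conflicts


good = [
    {"DOMAIN": "VS", "VISITNUM": 3, "TPTREF": "AM DOSE", "TPTNUM": 1, "TPT": "PRE-DOSE"},
    {"DOMAIN": "VS", "VISITNUM": 3, "TPTREF": "AM DOSE", "TPTNUM": 2, "TPT": "1H"},
    {"DOMAIN": "VS", "VISITNUM": 3, "TPTREF": "PM DOSE", "TPTNUM": 2, "TPT": "1H"},
]
# A second description for TPTNUM 2 in the same context violates the rule.
bad = good + [
    {"DOMAIN": "VS", "VISITNUM": 3, "TPTREF": "AM DOSE", "TPTNUM": 2, "TPT": "2H"},
]
print(tpt_conflicts(good))
print(tpt_conflicts(bad))
```

Because the check pools records across subjects, it reflects the requirement that two subjects with the same context and --TPTNUM may not have different --TPT descriptions.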

Within the context that defines uniqueness for a time point, there is likely to be a one-to-one relationship between most values of --TPT and --ELTM. However, since --ELTM can only be populated with ISO 8601 periods of time [as described in Section 4.4.3, Intervals of Time and Use of Duration for --DUR Variables], --ELTM may not be populated for all time points. For example, --ELTM is likely to be null for time points described by text such as "pre-dose" or "before breakfast". When --ELTM is populated, if two subjects have records with the same values of DOMAIN, VISITNUM, --TPTREF, and --TPTNUM, then these records may not have different values in --ELTM.

When the protocol describes a time point with text such as "4-6 hours after dose" or "12 hours +/- 2 hours after dose" the sponsor may choose whether and how to populate --ELTM. For example, a time point described as "4-6 hours after dose" might be associated with an --ELTM value of PT4H. A time point described as "12 hours +/- 2 hours after dose" might be associated with an --ELTM value of PT12H. Conventions for populating --ELTM should be consistent [the examples just given would probably not both be used in the same trial]. It would be good practice to indicate the range of intended timings by some convention in the values used to populate --TPT.

Sponsors may, of course, use more stringent requirements for populating --TPTNUM, --TPT, and --ELTM. For instance, a sponsor could decide that all time points with a particular --ELTM value would have the same values of --TPTNUM, and --TPT, across all visits, reference time points, and domains.

4.4.11 Disease Milestones and Disease Milestone Timing Variables

A "disease milestone" is an event or activity that can be anticipated in the course of a disease, but whose timing is not controlled by the study schedule. A disease milestone may be something that occurred pre-study, but which represents a time at which data would have been collected, such as diagnosis of the disease under study. A disease milestone may also be something which is anticipated to occur during a study and which, if it occurs, triggers the collection of related data outside the regular schedule of visits, such as an adverse event of interest. The types of Disease Milestones for a study are defined in the study-level Trial Disease Milestones [TM] dataset [Section 7.3.3, Trial Disease Milestones]. The times at which disease milestones occurred for a particular subject are summarized in the special purpose Subject Disease Milestones [SM] domain [Section 5.4, Subject Disease Milestones], a domain similar in structure to the Subject Visits [SV] and Subject Elements [SE] domains.

Not all studies will have disease milestones. If a study does not have disease milestones, the TM and SM domains will not be present and the disease milestones timing variables may not be included in other domains.

Disease Milestone Naming

Instances of disease milestones are given names at a subject level. The name of a disease milestone is composed of a character string that depends on the disease milestone type [MIDSTYPE in TM and SM] and, if the type of disease milestone is one that may occur multiple times, a chronological sequence number for this disease milestone among other instances of the same type for the subject. The character string used in the name of a disease milestone is usually a short form of the disease milestone type. For example, if the type of disease milestone was "EPISODE OF DISEASE UNDER STUDY", the values of MIDS for instances of this type of event could include "EPISODE1", "EPISODE2", etc., or "EPISODE01", "EPISODE02", etc. The association between the longer text in MIDSTYPE and the shorter text in MIDS can be seen in SM, which includes both variables.
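The naming convention might be applied as in the following sketch for a single subject. The short names, the "DIAGNOSIS" milestone type, the dates, and the function name are hypothetical; the sketch also assumes that all start dates are complete ISO 8601 dates, so that sorting the strings gives chronological order.

```python
def assign_mids(instances, short_names, repeating_types):
    """Name milestone instances for one subject.

    instances: list of (MIDSTYPE, start date as an ISO 8601 string)
    short_names: MIDSTYPE -> short character string used in MIDS
    repeating_types: MIDSTYPE values that may occur more than once
    """
    counters = {}
    named = []
    # Complete ISO 8601 dates sort chronologically as plain strings.
    for midstype, start in sorted(instances, key=lambda item: item[1]):
        base = short_names[midstype]
        if midstype in repeating_types:
            counters[midstype] = counters.get(midstype, 0) + 1
            mids = f"{base}{counters[midstype]}"
        else:
            mids = base
        named.append({"MIDS": mids, "MIDSTYPE": midstype, "SMSTDTC": start})
    return named


episode = "EPISODE OF DISEASE UNDER STUDY"
instances = [
    (episode, "2006-09-15"),
    ("DIAGNOSIS", "2005-01-10"),   # hypothetical non-repeating type
    (episode, "2006-03-02"),
]
for row in assign_mids(instances, {episode: "EPISODE", "DIAGNOSIS": "DIAG"}, {episode}):
    print(row["MIDS"], row["MIDSTYPE"], row["SMSTDTC"])
```

A sponsor preferring zero-padded names such as "EPISODE01" would only need to change the format of the sequence number.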

Disease Milestones Name [MIDS]

If something that has been defined as a disease milestone for a particular study occurred for a particular subject, it is represented as usual, in the appropriate findings, intervention, or events class record. In addition this record will include the MIDS timing variable, populated with the name of the disease milestone. The timing of a disease milestone is also represented in the special purpose SM domain.

The record that represents a disease milestone does not include values for the timing variables RELMIDS and MIDSDTC, which are used to represent the timing of other observations relative to a disease milestone. The usual timing variables in the record for a disease milestone [e.g., --DTC, --STDTC, --ENDTC] provide the needed timing for this observation and for the timing information represented in the SM domain.

Timing Relative to a Disease Milestone [MIDS, RELMIDS, MIDSDTC]

For an observation triggered by the occurrence of a disease milestone, the relationship of the observation to the disease milestone can be represented using the disease milestones timing variables MIDS, RELMIDS, and MIDSDTC to describe the timing of the observation.

  • MIDS is populated with the name of a disease milestone for this subject. MIDS is the "anchor" for describing the timing of the observation relative to the disease milestone. In this sense, its function is similar to --TPTREF for time points.
  • RELMIDS is usually populated with a textual description of the temporal relationship between the observation and the disease milestone named in MIDS. Controlled vocabulary has not yet been developed for RELMIDS, but is likely to include terms such as "IMMEDIATELY BEFORE", "AT START OF", "DURING", "AT END OF", and "SHORTLY AFTER". It is similar to --ELTM, except that --ELTM is represented as an ISO 8601 duration.
  • MIDSDTC is populated with the date/time of the disease milestone. This is the --DTC for a finding, or the --STDTC for an event or intervention, and is the date recorded in SMSTDTC in the SM domain. Its function is similar to --RFTDTC for time points.

In some cases, data collected in conjunction with a disease milestone does not include the collection of a separate date for the related observation. This is particularly common for pre-study disease milestones, but may occur with on-study disease milestones as well. In such cases, MIDSDTC provides a related date/time in records that would not otherwise contain any date. In records that do contain date/time[s] of the observation, MIDSDTC allows easy comparison of the date[s] of the observation to the [start] date of the disease milestone. In such cases, it functions much like the reference time point date/time [--RFTDTC] in observations at time points.
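Populating MIDSDTC from the SM domain could be sketched as a simple lookup on subject and milestone name. The record layout [dictionaries], the sample values, and the function name are hypothetical. The sketch leaves MIDSDTC unpopulated on records without RELMIDS, on the assumption that such records represent the disease milestone itself.

```python
def add_midsdtc(observations, sm_records):
    """Return copies of observations with MIDSDTC taken from SMSTDTC."""
    start_by_key = {(r["USUBJID"], r["MIDS"]): r["SMSTDTC"] for r in sm_records}
    out = []
    for obs in observations:
        row = dict(obs)
        key = (row["USUBJID"], row.get("MIDS"))
        # Only observations timed relative to a milestone receive MIDSDTC.
        if row.get("RELMIDS") and key in start_by_key:
            row["MIDSDTC"] = start_by_key[key]
        out.append(row)
    return out


sm = [{"USUBJID": "ABC-123", "MIDS": "EPISODE1", "SMSTDTC": "2006-03-02"}]
obs = [
    # A finding collected in association with the milestone, with no date of its own.
    {"USUBJID": "ABC-123", "MIDS": "EPISODE1", "RELMIDS": "DURING"},
    # The record representing the milestone itself: MIDS only.
    {"USUBJID": "ABC-123", "MIDS": "EPISODE1", "STDTC": "2006-03-02"},
]
for row in add_midsdtc(obs, sm):
    print(row)
```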

When a disease milestone is an event or intervention, some data triggered by the disease milestone may be modeled as Findings About the disease milestone [i.e., FAOBJ is the disease milestone]. In such cases, RELMIDS should be used to describe the temporal relationship between the Disease Milestone and the subject of the question being asked in the finding, rather than as describing when the question was asked.

  • When the subject of the question is the disease milestone itself, RELMIDS may be populated with a value such as "ENTIRE EVENT" or "ENTIRE TREATMENT."
  • When the subject of the question is a question about the occurrence of some activity or event related to the disease milestone, RELMIDS acts like an evaluation interval, describing the period of time over which the question is focused.
    • For questions about a possible cause of an event or about the indication for a treatment, RELMIDS would have a value such as "WEEK PRIOR" or "IMMEDIATELY BEFORE", or even just "BEFORE".
    • RELMIDS would be "DURING" for questions about things that may have occurred while an event or intervention disease milestone was in progress.
    • For sequelae of a disease milestone, RELMIDS would have a value such as "AT DISCHARGE" or "WEEK AFTER" or simply "AFTER".

Use of Disease Milestone Timing Variables with other Timing Variables

The disease milestone timing variables provide timing relative to an activity or event that has been identified, for the particular study, as a disease milestone. Their use does not preclude the use of variables that collect actual date/times or timing relative to the study schedule.

  • The use of actual date/times is unaffected. The Disease Milestone Timing variables may provide timing information in cases where actual date/times are unavailable, particularly for pre-study disease milestones. When the question text for an observation references a disease milestone, but a separate date for the observation is not collected, the disease milestone timing variables should be populated but the actual date/s should not be imputed by populating them with the date of the disease milestone. Examples of such questions: Disease stage at initial diagnosis of disease under study; Treatment for most recent disease episode.
  • Study-day variables should be populated wherever complete actual date/times are populated. This includes negative study days for pre-study observations.
  • The timing variables EPOCH and TAETORD [Planned Order of Element within Arm] may be populated for on-study observations associated with disease milestones. However, pre-study disease milestones, those which occur before the start of study participation when informed consent is obtained, by definition, do not have an associated EPOCH or TAETORD.
  • Visit variables are expected in many findings domains, but findings triggered by the occurrence of a study milestone may not occur at a scheduled visit.
    • Findings associated with pre-study disease milestones are often collected at a screening visit, although the test was not performed at that visit.
    • For findings associated with on-study disease milestones but not conducted at a scheduled visit, practices for populating VISITNUM as for an unscheduled visit should be followed.
  • The use of time-point variables with disease milestone variables may occur in cases where a disease milestone triggers treatment, and time points relative to treatment are part of the study schedule. For instance, a migraine trial may call for assessments of symptom severity at prescribed times after treatment of the migraine. If the migraine episodes were treated as disease milestones, then the disease milestone timing variables might be populated in the exposure and symptom-severity records. If the study planned to treat multiple migraine episodes, the MIDS variable would provide a convenient way to determine the episode with which data were associated.
  • An evaluation interval variable [--EVLINT or --EVLTXT] could be used in conjunction with disease milestone variables. For instance, patient-reported outcome instruments might be administered at the time of a disease milestone, and the questions in the instrument might include an evaluation interval.
  • The timing variables for start and end of an event or intervention relative to the study reference period [--STRF and --ENRF] or relative to a reference time point [--STRTPT and --STTPT, --ENRTPT and --ENTPT] could be used in conjunction with disease milestone variables. For example, a concomitant medication could be collected in association with a disease milestone, so that the disease milestone timing variables were populated, but relative timing variables could be used for the start or end of the concomitant medication.
  • The timing variables for start and end of a planned assessment interval might be populated for an assessment triggered by a disease milestone, if applicable. For example, the occurrence of a particular event might trigger both a treatment and Holter monitoring for 24 hours after the treatment.

Linking and Disease Milestones

When disease milestones have been defined for a study, the MIDS variable serves to link observations associated with a disease milestone in a way similar to the way that VISITNUM links observations collected at a visit. If disease milestones were not defined for the study, it would be possible to link records associated with a disease milestone using RELREC, but the use of disease milestones has certain advantages:

  • RELREC indicates that there is a relationship between records or datasets, but not the nature of the relationship. Records with the same MIDS value are related to the same disease milestone.
  • When disease milestones are defined, it is not necessary to create RELREC records to establish relationships between observations associated with a disease milestone.

4.5 Other Assumptions

4.5.1 Original and Standardized Results of Findings and Tests Not Done

4.5.1.1 Original and Standardized Results

The --ORRES variable contains the result of the measurement or finding as originally received or collected. --ORRES is an expected variable and should always be populated, with two exceptions:

  • When --STAT = "NOT DONE" since there is no result for such a record
  • When --DRVFL = "Y" since the distinction between an original result and a standard result is not applicable for records for which --DRVFL = "Y".

Note that records for which --DRVFL = "Y" may combine data collected at more than one visit. In such a case the sponsor must define the value for VISITNUM, addressing the correct temporal sequence. If a new record is derived for a dataset, and the source is not eDT, then that new record should be flagged as derived.

For example, in ECG data, if a corrected QT interval value derived in-house by the sponsor were represented in an SDTM record, then EGDRVFL would be "Y". If a corrected QT interval value was received from a vendor or was produced by the ECG machine, the derived flag would be null.

When --ORRES is populated, --STRESC must also be populated, regardless of whether the data values are character or numeric. The variable, --STRESC, is populated either by the conversion of values in --ORRES to values with standard units, or by the assignment of the value of --ORRES [as in the PE Domain, where --STRESC could contain a dictionary-derived term]. A further step is necessary when --STRESC contains numeric values. These are converted to numeric type and written to --STRESN. Because --STRESC may contain a mixture of numeric and character values, --STRESN may contain null values, as shown in the flowchart below.

--ORRES [all original values] → --STRESC [derive or copy all results] → --STRESN [numeric results only]
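The flow from --ORRES to --STRESC to --STRESN can be illustrated with the following sketch. The function name and the optional `to_standard` conversion callable are hypothetical, and the test for "numeric" is a simple pattern match chosen for the example rather than a rule defined in this guide.

```python
import re

NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")


def standardize(orres, to_standard=None):
    """Return (STRESC, STRESN) for one original result."""
    if orres is None or orres == "":
        return None, None
    # --STRESC: converted to standard units/terms, or copied from --ORRES.
    stresc = to_standard(orres) if to_standard else orres
    # --STRESN: populated only when --STRESC is entirely numeric.
    stresn = float(stresc) if NUMERIC.fullmatch(stresc) else None
    return stresc, stresn


print(standardize("2.5"))
print(standardize("NEGATIVE"))
print(standardize(">10000"))
# Hypothetical unit conversion from inches to centimetres.
print(standardize("2", to_standard=lambda v: str(float(v) * 2.54)))
```

Character results leave --STRESN null, which is why --STRESN may contain null values even when --STRESC is fully populated.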

When the original measurement or finding is a selection from a defined codelist, in general, the --ORRES and --STRESC variables contain results in decoded format, that is, the textual interpretation of whichever code was selected from the codelist. In some cases where the code values in the codelist are statistically meaningful standardized values or scores, which are defined by sponsors or by valid methodologies such as SF36 questionnaires, the --ORRES variables will contain the decoded format, whereas, the --STRESC variables as well as the --STRESN variables will contain the standardized values or scores.

Occasionally data that are intended to be numeric are collected with characters attached that cause the character-to-numeric conversion to fail. For example, numeric cell counts in the source data may be specified with a greater than [>] or less than [<] sign attached [e.g., >10,000].

[...]

Papules: ○ 0 ○ 1 to 25 ○ 26 to 100 ○ 101 to 200 ○ 201 to 300 ○ >300
Vesicles: ○ 0 ○ 1 to 25 ○ 26 to 100 ○ 101 to 200 ○ 201 to 300 ○ >300
Pustules: ○ 0 ○ 1 to 25 ○ 26 to 100 ○ 101 to 200 ○ 201 to 300 ○ >300
Scabs: ○ 0 ○ 1 to 25 ○ 26 to 100 ○ 101 to 200 ○ 201 to 300 ○ >300
Scars: ○ 0 ○ 1 to 25 ○ 26 to 100 ○ 101 to 200 ○ 201 to 300 ○ >300

The collected data meet the following Findings About criteria: 1] Data that do not describe an Event or Intervention as a whole and 2] Data ["about" an Event or Intervention] that have Qualifiers of their own that can be represented in Findings variables [e.g., units, method].

In this mock scenario, the rash event is considered a reportable AE; therefore the form design collects a reference number to the AE form where the event is captured. Data points collected on the Rash Assessment form can be represented in the Findings About domain and related to the AE via RELREC. Note that in the mock datasets below, the AE started on May 10, 2007, and the rash assessment was conducted on May 12 and May 19, 2007.

Certain Required or Expected variables have been omitted in consideration of space and clarity.

ae.xpt

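Row | STUDYID | DOMAIN | USUBJID | AESEQ | AESPID | AETERM | … | AEBODSYS | … | AELOC | AELAT | AESEV | AESER | AEACN | AESTDTC | …
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | XYZ | AE | XYZ-789 | 47869 | 5 | Injection site rash | … | General disorders and administration site conditions | … | ARM | LEFT | MILD | N | NOT APPLICABLE | 2007-05-10 | …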

fa.xpt

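Row | STUDYID | DOMAIN | USUBJID | FASEQ | FASPID | FATESTCD | FATEST | FAOBJ | FAORRES | FAORRESU | FASTRESC | FASTRESU | VISITNUM | EPOCH | FADTC
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | XYZ | FA | XYZ-789 | 123451 | 5 | DIAM | Diameter | Injection Site Rash | 2.5 | IN | 2.5 | IN | 3 | TREATMENT | 2007-05-12
2 | XYZ | FA | XYZ-789 | 123452 | 5 | COUNT | Count | Macules | 26 to 100 | | 26 to 100 | | 3 | TREATMENT | 2007-05-12
3 | XYZ | FA | XYZ-789 | 123453 | 5 | COUNT | Count | Papules | 1 to 25 | | 1 to 25 | | 3 | TREATMENT | 2007-05-12
4 | XYZ | FA | XYZ-789 | 123454 | 5 | COUNT | Count | Vesicles | 0 | | 0 | | 3 | TREATMENT | 2007-05-12
5 | XYZ | FA | XYZ-789 | 123455 | 5 | COUNT | Count | Pustules | 0 | | 0 | | 3 | TREATMENT | 2007-05-12
6 | XYZ | FA | XYZ-789 | 123456 | 5 | COUNT | Count | Scabs | 0 | | 0 | | 3 | TREATMENT | 2007-05-12
7 | XYZ | FA | XYZ-789 | 123457 | 5 | COUNT | Count | Scars | 0 | | 0 | | 3 | TREATMENT | 2007-05-12
8 | XYZ | FA | XYZ-789 | 123459 | 5 | DIAM | Diameter | Injection Site Rash | 1 | IN | 1 | IN | 4 | TREATMENT | 2007-05-19
9 | XYZ | FA | XYZ-789 | 123460 | 5 | COUNT | Count | Macules | 1 to 25 | | 1 to 25 | | 4 | TREATMENT | 2007-05-19
10 | XYZ | FA | XYZ-789 | 123461 | 5 | COUNT | Count | Papules | 1 to 25 | | 1 to 25 | | 4 | TREATMENT | 2007-05-19
11 | XYZ | FA | XYZ-789 | 123462 | 5 | COUNT | Count | Vesicles | 0 | | 0 | | 4 | TREATMENT | 2007-05-19
12 | XYZ | FA | XYZ-789 | 123463 | 5 | COUNT | Count | Pustules | 0 | | 0 | | 4 | TREATMENT | 2007-05-19
13 | XYZ | FA | XYZ-789 | 123464 | 5 | COUNT | Count | Scabs | 0 | | 0 | | 4 | TREATMENT | 2007-05-19
14 | XYZ | FA | XYZ-789 | 123465 | 5 | COUNT | Count | Scars | 0 | | 0 | | 4 | TREATMENT | 2007-05-19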

relrec.xpt

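Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID
--- | --- | --- | --- | --- | --- | --- | ---
1 | XYZ | AE | | AESPID | | ONE | 23
2 | XYZ | FA | | FASPID | | MANY | 23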
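The dataset-level relationship expressed by these two RELREC records [AE joined to FA on AESPID = FASPID within a subject, one-to-many] could be resolved as in the following sketch. The dictionary representation and the function name are hypothetical; the sample rows are abbreviated from the mock datasets above.

```python
from collections import defaultdict


def join_by_relrec(parent_rows, child_rows, parent_idvar, child_idvar):
    """Group child records under each parent record, matching on USUBJID and the IDVARs."""
    children = defaultdict(list)
    for child in child_rows:
        children[(child["USUBJID"], child[child_idvar])].append(child)
    return [
        (parent, children.get((parent["USUBJID"], parent[parent_idvar]), []))
        for parent in parent_rows
    ]


ae = [{"USUBJID": "XYZ-789", "AESEQ": 47869, "AESPID": "5", "AETERM": "Injection site rash"}]
fa = [
    {"USUBJID": "XYZ-789", "FASEQ": 123451, "FASPID": "5", "FATESTCD": "DIAM", "FADTC": "2007-05-12"},
    {"USUBJID": "XYZ-789", "FASEQ": 123452, "FASPID": "5", "FATESTCD": "COUNT", "FADTC": "2007-05-12"},
    {"USUBJID": "XYZ-789", "FASEQ": 123459, "FASPID": "5", "FATESTCD": "DIAM", "FADTC": "2007-05-19"},
]
for parent, related in join_by_relrec(ae, fa, "AESPID", "FASPID"):
    print(parent["AETERM"], "->", [row["FASEQ"] for row in related])
```

The IDVAR names passed to the function correspond to the IDVAR values in the two RELREC rows; IDVARVAL is blank in those rows because the relationship applies to the datasets as a whole.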

Example

The form below collects information about rheumatoid arthritis. In this mock scenario, rheumatoid arthritis is a prerequisite for participation in an osteoporosis trial and was not collected as a Medical History event.

Rheumatoid Arthritis History
Date of Assessment: DD-MMM-YYYY
During the past 6 months, how would you rate the following:
Joint stiffness: ○ MILD ○ MODERATE ○ SEVERE
Inflammation: ○ MILD ○ MODERATE ○ SEVERE
Joint swelling: ○ MILD ○ MODERATE ○ SEVERE
Joint pain [arthralgia]: ○ MILD ○ MODERATE ○ SEVERE
Malaise: ○ MILD ○ MODERATE ○ SEVERE
Duration of early morning stiffness [hours and minutes]: _____Hours _____Minutes

The collected data meet the following Findings About criteria: Data ["about" an Event or Intervention] for which no Event or Intervention record has been collected or created. In this mock scenario, the rheumatoid arthritis history was assessed on August 13, 2006.

fa.xpt

| Row | STUDYID | DOMAIN | USUBJID | FASEQ | FATESTCD | FATEST | FAOBJ | FACAT | FAORRES | FASTRESC | FADTC | FAEVLINT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC | FA | ABC-123 | 1 | SEV | Severity/Intensity | Joint Stiffness | RHEUMATOID ARTHRITIS HISTORY | SEVERE | SEVERE | 2006-08-13 | -P6M |
| 2 | ABC | FA | ABC-123 | 2 | SEV | Severity/Intensity | Inflammation | RHEUMATOID ARTHRITIS HISTORY | MODERATE | MODERATE | 2006-08-13 | -P6M |
| 3 | ABC | FA | ABC-123 | 3 | SEV | Severity/Intensity | Joint Swelling | RHEUMATOID ARTHRITIS HISTORY | MODERATE | MODERATE | 2006-08-13 | -P6M |
| 4 | ABC | FA | ABC-123 | 4 | SEV | Severity/Intensity | Arthralgia | RHEUMATOID ARTHRITIS HISTORY | MODERATE | MODERATE | 2006-08-13 | -P6M |
| 5 | ABC | FA | ABC-123 | 5 | SEV | Severity/Intensity | Malaise | RHEUMATOID ARTHRITIS HISTORY | MILD | MILD | 2006-08-13 | -P6M |
| 6 | ABC | FA | ABC-123 | 6 | DUR | Duration | Early Morning Stiffness | RHEUMATOID ARTHRITIS HISTORY | PT1H30M | PT1H30M | 2006-08-13 | -P6M |
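
The FAEVLINT value "-P6M" and the duration result "PT1H30M" in the dataset above are ISO 8601 durations. As a rough illustration only, the sketch below shows how such values might be split into components. The helper name `parse_duration` is hypothetical (it is not part of SDTM or any CDISC tool), and it handles only the simple year/month/day/hour/minute/second forms, not weeks or fractional values.

```python
import re

# Hypothetical helper for illustration; covers only the simple duration
# forms seen in this example (e.g. "-P6M", "PT1H30M").
_DURATION = re.compile(
    r"^(?P<sign>-)?P"
    r"(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(value):
    """Return the populated components of a simple ISO 8601 duration."""
    match = _DURATION.match(value)
    # "P" and "PT" match the pattern but carry no component, so reject them.
    if match is None or value.lstrip("-") in ("P", "PT"):
        raise ValueError(f"not a supported ISO 8601 duration: {value!r}")
    parts = {
        name: int(text)
        for name, text in match.groupdict().items()
        if name != "sign" and text is not None
    }
    # A leading minus sign marks an interval that precedes the reference date.
    parts["negative"] = match.group("sign") == "-"
    return parts


evaluation_interval = parse_duration("-P6M")    # FAEVLINT: the past 6 months
stiffness_duration = parse_duration("PT1H30M")  # FAORRES for FATESTCD = "DUR"
```

Here `evaluation_interval` holds 6 months flagged as negative (before FADTC), and `stiffness_duration` holds 1 hour and 30 minutes.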

Example

In this example, details about bone-fracture events are collected. This form is designed to collect multiple entries of fracture information, including an initial entry for the most recent fracture prior to study participation, as well as entry of information for fractures that occur during the study.

Bone Fracture Assessment

Complete form for most recent fracture prior to study participation.

Enter Fracture Event Reference Number for all fractures occurring during study participation: _____

How did fracture occur?

○ Pathologic
○ Fall
○ Other trauma
○ Unknown

What was the outcome?

○ Normal Healing
○ Complications

Select all that apply:

□ Complication x
□ Complication y
□ Complication z

Additional therapeutic measures required

○ No
○ Unknown
○ Yes

Select all that apply

□ Therapeutic measure a
□ Therapeutic measure b
□ Therapeutic measure c

The collected data meet the following Findings About criteria: [1] Data ["about" an Event or Intervention] that indicate the occurrence of related symptoms or therapies and [2] Data ["about" an event/intervention] for which no Event or Intervention record has been collected or created.

Determining when data further describe the parent event record either as Variable Qualifiers or Supplemental Qualifiers may be dependent on data collection design. In the above form, responses are provided for the most recent fracture, but an event record reference number was not collected. For in-study fracture events, a reference number is collected, which would allow representing the responses as part of the Event record either as Supplemental Qualifiers and/or variables like --OUT and --CONTRT.

The below domains reflect responses to each Bone Fracture Assessment question. The historical-fracture responses that are without a parent record are represented in the FA domain, while the current-fracture responses are represented as Event records with Supplemental Qualifiers.

Historical Fractures Having No Event Records

fa.xpt

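| Row | STUDYID | DOMAIN | USUBJID | FASEQ | FASPID | FATESTCD | FATEST | FAOBJ | FACAT | FAORRES | FADTC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC | FA | ABC -US-701-002 | 1 | 798654 | REAS | Reason | Bone Fracture | BONE FRACTURE ASSESSMENT - HISTORY | FALL | 2006-04-10 |
| 2 | ABC | FA | ABC -US-701-002 | 2 | 798654 | OUT | Outcome | Bone Fracture | BONE FRACTURE ASSESSMENT - HISTORY | COMPLICATIONS | 2006-04-10 |
| 3 | ABC | FA | ABC -US-701-002 | 3 | 798654 | OCCUR | Occurrence | Complications | BONE FRACTURE ASSESSMENT | Y | 2006-04-10 |
| 4 | ABC | FA | ABC -US-701-002 | 4 | 798654 | OCCUR | Occurrence | Therapeutic Measure | BONE FRACTURE ASSESSMENT | Y | 2006-04-10 |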

suppfa.xpt

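| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC | FA | ABC -US-701-002 | FASEQ | 1 | FATYP | FA Type | MOST RECENT | CRF | |
| 2 | ABC | FA | ABC -US-701-002 | FASEQ | 2 | FATYP | FA Type | MOST RECENT | CRF | |
| 3 | ABC | FA | ABC -US-701-002 | FASEQ | 3 | FATYP | FA Type | MOST RECENT | CRF | |
| 4 | ABC | FA | ABC -US-701-002 | FASEQ | 4 | FATYP | FA Type | MOST RECENT | CRF | |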

Current Fractures Having Event Records

ce.xpt

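| Row | STUDYID | DOMAIN | USUBJID | CESEQ | CESPID | CETERM | CELOC | CEOUT | CECONTRT | CESTDTC |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC | CE | ABC -US-701-002 | 1 | 1 | Fracture | ARM | NORMAL HEALING | Y | 2006-07-03 |
| 2 | ABC | CE | ABC -US-701-002 | 2 | 2 | Fracture | LEG | COMPLICATIONS | N | 2006-10-15 |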

suppce.xpt

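| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC | CE | ABC -US-701-002 | CESPID | 1 | REAS | Reason | FALL | CRF | |
| 2 | ABC | CE | ABC -US-701-002 | CESPID | 2 | REAS | Reason | OTHER TRAUMA | CRF | |
| 3 | ABC | CE | ABC -US-701-002 | CESPID | 2 | OUT | Outcome | COMPLICATIONS | CRF | |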

Example

In this example, three AEs are pre-specified and are scheduled to be asked at each visit. If the occurrence is "Yes", then a complete AE record is collected on the AE form.

Pre-Specified Adverse Events of Clinical Interest

Date of Assessment: DD-MMM-YYYY

Did the following occur? If Yes, then enter a complete record in the AE CRF.

Headache: ○ No ○ Yes ○ Not Done
Respiratory infection: ○ No ○ Yes ○ Not Done
Nausea: ○ No ○ Yes ○ Not Done

The collected data meet the following Findings About criteria: Data that indicate the occurrence of pre-specified adverse events.

In this mock scenario, each response to the pre-specified terms is represented in the Findings About domain. For the "Y" responses, an AE record is represented in the AE domain with its respective Qualifiers and timing details. In the example below, the AE of "Headache" encompasses multiple pre-specified "Y" responses, and the AE of "Nausea", collected on October 10, was reported to have started on October 8 and ended on October 9. Note that no relationship was collected to link the "Yes" responses with the AE entries; therefore, no RELREC was created.

fa.xpt

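| Row | STUDYID | DOMAIN | USUBJID | FASEQ | FATESTCD | FATEST | FAOBJ | FAORRES | FASTRESC | FASTAT | VISITNUM | VISIT | EPOCH | FADTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | QRS | FA | 1234 | 1 | OCCUR | Occurrence | Headache | Y | Y | | 2 | VISIT 2 | TREATMENT | 2005-10-01 |
| 2 | QRS | FA | 1234 | 2 | OCCUR | Occurrence | Respiratory Infection | N | N | | 2 | VISIT 2 | TREATMENT | 2005-10-01 |
| 3 | QRS | FA | 1234 | 3 | OCCUR | Occurrence | Nausea | | | NOT DONE | 2 | VISIT 2 | TREATMENT | 2005-10-01 |
| 4 | QRS | FA | 1234 | 4 | OCCUR | Occurrence | Headache | Y | Y | | 3 | VISIT 3 | TREATMENT | 2005-10-10 |
| 5 | QRS | FA | 1234 | 5 | OCCUR | Occurrence | Respiratory Infection | N | N | | 3 | VISIT 3 | TREATMENT | 2005-10-10 |
| 6 | QRS | FA | 1234 | 6 | OCCUR | Occurrence | Nausea | Y | Y | | 3 | VISIT 3 | TREATMENT | 2005-10-10 |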

ae.xpt

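| Row | STUDYID | DOMAIN | USUBJID | AESEQ | AETERM | … | AEDECOD | … | AEPRESP | AEBODSYS | … | AESEV | … | AEACN | EPOCH | AESTDTC | AEENDTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | QRS | AE | 1234 | 1 | Headache | … | Headache | … | Y | Nervous system disorders | … | MILD | … | NONE | TREATMENT | 2005-09-30 | |
| 2 | QRS | AE | 1234 | 2 | Nausea | … | Nausea | … | Y | Gastrointestinal disorders | … | MODERATE | … | NONE | TREATMENT | 2005-10-08 | 2005-10-09 |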
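
Because no relationship was collected in this example, any cross-check between the "Y" responses in FA and the AE records has to rest on an assumption. The sketch below assumes, purely for illustration, that FAOBJ and AETERM can be compared as text; the function name `yes_terms_without_ae` is hypothetical and the rows carry only the variables needed for the check.

```python
# FA occurrence responses from the example above (subset of variables).
fa_rows = [
    {"FAOBJ": "Headache", "FAORRES": "Y", "FASTAT": "", "FADTC": "2005-10-01"},
    {"FAOBJ": "Respiratory Infection", "FAORRES": "N", "FASTAT": "", "FADTC": "2005-10-01"},
    {"FAOBJ": "Nausea", "FAORRES": "", "FASTAT": "NOT DONE", "FADTC": "2005-10-01"},
    {"FAOBJ": "Headache", "FAORRES": "Y", "FASTAT": "", "FADTC": "2005-10-10"},
    {"FAOBJ": "Respiratory Infection", "FAORRES": "N", "FASTAT": "", "FADTC": "2005-10-10"},
    {"FAOBJ": "Nausea", "FAORRES": "Y", "FASTAT": "", "FADTC": "2005-10-10"},
]

# AE records from the example above (subset of variables).
ae_rows = [
    {"AETERM": "Headache", "AEPRESP": "Y", "AESTDTC": "2005-09-30", "AEENDTC": ""},
    {"AETERM": "Nausea", "AEPRESP": "Y", "AESTDTC": "2005-10-08", "AEENDTC": "2005-10-09"},
]


def yes_terms_without_ae(fa, ae):
    """List pre-specified terms answered "Y" in FA that have no pre-specified AE record.

    Assumption: terms are matched by case-insensitive text, since the
    example collects no explicit link between the two domains.
    """
    yes_terms = {row["FAOBJ"].upper() for row in fa if row["FAORRES"] == "Y"}
    ae_terms = {row["AETERM"].upper() for row in ae if row["AEPRESP"] == "Y"}
    return sorted(yes_terms - ae_terms)


missing = yes_terms_without_ae(fa_rows, ae_rows)  # empty list for this example
```

Note that the two "Y" responses for Headache collapse into one term, which mirrors the single Headache AE record that encompasses both responses.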

Example

In this example, the following CRF is used to capture data about pre-specified symptoms of the disease under study on a daily basis. The date of the assessment is captured, but start and end timing of the events are not.

INVESTIGATOR GERD SYMPTOM MEASUREMENT

| SYMPTOMS | VOLUME [mL] | NUMBER OF EPISODES | MAXIMUM SEVERITY [None, Mild, Moderate, Severe] |
|---|---|---|---|
| Vomiting | | | |
| Diarrhea | | | |
| Nausea | | | |

The collected data meet the following Findings About criteria: 1] data that do not describe an Event or Intervention as a whole, and 2] data ["about" an Event or Intervention] having Qualifiers that can be represented in Findings variables [e.g., units, method].

The data below represent data from two visits for one subject. Records occur in blocks of three for Vomit, and in blocks of two for Diarrhea and Nausea.

Rows 1-3: Show the results for the Vomiting tests at Visit 1.
Rows 4-5: Show the results for the Diarrhea tests at Visit 1.
Rows 6-7: Show the results for the Nausea tests at Visit 1.
Rows 8-10: Show the results for the Vomiting tests at Visit 2. These indicate that Vomiting was absent at Visit 2.
Rows 11-12: Show the results for the Diarrhea tests at Visit 2.
Rows 13-14: Indicate that Nausea was not assessed at Visit 2.

fa.xpt

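| Row | STUDYID | DOMAIN | USUBJID | FASEQ | FATESTCD | FATEST | FAOBJ | FACAT | FAORRES | FAORRESU | FASTRESC | FASTRESU | FASTAT | VISITNUM | VISIT | FADTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | XYZ | FA | XYZ-701-002 | 1 | VOL | Volume | Vomit | GERD | 250 | mL | 250 | mL | | 1 | VISIT 1 | 2006-02-02 |
| 2 | XYZ | FA | XYZ-701-002 | 2 | NUMEPISD | Number of Episodes | Vomit | GERD | >10 | | >10 | | | 1 | VISIT 1 | 2006-02-02 |
| 3 | XYZ | FA | XYZ-701-002 | 3 | SEV | Severity/Intensity | Vomit | GERD | SEVERE | | SEVERE | | | 1 | VISIT 1 | 2006-02-02 |
| 4 | XYZ | FA | XYZ-701-002 | 4 | NUMEPISD | Number of Episodes | Diarrhea | GERD | 2 | | 2 | | | 1 | VISIT 1 | 2006-02-02 |
| 5 | XYZ | FA | XYZ-701-002 | 5 | SEV | Severity/Intensity | Diarrhea | GERD | SEVERE | | SEVERE | | | 1 | VISIT 1 | 2006-02-02 |
| 6 | XYZ | FA | XYZ-701-002 | 6 | NUMEPISD | Number of Episodes | Nausea | GERD | 1 | | 1 | | | 1 | VISIT 1 | 2006-02-02 |
| 7 | XYZ | FA | XYZ-701-002 | 7 | SEV | Severity/Intensity | Nausea | GERD | MODERATE | | MODERATE | | | 1 | VISIT 1 | 2006-02-02 |
| 8 | XYZ | FA | XYZ-701-002 | 8 | VOL | Volume | Vomit | GERD | 0 | mL | 0 | mL | | 2 | VISIT 2 | 2006-02-03 |
| 9 | XYZ | FA | XYZ-701-002 | 9 | NUMEPISD | Number of Episodes | Vomit | GERD | 0 | | 0 | | | 2 | VISIT 2 | 2006-02-03 |
| 10 | XYZ | FA | XYZ-701-002 | 10 | SEV | Severity/Intensity | Vomit | GERD | NONE | | NONE | | | 2 | VISIT 2 | 2006-02-03 |
| 11 | XYZ | FA | XYZ-701-002 | 11 | NUMEPISD | Number of Episodes | Diarrhea | GERD | 1 | | 1 | | | 2 | VISIT 2 | 2006-02-03 |
| 12 | XYZ | FA | XYZ-701-002 | 12 | SEV | Severity/Intensity | Diarrhea | GERD | SEVERE | | SEVERE | | | 2 | VISIT 2 | 2006-02-03 |
| 13 | XYZ | FA | XYZ-701-002 | 13 | NUMEPISD | Number of Episodes | Nausea | GERD | | | | | NOT DONE | 2 | VISIT 2 | 2006-02-03 |
| 14 | XYZ | FA | XYZ-701-002 | 14 | SEV | Severity/Intensity | Nausea | GERD | | | | | NOT DONE | 2 | VISIT 2 | 2006-02-03 |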

Example

This example is similar to the one above, except that the following CRF includes a separate column to collect the occurrence of symptoms, and measurements are collected only for symptoms that occurred. There is a record for the occurrence test for each symptom. If Vomiting occurs, there are three additional records; for each occurrence of Diarrhea or Nausea, there are two additional records. Whether there are adverse event records related to these symptoms depends on agreements in place for the study about whether these symptoms are considered reportable adverse events.

INVESTIGATOR GERD SYMPTOM MEASUREMENT [IF SYMPTOM OCCURRED]

| SYMPTOMS | OCCURRED? Yes/No | VOLUME [mL] | NUMBER OF EPISODES | MAXIMUM SEVERITY [Mild, Moderate, Severe] |
|---|---|---|---|---|
| Vomiting | | | | |
| Diarrhea | | | | |
| Nausea | | | | |

The collected data meet the following Findings About criteria: 1] data that do not describe an Event or Intervention as a whole; 2] data ["about" an Event or Intervention] having Qualifiers that can be represented in Findings variables [e.g., units, method]; and 3] data ["about" an Event or Intervention] that indicate the occurrence of related symptoms or therapies.

The data below represent two visits for one subject.

Rows 1-4: Show the results for the Vomiting tests at Visit 1.
Rows 5-7: Show the results for the Diarrhea tests at Visit 1.
Rows 8-10: Show the results for the Nausea tests at Visit 1.
Row 11: Shows that Vomiting was absent at Visit 2.
Rows 12-14: Show the results for the Diarrhea tests at Visit 2.
Row 15: Shows that Nausea was not assessed at Visit 2.

fa.xpt

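| Row | STUDYID | DOMAIN | USUBJID | FASEQ | FATESTCD | FATEST | FAOBJ | FACAT | FAORRES | FAORRESU | FASTRESC | FASTRESU | FASTAT | VISITNUM | EPOCH | FADTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | XYZ | FA | XYZ-701-002 | 1 | OCCUR | Occurrence | Vomit | GERD | Y | | Y | | | 1 | SCREENING | 2006-02-02 |
| 2 | XYZ | FA | XYZ-701-002 | 2 | VOL | Volume | Vomit | GERD | 250 | mL | 250 | mL | | 1 | SCREENING | 2006-02-02 |
| 3 | XYZ | FA | XYZ-701-002 | 3 | NUMEPISD | Number of Episodes | Vomit | GERD | >10 | | >10 | | | 1 | SCREENING | 2006-02-02 |
| 4 | XYZ | FA | XYZ-701-002 | 4 | SEV | Severity/Intensity | Vomit | GERD | SEVERE | | SEVERE | | | 1 | SCREENING | 2006-02-02 |
| 5 | XYZ | FA | XYZ-701-002 | 5 | OCCUR | Occurrence | Diarrhea | GERD | Y | | Y | | | 1 | SCREENING | 2006-02-02 |
| 6 | XYZ | FA | XYZ-701-002 | 6 | NUMEPISD | Number of Episodes | Diarrhea | GERD | 2 | | 2 | | | 1 | SCREENING | 2006-02-02 |
| 7 | XYZ | FA | XYZ-701-002 | 7 | SEV | Severity/Intensity | Diarrhea | GERD | SEVERE | | SEVERE | | | 1 | SCREENING | 2006-02-02 |
| 8 | XYZ | FA | XYZ-701-002 | 8 | OCCUR | Occurrence | Nausea | GERD | Y | | Y | | | 1 | SCREENING | 2006-02-02 |
| 9 | XYZ | FA | XYZ-701-002 | 9 | NUMEPISD | Number of Episodes | Nausea | GERD | 1 | | 1 | | | 1 | SCREENING | 2006-02-02 |
| 10 | XYZ | FA | XYZ-701-002 | 10 | SEV | Severity/Intensity | Nausea | GERD | MODERATE | | MODERATE | | | 1 | SCREENING | 2006-02-02 |
| 11 | XYZ | FA | XYZ-701-002 | 11 | OCCUR | Occurrence | Vomit | GERD | N | | N | | | 2 | TREATMENT | 2006-02-03 |
| 12 | XYZ | FA | XYZ-701-002 | 12 | OCCUR | Occurrence | Diarrhea | GERD | Y | | Y | | | 2 | TREATMENT | 2006-02-03 |
| 13 | XYZ | FA | XYZ-701-002 | 13 | NUMEPISD | Number of Episodes | Diarrhea | GERD | 1 | | 1 | | | 2 | TREATMENT | 2006-02-03 |
| 14 | XYZ | FA | XYZ-701-002 | 14 | SEV | Severity/Intensity | Diarrhea | GERD | SEVERE | | SEVERE | | | 2 | TREATMENT | 2006-02-03 |
| 15 | XYZ | FA | XYZ-701-002 | 15 | OCCUR | Occurrence | Nausea | GERD | | | | | NOT DONE | 2 | TREATMENT | 2006-02-03 |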

Example

The adverse event module collects, instead of a single assessment of severity, assessments of severity at each visit, as follows:

At each visit, record severity of the Adverse Event.

| Visit | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Severity | | | | | | |

The collected data meet the following Findings About criteria: data that do not describe an Event or Intervention as a whole.

Row 1: Shows the record for a verbatim term of "Morning queasiness", for which the maximum severity over the course of the event was "Moderate".
Row 2: Shows the record for a verbatim term of "Watery stools", for which "Mild" severity was collected at Visits 2 and 3 before the event ended.

ae.xpt

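| Row | DOMAIN | USUBJID | AESEQ | AETERM | … | AEDECOD | … | AESEV | … | AESTDTC | AEENDTC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AE | 123 | 1 | Morning queasiness | … | Nausea | … | MODERATE | … | 2006-02-01 | 2006-02-23 |
| 2 | AE | 123 | 2 | Watery stools | … | Diarrhea | … | MILD | … | 2006-02-01 | 2006-02-15 |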

Rows 1-4: Show severity data collected at the four visits that occurred between the start and end of the AE, "Morning queasiness". FAOBJ = "NAUSEA", which is the value of AEDECOD in the associated AE record.
Rows 5-6: Show severity data collected at the two visits that occurred between the start and end of the AE, "Watery stools". FAOBJ = "DIARRHEA", which is the value of AEDECOD in the associated AE record.

fa.xpt

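| Row | STUDYID | DOMAIN | USUBJID | FASEQ | FATESTCD | FATEST | FAOBJ | FAORRES | VISITNUM | VISIT | FADTC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | XYZ | FA | XYZ-US-701-002 | 1 | SEV | Severity/Intensity | Nausea | MILD | 2 | VISIT 2 | 2006-02-02 |
| 2 | XYZ | FA | XYZ-US-701-002 | 2 | SEV | Severity/Intensity | Nausea | MODERATE | 3 | VISIT 3 | 2006-02-09 |
| 3 | XYZ | FA | XYZ-US-701-002 | 3 | SEV | Severity/Intensity | Nausea | MODERATE | 4 | VISIT 4 | 2006-02-16 |
| 4 | XYZ | FA | XYZ-US-701-002 | 4 | SEV | Severity/Intensity | Nausea | MILD | 5 | VISIT 5 | 2006-02-23 |
| 5 | XYZ | FA | XYZ-US-701-002 | 5 | SEV | Severity/Intensity | Diarrhea | MILD | 2 | VISIT 2 | 2006-02-02 |
| 6 | XYZ | FA | XYZ-US-701-002 | 6 | SEV | Severity/Intensity | Diarrhea | MILD | 3 | VISIT 3 | 2006-02-09 |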

RELREC dataset

Depending on how the relationships were collected, in this example, RELREC could be created with either two or six RELIDs. With two RELIDs, the sponsor is describing that the severity ratings are related to the AE as well as being related to each other. With six RELIDs, the sponsor is describing that the severity ratings are related to the AE only [and not to each other].

Example with two RELIDs:

relrec.xpt

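| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID |
|---|---|---|---|---|---|---|---|
| 1 | ABC | AE | XYZ-US-701-002 | AESEQ | 1 | | 1 |
| 2 | ABC | FA | XYZ-US-701-002 | FASEQ | 1 | | 1 |
| 3 | ABC | FA | XYZ-US-701-002 | FASEQ | 2 | | 1 |
| 4 | ABC | FA | XYZ-US-701-002 | FASEQ | 3 | | 1 |
| 5 | ABC | FA | XYZ-US-701-002 | FASEQ | 4 | | 1 |
| 6 | ABC | AE | XYZ-US-701-002 | AESEQ | 2 | | 2 |
| 7 | ABC | FA | XYZ-US-701-002 | FASEQ | 5 | | 2 |
| 8 | ABC | FA | XYZ-US-701-002 | FASEQ | 6 | | 2 |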

Example with six RELIDs:

relrec.xpt

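| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID |
|---|---|---|---|---|---|---|---|
| 1 | ABC | AE | XYZ-US-701-002 | AESEQ | 1 | | 1 |
| 2 | ABC | FA | XYZ-US-701-002 | FASEQ | 1 | | 1 |
| 3 | ABC | AE | XYZ-US-701-002 | AESEQ | 1 | | 2 |
| 4 | ABC | FA | XYZ-US-701-002 | FASEQ | 2 | | 2 |
| 5 | ABC | AE | XYZ-US-701-002 | AESEQ | 1 | | 3 |
| 6 | ABC | FA | XYZ-US-701-002 | FASEQ | 3 | | 3 |
| 7 | ABC | AE | XYZ-US-701-002 | AESEQ | 1 | | 4 |
| 8 | ABC | FA | XYZ-US-701-002 | FASEQ | 4 | | 4 |
| 9 | ABC | AE | XYZ-US-701-002 | AESEQ | 2 | | 5 |
| 10 | ABC | FA | XYZ-US-701-002 | FASEQ | 5 | | 5 |
| 11 | ABC | AE | XYZ-US-701-002 | AESEQ | 2 | | 6 |
| 12 | ABC | FA | XYZ-US-701-002 | FASEQ | 6 | | 6 |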
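
The difference between the two-RELID and six-RELID representations can be sketched in a few lines. This is an illustration under stated assumptions, not a CDISC utility: the function names `relrec_row` and `build_relrec` and the `relate_findings_to_each_other` flag are hypothetical, and the AESEQ-to-FASEQ mapping is taken from the example datasets above.

```python
def relrec_row(studyid, rdomain, usubjid, idvar, idvarval, relid):
    """Build one RELREC record; RELTYPE stays blank for record-level links."""
    return {
        "STUDYID": studyid,
        "RDOMAIN": rdomain,
        "USUBJID": usubjid,
        "IDVAR": idvar,
        "IDVARVAL": str(idvarval),
        "RELTYPE": "",
        "RELID": str(relid),
    }


def build_relrec(studyid, usubjid, ae_to_fa, relate_findings_to_each_other):
    """Create RELREC rows linking each AE to its FA severity records.

    If relate_findings_to_each_other is True, all findings for an AE share
    one RELID with that AE. Otherwise each finding gets its own RELID,
    paired only with the AE.
    """
    rows = []
    relid = 0
    for aeseq, faseqs in ae_to_fa.items():
        if relate_findings_to_each_other:
            groups = [list(faseqs)]
        else:
            groups = [[faseq] for faseq in faseqs]
        for group in groups:
            relid += 1
            rows.append(relrec_row(studyid, "AE", usubjid, "AESEQ", aeseq, relid))
            for faseq in group:
                rows.append(relrec_row(studyid, "FA", usubjid, "FASEQ", faseq, relid))
    return rows


# AESEQ 1 relates to FASEQ 1-4 and AESEQ 2 to FASEQ 5-6, as in the example.
AE_TO_FA = {1: [1, 2, 3, 4], 2: [5, 6]}
two_relids = build_relrec("ABC", "XYZ-US-701-002", AE_TO_FA, True)
six_relids = build_relrec("ABC", "XYZ-US-701-002", AE_TO_FA, False)
```

With the mapping above, `two_relids` has 8 rows and `six_relids` has 12 rows, matching the row counts of the two relrec.xpt examples.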

6.4.5 Skin Response

SR – Description/Overview

A findings about domain for submitting dermal responses to antigens.

SR – Specification

sr.xpt, Skin Response — Findings, Version 3.3. One record per finding, per object, per time point, per visit per subject, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | SR | Identifier | Two-character abbreviation for the domain. | Req |
| USUBJID | Unique Subject Identifier | Char | | Identifier | Identifier used to uniquely identify a subject across submission. | Req |
| SRSEQ | Sequence Number | Num | | Identifier | Sequence number given to ensure uniqueness of subject records within a domain. May be any valid number. | Req |
| SRGRPID | Group ID | Char | | Identifier | Used to tie together a block of related records in a single domain for a subject. | Perm |
| SRREFID | Reference ID | Char | | Identifier | Internal or external specimen identifier. Example: "Specimen ID". | Perm |
| SRSPID | Sponsor-Defined Identifier | Char | | Identifier | Sponsor-defined identifier. | Perm |
| SRTESTCD | Skin Response Test or Exam Short Name | Char | [SRTESTCD] | Topic | Short name of the measurement, test, or examination described in SRTEST. It can be used as a column name when converting a dataset from a vertical to a horizontal format. The value in SRTESTCD cannot be longer than 8 characters, nor can it start with a number [e.g., "1TEST" is not valid]. SRTESTCD cannot contain characters other than letters, numbers, or underscores. | Req |
| SRTEST | Skin Response Test or Examination Name | Char | [SRTEST] | Synonym Qualifier | Verbatim name of the test or examination used to obtain the measurement or finding. The value in SRTEST cannot be longer than 40 characters. Example: "Wheal Diameter". | Req |
| SROBJ | Object of the Observation | Char | | Record Qualifier | Used to describe the object or focal point of the findings observation that is represented by --TEST. Examples: the dose of the immunogenic material or the allergen associated with the response [e.g., "Johnson Grass IgE 0.15 BAU mL"]. | Req |
| SRCAT | Category for Test | Char | | Grouping Qualifier | Used to define a category of Topic-variable values across subjects. | Perm |
| SRSCAT | Subcategory for Test | Char | | Grouping Qualifier | A further categorization of SRCAT values. | Perm |
| SRORRES | Results or Findings in Original Units | Char | | Result Qualifier | Results of measurement or finding as originally received or collected. | Exp |
| SRORRESU | Original Units | Char | [UNIT] | Variable Qualifier | Original units in which the data were collected. The unit for SRORRES. Example: "mm". | Exp |
| SRSTRESC | Character Result/Finding in Std Format | Char | | Result Qualifier | Contains the result value for all findings, copied or derived from SRORRES in a standard format or in standard units. SRSTRESC should store all results or findings in character format; if results are numeric, they should also be stored in numeric format in SRSTRESN. | Exp |
| SRSTRESN | Numeric Results/Findings in Std. Units | Num | | Result Qualifier | Used for continuous or numeric results or findings in standard format; copied in numeric format from SRSTRESC. SRSTRESN should store all numeric test results or findings. | Exp |
| SRSTRESU | Standard Units | Char | [UNIT] | Variable Qualifier | Standardized units used for SRSTRESC and SRSTRESN. Example: "mm". | Exp |
| SRSTAT | Completion Status | Char | [ND] | Record Qualifier | Used to indicate exam not done. Should be null if a result exists in SRORRES. | Perm |
| SRREASND | Reason Not Done | Char | | Record Qualifier | Describes why a measurement or test was not performed. Used in conjunction with SRSTAT when value is "NOT DONE". | Perm |
| SRNAM | Vendor Name | Char | | Record Qualifier | Name or identifier of the laboratory or vendor who provided the test results. | Perm |
| SRSPEC | Specimen Type | Char | [SPECTYPE] | Record Qualifier | Defines the types of specimen used for a measurement. Example: "SKIN". | Perm |
| SRLOC | Location Used for Measurement | Char | [LOC] | Record Qualifier | Location relevant to the collection of the measurement. | Perm |
| SRLAT | Laterality | Char | [LAT] | Variable Qualifier | Qualifier for anatomical location further detailing laterality of intervention administration. Examples: "RIGHT", "LEFT", "BILATERAL". | Perm |
| SRMETHOD | Method of Test or Examination | Char | [METHOD] | Record Qualifier | Method of test or examination. Examples: "ELISA", "EIA", "MICRONEUTRALIZATION ASSAY", "PRNT" [Plaque Reduction Neutralization Tests]. | Perm |
| SRLOBXFL | Last Observation Before Exposure Flag | Char | [NY] | Record Qualifier | Operationally-derived indicator used to identify the last non-missing value prior to RFXSTDTC. The value should be "Y" or null. | Perm |
| SRBLFL | Baseline Flag | Char | [NY] | Record Qualifier | Indicator used to identify a baseline value. The value should be "Y" or null. Note that SRBLFL is retained for backward compatibility. The authoritative baseline flag for statistical analysis is in an ADaM dataset. | Perm |
| SREVAL | Evaluator | Char | [EVAL] | Record Qualifier | Role of person who provided evaluation. Used only for results that are subjective [e.g., assigned by a person or a group]. Should be null for records that contain collected or derived data. Examples: "INVESTIGATOR", "ADJUDICATION COMMITTEE", "VENDOR". | Perm |
| VISITNUM | Visit Number | Num | | Timing | 1. Clinical encounter number. 2. Numeric version of VISIT, used for sorting. | Exp |
| VISIT | Visit Name | Char | | Timing | 1. Protocol-defined description of clinical encounter. 2. May be used in addition to VISITNUM and/or VISITDY. | Perm |
| VISITDY | Planned Study Day of Visit | Num | | Timing | Planned study day of the visit based upon RFSTDTC in Demographics. | Perm |
| TAETORD | Planned Order of Element within Arm | Num | | Timing | Number that gives the planned order of the Element within the Arm. | Perm |
| EPOCH | Epoch | Char | [EPOCH] | Timing | Epoch associated with the date/time of the observation. Examples: "SCREENING", "TREATMENT", and "FOLLOW-UP". | Perm |
| SRDTC | Date/Time of Collection | Char | ISO 8601 | Timing | Collection date and time of an observation represented in ISO 8601. | Exp |
| SRDY | Study Day of Visit/Collection/Exam | Num | | Timing | Actual study day of visit/collection/exam expressed in integer days relative to sponsor-defined RFSTDTC in Demographics. | Perm |
| SRTPT | Planned Time Point Name | Char | | Timing | 1. Text description of time when measurement should be taken. 2. This may be represented as an elapsed time relative to a fixed reference point, such as time of last dose. See SRTPTNUM and SRTPTREF. Examples: "Start", "5 min post". | Perm |
| SRTPTNUM | Planned Time Point Number | Num | | Timing | Numerical version of SRTPT to aid in sorting. | Perm |
| SRELTM | Planned Elapsed Time from Time Point Ref | Char | ISO 8601 | Timing | Planned elapsed time [in ISO 8601] relative to a fixed time point reference [SRTPTREF]. Not a clock time or a date time variable. Represented as an ISO 8601 duration. Examples: "-PT15M" to represent the period of 15 minutes prior to the reference point indicated by SRTPTREF, or "PT8H" to represent the period of 8 hours after the reference point indicated by SRTPTREF. | Perm |
| SRTPTREF | Time Point Reference | Char | | Timing | Name of the fixed reference point referred to by SRELTM, SRTPTNUM, and SRTPT. Example: "INTRADERMAL INJECTION". | Perm |
| SRRFTDTC | Date/Time of Reference Time Point | Char | ISO 8601 | Timing | Date/time of the reference time point, SRTPTREF. | Perm |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parentheses].
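
The format rules stated in the CDISC Notes for SRTESTCD and SRTEST lend themselves to a simple check. The following is a minimal sketch with hypothetical function names (`is_valid_testcd`, `is_valid_test`); it covers only the length and character rules quoted in the specification above and does not check membership in the SRTESTCD or SRTEST codelists.

```python
import re

# At most 8 characters, letters, digits, or underscores only,
# and the first character must not be a digit.
_TESTCD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_testcd(value):
    """Check the SRTESTCD format rules stated in the specification."""
    return _TESTCD_PATTERN.fullmatch(value) is not None


def is_valid_test(value):
    """Check that an SRTEST value is non-empty and at most 40 characters."""
    return 0 < len(value) <= 40


checks = {
    "FLRMDIAM": is_valid_testcd("FLRMDIAM"),    # 8 characters, all letters
    "1TEST": is_valid_testcd("1TEST"),          # starts with a number
    "INDURDIAM": is_valid_testcd("INDURDIAM"),  # 9 characters, too long
    "RCT-GRD": is_valid_testcd("RCT-GRD"),      # hyphen is not allowed
}
```

Only "FLRMDIAM" passes; the other three values each break one of the stated rules.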

SR – Assumptions

  1. The Skin Response [SR] domain is used to represent findings about an intervention, but it has its own domain code, SR, rather than the domain code FA.
  2. This domain is intended for tests of the immune response to substances that are intended to provoke such a response, such as allergens used in allergy testing. It is not intended for other injection site reactions, including reactogenicity events that may follow a vaccine administration.
  3. Because a subject is typically exposed to many test materials at the same time, SROBJ is needed to represent the test material for each response record. The method of assessment could be a skin-prick test, a skin-scratch test, or other method of introducing the challenge substance into the skin.
  4. Any Identifier variables, Timing variables, or Findings general observation class qualifiers may be added to the SR domain, but the following qualifiers would not generally be used in SR: --POS, --BODSYS, --ORNRLO, --ORNRHI, --STNRLO, --STNRHI, --STRNC, --NRIND, --RESCAT, --XFN, --LOINC, --SPCCND, --FAST, --TOX, --TOXGR, --SEV.

SR – Examples

Example

In this example, the subject is dosed with increasing concentrations of Johnson Grass IgE.

Rows 1-4: Show responses associated with the administration of a Histamine Control.
Rows 5-8: Show responses associated with the administration of Johnson Grass IgE. These records describe the dose response to different concentrations of Johnson Grass IgE antigen, as reflected in SROBJ.

All rows show a specific location on the BACK [e.g., QUADRANT1]. Since QUADRANT1, QUADRANT2, etc., are not currently part of the SDTM terminology, the sponsor has decided to include this information in the SRSUBLOC SUPPQUAL variable.

sr.xpt

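| Row | STUDYID | DOMAIN | USUBJID | SRSEQ | SRTESTCD | SRTEST | SROBJ | SRORRES | SRORRESU | SRSTRESC | SRSTRESN | SRSTRESU | SRLOC | VISITNUM | VISIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SPI-001 | SR | SPI-001-11035 | 1 | FLRMDIAM | Flare Mean Diameter | Histamine Control 10 mg/mL | 5 | mm | 5 | 5 | mm | BACK | 1 | VISIT 1 |
| 2 | SPI-001 | SR | SPI-001-11035 | 2 | FLRMDIAM | Flare Mean Diameter | Histamine Control 10 mg/mL | 4 | mm | 4 | 4 | mm | BACK | 1 | VISIT 1 |
| 3 | SPI-001 | SR | SPI-001-11035 | 3 | FLRMDIAM | Flare Mean Diameter | Histamine Control 10 mg/mL | 5 | mm | 5 | 5 | mm | BACK | 1 | VISIT 1 |
| 4 | SPI-001 | SR | SPI-001-11035 | 4 | FLRMDIAM | Flare Mean Diameter | Histamine Control 10 mg/mL | 5 | mm | 5 | 5 | mm | BACK | 1 | VISIT 1 |
| 5 | SPI-001 | SR | SPI-001-11035 | 5 | FLRMDIAM | Flare Mean Diameter | Johnson Grass 0.05 BAU/mL | 10 | mm | 10 | 10 | mm | BACK | 1 | VISIT 1 |
| 6 | SPI-001 | SR | SPI-001-11035 | 6 | FLRMDIAM | Flare Mean Diameter | Johnson Grass 0.10 BAU/mL | 11 | mm | 11 | 11 | mm | BACK | 1 | VISIT 1 |
| 7 | SPI-001 | SR | SPI-001-11035 | 7 | FLRMDIAM | Flare Mean Diameter | Johnson Grass 0.15 BAU mL | 20 | mm | 20 | 20 | mm | BACK | 1 | VISIT 1 |
| 8 | SPI-001 | SR | SPI-001-11035 | 8 | FLRMDIAM | Flare Mean Diameter | Johnson Grass 0.20 BAU/mL | 30 | mm | 30 | 30 | mm | BACK | 1 | VISIT 1 |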

suppsr.xpt

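| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG |
|---|---|---|---|---|---|---|---|---|---|
| 1 | SPI-001 | SR | SPI-001-11035 | SRSEQ | 1 | SRSUBLOC | Anatomical Sub-Location | QUADRANT1 | CRF |
| 2 | SPI-001 | SR | SPI-001-11035 | SRSEQ | 2 | SRSUBLOC | Anatomical Sub-Location | QUADRANT2 | CRF |
| 3 | SPI-001 | SR | SPI-001-11035 | SRSEQ | 3 | SRSUBLOC | Anatomical Sub-Location | QUADRANT3 | CRF |
| 4 | SPI-001 | SR | SPI-001-11035 | SRSEQ | 4 | SRSUBLOC | Anatomical Sub-Location | QUADRANT4 | CRF |
| 5 | SPI-001 | SR | SPI-001-11035 | SRSEQ | 5 | SRSUBLOC | Anatomical Sub-Location | QUADRANT1 | CRF |
| 6 | SPI-001 | SR | SPI-001-11035 | SRSEQ | 6 | SRSUBLOC | Anatomical Sub-Location | QUADRANT2 | CRF |
| 7 | SPI-001 | SR | SPI-001-11035 | SRSEQ | 7 | SRSUBLOC | Anatomical Sub-Location | QUADRANT3 | CRF |
| 8 | SPI-001 | SR | SPI-001-11035 | SRSEQ | 8 | SRSUBLOC | Anatomical Sub-Location | QUADRANT4 | CRF |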

Example

In this example, the study product, Dog Epi IgG, was administered at increasing concentrations. The size of the wheal [the reaction to Dog Epi IgG] is measured to evaluate the efficacy of the Dog Epi IgG extract versus a Negative Control [NC] and a Positive Control [PC] in the testing of allergenic extracts. While SROBJ is populated with information about the substance administered, full details regarding the study product would be submitted in the EX dataset. The relationship between the SR records and the EX records would be represented using RELREC.

Rows 1-6: Show the response [description and reaction grade] to the study product at a series of different dose levels, the latter reflected in SROBJ. The descriptions of SRORRES values are correlated to a grade and the grade values are stored in SRSTRESC.
Rows 7-12: Show the results of wheal diameter measurements in response to the study product at a series of different dose levels.

sr.xpt

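| Row | STUDYID | DOMAIN | USUBJID | SRSEQ | SRSPID | SRTESTCD | SRTEST | SROBJ | SRORRES | SRORRESU | SRSTRESC | SRSTRESN | SRSTRESU | SRLOC | VISITNUM | VISIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CC-001 | SR | CC-001-101 | 1 | 1 | RCTGRDE | Reaction Grade | Dog Epi 0 mg | NEGATIVE | | NEGATIVE | | | FOREARM | 1 | WEEK 1 |
| 2 | CC-001 | SR | CC-001-101 | 2 | 2 | RCTGRDE | Reaction Grade | Dog Epi 0.1 mg | NEGATIVE | | NEGATIVE | | | FOREARM | 1 | WEEK 1 |
| 3 | CC-001 | SR | CC-001-101 | 3 | 3 | RCTGRDE | Reaction Grade | Dog Epi 0.5 mg | ERYTHEMA, INFILTRATION, POSSIBLY DISCRETE PAPULES | | 1+ | | | FOREARM | 1 | WEEK 1 |
| 4 | CC-001 | SR | CC-001-101 | 4 | 4 | RCTGRDE | Reaction Grade | Dog Epi 1 mg | ERYTHEMA, INFILTRATION, PAPULES, VESICLES | | 2+ | | | FOREARM | 1 | WEEK 1 |
| 5 | CC-001 | SR | CC-001-101 | 5 | 5 | RCTGRDE | Reaction Grade | Dog Epi 1.5 mg | ERYTHEMA, INFILTRATION, PAPULES, VESICLES | | 2+ | | | FOREARM | 1 | WEEK 1 |
| 6 | CC-001 | SR | CC-001-101 | 6 | 6 | RCTGRDE | Reaction Grade | Dog Epi 2 mg | ERYTHEMA, INFILTRATION, PAPULES, COALESCING VESICLES | | 3+ | | | FOREARM | 1 | WEEK 1 |
| 7 | CC-001 | SR | CC-001-101 | 7 | 7 | FLRMDIAM | Flare Mean Diameter | Dog Epi 0 mg | 5 | mm | 5 | 5 | mm | FOREARM | 1 | WEEK 1 |
| 8 | CC-001 | SR | CC-001-101 | 8 | 8 | FLRMDIAM | Flare Mean Diameter | Dog Epi 0.1 mg | 10 | mm | 10 | 10 | mm | FOREARM | 1 | WEEK 1 |
| 9 | CC-001 | SR | CC-001-101 | 9 | 9 | FLRMDIAM | Flare Mean Diameter | Dog Epi 0.5 mg | 22 | mm | 22 | 22 | mm | FOREARM | 1 | WEEK 1 |
| 10 | CC-001 | SR | CC-001-101 | 10 | 10 | FLRMDIAM | Flare Mean Diameter | Dog Epi 1 mg | 100 | mm | 100 | 100 | mm | FOREARM | 1 | WEEK 1 |
| 11 | CC-001 | SR | CC-001-101 | 11 | 11 | FLRMDIAM | Flare Mean Diameter | Dog Epi 1.5 mg | 1 | mm | 1 | 1 | mm | FOREARM | 1 | WEEK 1 |
| 12 | CC-001 | SR | CC-001-101 | 12 | 12 | FLRMDIAM | Flare Mean Diameter | Dog Epi 2 mg | 8 | mm | 8 | 8 | mm | FOREARM | 1 | WEEK 1 |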
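
The derivation of SRSTRESC and SRSTRESN from SRORRES in this dataset can be sketched as a lookup followed by a numeric conversion. This is an illustrative sketch only: the function name `standardize` is hypothetical, and the description-to-grade mapping contains just the pairs visible in the example rows, not a complete or official grading scale.

```python
# Description-to-grade pairs taken from rows 1-6 of the example dataset.
# Assumption: this mapping is illustrative and not an official scale.
GRADE_BY_DESCRIPTION = {
    "NEGATIVE": "NEGATIVE",
    "ERYTHEMA, INFILTRATION, POSSIBLY DISCRETE PAPULES": "1+",
    "ERYTHEMA, INFILTRATION, PAPULES, VESICLES": "2+",
    "ERYTHEMA, INFILTRATION, PAPULES, COALESCING VESICLES": "3+",
}


def standardize(srorres):
    """Return (SRSTRESC, SRSTRESN) for a collected SRORRES value.

    SRSTRESC is the mapped grade when the description is known, otherwise
    the original value. SRSTRESN is populated only for numeric results.
    """
    srstresc = GRADE_BY_DESCRIPTION.get(srorres, srorres)
    try:
        srstresn = float(srstresc)
    except ValueError:
        srstresn = None  # grades such as "2+" are character results only
    return srstresc, srstresn


grade_result = standardize("ERYTHEMA, INFILTRATION, PAPULES, VESICLES")
diameter_result = standardize("22")
```

For the reaction-grade row the numeric result stays empty, while the diameter row carries the same value in both character and numeric form.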

ex.xpt

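| Row | STUDYID | DOMAIN | USUBJID | EXSPID | EXTRT | EXDOSE | EXDOSEU | EXROUTE | EXLOC |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CC-001 | EX | 101 | 1 | Dog Epi IgG | 0 | mg | CUTANEOUS | FOREARM |
| 2 | CC-001 | EX | 101 | 2 | Dog Epi IgG | 0.1 | mg | CUTANEOUS | FOREARM |
| 3 | CC-001 | EX | 101 | 3 | Dog Epi IgG | 0.5 | mg | CUTANEOUS | FOREARM |
| 4 | CC-001 | EX | 101 | 4 | Dog Epi IgG | 1 | mg | CUTANEOUS | FOREARM |
| 5 | CC-001 | EX | 101 | 5 | Dog Epi IgG | 1.5 | mg | CUTANEOUS | FOREARM |
| 6 | CC-001 | EX | 101 | 6 | Dog Epi IgG | 2 | mg | CUTANEOUS | FOREARM |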

The relationships between SR and EX records are represented at the record level in RELREC.

relrec.xpt

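| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID |
|---|---|---|---|---|---|---|---|
| 1 | CC-001 | SR | CC-001-101 | SRSPID | 1 | | R1 |
| 2 | CC-001 | SR | CC-001-101 | SRSPID | 7 | | R1 |
| 3 | CC-001 | EX | CC-001-101 | EXSPID | 1 | | R1 |
| 4 | CC-001 | SR | CC-001-101 | SRSPID | 2 | | R2 |
| 5 | CC-001 | SR | CC-001-101 | SRSPID | 8 | | R2 |
| 6 | CC-001 | EX | CC-001-101 | EXSPID | 2 | | R2 |
| 7 | CC-001 | SR | CC-001-101 | SRSPID | 3 | | R3 |
| 8 | CC-001 | SR | CC-001-101 | SRSPID | 9 | | R3 |
| 9 | CC-001 | EX | CC-001-101 | EXSPID | 3 | | R3 |
| 10 | CC-001 | SR | CC-001-101 | SRSPID | 4 | | R4 |
| 11 | CC-001 | SR | CC-001-101 | SRSPID | 10 | | R4 |
| 12 | CC-001 | EX | CC-001-101 | EXSPID | 4 | | R4 |
| 13 | CC-001 | SR | CC-001-101 | SRSPID | 5 | | R5 |
| 14 | CC-001 | SR | CC-001-101 | SRSPID | 11 | | R5 |
| 15 | CC-001 | EX | CC-001-101 | EXSPID | 5 | | R5 |
| 16 | CC-001 | SR | CC-001-101 | SRSPID | 6 | | R6 |
| 17 | CC-001 | SR | CC-001-101 | SRSPID | 12 | | R6 |
| 18 | CC-001 | EX | CC-001-101 | EXSPID | 6 | | R6 |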

Example

This example shows the results from a tuberculin PPD skin test administered using the Mantoux technique. The subject was given an intradermal injection of standard tuberculin purified protein derivative [PPD-S] in the left forearm at Visit 1 [see the Procedure Agents record below]. At Visit 2, the induration diameter and presence of blistering were recorded. Because the tuberculin PPD skin test cannot be interpreted using the induration diameter and blistering alone [e.g., risk for being infected with TB must also be considered], the interpretation of the skin test resides in its own row. The time point variables show that the planned time for reading the test was 48 hours after Mantoux administration. However, a comparison of datetime values in SRDTC and SRRFTDTC shows that in this case the test was read approximately 53 hours and 38 minutes after Mantoux administration.

Row 1: Shows the diameter in millimeters of the induration after receiving an intradermal injection of 0.1 mL containing 5TU of PPD-S in the left forearm.
Row 2: Shows the presence of blistering at the tuberculin PPD skin test site.
Row 3: Shows the interpretation of the tuberculin PPD skin test. SRGRPID is used to tie together the results to the interpretation.

sr.xpt

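| Row | STUDYID | DOMAIN | USUBJID | SRSEQ | SRGRPID | SRTESTCD | SRTEST | SROBJ | SRORRES | SRORRESU | SRSTRESC | SRSTRESN | SRSTRESU | SRLOC | SRLAT | SRMETHOD | VISITNUM | VISIT | EPOCH | SRDTC | SRTPT | SRELTM | SRTPTREF | SRRFTDTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC | SR | ABC-001 | 1 | 1 | INDURDIA | Induration Diameter | Tuberculin PPD-S | 16 | mm | 16 | 16 | mm | FOREARM | LEFT | RULER | 2 | VISIT 2 | OPEN LABEL TREATMENT | 2011-01-19T14:08:24 | 48 H | PT48H | MANTOUX ADMINISTRATION | 2011-01-17T08:30:00 |
| 2 | ABC | SR | ABC-001 | 2 | 1 | BLISTER | Blistering | Tuberculin PPD-S | Y | | Y | | | FOREARM | LEFT | | 2 | VISIT 2 | OPEN LABEL TREATMENT | 2011-01-19T14:08:24 | 48 H | PT48H | MANTOUX ADMINISTRATION | 2011-01-17T08:30:00 |
| 3 | ABC | SR | ABC-001 | 3 | 1 | INTP | Interpretation | Tuberculin PPD-S | POSITIVE | | POSITIVE | | | | | | 2 | VISIT 2 | OPEN LABEL TREATMENT | 2011-01-19T14:08:24 | 48 H | PT48H | MANTOUX ADMINISTRATION | 2011-01-17T08:30:00 |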
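
The comparison of planned and actual reading times can be reproduced directly from the SRDTC, SRRFTDTC, and SRELTM values in the dataset above. The sketch below uses only the Python standard library; the variable names are illustrative, and the planned interval is entered by hand from SRELTM = "PT48H" rather than parsed.

```python
from datetime import datetime, timedelta

# Values copied from the sr.xpt example rows.
srdtc = datetime.fromisoformat("2011-01-19T14:08:24")     # actual reading
srrftdtc = datetime.fromisoformat("2011-01-17T08:30:00")  # Mantoux administration
planned = timedelta(hours=48)                             # SRELTM = "PT48H"

actual = srdtc - srrftdtc     # elapsed time between injection and reading
deviation = actual - planned  # how far the reading was from the planned time

print(actual)     # 2 days, 5:38:24
print(deviation)  # 5:38:24
```

The elapsed time is 53 hours, 38 minutes, and 24 seconds, so the reading took place a little over five and a half hours after the planned 48-hour time point.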

The Tuberculin PPD skin test administration was represented in the AG domain.

ag.xpt

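| Row | STUDYID | DOMAIN | USUBJID | AGSEQ | AGTRT | AGDOSE | AGDOSU | AGVAMT | AGVAMTU | VISITNUM | VISIT | EPOCH | AGSTDTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC | AG | ABC-001 | 1 | Tuberculin PPD-S | 5 | TU | 0.1 | mL | 1 | VISIT 1 | OPEN LABEL TREATMENT | 2011-01-17T08:30:00 |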

Relationships between SR and AG records were shown in RELREC.

relrec.xpt

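| Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID |
|---|---|---|---|---|---|---|---|
| 1 | ABC | SR | ABC-001 | SRGRPID | 1 | | R1 |
| 2 | ABC | AG | ABC-001 | AGSEQ | 1 | | R1 |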

7 Trial Design Model Datasets

The table below describes how Trial Design datasets are grouped in this document.

Section 7 Organization [SDTMIG v3.3]

7.1 Introduction to Trial Design

An overview of the Trial Design purpose, concepts, and content.

7.2 Experimental Design

Trial Design datasets that describe the planned design of the study and provide the representation of study treatment in its most granular components:

Trial Arms [TA]

A trial design domain that contains each planned arm in the trial. An arm is described as an ordered sequence of elements.

Trial Elements [TE]

A trial design domain that contains the element code that is unique for each element, the element description, and the rules for starting and ending an element.

7.3 Schedule for Assessments

Trial Design datasets that describe the protocol-defined planned schedule of subject encounters at the healthcare facility where the study is being conducted:

Trial Visits [TV]

A trial design domain that contains the planned order and number of visits in the study within each arm.

Trial Disease Assessments [TD]

A trial design domain that provides information on the protocol-specified disease assessment schedule, to be used for comparison with the actual occurrence of the efficacy assessments in order to determine whether there was good compliance with the schedule. TD describes the planned schedule of disease assessments, when that schedule is not necessarily visit based.

Trial Disease Milestones [TM]

A trial design domain that is used to describe disease milestones, which are observations or activities anticipated to occur in the course of the disease under study, and which trigger the collection of data.

7.4 Trial Summary and Eligibility

Trial Design datasets that describe the characteristics of the trial:

Trial Inclusion/Exclusion Criteria [TI]

A trial design domain that contains one record for each of the inclusion and exclusion criteria for the trial. This domain is not subject oriented.

Trial Summary [TS]

A trial design domain that contains one record for each trial summary characteristic. This domain is not subject oriented.

7.5 How to Model the Design of a Clinical Trial

Short guidance on how to develop the Trial Design datasets for any study.

7.1 Introduction to Trial Design Model Datasets

7.1.1 Purpose of Trial Design Model

ICH E3, Guidance for Industry, Structure and Content of Clinical Study Reports [available at //www.ich.org/products/guidelines/efficacy/article/efficacy-guidelines.html], Section 9.1, calls for a brief, clear description of the overall plan and design of the study, and supplies examples of charts and diagrams for this purpose in Annex IIIa and Annex IIIb. Each Annex corresponds to an example trial, and each shows a diagram describing the study design and a table showing the schedule of assessments. The Trial Design Model provides a standardized way to describe those aspects of the planned conduct of a clinical trial shown in the study design diagrams of these examples. The standard Trial Design Datasets will allow reviewers to:

  • Clearly and quickly grasp the design of a clinical trial
  • Compare the designs of different trials
  • Search a data warehouse for clinical trials with certain features
  • Compare planned and actual treatments and visits for subjects in a clinical trial

Modeling a clinical trial in this standardized way requires the explicit statement of certain decision rules that may not be addressed or may be vague or ambiguous in the usual prose protocol document. Prospective modeling of the design of a clinical trial should lead to a clearer, better protocol. Retrospective modeling of the design of a clinical trial should ensure a clear description of how the trial protocol was interpreted by the sponsor.

7.1.2 Definitions of Trial Design Concepts

A clinical trial is a scientific experiment involving human subjects, which is intended to address certain scientific questions [the objectives of the trial]. [See the CDISC Glossary, //www.cdisc.org/standards/semantics/glossary, for more complete definitions of clinical trial and objective.]

Trial Design

The design of a clinical trial is a plan for what will be done to subjects, and what data will be collected about them, in the course of the trial, to address the trial's objectives.

Epoch

As part of the design of a trial, the planned period of subjects' participation in the trial is divided into Epochs. Each Epoch is a period of time that serves a purpose in the trial as a whole. That purpose will be at the level of the primary objectives of the trial. Typically, the purpose of an Epoch will be to expose subjects to a treatment, or to prepare for such a treatment period [e.g., determine subject eligibility, washout previous treatments] or to gather data on subjects after a treatment has ended. Note that at this high level, a "treatment" is a treatment strategy, which may be simple [e.g., exposure to a single drug at a single dose] or complex. Complex treatment strategies could involve tapering through several doses, titrating dose according to clinical criteria, complex regimens involving multiple drugs, or strategies that involve adding or dropping drugs according to clinical criteria.

Arm

An Arm is a planned path through the trial. This path covers the entire time of the trial. The group of subjects assigned to a planned path is also often colloquially called an Arm. The group of subjects assigned to an Arm is also often called a treatment group, and in this sense, an Arm is equivalent to a treatment group.

Study Cell

Since the trial as a whole is divided into Epochs, each planned path through the trial [i.e., each Arm] is divided into pieces, one for each Epoch. Each of these pieces is called a Study Cell. Thus, there is a Study Cell for each combination of Arm and Epoch. Each Study Cell represents an implementation of the purpose of its associated Epoch. For an Epoch whose purpose is to expose subjects to treatment, each Study Cell associated with the Epoch has an associated treatment strategy.

For example, a three-Arm parallel trial might have a Treatment Epoch whose purpose is to expose subjects to one of three study treatments: placebo, investigational product, or active control. There would be three Study Cells associated with the Treatment Epoch, one for each Arm. Each of these Study Cells exposes the subject to one of the three study treatments. Another example involving more complex treatment strategies: A trial compares the effects of cycles of chemotherapy drug A given alone or in combination with drug B, where drug B is given as a pre-treatment to each cycle of drug A.

Element

An Element is a basic building block in the trial design. It involves administering a planned intervention, which may be treatment or no treatment, during a period of time. Elements for which the planned intervention is "no treatment" would include Elements for screening, washout, and follow-up.

Study Cells and Elements

Many, perhaps most, clinical trials involve a single, simple administration of a planned intervention within a Study Cell, but for some trials, the treatment strategy associated with a Study Cell may involve a complex series of administrations of treatment. It may be important to track the component steps in a treatment strategy both operationally and because secondary objectives and safety analyses require that data be grouped by the treatment step during which it was collected. The steps within a treatment strategy may involve different doses of drug, different drugs, or different kinds of care, as in pre-operative, operative, and post-operative periods surrounding surgery. When the treatment strategy for a Study Cell is simple, the Study Cell will contain a single Element, and for many purposes there is little value in distinguishing between the Study Cell and the Element. However, when the treatment strategy for a Study Cell consists of a complex series of treatments, a Study Cell can contain multiple Elements. There may be a fixed sequence of Elements, or a repeating cycle of Elements, or some other complex pattern. In these cases, the distinction between a Study Cell and an Element is very useful.

Branch

In a trial with multiple Arms, the protocol plans for each subject to be assigned to one Arm. The time within the trial at which this assignment takes place is the point at which the Arm paths of the trial diverge, and so is called a branch point. For many trials, the assignment to an Arm happens all at one time, so the trial has one branch point. For other trials, there may be two or more branches that collectively assign a subject to an Arm. The process that makes this assignment may be a randomization, but it need not be.

Treatments

The word "treatment" may be used in connection with Epochs, Study Cells, or Elements, but has somewhat different meanings in each context:

  • Since Epochs cut across Arms, an "Epoch treatment" is at a high level that does not specify anything that differs between Arms. For example, in a three-period crossover study of three doses of Drug X, each treatment Epoch is associated with Drug X, but not with a specific dose.
  • A "Study Cell treatment" is specific to a particular Arm. For example, a parallel trial might have Study Cell treatments Placebo and Drug X, without any additional detail [e.g., dose, frequency, route of administration] being specified. A Study Cell treatment is at a relatively high level, the level at which treatments might be planned in an early conceptual draft of the trial, or in the title or objectives of the trial.
  • An "Element treatment" may be fairly detailed. For example, for an Element representing a cycle of chemotherapy, Element treatment might specify 5 daily 100 mg doses of Drug X.

The distinctions between these levels are not rigid, and depend on the objectives of the trial. For example, route is generally a detail of dosing, but in a bioequivalence trial that compared IV and oral administration of Drug X, route is clearly part of Study Cell treatment.

Visit

A clinical encounter. The notion of a Visit derives from trials with outpatients, where subjects interact with the investigator during Visits to the investigator's clinical site. However, the term is used in other trials, where a trial Visit may not correspond to a physical Visit. For example, in a trial with inpatients, time may be subdivided into Visits, even though subjects are in hospital throughout the trial. Similarly, data for a screening Visit may be collected over the course of more than one physical visit. One of the main purposes of Visits is the performance of assessments, but not all assessments need take place at clinic Visits; some assessments may be performed by means of telephone contacts, electronic devices or call-in systems. The protocol should specify what contacts are considered Visits and how they are defined.

7.1.3 Current and Future Contents of the Trial Design Model

The datasets currently included in the Trial Design Model:

  • Trial Arms: describes the sequences of Elements in each Epoch for each Arm, and thus describes the complete sequence of Elements in each Arm.
  • Trial Elements: describes the Elements used in the trial.
  • Trial Visits: describes the planned schedule of Visits.
  • Trial Disease Assessments: provides information on the protocol-specified disease assessment schedule, and will be used for comparison with the actual occurrence of the efficacy assessments in order to determine whether there was good compliance with the schedule.
  • Trial Disease Milestones: describes observations or activities identified for the trial which are anticipated to occur in the course of the disease under study and which trigger the collection of data.
  • Trial Inclusion/Exclusion Criteria: describes the inclusion/exclusion criteria used to screen subjects.
  • Trial Summary: lists key facts [parameters] about the trial that are likely to appear in a registry of clinical trials.

The Trial Inclusion/Exclusion Criteria [TI] dataset is discussed in Section 7.4.1, Trial Inclusion/Exclusion Criteria. The Inclusion/Exclusion Criteria Not Met [IE] domain described in Section 6.3.4, Inclusion/Exclusion Criteria Not Met, contains the actual exceptions to those criteria for enrolled subjects.

The current Trial Design Model has limitations in representing protocols, which include the following:

  • Plans for indefinite numbers of repeating Elements [e.g., indefinite numbers of chemotherapy cycles]
  • Indefinite numbers of Visits [e.g., periodic follow-up Visits for survival]
  • Indefinite numbers of Epochs
  • Indefinite numbers of Arms

The last two situations arise in dose-escalation studies where increasing doses are given until stopping criteria are met. Some dose-escalation studies enroll a new cohort of subjects for each new dose, and so, at the planning stage, have an indefinite number of Arms. Other dose-escalation studies give new doses to a continuing group of subjects, and so are planned with an indefinite number of Epochs.

There may also be limitations in representing other patterns of Elements within a Study Cell that are more complex than a simple sequence. For the purpose of submissions about trials that have already completed, these limitations are not critical, so it is expected that development of the Trial Design Model to address these limitations will have a minimal impact on SDTM.

7.2 Experimental Design [TA and TE]

This subsection contains the Trial Design datasets that describe the planned design of the study, and provide the representation of study treatment in its most granular components [Section 7.2.2, Trial Elements [TE]] as well as the representation of all sequences of these components [Section 7.2.1, Trial Arms [TA]] as specified by the study protocol.

The TA and TE datasets are interrelated, and they provide the building blocks for the development of the subject-level treatment information [see Sections 5.2, Demographics [DM], and 5.3, Subject Elements [SE], for the subject's actual study treatment information].

7.2.1 Trial Arms

TA – Description/Overview

A trial design domain that contains each planned arm in the trial.

This section contains:

  • The Trial Arms dataset and assumptions
  • A series of example trials, which illustrate the development of the Trial Arms dataset
  • Advice on various issues in the development of the Trial Arms dataset
  • A recap of the Trial Arms dataset and the function of its variables

TA – Specification

ta.xpt, Trial Arms — Trial Design, Version 3.3. One record per planned Element per Arm, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | TA | Identifier | Two-character abbreviation for the domain. | Req |
| ARMCD | Planned Arm Code | Char | * | Topic | ARMCD is limited to 20 characters and does not have special character restrictions. The maximum length of ARMCD is longer than that for other "short" variables to accommodate the kind of values that are likely to be needed for crossover trials. For example, if ARMCD values for a seven-period crossover were constructed using two-character abbreviations for each treatment and separating hyphens, the length of ARMCD values would be 20. | Req |
| ARM | Description of Planned Arm | Char | * | Synonym Qualifier | Name given to an Arm or treatment group. | Req |
| TAETORD | Planned Order of Element within Arm | Num | | Timing | Number that gives the order of the Element within the Arm. | Req |
| ETCD | Element Code | Char | * | Record Qualifier | ETCD [the companion to ELEMENT] is limited to 8 characters and does not have special character restrictions. These values should be short for ease of use in programming, but it is not expected that ETCD will need to serve as a variable name. | Req |
| ELEMENT | Description of Element | Char | * | Synonym Qualifier | The name of the Element. The same Element may occur more than once within an Arm. | Perm |
| TABRANCH | Branch | Char | | Rule | Condition subject met, at a "branch" in the trial design at the end of this Element, to be included in this Arm [e.g., "Randomization to DRUG X"]. | Exp |
| TATRANS | Transition Rule | Char | | Rule | If the trial design allows a subject to transition to an Element other than the next Element in sequence, then the conditions for transitioning to those other Elements, and the alternative Element sequences, are specified in this rule [e.g., "Responders go to washout"]. | Exp |
| EPOCH | Epoch | Char | [EPOCH] | Timing | Name of the Trial Epoch with which this Element of the Arm is associated. | Req |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parenthesis].
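The arithmetic behind the 20-character limit on ARMCD can be checked with a short sketch. The two-character treatment codes below are hypothetical and chosen only for illustration; they are not taken from any CDISC example.

```python
# Hypothetical two-character treatment codes for a seven-period crossover.
# These codes are made up for illustration only.
treatments = ["PL", "D1", "D2", "D3", "D4", "D5", "D6"]

# Build an ARMCD value by joining the codes with separating hyphens,
# as described in the CDISC note for ARMCD.
armcd = "-".join(treatments)

# 7 codes x 2 characters + 6 separating hyphens = 20 characters.
armcd_length = len(armcd)
print(armcd, armcd_length)
```

Any longer treatment code, or an eighth period, would push a value built this way past the 20-character limit.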

TA – Assumptions

  1. TAETORD is an integer. In general, the value of TAETORD is 1 for the first Element in each Arm, 2 for the second Element in each Arm, etc. Occasionally, it may be convenient to skip some values [see Example Trial 6 for an example]. Although the values of TAETORD need not always be sequential, their order must always be the correct order for the Elements in the Arm path.
  2. Elements in different Arms with the same value of TAETORD may or may not be at the same time, depending on the design of the trial. The example trials illustrate a variety of possible situations. The same Element may occur more than once within an Arm.
  3. TABRANCH describes the outcome of a branch decision point in the trial design for subjects in the Arm. A branch decision point takes place between Epochs, and is associated with the Element that ends at the decision point. For instance, if subjects are assigned to an Arm where they receive treatment A through a randomization at the end of Element X, the value of TABRANCH for Element X would be "Randomized to A."
  4. Branch decision points may be based on decision processes other than randomizations, such as clinical evaluations of disease response or subject choice.
  5. There is usually some gap in time between the performance of a randomization and the start of randomized treatment. However, in many trials this gap in time is small and it is highly unlikely that subjects will leave the trial between randomization and treatment. In these circumstances, the trial does not need to be modeled with this time period between randomization and start of treatment as a separate Element.
  6. Some trials include multiple paths that are closely enough related so that they are all considered to belong to one Arm. In general, this set of paths will include a "complete" path along with shorter paths that skip some Elements. The sequence of Elements represented in the Trial Arms should be the complete, longest path. TATRANS describes the decision points that may lead to a shortened path within the Arm.
  7. If an Element does not end with a decision that could lead to a shortened path within the Arm, then TATRANS will be blank. If there is such a decision, TATRANS will be in a form like, "If condition X is true, then go to Epoch Y" or "If condition X is true, then go to Element with TAETORD = 'Z'".
  8. EPOCH is not strictly necessary for describing the sequence of Elements in an Arm path, but it is the conceptual basis for comparisons between Arms, and also provides a useful way to talk about what is happening in a blinded trial while it is blinded. During periods of blinded treatment, blinded participants will not know which Arm and Element a subject is in, but EPOCH should provide a description of the time period that does not depend on knowing Arm.
  9. EPOCH should be assigned in such a way that Elements from different Arms with the same value of EPOCH are "comparable" in some sense. The degree of similarity across Arms varies considerably in different trials, as illustrated in the examples.
  10. EPOCH values for multiple similar Epochs:
    1. When a study design includes multiple Epochs with the same purpose [e.g., multiple similar treatment Epochs], it is recommended that the EPOCH values be terms from the controlled terminology, but with numbers appended. For example, multiple treatment Epochs could be represented using "TREATMENT 1", "TREATMENT 2", etc. Since the codelist is extensible, this convention allows multiple similar epochs to be represented without adding numbered terms to the CDISC controlled terminology for Epoch. The inclusion of multiple numbered terms in the EPOCH codelist is not considered to add value.
    2. Note that the controlled terminology does include some more granular terms for distinguishing between epochs that differ in ways other than mere order, and these terms should be used where applicable, as they are more informative. For example, when "BLINDED TREATMENT" and "OPEN LABEL TREATMENT" are applicable, those terms would be preferred over "TREATMENT 1" and "TREATMENT 2".
  11. Note that Study Cells are not explicitly defined in the Trial Arms dataset. A set of records with a common value of both ARMCD and EPOCH constitute the description of a Study Cell. Transition rules within this set of records are also part of the description of the Study Cell.
  12. EPOCH may be used as a timing variable in other datasets, such as EX and DS, and values of EPOCH must be different for different epochs. For instance, in a crossover trial with three treatment epochs, each must be given a distinct name; all three cannot be called "TREATMENT".
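Assumptions 1 and 11 can be illustrated with a small sketch. It groups Trial Arms records into Study Cells by their common ARMCD and EPOCH values, and checks that TAETORD values are integers that do not repeat within an Arm. The records and the helper names `study_cells` and `taetord_unique_per_arm` are hypothetical, and plain Python dictionaries stand in for dataset rows.

```python
from collections import defaultdict

# Hypothetical Trial Arms records for a two-Arm trial (illustrative values only).
ta_records = [
    {"ARMCD": "A", "TAETORD": 1, "ETCD": "SCRN", "EPOCH": "SCREENING"},
    {"ARMCD": "A", "TAETORD": 2, "ETCD": "A1", "EPOCH": "TREATMENT"},
    {"ARMCD": "A", "TAETORD": 3, "ETCD": "A2", "EPOCH": "TREATMENT"},
    {"ARMCD": "B", "TAETORD": 1, "ETCD": "SCRN", "EPOCH": "SCREENING"},
    {"ARMCD": "B", "TAETORD": 2, "ETCD": "B1", "EPOCH": "TREATMENT"},
]


def study_cells(records):
    """Group records by (ARMCD, EPOCH); each group describes one Study Cell."""
    cells = defaultdict(list)
    # Sort by Arm and TAETORD so Elements appear in path order within a cell.
    for rec in sorted(records, key=lambda r: (r["ARMCD"], r["TAETORD"])):
        cells[(rec["ARMCD"], rec["EPOCH"])].append(rec["ETCD"])
    return dict(cells)


def taetord_unique_per_arm(records):
    """Return True if every TAETORD is an integer and none repeats within an Arm."""
    seen = set()
    for rec in records:
        key = (rec["ARMCD"], rec["TAETORD"])
        if not isinstance(rec["TAETORD"], int) or key in seen:
            return False
        seen.add(key)
    return True


cells = study_cells(ta_records)
for (armcd, epoch), etcds in cells.items():
    print(armcd, epoch, etcds)
print("TAETORD unique per Arm:", taetord_unique_per_arm(ta_records))
```

In this made-up trial, the Study Cell for Arm A in the TREATMENT Epoch contains two Elements, while every other Study Cell contains one.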

TA – Examples

The core of the Trial Design Model is the Trial Arms [TA] dataset. For each Arm of the trial, it contains one record for each occurrence of an Element in the path of the Arm.

Although the Trial Arms dataset has one record for each trial Element traversed by subjects assigned to the Arm, it is generally more useful to work out the overall design of the trial at the Study Cell level first, then to work out the Elements within each Study Cell, and finally to develop the definitions of the Elements that are contained in the Trial Elements table.

It is generally useful to draw diagrams, like those mentioned in ICH E3, when working out the design of a trial. The protocol may include a diagram that can serve as a starting point. Such a diagram can then be converted into a Trial Design Matrix, which displays the Study Cells and which can be, in turn, converted into the Trial Arms dataset.

This section uses example trials of increasing complexity, numbered 1 to 7, to illustrate the concepts of trial design. For each example trial, the process of working out the Trial Arms table is illustrated by means of a series of diagrams and tables, including the following:

  • A diagram showing the branching structure of the trial in a "study schema" format such as might appear in a protocol.
  • A diagram that shows the "prospective" view of the trial, the view of those participating in the trial. It is similar to the "study schema" view in that it usually shows a single pool of subjects at the beginning of the trial, with the pool of subjects being split into separate treatment groups at randomizations and other branches. It shows the Epochs of the trial, and, for each group of subjects and each Epoch, the sequence of Elements within each Epoch for that treatment group. The Arms are also indicated on this diagram.
  • A diagram that shows the "retrospective" view of the trial, the view of the analyst reporting on the trial. The style of diagram looks more like a matrix; it is also more like the structure of the Trial Arms dataset. It is an Arm-centered view, which shows, for each study cell [Epoch/Arm combination] the sequence of Elements within that study cell. It can be thought of as showing, for each Arm, the Elements traversed by a subject who completed that Arm as intended.
  • If the trial is blinded, a diagram that shows the trial as it appears to a blinded participant.
  • A Trial Design Matrix, an alternative format for representing most of the information in the diagram that shows Arms and Epochs, and emphasizes the Study Cells.
  • The Trial Arms dataset.

Readers are advised to read the following section with Example 1 before reading other examples, since Example 1 explains the conventions used for the diagrams and tables.

Example 1

Diagrams that represent study schemas generally conceive of time as moving from left to right, using horizontal lines to represent periods of time and slanting lines to represent branches into separate treatments, convergence into a common follow-up, or cross-over to a different treatment.

In this document, diagrams are drawn using "blocks" corresponding to trial Elements rather than horizontal lines. Trial Elements are the various treatment and non-treatment time periods of the trial, and we want to emphasize the separate trial Elements that might otherwise be "hidden" in a single horizontal line. See Section 7.2.2, Trial Elements [TE], for more information about defining trial Elements. In general, the Elements of a trial will be fairly clear. However, in the process of working out a trial design, alternative definitions of trial Elements may be considered, in which case diagrams for each alternative may be constructed.

In the study schema diagrams in this document, the only slanting lines used are those that represent branches, the decision points where subjects are divided into separate treatment groups. One advantage of this style of diagram, which does not show convergence of separate paths into a single block, is that the number of Arms in the trial can be determined by counting the number of parallel paths at the right end of the diagram.

Below is the study schema diagram for Example Trial 1, a simple parallel trial. This trial has three Arms, corresponding to the three possible left-to-right "paths" through the trial. Each path corresponds to one of the three treatment Elements at the right end of the diagram. Note that the randomization is represented by the three red arrows leading from the Run-in block.

Example Trial 1, Parallel Design Study Schema

The next diagram for this trial shows the Epochs of the trial, indicates the three Arms, and shows the sequence of Elements for each group of subjects in each Epoch. The arrows are at the right hand side of the diagram because it is at the end of the trial that all the separate paths through the trial can be seen. Note that, in this diagram, the randomization, which was shown using three red arrows connecting the Run-in block with the three treatment blocks in the first diagram, is now indicated by a note with an arrow pointing to the line between two Epochs.

Example Trial 1, Parallel Design Prospective view

The next diagram can be thought of as the "retrospective" view of a trial, the view back from a point in time when a subject's assignment to an Arm is known. In this view, the trial appears as a grid, with an Arm represented by a series of study cells, one for each Epoch, and a sequence of Elements within each study cell. In this trial, as in many trials, there is exactly one Element in each study cell, but later examples will illustrate that this is not always the case.

Example Trial 1, Parallel Design Retrospective view

The next diagram shows the trial from the viewpoint of blinded participants. To blinded participants in this trial, all Arms look alike. They know when a subject is in the Screen Element, or the Run-in Element, but when a subject is in the Treatment Epoch, they know only that the subject is in an Element that involves receiving a study drug, not which study drug, and therefore not which Element.

Example Trial 1, Parallel Design Blinded View

A trial design matrix is a table with a row for each Arm in the trial and a column for each Epoch in the trial. It is closely related to the retrospective view of the trial, and many users may find it easier to construct a table than to draw a diagram. The cells in the matrix represent the Study Cells, which are populated with trial Elements. In this trial, each Study Cell contains exactly one Element.

The columns of a Trial Design Matrix are the Epochs of the trial, the rows are the Arms of the trial, and the cells of the matrix [the Study Cells] contain Elements. Note that the randomization is not represented in the Trial Design Matrix. All the diagrams above and the trial design matrix below are alternative representations of the trial design. None of them contains all the information that will be in the finished Trial Arms dataset, but users may find it useful to draw some or all of them when working out the dataset.

Trial Design Matrix for Example Trial 1


| | Screen | Run-in | Treatment |
|---|---|---|---|
| Placebo | Screen | Run-in | PLACEBO |
| A | Screen | Run-in | DRUG A |
| B | Screen | Run-in | DRUG B |

For Example Trial 1, the conversion of the Trial Design Matrix into the Trial Arms dataset is straightforward. For each cell of the matrix, there is a record in the Trial Arms dataset. ARM, EPOCH, and ELEMENT can be populated directly from the matrix. TAETORD acts as a sequence number for the Elements within an Arm, so it can be populated by counting across the cells in the matrix. The randomization information, which is not represented in the Trial Design Matrix, is held in TABRANCH in the Trial Arms dataset. TABRANCH is populated only if there is a branch at the end of an Element for the Arm. When TABRANCH is populated, it describes how the decision at the branch point would result in a subject being in this Arm.

ta.xpt

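| Row | STUDYID | DOMAIN | ARMCD | ARM | TAETORD | ETCD | ELEMENT | TABRANCH | TATRANS | EPOCH |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EX1 | TA | P | Placebo | 1 | SCRN | Screen | | | SCREENING |
| 2 | EX1 | TA | P | Placebo | 2 | RI | Run-In | Randomized to Placebo | | RUN-IN |
| 3 | EX1 | TA | P | Placebo | 3 | P | Placebo | | | TREATMENT |
| 4 | EX1 | TA | A | A | 1 | SCRN | Screen | | | SCREENING |
| 5 | EX1 | TA | A | A | 2 | RI | Run-In | Randomized to Drug A | | RUN-IN |
| 6 | EX1 | TA | A | A | 3 | A | Drug A | | | TREATMENT |
| 7 | EX1 | TA | B | B | 1 | SCRN | Screen | | | SCREENING |
| 8 | EX1 | TA | B | B | 2 | RI | Run-In | Randomized to Drug B | | RUN-IN |
| 9 | EX1 | TA | B | B | 3 | B | Drug B | | | TREATMENT |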
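The conversion described for Example Trial 1 can be sketched in a few lines. This is a minimal illustration, assuming each Study Cell holds exactly one Element (true for this trial, not in general). The function name `matrix_to_ta` and the dictionary layout are hypothetical, and the branch information is supplied separately because the Trial Design Matrix does not carry it.

```python
# Epochs are the columns of the Trial Design Matrix for Example Trial 1.
epochs = ["SCREENING", "RUN-IN", "TREATMENT"]

# One row per Arm: ARMCD -> (ARM, [(ETCD, ELEMENT) for each Epoch]).
matrix = {
    "P": ("Placebo", [("SCRN", "Screen"), ("RI", "Run-In"), ("P", "Placebo")]),
    "A": ("A", [("SCRN", "Screen"), ("RI", "Run-In"), ("A", "Drug A")]),
    "B": ("B", [("SCRN", "Screen"), ("RI", "Run-In"), ("B", "Drug B")]),
}

# Branch outcomes, keyed by (ARMCD, TAETORD of the Element that ends at the branch).
branches = {
    ("P", 2): "Randomized to Placebo",
    ("A", 2): "Randomized to Drug A",
    ("B", 2): "Randomized to Drug B",
}


def matrix_to_ta(studyid, epochs, matrix, branches):
    """Emit one Trial Arms record per cell, assuming one Element per Study Cell."""
    records = []
    for armcd, (arm, cells) in matrix.items():
        # TAETORD is populated by counting across the cells of the matrix row.
        for taetord, (epoch, (etcd, element)) in enumerate(zip(epochs, cells), start=1):
            records.append({
                "STUDYID": studyid,
                "DOMAIN": "TA",
                "ARMCD": armcd,
                "ARM": arm,
                "TAETORD": taetord,
                "ETCD": etcd,
                "ELEMENT": element,
                # TABRANCH is populated only where a branch ends the Element.
                "TABRANCH": branches.get((armcd, taetord), ""),
                "TATRANS": "",
                "EPOCH": epoch,
            })
    return records


ta = matrix_to_ta("EX1", epochs, matrix, branches)
for rec in ta:
    print(rec)
```

A trial with several Elements in one Study Cell would need a list of Elements per cell and a running TAETORD counter per Arm instead of the column position.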

Example 2

The diagram below is for a crossover trial. However, the diagram does not use the crossing slanted lines sometimes used to represent crossover trials, since the order of the blocks is sufficient to represent the design of the trial. Slanted lines are used only to represent the branch point at randomization, when a subject is assigned to a sequence of treatments. As in most crossover trials, the Arms are distinguished by the order of treatments, with the same treatments present in each Arm. Note that even though all three Arms of this trial end with the same block, the block for the follow-up Element, the diagram does not show the Arms converging into one block. Also note that the same block [the "Rest" Element] occurs twice within each Arm. Elements are conceived of as "reusable" and can appear in more than one Arm, in more than one Epoch, and more than once in an Arm.

Example Trial 2, Crossover Trial Study Schema

The next diagram for this crossover trial shows the prospective view of the trial, identifies the Epoch and Arms of the trial, and gives each a name. As for most crossover studies, the objectives of the trial will be addressed by comparisons between the Arms and by within-subject comparisons between treatments. The design thus depends on differentiating the periods during which the subject receives the three different treatments and so there are three different treatment Epochs. The fact that the rest periods are identified as separate Epochs suggests that these also play an important part in the design of the trial; they are probably designed to allow subjects to return to "baseline", with data collected to show that this occurred. Note that Epochs are not considered "reusable", so each Epoch has a different name, even though all the Treatment Epochs are similar and both the Rest Epochs are similar. As with the first example trial, there is a one-to-one relationship between the Epochs of the trial and the Elements in each Arm.

Example Trial 2, Crossover Trial Prospective View

The next diagram shows the retrospective view of the trial.

Example Trial 2, Crossover Trial Retrospective View

The last diagram for this trial shows the trial from the viewpoint of blinded participants. As in the simple parallel trial above, blinded participants see only one sequence of Elements, since during the treatment Epochs they do not know which of the treatment Elements a subject is in.

Example Trial 2, Crossover Trial Blinded View

The trial design matrix for the crossover example trial is shown below. It corresponds closely to the retrospective diagram above.

Trial Design Matrix for Example Trial 2


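| | Screen | First Treatment | First Rest | Second Treatment | Second Rest | Third Treatment | Follow-up |
|---|---|---|---|---|---|---|---|
| P-5-10 | Screen | Placebo | Rest | 5 mg | Rest | 10 mg | Follow-up |
| 5-P-10 | Screen | 5 mg | Rest | Placebo | Rest | 10 mg | Follow-up |
| 5-10-P | Screen | 5 mg | Rest | 10 mg | Rest | Placebo | Follow-up |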

It is straightforward to produce the Trial Arms dataset for this crossover trial from the diagram showing Arms and Epochs, or from the Trial Design Matrix.

ta.xpt

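| Row | STUDYID | DOMAIN | ARMCD | ARM | TAETORD | ETCD | ELEMENT | TABRANCH | TATRANS | EPOCH |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EX2 | TA | P-5-10 | Placebo-5mg-10mg | 1 | SCRN | Screen | Randomized to Placebo - 5 mg - 10 mg | | SCREENING |
| 2 | EX2 | TA | P-5-10 | Placebo-5mg-10mg | 2 | P | Placebo | | | TREATMENT 1 |
| 3 | EX2 | TA | P-5-10 | Placebo-5mg-10mg | 3 | REST | Rest | | | WASHOUT 1 |
| 4 | EX2 | TA | P-5-10 | Placebo-5mg-10mg | 4 | 5 | 5 mg | | | TREATMENT 2 |
| 5 | EX2 | TA | P-5-10 | Placebo-5mg-10mg | 5 | REST | Rest | | | WASHOUT 2 |
| 6 | EX2 | TA | P-5-10 | Placebo-5mg-10mg | 6 | 10 | 10 mg | | | TREATMENT 3 |
| 7 | EX2 | TA | P-5-10 | Placebo-5mg-10mg | 7 | FU | Follow-up | | | FOLLOW-UP |
| 8 | EX2 | TA | 5-P-10 | 5mg-Placebo-10mg | 1 | SCRN | Screen | Randomized to 5 mg - Placebo - 10 mg | | SCREENING |
| 9 | EX2 | TA | 5-P-10 | 5mg-Placebo-10mg | 2 | 5 | 5 mg | | | TREATMENT 1 |
| 10 | EX2 | TA | 5-P-10 | 5mg-Placebo-10mg | 3 | REST | Rest | | | WASHOUT 1 |
| 11 | EX2 | TA | 5-P-10 | 5mg-Placebo-10mg | 4 | P | Placebo | | | TREATMENT 2 |
| 12 | EX2 | TA | 5-P-10 | 5mg-Placebo-10mg | 5 | REST | Rest | | | WASHOUT 2 |
| 13 | EX2 | TA | 5-P-10 | 5mg-Placebo-10mg | 6 | 10 | 10 mg | | | TREATMENT 3 |
| 14 | EX2 | TA | 5-P-10 | 5mg-Placebo-10mg | 7 | FU | Follow-up | | | FOLLOW-UP |
| 15 | EX2 | TA | 5-10-P | 5mg-10mg-Placebo | 1 | SCRN | Screen | Randomized to 5 mg - 10 mg - Placebo | | SCREENING |
| 16 | EX2 | TA | 5-10-P | 5mg-10mg-Placebo | 2 | 5 | 5 mg | | | TREATMENT 1 |
| 17 | EX2 | TA | 5-10-P | 5mg-10mg-Placebo | 3 | REST | Rest | | | WASHOUT 1 |
| 18 | EX2 | TA | 5-10-P | 5mg-10mg-Placebo | 4 | 10 | 10 mg | | | TREATMENT 2 |
| 19 | EX2 | TA | 5-10-P | 5mg-10mg-Placebo | 5 | REST | Rest | | | WASHOUT 2 |
| 20 | EX2 | TA | 5-10-P | 5mg-10mg-Placebo | 6 | P | Placebo | | | TREATMENT 3 |
| 21 | EX2 | TA | 5-10-P | 5mg-10mg-Placebo | 7 | FU | Follow-up | | | FOLLOW-UP |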
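The step from Trial Design Matrix to Trial Arms dataset for Example Trial 2 can be sketched in code. The following is a minimal illustration only: the function name `build_ta` and the dictionary layout are assumptions made for this sketch and are not part of the SDTM. It assumes one Element per study cell and a single branch point at the end of the first Element, which holds for this crossover trial but not for every design.

```python
# Epoch names and matrix contents are transcribed from Example Trial 2.
# All Python names here are hypothetical, chosen for illustration.
EPOCHS = ["SCREENING", "TREATMENT 1", "WASHOUT 1", "TREATMENT 2",
          "WASHOUT 2", "TREATMENT 3", "FOLLOW-UP"]

# ARMCD -> (ARM, one ETCD per Epoch, in Epoch order)
MATRIX = {
    "P-5-10": ("Placebo-5mg-10mg", ["SCRN", "P", "REST", "5", "REST", "10", "FU"]),
    "5-P-10": ("5mg-Placebo-10mg", ["SCRN", "5", "REST", "P", "REST", "10", "FU"]),
    "5-10-P": ("5mg-10mg-Placebo", ["SCRN", "5", "REST", "10", "REST", "P", "FU"]),
}

ELEMENT_NAMES = {"SCRN": "Screen", "P": "Placebo", "5": "5 mg",
                 "10": "10 mg", "REST": "Rest", "FU": "Follow-up"}

BRANCH_RULES = {
    "P-5-10": "Randomized to Placebo - 5 mg - 10 mg",
    "5-P-10": "Randomized to 5 mg - Placebo - 10 mg",
    "5-10-P": "Randomized to 5 mg - 10 mg - Placebo",
}


def build_ta(studyid, matrix, epochs, element_names, branch_rules):
    """Expand a design matrix into one TA record per Arm and Element."""
    rows = []
    for armcd, (arm, etcds) in matrix.items():
        for order, (etcd, epoch) in enumerate(zip(etcds, epochs), start=1):
            rows.append({
                "STUDYID": studyid, "DOMAIN": "TA",
                "ARMCD": armcd, "ARM": arm,
                "TAETORD": order, "ETCD": etcd,
                "ELEMENT": element_names[etcd],
                # Assumption: the only branch occurs at the end of Element 1.
                "TABRANCH": branch_rules[armcd] if order == 1 else "",
                "TATRANS": "",
                "EPOCH": epoch,
            })
    return rows


ta = build_ta("EX2", MATRIX, EPOCHS, ELEMENT_NAMES, BRANCH_RULES)
```

Under these assumptions the sketch yields 21 records, seven per Arm, in the same order as the dataset above.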

Example

Each of the paths for the trial shown in the diagram below goes through one branch point at randomization, and then through another branch point when response is evaluated. This results in four Arms, corresponding to the number of possible paths through the trial, and also to the number of blocks at the right end of the diagram. The fact that there are only two kinds of block at the right end ["Open DRUG X" and "Rescue"] does not affect the fact that there are four "paths" and thus four Arms.

Example Trial 3, Multiple Branches Study Schema

The next diagram for this trial is the prospective view. It shows the Epochs of the trial and how the initial group of subjects is split into two treatment groups for the double blind treatment Epoch, and how each of those initial treatment groups is split in two at the response evaluation, resulting in the four Arms of this trial. The names of the Arms have been chosen to represent the outcomes of the successive branches that, together, assign subjects to Arms. These compound names were chosen to facilitate description of subjects who may drop out of the trial after the first branch and before the second branch. See DM Example 7, which illustrates DM and SE data for such subjects.

Example Trial 3, Multiple Branches Prospective View

The next diagram shows the retrospective view. As with the first two example trials, there is one Element in each study cell.

Example Trial 3, Multiple Branches Retrospective View

The last diagram for this trial shows the trial from the viewpoint of blinded participants. Since the prospective view is the view most relevant to study participants, the blinded view shown here is a prospective view. Since blinded participants can tell which treatment a subject receives in the Open Label Epoch, they see two possible element sequences.

Example Trial 3, Multiple Branches Blinded View

The trial design matrix for this trial can be constructed easily from the diagram showing Arms and Epochs.

Trial Design Matrix for Example Trial 3


| Arm | Screen | Double Blind | Open Label |
|---|---|---|---|
| A-Open A | Screen | Treatment A | Open Drug A |
| A-Rescue | Screen | Treatment A | Rescue |
| B-Open A | Screen | Treatment B | Open Drug A |
| B-Rescue | Screen | Treatment B | Rescue |

Creating the Trial Arms dataset for Example Trial 3 is similarly straightforward. Note that because there are two branch points in this trial, TABRANCH is populated for two records in each Arm. Note also that the values of ARMCD, like the values of ARM, reflect the two separate processes that result in a subject's assignment to an Arm.

ta.xpt

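| Row | STUDYID | DOMAIN | ARMCD | ARM | TAETORD | ETCD | ELEMENT | TABRANCH | TATRANS | EPOCH |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EX3 | TA | AA | A-Open A | 1 | SCRN | Screen | Randomized to Treatment A | | SCREENING |
| 2 | EX3 | TA | AA | A-Open A | 2 | DBA | Treatment A | Assigned to Open Drug A on basis of response evaluation | | BLINDED TREATMENT |
| 3 | EX3 | TA | AA | A-Open A | 3 | OA | Open Drug A | | | OPEN LABEL TREATMENT |
| 4 | EX3 | TA | AR | A-Rescue | 1 | SCRN | Screen | Randomized to Treatment A | | SCREENING |
| 5 | EX3 | TA | AR | A-Rescue | 2 | DBA | Treatment A | Assigned to Rescue on basis of response evaluation | | BLINDED TREATMENT |
| 6 | EX3 | TA | AR | A-Rescue | 3 | RSC | Rescue | | | OPEN LABEL TREATMENT |
| 7 | EX3 | TA | BA | B-Open A | 1 | SCRN | Screen | Randomized to Treatment B | | SCREENING |
| 8 | EX3 | TA | BA | B-Open A | 2 | DBB | Treatment B | Assigned to Open Drug A on basis of response evaluation | | BLINDED TREATMENT |
| 9 | EX3 | TA | BA | B-Open A | 3 | OA | Open Drug A | | | OPEN LABEL TREATMENT |
| 10 | EX3 | TA | BR | B-Rescue | 1 | SCRN | Screen | Randomized to Treatment B | | SCREENING |
| 11 | EX3 | TA | BR | B-Rescue | 2 | DBB | Treatment B | Assigned to Rescue on basis of response evaluation | | BLINDED TREATMENT |
| 12 | EX3 | TA | BR | B-Rescue | 3 | RSC | Rescue | | | OPEN LABEL TREATMENT |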
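The way two branch points combine into four Arms, with compound ARMCD and ARM values, can be illustrated with a short sketch. The helper name `enumerate_arms` and the pairing of a code with a label for each branch outcome are assumptions made for this illustration. The sketch also assumes that every combination of branch outcomes is a valid path, which is true of Example Trial 3 but need not hold for other designs.

```python
from itertools import product

# (code, label) for each outcome of each branch point, taken from the
# ARMCD and ARM values of Example Trial 3. Names are hypothetical.
RANDOMIZATION = [("A", "A"), ("B", "B")]
RESPONSE_EVALUATION = [("A", "Open A"), ("R", "Rescue")]


def enumerate_arms(*branch_points):
    """Return one (ARMCD, ARM) pair per path through all branch points."""
    arms = []
    for path in product(*branch_points):
        armcd = "".join(code for code, _ in path)
        arm = "-".join(label for _, label in path)
        arms.append((armcd, arm))
    return arms


arms = enumerate_arms(RANDOMIZATION, RESPONSE_EVALUATION)
for armcd, arm in arms:
    print(armcd, arm)
```

Each resulting pair reflects both assignment processes, which is why TABRANCH is populated on two records in each Arm.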

See Section 7.2.1.1 Trial Arms Issues, Issue 1, "Distinguishing Between Branches and Transitions", for additional discussion of when a decision point in a trial design should be considered to give rise to a new Arm.

Example

The diagram below uses a new symbol, a large curved arrow representing the fact that the chemotherapy treatment [A or B] and the rest period that follows it are to be repeated. In this trial, the chemotherapy "cycles" are to be repeated until disease progression. Some chemotherapy trials specify a maximum number of cycles, but protocols that allow an indefinite number of repeats are not uncommon.

Example Trial 4, Cyclical Chemotherapy Study Schema

The next diagram shows the prospective view of this trial. Note that, in spite of the repeating element structure, this is, at its core, a two-arm parallel study, and thus has two Arms. In SDTMIG 3.1.1, there was an implicit assumption that each Element must be in a separate Epoch, and trials with cyclical chemotherapy were difficult to handle. The introduction of the concept of study cells and the dropping of the assumption that Elements and Epochs have a one-to-one relationship resolve these difficulties. This trial is best treated as having just three Epochs, since the main objectives of the trial involve comparisons between the two treatments, and do not require data to be considered cycle by cycle.

Example Trial 4, Cyclical Chemotherapy Prospective View

The next diagram shows the retrospective view of this trial.

Example Trial 4, Cyclical Chemotherapy Retrospective View

For the purpose of developing a Trial Arms dataset for this oncology trial, the diagram must be redrawn to explicitly represent multiple treatment and rest elements. If a maximum number of cycles is not given by the protocol, then, for the purposes of constructing an SDTM Trial Arms dataset for submission, which can only take place after the trial is complete, the number of repeats included in the Trial Arms dataset should be the maximum number of repeats that occurred in the trial. The next diagram assumes that the maximum number of cycles that occurred in this trial was four. Some subjects will not have received all four cycles, because their disease progressed. The rule that directed that they receive no further cycles of chemotherapy is represented by a set of green arrows, one at the end of each Rest Epoch, that shows that a subject "skips forward" if their disease progresses. In the Trial Arms dataset, each "skip forward" instruction is a transition rule, recorded in the TATRANS variable; when TATRANS is not populated, the rule is to transition to the next Element in sequence.

Example Trial 4, Cyclical Chemotherapy Retrospective View with Explicit Repeats

The logistics of dosing mean that few oncology trials are blinded. If this trial is blinded, the next diagram shows the trial from the viewpoint of blinded participants.

Example Trial 4, Cyclical Chemotherapy Blinded View

The Trial Design Matrix for Example Trial 4 corresponds to the diagram showing the retrospective view with explicit repeats of the treatment and Rest Elements. As noted above, the Trial Design Matrix does not include information on when randomization occurs; similarly, information corresponding to the "skip forward" rules is not represented in the Trial Design Matrix.

Trial Design Matrix for Example Trial 4


| Arm | Screen | Treatment | Follow-up |
|---|---|---|---|
| A | Screen | Trt A, Rest, Trt A, Rest, Trt A, Rest, Trt A, Rest | Follow-up |
| B | Screen | Trt B, Rest, Trt B, Rest, Trt B, Rest, Trt B, Rest | Follow-up |

The Trial Arms dataset for Example Trial 4 requires the use of the TATRANS variable to represent the "repeat until disease progression" feature. In order to represent this rule in the diagrams that explicitly represent repeated elements, a green "skip forward" arrow was included at the end of each element where disease progression is assessed. In the Trial Arms dataset, TATRANS is populated for each Element with a green arrow in the diagram. In other words, if there is a possibility that a subject will, at the end of this Element, "skip forward" to a later part of the Arm, then TATRANS is populated with the rule describing the conditions under which a subject will go to a later Element. If the subject always goes to the next Element in the Arm [as was the case for the first three example trials presented here] then TATRANS is null. The Trial Arms dataset presented below corresponds to the Trial Design Matrix above.

ta.xpt

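| Row | STUDYID | DOMAIN | ARMCD | ARM | TAETORD | ETCD | ELEMENT | TABRANCH | TATRANS | EPOCH |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EX4 | TA | A | A | 1 | SCRN | Screen | Randomized to A | | SCREENING |
| 2 | EX4 | TA | A | A | 2 | A | Trt A | | | TREATMENT |
| 3 | EX4 | TA | A | A | 3 | REST | Rest | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 4 | EX4 | TA | A | A | 4 | A | Trt A | | | TREATMENT |
| 5 | EX4 | TA | A | A | 5 | REST | Rest | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 6 | EX4 | TA | A | A | 6 | A | Trt A | | | TREATMENT |
| 7 | EX4 | TA | A | A | 7 | REST | Rest | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 8 | EX4 | TA | A | A | 8 | A | Trt A | | | TREATMENT |
| 9 | EX4 | TA | A | A | 9 | REST | Rest | | | TREATMENT |
| 10 | EX4 | TA | A | A | 10 | FU | Follow-up | | | FOLLOW-UP |
| 11 | EX4 | TA | B | B | 1 | SCRN | Screen | Randomized to B | | SCREENING |
| 12 | EX4 | TA | B | B | 2 | B | Trt B | | | TREATMENT |
| 13 | EX4 | TA | B | B | 3 | REST | Rest | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 14 | EX4 | TA | B | B | 4 | B | Trt B | | | TREATMENT |
| 15 | EX4 | TA | B | B | 5 | REST | Rest | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 16 | EX4 | TA | B | B | 6 | B | Trt B | | | TREATMENT |
| 17 | EX4 | TA | B | B | 7 | REST | Rest | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 18 | EX4 | TA | B | B | 8 | B | Trt B | | | TREATMENT |
| 19 | EX4 | TA | B | B | 9 | REST | Rest | | | TREATMENT |
| 20 | EX4 | TA | B | B | 10 | FU | Follow-up | | | FOLLOW-UP |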
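The pattern in this dataset, where treatment and Rest Elements repeat up to the maximum number of cycles observed and TATRANS is populated on every Rest Element except the last, can be sketched as follows. The function `cyclical_arm_rows` and its parameters are hypothetical names introduced for this illustration. The rule text and Epoch names are copied from the Example Trial 4 dataset.

```python
SKIP_RULE = "If disease progression, go to Follow-up Epoch"


def cyclical_arm_rows(studyid, armcd, arm, trt_etcd, trt_name, max_cycles,
                      rest_etcd="REST", rest_name="Rest"):
    """Build TA records for one Arm of a cyclical chemotherapy trial."""
    # Each entry: (ETCD, ELEMENT, TABRANCH, TATRANS, EPOCH)
    elements = [("SCRN", "Screen", f"Randomized to {arm}", "", "SCREENING")]
    for cycle in range(1, max_cycles + 1):
        elements.append((trt_etcd, trt_name, "", "", "TREATMENT"))
        # After the final Rest the next Element in sequence is Follow-up
        # anyway, so no "skip forward" rule is needed on that record.
        tatrans = SKIP_RULE if cycle < max_cycles else ""
        elements.append((rest_etcd, rest_name, "", tatrans, "TREATMENT"))
    elements.append(("FU", "Follow-up", "", "", "FOLLOW-UP"))
    return [
        {"STUDYID": studyid, "DOMAIN": "TA", "ARMCD": armcd, "ARM": arm,
         "TAETORD": order, "ETCD": etcd, "ELEMENT": name,
         "TABRANCH": branch, "TATRANS": trans, "EPOCH": epoch}
        for order, (etcd, name, branch, trans, epoch)
        in enumerate(elements, start=1)
    ]


arm_a = cyclical_arm_rows("EX4", "A", "A", "A", "Trt A", max_cycles=4)
```

With `max_cycles=4` this produces ten records for Arm A, with TATRANS populated at TAETORD 3, 5, and 7. Passing a different `rest_etcd` and `rest_name` would cover designs in which each Arm has its own Rest Element.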

Example

Example Trial 5 is much like the last oncology trial in that the two treatments being compared are given in cycles, and the total length of the cycle is the same for both treatments. In this trial, however, Treatment A is given over a longer duration than Treatment B. Because of this difference in treatment patterns, this trial cannot be blinded.

Example Trial 5, Different Chemo Durations Study Schema

In SDTMIG 3.1.1, the assumption of a one-to-one relationship between Elements and Epochs made this example difficult to handle. However, without that assumption, this trial is essentially the same as Trial 4. The next diagram shows the retrospective view of this trial.

Example Trial 5, Cyclical Chemotherapy Retrospective View

The Trial Design Matrix for this trial is almost the same as for Example Trial 4; the only difference is that the maximum number of cycles for this trial was assumed to be three.

Trial Design Matrix for Example Trial 5


| Arm | Screen | Treatment | Follow-up |
|---|---|---|---|
| A | Screen | Trt A, Rest A, Trt A, Rest A, Trt A, Rest A | Follow-up |
| B | Screen | Trt B, Rest B, Trt B, Rest B, Trt B, Rest B | Follow-up |

The Trial Arms dataset for this trial, shown below, corresponds to the Trial Design Matrix above.

ta.xpt

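| Row | STUDYID | DOMAIN | ARMCD | ARM | TAETORD | ETCD | ELEMENT | TABRANCH | TATRANS | EPOCH |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EX5 | TA | A | A | 1 | SCRN | Screen | Randomized to A | | SCREENING |
| 2 | EX5 | TA | A | A | 2 | A | Trt A | | | TREATMENT |
| 3 | EX5 | TA | A | A | 3 | RESTA | Rest A | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 4 | EX5 | TA | A | A | 4 | A | Trt A | | | TREATMENT |
| 5 | EX5 | TA | A | A | 5 | RESTA | Rest A | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 6 | EX5 | TA | A | A | 6 | A | Trt A | | | TREATMENT |
| 7 | EX5 | TA | A | A | 7 | RESTA | Rest A | | | TREATMENT |
| 8 | EX5 | TA | A | A | 8 | FU | Follow-up | | | FOLLOW-UP |
| 9 | EX5 | TA | B | B | 1 | SCRN | Screen | Randomized to B | | SCREENING |
| 10 | EX5 | TA | B | B | 2 | B | Trt B | | | TREATMENT |
| 11 | EX5 | TA | B | B | 3 | RESTB | Rest B | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 12 | EX5 | TA | B | B | 4 | B | Trt B | | | TREATMENT |
| 13 | EX5 | TA | B | B | 5 | RESTB | Rest B | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 14 | EX5 | TA | B | B | 6 | B | Trt B | | | TREATMENT |
| 15 | EX5 | TA | B | B | 7 | RESTB | Rest B | | | TREATMENT |
| 16 | EX5 | TA | B | B | 8 | FU | Follow-up | | | FOLLOW-UP |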

Example

Example Trial 6 is an oncology trial comparing two types of chemotherapy that are given using cycles of different lengths with different internal patterns. Treatment A is given in 3-week cycles with a longer duration of treatment and a short rest, while Treatment B is given in 4-week cycles with a short duration of treatment and a long rest.

Example Trial 6, Different Cycle Durations Study Schema

The design of this trial is very similar to that for Example Trials 4 and 5. The main difference is that there are two different Rest Elements, the short one used with Drug A and the long one used with Drug B. The next diagram shows the retrospective view of this trial.

Example Trial 6, Cyclical Chemotherapy Retrospective View

The Trial Design Matrix for this trial assumes that there was a maximum of four cycles of Drug A and a maximum of three cycles of Drug B.

Trial Design Matrix for Example Trial 6


| Arm | Screen | Treatment | Follow-up |
|---|---|---|---|
| A | Screen | Trt A, Rest A, Trt A, Rest A, Trt A, Rest A, Trt A, Rest A | Follow-up |
| B | Screen | Trt B, Rest B, Trt B, Rest B, Trt B, Rest B | Follow-up |

In the following Trial Arms dataset, because the Treatment Epoch for Arm A has more Elements than the Treatment Epoch for Arm B, TAETORD is 10 for the Follow-up Element in Arm A, but 8 for the Follow-up Element in Arm B. It would also be possible to assign a TAETORD value of 10 to the Follow-up Element in Arm B. The primary purpose of TAETORD is to order Elements within an Arm; leaving gaps in the series of TAETORD values does not interfere with this purpose.

ta.xpt

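| Row | STUDYID | DOMAIN | ARMCD | ARM | TAETORD | ETCD | ELEMENT | TABRANCH | TATRANS | EPOCH |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EX6 | TA | A | A | 1 | SCRN | Screen | Randomized to A | | SCREENING |
| 2 | EX6 | TA | A | A | 2 | A | Trt A | | | TREATMENT |
| 3 | EX6 | TA | A | A | 3 | RESTA | Rest A | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 4 | EX6 | TA | A | A | 4 | A | Trt A | | | TREATMENT |
| 5 | EX6 | TA | A | A | 5 | RESTA | Rest A | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 6 | EX6 | TA | A | A | 6 | A | Trt A | | | TREATMENT |
| 7 | EX6 | TA | A | A | 7 | RESTA | Rest A | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 8 | EX6 | TA | A | A | 8 | A | Trt A | | | TREATMENT |
| 9 | EX6 | TA | A | A | 9 | RESTA | Rest A | | | TREATMENT |
| 10 | EX6 | TA | A | A | 10 | FU | Follow-up | | | FOLLOW-UP |
| 11 | EX6 | TA | B | B | 1 | SCRN | Screen | Randomized to B | | SCREENING |
| 12 | EX6 | TA | B | B | 2 | B | Trt B | | | TREATMENT |
| 13 | EX6 | TA | B | B | 3 | RESTB | Rest B | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 14 | EX6 | TA | B | B | 4 | B | Trt B | | | TREATMENT |
| 15 | EX6 | TA | B | B | 5 | RESTB | Rest B | | If disease progression, go to Follow-up Epoch | TREATMENT |
| 16 | EX6 | TA | B | B | 6 | B | Trt B | | | TREATMENT |
| 17 | EX6 | TA | B | B | 7 | RESTB | Rest B | | | TREATMENT |
| 18 | EX6 | TA | B | B | 8 | FU | Follow-up | | | FOLLOW-UP |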
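The point that gaps in TAETORD do not disturb the ordering of Elements within an Arm can be shown with a small sketch. The records below are Arm B of Example Trial 6, deliberately shuffled and with the Follow-up Element given the alternative TAETORD value of 10 discussed above. The variable names are illustrative only.

```python
# Arm B of Example Trial 6, unordered, with Follow-up at TAETORD 10
# instead of 8 (values 8 and 9 are simply unused).
arm_b = [
    {"TAETORD": 10, "ETCD": "FU"},
    {"TAETORD": 3, "ETCD": "RESTB"},
    {"TAETORD": 1, "ETCD": "SCRN"},
    {"TAETORD": 6, "ETCD": "B"},
    {"TAETORD": 2, "ETCD": "B"},
    {"TAETORD": 7, "ETCD": "RESTB"},
    {"TAETORD": 4, "ETCD": "B"},
    {"TAETORD": 5, "ETCD": "RESTB"},
]

# Sorting on TAETORD recovers the planned sequence regardless of gaps.
ordered = [row["ETCD"]
           for row in sorted(arm_b, key=lambda row: row["TAETORD"])]
print(ordered)
```

Only the relative order of the TAETORD values within the Arm matters for this purpose.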

Example

In open trials, there is no requirement to maintain a blind, and the Arms of a trial may be quite different from each other. In such a case, changes in treatment in one Arm may differ in number and timing from changes in treatment in another Arm, so that there is nothing like a one-to-one match between the Elements in the different Arms. In such a case, Epochs are likely to be defined as broad intervals of time, spanning several Elements, and be chosen to correspond to periods of time that will be compared in analyses of the trial.

Example Trial 7, RTOG 93-09, involves treatment of lung cancer with chemotherapy and radiotherapy, with or without surgery. The protocol [RTOG-93-09], which is available online at the Radiation Therapy Oncology Group [RTOG] website, www.rtog.org, does not include a study schema diagram, but does include a text-based representation of diverging "options" to which a subject may be assigned. All subjects go through the branch point at randomization, when subjects are assigned to either Chemotherapy + Radiotherapy [CR] or Chemotherapy + Radiotherapy + Surgery [CRS]. All subjects receive induction chemotherapy and radiation, with a slight difference between those randomized to the two Arms during the second cycle of chemotherapy. Those randomized to the non-surgery Arm are evaluated for disease somewhat earlier, to avoid delays in administering the radiation boost to those whose disease has not progressed. After induction chemotherapy and radiation, subjects are evaluated for disease progression, and those whose disease has progressed stop treatment, but enter follow-up. Not all subjects randomized to receive surgery who do not have disease progression will necessarily receive surgery. If they are poor candidates for surgery or do not wish to receive surgery, they will not receive surgery, but will receive further chemotherapy. The diagram below is based on the text "schema" in the protocol, with the five "options" it names. The diagram in this form might suggest that the trial has five Arms.

Example Trial 7, RTOG 93-09 Study Schema with 5 "options"

*Disease evaluation earlier **Disease evaluation later

However, the objectives of the trial make it clear that this trial is designed to compare two treatment strategies, chemotherapy and radiation with and without surgery, so this study is better modeled as a two-Arm trial, but with major "skip forward" arrows for some subjects, as illustrated in the following diagram. This diagram also shows more detail within the blocks labeled "Induction Chemo + RT" and "Additional Chemo" than the diagram above. Both the "induction" and "additional" chemotherapy are given in two cycles. Also, the second induction cycle is different for the two Arms, since radiation therapy for those assigned to the non-surgery arm includes a "boost", which those assigned to the surgery Arm do not receive.

The next diagram shows the prospective view of this trial. The protocol conceives of treatment as being divided into two parts, Induction and Continuation, so these have been treated as two different Epochs. The boundary between them is also an important point in the trial operationally: it is the point at which subjects are "registered" a second time, and at which subjects who will "skip forward" because of disease progression or ineligibility for surgery are identified.

Example Trial 7, RTOG-93-09 Prospective View

*Disease evaluation earlier **Disease evaluation later

The next diagram shows the retrospective view of this trial. The fact that the Elements in the study cell for the CR Arm in the Continuation Treatment Epoch do not fill the space in the diagram is an artifact of the diagram conventions. Those subjects who do receive surgery will in fact spend a longer time completing treatment and moving into follow-up. Although it is tempting to think of the horizontal axis of these diagrams as a timeline, this can sometimes be misleading. The diagrams are not necessarily "to scale" in the sense that the length of the block representing an Element represents its duration, and elements that line up on the same vertical line in the diagram may not occur at the same relative time within the study.

Example Trial 7, RTOG 93-09 Retrospective View

*Disease evaluation earlier **Disease evaluation later

The Trial Design Matrix for Example Trial 7, RTOG 93-09, modeled as a two-Arm trial, is shown in the following table.


| Arm | Screen | Induction | Continuation | Follow-up |
|---|---|---|---|---|
| CR | Screen | Initial Chemo + RT, Chemo + RT [non-Surgery] | Chemo, Chemo | Off Treatment Follow-up |
| CRS | Screen | Initial Chemo + RT, Chemo + RT [Surgery] | 3-5 w Rest, Surgery, 4-6 w Rest, Chemo, Chemo | Off Treatment Follow-up |

The Trial Arms dataset for Example Trial 7, modeled as a two-Arm trial, is shown below.

ta.xpt

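| Row | STUDYID | DOMAIN | ARMCD | ARM | TAETORD | ETCD | ELEMENT | TABRANCH | TATRANS | EPOCH |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EX7 | TA | 1 | CR | 1 | SCRN | Screen | Randomized to CR | | SCREENING |
| 2 | EX7 | TA | 1 | CR | 2 | ICR | Initial Chemo + RT | | | INDUCTION TREATMENT |
| 3 | EX7 | TA | 1 | CR | 3 | CRNS | Chemo+RT [non-Surgery] | | If progression, skip to Follow-up. | INDUCTION TREATMENT |
| 4 | EX7 | TA | 1 | CR | 4 | C | Chemo | | | CONTINUATION TREATMENT |
| 5 | EX7 | TA | 1 | CR | 5 | C | Chemo | | | CONTINUATION TREATMENT |
| 6 | EX7 | TA | 1 | CR | 6 | FU | Off Treatment Follow-up | | | FOLLOW-UP |
| 7 | EX7 | TA | 2 | CRS | 1 | SCRN | Screen | Randomized to CRS | | SCREENING |
| 8 | EX7 | TA | 2 | CRS | 2 | ICR | Initial Chemo + RT | | | INDUCTION TREATMENT |
| 9 | EX7 | TA | 2 | CRS | 3 | CRS | Chemo+RT [Surgery] | | If progression, skip to Follow-up. If no progression, but subject is ineligible for or does not consent to surgery, skip to Chemo. | INDUCTION TREATMENT |
| 10 | EX7 | TA | 2 | CRS | 4 | R3 | 3-5 week rest | | | CONTINUATION TREATMENT |
| 11 | EX7 | TA | 2 | CRS | 5 | SURG | Surgery | | | CONTINUATION TREATMENT |
| 12 | EX7 | TA | 2 | CRS | 6 | R4 | 4-6 week rest | | | CONTINUATION TREATMENT |
| 13 | EX7 | TA | 2 | CRS | 7 | C | Chemo | | | CONTINUATION TREATMENT |
| 14 | EX7 | TA | 2 | CRS | 8 | C | Chemo | | | CONTINUATION TREATMENT |
| 15 | EX7 | TA | 2 | CRS | 9 | FU | Off Treatment Follow-up | | | FOLLOW-UP |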

7.2.1.1 Trial Arms Issues

1. Distinguishing Between Branches and Transitions

Both the Branch and Transition columns contain rules, but the two columns represent two different types of rules. Branch rules represent forks in the trial flowchart, giving rise to separate Arms. The rule underlying a branch in the trial design appears in multiple records, once for each "fork" of the branch. Within any one record, there is no choice [no "if" clause] in the value of the Branch condition. For example, the value of TABRANCH for a record in Arm A is "Randomized to Arm A" because a subject in Arm A must have been randomized to Arm A. Transition rules are used for choices within an Arm. The value for TATRANS does contain a choice [an "if" clause]. In Example Trial 4, subjects who receive 1, 2, 3, or 4 cycles of Treatment A are all considered to belong to Arm A.

In modeling a trial, decisions may have to be made about whether a decision point in the flow chart represents the separation of paths that represent different Arms, or paths that represent variations within the same Arm, as illustrated in the discussion of Example Trial 7. This decision will depend on the comparisons of interest in the trial.

Some trials refer to groups of subjects who follow a particular path through the trial as "cohorts", particularly if the groups are formed successively over time. The term "cohort" is used with different meanings in different protocols and does not always correspond to an Arm.

2. Subjects Not Assigned to an Arm

Some trial subjects may drop out of the study before they reach all of the branch points in the trial design. In the Demographics domain, values of ARM and ARMCD must be supplied for such subjects, but the special values used for these subjects should not be included in the Trial Arms dataset; only complete Arm paths should be described in the Trial Arms dataset. DM Assumption 4 describes special ARM and ARMCD values used for subjects who do not reach the first branch point in a trial. When a trial design includes two or more branches, special values of ARM and ARMCD may be needed for subjects who pass through the first branch point, but drop out before the final branch point. See DM Example 7 for an example showing ARM and ARMCD values for such a trial.

3. Defining Epochs

The series of examples for the Trial Arms dataset provides a variety of scenarios and guidance about how to assign Epoch in those scenarios. In general, assigning Epochs for blinded trials is easier than for unblinded trials. The blinded view of the trial will generally make the possible choices clear. For unblinded trials, the comparisons that will be made between Arms can guide the definition of Epochs. For trials that include many variant paths within an Arm, comparisons of Arms will mean that subjects on a variety of paths will be included in the comparison, and this is likely to lead to definition of broader Epochs.

4. Rule Variables

The Branch and Transition columns shown in the example tables are variables with a Role of "Rule." The values of a Rule variable describe conditions under which something is planned to happen. At the moment, values of Rule variables are text. At some point in the future, it is expected that these will become executable code. Other Rule variables are present in the Trial Elements and Trial Visits datasets.

7.2.2 Trial Elements

TE – Description/Overview

A trial design domain that contains the element code that is unique for each element, the element description, and the rules for starting and ending an element.

The Trial Elements [TE] dataset contains the definitions of the Elements that appear in the Trial Arms [TA] dataset. An Element may appear multiple times in the Trial Arms table because it appears either [1] in multiple Arms, [2] multiple times within an Arm, or [3] both. However, an Element will appear only once in the Trial Elements table.

Each row in the TE dataset may be thought of as representing a "unique Element" in the same sense of "unique" in which a case report form template page for collecting a certain type of data is often referred to as a "unique page." For instance, a case report form might be described as containing 87 pages, but only 23 unique pages. By analogy, the trial design matrix for Example Trial 1 has nine Study Cells, each of which contains one Element, but the same trial design matrix contains only five unique Elements, so the Trial Elements dataset for that trial has only five records.
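The same counting can be illustrated with Example Trial 2, whose Trial Arms dataset appears earlier in this section. The sketch below is illustrative only: the variable names are assumptions, and the Element sequences are transcribed from that dataset.

```python
# ETCD sequences for each Arm of Example Trial 2, in TAETORD order.
ARM_SEQUENCES = {
    "P-5-10": ["SCRN", "P", "REST", "5", "REST", "10", "FU"],
    "5-P-10": ["SCRN", "5", "REST", "P", "REST", "10", "FU"],
    "5-10-P": ["SCRN", "5", "REST", "10", "REST", "P", "FU"],
}

# One entry per Trial Arms record.
ta_etcds = [etcd for seq in ARM_SEQUENCES.values() for etcd in seq]

# dict.fromkeys removes duplicates while keeping first-seen order,
# giving one entry per Trial Elements record.
unique_etcds = list(dict.fromkeys(ta_etcds))

print(len(ta_etcds), "TA records,", len(unique_etcds), "unique Elements")
```

The 21 Trial Arms records reduce to six unique Elements, which is the number of records in the Trial Elements dataset for Example Trial 2.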

An Element is a building block for creating Study Cells, and an Arm is composed of Study Cells. From another point of view, an Arm is composed of Elements: the trial design assigns subjects to Arms, each of which consists of a sequence of steps called Elements.

A Trial Element represents an interval of time that serves a purpose in the trial and is associated with certain activities affecting the subject. "Week 2 to Week 4" is not a valid Element, because it names a span of time rather than a purpose. A valid Element has a name that describes the purpose of the Element and includes a description of the activity or event that marks the subject's transition into the Element as well as the conditions for leaving the Element.

TE – Specification

te.xpt, Trial Elements — Trial Design, Version 3.2. One record per planned Element, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | TE | Identifier | Two-character abbreviation for the domain. | Req |
| ETCD | Element Code | Char | * | Topic | ETCD [the companion to ELEMENT] is limited to 8 characters and does not have special character restrictions. These values should be short for ease of use in programming, but it is not expected that ETCD will need to serve as a variable name. | Req |
| ELEMENT | Description of Element | Char | * | Synonym Qualifier | The name of the Element. | Req |
| TESTRL | Rule for Start of Element | Char | | Rule | Expresses rule for beginning Element. | Req |
| TEENRL | Rule for End of Element | Char | | Rule | Expresses rule for ending Element. Either TEENRL or TEDUR must be present for each Element. | Perm |
| TEDUR | Planned Duration of Element | Char | ISO 8601 | Timing | Planned Duration of Element in ISO 8601 format. Used when the rule for ending the Element is applied after a fixed duration. | Perm |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parentheses].

TE – Assumptions

  1. There are no gaps between Elements. The instant one Element ends, the next Element begins. A subject spends no time "between" Elements.
  2. ELEMENT, the Description of the Element, usually indicates the treatment being administered during an Element, or, if no treatment is being administered, the other activities that are the purpose of this period of time, such as Screening, Follow-up, Washout. In some cases, this may be quite passive, such as Rest, or Wait [for disease episode].
  3. TESTRL, the Rule for Start of Element, identifies the event that marks the transition into this Element. For Elements that involve treatment, this is the start of treatment.
  4. For Elements that do not involve treatment, TESTRL can be more difficult to define. For washout and follow-up Elements, which always follow treatment Elements, the start of the Element may be defined relative to the end of a preceding treatment. For example, a washout period might be defined as starting 24 or 48 hours after the last dose of drug for the preceding treatment Element or Epoch. This definition is not totally independent of the Trial Arms dataset, since it relies on knowing where in the trial design the Element is used, and that it always follows a treatment Element. Defining a clear starting point for the start of a non-treatment Element that always follows another non-treatment Element can be particularly difficult. The transition may be defined by a decision-making activity such as enrollment or randomization. For example, every Arm of a trial that involves treating disease episodes might start with a screening Element followed by an Element that consists of waiting until a disease episode occurs. The activity that marks the beginning of the wait Element might be randomization.
  5. TESTRL for a treatment Element may be thought of as "active" while the start rule for a non-treatment Element, particularly a follow-up or washout Element, may be "passive." The start of a treatment Element will not occur until a dose is given, no matter how long that dose is delayed. Once the last dose is given, the start of a subsequent non-treatment Element is inevitable, as long as another dose is not given.
  6. Note that the date/time of the event described in TESTRL will be used to populate the date/times in the Subject Elements dataset, so the date/time of the event should be one that will be captured in the CRF.
  7. Specifying TESTRL for an Element that serves as the first Element of an Arm in the Trial Arms dataset involves defining the start of the trial. In the examples in this document, obtaining informed consent has been used as "Trial Entry."
  8. TESTRL should be expressed without referring to Arm. If the Element appears in more than one Arm in the Trial Arms dataset, then the Element description [ELEMENT] must not refer to any Arms.
  9. TESTRL should be expressed without referring to Epoch. If the Element appears in more than one Epoch in the Trial Arms dataset, then the Element description [ELEMENT] must not refer to any Epochs.
  10. For a blinded trial, it is useful to describe TESTRL in terms that separate the properties of the event that are visible to blinded participants from the properties that are visible only to those who are unblinded. For treatment Elements in blinded trials, wording such as the following is suitable, "First dose of study drug for a treatment Epoch, where study drug is X."
  11. Element end rules are rather different from Element start rules. The actual end of one Element is the beginning of the next Element. Thus the Element end rule does not give the conditions under which an Element does end, but the conditions under which it should end or is planned to end.
  12. At least one of TEENRL and TEDUR must be populated. Both may be populated.
  13. TEENRL describes the circumstances under which a subject should leave this Element. Element end rules may depend on a variety of conditions. For instance, a typical criterion for ending a rest Element between oncology chemotherapy-treatment Elements would be, "15 days after start of Element and after WBC values have recovered." The Trial Arms dataset, not the Trial Elements dataset, describes where the subject moves next, so TEENRL must be expressed without referring to Arm.
  14. TEDUR serves the same purpose as TEENRL for the special [but very common] case of an Element with a fixed duration. TEDUR is expressed in ISO 8601. For example, a TEDUR value of P6W is equivalent to a TEENRL of "6 weeks after the start of the Element."
  15. Note that Elements that have different start and end rules are different Elements and must have different values of ELEMENT and ETCD. For instance, Elements that involve the same treatment but have different durations are different Elements. The same applies to non-treatment Elements. For instance, a washout with a fixed duration of 14 days is different from a washout that is to end after 7 days if drug cannot be detected in a blood sample, or after 14 days otherwise.
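Two of the assumptions above lend themselves to a mechanical check: that at least one of TEENRL and TEDUR is populated, and that a TEDUR value such as P6W denotes a fixed duration. The following is a minimal sketch, not a conformance tool. The function names `parse_tedur` and `check_te_record` are hypothetical, and the parser deliberately handles only the simple forms PnW, PnD, and PTnH rather than the full ISO 8601 duration grammar.

```python
import re
from datetime import timedelta

# Simplified pattern: whole weeks, whole days, or whole hours only.
_TEDUR_PATTERN = re.compile(
    r"P(?:(?P<weeks>\d+)W|(?P<days>\d+)D|T(?P<hours>\d+)H)"
)


def parse_tedur(value):
    """Convert a simple ISO 8601 duration such as 'P6W' to a timedelta."""
    match = _TEDUR_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**parts)


def check_te_record(record):
    """Return a list of problems found in one Trial Elements record."""
    problems = []
    if not record.get("TESTRL"):
        problems.append("TESTRL is required")
    if not record.get("TEENRL") and not record.get("TEDUR"):
        problems.append("at least one of TEENRL and TEDUR must be populated")
    return problems


# P6W corresponds to "6 weeks after the start of the Element".
six_weeks = parse_tedur("P6W")

# A record with an end rule but no fixed duration passes the check.
rest = {"ETCD": "REST", "ELEMENT": "Rest",
        "TESTRL": "Last dose of previous treatment cycle + 24 hrs",
        "TEENRL": "At least 16 days after start of Element and WBC recovered",
        "TEDUR": ""}
rest_problems = check_te_record(rest)
```

A record that carries neither TEENRL nor TEDUR would be reported with one problem, while durations outside the three supported forms raise `ValueError` rather than being guessed at.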

TE – Examples

Below are Trial Elements datasets for Example Trials 1 and 2. Both of these trials are assumed to have fixed-duration Elements. The wording in TESTRL is intended to separate the description of the event that starts the Element into the part that would be visible to a blinded participant in the trial [e.g., "First dose of a treatment Epoch"] from the part that is revealed when the study is unblinded [e.g., "where dose is 5 mg"]. Care must be taken in choosing these descriptions to be sure that they are "Arm and Epoch neutral." For instance, in a crossover trial such as Example Trial 2, where an Element may appear in one of multiple Epochs, the wording must be appropriate for all the possible Epochs. The wording for Example Trial 2 uses the phrase "a treatment Epoch." The SDS Team is considering adding a separate variable to the Trial Elements dataset that would hold information on the treatment that is associated with an Element. This would make it clearer which Elements are "treatment Elements", and therefore, which Epochs contain treatment Elements, and thus are "treatment Epochs".

Example

This example shows the TE dataset for Example Trial 1.

te.xpt

| Row | STUDYID | DOMAIN | ETCD | ELEMENT | TESTRL | TEENRL | TEDUR |
|---|---|---|---|---|---|---|---|
| 1 | EX1 | TE | SCRN | Screen | Informed consent | 1 week after start of Element | P7D |
| 2 | EX1 | TE | RI | Run-In | Eligibility confirmed | 2 weeks after start of Element | P14D |
| 3 | EX1 | TE | P | Placebo | First dose of study drug, where drug is placebo | 2 weeks after start of Element | P14D |
| 4 | EX1 | TE | A | Drug A | First dose of study drug, where drug is Drug A | 2 weeks after start of Element | P14D |
| 5 | EX1 | TE | B | Drug B | First dose of study drug, where drug is Drug B | 2 weeks after start of Element | P14D |

Example

This example shows the TE dataset for Example Trial 2.

te.xpt

| Row | STUDYID | DOMAIN | ETCD | ELEMENT | TESTRL | TEENRL | TEDUR |
|---|---|---|---|---|---|---|---|
| 1 | EX2 | TE | SCRN | Screen | Informed consent | 2 weeks after start of Element | P14D |
| 2 | EX2 | TE | P | Placebo | First dose of a treatment Epoch, where dose is placebo | 2 weeks after start of Element | P14D |
| 3 | EX2 | TE | 5 | 5 mg | First dose of a treatment Epoch, where dose is 5 mg drug | 2 weeks after start of Element | P14D |
| 4 | EX2 | TE | 10 | 10 mg | First dose of a treatment Epoch, where dose is 10 mg drug | 2 weeks after start of Element | P14D |
| 5 | EX2 | TE | REST | Rest | 48 hrs after last dose of preceding treatment Epoch | 1 week after start of Element | P7D |
| 6 | EX2 | TE | FU | Follow-up | 48 hrs after last dose of third treatment Epoch | 3 weeks after start of Element | P21D |

Example

The Trial Elements dataset for Example Trial 4 illustrates Element end rules for Elements that are not all of fixed duration. The Screen Element in this study can be up to 2 weeks long, but may end earlier, so is not of fixed duration. The Rest Element has a variable length, depending on how quickly WBC recovers. Note that the start rules for the A and B Elements have been written to be suitable for a blinded study.

te.xpt

| Row | STUDYID | DOMAIN | ETCD | ELEMENT | TESTRL | TEENRL | TEDUR |
|---|---|---|---|---|---|---|---|
| 1 | EX4 | TE | SCRN | Screen | Informed Consent | Screening assessments are complete, up to 2 weeks after start of Element | |
| 2 | EX4 | TE | A | Trt A | First dose of treatment Element, where drug is Treatment A | 5 days after start of Element | P5D |
| 3 | EX4 | TE | B | Trt B | First dose of treatment Element, where drug is Treatment B | 5 days after start of Element | P5D |
| 4 | EX4 | TE | REST | Rest | Last dose of previous treatment cycle + 24 hrs | At least 16 days after start of Element and WBC recovered | |
| 5 | EX4 | TE | FU | Follow-up | Decision not to treat further | 4 weeks | P28D |

7.2.2.1 Trial Elements Issues

1. Granularity of Trial Elements

Deciding how finely to divide trial time when identifying trial Elements is a matter of judgment, as illustrated by the following examples:

  1. Example Trial 2 was represented using three treatment Epochs separated by two washout Epochs and followed by a follow-up Epoch. It might have been modeled using three treatment Epochs that included both the 2-week treatment period and the 1-week rest period. Since the first week after the third treatment period would be included in the third treatment Epoch, the Follow-up Epoch would then have a duration of 2 weeks.
  2. In Example Trials 4, 5, and 6, separate Treatment and Rest Elements were identified. However, the combination of treatment and rest could be represented as a single Element.
  3. A trial might include a dose titration, with subjects receiving increasing doses on a weekly basis until certain conditions are met. The trial design could be modeled in any of the following ways:
    • Using several one-week Elements at specific doses, followed by an Element of variable length at the chosen dose
    • Using a titration Element of variable length, followed by a constant-dosing Element of variable length
    • Using one Element with dosing determined by titration

    The choice of Elements used to represent this dose titration will depend on the objectives of the trial and how the data will be analyzed and reported. If it is important to examine side effects or lab values at each individual dose, the first model is appropriate. If it is important only to identify the time to completion of titration, the second model might be appropriate. If the titration process is routine and is of little interest, the third model might be adequate for the purposes of the trial.

2. Distinguishing Elements, Study Cells, and Epochs

It is easy to confuse Elements, which are reusable trial building blocks, with Study Cells, which contain the Elements for a particular Epoch and Arm, and with Epochs, which are time periods for the trial as a whole. In part, this is because many trials have Epochs for which the same Element appears in all Arms. In other words, in the trial design matrix for many trials, there are columns [Epochs] in which all the Study Cells have the same contents. Furthermore, it is natural to use the same name [e.g., Screen or Follow-up] for both such an Epoch and the single Element that appears within it.

Confusion can also arise from the fact that, in the blinded treatment portions of blinded trials, blinded participants do not know which Element a subject is in, but do know what Epoch the subject is in.

In describing a trial, one way to avoid confusion between Elements and Epochs is to include "Element" or "Epoch" in the values of ELEMENT or EPOCH when these values [such as Screening or Follow-up] would otherwise be the same. It becomes tedious to do this in every case, but can be useful to resolve confusion when it arises or is likely to arise.

The difference between Epoch and Element is perhaps clearest in crossover trials. In Example Trial 2, as for most crossover trials, the analysis of PK results would include both treatment and period effects in the model. "Treatment effect" derives from Element [Placebo, 5 mg, or 10 mg], while "Period effect" derives from the Epoch [1st, 2nd, or 3rd Treatment Epoch].

3. Transitions Between Elements

The transition between one Element and the next can be thought of as a three-step process:

| Step Number | Step Question | How step question is answered by information in the Trial Design datasets |
|---|---|---|
| 1 | Should the subject leave the current Element? | Criteria for ending the current Element are in TEENRL in the TE dataset. |
| 2 | Which Element should the subject enter next? | If there is a branch point at this point in the trial, evaluate criteria described in TABRANCH [e.g., randomization results] in the TA dataset; otherwise, if TATRANS in the TA dataset is populated in this Arm at this point, follow those instructions; otherwise, move to the next Element in this Arm as specified by TAETORD in the TA dataset. |
| 3 | What does the subject do to enter the next Element? | The action or event that marks the start of the next Element is specified in TESTRL in the TE dataset. |

Note that the subject is not "in limbo" during this process. The subject remains in the current Element until Step 3, at which point the subject transitions to the new Element. There are no gaps between Elements.

From this table, it is clear that executing a transition depends on information that is split between the Trial Elements and the Trial Arms datasets.
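The Step 2 decision order can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, each Trial Arms record is assumed to be a dictionary, and the rows are abridged from the Example Trial 4 fragment for Arm A given in this section.

```python
def step2_rule(row):
    """Return (variable, instruction) naming what governs Step 2 for this TA row."""
    if row.get("TABRANCH"):
        return ("TABRANCH", row["TABRANCH"])
    if row.get("TATRANS"):
        return ("TATRANS", row["TATRANS"])
    return ("TAETORD", "move to the next Element in sequence for the Arm")

def default_next_element(ta_rows, arm, taetord):
    """Return the ELEMENT with the next-higher TAETORD in the Arm, or None at the end."""
    later = [r for r in ta_rows if r["ARM"] == arm and r["TAETORD"] > taetord]
    return min(later, key=lambda r: r["TAETORD"])["ELEMENT"] if later else None

# Abridged Arm A rows; blank strings stand for unpopulated variables.
ta_rows = [
    {"ARM": "A", "TAETORD": 1, "ELEMENT": "Screen",
     "TABRANCH": "Randomized to A", "TATRANS": ""},
    {"ARM": "A", "TAETORD": 2, "ELEMENT": "Trt A", "TABRANCH": "", "TATRANS": ""},
    {"ARM": "A", "TAETORD": 3, "ELEMENT": "Rest", "TABRANCH": "",
     "TATRANS": "If disease progression, go to Follow-up Epoch"},
    {"ARM": "A", "TAETORD": 4, "ELEMENT": "Trt A", "TABRANCH": "", "TATRANS": ""},
]
```

The free-text TABRANCH and TATRANS values still have to be interpreted by a person. The sketch only shows which variable is consulted first, and the fallback to TAETORD order when both are blank.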

It can be useful, in the process of working out the Trial Design datasets, to create a dataset that supplements the Trial Arms dataset with the TESTRL, TEENRL, and TEDUR variables, so that full information on the transitions is easily accessible. However, such a working dataset is not an SDTM dataset, and should not be submitted.

The following table shows a fragment of such a table for Example Trial 4. Note that for all records that contain a particular Element, all the TE variable values are exactly the same. Also, note that when both TABRANCH and TATRANS are blank, the implicit decision in Step 2 is that the subject moves to the next Element in sequence for the Arm.

ta.xpt

| Row | ARM | EPOCH | TAETORD | ELEMENT | TESTRL | TEENRL | TEDUR | TABRANCH | TATRANS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A | Screen | 1 | Screen | Informed Consent | Screening assessments are complete, up to 2 weeks after start of Element | | Randomized to A | |
| 2 | A | Treatment | 2 | Trt A | First dose of treatment in Element, where drug is Treatment A | 5 days after start of Element | P5D | | |
| 3 | A | Treatment | 3 | Rest | Last dose of previous treatment cycle + 24 hrs | 16 days after start of Element and WBC recovers | | | If disease progression, go to Follow-up Epoch |
| 4 | A | Treatment | 4 | Trt A | First dose of treatment in Element, where drug is Treatment A | 5 days after start of Element | P5D | | |

Note that both the second and fourth rows of this dataset involve the same Element, Trt A, and so TESTRL is the same for both. The activity that marks a subject's entry into the fourth Element in Arm A is "First dose of treatment Element, where drug is Treatment A." This is not the subject's very first dose of Treatment A, but it is their first dose in this Element.
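A working dataset of this kind could be assembled by looking up each Trial Arms record's Element in the Trial Elements dataset. The following is a minimal sketch, assuming both datasets are lists of dictionaries and that ETCD is the matching key; the function name and the abridged rows are illustrative only.

```python
TE_VARIABLES = ("TESTRL", "TEENRL", "TEDUR")

def build_working_rows(ta_rows, te_rows):
    """Attach the TE rule variables to each TA row, matching on ETCD."""
    te_by_etcd = {te["ETCD"]: te for te in te_rows}
    working = []
    for ta in ta_rows:
        te = te_by_etcd[ta["ETCD"]]  # KeyError signals an Element missing from TE
        working.append({**ta, **{name: te[name] for name in TE_VARIABLES}})
    return working

te_rows = [
    {"ETCD": "A", "ELEMENT": "Trt A",
     "TESTRL": "First dose of treatment Element, where drug is Treatment A",
     "TEENRL": "5 days after start of Element", "TEDUR": "P5D"},
    {"ETCD": "REST", "ELEMENT": "Rest",
     "TESTRL": "Last dose of previous treatment cycle + 24 hrs",
     "TEENRL": "At least 16 days after start of Element and WBC recovered",
     "TEDUR": ""},
]
ta_rows = [
    {"ARMCD": "A", "TAETORD": 2, "ETCD": "A"},
    {"ARMCD": "A", "TAETORD": 3, "ETCD": "REST"},
    {"ARMCD": "A", "TAETORD": 4, "ETCD": "A"},
]
working = build_working_rows(ta_rows, te_rows)
```

Because the TE values are copied from a single lookup, every record for a given Element carries identical TESTRL, TEENRL, and TEDUR values, as the note above requires.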

7.3 Schedule for Assessments [TV, TD, and TM]

This subsection contains the Trial Design datasets that describe:

  • The protocol-defined planned schedule of subject encounters at the healthcare facility where the study is being conducted: Section 7.3.1, Trial Visits [TV]
  • The planned schedule of efficacy assessments related to the disease under study: Section 7.3.2, Trial Disease Assessments [TD]
  • The things [events, interventions, or findings] which, if and when they happen, are the occasion for assessments planned in the protocol: Section 7.3.3, Trial Disease Milestones [TM]

The TV and TD datasets provide the planned scheduling of assessments to which a subject's actual visits and disease assessments can be compared.

7.3.1 Trial Visits

TV – Description/Overview

A trial design domain that contains the planned order and number of visits in the study within each arm.

Visits are defined as "clinical encounters" and are described using the timing variables VISIT, VISITNUM, and VISITDY.

Protocols define Visits in order to describe assessments and procedures that are to be performed at the Visits.

TV – Specification

tv.xpt, Trial Visits — Trial Design, Version 3.2. One record per planned Visit per Arm, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | TV | Identifier | Two-character abbreviation for the domain. | Req |
| VISITNUM | Visit Number | Num | | Topic | 1. Clinical encounter number. 2. Numeric version of VISIT, used for sorting. | Req |
| VISIT | Visit Name | Char | | Synonym Qualifier | 1. Protocol-defined description of clinical encounter. 2. May be used in addition to VISITNUM and/or VISITDY as a text description of the clinical encounter. | Perm |
| VISITDY | Planned Study Day of Visit | Num | | Timing | 1. Planned study day of VISIT. 2. Due to its sequential nature, used for sorting. | Perm |
| ARMCD | Planned Arm Code | Char | * | Record Qualifier | 1. ARMCD is limited to 20 characters and does not have special character restrictions. The maximum length of ARMCD is longer than for other "short" variables to accommodate the kind of values that are likely to be needed for crossover trials. For example, if ARMCD values for a seven-period crossover were constructed using two-character abbreviations for each treatment and separating hyphens, the length of ARMCD values would be 20. 2. If the timing of Visits for a trial does not depend on which Arm a subject is in, then ARMCD should be null. | Exp |
| ARM | Description of Planned Arm | Char | * | Synonym Qualifier | 1. Name given to an Arm or Treatment Group. 2. If the timing of Visits for a trial does not depend on which Arm a subject is in, then Arm should be left blank. | Perm |
| TVSTRL | Visit Start Rule | Char | | Rule | Rule describing when the Visit starts, in relation to the sequence of Elements. | Req |
| TVENRL | Visit End Rule | Char | | Rule | Rule describing when the Visit ends, in relation to the sequence of Elements. | Perm |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parenthesis].
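The length of 20 quoted in the ARMCD notes follows from simple counting: seven two-character codes plus six separating hyphens. A quick check, using made-up treatment codes:

```python
# Seven two-character treatment codes; the codes themselves are made up.
treatments = ["AA", "BB", "CC", "DD", "EE", "FF", "GG"]
armcd = "-".join(treatments)

# 7 codes x 2 characters + 6 separating hyphens = 20 characters.
armcd_length = len(armcd)
```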

TV – Assumptions

  1. Although the general structure of the Trial Visits dataset is "One Record per Planned Visit per Arm", for many clinical trials, particularly blinded clinical trials, the schedule of Visits is the same for all Arms, and the structure of the Trial Visits dataset will be "One Record per Planned Visit". If the schedule of Visits is the same for all Arms, ARMCD should be left blank for all records in the TV dataset. For trials with trial Visits that are different for different Arms, such as Example Trial 7 [see Trial Arms [TA], under section 7.2, Experimental Design [TA and TE]], ARMCD and ARM should be populated for all records. If some Visits are the same for all Arms, and some Visits differ by Arm, then ARMCD and ARM should be populated for all records, to assure clarity, even though this will mean creating near-duplicate records for Visits that are the same for all Arms.
  2. A Visit may start in one Element and end in another. This means that a Visit may start in one Epoch and end in another. For example, if one of the activities planned for a Visit is the administration of the first dose of study drug, the Visit might start in the screen Epoch, and end in a treatment Epoch.
  3. TVSTRL describes the scheduling of the Visit and should reflect the wording in the protocol. In many trials, all Visits are scheduled relative to the study's Day 1, RFSTDTC. In such trials, it is useful to include VISITDY, which is, in effect, a special case representation of TVSTRL.
  4. Note that there is a subtle difference between the following two examples. In the first case, if Visit 3 were delayed for some reason, Visit 4 would be unaffected. In the second case, a delay to Visit 3 would result in Visit 4 being delayed as well.
    1. Case 1: Visit 3 starts 2 weeks after RFSTDTC. Visit 4 starts 4 weeks after RFSTDTC.
    2. Case 2: Visit 3 starts 2 weeks after RFSTDTC. Visit 4 starts 2 weeks after Visit 3.
  5. Many protocols do not give any information about Visit ends because Visits are assumed to end on the same day they start. In such a case, TVENRL may be left blank to indicate that the Visit ends on the same day it starts. Care should be taken to assure that this is appropriate, since common practice may be to record data collected over more than one day as occurring within a single Visit. Screening Visits may be particularly prone to collection of data over multiple days. The examples for this domain show how TVENRL could be populated.
  6. The values of VISITNUM in the TV dataset are the valid values of VISITNUM for planned Visits. Any values of VISITNUM that appear in subject-level datasets that are not in the TV dataset are assumed to correspond to unplanned Visits. This applies, in particular, to the subject-level Subject Visits [SV] dataset; see SV under Section 5, Models for Special Purpose Domains, for additional information about handling unplanned Visits. If a subject-level dataset includes both VISITNUM and VISIT, then records that include values of VISITNUM that appear in the TV dataset should also include the corresponding values of VISIT from the TV dataset.
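The difference between Case 1 and Case 2 in assumption 4 can be made concrete with dates. The sketch below is illustrative only: the function name and the reference date are hypothetical, and RFSTDTC is simplified to a calendar date.

```python
from datetime import date, timedelta

def planned_visit4(rfstdtc, visit3_actual, relative_to_visit3):
    """Return the planned Visit 4 date under Case 2 (True) or Case 1 (False)."""
    if relative_to_visit3:
        return visit3_actual + timedelta(weeks=2)  # Case 2: anchored to Visit 3
    return rfstdtc + timedelta(weeks=4)            # Case 1: anchored to RFSTDTC

rfstdtc = date(2024, 1, 1)                         # illustrative reference start date
visit3_on_time = rfstdtc + timedelta(weeks=2)
visit3_delayed = visit3_on_time + timedelta(days=3)

case1 = planned_visit4(rfstdtc, visit3_delayed, relative_to_visit3=False)
case2 = planned_visit4(rfstdtc, visit3_delayed, relative_to_visit3=True)
```

With Visit 3 delayed by three days, the Case 1 date for Visit 4 is unchanged, while the Case 2 date moves by the same three days.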

TV – Examples

Example

The diagram below shows Visits by means of numbered "flags" with Visit Numbers. Each "flag" has two supports, one at the beginning of the Visit, the other at the end of the Visit. Note that Visits 2 and 3 span Epoch transitions. In other words, the transition event that marks the beginning of the Run-in Epoch [confirmation of eligibility] occurs during Visit 2, and the transition event that marks the beginning of the Treatment Epoch [the first dose of study drug] occurs during Visit 3.

Example Trial 1, Parallel Design Planned Visits

Two Trial Visits datasets are shown for this trial. The first shows a somewhat idealized situation, where the protocol has given specific timings for the Visits. The second shows a more usual situation, where the timings have been described only loosely.

tv.xpt

| Row | STUDYID | DOMAIN | VISITNUM | TVSTRL | TVENRL |
|---|---|---|---|---|---|
| 1 | EX1 | TV | 1 | Start of Screen Epoch | 1 hour after start of Visit |
| 2 | EX1 | TV | 2 | 30 minutes before end of Screen Epoch | 30 minutes after start of Run-in Epoch |
| 3 | EX1 | TV | 3 | 30 minutes before end of Run-in Epoch | 1 hour after start of Treatment Epoch |
| 4 | EX1 | TV | 4 | 1 week after start of Treatment Epoch | 1 hour after start of Visit |
| 5 | EX1 | TV | 5 | 2 weeks after start of Treatment Epoch | 1 hour after start of Visit |

tv.xpt

| Row | STUDYID | DOMAIN | VISITNUM | TVSTRL | TVENRL |
|---|---|---|---|---|---|
| 1 | EX1 | TV | 1 | Start of Screen Epoch | |
| 2 | EX1 | TV | 2 | On the same day as, but before, the end of the Screen Epoch | On the same day as, but after, the start of the Run-in Epoch |
| 3 | EX1 | TV | 3 | On the same day as, but before, the end of the Run-in Epoch | On the same day as, but after, the start of the Treatment Epoch |
| 4 | EX1 | TV | 4 | 1 week after start of Treatment Epoch | |
| 5 | EX1 | TV | 5 | 2 weeks after start of Treatment Epoch | At Trial Exit |

Although the start and end rules in this example reference the starts and ends of Epochs, the start and end rules of some Visits for trials with Epochs that span multiple Elements will need to reference Elements rather than Epochs. When an Arm includes repetitions of the same Element, it may be necessary to use TAETORD as well as an Element name to specify when a Visit is to occur.

7.3.1.1 Trial Visits Issues

1. Identifying Trial Visits

In general, a trial's Visits are defined in its protocol. The term "Visit" reflects the fact that data in outpatient studies is usually collected during a physical visit by the subject to a clinic. Sometimes a Trial Visit defined by the protocol may not correspond to a physical visit. It may span multiple physical visits, as when screening data may be collected over several clinic visits but recorded under one Trial Visit name [VISIT] and number [VISITNUM]. A Trial Visit may also represent only a portion of an extended physical visit, as when a trial of in-patients collects data under multiple Trial Visits for a single hospital admission.

Diary data and other data collected outside a clinic may not fit the usual concept of a Trial Visit, but the planned times of collection of such data may be described as "Visits" in the Trial Visits dataset if desired.

2. Trial Visit Rules

Visit start rules are different from Element start rules because they usually describe when a Visit should occur, while Element start rules describe the moment at which an Element is considered to start. There are usually gaps between Visits, periods of time that do not belong to any Visit, so it is usually not necessary to identify the moment when one Visit stops and another starts. However, some trials of hospitalized subjects may divide time into Visits in a manner more like that used for Elements, and a transition event may need to be defined in such cases.

Visit start rules are usually expressed relative to the start or end of an Element or Epoch, e.g., "1-2 hours before end of First Wash-out" or "8 weeks after end of 2nd Treatment Epoch". Note that the Visit may or may not occur during the Element used as the reference for Visit start rule. For example, a trial with Elements based on treatment of disease episodes might plan a Visit 6 months after the start of the first treatment period, regardless of how many disease episodes have occurred.

Visit end rules are similar to Element end rules, describing when a Visit should end. They may be expressed relative to the start or end of an Element or Epoch, or relative to the start of the Visit.

The timings of Visits relative to Elements may be expressed in terms that cannot be easily quantified. For instance, a protocol might instruct that at a baseline Visit the subject be randomized, given the study drug, and instructed to take the first dose of study Drug X at bedtime that night. This baseline Visit thus starts and ends before the start of the treatment Epoch, but we don't know how long before the start of the treatment Epoch the Visit will occur. The Visit start rule [TVSTRL] might contain the value, "On the day of, but before, the start of the Treatment Epoch."

3. Visit Schedules Expressed with Ranges

Ranges may be used to describe the planned timing of Visits [e.g., 12-16 days after the start of 2nd Element], but this is different from the "windows" that may be used in selecting data points to be included in an analysis associated with that Visit. For example, although Visit 2 was planned for 12-16 days after the start of treatment, data collected 10-18 days after the start of treatment might be included in a "Visit 2" analysis. The two ranges serve different purposes.

4. Contingent Visits

Some data collection is contingent on the occurrence of a "trigger" event, or disease milestone [see the Trial Disease Milestones [TM] dataset under Section 7.3, Schedule for Assessments [TV, TD, and TM]]. When such planned data collection involves an additional clinic visit, a "contingent" Visit may be included in the Trial Visits dataset, with a start rule that describes the circumstances under which it will take place. Since values of VISITNUM must be assigned to all records in the Trial Visits dataset, a contingent Visit included in the Trial Visits dataset must have a VISITNUM, but the VISITNUM value may not be a "chronological" value, due to the uncertain timing of a contingent Visit. If contingent Visits are not included in the TV dataset, then they would be treated as unplanned Visits in the Subject Visits [SV] domain.
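Whether a Visit counts as planned therefore comes down to whether its VISITNUM appears in the TV dataset. A minimal sketch of that comparison, assuming both datasets are lists of dictionaries (the function name, the subject identifier, and the VISITNUM value 2.1 are hypothetical):

```python
def unplanned_visitnums(tv_rows, sv_rows):
    """Return sorted VISITNUM values used in SV but absent from TV."""
    planned = {row["VISITNUM"] for row in tv_rows}
    return sorted({row["VISITNUM"] for row in sv_rows} - planned)

tv_rows = [{"VISITNUM": n} for n in (1, 2, 3, 4, 5)]
sv_rows = [
    {"USUBJID": "EX1-001", "VISITNUM": 1},
    {"USUBJID": "EX1-001", "VISITNUM": 2},
    {"USUBJID": "EX1-001", "VISITNUM": 2.1},  # hypothetical Visit not in TV
    {"USUBJID": "EX1-001", "VISITNUM": 3},
]
unplanned = unplanned_visitnums(tv_rows, sv_rows)
```

A contingent Visit that was given its own record in TV would not be reported by this check. One omitted from TV would be reported alongside any other unplanned Visit.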

7.3.2 Trial Disease Assessments

TD – Description/Overview

A trial design domain that provides information on the protocol-specified disease assessment schedule, to be used for comparison with the actual occurrence of the efficacy assessments in order to determine whether there was good compliance with the schedule.

TD – Specification

td.xpt, Trial Disease Assessments — Trial Design, Version 3.2. One record per planned constant assessment period, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | TD | Identifier | Two-character abbreviation for the domain. | Req |
| TDORDER | Sequence of Planned Assessment Schedule | Num | | Timing | A number given to ensure ordinal sequencing of the planned assessment schedules within a trial. | Req |
| TDANCVAR | Anchor Variable Name | Char | | Timing | A reference to the date variable name that provides the start point from which the planned disease assessment schedule is measured. This must be referenced from the ADaM ADSL dataset, e.g. "ANCH1DT". Note: TDANCVAR will contain the name of a reference date variable. | Req |
| TDSTOFF | Offset from the Anchor | Char | ISO 8601 | Timing | A fixed offset from the date provided by the variable referenced in TDANCVAR. This is used when the timing of planned cycles does not start on the exact day referenced in the variable indicated in TDANCVAR. The value of this variable will be either zero or a positive value and will be represented in ISO 8601 character format. | Req |
| TDTGTPAI | Planned Assessment Interval | Char | ISO 8601 | Timing | The planned interval between disease assessments represented in ISO 8601 character format. | Req |
| TDMINPAI | Planned Assessment Interval Minimum | Char | ISO 8601 | Timing | The lower limit of the allowed range for the planned interval between disease assessments represented in ISO 8601 character format. | Req |
| TDMAXPAI | Planned Assessment Interval Maximum | Char | ISO 8601 | Timing | The upper limit of the allowed range for the planned interval between disease assessments represented in ISO 8601 character format. | Req |
| TDNUMRPT | Maximum Number of Actual Assessments | Num | | Record Qualifier | This variable must represent the maximum number of actual assessments for the analysis that this disease assessment schedule describes. In a trial where the maximum number of assessments is not defined explicitly in the protocol [e.g., assessments occur until death], TDNUMRPT should represent the maximum number of disease assessments that support the efficacy analysis encountered by any subject across the trial at that point in time. | Req |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parenthesis].

TD – Assumptions

  1. The purpose of the TD domain is to provide information on planned scheduling of disease assessments when the scheduling of disease assessments is not necessarily tied to the scheduling of visits. In oncology studies, good compliance with the disease-assessment schedule is essential to reduce the risk of "assessment time bias". The TD domain makes possible an evaluation of "assessment time bias" from SDTM, in particular, for studies with progression-free survival [PFS] endpoints. TD has limited utility within oncology and was developed specifically with RECIST in mind and where an assessment time bias analysis is appropriate. It is understood that extending this approach to Cheson and other criteria may not be appropriate or may pose difficulties. It is also understood that this approach may not be necessary in non-oncology studies, although it is available for use if appropriate.
  2. A planned schedule of assessments will have a defined start point and the TDANCVAR variable is used to identify the variable in ADSL that holds the "anchor" date. By default, the anchor variable for the first pattern is ANCH1DT. An anchor date must be provided for each pattern of assessments and each anchor variable must exist in ADSL. TDANCVAR is therefore a Required variable. Anchor date variable names should adhere to ADaM variable naming conventions [e.g. ANCH1DT, ANCH2DT, etc]. One anchor date may be used to anchor more than one pattern of disease assessments. When that is the case, the appropriate offset for the start of a subsequent pattern, represented as an ISO 8601 duration value, should be provided in the TDSTOFF variable.
  3. The TDSTOFF variable is used in conjunction with the anchor date value [from the anchor date variable identified in TDANCVAR]. If the pattern of disease assessments does not start exactly on a date collected on the CRF, this variable will represent the offset between the anchor date value and the start date of the pattern of disease assessments. This may be a positive or negative interval value represented in ISO 8601 format.
  4. This domain should not be created when the disease assessment schedule may vary for individual subjects, for example when completion of the first phase of a study is event driven.

TD – Examples

Example

This example shows a study where the disease assessment schedule changes over the course of the study. In this example, there are three distinct disease-assessment schedule patterns. A single anchor date variable [TDANCVAR] provides the anchor date for each pattern. The offset variable [TDSTOFF], used in conjunction with the anchor date variable, provides the start point of each pattern of assessments.

  • The first disease-assessment schedule pattern starts at the reference start date [identified in the ADSL ANCH1DT variable] and repeats every 8 weeks for a total of six repeats [i.e., Week 8, Week 16, Week 24, Week 32, Week 40, and Week 48]. Note that there is an upper and lower limit around the planned disease assessment target where the first assessment [8 Weeks] could occur as early as Day 53 and as late as Week 9. This upper and lower limit [-3 days, +1 week] would be applied to all assessments during that pattern.
  • The second disease assessment schedule starts from Week 48 and repeats every 12 weeks for a total of 4 repeats [i.e., Week 60, Week 72, Week 84, Week 96], with respective lower and upper limits of -1 week and +1 week.
  • The third disease assessment schedule starts from Week 96 and repeats every 24 weeks [i.e., Week 120, Week 144, etc.], with respective lower and upper limits of -1 week and +1 week, for an indefinite length of time. The schematic above shows that, for the third pattern, assessments will occur until disease progression, and this therefore leaves the pattern open ended. However, when data is included in an analysis, the total number of repeats can be identified and the highest number of repeat assessments for any subject in that pattern must be recorded in the TDNUMRPT variable on the final pattern record.

td.xpt

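| Row | STUDYID | DOMAIN | TDORDER | TDANCVAR | TDSTOFF | TDTGTPAI | TDMINPAI | TDMAXPAI | TDNUMRPT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC123 | TD | 1 | ANCH1DT | P0D | P8W | P53D | P9W | 6 |
| 2 | ABC123 | TD | 2 | ANCH1DT | P60W | P12W | P11W | P13W | 4 |
| 3 | ABC123 | TD | 3 | ANCH1DT | P120W | P24W | P23W | P25W | 12 |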
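The first record can be expanded into the planned assessment days it describes. The sketch below reflects one reading of that record, not an official algorithm: the helper names are hypothetical, only the simple PnD and PnW duration forms are handled, and the tolerances are taken as the differences between TDTGTPAI and TDMINPAI/TDMAXPAI, applied to every assessment in the pattern.

```python
import re

def to_days(duration):
    """Convert a simple ISO 8601 duration (PnD or PnW) to days."""
    match = re.fullmatch(r"P(\d+)([DW])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 7 if unit == "W" else amount

def expand_pattern(td_row):
    """Return (target, earliest, latest) days after the anchor date per assessment."""
    start = to_days(td_row["TDSTOFF"])
    interval = to_days(td_row["TDTGTPAI"])
    early = interval - to_days(td_row["TDMINPAI"])  # tolerance before the target
    late = to_days(td_row["TDMAXPAI"]) - interval   # tolerance after the target
    schedule = []
    for k in range(1, td_row["TDNUMRPT"] + 1):
        day = start + k * interval
        schedule.append((day, day - early, day + late))
    return schedule

# Values copied from row 1 of the dataset above.
first_pattern = {"TDSTOFF": "P0D", "TDTGTPAI": "P8W", "TDMINPAI": "P53D",
                 "TDMAXPAI": "P9W", "TDNUMRPT": 6}
schedule = expand_pattern(first_pattern)
```

Under these assumptions the targets fall at Weeks 8 through 48, each with a window of 3 days before to 1 week after, matching the description of the first pattern.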

Example

This example is the same as Example 1, except that there is a rest period of 14 days prior to the start of the second disease-assessment schedule. This example also shows how three different reference/anchor dates can be used.

  • The Rest is not represented as a row in this domain since no disease assessments occur during the Rest. Note that although the Rest epoch in this example is not important for TD, it is important that it is represented in other trial design datasets.
  • The second pattern of assessments starts on the date identified in the ADSL variable ANCH2DT and repeats every 12 weeks for a total of 4 repeats, with respective lower and upper limits of -1 week and +1 week.
  • The third disease assessment schedule pattern follows on from the second pattern starting on the date identified in the ADSL variable ANCH3DT and repeats every 24 weeks with respective upper and lower limits of -1 week and + 1 week. The schematic above for the final disease-assessment pattern indicates that assessments will occur until disease progression, and this therefore leaves the pattern open ended. However, when data is included in an analysis, the total number of repeats can be identified and the highest number of repeat assessments for any subject in that pattern must be recorded in the TDNUMRPT variable on the final pattern record. In this instance, the maximum number of observed assessments was 17.

td.xpt

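| Row | STUDYID | DOMAIN | TDORDER | TDANCVAR | TDSTOFF | TDTGTPAI | TDMINPAI | TDMAXPAI | TDNUMRPT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC123 | TD | 1 | ANCH1DT | P0D | P8W | P53D | P9W | 6 |
| 2 | ABC123 | TD | 2 | ANCH2DT | P0D | P12W | P11W | P13W | 4 |
| 3 | ABC123 | TD | 3 | ANCH3DT | P0D | P24W | P23W | P25W | 17 |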

Example

This example shows a study where subjects are randomized to standard treatment or an experimental treatment. The subjects who are randomized to standard treatment are given the option to receive experimental treatment after the end of the standard treatment [e.g., disease progression on standard treatment]. In the randomized treatment Epoch, the disease assessment schedule changes over the course of the study. At the start of the extension treatment Epoch, subjects are re-baselined, i.e., an extension baseline disease assessment is performed and the disease assessment schedule is restarted.

In this example, there are three distinct disease-assessment schedule patterns.

  • The first disease-assessment schedule pattern starts at the reference start date [identified in the ADSL ANCH1DT variable] and repeats every 8 weeks for a total of six repeats [i.e., Week 8, Week 16, Week 24, Week 32, Week 40, and Week 48], with respective lower and upper limits of -3 days and +1 week.
  • The second disease assessment schedule starts from Week 48 and repeats every 12 weeks [i.e., Week 60, Week 72, etc.], with respective lower and upper limits of -1 week and +1 week, for an indefinite length of time. The schematic above shows that, for the second pattern, assessments will occur until disease progression, and this therefore leaves the pattern open ended.
  • The third disease assessment schedule starts at the extension reference start date [identified in the ADSL ANCH2DT variable] from Week 96 and repeats every 24 weeks [i.e., Week 120, Week 144, etc.], with respective lower and upper limits of -1 week and +1 week, for an indefinite length of time. The schematic above shows that, for the third pattern, assessments will occur until disease progression, and this therefore leaves the pattern open ended.

For open-ended patterns, the total number of repeats can be identified when the data analysis is performed; the highest number of repeat assessments for any subject in that pattern must be recorded in the TDNUMRPT variable on the final pattern record.

td.xpt

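| Row | STUDYID | DOMAIN | TDORDER | TDANCVAR | TDSTOFF | TDTGTPAI | TDMINPAI | TDMAXPAI | TDNUMRPT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC123 | TD | 1 | ANCH1DT | P0D | P8W | P53D | P9W | 6 |
| 2 | ABC123 | TD | 2 | ANCH1DT | P60W | P12W | P11W | P13W | 17 |
| 3 | ABC123 | TD | 3 | ANCH2DT | P0D | P12W | P11W | P13W | 17 |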

7.3.3 Trial Disease Milestones

TM – Description/Overview

A trial design domain that is used to describe disease milestones, which are observations or activities anticipated to occur in the course of the disease under study, and which trigger the collection of data.

TM – Specification

tm.xpt, Trial Disease Milestones — Trial Design, Version 1.0. One record per Disease Milestone type, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain | Char | TM | Identifier | Two-character abbreviation for the domain, which must be TM. | Req |
| MIDSTYPE | Disease Milestone Type | Char | | Topic | The type of Disease Milestone. Example: "HYPOGLYCEMIC EVENT". | Req |
| TMDEF | Disease Milestone Definition | Char | | Rule | Definition of the Disease Milestone. | Req |
| TMRPT | Disease Milestone Repetition Indicator | Char | [NY] | Record Qualifier | Indicates whether this is a Disease Milestone that can occur only once ["N"] or a type of Disease Milestone that can occur multiple times ["Y"]. | Req |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parentheses].

TM – Assumptions

  1. Disease Milestones may be things that would be expected to happen before the study, or things that are anticipated to happen during the study. The occurrences of Disease Milestones for particular subjects are represented in the Subject Disease Milestones [SM] dataset.
  2. The dataset contains a record for each type of Disease Milestone. The Disease Milestone is defined in TMDEF.

TM – Examples

Example

In this diabetes study, initial diagnosis of diabetes and the hypoglycemic events that occur during the trial have been identified as Disease Milestones of interest.

Row 1: Shows that the initial diagnosis is given the MIDSTYPE of "DIAGNOSIS" and is defined in TMDEF. It is not repeating [occurs only once].

Row 2: Shows that hypoglycemic events are given the MIDSTYPE of "HYPOGLYCEMIC EVENT", and a definition in TMDEF. For an actual study, the definition would be expected to include a particular threshold level, rather than the text "threshold level" used in this example. A subject may experience multiple hypoglycemic events as indicated by TMRPT = "Y".

tm.xpt

| Row | STUDYID | DOMAIN | MIDSTYPE | TMDEF | TMRPT |
|---|---|---|---|---|---|
| 1 | XYZ | TM | DIAGNOSIS | Initial diagnosis of diabetes, the first time a physician told the subject they had diabetes | N |
| 2 | XYZ | TM | HYPOGLYCEMIC EVENT | Hypoglycemic Event, the occurrence of a glucose level below [threshold level] | Y |

7.4 Trial Summary and Eligibility [TI and TS]

This subsection contains the Trial Design datasets that describe:

  • The characteristics of the trial: Section 7.4.2, Trial Summary [TS]
  • Subject eligibility criteria for trial participation: Section 7.4.1, Trial Inclusion/Exclusion Criteria [TI]

The TI and TS datasets are tabular synopses of parts of the study protocol.

7.4.1 Trial Inclusion/Exclusion Criteria

TI – Description/Overview

A trial design domain that contains one record for each of the inclusion and exclusion criteria for the trial. This domain is not subject oriented.

It contains all the inclusion and exclusion criteria for the trial, and thus provides information that may not be present in the subject-level data on inclusion and exclusion criteria. The IE domain [described in Section 6.3.4, Inclusion/Exclusion Criteria Not Met] contains records only for inclusion and exclusion criteria that subjects did not meet.

TI – Specification

ti.xpt, Trial Inclusion/Exclusion Criteria - Trial Design, Version 3.2. One record per I/E criterion, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | TI | Identifier | Two-character abbreviation for the domain. | Req |
| IETESTCD | Incl/Excl Criterion Short Name | Char | * | Topic | Short name IETEST. It can be used as a column name when converting a dataset from a vertical to a horizontal format. The value in IETESTCD cannot be longer than 8 characters, nor can it start with a number [e.g., "1TEST" is not valid]. IETESTCD cannot contain characters other than letters, numbers, or underscores. The prefix "IE" is used to ensure consistency with the IE domain. | Req |
| IETEST | Inclusion/Exclusion Criterion | Char | * | Synonym Qualifier | Full text of the inclusion or exclusion criterion. The prefix "IE" is used to ensure consistency with the IE domain. | Req |
| IECAT | Inclusion/Exclusion Category | Char | [IECAT] | Grouping Qualifier | Used for categorization of the inclusion or exclusion criteria. | Req |
| IESCAT | Inclusion/Exclusion Subcategory | Char | * | Grouping Qualifier | A further categorization of the exception criterion. Can be used to distinguish criteria for a sub-study or to categorize as major or minor exceptions. Examples: "MAJOR", "MINOR". | Perm |
| TIRL | Inclusion/Exclusion Criterion Rule | Char | | Rule | Rule that expresses the criterion in computer-executable form. See Assumption 4. | Perm |
| TIVERS | Protocol Criteria Versions | Char | | Record Qualifier | The number of this version of the Inclusion/Exclusion criteria. May be omitted if there is only one version. | Perm |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parentheses].
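
The IETESTCD constraints in the specification (at most 8 characters, no leading number, only letters, numbers, or underscores) can be expressed as a single pattern. The sketch below is illustrative only: the function name `is_valid_ietestcd` is hypothetical, and allowing a leading underscore is an assumption, since the text does not address it.

```python
import re

# At most 8 characters; letters, digits, and underscores only;
# the first character must not be a digit (leading underscore allowed by assumption).
_IETESTCD = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_ietestcd(value: str) -> bool:
    """Return True if the value satisfies the IETESTCD naming constraints."""
    return _IETESTCD.fullmatch(value) is not None


for code in ("INCL01", "INCL003A", "1TEST", "INCL-01", "INCLUSION1"):
    print(code, is_valid_ietestcd(code))
```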

TI – Assumptions

  1. If inclusion/exclusion criteria were amended during the trial, then each complete set of criteria must be included in the TI domain. TIVERS is used to distinguish between the versions.
  2. Protocol version numbers should be used to identify criteria versions, though there may be more versions of the protocol than versions of the inclusion/exclusion criteria. For example, a protocol might have versions 1, 2, 3, and 4, but if the inclusion/exclusion criteria in version 1 were unchanged through versions 2 and 3, and changed only in version 4, then there would be two sets of inclusion/exclusion criteria in TI: one for version 1 and one for version 4.
  3. Individual criteria do not have versions. If a criterion changes, it should be treated as a new criterion, with a new value for IETESTCD. If criteria have been numbered and values of IETESTCD are generally of the form INCL00n or EXCL00n, and new versions of a criterion have not been given new numbers, separate values of IETESTCD might be created by appending letters, e.g., INCL003A, INCL003B.
  4. IETEST contains the text of the inclusion/exclusion criterion. However, since entry criteria are rules, the variable TIRL has been included in anticipation of the development of computer executable rules.
  5. If a criterion text is longer than 200 characters, put meaningful text in IETEST and describe the full text in the study metadata. See Section 4.5.3.1, Test Name [--TEST] Greater than 40 Characters, for further information.

TI – Examples

Example

This example shows records for a trial that had two versions of inclusion/exclusion criteria.

Rows 1-3: Show the two inclusion criteria and one exclusion criterion for version 1 of the protocol.

Rows 4-6: Show the inclusion/exclusion criteria for version 2.2 of the protocol, which changed the minimum age for entry from 21 to 18.

ti.xpt

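| Row | STUDYID | DOMAIN | IETESTCD | IETEST | IECAT | TIVERS |
|---|---|---|---|---|---|---|
| 1 | XYZ | TI | INCL01 | Has disease under study | INCLUSION | 1 |
| 2 | XYZ | TI | INCL02 | Age 21 or greater | INCLUSION | 1 |
| 3 | XYZ | TI | EXCL01 | Pregnant or lactating | EXCLUSION | 1 |
| 4 | XYZ | TI | INCL01 | Has disease under study | INCLUSION | 2.2 |
| 5 | XYZ | TI | INCL02A | Age 18 or greater | INCLUSION | 2.2 |
| 6 | XYZ | TI | EXCL01 | Pregnant or lactating | EXCLUSION | 2.2 |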
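
Because each protocol version carries a complete set of criteria, and a changed criterion receives a new IETESTCD, the difference between two versions can be found by comparing the sets of codes. The following sketch uses the rows of the example above; the function name `criteria_by_version` and the dict-based record layout are assumptions made for illustration.

```python
from collections import defaultdict

# Rows of the ti.xpt example, reduced to the variables needed here.
ti = [
    {"IETESTCD": "INCL01", "IETEST": "Has disease under study", "TIVERS": "1"},
    {"IETESTCD": "INCL02", "IETEST": "Age 21 or greater", "TIVERS": "1"},
    {"IETESTCD": "EXCL01", "IETEST": "Pregnant or lactating", "TIVERS": "1"},
    {"IETESTCD": "INCL01", "IETEST": "Has disease under study", "TIVERS": "2.2"},
    {"IETESTCD": "INCL02A", "IETEST": "Age 18 or greater", "TIVERS": "2.2"},
    {"IETESTCD": "EXCL01", "IETEST": "Pregnant or lactating", "TIVERS": "2.2"},
]


def criteria_by_version(records):
    """Map each TIVERS value to its {IETESTCD: IETEST} set of criteria."""
    versions = defaultdict(dict)
    for rec in records:
        versions[rec["TIVERS"]][rec["IETESTCD"]] = rec["IETEST"]
    return dict(versions)


by_version = criteria_by_version(ti)
added = sorted(set(by_version["2.2"]) - set(by_version["1"]))
removed = sorted(set(by_version["1"]) - set(by_version["2.2"]))
print("added:", added)
print("removed:", removed)
```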

7.4.2 Trial Summary

TS – Description/Overview

A trial design domain that contains one record for each trial summary characteristic. This domain is not subject oriented.

The Trial Summary [TS] dataset allows the sponsor to submit a summary of the trial in a structured format. Each record in the Trial Summary dataset contains the value of a parameter, a characteristic of the trial. For example, Trial Summary is used to record basic information about the study such as trial phase, protocol title, and trial objectives. The Trial Summary dataset contains information about the planned and actual trial characteristics.

TS – Specification

ts.xpt, Trial Summary Information — Trial Design, Version 3.2. One record per trial summary parameter value, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | TS | Identifier | Two-character abbreviation for the domain. | Req |
| TSSEQ | Sequence Number | Num | | Identifier | Sequence number given to ensure uniqueness within a dataset. Allows inclusion of multiple records for the same TSPARMCD. | Req |
| TSGRPID | Group ID | Char | | Identifier | Used to tie together a group of related records. | Perm |
| TSPARMCD | Trial Summary Parameter Short Name | Char | [TSPARMCD] | Topic | TSPARMCD [the companion to TSPARM] is limited to 8 characters and does not have special character restrictions. These values should be short for ease of use in programming, but it is not expected that TSPARMCD will need to serve as variable names. Examples: "AGEMIN", "AGEMAX". | Req |
| TSPARM | Trial Summary Parameter | Char | [TSPARM] | Synonym Qualifier | Term for the Trial Summary Parameter. The value in TSPARM cannot be longer than 40 characters. Examples: "Planned Minimum Age of Subjects", "Planned Maximum Age of Subjects". | Req |
| TSVAL | Parameter Value | Char | * | Result Qualifier | Value of TSPARM. Example: "ASTHMA" when TSPARM value is "Trial Indication". TSVAL can only be null when TSVALNF is populated. Text over 200 characters can be added to additional columns TSVAL1-TSVALn. See Assumption 6. | Exp |
| TSVALNF | Parameter Null Flavor | Char | ISO 21090 NullFlavor enumeration | Result Qualifier | Null flavor for the value of TSPARM, to be populated if and only if TSVAL is null. | Perm |
| TSVALCD | Parameter Value Code | Char | * | Result Qualifier | This is the code of the term in TSVAL. For example, "6CW7F3G59X" is the code for Gabapentin; "C49488" is the code for Y. The length of this variable can be longer than 8 to accommodate the length of the external terminology. | Exp |
| TSVCDREF | Name of the Reference Terminology | Char | | Result Qualifier | The name of the Reference Terminology from which TSVALCD is taken. For example: CDISC, SNOMED, ISO 8601. | Exp |
| TSVCDVER | Version of the Reference Terminology | Char | | Result Qualifier | The version number of the Reference Terminology, if applicable. | Exp |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parentheses].
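
The TSVAL note above says that text over 200 characters continues in TSVAL1-TSVALn. One possible way to produce those pieces is sketched below, assuming that breaking at word boundaries is acceptable. The function name `split_tsval` is hypothetical, and Python's `textwrap.wrap` collapses runs of whitespace, so this is not a byte-exact split.

```python
import textwrap


def split_tsval(text: str, width: int = 200) -> dict:
    """Split long text into TSVAL, TSVAL1, ..., TSVALn, each at most `width` characters."""
    chunks = textwrap.wrap(text, width=width) or [""]
    names = ["TSVAL"] + [f"TSVAL{i}" for i in range(1, len(chunks))]
    return dict(zip(names, chunks))


# Invented filler text of 450 characters, used only to exercise the split.
parts = split_tsval("word " * 90)
for name, value in parts.items():
    print(name, len(value))
```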

TS – Assumptions

  1. The intent of this dataset is to provide a summary of trial information. This is not subject-level data.
  2. A list of values for TSPARM and TSPARMCD can be found in CDISC controlled terminology, available at //www.cancer.gov/research/resources/terminology/cdisc.
  3. Further information about the parameters is included in Appendix C1, Trial Summary Codes. TSVAL may have controlled terminology depending on the value of TSPARMCD. Conditions for including parameters are also given in Appendix C1, Trial Summary Codes.
  4. Controlled terminology for TSPARM is extensible. The meaning of any added parameters should be explained in the metadata for the TS dataset.
  5. For some trials, there will be multiple records in the Trial Summary dataset for a single parameter. For example, a trial that addresses both Safety and Efficacy could have two records with TSPARMCD = "TTYPE", one with the TSVAL = "SAFETY" and the other with TSVAL = "EFFICACY".

    TSSEQ has a different value for each record for the same parameter.

    Note that this is different from datasets that contain subject data, where the --SEQ variable has a different value for each record for the same subject.

  6. The method for treating text > 200 characters in Trial Summary is similar to that used for the Comments [CO] special purpose domain [Section 5.1, Comments]. If TSVAL is > 200 characters, then it should be split into multiple variables, TSVAL-TSVALn. See Section 4.5.3.2, Text Strings Greater than 200 Characters in Other Variables.
  7. Since TS does not contain subject-level data, there is no restriction analogous to the requirement in subject-level datasets that the blocks bound by TSGRPID are within a subject. TSGRPID can be used to tie together any block of records in the dataset. TSGRPID is most likely to be used when the TS dataset includes multiple records for the same parameter. For example, if a trial compared a dose of 50 mg twice a day with a dose of 100 mg once a day, a record with TSPARMCD = "DOSE" and TSVAL = "50" and a record with TSPARMCD = "DOSFREQ" and TSVAL = "BID" could be assigned one GRPID, while a record with TSPARMCD = "DOSE" and TSVAL = "100" and a record with TSPARMCD = "DOSFREQ" and TSVAL = "Q24H" could be assigned a different GRPID.
  8. The order of parameters in the examples of TS datasets should not be taken as a requirement. There are no requirements or expectations about the order of parameters within the TS dataset.
  9. Not all protocols describe objectives in a way that specifically designates each objective as "primary" or "secondary". If the protocol does not provide information about which objectives meet the definition of TSPARM = "OBJPRIM" [i.e., "The principal purpose of the trial"], then all objectives should be given as values of TSPARM = "OBJPRIM". The Trial Summary Parameter "Trial Secondary Objective" is defined as "The auxiliary purpose of the trial". A protocol may use multiple designations for objectives that are not primary [e.g., Secondary, Tertiary, and Exploratory], but all these non-primary objectives should be given as values of TSPARM = "OBJSEC".
  10. As per the definitions, the Primary Outcome Measure is associated with the Primary Objective and the Secondary Outcome Measure is associated with the Secondary Objective. It is possible for the same outcome measure to be associated with more than one objective. For example, two objectives could use the same outcome measure at different time points, or with different analysis methods.
  11. If a primary objective is assessed by means of multiple outcome measures, then all of these outcome measures should be provided as values of TSPARM = "OUTMSPRI". Similarly, all outcome measures used to assess secondary objectives should be provided as values of TSPARM = "OUTMSSEC".
  12. There is a code value for TSVALCD only when there is controlled terminology for TSVAL. For example, when TSPARMCD = "PLANSUB" or TSPARMCD = "TITLE", TSVALCD will be null.
  13. Trial Indication: A clinical pharmacology study on healthy volunteers, whose sole purpose is to collect pharmacokinetic data, would have no trial indication, so TSVAL would be null and TSVALNF would be "NA". A vaccine study on healthy subjects, whose intended purpose is to prevent influenza infection, would have INDIC = "Influenza". If the trial is to treat, diagnose, or prevent a disease, then INDIC is "If Applicable".
  14. TSVALNF contains a "null flavor," a value that provides additional coded information when TSVAL is null. For example, for TSPARMCD = "AGEMAX", there is no value if a study does not specify a maximum age. In this case, the appropriate null flavor is "PINF", which stands for "positive infinity". In a clinical pharmacology study conducted in healthy volunteers for a drug whose indications are not yet established, the appropriate null flavor for INDIC would be "NA", which stands for "not applicable". TSVALNF can also be used in a case where the value of a particular parameter is unknown.
  15. Dun and Bradstreet [D&B] maintains its "data universal numbering system," known as DUNS. It issues unique 9-digit numbers to businesses. Each sponsor organization has a DUNS number. A UNII [Unique Ingredient Identifier] is an identifier for a single defined substance. The UNII is a non-proprietary, free, unique, unambiguous, non-semantic, alphanumeric identifier based on a substance's molecular structure and/or descriptive information.
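
The dose example in Assumption 7 can be made concrete with a small grouping routine. This is a sketch only: the TSGRPID values "A" and "B", the TSSEQ values, and the function name `group_by_tsgrpid` are invented for illustration, and records are modeled as plain dicts.

```python
from collections import defaultdict

# Two dose regimens, each tied together by a shared (hypothetical) TSGRPID.
ts = [
    {"TSSEQ": 1, "TSGRPID": "A", "TSPARMCD": "DOSE", "TSVAL": "50"},
    {"TSSEQ": 1, "TSGRPID": "A", "TSPARMCD": "DOSFREQ", "TSVAL": "BID"},
    {"TSSEQ": 2, "TSGRPID": "B", "TSPARMCD": "DOSE", "TSVAL": "100"},
    {"TSSEQ": 2, "TSGRPID": "B", "TSPARMCD": "DOSFREQ", "TSVAL": "Q24H"},
]


def group_by_tsgrpid(records):
    """Collect {TSPARMCD: TSVAL} for every non-empty TSGRPID."""
    groups = defaultdict(dict)
    for rec in records:
        if rec.get("TSGRPID"):
            groups[rec["TSGRPID"]][rec["TSPARMCD"]] = rec["TSVAL"]
    return dict(groups)


for grpid, params in group_by_tsgrpid(ts).items():
    print(grpid, params)
```

Without the grouping variable, nothing in the four records would indicate that "50" belongs with "BID" rather than with "Q24H".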

TS – Examples

Example

This example shows all of the parameters that are required or expected in the Trial Summary dataset. Use controlled terminology for TSVAL, available at: //www.cancer.gov/research/resources/terminology/cdisc.

ts.xpt

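| Row | STUDYID | DOMAIN | TSSEQ | TSGRPID | TSPARMCD | TSPARM | TSVAL | TSVALNF | TSVALCD | TSVCDREF | TSVCDVER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | XYZ | TS | 1 | | ADDON | Added on to Existing Treatments | Y | | C49488 | CDISC | 2011-06-10 |
| 2 | XYZ | TS | 1 | | AGEMAX | Planned Maximum Age of Subjects | P70Y | | | ISO 8601 | |
| 3 | XYZ | TS | 1 | | AGEMIN | Planned Minimum Age of Subjects | P18M | | | ISO 8601 | |
| 4 | XYZ | TS | 1 | | LENGTH | Trial Length | P3M | | | ISO 8601 | |
| 5 | XYZ | TS | 1 | | PLANSUB | Planned Number of Subjects | 300 | | | | |
| 6 | XYZ | TS | 1 | | RANDOM | Trial is Randomized | Y | | C49488 | CDISC | 2011-06-10 |
| 7 | XYZ | TS | 1 | | SEXPOP | Sex of Participants | BOTH | | C49636 | CDISC | 2011-06-10 |
| 8 | XYZ | TS | 1 | | STOPRULE | Study Stop Rules | INTERIM ANALYSIS FOR FUTILITY | | | | |
| 9 | XYZ | TS | 1 | | TBLIND | Trial Blinding Schema | DOUBLE BLIND | | C15228 | CDISC | 2011-06-10 |
| 10 | XYZ | TS | 1 | | TCNTRL | Control Type | PLACEBO | | C49648 | CDISC | 2011-06-10 |
| 11 | XYZ | TS | 1 | | TDIGRP | Diagnosis Group | Neurofibromatosis Syndrome [Disorder] | | 19133005 | SNOMED | |
| 12 | XYZ | TS | 1 | | INDIC | Trial Disease/Condition Indication | Tonic-Clonic Epilepsy [Disorder] | | 352818000 | SNOMED | |
| 13 | XYZ | TS | 1 | | TINDTP | Trial Intent Type | TREATMENT | | C49656 | CDISC | 2011-06-10 |
| 14 | XYZ | TS | 1 | | TITLE | Trial Title | A 24 Week Study of Oral Gabapentin vs. Placebo as add-on Treatment to Phenytoin in Subjects with Epilepsy due to Neurofibromatosis | | | | |
| 15 | XYZ | TS | 1 | | TPHASE | Trial Phase Classification | Phase II Trial | | C15601 | CDISC | 2011-06-10 |
| 16 | XYZ | TS | 1 | | TTYPE | Trial Type | EFFICACY | | C49666 | CDISC | 2011-06-10 |
| 17 | XYZ | TS | 2 | | TTYPE | Trial Type | SAFETY | | C49667 | CDISC | 2011-06-10 |
| 18 | XYZ | TS | 1 | | CURTRT | Current Therapy or Treatment | Phenytoin | | 6158TKW0C5 | UNII | |
| 19 | XYZ | TS | 1 | | OBJPRIM | Trial Primary Objective | Reduction in the 3-month seizure frequency from baseline | | | | |
| 20 | XYZ | TS | 1 | | OBJSEC | Trial Secondary Objective | Percent reduction in the 3-month seizure frequency from baseline | | | | |
| 21 | XYZ | TS | 2 | | OBJSEC | Trial Secondary Objective | Reduction in the 3-month tonic-clonic seizure frequency from baseline | | | | |
| 22 | XYZ | TS | 1 | | SPONSOR | Clinical Study Sponsor | Pharmaco | | 1234567 | DUNS | |
| 23 | XYZ | TS | 1 | | TRT | Investigational Therapy or Treatment | Gabapentin | | 6CW7F3G59X | UNII | |
| 24 | XYZ | TS | 1 | | RANDQT | Randomization Quotient | 0.67 | | | | |
| 25 | XYZ | TS | 1 | | STRATFCT | Stratification Factor | SEX | | | | |
| 26 | XYZ | TS | 1 | | REGID | Registry Identifier | NCT123456789 | | NCT123456789 | ClinicalTrials.GOV | |
| 27 | XYZ | TS | 2 | | REGID | Registry Identifier | XXYYZZ456 | | XXYYZZ456 | EUDRAC | |
| 28 | XYZ | TS | 1 | | OUTMSPRI | Primary Outcome Measure | SEIZURE FREQUENCY | | | | |
| 29 | XYZ | TS | 1 | | OUTMSSEC | Secondary Outcome Measure | SEIZURE FREQUENCY | | | | |
| 30 | XYZ | TS | 2 | | OUTMSSEC | Secondary Outcome Measure | SEIZURE DURATION | | | | |
| 31 | XYZ | TS | 1 | | OUTMSEXP | Exploratory Outcome Measure | SEIZURE INTENSITY | | | | |
| 32 | XYZ | TS | 1 | | PCLAS | Pharmacological Class | Anti-epileptic Agent | | N0000175753 | MED-RT | |
| 33 | XYZ | TS | 1 | | FCNTRY | Planned Country of Investigational Sites | United States of America | | USA | ISO 3166 | |
| 34 | XYZ | TS | 2 | | FCNTRY | Planned Country of Investigational Sites | Canada | | CAN | ISO 3166 | |
| 35 | XYZ | TS | 3 | | FCNTRY | Planned Country of Investigational Sites | Mexico | | MEX | ISO 3166 | |
| 36 | XYZ | TS | 1 | | ADAPT | Adaptive Design | N | | C49487 | CDISC | 2011-06-10 |
| 37 | XYZ | TS | 1 | DateDesc1 | DCUTDTC | Data Cutoff Date | 2011-04-01 | | | ISO 8601 | |
| 38 | XYZ | TS | 1 | DateDesc1 | DCUTDESC | Data Cutoff Description | DATABASE LOCK | | | | |
| 39 | XYZ | TS | 1 | | INTMODEL | Intervention Model | PARALLEL | | C82639 | CDISC | |
| 40 | XYZ | TS | 1 | | NARMS | Planned Number of Arms | 3 | | | | |
| 41 | XYZ | TS | 1 | | STYPE | Study Type | INTERVENTIONAL | | C98388 | CDISC | |
| 42 | XYZ | TS | 1 | | INTTYPE | Intervention Type | DRUG | | C1909 | CDISC | |
| 43 | XYZ | TS | 1 | | SSTDTC | Study Start Date | 2009-03-11 | | | ISO 8601 | |
| 44 | XYZ | TS | 1 | | SENDTC | Study End Date | 2011-04-01 | | | ISO 8601 | |
| 45 | XYZ | TS | 1 | | ACTSUB | Actual Number of Subjects | 304 | | | | |
| 46 | XYZ | TS | 1 | | HLTSUBJI | Healthy Subject Indicator | N | | C49487 | CDISC | 2011-06-10 |
| 47 | XYZ | TS | 1 | | SDMDUR | Stable Disease Minimum Duration | P3W | | | ISO 8601 | |
| 48 | XYZ | TS | 1 | | CRMDUR | Confirmed Response Minimum Duration | P28D | | | ISO 8601 | |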
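
In the example above, TSSEQ restarts at 1 for each parameter and increments only when a parameter repeats [TTYPE, OBJSEC, REGID, OUTMSSEC, FCNTRY]. A simple consistency check, sketched here with a hypothetical function name `duplicate_tsseq` and a few rows taken from the example, is to look for any TSPARMCD/TSSEQ pair that occurs more than once.

```python
from collections import Counter


def duplicate_tsseq(records):
    """Return the (TSPARMCD, TSSEQ) pairs that appear on more than one record."""
    counts = Counter((rec["TSPARMCD"], rec["TSSEQ"]) for rec in records)
    return sorted(key for key, n in counts.items() if n > 1)


# Rows 16, 17, and 33-35 of the example, reduced to the relevant variables.
rows = [
    {"TSPARMCD": "TTYPE", "TSSEQ": 1, "TSVAL": "EFFICACY"},
    {"TSPARMCD": "TTYPE", "TSSEQ": 2, "TSVAL": "SAFETY"},
    {"TSPARMCD": "FCNTRY", "TSSEQ": 1, "TSVAL": "United States of America"},
    {"TSPARMCD": "FCNTRY", "TSSEQ": 2, "TSVAL": "Canada"},
    {"TSPARMCD": "FCNTRY", "TSSEQ": 3, "TSVAL": "Mexico"},
]
print(duplicate_tsseq(rows))  # an empty list means no pair is repeated
```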

Example

This example shows how to implement the null flavor in TSVALNF when the value in TSVAL is missing. Note that when TSVAL is null, TSVALCD is also null, and no code system is specified in TSVCDREF and TSVCDVER.

Row 1: Shows that there was no upper limit on planned age of subjects, as indicated by TSVALNF = "PINF", the null value that means "positive infinity".

Row 2: Shows that Trial Phase Classification is not applicable, as indicated by TSVALNF = "NA".

ts.xpt

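| Row | STUDYID | DOMAIN | TSSEQ | TSGRPID | TSPARMCD | TSPARM | TSVAL | TSVALNF | TSVALCD | TSVCDREF | TSVCDVER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | XYZ | TS | 1 | | AGEMAX | Planned Maximum Age of Subjects | | PINF | | | |
| 2 | XYZ | TS | 2 | | TPHASE | Trial Phase Classification | | NA | | | |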
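
The rule that TSVALNF is populated if and only if TSVAL is null, together with the note that TSVALCD is also null in that case, lends itself to a record-level check. The sketch below is an illustration rather than a validation rule from any published tool; the function name `check_null_flavor` and the message texts are hypothetical.

```python
def check_null_flavor(record: dict):
    """Return a message describing the first problem found, or None if the record is consistent."""
    has_value = bool(record.get("TSVAL"))
    has_flavor = bool(record.get("TSVALNF"))
    if has_value and has_flavor:
        return "TSVALNF must be null when TSVAL is populated"
    if not has_value and not has_flavor:
        return "TSVALNF must be populated when TSVAL is null"
    if not has_value and record.get("TSVALCD"):
        return "TSVALCD must be null when TSVAL is null"
    return None


# Row 1 of the example above, plus an inconsistent record invented for contrast.
ok = {"TSPARMCD": "AGEMAX", "TSVAL": "", "TSVALNF": "PINF", "TSVALCD": ""}
bad = {"TSPARMCD": "AGEMAX", "TSVAL": "P70Y", "TSVALNF": "PINF", "TSVALCD": ""}
print(check_null_flavor(ok))
print(check_null_flavor(bad))
```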


7.4.2.1 Use of Null Flavor

The variable TSVALNF is based on the idea of a "null flavor" as embodied in the ISO 21090 standard, "Health Informatics – Harmonized data types for information exchange." A null flavor is an ancillary piece of data that provides additional information when its primary piece of data is null [has a missing value]. There is controlled terminology for the null flavor data item, which includes such familiar values as Unknown, Other, and Not Applicable among its fifteen terms.

The proposal to include a null flavor variable to supplement the TSVAL variable in the Trial Summary dataset arose when it was realized that the Trial Summary model did not have a good way to represent the fact that a protocol placed no upper limit on the age of study subjects. When the trial summary parameter is AGEMAX, then TSVAL should have a value expressed as an ISO8601 time duration [e.g., P43Y for 43 years old or P6M for 6 months old]. While it would be possible to allow a value such as NONE or UNBOUNDED to be entered in TSVAL, validation programs would then have to recognize this special term as an exception to the expected data format. Therefore, it was decided that a separate null flavor variable that uses the ISO 21090 null flavor terminology would be a better solution.

It was also decided to specify the use of a null flavor variable with this updated release of trial summary as a way of testing the use of such a variable in a limited setting. As its title suggests, the ISO 21090 standard was developed for use with healthcare data, and it is expected that it will eventually see wide use in the clinical data from which clinical trial data is derived. CDISC already uses this data type standard in the BRIDG model and the CDISC SHARE project. The null flavor, in particular, is a solution to the widespread problem of needing or wanting to convey information that will help in the interpretation of a missing value. Although null flavors could certainly be eventually used for this purpose in other cases, such as with subject data, doing so at this time would be extremely disruptive and premature. The use of null flavors for the one variable TSVAL should provide an opportunity for sponsors and reviewers to learn about the null flavors and to evaluate their usefulness in one concrete setting.

The controlled terminology for null flavor, which supersedes use of Appendix C1, Trial Summary Codes, is included below.

NullFlavor Enumeration. OID: 2.16.840.1.113883.5.1008

| Level | Code | Name | Definition |
|---|---|---|---|
| 1 | NI | No information | The value is exceptional [missing, omitted, incomplete, improper]. No information as to the reason for being an exceptional value is provided. This is the most general exceptional value. It is also the default exceptional value. |
| 2 | INV | Invalid | The value as represented in the instance is not a member of the set of permitted data values in the constrained value domain of a variable. |
| 3 | OTH | Other | The actual value is not a member of the set of permitted data values in the constrained value domain of a variable [e.g., concept not provided by required code system]. |
| 4 | PINF | Positive infinity | Positive infinity of numbers |
| 4 | NINF | Negative infinity | Negative infinity of numbers |
| 3 | UNC | Unencoded | No attempt has been made to encode the information correctly, but the raw source information is represented [usually in original Text]. |
| 3 | DER | Derived | An actual value may exist, but it must be derived from the information provided [usually an expression is provided directly]. |
| 2 | UNK | Unknown | A proper value is applicable, but not known. |
| 3 | ASKU | Asked but unknown | Information was sought but not found [e.g., patient was asked but didn't know]. |
| 4 | NAV | Temporarily unavailable | Information is not available at this time, but is expected to be available later. |
| 3 | NASK | Not asked | This information has not been sought [e.g., patient was not asked]. |
| 3 | QS | Sufficient quantity | The specific quantity is not known, but is known to be non-zero and is not specified because it makes up the bulk of the material. For example, if directions said, "Add 10 mg of ingredient X, 50 mg of ingredient Y, and sufficient quantity of water to 100 ml", the null flavor "QS" would be used to express the quantity of water. |
| 3 | TRC | Trace | The content is greater than zero, but too small to be quantified. |
| 2 | MSK | Masked | There is information on this item available, but it has not been provided by the sender due to security, privacy or other reasons. There may be an alternate mechanism for gaining access to this information. WARNING: Use of this null flavor does provide information that may be a breach of confidentiality, even though no detailed data are provided. Its primary purpose is for those circumstances where it is necessary to inform the receiver that the information does exist without providing any detail. |
| 2 | NA | Not applicable | No proper value is applicable in this context [e.g., last menstrual period for a male]. |

The numbers in the first column of the table above describe the hierarchy of these values, i.e.:

  • No information
    • Invalid
      • Other
        • Positive infinity
        • Negative infinity
      • Unencoded
      • Derived
    • Unknown
      • Asked but unknown
        • Temporarily unavailable
      • Not asked
      • Quantity sufficient
      • Trace
    • Masked
    • Not applicable

The one value at level 1, No information, is the least informative. It merely confirms that the primary piece of data is null.

The values at level 2 provide a little more information, distinguishing between situations where the primary piece of data is not applicable and those where it is applicable but masked, unknown, or "invalid", i.e., not in the correct format to be represented in the primary piece of data.

The values at levels 3 and 4 provide successively more information about the situation. For example, for the AGEMAX case that provided the impetus for the creation of the TSVALNF variable, the value PINF means that there is information about the maximum age, but it cannot be expressed in the ISO 8601 quantity of time format required for populating TSVAL. The null flavor PINF provides the most complete information possible in this case, i.e., that the maximum age for the study is unbounded.
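
The hierarchy described above can be held in a simple parent map, from which the level of any code follows. This is a minimal sketch: the names `PARENT`, `ancestors`, and `level` are hypothetical, and the parent assignments are transcribed from the indented list above.

```python
# Parent of each null flavor code, transcribed from the hierarchy listed above.
PARENT = {
    "NI": None,
    "INV": "NI", "OTH": "INV", "PINF": "OTH", "NINF": "OTH",
    "UNC": "INV", "DER": "INV",
    "UNK": "NI", "ASKU": "UNK", "NAV": "ASKU",
    "NASK": "UNK", "QS": "UNK", "TRC": "UNK",
    "MSK": "NI", "NA": "NI",
}


def ancestors(code: str) -> list:
    """Return the chain from the code up to the root, most specific first."""
    chain = []
    while code is not None:
        chain.append(code)
        code = PARENT[code]
    return chain


def level(code: str) -> int:
    """Level 1 is the root (NI); each step down the hierarchy adds one."""
    return len(ancestors(code))


print(ancestors("PINF"))
print(level("NA"))
```

Reading the chain for PINF from right to left reproduces the progression in the text: no information, then invalid for the expected format, then other, then positive infinity.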

7.5 How to Model the Design of a Clinical Trial

The following steps allow the modeler to move from more-familiar concepts, such as Arms, to less-familiar concepts, such as Elements and Epochs. The actual process of modeling a trial may depart from these numbered steps. Some steps will overlap; there may be several iterations; and not all steps are relevant for all studies.

  1. Start from the flow chart or schema diagram usually included in the trial protocol. This diagram will show how many Arms the trial has, and the branch points, or decision points, where the Arms diverge.
  2. Write down the decision rule for each branching point in the diagram. Does the assignment of a subject to an Arm depend on a randomization? On whether the subject responded to treatment? On some other criterion?
  3. If the trial has multiple branching points, check whether all the branches that have been identified really lead to different Arms. The Arms will relate to the major comparisons the trial is designed to address. For some trials, there may be a group of somewhat different paths through the trial that are all considered to belong to a single Arm.
  4. For each Arm, identify the major time periods of treatment and non-treatment a subject assigned to that Arm will go through. These are the Elements, or building blocks, of which the Arm is composed.
  5. Define the starting point of each Element. Define the rule for how long the Element should last. Determine whether the Element is of fixed duration.
  6. Re-examine the sequences of Elements that make up the various Arms and consider alternative Element definitions. Would it be better to "split" some Elements into smaller pieces or "lump" some Elements into larger pieces? Such decisions will depend on the aims of the trial and plans for analysis.
  7. Compare the various Arms. In most clinical trials, especially blinded trials, the pattern of Elements will be similar for all Arms, and it will make sense to define Trial Epochs. Assign names to these Epochs. During the conduct of a blinded trial, it will not be known which Arm a subject has been assigned to, or which treatment Elements they are experiencing, but the Epochs they are passing through will be known.
  8. Identify the Visits planned for the trial. Define the planned start timings for each Visit, expressed relative to the ordered sequences of Elements that make up the Arms. Define the rules for when each Visit should end.
  9. If this is an oncology trial or another trial with disease assessments that are not necessarily tied to visits, find the planned timing of disease assessments in the protocol and record it in the Trial Disease Assessments dataset.
  10. If the protocol includes data collection that is triggered by the occurrence of certain events, interventions, or findings, record those triggers in the Trial Disease Milestones dataset. Note that disease milestones may be pre-study [such as disease diagnosis] or on-study.
  11. Identify the inclusion and exclusion criteria to be able to populate the TI dataset. If inclusion and exclusion criteria were amended so that subjects entered under different versions, populate TIVERS to represent the different versions.
  12. Populate the TS dataset with summary information.

8 Representing Relationships and Data

The defined variables of the SDTM general observation classes could restrict the ability of sponsors to represent all the data they wish to submit. Collected data that may not entirely fit includes relationships between records within a domain, records in separate domains, and sponsor-defined "variables". As a result, the SDTM has methods to represent distinct types of relationships, all of which are described in more detail in subsequent sections. These include the following:

  • Section 8.1, Relating Groups of Records Within a Domain Using the --GRPID Variable, describes representing a relationship between a group of records for a given subject within the same domain.
  • Section 8.2, Relating Peer Records, describes representing relationships between independent records [usually in separate domains] for a subject, such as a concomitant medication taken to treat an adverse event.
  • Section 8.3, Relating Datasets, describes representing a relationship between two [or more] datasets where records of one [or more] dataset[s] are related to record[s] in another dataset [or datasets].
  • Section 8.4, Relating Non-Standard Variables Values to a Parent Domain, describes the method for representing the dependent relationship where data that cannot be represented by a standard variable within the demographics domain [DM] or a general-observation-class domain record [or records] can be related back to that record [or records].
  • Section 8.5, Relating Comments to a Parent Domain, describes representing a dependent relationship between a comment in the Comments domain [see also Section 5, Comments] and a parent record [or records] in other domains, such as a comment recorded in association with an adverse event.
  • Section 8.6, How to Determine Where Data Belong in SDTM-Compliant Data Tabulations, discusses the concept of related datasets and whether to place additional data in a separate domain or a Supplemental Qualifier special purpose dataset, and the concept of modeling findings data that refer to data in another general observation class domain.
  • Section 8.7, Relating Study Subjects, describes representing collected relationships between persons, both of whom are study subjects. Examples: "MOTHER, BIOLOGICAL", "CHILD, BIOLOGICAL", "TWIN, DIZYGOTIC".

All relationships make use of the standard domain identifiers, STUDYID, DOMAIN, and USUBJID. In addition, the variables IDVAR and IDVARVAL are used for identifying the record-level merge/join keys. These keys are used to tie information together by linking records. The specific set of identifiers necessary to properly identify each type of relationship is described in detail in the following sections. Examples of variables that could be used in IDVAR include the following:

  • The Sequence Number [--SEQ] variable uniquely identifies a record for a given USUBJID within a domain. The variable --SEQ is required in all domains except DM. For example, if a subject has 25 adverse events in the Adverse Event [AE] domain, then 25 unique AESEQ values should be established for this subject. Conventions for establishing and maintaining --SEQ values are sponsor-defined. Values may or may not be sequential depending on data processes and sources.
  • The Reference Identifier [--REFID] variable can be used to capture a sponsor-defined or external identifier, such as an identifier provided in an electronic data transfer. Some examples are lab-specimen identifiers and ECG identifiers. --REFID is permissible in all general-observation-class domains, but is never required. Values for --REFID are sponsor-defined and can be any alphanumeric strings the sponsor chooses, consistent with their internal practices.
  • The Grouping Identifier [--GRPID] variable, used to link related records for a subject within a domain, is explained below in Section 8.1, Relating Groups of Records Within a Domain Using the --GRPID Variable.
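The uniqueness requirement for --SEQ can be checked mechanically. The following is a minimal sketch, not part of the standard: the helper name `duplicate_seq_keys` and the sample AE records are hypothetical, and records are assumed to be plain dictionaries keyed by variable name.

```python
from collections import Counter

def duplicate_seq_keys(records, seq_var):
    """Return (USUBJID, --SEQ) pairs that occur more than once in a domain."""
    counts = Counter((rec["USUBJID"], rec[seq_var]) for rec in records)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical AE records; AESEQ values need not be consecutive.
ae = [
    {"USUBJID": "1234", "AESEQ": 1},
    {"USUBJID": "1234", "AESEQ": 3},
    {"USUBJID": "5678", "AESEQ": 1},  # same AESEQ, different subject: allowed
]

print(duplicate_seq_keys(ae, "AESEQ"))
```

An empty result means every --SEQ value is unique within its subject, which is what allows --SEQ to serve as a record-level key in IDVAR.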

8.1 Relating Groups of Records Within a Domain Using the --GRPID Variable

The optional grouping identifier variable --GRPID is Permissible in all domains that are based on the general observation classes. It is used to identify relationships between records within a USUBJID within a single domain. An example would be Intervention records for a combination therapy where the treatments in the combination vary from subject to subject. In such a case, the relationship is defined by assigning the same unique character value to the --GRPID variable. The values used for --GRPID can be any values the sponsor chooses; however, if the sponsor uses values with some embedded meaning [rather than arbitrary numbers], those values should be consistent across the submission to avoid confusion. It is important to note that --GRPID has no inherent meaning across subjects or across domains.

Using --GRPID in the general observation class domains can reduce the number of records in the RELREC, SUPP--, and CO datasets, when those datasets are submitted to describe relationships/associations for records or values to a "group" of general observation class records.

8.1.1 --GRPID Example

The following table illustrates --GRPID used in the Concomitant Medications [CM] domain to identify a combination therapy. In this example, both subjects 1234 and 5678 have reported two combination therapies, each consisting of three separate medications. The components of a combination all have the same value for CMGRPID.

This example illustrates how CMGRPID groups information only within a subject within a domain.

Rows 1-3: Show three medications taken by subject "1234". CMGRPID = "COMBO THPY 1" has been used to group these medications.
Rows 4-6: Show three different medications taken by subject "1234" with CMGRPID = "COMBO THPY 2".
Rows 7-9: Show three medications taken by subject "5678". CMGRPID = "COMBO THPY 1" has been used to group these medications. Note that the medications with CMGRPID "COMBO THPY 1" are completely different for subjects "1234" and "5678".
Rows 10-12: Show three different medications taken by subject "5678" with CMGRPID = "COMBO THPY 2". Again, the medications with "COMBO THPY 2" are completely different for subjects "1234" and "5678".

cm.xpt

Row | STUDYID | DOMAIN | USUBJID | CMSEQ | CMGRPID | CMTRT | CMDECOD | CMDOSE | CMDOSU | CMSTDTC | CMENDTC
1 | 1234 | CM | 1234 | 1 | COMBO THPY 1 | Verbatim Med A | Generic Med A | 100 | mg | 2004-01-17 | 2004-01-19
2 | 1234 | CM | 1234 | 2 | COMBO THPY 1 | Verbatim Med B | Generic Med B | 50 | mg | 2004-01-17 | 2004-01-19
3 | 1234 | CM | 1234 | 3 | COMBO THPY 1 | Verbatim Med C | Generic Med C | 200 | mg | 2004-01-17 | 2004-01-19
4 | 1234 | CM | 1234 | 4 | COMBO THPY 2 | Verbatim Med D | Generic Med D | 150 | mg | 2004-01-21 | 2004-01-22
5 | 1234 | CM | 1234 | 5 | COMBO THPY 2 | Verbatim Med E | Generic Med E | 100 | mg | 2004-01-21 | 2004-01-22
6 | 1234 | CM | 1234 | 6 | COMBO THPY 2 | Verbatim Med F | Generic Med F | 75 | mg | 2004-01-21 | 2004-01-22
7 | 1234 | CM | 5678 | 1 | COMBO THPY 1 | Verbatim Med G | Generic Med G | 37.5 | mg | 2004-03-17 | 2004-03-25
8 | 1234 | CM | 5678 | 2 | COMBO THPY 1 | Verbatim Med H | Generic Med H | 60 | mg | 2004-03-17 | 2004-03-25
9 | 1234 | CM | 5678 | 3 | COMBO THPY 1 | Verbatim Med I | Generic Med I | 20 | mg | 2004-03-17 | 2004-03-25
10 | 1234 | CM | 5678 | 4 | COMBO THPY 2 | Verbatim Med J | Generic Med J | 100 | mg | 2004-03-21 | 2004-03-22
11 | 1234 | CM | 5678 | 5 | COMBO THPY 2 | Verbatim Med K | Generic Med K | 50 | mg | 2004-03-21 | 2004-03-22
12 | 1234 | CM | 5678 | 6 | COMBO THPY 2 | Verbatim Med L | Generic Med L | 10 | mg | 2004-03-21 | 2004-03-22
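Because --GRPID has no meaning across subjects, any grouping logic should key on USUBJID together with the --GRPID value. The sketch below illustrates this with a subset of the rows above; the tuple layout and the function name `group_by_grpid` are assumptions made for illustration only.

```python
from collections import defaultdict

# A subset of the cm.xpt example rows: (USUBJID, CMSEQ, CMGRPID, CMDECOD).
cm_rows = [
    ("1234", 1, "COMBO THPY 1", "Generic Med A"),
    ("1234", 2, "COMBO THPY 1", "Generic Med B"),
    ("1234", 3, "COMBO THPY 1", "Generic Med C"),
    ("5678", 1, "COMBO THPY 1", "Generic Med G"),
    ("5678", 2, "COMBO THPY 1", "Generic Med H"),
    ("5678", 3, "COMBO THPY 1", "Generic Med I"),
]

def group_by_grpid(rows):
    """Collect medication names per (USUBJID, CMGRPID) pair."""
    groups = defaultdict(list)
    for usubjid, _seq, grpid, decod in rows:
        groups[(usubjid, grpid)].append(decod)
    return dict(groups)

groups = group_by_grpid(cm_rows)
for key, meds in groups.items():
    print(key, meds)
```

Grouping on CMGRPID alone would wrongly merge the two subjects' unrelated combinations into one group of six medications.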

8.2 Relating Peer Records

The Related Records [RELREC] special purpose dataset is used to describe relationships between records for a subject [as described in this section], and relationships between datasets [as described in Section 8.3, Relating Datasets]. In both cases, relationships represented in RELREC are collected relationships, either by explicit references or check boxes on the CRF, or by design of the CRF, such as vital signs captured during an exercise stress test.

A relationship is defined by adding a record to RELREC for each record to be related and by assigning a unique character identifier value for the relationship. Each record in the RELREC special purpose dataset contains keys that identify a record [or group of records] and the relationship identifier, which is stored in the RELID variable. The value of RELID is chosen by the sponsor, but must be identical for all related records within USUBJID. It is recommended that the sponsor use a standard system or naming convention for RELID [e.g., all letters, all numbers, capitalized].

Records expressing a relationship are specified using the key variables STUDYID, RDOMAIN [the domain code of the record in the relationship], and USUBJID, along with IDVAR and IDVARVAL. Single records can be related by using a unique-record-identifier variable such as --SEQ in IDVAR. Groups of records can be related by using grouping variables such as --GRPID in IDVAR. IDVARVAL would contain the value of the variable described in IDVAR. Using --GRPID can be a more efficient method of representing relationships in RELREC, such as when relating an adverse event [or events] to a group of concomitant medications taken to treat the adverse event[s].

The RELREC dataset should be used to represent either:

  • Explicit relationships, such as concomitant medications taken as a result of an adverse event.
  • Information of a nature that necessitates using multiple datasets, as described in Section 8.3, Relating Datasets.

8.2.1 RELREC Dataset

relrec.xpt, Related Records — Relationships, Version 3.3. One record per related record, group of records or dataset, Tabulation.

Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core
STUDYID | Study Identifier | Char |  | Identifier | Unique identifier for a study. | Req
RDOMAIN | Related Domain Abbreviation | Char | [DOMAIN] | Identifier | Abbreviation for the domain of the parent record[s]. | Req
USUBJID | Unique Subject Identifier | Char |  | Identifier | Identifier used to uniquely identify a subject across all studies for all applications or submissions involving the product. | Exp
IDVAR | Identifying Variable | Char | * | Identifier | Name of the identifying variable in the general-observation-class dataset that identifies the related record[s]. Examples include --SEQ and --GRPID. | Req
IDVARVAL | Identifying Variable Value | Char |  | Identifier | Value of identifying variable described in IDVAR. If --SEQ is the variable being used to describe this record, then the value of --SEQ would be entered here. | Exp
RELTYPE | Relationship Type | Char | [RELTYPE] | Record Qualifier | Identifies the hierarchical level of the records in the relationship. Values should be either ONE or MANY. Used only when identifying a relationship between datasets [as described in Section 8.3, Relating Datasets]. | Exp
RELID | Relationship Identifier | Char |  | Record Qualifier | Unique value within USUBJID that identifies the relationship. All records for the same USUBJID that have the same RELID are considered "related/associated." RELID can be any value the sponsor chooses, and is only meaningful within the RELREC dataset to identify the related/associated Domain records. | Req

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parentheses].

8.2.2 RELREC Dataset Examples

Example 1

This example illustrates the use of the RELREC dataset to relate records stored in separate domains for USUBJID = "123456". This example represents a situation in which an adverse event is related both to concomitant medications and to lab tests, but there is no relationship between the lab values and the concomitant medications.

Rows 1-3: Show the representation of a relationship between an AE record and two concomitant medication records.
Rows 4-6: Show the representation of a relationship between the same AE record and two laboratory findings records.

relrec.xpt

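Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID
1 | EFC1234 | AE | 123456 | AESEQ | 5 |  | 1
2 | EFC1234 | CM | 123456 | CMSEQ | 11 |  | 1
3 | EFC1234 | CM | 123456 | CMSEQ | 12 |  | 1
4 | EFC1234 | AE | 123456 | AESEQ | 5 |  | 2
5 | EFC1234 | LB | 123456 | LBSEQ | 47 |  | 2
6 | EFC1234 | LB | 123456 | LBSEQ | 48 |  | 2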
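One way to read these rows is to collect them by USUBJID and RELID, since all records sharing both values belong to one relationship. The following sketch assumes a simplified tuple layout and a hypothetical helper named `relationships`; it is an illustration of the grouping rule, not a prescribed algorithm.

```python
from collections import defaultdict

# RELREC rows from this example: (RDOMAIN, USUBJID, IDVAR, IDVARVAL, RELID).
relrec = [
    ("AE", "123456", "AESEQ", "5", "1"),
    ("CM", "123456", "CMSEQ", "11", "1"),
    ("CM", "123456", "CMSEQ", "12", "1"),
    ("AE", "123456", "AESEQ", "5", "2"),
    ("LB", "123456", "LBSEQ", "47", "2"),
    ("LB", "123456", "LBSEQ", "48", "2"),
]

def relationships(rows):
    """Group record references by (USUBJID, RELID)."""
    rels = defaultdict(list)
    for rdomain, usubjid, idvar, idvarval, relid in rows:
        rels[(usubjid, relid)].append((rdomain, idvar, idvarval))
    return dict(rels)

rels = relationships(relrec)
domains_1 = {rdomain for rdomain, _, _ in rels[("123456", "1")]}
domains_2 = {rdomain for rdomain, _, _ in rels[("123456", "2")]}
print(domains_1, domains_2)
```

The AE record appears in both relationships, but no single relationship contains both CM and LB records, which matches the statement that the lab values and the medications are not related to each other.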

Example 2

Example 2 is the same scenario as Example 1. In this case, however, the way the data were collected indicated that the concomitant medications and laboratory findings were all in a single relationship with the adverse event.

relrec.xpt

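Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID
1 | EFC1234 | AE | 123456 | AESEQ | 5 |  | 1
2 | EFC1234 | CM | 123456 | CMSEQ | 11 |  | 1
3 | EFC1234 | CM | 123456 | CMSEQ | 12 |  | 1
4 | EFC1234 | LB | 123456 | LBSEQ | 47 |  | 1
5 | EFC1234 | LB | 123456 | LBSEQ | 48 |  | 1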

Example 3

Example 3 is the same scenario as Example 2. However, the sponsor grouped the two concomitant medications using CMGRPID = "COMBO 1", allowing the relationship among these five records to be represented with four, rather than five, records in the RELREC dataset.

relrec.xpt

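Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID
1 | EFC1234 | AE | 123456 | AESEQ | 5 |  | 1
2 | EFC1234 | CM | 123456 | CMGRPID | COMBO1 |  | 1
3 | EFC1234 | LB | 123456 | LBSEQ | 47 |  | 1
4 | EFC1234 | LB | 123456 | LBSEQ | 48 |  | 1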
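A consumer of RELREC can resolve a row to its parent records by treating IDVAR as the name of the column to match and IDVARVAL as the value. The sketch below assumes hypothetical CM records stored as dictionaries and a hypothetical helper named `resolve`; the same lookup works whether IDVAR names --SEQ or --GRPID.

```python
# Hypothetical CM records for subject 123456.
cm = [
    {"USUBJID": "123456", "CMSEQ": 11, "CMGRPID": "COMBO1"},
    {"USUBJID": "123456", "CMSEQ": 12, "CMGRPID": "COMBO1"},
    {"USUBJID": "123456", "CMSEQ": 13, "CMGRPID": ""},
]

def resolve(relrec_row, domain_records):
    """Return the parent records identified by one RELREC row."""
    idvar = relrec_row["IDVAR"]
    return [
        rec for rec in domain_records
        if rec["USUBJID"] == relrec_row["USUBJID"]
        # IDVARVAL is a character variable, so compare as strings.
        and str(rec[idvar]) == relrec_row["IDVARVAL"]
    ]

grouped_row = {"RDOMAIN": "CM", "USUBJID": "123456",
               "IDVAR": "CMGRPID", "IDVARVAL": "COMBO1"}
matched = resolve(grouped_row, cm)
print([rec["CMSEQ"] for rec in matched])
```

A single RELREC row that uses CMGRPID resolves to both medication records, which is why grouping reduces the number of RELREC rows needed.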

Additional examples may be found in the domain examples such as Section 6.2.4, Disposition, Example 4, and all of the Pharmacokinetics examples in Section 6.3.11.3, Relating PP Records to PC Records.

8.3 Relating Datasets

The Related Records [RELREC] special purpose dataset can also be used to identify relationships between datasets [e.g., a one-to-many or parent-child relationship]. The relationship is defined by including a single record for each related dataset that identifies the key[s] of the dataset that can be used to relate the respective records.

Relationships between datasets should only be recorded in the RELREC dataset when the sponsor has found it necessary to split information between datasets that are related, and that may need to be examined together for analysis or proper interpretation. Note that it is not necessary to use the RELREC dataset to identify associations from data in the SUPP-- datasets or the CO dataset to their parent general-observation-class dataset records or special purpose domain records, as both these datasets include the key variable identifiers of the parent record[s] that are necessary to make the association.

8.3.1 RELREC Dataset Relationship Example

Example

This example illustrates RELREC records used to represent the relationship between records in two datasets that have a one-to-many relationship. In the example below, all the records in one domain are being related to all of the records in the other, so both USUBJID and IDVARVAL are null.

relrec.xpt

Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID
1 | EFC1234 | TU |  | TULNKID |  | ONE | 1
2 | EFC1234 | TR |  | TRLNKID |  | MANY | 1

In the sponsor's operational database, these datasets may have existed as either separate datasets that were merged for analysis, or one dataset that may have included observations from more than one general observation class [e.g., Events and Findings]. The value in IDVAR must be the name of the key used to merge/join the two datasets. In the above example, the --LNKID variable is used as the key to identify the related observations. The values for the --LNKID variable in the two datasets are sponsor defined. Although other variables may also serve as a single merge key when the corresponding values for IDVAR are equal, --GRPID, --SPID, --REFID, --LNKID, or --LNKGRP are typically used for this purpose.

The variable RELTYPE identifies the type of relationship between the datasets. The allowable values are ONE and MANY [controlled terminology is expected]. This information defines how a merge/join would be written, and what would be the result of the merge/join. The possible combinations are the following:

  1. ONE and ONE. This combination indicates that there is NO hierarchical relationship between the datasets and the records in the datasets. Only one record from each dataset will potentially have the same value of the IDVAR within USUBJID.
  2. ONE and MANY. This combination indicates that there IS a hierarchical [parent-child] relationship between the datasets. One record within USUBJID in the dataset identified by RELTYPE = "ONE" will potentially have the same value of the IDVAR with many [one or more] records in the dataset identified by RELTYPE = "MANY".
  3. MANY and MANY. This combination is unusual and challenging to manage in a merge/join, and may represent a relationship that was never intended to convey a usable merge/join, such as described in Section 6.3.11.3, Relating PP Records to PC Records.

Since IDVAR identifies the keys that can be used to merge/join records between the datasets, --SEQ cannot be used because --SEQ only has meaning within a subject within a dataset, not across datasets.
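The way RELTYPE shapes a merge/join can be sketched as follows. This is an illustration under stated assumptions: the TU and TR records are hypothetical, records are plain dictionaries, and the helper `join_one_to_many` is not defined by the standard. The ONE side is indexed by USUBJID and its IDVAR column, and each MANY-side record is paired with its single parent.

```python
# Dataset-level RELREC rows from the example (USUBJID and IDVARVAL are null).
relrec = [
    {"RDOMAIN": "TU", "IDVAR": "TULNKID", "RELTYPE": "ONE", "RELID": "1"},
    {"RDOMAIN": "TR", "IDVAR": "TRLNKID", "RELTYPE": "MANY", "RELID": "1"},
]

# Hypothetical TU and TR records.
tu = [{"USUBJID": "001", "TULNKID": "T01", "TULOC": "LIVER"}]
tr = [
    {"USUBJID": "001", "TRLNKID": "T01", "TRTESTCD": "DIAMETER", "TRSTRESN": 12},
    {"USUBJID": "001", "TRLNKID": "T01", "TRTESTCD": "DIAMETER", "TRSTRESN": 9},
]

def join_one_to_many(one_rows, one_key, many_rows, many_key):
    """Pair each MANY-side record with its single ONE-side parent."""
    parents = {}
    for rec in one_rows:
        key = (rec["USUBJID"], rec[one_key])
        if key in parents:
            raise ValueError(f"ONE side has duplicate key {key}")
        parents[key] = rec
    joined = []
    for rec in many_rows:
        parent = parents.get((rec["USUBJID"], rec[many_key]))
        if parent is not None:
            joined.append({**parent, **rec})
    return joined

one = next(row for row in relrec if row["RELTYPE"] == "ONE")
many = next(row for row in relrec if row["RELTYPE"] == "MANY")
datasets = {"TU": tu, "TR": tr}
joined = join_one_to_many(datasets[one["RDOMAIN"]], one["IDVAR"],
                          datasets[many["RDOMAIN"]], many["IDVAR"])
print(len(joined))
```

The duplicate-key check on the ONE side reflects the rule that only one record per USUBJID may carry a given IDVAR value in that dataset.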

8.4 Relating Non-Standard Variables Values to a Parent Domain

The SDTM does not allow the addition of new variables. Therefore, the Supplemental Qualifiers special purpose dataset model is used to capture non-standard variables and their association to parent records in general-observation-class datasets [Events, Findings, Interventions] and Demographics. Supplemental Qualifiers are represented as separate SUPP-- datasets for each dataset containing sponsor-defined variables [see Section 8.4.2, Submitting Supplemental Qualifiers in Separate Datasets, for more on this topic].

SUPP-- represents the metadata and data for each non-standard variable/value combination. As the name "Supplemental Qualifiers" suggests, this dataset is intended to capture additional Qualifiers for an observation. Data that represent separate observations should be treated as separate observations. The Supplemental Qualifiers dataset is structured similarly to the RELREC dataset, in that it uses the same set of keys to identify parent records. Each SUPP-- record also includes the name of the Qualifier variable being added [QNAM], the label for the variable [QLABEL], the actual value for each instance or record [QVAL], the origin [QORIG] of the value [see Section 4.1.8, Origin Metadata], and the Evaluator [QEVAL] to specify the role of the individual who assigned the value [such as ADJUDICATION COMMITTEE or SPONSOR]. Controlled terminology for certain expected values for QNAM and QLABEL is included in Appendix C2, Supplemental Qualifiers Name Codes.

SUPP-- datasets are also used to capture attributions. An attribution is typically an interpretation or subjective classification of one or more observations by a specific evaluator, such as a flag that indicates whether an observation was considered to be clinically significant. Since it is possible that different attributions may be necessary in some cases, SUPP-- provides a mechanism for incorporating as many attributions as are necessary. A SUPP-- dataset can contain both objective data [where values are collected or derived algorithmically] and subjective data [attributions where values are assigned by a person or committee]. For objective data, the value in QEVAL will be null. For subjective data [when QORIG = "Assigned"], the value in QEVAL should reflect the role of the person or institution assigning the value [e.g., "SPONSOR" or "ADJUDICATION COMMITTEE"].

The combined set of values for the first six columns [STUDYID…QNAM] should be unique for every record. That is, there should not be multiple records in a SUPP-- dataset for the same QNAM value, as it relates to IDVAR/IDVARVAL for a USUBJID in a domain. For example, if two individuals provide a determination on whether an Adverse Event is Treatment Emergent [e.g., the investigator and an independent adjudicator], then separate QNAM values should be used for each set of information, perhaps "AETRTEMI" and "AETRTEMA". This is necessary to ensure that reviewers can join/merge/transpose the information back with the records in the original domain without risk of losing information.
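The uniqueness rule on the first six columns can be checked with a simple count. In this sketch the record-building helper `make_record`, the function `duplicate_supp_keys`, and the QVAL and QEVAL values are illustrative assumptions; the two QNAM values are the ones suggested in the paragraph above.

```python
from collections import Counter

KEY = ("STUDYID", "RDOMAIN", "USUBJID", "IDVAR", "IDVARVAL", "QNAM")

def duplicate_supp_keys(supp_records):
    """Return key tuples that appear on more than one SUPP-- record."""
    counts = Counter(tuple(rec[k] for k in KEY) for rec in supp_records)
    return [key for key, n in counts.items() if n > 1]

def make_record(qnam, qval, qeval):
    """Build a hypothetical SUPPAE record for one adverse event."""
    return {"STUDYID": "1996001", "RDOMAIN": "AE", "USUBJID": "99-401",
            "IDVAR": "AESEQ", "IDVARVAL": "1",
            "QNAM": qnam, "QVAL": qval, "QEVAL": qeval}

# Two evaluators sharing one QNAM collide on the six-column key.
shared = [make_record("AETRTEM", "Y", "INVESTIGATOR"),
          make_record("AETRTEM", "N", "ADJUDICATION COMMITTEE")]
# Separate QNAM values keep every record unique.
separate = [make_record("AETRTEMI", "Y", "INVESTIGATOR"),
            make_record("AETRTEMA", "N", "ADJUDICATION COMMITTEE")]

print(len(duplicate_supp_keys(shared)), len(duplicate_supp_keys(separate)))
```

With a shared QNAM, transposing back onto the parent record would leave only one of the two values, which is the information loss the rule is meant to prevent.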

Just as use of the optional grouping identifier variable, --GRPID, can be a more efficient method of representing relationships in RELREC, it can also be used in a SUPP-- dataset to identify individual qualifier values [SUPP-- records] related to multiple general-observation-class domain records that could be grouped, such as relating an attribution to a group of ECG measurements.

8.4.1 Supplemental Qualifiers – SUPP-- Datasets

supp--.xpt, Supplemental Qualifiers for [domain name] — Relationships, Version 3.3. One record per IDVAR, IDVARVAL, and QNAM value per subject, Tabulation.

Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core
STUDYID | Study Identifier | Char |  | Identifier | Study identifier of the parent record[s]. | Req
RDOMAIN | Related Domain Abbreviation | Char | [DOMAIN] | Identifier | Two-character abbreviation for the domain of the parent record[s]. | Req
USUBJID | Unique Subject Identifier | Char |  | Identifier | Unique subject identifier of the parent record[s]. | Req
IDVAR | Identifying Variable | Char | * | Identifier | Identifying variable in the dataset that identifies the related record[s]. Examples: --SEQ, --GRPID. | Exp
IDVARVAL | Identifying Variable Value | Char |  | Identifier | Value of identifying variable of the parent record[s]. | Exp
QNAM | Qualifier Variable Name | Char | * | Topic | The short name of the Qualifier variable, which is used as a column name in a domain view with data from the parent domain. The value in QNAM cannot be longer than 8 characters, nor can it start with a number [e.g., "1TEST" is not valid]. QNAM cannot contain characters other than letters, numbers, or underscores. This will often be the column name in the sponsor's operational dataset. | Req
QLABEL | Qualifier Variable Label | Char |  | Synonym Qualifier | This is the long name or label associated with QNAM. The value in QLABEL cannot be longer than 40 characters. This will often be the column label in the sponsor's original dataset. | Req
QVAL | Data Value | Char |  | Result Qualifier | Result of, response to, or value associated with QNAM. A value for this column is required; no records can be in SUPP-- with a null value for QVAL. | Req
QORIG | Origin | Char |  | Record Qualifier | Since QVAL can represent a mixture of collected [on a CRF], derived, or assigned items, QORIG is used to indicate the origin of this data. Examples include "CRF", "Assigned", or "Derived". See Section 4.1.8, Origin Metadata. | Req
QEVAL | Evaluator | Char | * | Record Qualifier | Used only for results that are subjective [e.g., assigned by a person or a group]. Should be null for records that contain objectively collected or derived data. Some examples include "ADJUDICATION COMMITTEE", "STATISTICIAN", "DATABASE ADMINISTRATOR", "CLINICAL COORDINATOR", etc. | Exp

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parentheses].
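The QNAM and QLABEL constraints in the CDISC Notes can be expressed as simple checks. The sketch below is one possible reading: the function names are hypothetical, and accepting lowercase letters and a leading underscore is an assumption, since the notes only exclude a leading number and characters other than letters, numbers, and underscores.

```python
import re

# Letters, digits, and underscores only; 1 to 8 characters; first character not a digit.
QNAM_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")

def is_valid_qnam(qnam):
    """Check a QNAM value against the length and character rules."""
    return QNAM_PATTERN.fullmatch(qnam) is not None

def is_valid_qlabel(qlabel):
    """Check that a QLABEL value is non-empty and at most 40 characters."""
    return 0 < len(qlabel) <= 40

for candidate in ("AESOSP", "1TEST", "TOOLONGNAME", "AE-X"):
    print(candidate, is_valid_qnam(candidate))
```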

A record in a SUPP-- dataset relates back to its parent record[s] via the key identified by the STUDYID, RDOMAIN, USUBJID, and IDVAR/IDVARVAL variables. An exception is SUPP-- dataset records that are related to Demographics [DM] records, where both IDVAR and IDVARVAL will be null because the key variables STUDYID, RDOMAIN, and USUBJID are sufficient to identify the unique parent record in DM [DM has one record per USUBJID].
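Merging qualifiers back onto their parent records can be sketched as below. The function `merge_supp` and the sample records are hypothetical, STUDYID and RDOMAIN matching is omitted for brevity, and a null IDVAR is assumed to be stored as an empty string.

```python
def merge_supp(parent_records, supp_records):
    """Return copies of parent records with matching QNAM/QVAL pairs added as columns."""
    merged = []
    for parent in parent_records:
        row = dict(parent)
        for supp in supp_records:
            if supp["USUBJID"] != parent["USUBJID"]:
                continue
            idvar = supp["IDVAR"]
            # Null IDVAR (the DM case): USUBJID alone identifies the parent.
            if not idvar or str(parent.get(idvar)) == supp["IDVARVAL"]:
                row[supp["QNAM"]] = supp["QVAL"]
        merged.append(row)
    return merged

# Hypothetical AE parent records and two qualifiers for the first of them.
ae = [{"USUBJID": "99-401", "AESEQ": 1}, {"USUBJID": "99-401", "AESEQ": 2}]
suppae = [
    {"USUBJID": "99-401", "IDVAR": "AESEQ", "IDVARVAL": "1",
     "QNAM": "AESOSP", "QVAL": "Spontaneous Abortion"},
    {"USUBJID": "99-401", "IDVAR": "AESEQ", "IDVARVAL": "1",
     "QNAM": "AETRTEM", "QVAL": "N"},
]

merged = merge_supp(ae, suppae)
print(merged[0])
```

Only the first AE record gains the two qualifier columns; the second is returned unchanged because no SUPPAE record points at AESEQ = 2.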

All records in the SUPP-- datasets must have a value for QVAL. Transposing source variables with missing/null values may generate SUPP-- records with null values for QVAL, causing the SUPP-- datasets to be extremely large. When this happens, the sponsor must delete the records where QVAL is null prior to submission.
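One way to avoid null QVAL records is to skip missing values while transposing, rather than deleting them afterwards. The following sketch assumes a hypothetical wide operational layout and a hypothetical helper `to_supp_records`; only a few SUPP-- variables are populated to keep it short.

```python
def to_supp_records(source_rows, idvar, qualifiers):
    """Transpose non-standard columns into SUPP-- style records, skipping null values.

    `qualifiers` maps each QNAM to its QLABEL.
    """
    records = []
    for row in source_rows:
        for qnam, qlabel in qualifiers.items():
            value = row.get(qnam)
            if value is None or str(value).strip() == "":
                continue  # no SUPP-- record may have a null QVAL
            records.append({
                "USUBJID": row["USUBJID"],
                "IDVAR": idvar,
                "IDVARVAL": str(row[idvar]),
                "QNAM": qnam,
                "QLABEL": qlabel,
                "QVAL": str(value),
            })
    return records

# Hypothetical operational AE rows with one non-standard column.
source = [
    {"USUBJID": "99-401", "AESEQ": 1, "AESOSP": "Spontaneous Abortion"},
    {"USUBJID": "99-401", "AESEQ": 2, "AESOSP": None},
    {"USUBJID": "99-402", "AESEQ": 1, "AESOSP": ""},
]

supp = to_supp_records(source, "AESEQ", {"AESOSP": "Other Medically Important SAE"})
print(len(supp))
```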

See Section 4.5.3, Text Strings That Exceed the Maximum Length for General-Observation-Class Domain Variables, for information on representing data values greater than 200 characters in length.

See Appendix C2, Supplemental Qualifiers Name Codes, for controlled terminology for QNAM and QLABEL for some of the most common Supplemental Qualifiers. Additional QNAM values may be created as needed, following the guidelines provided in the CDISC Notes for QNAM.

8.4.2 Submitting Supplemental Qualifiers in Separate Datasets

There is a one-to-one correspondence between a domain dataset and its Supplemental Qualifier dataset. The single SUPPQUAL dataset option that was introduced in SDTMIG v3.1 was deprecated. The set of Supplemental Qualifiers for each domain is included in a separate dataset with the name SUPP-- where "--" denotes the source domain which the Supplemental Qualifiers relate back to. For example, Demographics Qualifiers would be submitted in suppdm.xpt. When data have been split into multiple datasets [see Section 4.1.7, Splitting Domains], longer names such as SUPPFAMH may be needed. In cases where data about Associated Persons [see Associated Persons Implementation Guide] have been collected, Supplemental Qualifiers for Findings About events or interventions for an associated person may need to be represented. A dataset name with the SUPP fragment, e.g., SUPPAPFAMH, would be too long. In this case only, the "SUPP" portion should be shortened to "SQ", resulting in a dataset name such as SQAPFAMH.
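The naming convention can be sketched as a small function. The 8-character limit used here is an assumption inferred from the examples above [SUPPFAMH fits, SUPPAPFAMH does not], and the function name `supp_dataset_name` is hypothetical.

```python
MAX_DATASET_NAME = 8  # assumed limit, inferred from the SUPPAPFAMH example

def supp_dataset_name(parent_dataset):
    """Return the Supplemental Qualifiers dataset name for a parent dataset name."""
    parent = parent_dataset.upper()
    name = "SUPP" + parent
    if len(name) <= MAX_DATASET_NAME:
        return name
    # Shorten "SUPP" to "SQ"; the text permits this only for Associated Persons data.
    return "SQ" + parent

for parent in ("DM", "FAMH", "APFAMH"):
    print(parent, supp_dataset_name(parent))
```

A stricter version might also confirm that the parent name begins with "AP" before applying the "SQ" prefix, since the guide restricts the shortening to that case.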

8.4.3 SUPP-- Examples

The examples below illustrate how a set of SUPP-- datasets could be used to relate non-standard information to a parent domain.

Example

The two rows of suppae.xpt add qualifying information to adverse event data [RDOMAIN = "AE"]. IDVAR defines the key variable used to link this information to the AE data [AESEQ]. IDVARVAL specifies the value of the key variable within the parent AE record that the SUPPAE record applies to. The remaining columns specify the supplemental variables' names [AESOSP and AETRTEM], labels, values, origin, and who made the evaluation.

suppae.xpt

Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL
1 | 1996001 | AE | 99-401 | AESEQ | 1 | AESOSP | Other Medically Important SAE | Spontaneous Abortion | CRF |
2 | 1996001 | AE | 99-401 | AESEQ | 1 | AETRTEM | Treatment Emergent Flag | N | Derived | SPONSOR

Example

This example illustrates how the language used for a questionnaire might be represented. The parent domain [RDOMAIN] is QS, and IDVAR is QSCAT. QNAM holds the name of the Supplemental Qualifier variable being defined [QSLANG]. The language recorded in QVAL applies to all of the subject's records where IDVAR [QSCAT] equals the value specified in IDVARVAL. In this case, IDVARVAL has values for two questionnaires [BPI and ADAS-COG] for two separate subjects. QVAL identifies the questionnaire language version [French or German] for each subject.

suppqs.xpt

Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL
1 | 1996001 | QS | 99-401 | QSCAT | BPI | QSLANG | Questionnaire Language | FRENCH | CRF |
2 | 1996001 | QS | 99-401 | QSCAT | ADAS-COG | QSLANG | Questionnaire Language | FRENCH | CRF |
3 | 1996001 | QS | 99-802 | QSCAT | BPI | QSLANG | Questionnaire Language | GERMAN | CRF |
4 | 1996001 | QS | 99-802 | QSCAT | ADAS-COG | QSLANG | Questionnaire Language | GERMAN | CRF |

Additional examples may be found in the domain examples, such as in Section 5.2, Demographics, Examples 3 and 4; in Section 6.3.3, ECG Test Results, Example 1; and in Section 6.3.6, Laboratory Test Results, Example 1.

8.4.4 When Not to Use Supplemental Qualifiers

The following are examples of data that should not be submitted as Supplemental Qualifiers:

  • Subject-level objective data that fit in Subject Characteristics [SC]. Examples include "National Origin" and "Twin Type".
  • Findings interpretations that should be added as an additional test code and result. An example of this would be a record for ECG interpretation where EGTESTCD = "INTP", and the same EGGRPID or EGREFID value would be assigned for all records associated with that ECG. See Section 4.5.5, Clinical Significance for Findings Observation Class Data.
  • Comments related to a record or records contained within a parent dataset. Although they may have been collected in the same record by the sponsor, comments should instead be captured in the CO special purpose domain.
  • Data not directly related to records in a parent domain. Such records should instead be captured in either a separate general observation class domain or special purpose domain.

8.5 Relating Comments to a Parent Domain

The Comments [CO] special purpose domain, which is described in Section 5.1, Comments, is used to capture unstructured free text comments. It allows for the submission of comments related to a particular domain [e.g., Adverse Events] or those collected on separate general-comment log-style pages not associated with a domain. Comments may be related to a Subject, a domain for a Subject, or to specific parent records in any domain. The Comments special purpose domain is structured similarly to the Supplemental Qualifiers [SUPP--] dataset, in that it uses the same set of keys [STUDYID, RDOMAIN, USUBJID, IDVAR, and IDVARVAL] to identify related records.

All comments except those collected on log-style pages not associated with a domain are considered child records of subject data captured in domains. STUDYID, USUBJID, and DOMAIN [with the value CO] must always be populated. RDOMAIN, IDVAR, and IDVARVAL should be populated as follows:

  1. Comments related only to a subject in general [likely collected on a log-style CRF page/screen] would have RDOMAIN, IDVAR, IDVARVAL null, as the only key needed to identify the relationship/association to that subject is USUBJID.
  2. Comments related only to a specific domain [and not to any specific record[s]] for a subject would populate RDOMAIN with the domain code for the domain with which they are associated. IDVAR and IDVARVAL would be null.
  3. Comments related to specific domain record[s] for a subject would populate the RDOMAIN, IDVAR, and IDVARVAL variables with values that identify the specific parent record[s].
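The three population patterns above can be told apart by inspecting which relationship variables are filled in. This is a minimal sketch: the function name `comment_level`, its return labels, and the treatment of any other combination as "invalid" are assumptions, and null values are assumed to be stored as empty strings or None.

```python
def comment_level(co_record):
    """Classify a CO record by which relationship variables are populated."""
    rdomain = co_record.get("RDOMAIN")
    idvar = co_record.get("IDVAR")
    idvarval = co_record.get("IDVARVAL")
    if not rdomain and not idvar and not idvarval:
        return "subject"   # pattern 1: related only to the subject
    if rdomain and not idvar and not idvarval:
        return "domain"    # pattern 2: related to a domain, not to specific records
    if rdomain and idvar and idvarval:
        return "record"    # pattern 3: related to specific parent record[s]
    return "invalid"       # any other combination does not match the patterns above

print(comment_level({"RDOMAIN": "AE", "IDVAR": "AESEQ", "IDVARVAL": "7"}))
```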

If additional information describing the comment's relationship to a parent record [or records] was collected and cannot be represented using the relationship variables RDOMAIN, IDVAR, and IDVARVAL, it can be represented by either of two methods:

  1. Values may be placed in COREF, such as the CRF page number or name.
  2. Timing variables may be added to the CO special purpose domain, such as VISITNUM and/or VISIT. See CO Assumption 5 for a complete list of Identifier and Timing variables that can be added to the CO special purpose domain.

As with Supplemental Qualifiers [SUPP--] and Related Records [RELREC], --GRPID and other grouping variables can be used as the value in IDVAR to identify comments with relationships to multiple domain records, for example, a comment that applies to a group of concomitant medications, perhaps taken as a combination therapy. The limitation is that a single comment may only be related to a group of records in one domain [RDOMAIN can have only one value]. If a single comment relates to records in multiple domains, the comment may need to be repeated in the CO special purpose domain to facilitate the understanding of the relationships.

Examples for Comments data can be found in Section 5.1, Comments.

8.6 How to Determine Where Data Belong in SDTM-Compliant Data Tabulations

8.6.1 Guidelines for Determining the General Observation Class

Section 2.6, Creating a New Domain, discusses when to place data in an existing domain and how to create a new domain. A key part of the process of creating a new domain is determining whether an observation represents an Event, Intervention, or Finding. Begin by considering the content of the information in the light of the definitions of the three general observation classes [see Section 2.3, The General Observation Classes], rather than by trying to deduce the class from the information's physical structure; physical structure can sometimes be misleading. For example, from a structural standpoint, one might expect Events observations to include a start and stop date. However, Medical History data [data about previous conditions or events] is Events data regardless of whether dates were collected.

An Intervention is something that is done to a subject [possibly by the subject] that is expected to have a physiological effect. This concept of an intended effect makes Interventions relatively easy to recognize, although there are gray areas around some testing procedures. For example, exercise stress tests are designed to produce and then measure certain physiological effects. The measurements from such a testing procedure are Findings, but some aspects of the procedure might be modeled as Interventions.

An Event is something that happens to a subject spontaneously. Most, although not all, Events data captured in clinical trials is about medical events. Since many medical events must, by regulation, be treated as adverse events, a new Events domain will be created only for events that are clearly not adverse events; the existing Medical History and Clinical Events domains are the appropriate places to store most medical events that are not adverse events. Many aspects of medical events, including tests performed to evaluate them, interventions that may have caused them, and interventions given to treat them, may be collected in clinical trials. Where to place data on assessments of events can be particularly challenging, and is discussed further in Section 8.6.3, Guidelines for Differentiating Between Events, Findings, and Findings About Events.

Findings general observation class data are measurements, tests, assessments, or examinations performed on a subject in the clinical trial. They may be performed on the subject as a whole [e.g., height, heart rate], or on a "specimen" taken from a subject [e.g., a blood sample, an ECG tracing, a tissue sample]. Sometimes the relationship between a subject and a finding is less direct; a finding may be about an event that happened to the subject or an intervention they received. Findings about Events and Interventions are discussed further in Section 8.6.3, Guidelines for Differentiating Between Events, Findings, and Findings About Events.

8.6.2 Guidelines for Forming New Domains

It may not always be clear whether a set of data represents one topic or more than one topic, and thus whether it should be combined into one domain or split into two or more domains. This implementation guide shows examples of both.

In some cases, a single data structure works well for a variety of types of data. For example, all questionnaire data are placed in the QS domain, with particular questionnaires identified by QSCAT [see Section 6.3.13, Questionnaires, Ratings, and Scales [QRS] Domains]. Although some operational databases may store urinalysis data in a separate dataset, SDTM places all lab data in the LB domain [see Section 6.3.6, Laboratory Test Results] with urinalysis tests identified using LBSPEC.

In other cases, a particular topic may be very broad and/or require more than one data structure [and therefore require more than one dataset]. Two examples in this implementation guide are the topics of microbiology and pharmacokinetics. Both have been modeled using two domain datasets [see Section 6.3.7, Microbiology Domains, and Section 6.3.11, Pharmacokinetics Domains]. This is because, within these scientific areas, there is more than one topic, and each topic results in a different data structure. For example, the topic for PC is plasma [or other specimen] drug concentration as a function of time, and the structure is one record per analyte per time point per reference time point [e.g., dosing event] per subject. PP contains characteristics of the time-concentration curve such as AUC, Cmax, Tmax, half-life, and elimination rate constant; the structure is one record per parameter per analyte per reference time point per subject.

8.6.3 Guidelines for Differentiating Between Events, Findings, and Findings About Events

This section discusses Events, Findings, and Findings about Events. The relationship between Interventions, Findings, and Findings about Interventions would be handled similarly.

The Findings About domain was specially created to store findings about events. This section discusses Events and Findings generally, but it is particularly useful for understanding the distinction between the CE and FA domains.

There may be several sources of confusion about whether a particular piece of data belongs in an Event record or a Findings record. One generally thinks of an event as something that happens spontaneously, and has a beginning and end; however, one should consider the following:

  • Events of interest in a particular trial may be pre-specified, rather than collected as free text.
  • Some events may be so long lasting that they are perceived as "conditions" rather than "events", and their beginning and end dates are not of interest.
  • Some variables or data items one generally expects to see in an Events record may not be present. For example, a post-marketing study might collect the occurrence of certain adverse events, but no dates.
  • Properties of an Event may be measured or assessed, and these are then treated as Findings About Events, rather than as Events.
  • Some assessments of events [e.g., severity, relationship to study treatment] have been built into the SDTM Events model as Qualifiers, rather than being treated as Findings About Events.
  • Sponsors may choose how they define an Event. For example, adverse event data may be submitted using one record that summarizes an event from beginning to end, or using one record for each change in severity.

The structure of the data being considered, although not definitive, will often help determine whether the data represent an Event or a Finding. The questions below may assist sponsors in deciding where data should be placed in SDTM.

Questions and Interpretation of Answers

Is this a measurement, with units, etc.?

  • "Yes" answer indicates a Finding.
  • "No" answer is inconclusive.

Are the data collected in a CRF for each visit, or an overall CRF log-form?

  • Collection forms that are independent of visits suggest Event or Intervention general observation class data.
  • Data collected at visits are usually for items that can be controlled by the study schedule, namely planned Findings or planned [study] Interventions or Events.
  • Data collected at an initial visit may fall into any of the three general observation classes.

What date/times are collected?

  • If the dates collected are start and/or end dates, then data are probably about an Event or Intervention.
  • If the dates collected are dates of assessments, then data probably represent a Finding.
  • If dates of collection are different from other dates collected, it suggests that data are historical or are about an Event or Intervention that happened independently of the study schedule for data collection.

Is verbatim text collected and then coded?

  • A "Yes" answer suggests that this is Events or Interventions general observation class data. However, Findings general observation class data from an examination that identifies abnormalities may also be coded. Note that for Events and Interventions general observation class data, the topic variable is coded, whereas for Findings general observation class data, it is the result that is coded.
  • A "No" answer is inconclusive. It does not rule out Events or Interventions general observation class data, particularly if Events or Interventions are pre-specified; it also does not rule out Findings general observation class data.

If this is data about an event, does it apply to the event as a whole?

  • A "Yes" answer suggests this is traditional Events general observation class data, and it should have a record in an Events domain.
  • A "No" answer suggests that there are multiple time-based findings about an event, and that these data should be treated as Findings About data.

The Events general observation class is intended for observations about a clinical event as a whole. Such observations typically include what the condition was [captured in --TERM, the topic variable] and when it happened [captured in its start and/or end dates]. Other qualifier values collected [severity, seriousness, etc.] apply to the totality of the event. Note that sponsors may choose how they define the "event as a whole."

Data that do not describe the event as a whole should not be stored in the record for that event or in a --SUPP record tied to that event. If there are multiple assessments of an event, then each should be stored in a separate FA record.

When data related to an event do not fit into one of the existing Event general observation class Qualifiers, the first question to consider is whether the data represent information about the event itself, or about something [a Finding or Intervention] that is associated with the event.

  • If the data consist of a finding or intervention that is associated with the event, it is likely that it can be stored in a relevant Findings or Intervention general observation class dataset, with the connection to the Event record being captured using RELREC. For example, if a subject had a fever of 102 that was treated with aspirin, the fever would be stored in an adverse event record, the temperature could be stored in a vital signs record, and the aspirin could be stored in a concomitant medication record; RELREC might be used to link those records.
  • If the data item contains information about the event, then consider storing it as a Supplemental Qualifier. However, a number of circumstances may rule out the use of a Supplemental Qualifier:
    • The data are measurements that need units, normal ranges, etc.
    • The data are about the non-occurrence or non-evaluation of a pre-specified adverse event, data that may not be stored in the Adverse Event domain, since each record in the AE domain must represent a reportable event that occurred.

If a Supplemental Qualifier is not appropriate, the data may be stored in Findings About. Section 6.4, Findings About Events or Interventions, provides additional information and examples.

8.7 Relating Study Subjects

RELSUB – Description/Overview

RELSUB describes collected relationships between study subjects.

Some studies include subjects who are related to each other, and in some cases it is important to record those relationships. Studies in which pregnant women are treated, and in which both the mother and her child[ren] are study subjects, are the most common case in which relationships between subjects are collected. There are also studies of genetically based diseases where subjects who are related to each other are enrolled, and the relationships between subjects are recorded.

RELSUB – Specification

relsub.xpt, Related Subjects — Relationships, Version 1.0. One record per relationship per related subject per subject, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
| --- | --- | --- | --- | --- | --- | --- |
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| USUBJID | Unique Subject Identifier | Char | | Identifier | Identifier used to uniquely identify a subject across all studies for all applications or submissions involving the product. Either USUBJID or POOLID must be populated. | Exp |
| POOLID | Pool Identifier | Char | | Identifier | Identifier used to identify a pool of subjects. If POOLID is entered, POOLDEF records must exist for each subject in the pool and USUBJID must be null. Either USUBJID or POOLID must be populated. | Perm |
| RSUBJID | Related Subject or Pool Identifier | Char | | Identifier | Identifier used to identify a related subject or pool of subjects. RSUBJID will be populated with either the USUBJID of the related subject or the POOLID of the related pool. | Req |
| SREL | Subject Relationship | Char | [RELSUB] | Record Qualifier | Describes the relationship of the subject identified in USUBJID or the pool identified in POOLID to the subject or pool identified in RSUBJID. | Req |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parenthesis].

RELSUB – Assumptions

  1. RELSUB is used to represent relationships between persons, both of whom are study subjects. A relationship between a study subject and a person who is not a study subject may not be represented in RELSUB; it may only be reported in APRELSUB. The existence of the RELSUB dataset should not affect whether relationships are collected; that should remain a decision based on the needs of the particular study.
  2. The variable POOLID was developed for non-clinical studies, where assessments may be made for groups of animals, and identifiers are needed for those groups [pools]. It is included here because POOLID can be used for human clinical trials, if necessary. If POOLID is submitted, the POOLDEF dataset must be submitted.
  3. If POOLID is submitted, then in any record, one and only one of USUBJID and POOLID must be populated.
  4. If a study does not include the use of POOLID, then USUBJID must be populated in every record.
  5. RSUBJID must be a USUBJID value present in the DM domain. RSUBJID must be populated in every record.
  6. Values of SREL should be taken from the CDISC controlled terminology codelist RELSUB wherever possible. However, if an appropriate term does not exist in the codelist, another term may be used. The SREL term should not be less specific than the verbatim term collected. For instance, it would be inappropriate to record a relationship using the term "RELATIVE, FIRST DEGREE" when the collected relationship was "brother".
  7. Every relationship between two study subjects is represented in RELSUB as two directional relationships, one with the first subject's identifier in USUBJID and the second subject's identifier in RSUBJID, and one with the second subject's identifier in USUBJID and the first subject's identifier in RSUBJID. The SREL values in the two records will describe the same relationship, but from the viewpoint of each subject, for instance, "MOTHER, BIOLOGICAL" and "CHILD, BIOLOGICAL."
  8. All collected relationships between subjects should be recorded in RELSUB. In some cases, two subjects may have more than one relationship. For instance, a woman might be both maternal aunt and wet nurse to an infant. When there are multiple relationships between two subjects, each relationship will be represented by two records in RELSUB.
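
Assumptions 3 and 7 lend themselves to a mechanical check. The following is a minimal sketch in Python and is not part of the standard: the record layout [plain dictionaries keyed by variable name], the function names, and the subject identifiers are assumptions made for illustration. It checks only that records come in direction-reversed pairs; whether the two SREL values are true reciprocals requires knowledge of the terminology and is left to review.

```python
from collections import Counter


def check_subject_or_pool(records):
    """Return indexes of records where USUBJID and POOLID are both
    populated or both empty (Assumption 3 requires exactly one)."""
    bad = []
    for i, rec in enumerate(records):
        has_subject = bool(rec.get("USUBJID"))
        has_pool = bool(rec.get("POOLID"))
        if has_subject == has_pool:
            bad.append(i)
    return bad


def find_unpaired(records):
    """Return (subject, related subject) pairs whose record count differs
    from the count in the reverse direction (Assumption 7)."""
    directions = Counter(
        (rec.get("USUBJID") or rec.get("POOLID"), rec["RSUBJID"])
        for rec in records
    )
    unpaired = []
    for (subject, related), count in directions.items():
        if directions.get((related, subject), 0) != count:
            unpaired.append((subject, related))
    return sorted(unpaired)


# Hypothetical records: the second relationship lacks its reverse record.
records = [
    {"USUBJID": "S-001", "RSUBJID": "S-002", "SREL": "MOTHER, BIOLOGICAL"},
    {"USUBJID": "S-002", "RSUBJID": "S-001", "SREL": "CHILD, BIOLOGICAL"},
    {"USUBJID": "S-001", "RSUBJID": "S-003", "SREL": "MOTHER, BIOLOGICAL"},
]
print(find_unpaired(records))
```

Comparing counts rather than mere presence allows for Assumption 8, under which two subjects with several relationships have the same number of records in each direction.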

RELSUB – Examples

Example

The following data are from a hemophilia study [HEM021] in which the study subjects are a pair of fraternal [dizygotic] twins and their mother.

Some expected and required variables not needed to illustrate the example are not shown.

Row 1: Subject is the mother.
Rows 2-3: Subjects are the children.

dm.xpt

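| Row | STUDYID | DOMAIN | USUBJID | BRTHDTC | AGE | AGEU | SEX |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | HEM021 | DM | HEM021-001 | 1941-05-16 | 60 | YEARS | F |
| 2 | HEM021 | DM | HEM021-002 | 1965-04-12 | 35 | YEARS | M |
| 3 | HEM021 | DM | HEM021-003 | 1965-04-12 | 35 | YEARS | M |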

The RELSUB dataset below shows the relationships among the three subjects whose demographic data are shown above.

Rows 1-2: The relationship of the mother to the two children.
Rows 3, 5: The relationships of the children to the mother.
Rows 4, 6: The relationships of the children to each other.

relsub.xpt

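| Row | STUDYID | USUBJID | RSUBJID | SREL |
| --- | --- | --- | --- | --- |
| 1 | HEM021 | HEM021-001 | HEM021-002 | MOTHER, BIOLOGICAL |
| 2 | HEM021 | HEM021-001 | HEM021-003 | MOTHER, BIOLOGICAL |
| 3 | HEM021 | HEM021-002 | HEM021-001 | CHILD, BIOLOGICAL |
| 4 | HEM021 | HEM021-002 | HEM021-003 | TWIN, DIZYGOTIC |
| 5 | HEM021 | HEM021-003 | HEM021-001 | CHILD, BIOLOGICAL |
| 6 | HEM021 | HEM021-003 | HEM021-002 | TWIN, DIZYGOTIC |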

9 Study References

There are occasions when it is necessary to establish study-specific terminology that will be used in subject data. Three such situations have been identified thus far:

  • Identifiers for devices
  • Identifiers for non-host organisms
  • Identifiers for pharmacogenomic/genetic biomarkers

9.1 Device Identifiers

The Device Identifiers [DI] dataset establishes identifiers for devices, which are used to populate the variable SPDEVID. The dataset was introduced as part of the SDTMIG for Medical Devices [SDTMIG-MD]. It was originally classified as a special purpose domain, but in SDTM v1.7, it is classified as a study reference dataset. The SDTMIG-MD includes the domain specification and assumptions and provides examples of its use.

9.2 Non-host Organism Identifiers

OI – Description/Overview

The Non-host Organism Identifiers domain is for storing the levels of taxonomic nomenclature of microbes or parasites that have been either experimentally determined in the course of a study or are previously known, as in the case of lab strains used as reference in the study.

The biological classification of a non-host organism typically stops at the taxonomic rank of "species". Scientific taxonomic nomenclature below the rank of species is not clearly defined, lacks a globally-accepted standard terminology, and is frequently organism-dependent. Therefore the OI domain addresses organism taxonomy with a series of parameters that name the taxa appropriate to the organism and the granularity with which the organism has been identified in the particular study.

OI – Specification

oi.xpt, Non-host Organism Identifiers — Study Reference, Version 1.0. One record per taxon per non-host organism, Tabulation.

| Variable Name | Variable Label | Type | Controlled Terms, Codelist or Format¹ | Role | CDISC Notes | Core |
| --- | --- | --- | --- | --- | --- | --- |
| STUDYID | Study Identifier | Char | | Identifier | Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | OI | Identifier | Two-character abbreviation for the domain. | Req |
| NHOID | Non-host Organism Identifier | Char | | Identifier | Sponsor-defined identifier for a non-host organism. NHOID should be populated with an intuitive name based on the identity of the organism as reported by the lab. It must be unique for each unique organism as defined by the specific values of the organism's entire known taxonomy described by pairs of OIPARMCD and OIVAL. | Req |
| OISEQ | Sequence Number | Num | | Identifier | Sequence number given to ensure uniqueness within a parameter within an organism [NHOID] within the dataset. | Req |
| OIPARMCD | Non-host Organism ID Element Short Name | Char | * | Topic | Short name of the taxon being described. Examples: "GROUP", "GENTYP", "SUBTYP". | Req |
| OIPARM | Non-host Organism ID Element Name | Char | * | Synonym Qualifier | Name of the taxon being described. Examples: "Group", "Genotype", "Subtype". | Req |
| OIVAL | Non-host Organism ID Element Value | Char | * | Result Qualifier | Value for the taxon in OIPARMCD/OIPARM for the organism identified by NHOID. | Req |

¹ In this column, * indicates the variable may be subject to controlled terminology, and CDISC/NCI codelist code values are enclosed in [parenthesis].

OI – Assumptions

  1. Non-host organisms include viruses and organisms such as pathogens or parasites, but could also be non-pathogenic organisms such as normal intestinal flora. Non-host organism identifiers are not to be used for host species identification, such as for animals used in pre-clinical studies, nor should they be used to represent other, non-taxonomy characteristics of non-host species, such as drug susceptibility, growth rates, etc.
  2. NHOID is sponsor defined, with the following constraints:
    1. A unique NHOID must represent a unique identity as represented in its combination of OIPARMCD/OIVAL pairs. If two organisms share the same first two levels of taxonomy with regard to OIPARMCD/OIVAL, but one is identified to a third level and the other is not, they should be assigned two unique NHOIDs.
    2. Study sponsors should populate NHOID with intuitive name values based on either:
      1. the name of the organism as reported by a lab or specified by the investigator, or
      2. published references/databases where applicable and appropriate [e.g., when reference strain H77 is used in a HCV study, NHOID for this strain should be populated with "H77" or "HCV1a-H77"].
  3. NHOID can be used in any domain where observations about these organisms are being represented, allowing end-users to determine what is known about the organism's identity by merging on NHOID, or by otherwise referring to the OI domain.
  4. OIPARMCD and OIPARM must represent parameters for the identification of non-host organisms with regard to nomenclature only.
    1. Mostly, this will represent taxonomic ranks [e.g., "species"] as well as commonly used grouping terms [taxa that aren't officially ranked] such as "subtype", "group", "strain", etc.
    2. They may also include other nomenclature terms that are less widely known but are used frequently for organism identification in a specific field of study [e.g., "spoligotype" in tuberculosis].
    3. They should be listed in the OI dataset in hierarchical order of least to most specific with increasing OISEQ values.
  5. Variables not listed in the OI domain table above should not be used in OI data sets.

OI – Examples

Example

This example shows taxonomic identifiers for HIV and HCV. NHOID is a unique non-host organism ID used to link findings on that organism in other datasets with details about its identification in OI. OIPARM shows the name of the individual taxa identified and OIVAL shows the experimentally determined values of those taxa.

Rows 1-4: Show the taxonomy for the HIV organism given the NHOID of HIV1MC. This virus has been identified as HIV-1, Group M, Subtype C.
Rows 5-8: Show the taxonomy for the HIV organism given the NHOID of HIV1MB, which was used as a reference. This virus has been identified as HIV-1, Group M, Subtype B.
Rows 9-11: Show the taxonomy for the HCV organism given the NHOID of HCV2C. This virus has been identified as HCV 2c.
Rows 12-14: Show the taxonomy for the HCV organism given the NHOID of H77. This virus is a known reference strain of HCV 1a.

oi.xpt

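| Row | STUDYID | DOMAIN | NHOID | OISEQ | OIPARMCD | OIPARM | OIVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | STUDY123 | OI | HIV1MC | 1 | SPCIES | Species | HIV |
| 2 | STUDY123 | OI | HIV1MC | 2 | TYPE | Type | 1 |
| 3 | STUDY123 | OI | HIV1MC | 3 | GROUP | Group | M |
| 4 | STUDY123 | OI | HIV1MC | 4 | SUBTYP | Subtype | C |
| 5 | STUDY123 | OI | HIV1MB | 1 | SPCIES | Species | HIV |
| 6 | STUDY123 | OI | HIV1MB | 2 | TYPE | Type | 1 |
| 7 | STUDY123 | OI | HIV1MB | 3 | GROUP | Group | M |
| 8 | STUDY123 | OI | HIV1MB | 4 | SUBTYP | Subtype | B |
| 9 | STUDY123 | OI | HCV2C | 1 | SPCIES | Species | HCV |
| 10 | STUDY123 | OI | HCV2C | 2 | GENTYP | Genotype | 2 |
| 11 | STUDY123 | OI | HCV2C | 3 | SUBTYP | Subtype | C |
| 12 | STUDY123 | OI | H77 | 1 | SPCIES | Species | HCV |
| 13 | STUDY123 | OI | H77 | 2 | GENTYP | Genotype | 1 |
| 14 | STUDY123 | OI | H77 | 3 | SUBTYP | Subtype | A |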
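
To illustrate how an end user might resolve an organism's identity from NHOID [OI Assumption 3], the sketch below groups the example rows by NHOID and orders them by OISEQ. It is written in Python purely for illustration; the tuple layout and the function names are assumptions, not part of the standard. The second function flags NHOIDs that share an identical sequence of OIPARMCD/OIVAL pairs, which Assumption 2.1 does not permit.

```python
from collections import defaultdict

# Rows transcribed from the oi.xpt example: (NHOID, OISEQ, OIPARMCD, OIVAL)
OI_ROWS = [
    ("HIV1MC", 1, "SPCIES", "HIV"), ("HIV1MC", 2, "TYPE", "1"),
    ("HIV1MC", 3, "GROUP", "M"), ("HIV1MC", 4, "SUBTYP", "C"),
    ("HIV1MB", 1, "SPCIES", "HIV"), ("HIV1MB", 2, "TYPE", "1"),
    ("HIV1MB", 3, "GROUP", "M"), ("HIV1MB", 4, "SUBTYP", "B"),
    ("HCV2C", 1, "SPCIES", "HCV"), ("HCV2C", 2, "GENTYP", "2"),
    ("HCV2C", 3, "SUBTYP", "C"),
    ("H77", 1, "SPCIES", "HCV"), ("H77", 2, "GENTYP", "1"),
    ("H77", 3, "SUBTYP", "A"),
]


def taxonomy_by_organism(rows):
    """Map each NHOID to its (OIPARMCD, OIVAL) pairs in OISEQ order,
    that is, from least to most specific taxon."""
    grouped = defaultdict(list)
    for nhoid, seq, parmcd, val in rows:
        grouped[nhoid].append((seq, parmcd, val))
    return {
        nhoid: [(parmcd, val) for _, parmcd, val in sorted(items)]
        for nhoid, items in grouped.items()
    }


def duplicate_identities(taxonomies):
    """Return groups of NHOIDs that describe the same identity."""
    seen = defaultdict(list)
    for nhoid, pairs in taxonomies.items():
        seen[tuple(pairs)].append(nhoid)
    return [ids for ids in seen.values() if len(ids) > 1]


taxonomies = taxonomy_by_organism(OI_ROWS)
print(taxonomies["H77"])
print(duplicate_identities(taxonomies))
```

In a real submission the same grouping would be applied after merging a findings dataset with OI on NHOID; here the rows are held in memory to keep the example self-contained.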

9.3 Pharmacogenomic/Genetic Biomarker Identifiers

The Pharmacogenomic/Genetic Biomarker Identifiers [PB] dataset establishes identifiers for pharmacogenomic/genetic biomarkers which are composed of groups of genetic variations. The dataset was introduced as part of the SDTMIG for Pharmacogenomic/Genetics [SDTMIG-PGx]. It was originally classified as a special purpose domain, but it is to be reclassified as a study reference dataset. The SDTMIG-PGx includes the domain specification and assumptions and provides examples illustrating its use.

Appendices

Appendix A: CDISC SDS Extended Leadership Team

The CDISC SDS Extended Leadership Team would like to thank the many volunteers who contributed to the development, review, and publication of SDTMIG v3.3. Additionally, this publication would not have been possible without the support of the Foundational Team Leads, Global Governance Group, Regulatory Liaisons, and CDISC.

SDS Extended Leadership Team

| Name | Company |
| --- | --- |
| Amy Adyanthaya | Biogen |
| Ellina Babouchkina | Quality Data Services |
| Anthony Chow | CDISC |
| Christine Connolly - Current Leadership Team | EMD Serono |
| Gary Cunningham | The Griesser Group |
| Chris Gemma - Current Leadership Team | CDISC |
| Dan Godoy - Past Leadership Team | MedImmune |
| Tom Guinter | Independent |
| Mike Hamidi - Current Leadership Team | PRA Health Sciences [formerly Merck & Co.] |
| Sterling Hardy | Merck & Co. [formerly Bristol-Myers Squibb] |
| Joyce Hernandez | Independent |
| Marcelina Hungria | DIcore Group |
| Kristin Kelly | Pinnacle 21 |
| Éanna Kiely | Syneos |
| Steve Kopko | CDISC |
| Bess LeRoy | CDISC |
| Richard Lewis | TalentMine |
| Stetson Line | Clinventive |
| Todd Mailey | GlaxoSmithKline |
| Barrie Nelson - Past Leadership Team | Nurocor |
| Jon Neville | CDISC |
| Amy Palmer - Past Leadership Team | CDISC |
| Melanie Paules | GlaxoSmithKline |
| Carlo Radovsky | etera solutions |
| Janet Reich - Current Leadership Team | Amgen |
| Donna Sattler | Eli Lilly |
| Cary Smoak | S-cubed |
| Susan Tierney | IQVIA |
| Madhavi Vemuri | Janssen Research |
| Gary Walker | IQVIA |
| Darcy Wold | CDISC |
| Diane Wold - Past Leadership Team | CDISC |
| Fred Wood - Past Leadership Team | TalentMine |

Appendix B: Glossary and Abbreviations

The following abbreviations and terms are used in this document. Additional definitions can be found in the CDISC Glossary available at //www.cdisc.org/standards/semantics/glossary.

| Term | Definition |
| --- | --- |
| ADaM | CDISC Analysis Dataset Model |
| ATC code | Anatomic Therapeutic Chemical code from WHO Drug |
| CDISC | Clinical Data Interchange Standards Consortium |
| CRF | Case report form [sometimes case record form] |
| CRT | Case report tabulation |
| cSDRG | Clinical Study Data Reviewers Guide |
| CTCAE | Common Terminology Criteria for Adverse Events |
| Dataset | A collection of structured data in a single file |
| Define-XML | CDISC standard for transmitting metadata that describes any tabular dataset structure. |
| Domain | A collection of observations with a topic-specific commonality |
| eDT | Electronic Data Transfer |
| FDA | Food and Drug Administration |
| HL7 | Health Level 7 |
| ICH | International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use |
| ICH E2A | ICH guidelines on Clinical Safety Data Management: Definitions and Standards for Expedited Reporting |
| ICH E2B | ICH guidelines on Clinical Safety Data Management: Data Elements for Transmission of Individual Cases Safety Reports |
| ICH E3 | ICH guidelines on Structure and Content of Clinical Study Reports |
| ICH E9 | ICH guidelines on Statistical Principles for Clinical Trials |
| ISO | International Organization for Standardization |
| ISO 8601 | ISO character representation of dates, date/times, intervals, and durations of time. The SDTM uses the extended format. |
| ISO 3166 | ISO codelist for representing countries; the Alpha-3 codelist uses 3-character codes. |
| LOINC | Logical Observation, Identifiers, Names, and Codes |
| MedDRA | Medical Dictionary for Regulatory Activities |
| NCI | National Cancer Institute [NIH] |
| SDS | Submission Data Standards. Also the name of the team that created the SDTM and SDTMIG. |
| SDTM | Study Data Tabulation Model |
| SDTMIG | Study Data Tabulation Model Implementation Guide: Human Clinical Trials [this document] |
| SDTMIG-AP | Study Data Tabulation Model Implementation Guide: Associated Persons |
| SDTMIG-MD | Study Data Tabulation Model Implementation Guide for Medical Devices |
| SDTMIG-PGx | Study Data Tabulation Model Implementation Guide: Pharmacogenomics/Genetics |
| SEND | Standard for Exchange of Non-Clinical Data |
| SF-36 | A multi-purpose, short-form health survey with 36 questions |
| SNOMED | Systematized Nomenclature of Medicine [a dictionary] |
| SOC | System Organ Class [from MedDRA] |
| TDM | Trial Design Model |
| UUID | Universally Unique Identifier |
| WHODRUG | World Health Organization Drug Dictionary |
| XML | eXtensible Markup Language |

Appendix C: Controlled Terminology

CDISC Terminology is centrally managed by the CDISC Controlled Terminology Team, supporting the terminology needs of all CDISC foundational standards [SDTM, CDASH, ADaM, SEND] and all disease/therapeutic area standards.

New/modified terms have a three-month development period during which the Controlled Terminology Team evaluates the requests received, incorporates as much as possible for each quarterly release, and has a quarterly public review comment period followed by a publication release.

Visit the CDISC Controlled Terminology page [//www.cdisc.org/terminology] to find the most recently published terminology packages [final or under review], or visit the NCI Enterprise Vocabulary Services CDISC Terminology website at //www.cancer.gov/research/resources/terminology/cdisc for access to the full list of CDISC terminology.

Note that the SDTM terminology was previously provided separately for questionnaires and other domains. However, as of the 2015-12-18 release, these were merged into a single publication.

SDTM Implementation Guides [v3.1.3 or earlier] included several appendices regarding Controlled Terminology. Starting with SDTMIG 3.2, Appendix C was simplified to contain only a couple of important Terminology Code Lists that are specific to this Implementation Guide.

Appendix C1: Trial Summary Codes

The Parameter table includes text to indicate if the parameter should be included in the dataset.

To make this domain useful, a minimum number of trial summary parameters should be provided as shown below. The column titled "Record with this Parameter" indicates whether the parameter should be included in the dataset. If a record is included, either TSVAL or TSVALNF must be populated.

Most of the new parameters are coming from //www.clinicaltrials.gov/ and the controlled terminology shown below is aligned with that source. All definitions of the parameters are maintained in NCI EVS.

The Notes column provides some additional information about the specific parameter or its values.

| TSPARMCD | TSPARM | TSVAL [Codelist Name or Format] | Record with this Parameter | Notes |
| --- | --- | --- | --- | --- |
| ADDON | Added on to Existing Treatments | No Yes Response | Required | |
| AGEMAX | Planned Maximum Age of Subjects | ISO 8601 | Required | If there is no maximum age, TSVALNF = "PINF". |
| AGEMIN | Planned Minimum Age of Subjects | ISO 8601 | Required | If there is no minimum age, populate TSVAL with "P0Y". |
| LENGTH | Trial Length | ISO 8601 | Required | |
| PLANSUB | Planned Number of Subjects | number | Required | |
| RANDOM | Trial Is Randomized | No Yes Response | Required | |
| SEXPOP | Sex of Participants | Sex of Participants | Required | |
| STOPRULE | Study Stop Rules | text | Required | Protocol-specified stopping rule. If there is no stopping rule, record "NONE" in this field. |
| TBLIND | Trial Blinding Schema | Trial Blinding Schema | Required | |
| TCNTRL | Control Type | Control Type | Required | |
| TDIGRP | Diagnosis Group | SNOMED CT | Conditionally Required | If the study population is healthy subjects [i.e., healthy subjects flag is "Y"], this parameter is not expected. If the healthy subject flag is "N", then this parameter would contain the diagnosis/medical problem of the study population. [Validation rule: IF healthy volunteers = "N" then TDIGRP must be present and not null.] |
| INDIC | Trial Disease/Condition Indication | SNOMED CT | If Applicable | If applicable. Don't include if the sole purpose is to collect PK data. See TS Assumption 13. Use as many rows as needed. |
| TINDTP | Trial Intent Type | Trial Indication Type | Conditionally Required | If study type is "INTERVENTIONAL", this parameter is required. A study in healthy volunteers may have TSVAL null and TSVALNF = "NA". |
| TITLE | Trial Title | text | Required | Use as many rows as needed. |
| TPHASE | Trial Phase Classification | Trial Phase | Required | |
| TTYPE | Trial Type | Trial Type | Required | Use as many rows as needed. |
| CURTRT | Current Therapy or Treatment | SRS Preferred Substance Name [or Device Name] | Conditionally Required | Required when ADDON equals "Y". Use as many rows as needed for combination or multiple therapies. |
| OBJPRIM | Trial Primary Objective | text | Required | |
| OBJSEC | Trial Secondary Objective | text | If Applicable | If applicable. Use as many rows as needed. |
| SPONSOR | Clinical Study Sponsor | DUNS | Required | |
| COMPTRT | Comparative Treatment Name | SRS Preferred Substance Name | If Applicable | If applicable. Don't include if there are no active comparators. Use as many rows as needed. |
| TRT | Investigational Therapy or Treatment | UNII | Conditionally Required | If study type is "INTERVENTIONAL", this parameter is required. |
| RANDQT | Randomization Quotient | number | Conditionally Required | Required only when there is only one investigational treatment. The value is always a number between 0 and 1. There are cases where the ratio is 1 [e.g., crossover study or open label study where all subjects are exposed to investigational therapy]. |
| STRATFCT | Stratification Factor | Any allowable variable name | If Applicable | If applicable. Use as many rows as needed, one for each factor. |
| REGID | Registry Identifier | CLINICALTRIALS.GOV / EUDRAC | Required | Use as many rows as needed, one for each registry ID. |
| OUTMSPRI | Primary Outcome Measure | text | Required | Use as many rows as needed. |
| OUTMSSEC | Secondary Outcome Measure | text | If Applicable | If applicable [i.e., if the trial has a secondary outcome measure]. Use as many rows as needed. |
| OUTMSEXP | Exploratory Outcome Measure | text | If Applicable | If applicable [i.e., if the trial has exploratory outcome measure]. Use as many rows as needed. |
| PCLAS | Pharmacological Class | MED-RT | Conditionally Required | If study type is "INTERVENTIONAL" and if Intervention Type is one for which pharmacological class is applicable, this parameter is required. |
| FCNTRY | Planned Country of Investigational Sites | ISO 3166-1 alpha-3 | Required | Use as many rows as needed, one for each country. |
| ADAPT | Adaptive Design | No Yes Response | Required | Does the protocol include any adaptive design features? |
| DCUTDTC | Data Cutoff Date | ISO 8601 | Required | Use GRPID to associate the Data Cutoff Date to Data Cutoff Description. |
| DCUTDESC | Data Cutoff Description | text | Required | Use GRPID to associate the Data Cutoff Date to Data Cutoff Description. |
| INTMODEL | Intervention Model | Intervention Model | Conditionally Required | If study type is "INTERVENTIONAL", this parameter is required. |
| NARMS | Planned Number of Arms | number | Required | |
| STYPE | Study Type | Study Type | Required | |
| INTTYPE | Intervention Type | Intervention Type | Conditionally Required | If study type is "INTERVENTIONAL", this parameter is required. |
| SSTDTC | Study Start Date | ISO 8601 | Required | |
| SENDTC | Study End Date | ISO 8601 | Required | |
| ACTSUB | Actual Number of Subjects | number | Required | |
| HLTSUBJI | Healthy Subject Indicator | No Yes Response | Required | If the healthy subject indicator is "N", then TDIGRP value should be provided. |
| SDMDUR | Stable Disease Minimum Duration | ISO 8601 | If Applicable | If applicable. |
| CRMDUR | Confirmed Response Minimum Duration | ISO 8601 | If Applicable | If applicable. |
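
Several of the conditions in the table above can be checked mechanically. The sketch below, in Python, is illustrative only and does not reproduce any official validation rule set: the record layout [dictionaries with TSPARMCD, TSVAL, and TSVALNF keys] and the function name check_ts are assumptions. It covers the TSVAL/TSVALNF requirement and the conditions stated for TDIGRP, CURTRT, and the parameters tied to an "INTERVENTIONAL" study type; PCLAS is omitted because its condition also depends on the intervention type.

```python
def check_ts(records):
    """Return a list of findings for TS records given as dictionaries
    with TSPARMCD, TSVAL, and TSVALNF keys (an assumed layout)."""
    findings = []
    values = {}  # TSPARMCD -> list of TSVAL values seen
    for rec in records:
        code = rec["TSPARMCD"]
        if not rec.get("TSVAL") and not rec.get("TSVALNF"):
            findings.append(f"{code}: neither TSVAL nor TSVALNF is populated")
        values.setdefault(code, []).append(rec.get("TSVAL"))

    # TDIGRP must be present and not null when the study is not in healthy subjects.
    if "N" in values.get("HLTSUBJI", []) and not any(values.get("TDIGRP", [])):
        findings.append('TDIGRP: required when HLTSUBJI is "N"')
    # CURTRT is required for add-on studies.
    if "Y" in values.get("ADDON", []) and "CURTRT" not in values:
        findings.append('CURTRT: required when ADDON is "Y"')
    # Parameters that the table ties to an interventional study type.
    if "INTERVENTIONAL" in values.get("STYPE", []):
        for code in ("TRT", "INTMODEL", "INTTYPE", "TINDTP"):
            if code not in values:
                findings.append(f'{code}: required when STYPE is "INTERVENTIONAL"')
    return findings


# Hypothetical TS records for illustration.
ts = [
    {"TSPARMCD": "HLTSUBJI", "TSVAL": "N", "TSVALNF": ""},
    {"TSPARMCD": "ADDON", "TSVAL": "Y", "TSVALNF": ""},
    {"TSPARMCD": "AGEMAX", "TSVAL": "", "TSVALNF": "PINF"},
]
for finding in check_ts(ts):
    print(finding)
```

The AGEMAX record passes because TSVALNF is populated, which is the case the table describes for a trial with no maximum age.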

Appendix C2: Supplemental Qualifiers Name Codes

The following table contains an initial set of standard name codes for use in the Supplemental Qualifiers [SUPP--] special purpose datasets. There are no specific conventions for naming QNAM and some sponsors may choose to include the 2-character domain in the QNAM variable name. Note that the 2-character domain code is not required in QNAM since it is present in the variable RDOMAIN in the SUPP-- datasets.

| QNAM | QLABEL | Applicable Domains |
| --- | --- | --- |
| AESOSP | Other Medically Important SAE | AE |
| AETRTEM | Treatment Emergent Flag | AE |
| --CLSIG | Clinically Significant | Findings |
| --REAS | Reason | All general observation classes |

Appendix D: CDISC Variable-Naming Fragments

The CDISC SDS group has defined a standard list of fragments to use as a guide when naming variables in SUPP-- datasets [as QNAM] or assigning --TESTCD values that could conceivably be treated as variables in a horizontal listing derived from a v3.x dataset. In some cases, more than one fragment is used for a given keyword. This is necessary when a shorter fragment must be used for a --TESTCD or QNAM that incorporates several keywords that must be combined while still meeting the 8-character variable naming limit of SAS transport files. When using fragments, the general rule is to use the fragment[s] that best conveys the meaning of the variable within the 8-character limit; thus, the longer fragment should be used when space allows. If the combination of fragments still exceeds 8 characters, a character should be dropped where most appropriate [while avoiding naming conflicts if possible] to fit within the 8-character limit.

In other cases the same fragment may be used for more than one meaning, but these would not normally overlap for the same variable.

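| Keyword[s] | Fragment |
| --- | --- |
| ACTION | ACN |
| ADJUSTMENT | ADJ |
| ANALYSIS DATASET | AD |
| ASSAY | AS |
| BASELINE | BL |
| BIRTH | BRTH |
| BODY | BOD |
| CANCER | CAN |
| CATEGORY | CAT |
| CHARACTER | C |
| CLASS | CLAS |
| CLINICAL | CL |
| CODE | CD |
| COMMENT | COM |
| CONCOMITANT | CON |
| CONDITION | CND |
| CONGENITAL | CONG |
| DATE TIME - CHARACTER | DTC |
| DAY | DY |
| DEATH | DTH |
| DECODE | DECOD |
| DERIVED | DRV |
| DESCRIPTION | DESC |
| DISABILITY | DISAB |
| DOSE, DOSAGE | DOS, DOSE |
| DURATION | DUR |
| ELAPSED | EL |
| ELEMENT | ET |
| EMERGENT | EM |
| END | END, EN |
| ETHNICITY | ETHNIC |
| EVALUATION | EVL |
| EVALUATOR | EVAL |
| EXTERNAL | X |
| FASTING | FAST |
| FILENAME | FN |
| FLAG | FL |
| FORMULATION, FORM | FRM |
| FREQUENCY | FRQ |
| GRADE | GR |
| GROUP | GRP |
| HOSPITALIZATION | HOSP |
| IDENTIFIER | ID |
| INDICATION | INDC |
| INDICATOR | IND |
| INTERPRETATION | INTP |
| INTERVAL | INT |
| INVESTIGATOR | INV |
| LIFE-THREATENING | LIFE |
| LOCATION | LOC |
| LOINC CODE | LOINC |
| LOWER LIMIT | LO |
| MEDICALLY-IMPORTANT EVENT | MIE |
| NAME | NAM |
| NON-STUDY THERAPY | NST |
| NORMAL RANGE | NR |
| NOT DONE | ND |
| NUMBER | NUM |
| NUMERIC | N |
| OBJECT | OBJ |
| ONGOING | ONGO |
| ORDER | ORD |
| ORIGIN | ORIG |
| ORIGINAL | OR |
| OTHER | OTH, O |
| OUTCOME | OUT |
| OVERDOSE | OD |
| PARAMETER | PARM |
| PATTERN | PATT |
| POPULATION | POP |
| POSITION | POS |
| QUALIFIER | QUAL |
| REASON | REAS |
| REFERENCE | REF, RF |
| REGIMEN | RGM |
| RELATED | REL, R |
| RELATIONSHIP | REL |
| RESULT | RES |
| RULE | RL |
| SEQUENCE | SEQ |
| SERIOUS | S, SER |
| SEVERITY | SEV |
| SIGNIFICANT | SIG |
| SPECIMEN | SPEC, SPC |
| SPONSOR | SP |
| STANDARD | ST, STD |
| START | ST |
| STATUS | STAT |
| SUBCATEGORY | SCAT |
| SUBJECT | SUBJ |
| SUPPLEMENTAL | SUPP |
| SYSTEM | SYS |
| TEXT | TXT |
| TIME | TM |
| TIME POINT | TPT |
| TOTAL | TOT |
| TOXICITY | TOX |
| TRANSITION | TRANS |
| TREATMENT | TRT |
| UNIQUE | U |
| UNIT | U |
| UNPLANNED | UP |
| UPPER LIMIT | HI |
| VALUE | VAL |
| VARIABLE | VAR |
| VEHICLE | V |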
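
The rule of preferring the longer fragment when space allows can be sketched as a small search. The Python below is an illustration under stated assumptions, not a CDISC tool: the FRAGMENTS dictionary holds only a subset of the table above, and the function name compose and its prefix parameter are hypothetical. It tries every combination of fragments and returns the longest name that fits in 8 characters. When nothing fits it returns None rather than dropping a character, since the guide leaves that choice to judgment.

```python
from itertools import product

MAX_LEN = 8  # variable naming limit of SAS transport files

# Subset of the fragment table; longer fragments are listed first.
FRAGMENTS = {
    "STANDARD": ["STD", "ST"],
    "RESULT": ["RES"],
    "NUMERIC": ["N"],
    "CLINICAL": ["CL"],
    "SIGNIFICANT": ["SIG"],
    "SPECIMEN": ["SPEC", "SPC"],
    "REFERENCE": ["REF", "RF"],
}


def compose(keywords, prefix=""):
    """Return the longest name built from the keywords' fragments that
    fits within MAX_LEN characters, or None if no combination fits."""
    options = [FRAGMENTS[keyword] for keyword in keywords]
    candidates = [prefix + "".join(combo) for combo in product(*options)]
    fitting = [name for name in candidates if len(name) <= MAX_LEN]
    # max() keeps the first of equally long names, so combinations that
    # use the earlier-listed (longer) fragments win ties.
    return max(fitting, key=len) if fitting else None


print(compose(["STANDARD", "RESULT", "NUMERIC"], prefix="LB"))
print(compose(["CLINICAL", "SIGNIFICANT"]))
```

With the "LB" prefix, "STD" would give a 9-character name, so the search falls back to "ST" for STANDARD; the second call has room for every fragment as listed.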

Appendix E: Revision History

This appendix summarizes revisions since the last production version.

  • A Diff file with details of changes to domain specification tables is available as a member benefit on the CDISC SHARE Exports page in the Members Only Area of the CDISC website [//www.cdisc.org/members-only/share-exports], and those changes are not repeated here.
  • The development of the standard was moved into the CDISC wiki, which affected formatting.
  • Other formatting changes include:
    • Enclosing all example values in double quotation marks ["]
    • Linking codelists in specification tables to the specific codelist in the NCI-EVS website
    • Hyperlinking references to sections of the document
    • Referring to "Define-XML Document" instead of "define.xml" or "define.xml file"
  • Example content has been highlighted using a gray vertical line on the left side of the text.
  • Many small changes to wording, intended to clarify meaning, were made, but are not detailed here.
  • Updated assumptions for most domains:
    • Domain definitions, where present in an assumption, were removed to eliminate redundancy.
    • Revised assumptions describing variables "generally not used" in a domain to clarify the variables that can be added to those in the domain specification, and that those listed as "generally not used" are not prohibited.
  • Terms in examples were updated to use current controlled terminology, where applicable.
  • For some variables in domain specification tables, more than one codelist is referenced.

| Section number | Section | Change |
|---|---|---|
| 1.1 | Purpose | Removed outdated language. |
| 1.2 | Organization of this Document | Added new Section 9 for Study Reference Datasets. |
| 1.3 | Relationship to Prior CDISC Documents | Updated to include new domains and Section 9. |
| 1.4 | How to Read this Implementation Guide | Added mentions of other SDTM implementation guides SDTMIG-AP, SDTMIG-MD, and SDTMIG-PGx. |
| 1.4.1 | How to Read a Domain Specification | New section |
| 1.5 | How to Submit Comments | Deleted. The CDISC Discussion Forum has been decommissioned. A replacement will be communicated when details are available. |
| 2.2 | Datasets and Domains | Removed information available in the Define-XML Specification. |
| 2.3 | The General Observation Classes | Switched order with what is now Section 2.4. |
| 2.4 | Datasets Other Than General Observation Class Domains | Updated to include references to new Section 9. |
| 2.5 | The SDTM Standard Domain Models | Added information on domain versions. Updated advice on inclusion of permissible variables. |
| 2.6 | Creating a New Domain | Removed list of domains, as it was redundant. |
| 2.7 | SDTM Variables Not Allowed in SDTMIG | Updated to include new variables in SDTM which would not be used in human clinical trials, or should be used with caution. |
| 3.1 | Standard Metadata for Dataset Contents and Attributes | Removed information available in the Define-XML Specification. |
| 3.2 | Using the CDISC Domain Models in Regulatory Submissions — Dataset Metadata | Revised content related to the Define-XML Specification. |
| 3.2.1 | Dataset-Level Metadata | Updated to include all domains. |
| 3.2.2 | Conformance | Removed variable labels from conformance criteria. |
| 4 | Assumptions for Domain Models | Revised section numbering to remove the unnecessary second level numbering ".1" which occurred in all sub-sections. For instance, what was Section 4.1.2.3 is now Section 4.2.3. |
| 4.1.3.1 | EPOCH Variable Guidance | New section |
| 4.1.5 | SDTM Core Designations | Updated advice on inclusion of permissible variables. [Parallels changes in Section 2.5.] |
| 4.1.7.1 | Example of Splitting Questionnaires | Revised examples; replaced some old examples with new ones. |
| 4.1.8.1 | Origin Metadata for Variables | Removed information available in the Define-XML Specification. |
| 4.1.9 | Assigning Natural Keys in the Metadata | Rewrote using a newer example, since the old example using the PE domain was out of date. |
| 4.2.6 | Grouping Variables and Categorization | Revised information about the --LNKID and --LNKGRP variables. |
| 4.2.7.1 | "Specify" Values for Non-Result Qualifier Variables | Replaced Example 3, since previous example was inconsistent with CDISC anatomical location controlled terminology. |
| 4.3.1 | Types of Controlled Terminology | Revised to make the relationship to representation in the specification table clearer. |
| 4.3.3 | Controlled Terminology Values | Added advice on subsets of controlled terminology. |
| 4.4.1 | Formats for Date/Time Variables | Revised to include optional components of ISO 8601 time representations for fractional seconds and time zones. |
| 4.4.2 | Date/Time Precision | Added example showing fractional seconds. |
| 4.4.7 | Use of Relative Timing Variables | Revised to explain which values of the STENRF codelist can be used with which relative timing variables. |
| 4.4.10 | Representing Time Points | An erroneous time value in the first row of the LB example was corrected. |
| 4.4.11 | Disease Milestones and Disease Milestone Timing Variables | New section |
| 4.5.1.3 | Examples of Original and Standard Units and Test Not Done | Revised examples by adding --LOBXFL variables, timing variables, removed --DRVFL variables. Removed QS example, since individual questionnaire supplements provide better examples. |
| 4.5.3.2 | Text Strings Greater than 200 Characters in Other Variables | Reformatted to provide rules in a bulleted list. Clarified name and label rules for variables and supplemental qualifiers created to hold text beyond 200 characters. |
| 4.5.9 | Baseline Values | New section, describing the new --LOBXFL variable and contrasting it with the old --BLFL variable and the ADaM variable ABLFL. |
| 5.2 | Demographics | Revised assumptions and examples affected by new variables ARMNS [Reason Arm and/or Actual Arm is Missing] and ACTARMUD [Description if Unplanned Actual Arm]. Revised assumption to say that supplemental qualifiers for population flags should not be used. |
| 5.3 | Subject Elements | Revised examples affected by new DM variables ARMNS [Reason Arm and/or Actual Arm is Missing] and ACTARMUD [Description if Unplanned Actual Arm]. Order of variables was changed by moving TAETORD before EPOCH. |
| 5.5 | Subject Disease Milestones | New domain |
| 6.1 | Models for Interventions Domains | Revised CDISC note for relative timing variables in interventions domain to include references to section 4.4.7, since not all values of the STENRF codelist are applicable to all relative timing variables. |
| 6.1.1 | Procedure Agents | New domain |
| 6.1.4 | Meal Data | New domain |
| 6.2 | Models for Events Domains | Revised CDISC note for relative timing variables in events domain to include references to section 4.4.7, since not all values of the STENRF codelist are applicable to all relative timing variables. |
| 6.2.3 | Disposition | Removed the restriction that EPOCH be used only when DSCAT = "DISPOSITION EVENT". Revised domain specification and assumptions to explicitly recognize and provide advice on: the use of Disposition Events for both study participation disposition and study treatment disposition; representation of multiple informed consents. |
| 6.3 | Models for Findings Domains | New groupings of related domains: new Questionnaires, Ratings, and Scales [QRS] grouping; new Morphology/Physiology grouping; old Oncology grouping removed; new Tumors/Lesions grouping. |
| 6.3.7 | Microbiology Domains | Significant revisions to handle tests for all types of microorganisms. Previous versions were designed only for tests involving bacteria. Updated Microbiology Susceptibility [MS] domain to replace Viral Resistance [VR] domain in the provisional TAUG-Virology. |
| 6.3.9 | Morphology | Added statement that this domain will be deprecated in a future version of the SDTMIG. Tests that were represented in the MO domain will be moved to morphology/physiology domains. |
| 6.3.10 | Morphology/Physiology Domains | Added generic specification table describing characteristics common to all morphology/physiology domains. |
| 6.3.10 | Morphology/Physiology Domains | The --TSTDTL variable, which was included in the versions of some domains which underwent public review, was removed. |
| 6.3.10.1 | Cardiovascular System Findings | New morphology/physiology domain |
| 6.3.10.2 | Musculoskeletal System Findings | New morphology/physiology domain |
| 6.3.10.3 | Nervous System Findings | New morphology/physiology domain |
| 6.3.10.4 | Ophthalmic Examinations | New morphology/physiology domain |
| 6.3.10.6 | Respiratory System Findings | New morphology/physiology domain |
| 6.3.10.7 | Urinary System Findings | New morphology/physiology domain |
| 6.3.11.3.2 | PC-PP – Relating Records | Reformatted examples; data in examples did not change. |
| 6.3.12 | Physical Examination | Revised to limit this domain to data collected in a traditional physical examination of the body. Revised assumptions. Included reference to CDASH advice on physical examination data. Removed funduscopic examination from example. |
| 6.3.13.1 | Functional Tests | New domain |
| 6.3.13.2 | Disease Response and Clin Classification | Expanded scope of the RS domain to include clinical classifications in addition to oncology disease response. The RS domain was moved from the old Oncology grouping to the Questionnaires, Ratings, and Scales grouping. |
| 6.3.16 | Tumor/Lesion Domains | Scope of the TU and TI domains was expanded to include lesions in addition to tumors. |
| 6.4 | Skin Response | Examples 1 and 2 revised in consultation with a subject matter expert to be more accurate and realistic. |
| 7.1.2 | Definitions of Trial Design Concepts | Presented definitions in a table. |
| 7.2.2 | Trial Elements | Corrected erroneous domain values in third example. |
| 7.3.3 | Trial Disease Milestones | New domain |
| 8.4.3 | SUPP-- Examples | Removed example showing population flags, since these supplemental qualifiers were removed. |
| 9 | Study References | New section |
| 9.1 | Device Identifiers | New section. Provides basic information on domain and refers the user to full information on this domain in the SDTMIG-MD. |
| 9.2 | Non-host Organism Identifiers | New domain |
| 9.3 | Pharmacogenomic/Genetic Biomarker Identifiers | New section. Provides basic information on domain and refers the user to full information on this domain in the SDTMIG-PGx. |
| Appendix A | CDISC SDS Extended Leadership Team | Replaced former team list. |
| Appendix C | Controlled Terminology | Updated language describing past changes in controlled terminology publication and the SDTMIG Controlled Terminology appendices. |
| Appendix C2 | Supplemental Qualifiers Name Codes | Removed population flag supplemental qualifiers. |
| Appendix E | Revision History | Replaced with summary of changes between SDTMIG v3.2 and SDTMIG v3.3. |

Appendix F: Representations and Warranties, Limitations of Liability, and Disclaimers

CDISC Patent Disclaimers

It is possible that implementation of and compliance with this standard may require use of subject matter covered by patent rights. By publication of this standard, no position is taken with respect to the existence or validity of any claim or of any patent rights in connection therewith. CDISC, including the CDISC Board of Directors, shall not be responsible for identifying patent claims for which a license may be required in order to implement this standard or for conducting inquiries into the legal validity or scope of those patents or patent claims that are brought to its attention.

Representations and Warranties

"CDISC grants open public use of this User Guide [or Final Standards] under CDISC's copyright."

Each Participant in the development of this standard shall be deemed to represent, warrant, and covenant, at the time of a Contribution by such Participant [or by its Representative], that to the best of its knowledge and ability: [a] it holds or has the right to grant all relevant licenses to any of its Contributions in all jurisdictions or territories in which it holds relevant intellectual property rights; [b] there are no limits to the Participant's ability to make the grants, acknowledgments, and agreements herein; and [c] the Contribution does not subject any Contribution, Draft Standard, Final Standard, or implementations thereof, in whole or in part, to licensing obligations with additional restrictions or requirements inconsistent with those set forth in this Policy, or that would require any such Contribution, Final Standard, or implementation, in whole or in part, to be either: [i] disclosed or distributed in source code form; [ii] licensed for the purpose of making derivative works [other than as set forth in Section 4.2 of the CDISC Intellectual Property Policy ["the Policy"]]; or [iii] distributed at no charge, except as set forth in Sections 3, 5.1, and 4.2 of the Policy. If a Participant has knowledge that a Contribution made by any Participant or any other party may subject any Contribution, Draft Standard, Final Standard, or implementation, in whole or in part, to one or more of the licensing obligations listed in Section 9.3, such Participant shall give prompt notice of the same to the CDISC President who shall promptly notify all Participants.

No Other Warranties/Disclaimers. ALL PARTICIPANTS ACKNOWLEDGE THAT, EXCEPT AS PROVIDED UNDER SECTION 9.3 OF THE CDISC INTELLECTUAL PROPERTY POLICY, ALL DRAFT STANDARDS AND FINAL STANDARDS, AND ALL CONTRIBUTIONS TO FINAL STANDARDS AND DRAFT STANDARDS, ARE PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, AND THE PARTICIPANTS, REPRESENTATIVES, THE CDISC PRESIDENT, THE CDISC BOARD OF DIRECTORS, AND CDISC EXPRESSLY DISCLAIM ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR OR INTENDED PURPOSE, OR ANY OTHER WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, FINAL STANDARDS OR DRAFT STANDARDS, OR CONTRIBUTION.

Limitation of Liability

IN NO EVENT WILL CDISC OR ANY OF ITS CONSTITUENT PARTS [INCLUDING, BUT NOT LIMITED TO, THE CDISC BOARD OF DIRECTORS, THE CDISC PRESIDENT, CDISC STAFF, AND CDISC MEMBERS] BE LIABLE TO ANY OTHER PERSON OR ENTITY FOR ANY LOSS OF PROFITS, LOSS OF USE, DIRECT, INDIRECT, INCIDENTAL, CONSEQUENTIAL, OR SPECIAL DAMAGES, WHETHER UNDER CONTRACT, TORT, WARRANTY, OR OTHERWISE, ARISING IN ANY WAY OUT OF THIS POLICY OR ANY RELATED AGREEMENT, WHETHER OR NOT SUCH PARTY HAD ADVANCE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.

Note: The CDISC Intellectual Property Policy can be found at: cdisc_policy_003_intellectual_property_v201408.pdf.

How many 5 digit numbers can be formed from 12345 if repetition is allowed?

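No, the answer is not 120. With repetition allowed, each of the 5 positions can hold any of the 5 digits, so 5 × 5 × 5 × 5 × 5 = 3125 numbers can be formed. The figure 120 = 5 × 4 × 3 × 2 × 1 applies only when repetition is not allowed.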

How many three digit numbers can be formed from the digits 1234 and 5 if repetition is not allowed?

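The first digit can be chosen in 5 ways, the second in 4 ways and the third in 3 ways, so 5 × 4 × 3 = 60 numbers can be formed.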
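
These counts can be checked by brute-force enumeration. The following is a minimal Python sketch; the function name `count_numbers` is an assumption made for illustration. It uses `itertools.product` for arrangements with repetition and `itertools.permutations` for arrangements without repetition.

```python
from itertools import permutations, product

DIGITS = "12345"


def count_numbers(length, repetition):
    """Count the numbers of the given length that can be formed from DIGITS."""
    if repetition:
        # Every position may use any digit.
        return sum(1 for _ in product(DIGITS, repeat=length))
    # Each digit may be used at most once.
    return sum(1 for _ in permutations(DIGITS, length))


print(count_numbers(5, repetition=True))   # 3125
print(count_numbers(5, repetition=False))  # 120
print(count_numbers(3, repetition=True))   # 125
print(count_numbers(3, repetition=False))  # 60
```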

How many 3

Therefore 120 such numbers are possible.

How many 3

ANSWER: 120 three-digit numbers can be formed WITHOUT REPETITION OF DIGITS.
