This article provides my recommendation on how best to represent data from multiple cohort trials. You can also try my fun little quiz.

In the past and again recently, clients have asked if there is a way we can submit separate sets of data for a single protocol with multiple parts. My answer has always been and will remain, “No.” This question is raised when a sponsor’s programming resource has created tables, listings, and figures (TLFs) from Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) datasets developed for a single part or cohort of the protocol. The data strategy was designed with good intentions: to quickly get to the TLFs needed to ensure study participant safety. To make sure I was being creative enough in my answer, I reached out to my network to see if they had any ideas for submitting the individual data packages. I did not get a single solution other than to pool the data and ensure traceability to the analysis results. In this article I will talk about these types of studies and how to be clear and transparent in both data and metadata so that study design and analyses can be well understood by a health authority reviewer.
These studies are typically First-In-Human (FIH) multiple expansion cohort trials. What does that mean? The NCI Thesaurus defines an Expansion Cohort Trial as “A predominantly First-in-Human (FIH) trial with a single protocol with an initial dose-escalation phase followed by three or more additional subject cohorts with cohort-specific objectives.” The CDISC-GLOSS Definition for code C191276 adds: “NOTE: The objectives of these expansion cohorts can include assessment of antitumor activity in a disease-specific setting, assessment of a dose with acceptable safety in specific populations (e.g., pediatric or elderly subjects, subjects with organ impairment, subjects with specific tumor types), evaluation of alternative doses or schedules, establishment of dose and schedule for the investigational drug administered with another oncology drug, or evaluation of the predictive value of a potential biomarker. In general, comparison of activity between cohorts is not planned except when a prespecified randomization and analysis plan are part of the protocol design.”
Last year, FDA released guidance regarding the design and conduct of these types of trials: Expansion Cohorts: Use in First-in-Human Clinical Trials to Expedite Development of Oncology Drugs and Biologics, Guidance for Industry (March 2022). Expansion cohorts allow for expedited advancement from determining the tolerated dose to evaluations that typically happen in Phase 2 trials. It is very important to monitor toxicity in these types of studies. The design of a multi-part study impacts the data collection and CDISC data package creation. Typically, the cohorts enroll rapidly, so streamlined data collection, evaluation, dissemination, and ingestion are necessary to ensure the safety of the participants.
Modeling the design of an expansion cohort trial can get complicated. The CDISC SDTM Implementation Guide (SDTM-IG) Human Clinical Trials (Version 3.3 Final) provides step-by-step instructions in section 7.5 “How to Model the Design of a Clinical Trial”. These are not specific to any type of trial but are good advice for multi-part trials. Beyond what is stated in the SDTM-IG, I present below some additional considerations for designing CDISC domains so that they are transparent about which part of the study the data belong to.
In my example below, I reference a faux study that has two Single Ascending Dose (SAD) cohorts, two Multiple Ascending Dose (MAD) cohorts, and one Food Effects (FE) cohort with a cross-over design.
SDTM and ADaM Domains
- Subject IDs can be assigned according to their part of the trial. For example, subject numbers for the SAD part of the study can start with 1, for MAD with 2, and for FE with 3. If you had the format <site>-<subject>, it could look like this for Site 5002: SAD: 5002-101; MAD: 5002-201; FE: 5002-301
- The value of ARM and ACTARM can be set to incorporate the part as well. Appending ‘SAD’, ‘MAD’, or ‘FE’ to the end of the value indicates not only the treatment arm but also the part.
- In the Subject Elements (SE) domain, the single dose or multiple dose aspect can be incorporated into each treatment element by appending ‘SD’ or ‘MD’ as appropriate.
- In the Trial Inclusion/Exclusion Criteria (TI) domain, if any inclusion or exclusion criteria apply only to a particular part, indicate this in the text of the variable IETEST. In this example, the exclusion criterion applies only to the FE part: “For the food effects part of the study only, subjects who are taking, or have taken enzyme inducers within 30 days before the first dose of study medication.”
- In the Trial Summary (TS) domain, the use of TSGRPID can indicate different values for parameters based on part. TSGRPID can have the values of MAD, SAD, and FE. Here is a table with the possible parameters to consider with faux values as examples:
- ADaM datasets can utilize the variables PARTN and PART:
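The conventions in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix-to-part mapping follows the 1/2/3 numbering from the subject ID example, the PARTN values 1 to 3 and the treatment names are made up, and the helper names `derive_part` and `arm_with_part` are hypothetical rather than part of any CDISC standard or tool.

```python
# Assumed mapping: first digit of the subject number -> (PART, PARTN).
# The PARTN numbering is an illustrative choice, not a CDISC requirement.
PART_BY_PREFIX = {"1": ("SAD", 1), "2": ("MAD", 2), "3": ("FE", 3)}


def derive_part(subjid):
    """Return (PART, PARTN) for a subject ID in <site>-<subject> format."""
    _site, sep, subject = subjid.partition("-")
    if not sep or not subject:
        raise ValueError(f"expected <site>-<subject>, got {subjid!r}")
    if subject[0] not in PART_BY_PREFIX:
        raise ValueError(f"unknown part prefix in {subjid!r}")
    return PART_BY_PREFIX[subject[0]]


def arm_with_part(treatment, part):
    """Append the study part to the treatment text, as suggested for ARM/ACTARM."""
    return f"{treatment} {part}"


# Faux subjects; treatment names are invented for the example.
subjects = [
    {"SUBJID": "5002-101", "TRT": "Drug X 10 mg"},
    {"SUBJID": "5002-201", "TRT": "Drug X 10 mg"},
    {"SUBJID": "5002-301", "TRT": "Drug X Fed/Fasted"},
]

rows = []
for subject in subjects:
    part, partn = derive_part(subject["SUBJID"])
    rows.append(
        {
            **subject,
            "PART": part,
            "PARTN": partn,
            "ARM": arm_with_part(subject["TRT"], part),
        }
    )

for row in rows:
    print(row["SUBJID"], row["PARTN"], row["PART"], row["ARM"])
```

Deriving the part from the subject ID in one place means ARM, PART, and PARTN cannot drift apart between cohorts.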

Metadata acrf.pdf and csdrg.pdf
The CDISC Study Data Tabulation Model Metadata Submission Guidelines: Human Clinical Trials (2.0 Final) section 3.1.1 recommends a strategy for how best to annotate the case report form (acrf.pdf).
“It is recommended that sponsors include and annotate unique forms only. Bookmarking will represent the form as many times as needed to reflect how data were intended for collection. For example, a VS form would be bookmarked in accordance to clinical visits 1, 3, and 5. In this instance, all 3 visits would be bookmarked and linked to the corresponding unique VS form.”
Using only the unique forms in the CRF annotated with SDTM variables (acrf.pdf), bookmark the pages to show how the forms were collected in the trial. That is, create a bookmark section for each part/cohort. Document this method of bookmarking in section “3.3 Annotated CRFs” of the Clinical Study Data Reviewer's Guide (csdrg.pdf).
Here is an example scheme for bookmarks for a study with Single Ascending Dose, Multiple Ascending Dose, and Food Effects:
-STUDYID
  -CRF SAD Cohort 1
    -By Domain
    -By Visit
  -CRF SAD Cohort 2
    -By Domain
    -By Visit
  -CRF MAD Cohort 3
    -By Domain
    -By Visit
  -CRF MAD Cohort 4
    -By Domain
    -By Visit
  -CRF FE Cohort 5
    -By Domain
    -By Visit
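If the list of parts and cohorts is kept in one place, the bookmark scheme above can be generated from it, which helps keep the outline consistent as cohorts are added. The following is a minimal sketch: the function name `bookmark_outline` is hypothetical, and it only builds the outline as text. It does not write bookmarks into a PDF.

```python
def bookmark_outline(studyid, cohorts, indent="  "):
    """Return outline lines: the study, then one section per cohort,
    each with a 'By Domain' and a 'By Visit' view."""
    lines = [f"-{studyid}"]
    for part, cohort_number in cohorts:
        lines.append(f"{indent}-CRF {part} Cohort {cohort_number}")
        for view in ("By Domain", "By Visit"):
            lines.append(f"{indent * 2}-{view}")
    return lines


# Parts and cohorts of the faux study: two SAD, two MAD, one FE.
cohorts = [("SAD", 1), ("SAD", 2), ("MAD", 3), ("MAD", 4), ("FE", 5)]
outline = bookmark_outline("STUDYID", cohorts)
print("\n".join(outline))
```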
The csdrg.pdf is used by the health authority reviewer to orient themselves to the clinical study data. This document should clearly represent the design of the study and how the data were collected and mapped to SDTM. In section 2.2 Protocol Design, I recommend including diagrams or schematics from the last version of the protocol that describe the parts, the cohorts, and how the study design and randomization scheme were to be executed.
Here is example text to use in the csdrg.pdf to explain how the bookmarking was completed:
3.3 Annotated CRFs
“The annotated CRF contains all unique pages of the eCRF available for this study. The acrf.pdf is bookmarked by forms and by visit for each study part and cohort and reflects how the data were collected.”
Here is an example traceability diagram that can be used in the csdrg:
3.2 Traceability Flow Diagram

In general, I recommend building the CDISC datasets in a stepwise manner. After each part is completed, append that part to the parts before it, then run a new subset of tables. Once the study is done, re-run all of the TLFs on the final data and use a comparison tool like UltraEdit or Beyond Compare to make sure the TLFs did not change with the final data. This will ensure traceability and avoid re-work. I can see why the piecewise method would be attractive for the purpose of compartmentalizing teams and deliverables, but in the end, if the compound/molecule moves forward, the sponsor would have a lot of clean-up to do!
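The stepwise approach can be illustrated with a small Python sketch: stack each completed part onto the pooled data, re-create a table for an earlier part from the pooled data, and compare it with the interim version. Everything here is an assumption made for illustration. The records, the tiny count table, and the function names are invented, and Python's standard `difflib` module stands in for a comparison tool such as UltraEdit or Beyond Compare.

```python
import difflib
from collections import Counter


def pool_parts(*parts):
    """Stack part-level record lists into one pooled dataset."""
    pooled = []
    for records in parts:
        pooled.extend(records)
    return pooled


def render_count_table(records, part):
    """Render a tiny text table: number of subjects per ARM within one part."""
    counts = Counter(r["ARM"] for r in records if r["PART"] == part)
    return "\n".join(f"{arm}: {n}" for arm, n in sorted(counts.items()))


def tlf_differences(interim_text, final_text, name="table"):
    """Return unified-diff lines; an empty list means the output is unchanged."""
    return list(
        difflib.unified_diff(
            interim_text.splitlines(),
            final_text.splitlines(),
            fromfile=f"{name} (interim)",
            tofile=f"{name} (final)",
            lineterm="",
        )
    )


# Faux records for two parts of the study.
sad = [
    {"USUBJID": "5002-101", "PART": "SAD", "ARM": "Drug X 10 mg SAD"},
    {"USUBJID": "5002-102", "PART": "SAD", "ARM": "Placebo SAD"},
]
mad = [
    {"USUBJID": "5002-201", "PART": "MAD", "ARM": "Drug X 10 mg MAD"},
]

# Table produced when only the SAD part was complete.
interim_sad_table = render_count_table(sad, "SAD")

# After the MAD part completes, pool the data and re-run the same table.
pooled = pool_parts(sad, mad)
final_sad_table = render_count_table(pooled, "SAD")

diff = tlf_differences(interim_sad_table, final_sad_table, "SAD counts")
print("unchanged" if not diff else "\n".join(diff))
```

If the diff is not empty, the differences point to records that changed between the interim and final data and should be explained before submission.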
Staying up to date on FDA guidance helps biometrics professionals understand the importance of modeling their data to fit the needs of the study design. Although the “Expansion Cohorts” guidance is not written with the programmer in mind, it is still a valuable reference for learning why these study designs are useful. Mapping CDISC data in a way that makes the study parts and cohorts transparent helps make the data “fit for purpose” for the health authority reviewer.
Take the quiz to see how much you have learned.