Articles

Crafting Explanations for Clinical Data Validation Issues – Tips and Resources

Ever get that deer-in-the-headlights look when it’s time to explain why you still have clinical data validation issues?

Iris has reviewed a LOT of clinical data packages that have been sent to FDA over the years. We’ve seen all sorts of explanations written into data reviewer’s guides for failure to meet data validation rules. Sometimes the data packages cannot be modified before submission except for tweaks to the reviewer’s guide, which always means making the explanations of validation issues more robust.

This article is about steps to take to make explanations better, not to disrespect well-intentioned programmers who did their best to write an explanation. But there are a few explanations that I’m going to point out that really should not be used.

  • ‘Pinnacle 21 bug’ – In recent years I have grown dubious of this explanation, and I investigate the heck out of the claim. P21 validation is quite mature here in the year 2023, and this type of explanation is unexpected. Most likely the programmer who wrote it does not understand the nature of the issue, and my job is to elucidate the root cause so it can be corrected. I really like Kristin Kelly’s discussion of ‘false positives’ in her PHUSE presentation on explanations.
  • ‘As client instructed’ or ‘As sponsor instructed’ – This explanation would not make any sense to the target audience, an FDA reviewer. Regardless of who wrote the reviewer’s guide, the author is always the sponsor! With a little investigation, a more reasonable explanation can be constructed.
  • ‘True duplicate’ – Why are there duplicates in SDTM? I bet the programmer just needs to figure out the keys and document them in the define.xml appropriately.
  • ‘No codelist’ – Ummmm, why didn’t you make a custom codelist?
  • ‘As collected’ – This does not provide enough information to the FDA reviewer. State in the explanation what data were not collected, or describe the way the data were collected. An explanation should include a rationale for why the data could not be collected as expected.
  • ‘Extensible codelist’ – Provide the values added to the codelist. This little bit of additional information is meaningful to an FDA reviewer.

Pinnacle 21 has some great tools to help, including Issue Management. However, not everyone has access, particularly my small biotech and pharma clients who have a biometrics team of one or two people and use the free Pinnacle 21 Community version. Regardless, there are some things to consider as you write or edit explanations.

  1. Research! Use resources on the internet to better understand the validation issue. Understanding the issue can help determine the root cause, which can better help you craft an explanation. You can start with the additional resources at the bottom of this article. But usually a good search engine can help you find an answer.
  2. Collaborate! Constructing an explanation may require help from specific subject matter experts. For data or data collection issues, reach out to the data manager. If it is a controlled terminology or dictionary issue, reach out to the person in charge of data standards. The statistician or the programmer may need to look at the specification if the issue is in the define generation. Clinical may need to provide input into the trial summary domain if the developer of that dataset is not certain about all the options or the best terminology to use. It never hurts to ask!
  3. Be specific! Once you’ve done your research to understand the issue and the data, you can include details that provide a rationale or justification for why the issue exists. At a minimum, the explanation can explain the root cause of the issue.
  4. Write well! Explanations should be written in the past tense and well formed, with proper grammar and punctuation.
  5. Don’t blame! If your explanation is blaming a person or a department then it needs some wordsmithing. For example, “Data Management said this is how it was entered into the EDC.”
  6. Don’t give excuses! Just like when you are saying sorry, don’t turn your explanation into an excuse. If you read it and it sounds whiny, it needs to be rewritten. For example, “The database is locked and we can’t change it.”
  7. Use the right configuration! Whether you are using Pinnacle 21 or some other validation software, make sure you are using the right configuration. This includes the latest available version of the software with the latest rules. Also make sure you are referencing the correct standards and controlled terminology in the configuration. And don’t forget to be consistent. See my article on this topic here.
  8. Use the latest report! Nothing is more confusing than explanations in a reviewer’s guide that don’t match a re-run of the validation report. A quick programmatic cross-check of the two is sketched below.
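To make that cross-check concrete, here is a minimal Python sketch. It assumes the validation report and the reviewer’s guide issue table have each been exported to CSV with a “Rule ID” column; the file names and column headers are hypothetical, so adapt them to however your team exports these documents.

```python
# Minimal sketch: cross-check reviewer's guide explanations against the
# latest validation report. File names and column headers are hypothetical.
import csv

def load_rule_ids(path, column):
    """Collect the set of rule IDs found in one exported CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column].strip() for row in csv.DictReader(f) if row.get(column)}

report_ids = load_rule_ids("validation_report.csv", "Rule ID")  # re-run report
guide_ids = load_rule_ids("csdrg_issues.csv", "Rule ID")        # reviewer's guide table

# Issues in the latest report that have no explanation in the guide:
for rule in sorted(report_ids - guide_ids):
    print(f"Missing explanation for {rule}")

# Explanations in the guide for issues that no longer fire:
for rule in sorted(guide_ids - report_ids):
    print(f"Stale explanation for {rule}")
```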

Let me know if you have other tips or resources that I can add to this article.

Author: Lisa Brooks, Iris Statistical Computing

Additional resources:

SI and Conventional Units?

For clinical studies, FDA says “please submit two domains for lab results.” They are asking for both the previously expected SI units and now the conventional units.

Lab results in International System of Units (SI) have been the requirement for FDA data submissions. However, FDA has sometimes requested data in conventional units for labeling purposes. The latest FDA Study Data Technical Conformance Guide (TCG) has changed a “may require” to a “please submit” for conventional units. This topic has made its rounds through CDISC sub-teams and PHUSE working groups. It seems FDA has sorted it out by providing the additional instruction to submit conventional units in a custom domain. Has anybody out there added the lc.xpt dataset to their eSubmission data packages yet?
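To make the mechanics concrete, here is a minimal Python sketch of deriving conventional-unit records from SI records for a custom lab domain. The record layout is simplified and the conversion table is illustrative (a single glucose factor); real conversions must come from a validated source.

```python
# Minimal sketch: derive conventional-unit lab records from SI records for a
# custom domain. The conversion table and record layout are illustrative only.
SI_TO_CONVENTIONAL = {
    # test code: (conventional unit, multiplier applied to the SI value)
    "GLUC": ("mg/dL", 18.016),  # glucose: mmol/L -> mg/dL (illustrative factor)
}

def to_conventional(record):
    """Return a copy of an SI lab record converted to conventional units."""
    unit, factor = SI_TO_CONVENTIONAL[record["LBTESTCD"]]
    converted = dict(record)
    converted["LBORRESU"] = unit
    converted["LBORRES"] = round(record["LBORRES"] * factor, 1)
    return converted

si_record = {"LBTESTCD": "GLUC", "LBORRES": 5.5, "LBORRESU": "mmol/L"}
print(to_conventional(si_record))
# {'LBTESTCD': 'GLUC', 'LBORRES': 99.1, 'LBORRESU': 'mg/dL'}
```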

Let me know what you’ve learned.

Lisa

How to Represent First-In-Human Multiple Expansion Cohort Trials in eSubmission Data Packages

This article provides my recommendation on how best to represent data from multiple cohort trials. And you can try my fun little quiz too.

In the past and again recently, clients have asked if there is a way we can submit separate sets of data for a single protocol with multiple parts. My answer has always been and will remain, “No.” This question is raised when a sponsor’s programming resource has created tables, listings, and figures (TLFs) from Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) datasets developed for a single part or cohort of the protocol. The data strategy was designed with good intentions: to quickly get to the TLFs needed to ensure study participant safety. To make sure I wasn’t overlooking a creative solution, I reached out to my network to see if they had any ideas for submitting the individual data packages. I did not get a single solution other than to pool the data and ensure traceability to the analysis results. In this article I will talk about these types of studies and how to be clear and transparent in both data and metadata so that the study design and analyses can be well understood by a health authority reviewer.

These studies are typically First-In-Human (FIH) multiple expansion cohort trials. What does that mean? The NCI Thesaurus defines an expansion cohort trial as “a predominantly First-in-Human (FIH) trial with a single protocol with an initial dose-escalation phase followed by three or more additional subject cohorts with cohort-specific objectives.” The CDISC-GLOSS definition for code C191276 adds: “NOTE: The objectives of these expansion cohorts can include assessment of antitumor activity in a disease-specific setting, assessment of a dose with acceptable safety in specific populations (e.g., pediatric or elderly subjects, subjects with organ impairment, subjects with specific tumor types), evaluation of alternative doses or schedules, establishment of dose and schedule for the investigational drug administered with another oncology drug, or evaluation of the predictive value of a potential biomarker. In general, comparison of activity between cohorts is not planned except when a prespecified randomization and analysis plan are part of the protocol design.”

Last year, FDA released guidance regarding the design and conduct of these types of trials: Expansion Cohorts: Use in First-in-Human Clinical Trials to Expedite Development of Oncology Drugs and Biologics Guidance for Industry, March 2022. Expansion cohorts allow for expedited advancement from determining a tolerated dose to evaluations that typically happen in Phase 2 trials. Monitoring toxicity is very important in these types of studies. The design of a multi-part study impacts the data collection and CDISC data package creation. Typically, the cohorts enroll rapidly, making streamlined data collection, evaluation, dissemination, and ingestion necessary to ensure the safety of the participants.

The modeling of the design of an expansion cohort trial can get complicated. The CDISC SDTM Implementation Guide (SDTM-IG) Human Clinical Trials (Version 3.3 Final) provides step-by-step instructions in Section 7.5, “How to Model the Design of a Clinical Trial”. These instructions are not specific to any type of trial but are good advice for multi-part trials. Beyond what is stated in the SDTM-IG, I present below some additional considerations for designing CDISC domains so that it is transparent which part of the study the data belong to.

In my example below, I reference a faux study that has two Single Ascending Dose (SAD) cohorts, two Multiple Ascending Dose (MAD) cohorts, and one Food Effects (FE) cohort with a cross-over design.

SDTM and ADaM Domains

  1. Subject IDs can be assigned according to their part of the trial. For example, Subject IDs for the SAD part of the study can start with 1, for MAD with 2, and for FE with 3. With the format <site>-<subject>, it could look like this for Site 5002: SAD: 5002-101; MAD: 5002-201; FE: 5002-301
  2. The value of ARM and ACTARM can be set to incorporate the part as well. Appending ‘SAD’, ‘MAD’, or ‘FE’ to the end would indicate not only the treatment arm but also the part.
  3. In the Subject Element (SE) domain, the single-dose or multiple-dose aspect can be incorporated into each treatment by concatenating ‘SD’ or ‘MD’ as appropriate.
  4. In the Trial Inclusion (TI) domain, if any inclusion or exclusion criteria apply only to a particular part, indicate that in the text of the IETEST variable. In this example, the exclusion criterion applies only to the FE part: “For the food effects part of the study only, subjects who are taking, or have taken, enzyme inducers within 30 days before the first dose of study medication.”
  5. In the Trial Summary (TS) domain, TSGRPID can be used to indicate different parameter values based on part. TSGRPID can have the values MAD, SAD, and FE. Here is a table with the possible parameters to consider, with faux values as examples:
  6. ADaM datasets can utilize the variables PARTN and PART (a minimal derivation sketch follows this list).
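To illustrate item 6, here is a minimal Python sketch that derives PART and PARTN from the hypothetical subject-numbering scheme in item 1 (leading subject digit 1 = SAD, 2 = MAD, 3 = FE). The scheme and variable names come from this faux example, not from any standard.

```python
# Minimal sketch: derive the study part from the subject-number prefix used in
# item 1 above. The numbering scheme is the hypothetical one from this example.
PART_BY_PREFIX = {"1": ("SAD", 1), "2": ("MAD", 2), "3": ("FE", 3)}

def derive_part(usubjid):
    """Map a <site>-<subject> ID such as '5002-201' to ADaM PART/PARTN."""
    subject = usubjid.split("-")[-1]          # '201'
    part, partn = PART_BY_PREFIX[subject[0]]  # leading digit encodes the part
    return {"PART": part, "PARTN": partn}

for usubjid in ["5002-101", "5002-201", "5002-301"]:
    print(usubjid, derive_part(usubjid))
# 5002-101 {'PART': 'SAD', 'PARTN': 1}
# 5002-201 {'PART': 'MAD', 'PARTN': 2}
# 5002-301 {'PART': 'FE', 'PARTN': 3}
```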

Metadata acrf.pdf and csdrg.pdf

The CDISC Study Data Tabulation Model Metadata Submission Guidelines: Human Clinical Trials (2.0 Final) section 3.1.1 recommends a strategy for how best to annotate the case report form (acrf.pdf).

“It is recommended that sponsors include and annotate unique forms only. Bookmarking will represent the form as many times as needed to reflect how data were intended for collection. For example, a VS form would be bookmarked in accordance to clinical visits 1, 3, and 5. In this instance, all 3 visits would be bookmarked and linked to the corresponding unique VS form.”

Using unique CRFs in the CRF annotated for SDTM variables (acrf.pdf), bookmark the pages in a way that presents how the forms were collected in the trial. That is, create a bookmark section for each part/cohort. Document this method of bookmarking in the Clinical Study Data Reviewer Guide (csdrg.pdf), section “3.3 Annotated CRFs”.

Here is an example scheme for bookmarks for a study with Single Ascending Dose, Multiple Ascending Dose, and Food Effects:

- STUDYID
  - CRF SAD Cohort 1
    - By Domain
    - By Visit
  - CRF SAD Cohort 2
    - By Domain
    - By Visit
  - CRF MAD Cohort 3
    - By Domain
    - By Visit
  - CRF MAD Cohort 4
    - By Domain
    - By Visit
  - CRF FE Cohort 5
    - By Domain
    - By Visit
The csdrg.pdf is used by the health authority reviewer to orient themselves to the clinical study data. This document should clearly represent the design of the study and how the data were collected and mapped to SDTM. I recommend including in section 2.2 Protocol Design diagrams or schematics from the latest version of the protocol that describe the parts, the cohorts, and how the study design and randomization scheme were to be executed.

Here is example text to use in the csdrg.pdf to explain how the bookmarking was completed:

3.3 Annotated CRFs

“The annotated CRF contains all unique pages of the eCRF available for this study. The acrf.pdf is bookmarked by forms and by visit for each study part and cohort and reflects how the data were collected.”

Here is an example traceability diagram that can be used in the csdrg:

3.2 Traceability Flow Diagram

In general, I recommend building the CDISC datasets in a stepwise manner. After each part is completed, append (SET) that part onto the part before, then run a new subset of tables. Once the study is done, re-run all of the TLFs on the final data and use a tool like UltraEdit or Beyond Compare to make sure the TLFs did not change with the final data; a scripted version of that double check is sketched below. This will ensure traceability and avoid re-work. I can see why the piecewise method would be attractive for the purpose of compartmentalizing teams and deliverables, but in the end, if the compound/molecule moves forward, the sponsor would have a lot of clean-up to do!
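Here is a minimal Python sketch of that final-data double check, as a scripted alternative to a GUI diff tool. It assumes the TLF outputs are plain-text files sitting in two directories; the directory names and file extension are hypothetical.

```python
# Minimal sketch: confirm that re-running the TLFs on final data did not
# change any output. Directory names and extension are hypothetical.
import difflib
from pathlib import Path

before = Path("tlf_draft")  # outputs from the stepwise/interim runs
after = Path("tlf_final")   # outputs re-run on final data

for new_file in sorted(after.glob("*.txt")):
    old_file = before / new_file.name
    if not old_file.exists():
        print(f"NEW OUTPUT: {new_file.name}")
        continue
    diff = list(difflib.unified_diff(
        old_file.read_text().splitlines(),
        new_file.read_text().splitlines(),
        fromfile=str(old_file), tofile=str(new_file), lineterm=""))
    print(f"{new_file.name}: {'CHANGED' if diff else 'identical'}")
```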

Staying up to date on FDA guidance helps biometrics professionals understand the importance of modeling their data to fit the needs of the study design. Although the “Expansion Cohorts” guidance is not written with the programmer in mind, it is still a valuable reference for learning why these study designs are useful. Mapping CDISC data in a way that makes the study parts and cohorts transparent enhances the data’s “fit for purpose” for the health authority reviewer.

Take the quiz to see how much you have learned.

Ensuring Consistent Reference to Standards in FDA eSubmission Data Filings

This article emphasizes the importance of being consistent in the references to the versions of data and exchange standards used in an FDA data filing.

Introduction:

Adherence to regulatory data and exchange standards is becoming increasingly critical. When it comes to submitting data to the U.S. Food and Drug Administration (FDA), being consistent with the references to the standards used will reduce data validation errors and the need to respond to time-consuming FDA Information Requests. In this article, we’ll delve into the importance of consistently referring to the correct standards in FDA eSubmission data filings, based on the insights gained from a short quiz.

Understanding the Significance of Data Standardization

Data standardization plays a pivotal role in the world of FDA eSubmissions. It involves the process of formatting and organizing data in a consistent and uniform manner. This not only enhances data quality but also streamlines the submission process, reducing the chances of errors and delays. The FDA mandates specific standards to ensure that data is submitted in a structured format, enabling efficient review and analysis.

The Quiz: A Glimpse into Mastery of FDA eSubmission Standards

The provided quiz serves as a valuable tool for assessing one’s understanding of FDA eSubmission data standards. By answering the quiz questions, individuals can gauge their current proficiency in the mechanics of referencing the data standards used in a filing. Let’s explore some key takeaways from the quiz that highlight the importance of consistent reference to standards:

  1. Identifying Where Data Standards Are Referenced in Module 5 of the eCTD: Standards are referenced in the data definition (define.xml), data domains themselves (TDM, SDTM, and ADaM), nonclinical, clinical study and analysis reviewer guides (nsdrg.pdf, csdrg.pdf and adrg.pdf), and sometimes they even pop up on annotated case report forms (acrf.pdf).
  2. Staying Ahead of Regulatory Changes: Regulatory standards can evolve over time. The FDA’s Study Data Standards Resources website is where you can find the most recent requirements.
  3. Avoiding Costly Errors: The quiz emphasizes how even a minor inconsistency in the reference to standards can lead to costly errors in the submission process. Inconsistencies can also surface as issues in data validation reports such as those from Pinnacle 21, and these errors can result in approval delays due to FDA Information Requests. A simple scripted scan for inconsistent version references is sketched after this list.
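As a starting point for such a check, here is a minimal Python sketch of a naive consistency scan for standard-version strings across documents in a data package. The regex and file names are illustrative, and it assumes text has already been extracted from the PDFs; a real check would also parse define.xml properly rather than treating it as plain text.

```python
# Minimal sketch: naive scan for standard-version strings (e.g., "SDTMIG 3.3")
# across extracted document text. File names and the regex are illustrative.
import re
from pathlib import Path

PATTERN = re.compile(r"(SDTMIG|SDTM-IG|ADaMIG|ADaM-IG)\s*[vV]?\s*(\d+(?:\.\d+)+)")

for path in [Path("csdrg.txt"), Path("adrg.txt"), Path("define_sdtm.xml")]:
    versions = sorted(set(PATTERN.findall(path.read_text(errors="ignore"))))
    print(path.name, "->", versions or "no standard reference found")
```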

Conclusion: The Path to Successful FDA eSubmissions

In the highly regulated world of pharmaceuticals and medical devices, adherence to FDA eSubmission data standards is non-negotiable. The quiz underscores the significance of consistently referring to the correct standards throughout the submission process. By doing so, companies can navigate the regulatory landscape more effectively, streamline their interactions with the FDA, and ultimately achieve successful approvals for their products.

As the pharmaceutical industry continues to evolve, mastering the art of data standardization and eSubmissions remains a foundational element for success. By leveraging the insights gained from the quiz, companies can pave the way for smoother interactions with regulatory bodies, accelerate the approval process, and bring their life-changing products to market with confidence.

Automation in Electronic Data Submission Readiness for Drug, Biologic and Device Development. Don’t Forget the Deliverable Goes to a Human Reviewer

This article emphasizes the crucial role of human review of electronic data submissions in an era of increasing automation.

Introduction:
In the ever-evolving landscape of clinical research and drug development, the intersection of standards and automation has paved the way for remarkable advancements. Since its inception in 2007, Iris has been at the forefront of the rise of data standards and witnessed the subsequent surge in automation initiatives.

The Power of Standards and Automation:
Data standards have heralded a new era in the field of clinical research, opening doors to unprecedented automation opportunities. The streamlined data lifecycle, from collection to eSubmission preparation, has revolutionized how the industry approaches data management and programming. As health authorities, including the FDA, integrate these standards into their toolkits, the burden on new drug and biologic evaluations is considerably reduced. Notably, the FDA’s exploration of artificial intelligence (AI) further underscores the potential for enhanced streamlining.

AI in Digital Health Technologies:
The intersection of AI and clinical research is a burgeoning field that holds immense promise. The #PHUSE working group AI in Digital Health Technologies is dedicated to exploring the potential of AI. A quick search of the PHUSE archives for “AI” reveals several papers on this topic, including the use of AI in handling unstructured data, case report form (CRF) design, and validation.

The Human Element in Drug Evaluation:
While the prospect of AI-driven drug evaluations may not be too distant, the current responsibility of assessing the safety and efficacy of new medicines rests in the hands of human reviewers. Despite the industry’s best efforts, mistakes happen. Crafting a flawless marketing application remains an aspiration rather than a reality.

Elevating the Burden of Deliverables:
The surge in automation does not diminish the responsibility of sponsors to deliver data packages that are not only machine-readable but also comprehensible to human reviewers. In fact, automation amplifies this responsibility. Clear data definitions, meticulously formatted documentation, and information consistency are imperative. Seemingly minor errors, such as incorrect page numbers, document titles, or references, can consume hours of a reviewer’s time. Transparency in reviewer’s guides regarding hard-coded data values, dosing mistakes, and validation issues can save a lot of work and time spent responding to Information Requests. The value of a well-constructed data traceability diagram cannot be overstated. A double check of these elements, and more, by a human is essential.

Iris: Nurturing Human-Ready Submissions:
As a pioneer in the field, Iris has emerged as a beacon of expertise in Electronic Data Submission Package Evaluation. With a wealth of experience evaluating numerous data packages for studies and marketing applications, Iris plays a pivotal role in ensuring that they are ready for the human review process. This final step, prior to pressing the proverbial button to submit data, instills sponsors with the confidence that their submissions are primed for the rigors of health authority review.

Conclusion:
The synergy between standards and automation has undeniably revolutionized drug, biologic, and device development. As the industry continues to evolve and AI’s role becomes more pronounced, it’s crucial to remember that the ultimate arbiter of a new medicine’s safety and efficacy remains the human reviewer. The imperative for sponsors to deliver human-readable data packages that reflect clarity, consistency, and transparency has never been more paramount. In this dynamic landscape, Iris stands as a testament to the harmonious blend of technology and human expertise, ensuring that data submissions are poised for a successful human review.


A shout-out to ChatGPT for my second draft of this article. It was the first time I used generative AI, and I was not disappointed. Author: Lisa K Brooks and #ChatGPT