How to Use AI for a PRISMA-Compliant Systematic Review
A practical guide to using AI in systematic reviews without breaking PRISMA compliance. It covers where AI legitimately helps (screening, extraction), where it shouldn't be used, the reporting requirements, and a step-by-step workflow.
A systematic review used to take a team of three researchers six to nine months. The bottleneck wasn't reading; it was screening. Twelve thousand abstracts pulled from PubMed, Embase, Scopus, and Cochrane, each needing two independent reviewers to decide include or exclude against pre-registered criteria. Research careers were planned around that time budget.
AI changed that math. Modern language models can screen abstracts in seconds, extract study characteristics from full-text PDFs in minutes, and summarize across hundreds of papers in hours. Used carefully, AI cuts the screening phase of a review from months to weeks. Used carelessly, it produces a non-reproducible, non-compliant document that fails peer review.
This guide walks through where AI legitimately helps in a PRISMA-compliant review, where it shouldn't be doing the work, the reporting requirements that come with AI use, and a step-by-step workflow that satisfies PRISMA 2020 and the PRISMA-trAIce extension.
What PRISMA actually requires (quick refresher)
PRISMA 2020 is the standard reporting checklist for systematic reviews. It governs how you describe what you did, not how you do it. The relevant pieces for AI use are:
Search strategy reporting. Document every database searched, every search string used, every date the searches were run. Reproducibility is the standard — another researcher should be able to rerun your search and get the same results.
Screening reporting. Document how many records were screened, by how many independent reviewers, how disagreements were resolved, and how many were excluded at each stage. The classic PRISMA flow diagram lives here.
Data extraction reporting. Document what data was extracted, by whom, and how disagreements were resolved.
Risk of bias assessment. Document the tool used (Cochrane RoB 2, ROBINS-I, etc.) and who performed it.
Reporting any deviations. Anything that didn't go according to the pre-registered protocol must be reported, with reasoning.
The PRISMA-trAIce extension (published 2024, updated 2025) adds AI-specific reporting requirements on top of PRISMA 2020. The short version: anywhere AI was used in the review, you report the tool, the version, the prompts, and how human verification was performed.
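One way to keep the tool, version, prompt, and verification details together is a small structured log with one entry per AI-supported step. The sketch below is a minimal illustration, not a format prescribed by PRISMA-trAIce: the class name, the field names, and the "ExampleModel" tool are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AIUseRecord:
    """One entry per AI-supported step. Field names are illustrative."""
    step: str          # e.g. "title/abstract screening"
    tool: str          # tool or model name
    version: str       # version string as reported by the vendor
    access: str        # "API" or "web"
    date_start: str    # ISO date of first use
    date_end: str      # ISO date of last use
    prompt: str        # exact prompt text
    verification: str  # how humans checked the output


def to_appendix_json(records):
    """Serialize the log so it can be attached as an appendix."""
    return json.dumps([asdict(r) for r in records], indent=2)


record = AIUseRecord(
    step="title/abstract screening",
    tool="ExampleModel",  # placeholder, not a real product
    version="1.0",
    access="API",
    date_start="2025-01-10",
    date_end="2025-02-03",
    prompt="Classify the abstract against the inclusion criteria: ...",
    verification="all abstracts screened independently by two reviewers",
)
print(to_appendix_json([record]))
```

Filling in a record at the moment each tool is first used tends to be easier than reconstructing versions and dates at submission time.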
Where AI legitimately helps
These are uses where AI accelerates the work without changing what the review is.
Duplicate detection. Records pulled from multiple databases often overlap. Reference managers and review platforms (Zotero, EndNote, Covidence) already handle deduplication well, so AI is overkill here. Keep using the standard tools.
Initial title and abstract screening. AI can score each abstract against your inclusion criteria and rank or pre-classify them. Two human reviewers still need to make the final include/exclude decision, but AI pre-classification cuts the human time substantially. This is the highest-value AI use in most reviews.
Full-text retrieval and triage. AI can extract publication metadata, identify whether a full text matches the abstract's claims (occasionally they don't), and flag papers that appear to be conference abstracts, errata, or duplicate publications under different titles.
Data extraction from structured papers. Tables of patient characteristics, dosages, effect sizes — AI can extract these from full-text PDFs into a structured data extraction sheet, which two human reviewers then verify. The verification time is much lower than full manual extraction.
Synthesis and writing support. Drafting the methods section's screening procedure description, drafting the PRISMA flow diagram text, summarizing the characteristics-of-included-studies table — AI helps with writing without changing the substance of the review.
Translation of non-English sources. If your review includes non-English papers, AI translation has become reliable enough to support inclusion of these sources. Document the tool used in methods.
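The pre-classification step described above can be sketched in two parts: a fixed prompt template, so every abstract receives identical instructions, and a ranking step that reorders records without dropping any. This is a minimal sketch under assumptions: the function names and prompt wording are hypothetical, and the call that actually obtains a score from a model is deliberately left out because it depends on the tool you use.

```python
def build_screening_prompt(criteria, title, abstract):
    """Fill a fixed template so every abstract gets the same instructions."""
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return (
        "Inclusion criteria:\n"
        f"{bullet_list}\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        "Reply with a number from 0 to 1: the likelihood that this record "
        "meets all criteria."
    )


def rank_for_review(records, scores):
    """Order records by descending AI score. Nothing is excluded here."""
    paired = sorted(zip(records, scores), key=lambda p: p[1], reverse=True)
    return [rec for rec, _ in paired]


# Scores would come from the model; these values are made up for illustration.
ranked = rank_for_review(["rec-A", "rec-B", "rec-C"], [0.2, 0.9, 0.5])
print(ranked)  # ['rec-B', 'rec-C', 'rec-A']
```

Because the ranking only changes the order of the queue, every record still reaches both human reviewers.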
Where AI should NOT do the work
These uses cross the line into substantive decision-making that human reviewers must do.
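Final include/exclude decisions. Standard systematic review methodology expects two independent human reviewers for inclusion/exclusion, and PRISMA 2020 asks you to report how many reviewers screened each record and whether they worked independently. AI can pre-classify, rank, and surface candidates, but the binding decision must be human. This is non-negotiable for compliance.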
Risk of bias assessment. RoB tools require judgment about study design, blinding, attrition, and reporting. AI can summarize what the paper says about each domain, but the bias rating itself must be human.
Quality assessment and certainty of evidence (GRADE). The same logic applies as for risk of bias: AI summarizes; humans rate.
Interpretation of heterogeneity. Whether differences between study results reflect clinical heterogeneity, methodological heterogeneity, or chance is a judgment call that requires clinical and methodological expertise.
Final synthesis and conclusions. The narrative synthesis, the discussion of strengths and limitations, the clinical implications — these are the contributions of the review team. AI can draft initial language, but the substantive judgments are yours.
Detection of fabricated or paper-mill content. Ironically, AI detection of fabricated studies remains unreliable. Human eyes on suspicious papers, plus tools like the Problematic Paper Screener, are the current standard.
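The rule that AI may inform but never decide can be made structural in the screening data itself. In the sketch below, which assumes a hypothetical record layout with "include"/"exclude" labels, the AI label is stored for reporting while the binding decision is computed from the human votes only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningRecord:
    record_id: str
    ai_label: str                      # kept for reporting only
    reviewer_a: str                    # "include" or "exclude"
    reviewer_b: str                    # "include" or "exclude"
    adjudicator: Optional[str] = None  # third reviewer, used only on conflict


def final_decision(rec):
    """Return the binding decision from human votes; ai_label is never read."""
    if rec.reviewer_a == rec.reviewer_b:
        return rec.reviewer_a
    if rec.adjudicator is None:
        return "unresolved"
    return rec.adjudicator


agreed = ScreeningRecord("r1", ai_label="exclude", reviewer_a="include", reviewer_b="include")
print(final_decision(agreed))  # include
```

Here the AI said "exclude" and both reviewers said "include", so the record is included. Keeping the AI label in the record still lets you report how often it matched the human consensus.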
The reporting requirements
If you use AI anywhere in the review, PRISMA-trAIce requires you to report it. The structure that satisfies most journals:
In the methods section, screening procedure subsection:
Abstract screening was conducted using a two-stage process. Initial
classification was performed using [Tool Name, version, accessed via
API/web on dates] with the following prompt template: "[exact prompt]".
The classification was used to prioritize abstracts for human review.
All abstracts, regardless of initial classification, were then screened
independently by two reviewers ([author initials]) using [Covidence /
Rayyan / other tool], with disagreements resolved by discussion or by
a third reviewer ([author initials]) when consensus was not reached.
In a calibration exercise conducted on [number] abstracts before the
main screening, the AI classification agreed with the consensus human
decision in [percentage]% of cases. AI was not used for final
inclusion or exclusion decisions.
In the methods section, data extraction subsection:
Data extraction was performed using a structured form (Appendix [X]).
Extraction of [specific data types, e.g., patient characteristics,
intervention details, outcome measurements] was supported by [Tool
Name, version], which extracted candidate values from full-text PDFs.
All extracted values were verified against the source PDFs by two
reviewers ([author initials]). Discrepancies between AI-extracted
values and source documents were corrected against the source in
[percentage]% of cases. The verified data informed the final
synthesis.
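The correction percentage in this template has to come from a verification log. A minimal sketch of that calculation follows, assuming each study's values are held as a dictionary keyed by field name. The field names and numbers are invented for illustration.

```python
def verification_summary(ai_values, verified_values):
    """Compare AI-extracted values with reviewer-verified ones, field by field.

    Returns the list of corrected fields and the correction rate in percent.
    """
    if not verified_values:
        return [], 0.0
    corrected = [
        key for key in verified_values
        if ai_values.get(key) != verified_values[key]
    ]
    rate = 100.0 * len(corrected) / len(verified_values)
    return corrected, rate


# Illustrative values only: the AI transposed two digits in mean_age.
ai_extracted = {"n": 120, "mean_age": 54.2, "dose_mg": 50, "ci_low": 0.8}
verified = {"n": 120, "mean_age": 45.2, "dose_mg": 50, "ci_low": 0.8}

fields, rate = verification_summary(ai_extracted, verified)
print(fields, rate)  # ['mean_age'] 25.0
```

Running this per study and summing across the review gives both the reportable percentage and a list of the fields where the tool most often goes wrong.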
In a dedicated "Use of AI" subsection (sometimes required separately):
The following AI tools were used in this review: [list each tool,
version, date range, and specific role]. No AI tool was used for
risk of bias assessment, quality grading, interpretation of
heterogeneity, or synthesis of conclusions. All AI-supported steps
were verified by [number] human reviewers as described above. The
prompts used are provided in Appendix [Y].
In the limitations section:
Acknowledge AI-related limitations: potential systematic bias in pre-classification, reliance on AI tools whose internal workings are not transparent, and the impossibility of fully reproducing AI behavior across model versions.
The workflow we recommend
A sequence that satisfies PRISMA-trAIce and uses AI's strengths.
Step 1: Pre-register the protocol. Before any AI use, register the review (PROSPERO for medical reviews; OSF for others). The protocol specifies inclusion criteria, search strategy, screening method, extraction plan, and synthesis approach. Specify in the protocol where AI will be used and how. Pre-registration that mentions AI is much stronger than post-hoc disclosure.
Step 2: Run the calibration exercise. Pick 100-200 abstracts from your search. Have two human reviewers screen them independently and agree on a consensus decision for each. Run AI screening on the same set with your planned prompt. Compute agreement metrics (Cohen's kappa, percent agreement) between the AI classification and the consensus human decision. If kappa is below 0.7 or percent agreement is below 80%, refine the prompt or reconsider AI use.
Step 3: Run the main AI screening pass. With a calibrated prompt, screen the full abstract corpus. Output: a ranked or classified list. Human reviewers see this ranking but make their own independent decisions.
Step 4: Two-reviewer independent screening. Each abstract still gets two human reviewers. The AI classification is metadata, not a vote. Disagreements are resolved by discussion or by a third reviewer.
Step 5: Full-text screening with AI assistance. AI can flag obvious exclusions at the full-text stage (wrong language, abstract only, retracted papers). Humans make final decisions.
Step 6: Data extraction with AI assistance and verification. AI extracts candidate values; two human reviewers verify against the source. The verification log itself becomes evidence of compliance.
Step 7: Risk of bias — human only. No AI in this step.
Step 8: Synthesis — human-led, AI-assisted writing. Humans interpret. AI helps with summarizing studies for the included-studies table, drafting the methods section, and polishing prose. Substantive interpretation stays human.
Step 9: Disclose comprehensively. Methods section reports AI use as described above. A complete AI-use disclosure statement appears in the front matter or acknowledgments. The full prompts used go in an appendix.
Step 10: Pre-publication audit. Before submission, a second team member audits the AI-supported steps for documentation completeness. Missing prompts, missing version numbers, or missing verification percentages are the common rejection triggers.
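The agreement metrics in Step 2 can be computed with a few lines of standard-library Python. This is a minimal sketch using the usual Cohen's kappa definition, (p_o - p_e) / (1 - p_e), with the 0.7 and 80% thresholds from Step 2. The function names and the ten-item sample are assumptions for illustration; a real calibration set would be the 100-200 abstracts described above.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which the two label lists agree (0 to 1)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if p_e == 1:  # both lists use a single identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def needs_refinement(human, ai, min_kappa=0.7, min_agreement=0.80):
    """Apply the Step 2 rule: fail if either metric is below its threshold."""
    return cohens_kappa(human, ai) < min_kappa or percent_agreement(human, ai) < min_agreement


# Made-up calibration labels: consensus human decision vs. AI classification.
human = ["include"] * 3 + ["exclude"] * 7
ai = ["include", "include", "exclude"] + ["exclude"] * 6 + ["include"]

print(percent_agreement(human, ai))       # 0.8
print(round(cohens_kappa(human, ai), 3))  # 0.524
print(needs_refinement(human, ai))        # True
```

The sample shows why both metrics are worth reporting: with most abstracts being excludes, 80% raw agreement still corresponds to a kappa of only about 0.52, so the prompt would need refinement under the Step 2 rule.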
Common pitfalls
Hallucinated study characteristics. AI sometimes extracts data that isn't in the source paper — confidence intervals that don't exist, sample sizes that don't match, intervention details fabricated from context. Verification against the source is the only defense. If your team isn't verifying every extracted value, you're going to publish errors.
Prompt drift across the review. A prompt refined mid-review means items screened before and after the change were classified under different instructions. If you change the prompt, document why and re-screen the items classified under the old version.
Over-reliance on AI classification. Some teams have effectively delegated inclusion decisions to AI by treating its classification as authoritative. The binding decisions have to be made by human reviewers. AI input is fine; AI decisions are not.
Forgetting to document deviations. Anything that differs from the pre-registered protocol must be reported. If AI use evolved during the review, document the evolution. Hidden process changes are flagged at peer review.
Inconsistent tool versions. AI models update. The DeepSeek V3 that screened abstracts in January isn't identical to the version available in June. Document the version and date range of each AI tool used.
Translation accuracy assumed, not verified. AI translation is good but not perfect, especially for clinical or technical content. If non-English sources are included, document who verified the translations.
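Prompt drift in particular is easy to detect mechanically. One possible approach, sketched below with hypothetical function names and a hypothetical log layout, is to store a short hash of the exact prompt text next to each screened record, then list every record whose hash differs from the current prompt's.

```python
import hashlib


def prompt_fingerprint(prompt):
    """Short, stable identifier for an exact prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def records_to_rescreen(screening_log, current_prompt):
    """Return IDs of records screened under any prompt other than the current one."""
    current = prompt_fingerprint(current_prompt)
    return [rid for rid, fp in screening_log.items() if fp != current]


# Illustrative prompts and record IDs.
prompt_v1 = "Include randomized trials in adults."
prompt_v2 = "Include randomized or quasi-randomized trials in adults."

log = {
    "r1": prompt_fingerprint(prompt_v1),
    "r2": prompt_fingerprint(prompt_v1),
    "r3": prompt_fingerprint(prompt_v2),
}
print(records_to_rescreen(log, prompt_v2))  # ['r1', 'r2']
```

Any edit to the prompt, even whitespace, changes the fingerprint, which is the point: the log then shows exactly which records were classified under which wording.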
Frequently asked questions
Q: Can I include AI-screened abstracts in my PRISMA flow diagram?
Yes, but with specific attribution. The standard PRISMA 2020 flow diagram has fields for records identified, records screened, records assessed for eligibility, and records included. If AI was used in screening, add a note to the diagram or its caption: "Initial AI-supported classification was used to rank abstracts; all abstracts received independent human screening by two reviewers." Some journals now request a more detailed flow diagram that breaks out the AI-supported and human-only steps. The PRISMA-trAIce extension provides templates for this.
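Whatever note you add, the counts in the diagram still have to add up from box to box. The sketch below checks that arithmetic for the main PRISMA 2020 boxes. The dictionary keys are shorthand chosen for this example, the numbers are invented, and the last box is counted in reports, since one included study can have several reports.

```python
# Each rule: start box minus removed box should equal the next box.
FLOW_RULES = [
    ("identified", "removed_before_screening", "screened"),
    ("screened", "excluded_at_screening", "sought_for_retrieval"),
    ("sought_for_retrieval", "not_retrieved", "assessed_for_eligibility"),
    ("assessed_for_eligibility", "excluded_at_full_text", "included_reports"),
]


def check_flow(counts):
    """Return a list of inconsistencies between adjacent flow-diagram boxes."""
    problems = []
    for start, removed, expected in FLOW_RULES:
        remaining = counts[start] - counts[removed]
        if remaining != counts[expected]:
            problems.append(f"{expected}: expected {remaining}, got {counts[expected]}")
    return problems


# Invented counts for illustration only.
counts = {
    "identified": 12000,
    "removed_before_screening": 3100,
    "screened": 8900,
    "excluded_at_screening": 8500,
    "sought_for_retrieval": 400,
    "not_retrieved": 12,
    "assessed_for_eligibility": 388,
    "excluded_at_full_text": 341,
    "included_reports": 47,
}
print(check_flow(counts))  # []
```

An empty list means the boxes are mutually consistent. It says nothing about whether the underlying screening decisions were right.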
Q: How do I cite AI tools used in my systematic review?
Cite the model with its version and the access date. Standard format: "[Model Name], version [X.Y], accessed [date range] via [API endpoint / web interface] (developer: [Company]). URL: [link to documentation if available]." Some journals require a more detailed citation including the exact API parameters used. Check the journal's instructions for authors. AI tool citation conventions are still evolving — when in doubt, include more detail rather than less.
Q: What's the difference between PRISMA 2020 and PRISMA-trAIce?
PRISMA 2020 is the standard reporting checklist for systematic reviews, updated from the 2009 version. PRISMA-trAIce (published 2024) is an extension that adds reporting requirements for AI-supported steps in the review process. Most journals now require both: PRISMA 2020 for general reporting, PRISMA-trAIce for any AI-supported steps. The trAIce checklist has 12 items covering tool documentation, prompt reporting, calibration metrics, and human verification procedures. If you use AI anywhere in a systematic review, address PRISMA-trAIce in your methods section. For a broader workflow guide that complements this one, see Using AI to Speed Up Your Literature Review.
Q: Will using AI in my systematic review reduce my chances of acceptance?
In our experience, disclosed and properly documented AI use does not reduce acceptance rates and often speeds review (the methods are clearer and more defensible). What reduces acceptance is undisclosed AI use, AI use that substitutes for required human judgment, or AI-related limitations that aren't acknowledged. The signal editors and reviewers respond to is rigor and transparency, not abstention from AI. A systematic review that uses AI for screening, reports the use in detail, includes calibration metrics, and acknowledges the limitations is treated as a methodologically modern review — not a compromised one.

Ema is a senior academic editor at ProofreaderPro.ai with a PhD in Computational Linguistics. She specializes in text analysis technology and language models, and is passionate about making AI-powered tools that truly understand academic writing. When she's not refining proofreading algorithms, she's reviewing papers on NLP and discourse analysis.