3 Comments

I'm guessing your team has already thought of this, but you could take existing papers and use AI to purposely introduce an error, then see whether your new AI detector can find it. For example, change numbers or operations in a math problem, or change conclusion language like "the chart shows X increasing when Y" to "the chart shows X decreasing when Y" (see the sketch after the list below). Here are more ideas from ChatGPT:

1. Numerical or Logical Errors

Data inconsistencies: Change numerical values in tables or charts to conflict with reported statistics in the text.

Calculation mistakes: Introduce errors in mathematical derivations or results, such as adding where multiplication is required.

Unit mismatches: Change units (e.g., "10 cm" to "10 m") without adjusting the numbers appropriately.

Rounding issues: Alter significant digits or rounding in reported results.

2. Graph and Table Discrepancies

Graph mislabeling: Swap X and Y axis labels or change graph legends to introduce inconsistencies.

Mismatch with narrative: Alter graphs or tables to conflict with the description in the text.

Formatting errors: Introduce issues like missing axis labels, misaligned data points, or inconsistent scale.

3. Language and Writing Errors

Ambiguous phrasing: Change precise scientific language to something vague or misleading.

Contradictions: Add statements that contradict earlier claims in the paper.

Grammar changes: Introduce errors in sentence structure, missing articles, or subject-verb disagreement.

Tone shifts: Alter conclusions to sound less confident, or modify claims to seem exaggerated.

4. Citations and References

Mismatched citations: Replace a correct citation with an unrelated or invalid one.

Missing citations: Remove citations for claims that require supporting evidence.

Reference typos: Alter author names, years, or journal titles in references.

5. Methodology Problems

Inconsistent methods: Change details of the methodology to conflict with results (e.g., claim to have used one algorithm but show results from another).

Parameter mismatches: Modify key experimental parameters so they no longer align with results.

Misrepresentation of procedures: Change experimental details to make them illogical or infeasible.

6. Ethical and Compliance Errors

Fabrication: Insert made-up data or results that do not follow from the described experiment.

Plagiarism: Introduce text copied from other sources without citation.

7. Domain-Specific Errors

Biological papers: Introduce errors in species names, anatomical terms, or physiological processes.

Physics papers: Modify constants, assumptions, or units in equations.

Social sciences: Alter the interpretation of qualitative data, such as changing survey results or demographic descriptions.

8. Structural and Organizational Errors

Section misplacement: Swap sections like methods and results or conclusions and abstract.

Incomplete sections: Remove critical parts of a section, such as missing details in the methodology.

Duplications: Repeat sections or tables unnecessarily.

9. Logical Fallacies

Non sequiturs: Add conclusions that do not logically follow from the results.

Correlation vs. causation errors: Change phrasing to imply causation where there is only correlation.

10. Formatting and Style Errors

Inconsistent formatting: Change figure numbering or table referencing inconsistently throughout the paper.

Style guide violations: Alter fonts, headings, or other style elements to deviate from the journal’s formatting requirements.
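
For concreteness, here is a minimal sketch of what a few of these injectors might look like, assuming the papers are available as plain text. The function names and perturbation choices are purely illustrative, not taken from any existing tool:

```python
import random
import re

# Minimal sketch of synthetic error injection for testing a paper-review AI.
# Assumes each paper is available as plain text; all names are hypothetical.

def inject_numeric_error(text: str) -> str:
    """Perturb one number so it conflicts with the rest of the paper (category 1)."""
    numbers = list(re.finditer(r"\d+\.\d+|\d+", text))
    if not numbers:
        return text
    match = random.choice(numbers)
    original = float(match.group())
    perturbed = original * random.choice([0.1, 10]) + random.choice([1, 7])
    return text[:match.start()] + f"{perturbed:g}" + text[match.end():]

def inject_trend_flip(text: str) -> str:
    """Flip directional language so a conclusion contradicts the data (categories 3 and 9)."""
    swaps = {"increasing": "decreasing", "higher": "lower", "positive": "negative"}
    for old, new in swaps.items():
        if old in text:
            return text.replace(old, new, 1)
    return text

def inject_unit_mismatch(text: str) -> str:
    """Swap a unit without rescaling the number (category 1, unit mismatches)."""
    swaps = {" cm": " m", " ms": " s", " kg": " g"}
    for old, new in swaps.items():
        if old in text:
            return text.replace(old, new, 1)
    return text
```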

Additional Ideas

To enhance testing, you could also create "graded mistakes," where some errors are more obvious (e.g., an entire missing table) and others are subtle (e.g., minor rounding issues). Combining multiple error types in a single paper could test the robustness of your "problem finder" AI in identifying multiple issues simultaneously.
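
A rough sketch of how graded, combined injections could be recorded as ground truth for scoring the detector, building on the hypothetical injectors sketched above (severity labels here are just placeholders):

```python
import random

# Combine several injectors into one corrupted paper and record what was
# injected, so the detector's recall can later be scored by severity.
INJECTORS = [
    ("numeric error", "subtle", inject_numeric_error),
    ("trend flip", "moderate", inject_trend_flip),
    ("unit mismatch", "subtle", inject_unit_mismatch),
]

def corrupt_paper(text: str, n_errors: int = 2):
    """Return (corrupted_text, ground_truth) for one synthetic test case."""
    ground_truth = []
    chosen = random.sample(INJECTORS, k=min(n_errors, len(INJECTORS)))
    for name, severity, injector in chosen:
        new_text = injector(text)
        if new_text != text:  # only record injections that actually changed something
            ground_truth.append({"error": name, "severity": severity})
            text = new_text
    return text, ground_truth
```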


The problem you're attempting to solve is very tricky, as you've mentioned. We spent 18 months building out a peer review platform to do just this. We've solved nearly all of the problems you've encountered, and we've made it free for everyone to use at paper-wizard.com


We've run some tests with paper-wizard.com, and the agents do a very thorough review indeed. The structured and detailed feedback is amazing, but unfortunately that is also what makes it very different (not necessarily inferior) from human feedback.

Humans have creativity and often get carried away with one or two particular issues at a time. When a human expert does so in peer review, they will expand on and explain a particular suggestion in more depth. At times this is what authors and editors find most helpful: the "convincing others" part of the review.

An AI that is programmed to assess everything will give structured but very predictable responses. On the other hand, an AI that is prompted to be creative often introduces unnecessary things that are not relevant and border on hallucination.

Eighteen months is a ton of experience for a niche like this, which is very nascent and evolving at breakneck pace. Looking forward to connecting after the holidays. Visit us at ResearchHub.com
