Experimenting with AI-Scientist: An AI-Powered Paper Review Tool
As a researcher, I'm always intrigued by new tools that can enhance the academic process. Recently, I came across an interesting project called AI-Scientist, developed by Sakana AI. This tool promises to review academic papers using artificial intelligence. Curious about its capabilities, I decided to put it to the test with a couple of my own published papers.
Setting Up the Environment
I followed the setup process outlined in a blog post and the official GitHub repository. Here's a quick rundown of the steps:
1. Clone the repository:
git clone https://github.com/SakanaAI/AI-Scientist.git
2. Install the required dependencies:
pip install -q anthropic aider-chat backoff openai
pip install -q pypdf pymupdf4llm
pip install -q torch numpy transformers datasets tiktoken wandb tqdm
According to the official documentation, texlive-full is required to generate papers, but it is too heavy to install in a Colab session.
Since I only wanted to run the review step this time, skipping it wasn't a problem.
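For reference, if you do want the full paper-generation step, the repository's README installs TeX Live on Debian/Ubuntu systems with the command below (expect a very large download):
sudo apt-get install texlive-full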
3. Set up the OpenAI API key (I used Google Colab's userdata for this):
import os
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY')
os.environ['OPENAI_API_KEY'] = api_key
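One Colab-specific detail: the ai_scientist package lives inside the cloned repository, so the notebook has to run from that directory or have it on the Python path before the import in the next section works. A minimal sketch, assuming the repo was cloned into Colab's default /content directory:
import sys
# Make the cloned repository importable; adjust the path if you cloned it elsewhere
sys.path.append('/content/AI-Scientist')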
Running the AI Review
With the environment set up, I was ready to test the AI-Scientist on my papers. I used the following code to perform the review:
import openai
from ai_scientist.perform_review import load_paper, perform_review
client = openai.OpenAI()
model = "gpt-4o-mini-2024-07-18"
# Extract the raw text from the PDF
paper_txt = load_paper("my-paper.pdf")
# Run the review: 5 reflection rounds, 1 few-shot example, an ensemble of 5 reviews
review = perform_review(
    paper_txt,
    model,
    client,
    num_reflections=5,
    num_fs_examples=1,
    num_reviews_ensemble=5,
    temperature=0.1,
)
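perform_review returns a Python dict. Assuming the key names shown in the repository's README, inspecting the outcome looks roughly like this:
# Key names follow the examples in the AI-Scientist README
print(review["Overall"])     # overall score on a 1-10 scale
print(review["Decision"])    # "Accept" or "Reject"
print(review["Weaknesses"])  # list of weakness strings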
The Results
I tested the AI-Scientist on two of my published papers (a short batching sketch follows this list):
- "Mining Students' Engagement Pattern in Summer Vacation Assignment"
- "Supporting Reflective Teaching Workflow with Real-World Data and Learning Analytics"
Surprisingly, both papers received a "Reject" decision from the AI reviewer, with overall scores of 4 out of 10. Here's a summary of the feedback for the first paper:
Strengths:
- Addresses a relevant topic of Learning Analytics in K-12 education
- Identifies distinct engagement patterns
- Provides empirical data on students' engagement and performance
Weaknesses:
- Lack of methodological details
- Insufficient treatment of potential confounding factors
- Limited discussion on broader implications
- Inconsistent clarity in writing
Questions posed by the AI:
- Requests for more details on clustering methodology
- Inquiries about addressing limitations in future work
The feedback for the second paper was similar, highlighting strengths in addressing significant educational issues but pointing out weaknesses in methodology and validation.
Reflections
While it's disheartening to see my published works receive "Reject" decisions from the AI, it's important to consider a few factors:
- The AI might be calibrated to very high standards, possibly aiming for top-tier conference or journal quality.
- The tool provides valuable feedback that could be used to improve papers before submission.
- This experiment demonstrates the potential of AI in academic review processes, but also highlights the need for human judgment in interpreting results.
As we continue to integrate AI tools into academic workflows, it's crucial to view them as assistants rather than replacements for human reviewers. They can offer quick, initial feedback, but the nuanced understanding of research context and significance still requires human expertise.
Have you experimented with AI tools in your research process? I'd love to hear about your experiences in the comments!