My own open-source implementation of AlphaXIV that allows users to chat with arXiv papers. This project uses FastAPI for the backend, markitdown for PDF conversion, MiniRAG for indexing, and Google's Gemini API for chat completions.
- Process arXiv papers by URL
- Convert PDFs to markdown using Microsoft's markitdown
- Index content using MiniRAG
- Chat with papers using Google's Gemini API
- Support for larger document context lengths
AlphaXIV combines several powerful technologies to enable intelligent conversations with academic papers. Here's how it works behind the scenes:
When a user submits an arXiv paper URL, the system processes it through the following steps:
```mermaid
graph TD
    %% Document Processing Flow
    A[User inputs arXiv URL] --> B[Download PDF from arXiv]
    B --> C[Convert PDF to Markdown using Markitdown]
    C --> D[Process and clean Markdown content]
    D --> E[Split content into chunks]

    %% MiniRAG Heterogeneous Graph Indexing
    E --> F1[Extract named entities from chunks]
    E --> F2[Generate embeddings with OpenAI]
    F1 --> G1[Create entity nodes]
    F2 --> G2[Create text chunk nodes]
    G1 --> G3[Build entity-entity connections]
    G1 --> G4[Build entity-chunk connections]
    G2 --> G4
    G3 & G4 --> G5[Generate semantic descriptions for edges]
    G5 --> G6[Construct semantic-aware heterogeneous graph]

    subgraph "PDF Processing"
        B
        C
        D
        E
    end

    subgraph "Heterogeneous Graph Indexing"
        F1
        F2
        G1
        G2
        G3
        G4
        G5
        G6
    end

    class B,C,D,E processing;
    class F1,F2,G1,G2,G3,G4,G5,G6 indexing;
```
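For concreteness, here is a minimal sketch of this processing phase in Python. It is illustrative only: the function name, file paths, and chunking parameters are assumptions (the real logic lives in the project's service modules), though it uses the standard markitdown API:

```python
# Hypothetical sketch of the PDF-processing phase; function name, paths,
# and chunking parameters are illustrative, not the project's actual code.
import re
import urllib.request

from markitdown import MarkItDown

def process_arxiv_paper(arxiv_url: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # arXiv abstract URLs map to PDF URLs by swapping /abs/ for /pdf/
    pdf_url = arxiv_url.replace("/abs/", "/pdf/")
    paper_id = arxiv_url.rstrip("/").split("/")[-1]
    pdf_path = f"./data/papers/{paper_id}.pdf"
    urllib.request.urlretrieve(pdf_url, pdf_path)

    # Convert the PDF to Markdown with markitdown
    markdown = MarkItDown().convert(pdf_path).text_content

    # Clean up excess blank lines, then split into overlapping chunks
    markdown = re.sub(r"\n{3,}", "\n\n", markdown)
    step = chunk_size - overlap
    return [markdown[i:i + chunk_size] for i in range(0, len(markdown), step)]
```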
When a user asks a question about a paper, the system processes it through these steps:
```mermaid
graph TD
    %% Query Processing
    H[User asks question] --> I1[Extract entities from query]
    I1 --> I2[Predict potential answer types]
    I1 & I2 --> I3[Map query to graph entities]

    %% Graph-Based Knowledge Retrieval
    I3 --> J1[Identify starting nodes via semantic matching]
    J1 --> J2[Discover answer-aware entity nodes]
    J1 & J2 --> J3[Apply topology-enhanced graph retrieval]
    J3 --> J4[Score and rank reasoning paths]
    J4 --> J5[Retrieve connected text chunks]

    %% Response Generation
    J5 --> K[Combine query + retrieved context]
    K --> L[Send to Gemini API]
    L --> M[Return response to user]

    subgraph "Query Semantic Mapping"
        I1
        I2
        I3
    end

    subgraph "Topology-Enhanced Retrieval"
        J1
        J2
        J3
        J4
        J5
    end

    subgraph "Response Generation"
        K
        L
        M
    end

    class I1,I2,I3 mapping;
    class J1,J2,J3,J4,J5 retrieval;
    class K,L,M generation;
```
The system operates in five phases:
1. PDF Processing: When you submit an arXiv URL, AlphaXIV downloads the PDF, converts it to Markdown using Microsoft's Markitdown tool, cleans the content, and splits it into manageable chunks.

2. Heterogeneous Graph Indexing: MiniRAG creates a semantic-aware knowledge graph with two types of nodes:
   - Entity nodes extracted from the text (concepts, terms, equations)
   - Text chunk nodes containing the original content

   The system then builds connections between these nodes (entity-entity and entity-chunk) and generates semantic descriptions for each edge, creating a rich graph structure that captures the paper's knowledge.

3. Query Semantic Mapping: When you ask a question, the system extracts entities from your query, predicts potential answer types, and maps these to the graph entities, creating an efficient bridge between your question and the knowledge graph.

4. Topology-Enhanced Retrieval: Unlike traditional vector-based retrieval, MiniRAG uses a sophisticated graph traversal approach:
   - Identifies starting nodes through semantic matching
   - Discovers potential answer nodes based on predicted types
   - Applies topology-enhanced graph retrieval to find meaningful reasoning paths
   - Scores and ranks these paths based on relevance and structural importance
   - Retrieves the connected text chunks that contain the most relevant information

5. Response Generation: The retrieved context chunks are combined with your query and sent to Google's Gemini API, which generates a comprehensive response that leverages both the semantic content and the structural relationships captured in the graph.
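Phase 5 is the only step that calls the LLM directly. A rough sketch of it, using the google-generativeai package, might look like the following; the model name and prompt template are assumptions, not the project's exact implementation:

```python
# Illustrative sketch of response generation; the model name and prompt
# template are assumptions, not the project's exact implementation.
import google.generativeai as genai

genai.configure(api_key="your_google_api_key_here")
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

def answer_question(query: str, context_chunks: list[str]) -> str:
    # Combine the retrieved context chunks with the user's question
    context = "\n\n".join(context_chunks)
    prompt = (
        "Answer the question using only the excerpts from the paper below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {query}"
    )
    return model.generate_content(prompt).text
```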
- Python 3.9+
- Markitdown for PDF conversion
- MiniRAG for indexing/embedding
- Google API key for Gemini
- OpenAI API key for embeddings
Markitdown is Microsoft's powerful document conversion tool that transforms various file formats (including PDFs) into clean, structured Markdown. We chose Markitdown for AlphaXIV because:
- High-quality PDF conversion: Markitdown excels at preserving the structure of academic papers, including tables, equations, and figures
- Semantic understanding: It maintains the semantic structure of documents, making the content more accessible for RAG systems
- LLM integration: Markitdown can work with LLMs to provide descriptions for images found in documents
- Extensibility: The plugin system allows for custom document converters if needed
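As a quick illustration, converting a downloaded PDF takes only a few lines with markitdown's Python API (the file path here is hypothetical):

```python
from markitdown import MarkItDown

md = MarkItDown()
# Convert a downloaded arXiv PDF to Markdown (path is illustrative)
result = md.convert("./data/papers/2201.08239.pdf")
print(result.text_content[:500])  # preview the first 500 characters
```

markitdown can also be constructed with an LLM client (`MarkItDown(llm_client=..., llm_model=...)`) to generate descriptions for images found in documents, as mentioned above.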
MiniRAG (distributed as the `lightrag-hku` package) is a lightweight, efficient Retrieval-Augmented Generation system designed for simplicity and performance, introduced in the research paper "MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation". We chose MiniRAG because:
- Graph-based indexing: Unlike traditional vector-based RAG systems, MiniRAG employs a graph-based approach that captures the relationships between document chunks, creating a more semantically rich representation of academic papers (see the knowledge graph visualization above)
- Superior semantic understanding: The graph structure preserves the hierarchical nature of academic papers (sections, subsections, references), enabling more contextually relevant retrievals and a deeper understanding of complex academic content
- Enhanced retrieval accuracy: By considering both semantic similarity and structural relationships, MiniRAG can retrieve more accurate context for complex scientific queries
- Optimized for smaller models: Works efficiently with smaller, free language models while maintaining high performance
- Flexible embedding options: Supports various embedding models including OpenAI's text-embedding models
- Simple API: Provides a clean, easy-to-use API for document indexing and retrieval
- Streaming support: Enables streaming responses for better user experience
- Customizable retrieval: Allows fine-tuning of chunk sizes, overlap, and retrieval parameters
When working with academic papers, MiniRAG's approach offers concrete benefits:
- Better handling of mathematical content: The graph structure helps maintain relationships between equations and their explanations, providing more coherent responses to technical questions
- Improved cross-referencing: When a paper references earlier sections or citations, MiniRAG can follow these connections to retrieve the complete context
- More accurate answers to complex queries: For questions that require synthesizing information from multiple sections of a paper, the graph-based retrieval provides more comprehensive context than simple vector similarity
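To make the workflow concrete, here is a minimal sketch of indexing and querying through the LightRAG Python interface. Treat it as illustrative: the constructor defaults, file path, and query mode are assumptions about a typical lightrag-hku setup, not this project's exact configuration:

```python
# Illustrative sketch of lightrag-hku usage; constructor defaults, path,
# and query mode are assumptions, not this project's actual configuration.
from lightrag import LightRAG, QueryParam

# Assumes default LLM/embedding settings; production setups usually pass
# llm_model_func and embedding_func explicitly
rag = LightRAG(working_dir="./data/storage")

# Index the converted paper: chunking, entity extraction, and graph
# construction all happen inside insert()
paper_markdown = open("./data/papers/2201.08239.md").read()  # illustrative path
rag.insert(paper_markdown)

# Retrieve graph-aware context for a question
context = rag.query(
    "What is the main contribution of this paper?",
    param=QueryParam(mode="hybrid", only_need_context=True),
)
```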
Quick setup:

1. Clone the repository:

   ```bash
   git clone https://github.com/AsyncFuncAI/alphaxiv-open.git
   cd alphaxiv-open
   ```

2. Run the setup script:

   ```bash
   ./setup.sh
   ```

   This will:
   - Create a virtual environment
   - Install the required dependencies
   - Create a `.env` file from the example
   - Create the necessary directories

3. Edit the `.env` file and add your Google API key, OpenAI API key, and other configuration options.
Setup with MiniRAG:

1. Clone the repository:

   ```bash
   git clone https://github.com/AsyncFuncAI/alphaxiv-open.git
   cd alphaxiv-open
   ```

2. Run the setup script with the MiniRAG option:

   ```bash
   ./setup.sh --with-minirag
   ```

   This will:
   - Create a virtual environment
   - Install the required dependencies
   - Install MiniRAG (LightRAG)
   - Create a `.env` file from the example
   - Create the necessary directories

3. Edit the `.env` file and add your Google API key, OpenAI API key, and other configuration options.
Manual installation:

1. Clone the repository:

   ```bash
   git clone https://github.com/AsyncFuncAI/alphaxiv-open.git
   cd alphaxiv-open
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. (Optional) Install MiniRAG:

   ```bash
   pip install lightrag-hku[api]
   ```

5. Create a `.env` file from the example:

   ```bash
   cp .env.example .env
   ```

6. Edit the `.env` file and add your Google API key, OpenAI API key, and other configuration options.
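A minimal `.env` might look like the sketch below; the variable names are illustrative, so check `.env.example` for the exact keys the project expects:

```
# Hypothetical .env contents; see .env.example for the exact variable names
GOOGLE_API_KEY=your_google_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
```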
To run the application without MiniRAG:

1. Start the FastAPI server:

   ```bash
   uvicorn app.main:app --reload
   ```

2. The API will be available at `http://localhost:8000`.

3. Use the following endpoints:
   - `POST /api/papers/process`: Process an arXiv paper URL
   - `POST /api/chat`: Chat with a processed paper
In this mode, the application will still process papers and convert them to markdown, but will use a simple keyword-based retrieval system instead of MiniRAG for context retrieval.
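Conceptually, that fallback works like the sketch below: score each chunk by how many distinct query terms it contains and return the best matches. This is an illustration of the idea, not the project's actual retrieval function:

```python
# Conceptual sketch of keyword-based fallback retrieval; names are
# illustrative, not the project's actual implementation.
def keyword_retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    terms = set(query.lower().split())
    # Score each chunk by the number of distinct query terms it contains
    scored = [
        (sum(term in chunk.lower() for term in terms), chunk)
        for chunk in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]
```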
To run the application with MiniRAG-based retrieval:

1. Install MiniRAG:

   ```bash
   pip install lightrag-hku[api]
   ```

2. Configure your `.env` file with your OpenAI API key for embeddings:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ```

3. Start the MiniRAG server with OpenAI embeddings using the provided script:

   ```bash
   python start_minirag.py
   ```

   This script will read configuration from your `.env` file and start the MiniRAG server with the correct settings.

   Alternatively, you can start the server manually:

   ```bash
   lightrag-server --working-dir ./data/storage \
     --chunk-size 1000 --chunk-overlap-size 200 \
     --embedding-dim 1536 --cosine-threshold 0.4 --top-k 10 \
     --embedding-binding openai --embedding-model text-embedding-3-small \
     --openai-api-key your_openai_api_key_here
   ```

   Note:
   - Replace `your_openai_api_key_here` with your actual OpenAI API key, or the server will read it from the environment variable
   - The `--top-k 10` parameter increases the number of context chunks returned (default is 5), which improves response quality for complex papers
   - The `--embedding-dim 1536` parameter matches the dimension of OpenAI's text-embedding-3-small model
4. In a separate terminal, start the FastAPI server:

   ```bash
   uvicorn app.main:app --reload
   ```

5. The API will be available at `http://localhost:8000`.

6. Use the following endpoints:
   - `POST /api/papers/process`: Process an arXiv paper URL
   - `POST /api/chat`: Chat with a processed paper
In this mode, the application will use MiniRAG with OpenAI embeddings for advanced context retrieval, providing better results for complex academic papers.
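To sanity-check the MiniRAG server directly, LightRAG's API server exposes a query route; a request along these lines should return retrieved context. The port and payload shown are assumptions, so verify them against the API docs of your installed lightrag-hku version:

```bash
# Assumed LightRAG API server route and default port; verify against
# your installed lightrag-hku version before relying on this.
curl -X POST "http://localhost:9621/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the main contribution of this paper?", "mode": "hybrid"}'
```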
Once the server is running, you can access the API documentation at:
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
Process a paper:

```bash
curl -X POST "http://localhost:8000/api/papers/process" \
  -H "Content-Type: application/json" \
  -d '{"arxiv_url": "https://arxiv.org/abs/2201.08239"}'
```

Then chat with it:

```bash
curl -X POST "http://localhost:8000/api/chat" \
  -H "Content-Type: application/json" \
  -d '{"paper_id": "2201.08239", "query": "What is the main contribution of this paper?"}'
```
```
alphaxiv/
├── app/
│   ├── main.py                  # FastAPI entry point
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py           # Pydantic models
│   └── services/
│       ├── __init__.py
│       ├── arxiv_service.py     # Service for arXiv papers
│       ├── markdown_service.py  # Service for markdown conversion
│       ├── indexing_service.py  # Service for indexing with MiniRAG
│       └── gemini_service.py    # Service for Gemini API
├── data/
│   ├── papers/                  # Storage for papers
│   ├── index/                   # Storage for indices
│   └── storage/                 # Storage for MiniRAG
├── .env.example                 # Example environment variables
├── requirements.txt             # Dependencies
└── README.md                    # This file
```
- arXiv for providing access to research papers
- Microsoft's markitdown for PDF conversion
- MiniRAG for indexing and retrieval
- Google's Gemini API for chat completions
- OpenAI for text embeddings
Contributions are welcome! Feel free to:
- Open issues for bugs or feature requests
- Submit pull requests to improve the code
- Share your feedback and ideas
This project is licensed under the MIT License - see the LICENSE file for details.