A robust FastAPI-based backend service for candidate management, CV parsing, and workflow automation.
- 🚀 High-performance FastAPI backend
- 🤖 AI-powered CV parsing for PDF and DOCX files (document loaders & Unstructured OCR)
- 🔒 Rate limiting and security middleware
- 📊 PostgreSQL database with SQLAlchemy ORM (Async)
- 🔄 Asynchronous request handling
- 📝 Structured logging system with correlation IDs
- 🌐 CORS support
- 🔍 Request ID tracking
- 🐳 Docker support
- 📄 Pydantic schema validation
- 📚 Database Migrations using Alembic
- 📈 Pinecone vector database integration
- 📦 Poetry dependency management
- 🔑 Redis for rate limiting
- 🤖 LangChain & LangGraph integration for AI workflows
- 📄 Document processing capabilities
- 🔐 AWS S3 integration for file storage (not enabled)
- 📝 Streaming and non-streaming chat endpoints
api/
├── agent/ # AI agent implementation
│ ├── workflow.py # Candidate processing workflow
│ ├── tools.py # Agent tools
│ └── prompts.py # Agent prompts
├── api/ # API routes
│ ├── v1/ # API version 1 endpoints
│ ├── deps.py # Dependencies
│ └── router.py # Main router
├── core/ # Core configuration
│ └── config.py # Settings management
├── crud/ # Database operations
│ ├── candidates.py # Candidate CRUD operations
│ └── sections.py # Section CRUD operations
├── models/ # SQLAlchemy models
│ ├── candidates.py
│ ├── education.py
│ ├── experience.py
│ ├── projects.py
│ └── skills.py
├── schema/ # Pydantic schemas
│ ├── agent.py
│ ├── candidates.py
│ ├── education.py
│ └── responses.py
├── services/ # Business logic
│ └── documents.py # Document processing
└── utils/ # Utilities
├── helpers.py
├── logger.py
├── s3_client.py
└── middlewares/
- Python 3.12+
- Docker and Docker Compose (optional)
- Redis
- PostgreSQL
- OpenAI API key
- Pinecone API key (https://www.pinecone.io/)
- Unstructured API key (https://docs.unstructured.io/api-reference/api-services/free-api)
- Clone the repository:
git clone https://github.com/andrew-sameh/agentic-cv-parser.git
cd agentic-cv-parser
- Install dependencies:
poetry install
- Set up environment variables:
cp .env.example .env
# Edit .env with your configuration
- Run the application:
poetry run python main.py
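If you're curious what the entrypoint does, a minimal `main.py` for a FastAPI service typically looks like the sketch below; the import string `api.main:app` is an assumption about this repo's layout, not its actual module path:

```python
import uvicorn

if __name__ == "__main__":
    # Serve the FastAPI instance; reload=True is convenient for development
    uvicorn.run("api.main:app", host="0.0.0.0", port=8000, reload=True)
```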
- Create a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
cp .env.example .env
# Edit .env with your configuration
- Run the application:
python main.py
1. Clone the repository
2. Set up environment variables:
cp .env.example .env
# Edit .env with your configuration
3. Start the application:
docker-compose -p cv-parser up -d
# or
docker compose -p cv-parser up -d
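For orientation, a compose file for this stack could look like the sketch below; the `app` service definition and image tags are assumptions, though `db` and `redis` match the service names used in the database-setup step later in this README:

```yaml
# Illustrative sketch only — not necessarily the repo's actual docker-compose.yml
services:
  app:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    depends_on:
      - db
      - redis

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: ${DATABASE_USER}
      POSTGRES_PASSWORD: ${DATABASE_PASSWORD}
      POSTGRES_DB: ${DATABASE_NAME}

  redis:
    image: redis:7
```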
The following environment variables need to be configured in your `.env` file:
- `ENV`: Environment (dev/prod)
- `PROJECT_NAME`: Project name
- `VERSION`: API version
- `DESCRIPTION`: API description
- `LOG_LEVEL`: Logging level
- `LOG_JSON_ENABLE`: Enable JSON logging
- `BACKEND_CORS_ORIGINS`: Allowed CORS origins

- `DATABASE_USER`: PostgreSQL username
- `DATABASE_PASSWORD`: PostgreSQL password
- `DATABASE_NAME`: Database name
- `DATABASE_HOSTNAME`: Database host
- `DATABASE_PORT`: Database port

- `REDIS_HOST`: Redis host
- `REDIS_PORT`: Redis port
- `REDIS_DB`: Redis database number

- `AWS_S3_BUCKET_NAME`: S3 bucket name
- `AWS_S3_ACCESS_KEY_ID`: AWS access key
- `AWS_S3_SECRET_ACCESS_KEY`: AWS secret key
- `AWS_S3_REGION_NAME`: AWS region
- `AWS_S3_BASE_FOLDER`: Base folder in S3

- `OPENAI_API_KEY`: OpenAI API key
- `LLM_MODEL`: Language model to use
- `EMBEDDING_MODEL`: Embedding model
- `UNSTRUCTURED_API_KEY`: Unstructured API key

- `PINECONE_API_KEY`: Pinecone API key
- `PINECONE_INDEX_NAME`: Pinecone index name
- `EMBEDDING_SEARCH_TYPE`: Search type
- `EMBEDDING_SCORE_THRESHOLD`: Similarity threshold
- `EMBEDDING_TOPK`: Top K results
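For convenience, a starter `.env` might look like the sketch below. All values are illustrative placeholders (including the model names and the CORS-origins format, which depends on how settings are parsed), not defaults shipped with this repo:

```env
# Example .env — placeholder values only
ENV=dev
PROJECT_NAME=agentic-cv-parser
VERSION=0.1.0
DESCRIPTION=Candidate management and CV parsing API
LOG_LEVEL=INFO
LOG_JSON_ENABLE=true
BACKEND_CORS_ORIGINS=["http://localhost:3000"]

DATABASE_USER=postgres
DATABASE_PASSWORD=postgres
DATABASE_NAME=cv_parser
DATABASE_HOSTNAME=localhost
DATABASE_PORT=5432

REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0

OPENAI_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-small
UNSTRUCTURED_API_KEY=your-key-here
PINECONE_API_KEY=your-key-here
PINECONE_INDEX_NAME=cv-parser
EMBEDDING_SEARCH_TYPE=similarity_score_threshold
EMBEDDING_SCORE_THRESHOLD=0.5
EMBEDDING_TOPK=5
```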
1. Start PostgreSQL and Redis using Docker:
docker-compose up -d db redis
2. Initialize the database:
alembic upgrade head
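When you change the SQLAlchemy models, the usual Alembic workflow applies: autogenerate a revision, review it, then upgrade. These are standard Alembic commands, not project-specific scripts:

```bash
# Generate a migration from model changes (review it before applying)
alembic revision --autogenerate -m "describe your change"

# Apply all pending migrations
alembic upgrade head

# Roll back the most recent migration
alembic downgrade -1
```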
Once the application is running, visit:
- Swagger UI: http://localhost:8000/
- ReDoc: http://localhost:8000/redoc
The agent system (`agent/`) handles the chat functionality.
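As a rough illustration of how a LangGraph chat workflow is wired (the state shape, node name, and placeholder response are assumptions, not the actual graph in `agent/workflow.py`):

```python
from typing import Annotated, TypedDict

from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages


class ChatState(TypedDict):
    # add_messages appends new messages instead of overwriting the list
    messages: Annotated[list, add_messages]


def call_model(state: ChatState) -> dict:
    # Placeholder node: the real workflow would invoke the LLM here,
    # optionally calling the tools defined in agent/tools.py
    return {"messages": [("assistant", "response goes here")]}


graph = StateGraph(ChatState)
graph.add_node("model", call_model)
graph.add_edge(START, "model")
graph.add_edge("model", END)
chat_workflow = graph.compile()
```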
API endpoints are organized in the `api/` directory with versioning support.
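Versioned wiring in FastAPI generally looks like the sketch below; the router and endpoint names here are hypothetical stand-ins for the real modules under `api/v1/`:

```python
from fastapi import APIRouter, FastAPI

app = FastAPI(title="CV Parser API")

# Hypothetical sub-router standing in for a module under api/v1/
candidates = APIRouter(prefix="/candidates", tags=["candidates"])


@candidates.get("/")
async def list_candidates() -> list[dict]:
    return []


# Mount all v1 sub-routers under a common versioned prefix
v1 = APIRouter(prefix="/api/v1")
v1.include_router(candidates)
app.include_router(v1)
```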
SQLAlchemy models in `models/` define the database schema for the following entities (a minimal model sketch follows the list):
- Candidates
- Education
- Experience
- Projects
- Skills
- Certifications
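A minimal sketch of how two of these models might relate, using SQLAlchemy 2.0's typed declarative style (column names and types are assumptions, not the repo's actual schema):

```python
from sqlalchemy import ForeignKey, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class Candidate(Base):
    __tablename__ = "candidates"

    id: Mapped[int] = mapped_column(primary_key=True)
    full_name: Mapped[str] = mapped_column(String(255))
    email: Mapped[str | None]

    # One candidate has many education entries
    educations: Mapped[list["Education"]] = relationship(back_populates="candidate")


class Education(Base):
    __tablename__ = "education"

    id: Mapped[int] = mapped_column(primary_key=True)
    candidate_id: Mapped[int] = mapped_column(ForeignKey("candidates.id"))
    degree: Mapped[str | None]
    institution: Mapped[str | None]

    candidate: Mapped["Candidate"] = relationship(back_populates="educations")
```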
- Document processing (`services/documents.py`) handles document parsing and extraction (see the parsing sketch after this list)
- S3 integration for file storage (not enabled)
- Structured logging
- Rate limiting
- Request correlation (see the middleware sketch after this list)
- Error handling
- Redis integration
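The parsing step can be pictured as this minimal sketch, which dispatches on file extension to a LangChain document loader; the function name and loader choices are assumptions about how `services/documents.py` might be structured, and the real service also supports Unstructured OCR:

```python
from pathlib import Path

from langchain_community.document_loaders import Docx2txtLoader, PyPDFLoader


def load_cv_text(path: str) -> str:
    """Return the raw text of a PDF or DOCX CV."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        loader = PyPDFLoader(path)
    elif suffix == ".docx":
        loader = Docx2txtLoader(path)
    else:
        raise ValueError(f"Unsupported file type: {suffix}")

    # Each loaded Document carries a page_content string
    return "\n\n".join(doc.page_content for doc in loader.load())
```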
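Request correlation is commonly implemented as a small Starlette middleware along these lines; the header name and class are assumptions about `utils/middlewares/`, not its actual contents:

```python
import uuid

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request


class RequestIDMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Reuse the caller's ID if present, otherwise generate one
        request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
        request.state.request_id = request_id

        response = await call_next(request)
        response.headers["X-Request-ID"] = request_id
        return response
```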
In progress
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request