Key Features:
- OCR with image preprocessing
- Multiple summarization techniques
- Named entity extraction
- Question-answering capability
- Streamlit interactive interface
Dependencies:
- streamlit
- pytesseract
- opencv-python
- spacy
- nltk
- transformers
- torch
- scikit-learn
Additional Setup:
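- Install the Tesseract OCR engine (a system binary that pytesseract calls; pip does not install it)
- Install the Python dependencies listed above
- Download the spaCy model (requires spaCy to be installed first):
  python -m spacy download en_core_web_sm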
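The setup steps above can be verified with a short script. The following is a minimal sketch, not part of the project itself: the function name `check_environment` and the status labels are made up for illustration. It uses only the standard library, and it assumes the Tesseract executable is expected on PATH (pytesseract can also be pointed at a custom location, which this check would not detect). Note that opencv-python and scikit-learn are imported as `cv2` and `sklearn`.

```python
import importlib.util
import shutil

# Package name (as installed) -> module name (as imported).
# The mapping covers the packages listed under Dependencies.
REQUIRED_MODULES = {
    "streamlit": "streamlit",
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "spacy": "spacy",
    "nltk": "nltk",
    "transformers": "transformers",
    "torch": "torch",
    "scikit-learn": "sklearn",
}


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each requirement to True (found) or False (missing).

    Hypothetical helper for illustration; nothing is imported or executed,
    the function only looks each item up.
    """
    status = {}
    for package, module in modules.items():
        status[package] = importlib.util.find_spec(module) is not None
    # Assumption: the Tesseract engine is reachable through PATH.
    status["tesseract (binary)"] = shutil.which("tesseract") is not None
    # The spaCy model is installed as a regular Python package.
    status["en_core_web_sm"] = importlib.util.find_spec("en_core_web_sm") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Running the script prints one line per requirement, so a missing Tesseract binary or spaCy model shows up before the app is started.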
Summarization Techniques:
- Extractive: Selects the most important sentences from the source text and returns them unchanged
- Abstractive: Generates a new, concise summary in its own words
- Hybrid: Combines both approaches
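To make the difference between the techniques concrete, here is a minimal sketch of an extractive step and one possible way to chain it into a hybrid. This is an illustration under stated assumptions, not the project's implementation: sentences are scored by average word frequency (the project may well use something else, such as TF-IDF from scikit-learn), the stop-word list and sentence splitter are deliberately tiny, and `abstractive_fn` is a hypothetical placeholder for any function that rewrites text, for example one wrapping a transformers model.

```python
import re
from collections import Counter

# A tiny stop-word list for illustration only; a real pipeline would more
# likely use the lists shipped with nltk or spaCy.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
              "it", "that", "this", "for", "on", "with", "as", "was", "be"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=2):
    """Keep the num_sentences highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    freq = Counter(w for words in tokenized for w in words)

    def score(i):
        words = tokenized[i]
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=score, reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))


def hybrid_summary(text, abstractive_fn, num_sentences=3):
    """Shorten with the extractive step, then pass the result to a rewriter.

    abstractive_fn is a hypothetical callable (str -> str) supplied by the caller.
    """
    return abstractive_fn(extractive_summary(text, num_sentences))
```

One reason to chain the steps this way is that the extractive pass shortens the input before the slower generative model sees it, which helps with long OCR output. Other ways of combining the two approaches are equally plausible.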