Data Ingestion of Guide Commericiale
* Run OCR with Pytesseract
* Extract text using PDFPlumber
* Compare the two text extraction approaches, choose the most suitable approach, and pre-process the text output
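The comparison step listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the helper names (`ocr_page`, `text_layer_page`, `preprocess`, `similarity`), the 300 dpi rendering resolution, the `lang` default, and the use of a character-level similarity ratio as the comparison metric are all assumptions. The two extraction helpers need `pdfplumber`, `pytesseract`, and a local Tesseract install; the pre-processing and comparison helpers use only the standard library.

```python
import difflib
import re


def preprocess(text: str) -> str:
    """Normalize extracted text (hypothetical cleaning rules)."""
    # Rejoin words hyphenated across a line break. Note that this also
    # merges genuine hyphenated compounds that happen to break at the hyphen.
    text = re.sub(r"-\s*\n\s*", "", text)
    # Collapse every run of whitespace, including newlines, into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two text outputs."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def ocr_page(pdf_path: str, page_index: int = 0, lang: str = "eng") -> str:
    """Render one PDF page to an image and run Tesseract OCR on it."""
    import pdfplumber   # third-party, imported lazily
    import pytesseract  # third-party, needs the Tesseract binary

    with pdfplumber.open(pdf_path) as pdf:
        # .original is the PIL image behind pdfplumber's PageImage wrapper.
        image = pdf.pages[page_index].to_image(resolution=300).original
    return pytesseract.image_to_string(image, lang=lang)


def text_layer_page(pdf_path: str, page_index: int = 0) -> str:
    """Read the embedded text layer of one PDF page, if there is one."""
    import pdfplumber  # third-party, imported lazily

    with pdfplumber.open(pdf_path) as pdf:
        return pdf.pages[page_index].extract_text() or ""
```

With a PDF at a hypothetical path such as `guide.pdf`, calling `similarity(preprocess(ocr_page("guide.pdf")), preprocess(text_layer_page("guide.pdf")))` gives a rough indication of how far the two outputs diverge on a page. A low score only signals disagreement; deciding which approach is more suitable still requires checking a sample of pages against the scans by hand.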
Revision as of 11:50, 28 November 2024
Introduction
Project Timeline & Milestones
| Timeframe | Task | Completion |
|---|---|---|
| Week 4 | | ✓ |
| Week 5 | | ✓ |
| Week 6 | | ✓ |
| Week 7 | | ✓ |
| Week 8 | | ✓ |
| Week 9 | | ✓ |
| Week 10 | | ✓ |
| Week 11 | | ✓ |
| Week 12 | | |
| Week 13 | | |
| Week 14 | | |