Document Pipeline

Ongoing | Document Processing

A local OCR pipeline for digitizing physical documents. Scanned paper records are passed through a local LLM (OLMo 2) to produce structured spreadsheet data, and a human reviews the output for accuracy.

Local LLM inference (no cloud PII exposure)
TIFF/PDF to structured data conversion
Human-in-the-loop validation workflow
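
The three points above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names in `FIELDS`, the `Record` type, and the `process_page` and `to_csv` helpers are all hypothetical, and the OCR and LLM steps are passed in as plain callables so the sketch runs without Tesseract or OLMo 2 installed. The review rule (flag any page with empty OCR text or a missing field) is likewise an assumption.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical spreadsheet columns; the real schema is not given in the source.
FIELDS = ["name", "date", "amount"]


@dataclass
class Record:
    source: str                      # path of the scanned page
    values: Dict[str, str]           # extracted field -> value
    needs_review: bool = False       # True if a human should check this row
    reasons: List[str] = field(default_factory=list)


def process_page(
    path: str,
    ocr: Callable[[str], str],
    extract: Callable[[str], Dict[str, str]],
) -> Record:
    """Run OCR, then field extraction, then flag doubtful rows for review.

    `ocr` and `extract` are injected so that both stay local: `ocr` could
    wrap Tesseract and `extract` could wrap a locally hosted LLM.
    """
    text = ocr(path)
    raw = extract(text)
    values = {f: str(raw.get(f) or "").strip() for f in FIELDS}
    record = Record(source=path, values=values)

    # Assumed review rule: anything empty or missing goes to a human.
    if not text.strip():
        record.reasons.append("empty OCR text")
    for f in FIELDS:
        if not values[f]:
            record.reasons.append(f"missing {f}")
    record.needs_review = bool(record.reasons)
    return record


def to_csv(records: Iterable[Record]) -> str:
    """Serialize records to CSV text, including the review flag and reasons."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", *FIELDS, "needs_review", "reasons"])
    for r in records:
        writer.writerow([
            r.source,
            *(r.values[f] for f in FIELDS),
            "yes" if r.needs_review else "no",
            "; ".join(r.reasons),
        ])
    return buf.getvalue()
```

In a real run, `ocr` might open the TIFF with Pillow and call `pytesseract.image_to_string`, and `extract` might prompt the local model and parse its reply. Keeping both behind callables means the review logic can be tested with stubs and no document text ever has to leave the machine.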

Python, OLMo 2, Tesseract
