Project Study: Government Electronic Archiving System
Introduction
Rapid digital transformation makes intelligent electronic archiving essential for managing government documents efficiently and securely. This project designs and implements an electronic archiving system built on modern web technologies, AI-powered Optical Character Recognition (OCR), and an analytics layer based on the ELK Stack to support data analysis and decision-making.
Part 1: Web-Based Archiving System
The archiving system is the core of document management and consists of three main services:
- Authorization Service
- Archive Service
- Media Service
Technologies Used
- Frontend: React.js
- Backend: Node.js with GraphQL API
- Database: MySQL
Document Structure
Each archived document includes:
- Source stakeholder
- Destination stakeholders
- Title
- Keywords
- Attached files (PDFs, images, etc.)
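The fields listed above could be modeled in the backend roughly as follows. This is a minimal sketch, not the project's actual schema: the field names, the types, and the choice of which fields are mandatory are all assumptions made for illustration.

```typescript
// Hypothetical field names and types: the study lists the fields
// but does not define a schema.
interface Attachment {
  fileName: string;
  mimeType: string; // e.g. "application/pdf" or "image/png"
}

interface ArchivedDocument {
  sourceStakeholder: string;
  destinationStakeholders: string[];
  title: string;
  keywords: string[];
  attachments: Attachment[];
}

// Returns human-readable problems; an empty array means the record passed.
// Which fields are mandatory is an assumption made for this sketch.
function validateDocument(doc: ArchivedDocument): string[] {
  const problems: string[] = [];
  if (doc.sourceStakeholder.trim() === "") {
    problems.push("source stakeholder is required");
  }
  if (doc.destinationStakeholders.length === 0) {
    problems.push("at least one destination stakeholder is required");
  }
  if (doc.title.trim() === "") {
    problems.push("title is required");
  }
  return problems;
}

const sampleDocument: ArchivedDocument = {
  sourceStakeholder: "Ministry of Finance",
  destinationStakeholders: ["Ministry of Health"],
  title: "Budget circular",
  keywords: ["budget", "circular"],
  attachments: [{ fileName: "circular.pdf", mimeType: "application/pdf" }],
};

// The sample fills every mandatory field, so no problems are reported.
console.log(validateDocument(sampleDocument));
```

Returning a list of problems rather than throwing on the first one lets the frontend show every issue with a record in a single pass.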
System Roles
- Super Admin: Full access and system control
- Admin: Manage users and archive content
- Data Writer: Create and modify records
- Data Reader: View documents
- Subscriber: Limited view access
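One way the Authorization Service might encode these roles is a static role-to-permission table. The permission names below are hypothetical, and the sketch assumes that Admins and Data Writers can also read documents, which the role list does not state explicitly.

```typescript
type Role = "SUPER_ADMIN" | "ADMIN" | "DATA_WRITER" | "DATA_READER" | "SUBSCRIBER";

// Hypothetical permission names derived from the role descriptions.
type Permission =
  | "system:control"
  | "users:manage"
  | "archive:manage"
  | "records:write"
  | "documents:read"
  | "documents:read-limited";

const ALL_PERMISSIONS: Permission[] = [
  "system:control",
  "users:manage",
  "archive:manage",
  "records:write",
  "documents:read",
  "documents:read-limited",
];

const ROLE_PERMISSIONS: Record<Role, Permission[]> = {
  SUPER_ADMIN: ALL_PERMISSIONS, // full access and system control
  ADMIN: ["users:manage", "archive:manage", "documents:read"], // read access is assumed
  DATA_WRITER: ["records:write", "documents:read"], // read access is assumed
  DATA_READER: ["documents:read"],
  SUBSCRIBER: ["documents:read-limited"],
};

// True when the given role has been granted the given permission.
function can(role: Role, permission: Permission): boolean {
  return ROLE_PERMISSIONS[role].indexOf(permission) !== -1;
}

console.log(can("DATA_WRITER", "records:write")); // true
console.log(can("DATA_READER", "records:write")); // false
```

Keeping the table in one place means a GraphQL resolver only needs a single `can(role, permission)` check before it runs.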
Part 2: Optical Character Recognition (OCR) System
The OCR system uses PaddleOCR, an open-source deep-learning OCR toolkit, to extract text from images and scanned documents.
Workflow
- The user uploads a scanned document via the Media Service.
- The image is processed by the OCR service (PaddleOCR).
- Extracted text is stored in the database for indexing and search.
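The workflow above can be sketched as a single function that takes an uploaded image, hands it to an OCR step, and builds the record to be stored. The OCR call is represented by an injected `Recognizer` function, because the way the backend talks to PaddleOCR is not specified here. The type names, the record shape, and the 0.5 confidence threshold are assumptions, and a real OCR call would most likely be asynchronous.

```typescript
// One recognized line of text with its confidence score.
// Field names are hypothetical.
interface OcrLine {
  text: string;
  confidence: number; // between 0 and 1
}

// Hypothetical shape of the record persisted for indexing and search.
interface OcrRecord {
  documentId: string;
  fileName: string;
  text: string;
  meanConfidence: number;
  lineCount: number;
}

// Stand-in for the call to the OCR service (PaddleOCR in this project).
// Kept synchronous here to keep the sketch short.
type Recognizer = (image: Uint8Array) => OcrLine[];

const MIN_CONFIDENCE = 0.5; // assumed threshold for discarding noisy lines

function processUpload(
  documentId: string,
  fileName: string,
  image: Uint8Array,
  recognize: Recognizer
): OcrRecord {
  // Keep only non-empty lines that the recognizer is reasonably sure about.
  const kept = recognize(image).filter(
    (line) => line.confidence >= MIN_CONFIDENCE && line.text.trim() !== ""
  );
  const total = kept.reduce((sum, line) => sum + line.confidence, 0);
  return {
    documentId,
    fileName,
    text: kept.map((line) => line.text.trim()).join("\n"),
    meanConfidence: kept.length === 0 ? 0 : total / kept.length,
    lineCount: kept.length,
  };
}

// Fake recognizer used only to exercise the pipeline without a real OCR engine.
const fakeRecognizer: Recognizer = () => [
  { text: "Ministry of Finance", confidence: 0.75 },
  { text: "  Annual budget circular ", confidence: 1.0 },
  { text: "~~#", confidence: 0.25 }, // below the threshold, so it is dropped
];

const ocrRecord = processUpload("doc-1", "scan.png", new Uint8Array(0), fakeRecognizer);
console.log(ocrRecord.lineCount, ocrRecord.meanConfidence); // 2 0.875
```

Injecting the recognizer keeps the storage logic testable without a running OCR service, and the fake can later be swapped for a client that calls the real one.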
Suggested Storage
- NoSQL Database (MongoDB): Recommended for storing OCR results, since extracted text is unstructured and does not fit a fixed relational schema.
Part 3: Analytics and Search Using ELK Stack
The system integrates the ELK Stack (Elasticsearch, Logstash, Kibana) for near-real-time indexing, full-text search, and visualization.
Components
- Elasticsearch: Search and indexing engine
- Logstash: Data processing and transformation
- Kibana: Dashboard for data visualization and analytics
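To show what a search against Elasticsearch might look like, the sketch below builds a request body in the Elasticsearch Query DSL. The indexed field names (`title`, `content`, `keywords`) are hypothetical, and the `term` filter assumes `keywords` is mapped as a keyword field. `bool`, `multi_match`, and `term` are standard Query DSL constructs, and `title^2` boosts matches in the title.

```typescript
type Clause = Record<string, unknown>;

interface SearchRequest {
  query: { bool: { must: Clause[]; filter: Clause[] } };
  size: number;
}

// Builds a full-text query over title and OCR content,
// optionally restricted to documents tagged with an exact keyword.
function buildSearchRequest(text: string, keyword?: string, size = 10): SearchRequest {
  const must: Clause[] = [
    { multi_match: { query: text, fields: ["title^2", "content"] } },
  ];
  const filter: Clause[] =
    keyword === undefined ? [] : [{ term: { keywords: keyword } }];
  return { query: { bool: { must, filter } }, size };
}

const searchRequest = buildSearchRequest("budget circular", "finance");
console.log(JSON.stringify(searchRequest, null, 2));
```

The resulting object would be sent as the JSON body of a `_search` request. Putting the keyword in `filter` rather than `must` keeps it out of relevance scoring.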
Workflow
- Logstash ingests structured records from MySQL and unstructured OCR text from MongoDB.
- Logstash transforms the data into a common format and sends it to Elasticsearch for indexing.
- Kibana provides dashboards for insights, statistics, and full-text search.
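The transformation step in this workflow can be illustrated with a short sketch that merges a MySQL metadata row with its OCR text and serializes the result in the newline-delimited format expected by the Elasticsearch Bulk API. In the described architecture Logstash does this work; the TypeScript below only illustrates the data shape. The row layout, the comma-separated `keywords` column, and the index name `archive-documents` are assumptions.

```typescript
// Hypothetical row as it might come out of MySQL.
interface MetadataRow {
  id: number;
  title: string;
  sourceStakeholder: string;
  keywords: string; // assumed to be stored as a comma-separated string
}

// Hypothetical search document sent to Elasticsearch.
interface SearchDocument {
  id: number;
  title: string;
  sourceStakeholder: string;
  keywords: string[];
  content: string; // OCR text, empty when the document has no OCR result yet
}

function toSearchDocument(row: MetadataRow, ocrText?: string): SearchDocument {
  return {
    id: row.id,
    title: row.title,
    sourceStakeholder: row.sourceStakeholder,
    keywords: row.keywords
      .split(",")
      .map((k) => k.trim())
      .filter((k) => k !== ""),
    content: ocrText === undefined ? "" : ocrText,
  };
}

// Bulk API body: one action line followed by one source line per document,
// each terminated by a newline (including the last one).
function toBulkBody(indexName: string, docs: SearchDocument[]): string {
  return docs
    .map(
      (doc) =>
        JSON.stringify({ index: { _index: indexName, _id: String(doc.id) } }) +
        "\n" +
        JSON.stringify(doc) +
        "\n"
    )
    .join("");
}

const row: MetadataRow = {
  id: 7,
  title: "Budget circular",
  sourceStakeholder: "Ministry of Finance",
  keywords: "budget, 2024 ,circular",
};

const bulkBody = toBulkBody("archive-documents", [
  toSearchDocument(row, "Ministry of Finance\nAnnual budget circular"),
]);
console.log(bulkBody);
```

Using the MySQL primary key as `_id` makes re-indexing idempotent: sending the same document again overwrites the existing entry instead of creating a duplicate.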
Benefits
- Fast and intelligent search capabilities
- Comprehensive document analytics
- Visual reports to support governance and decision-making
System Architecture Diagram (Textual Overview)