The digitization of archives refers to the use of database technology, data compression, high-speed scanning and other technical means to organize paper documents, audio-visual records and archived electronic files into an orderly structured archival information repository. Digitization saves storage space, relieves pressure on the repository, reduces the wear caused by frequent handling of originals, resolves the difficulty of providing access to precious archives, and helps preserve the originals, especially the precious ones. Digitization now plays an important role in archival work and has become an inevitable trend in the development of archives.
Archive digitization and scanning technology
Archive digitization captures the content of records mainly by scanning paper documents into digital form. Scanning is a process in which medium- and high-speed scanners and dedicated scanning software convert archival material into image files in batches, sort them, and automatically compress and store the images.
(1) Relevant standards in national regulations
In addition to the "Electronic Document Filing and Management Specification", the standard that applies directly to archive digitization is the "Technical Specification for Digitalization of Paper Archives". According to this specification, "a scanner or professional scanner of suitable size should be selected according to the format of the records. Large-format records can be scanned with a large-format digitizing platform or a microfilm digital conversion device; they can also be scanned in small-format sections and then stitched together by image processing." In addition, "records whose paper is in poor condition, or that are too thin, too soft or too thick, should be scanned on a flatbed; records on paper in good condition can be scanned with a high-speed scanner to improve work efficiency."
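The equipment rule quoted above can be sketched as a small decision function. This is only an illustration: the `Record` fields, the function name and the returned labels are hypothetical, and checking the format before the paper condition is an assumption, since the specification does not state an order.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Physical description of one record (all field names are illustrative)."""
    large_format: bool = False
    poor_paper: bool = False
    too_thin_soft_or_thick: bool = False


def choose_scan_method(record: Record) -> str:
    """Pick a scanning method following the rule quoted from the specification."""
    # Large formats: large-format platform, microfilm conversion device,
    # or small-format scans stitched together afterwards.
    if record.large_format:
        return "large-format"
    # Paper in poor condition, or too thin, too soft or too thick: flatbed.
    if record.poor_paper or record.too_thin_soft_or_thick:
        return "flatbed"
    # Paper in good condition can go through a high-speed scanner.
    return "high-speed"
```

For example, `choose_scan_method(Record(poor_paper=True))` selects the flatbed, while a record with no flags set is routed to high-speed scanning.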
The scanning color modes generally include black-and-white bilevel, grayscale and color, with black-and-white bilevel the usual choice. The specification distinguishes three cases: "Records with black-and-white pages, clear writing and no illustrations can be scanned in black-and-white bilevel mode. Records with black-and-white pages whose writing is poorly defined or that contain illustrations, and records whose pages carry multi-color text, can be scanned in grayscale mode. Records with red letterheads, seals, or inserted black-and-white photos, color photos or color illustrations can be scanned in color mode if needed."
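The three cases above map naturally onto a small selection function. The sketch below is hypothetical: the parameter names are invented for illustration, and falling back to grayscale when a page has color features but color is not needed is an assumption, because the specification only says color mode "can" be used in that case.

```python
from enum import Enum


class ColorMode(Enum):
    BILEVEL = "black-and-white bilevel"
    GRAYSCALE = "grayscale"
    COLOR = "color"


def choose_color_mode(
    *,
    has_color_features: bool = False,  # red letterhead, seal, photo, color illustration
    color_needed: bool = False,        # the "if needed" condition in the specification
    unclear_writing: bool = False,
    has_illustrations: bool = False,
    multicolor_text: bool = False,
) -> ColorMode:
    """Select a scanning color mode from the page's characteristics."""
    if has_color_features and color_needed:
        return ColorMode.COLOR
    # Assumption: color features without a need for color fall back to grayscale.
    if unclear_writing or has_illustrations or multicolor_text or has_color_features:
        return ColorMode.GRAYSCALE
    # Clear black-and-white text with no illustrations.
    return ColorMode.BILEVEL
```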
In principle, the scanning resolution is chosen so that the scanned image is clear and complete and its usability is not impaired. Because a high resolution tends to enlarge the resulting image files, the national standard generally recommends a resolution of ≥100dpi whether records are scanned in black-and-white bilevel, grayscale or color mode. Under special conditions, such as small, dense or poorly defined text, the resolution can be increased appropriately. For records that require OCR recognition of Chinese characters, a scanning resolution of ≥200dpi is generally recommended.
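The two thresholds above can be written down as a simple check. This is a minimal sketch with hypothetical names; it encodes only the ≥100dpi and ≥200dpi floors from the text and leaves the "increase appropriately" case to the operator, since the standard gives no figure for it.

```python
GENERAL_MIN_DPI = 100  # floor for bilevel, grayscale and color scans
OCR_MIN_DPI = 200      # floor when OCR of Chinese characters is required


def minimum_dpi(needs_ocr: bool = False) -> int:
    """Return the recommended lower bound on scanning resolution."""
    return OCR_MIN_DPI if needs_ocr else GENERAL_MIN_DPI


def meets_minimum(dpi: int, needs_ocr: bool = False) -> bool:
    """Check a chosen resolution against the recommended lower bound."""
    return dpi >= minimum_dpi(needs_ocr)
```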
(2) Practice in actual work
In actual work, archives departments generally use various types of scanners, chosen according to the condition of the records themselves, and make less use of digital cameras. Scanning is also constrained by the state of the records and by the available equipment, so some records cannot be digitized for the time being, such as paper that is badly damaged or brittle, or certain oversized drawings. These can only be handled once equipment or technology has advanced further.
The choice of color mode depends on the existing equipment and the condition of the records themselves, and can follow a step-by-step approach. For example, when one archive digitized its paper records, the first phase used mainly black-and-white scanning, the second phase scanned red-letterhead documents and other documents bearing red seals in color, and the third phase scanned everything in color. Color scans undoubtedly offer richer tonal gradation and higher definition, and reproduce the original appearance of a record more faithfully.
The choice of resolution depends heavily on the equipment and varies across regions and departments. For example, in the first phase of digitizing its paper records, one archive set the scanning resolution to 300dpi. Archives generally scan at around 200-300dpi, and some go as high as 600dpi. Departments and regions with relatively outdated equipment set the resolution according to the national standard, and many departments still scan at less than 200dpi. The higher the resolution, the clearer the scanned image, but the size of the image file must also be considered.
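The trade-off between resolution and file size can be estimated with simple arithmetic. The sketch below computes the uncompressed size of a scanned page, assuming 1, 8 and 24 bits per pixel for bilevel, grayscale and color; these bit depths and the function names are illustrative assumptions, and real files will be smaller after compression.

```python
MM_PER_INCH = 25.4

# Assumed bit depths per pixel for each color mode.
BITS_PER_PIXEL = {"bilevel": 1, "grayscale": 8, "color": 24}


def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Convert a page size in millimetres to pixel dimensions at a given dpi."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


def uncompressed_bytes(width_mm: float, height_mm: float, dpi: int, mode: str) -> int:
    """Estimate the raw image size in bytes, ignoring headers and row padding."""
    width_px, height_px = pixel_dimensions(width_mm, height_mm, dpi)
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return (total_bits + 7) // 8  # round up to whole bytes
```

Under these assumptions, an A4 page (210 mm by 297 mm) scanned in color at 300dpi is roughly 26 MB before compression, and doubling the resolution to 600dpi roughly quadruples that figure, because the pixel count grows with the square of the resolution.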
(3) Development trend of scanning technology
The most important choices in scanning technology are the color mode and the resolution.
The color mode will undoubtedly move towards color scanning, while the resolution needs to be set flexibly according to actual business needs. Under normal circumstances, 200dpi is sufficient for black-and-white images to support networked queries, and the scanning resolution for color images can be lower. The specific parameters can be chosen by weighing scanning resolution against quality. For special uses such as exhibitions, higher scanning resolutions can be used. It is worth noting that the resolution should be neither too low nor too high. For example, Fujian used a scanning resolution of 50dpi. Although the files are small and the cost is low, the images cannot be used for online queries, so the work is effectively wasted. Conversely, pursuing an excessive resolution produces excessively large files, which also burdens the online dissemination of resources.
After records are digitized, the use of text recognition (OCR) should also be considered. Generally speaking, text recognition serves mainly full-text search rather than actually restoring the scanned image to an editable document. The scanning resolution therefore need not be set purely to maximize the OCR recognition rate. The "Technical Specification for Digitalization of Paper Archives" recommends a resolution of ≥200dpi for image files that require OCR, which is a fairly moderate standard.