Archive Impact

 

Archives are a central place to store and maintain records and historical materials (regardless of format) created by an organization, government or individual. 

Associative Indexing is a method of automatic indexing that augments the terms found in documents with related terms obtained from a term association map. A term association map is a vocabulary tool that shows the similarity between terms based on the co-occurrence of the terms in the database documents.(1)
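As a rough illustration of a term association map, the sketch below counts how often two terms appear in the same document and uses those counts to augment a set of index terms. The function names, the sample documents, and the `min_count` threshold are all assumptions made for this example; they are not taken from the cited dictionary.

```python
from collections import defaultdict
from itertools import combinations

def build_association_map(documents):
    """Count how often each pair of terms appears in the same document."""
    cooccurrence = defaultdict(lambda: defaultdict(int))
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            cooccurrence[a][b] += 1
            cooccurrence[b][a] += 1
    return cooccurrence

def augment_terms(terms, association_map, min_count=2):
    """Add every term that co-occurs with a given term at least min_count times."""
    augmented = set(terms)
    for term in terms:
        for related, count in association_map.get(term, {}).items():
            if count >= min_count:
                augmented.add(related)
    return sorted(augmented)

# Hypothetical documents, each reduced to its list of terms.
documents = [
    ["archive", "preservation", "film"],
    ["archive", "preservation", "paper"],
    ["film", "video"],
]
association_map = build_association_map(documents)
print(augment_terms(["archive"], association_map))
```

Raw co-occurrence counts are only one possible similarity measure; a real system would likely normalize them by term frequency.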

Automatic Indexing is the use of algorithms (software) to analyze the contents of records, such as bibliographic entries, and assign keywords that represent the content of the given database entries. The techniques used to determine appropriate keywords from the contents of database entries include phrase detection, thesaural lookup, linguistic analysis, statistical analysis, and term occurrence probabilities.(1)
Back-file Conversion is an Imaging process that converts records into a digital format. Reasons for a back-file conversion are space saving, preservation, instant access, cost-savings, and legal requirements. 
Bit is short for "binary digit" and is the smallest unit of information. It has one of two values: on (represented by the number 1), or off (represented by 0). 
Byte is a string of eight bits which is the number needed to store one character such as a letter or a number. It is also the standard measurement unit of a file size. 
Bitmap Image is composed of bits and displayed on the screen as pixels. Bit map images lose resolution as they are enlarged. A bitmap image is sometimes identified by the extension .bmp on the filename. 
Cataloging is the "preparation of bibliographic records which entails recording descriptions and determining all points of access to the record." Cataloging rules are spelled out in the Anglo-American Cataloguing Rules (AACR2). MARC (machine-readable cataloging) format records are used as the standard in recording electronic bibliographic data. (2)

CD-ROM (Compact-Disk-Read-Only-Memory) is a type of optical disk that stores data up to 640 Megabytes (MB). The designation "read-only" means once stamped with information, the disk can only be viewed and not written over. 

CGM (Computer Graphics Metafile) is the ANSI standard for vector and bitmap images. 
CMYK (Cyan, Magenta, Yellow, Black) is a subtractive color model used for printing on paper. 
Chain Indexing is a subject categorization scheme in which terms describing objects (typically documents) are linked or chained together on the basis of a set of rules in a hierarchical relationship.(1)
Classification is the process of grouping like terms into classification groups or classes. The classes may exhibit a variety of properties, such as monothetic, polythetic, exclusive, overlapping, ordered, and unordered. In textual systems, one generally classifies either the documents into classification groups or the keywords into like groups.(1)
Clustering is the grouping of items in a database so that members of a cluster exhibit similarities to each other and dissimilarities to other clusters. In retrieval systems, the items in a cluster are often retrieved together in response to a query. For each cluster a composite item, called the centroid, can be generated that represents the cluster and is used as the basis for retrieval of the cluster. There are two classes of clustering methods: hierarchical methods (such as Ward’s Method), which produce a nested set of clusters, and nonhierarchical or partitioning methods (such as single-pass methods), which produce a single layer of clusters, although clusters may overlap (i.e., some items may occur in more than one cluster).(1)
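The role of the centroid in retrieval can be sketched as follows. This example assumes that items are represented as term-weight vectors, that the centroid is the plain average of a cluster's vectors, and that cosine similarity is used to match a query; the cluster names and numbers are invented for illustration.

```python
import math

def centroid(vectors):
    """Average the item vectors of one cluster, component by component."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_cluster(query, clusters):
    """Return the name of the cluster whose centroid is most similar to the query."""
    return max(clusters, key=lambda name: cosine(query, centroid(clusters[name])))

# Hypothetical clusters of term-weight vectors.
clusters = {
    "film": [[2, 0, 1], [4, 0, 1]],
    "paper": [[0, 3, 1], [0, 1, 1]],
}
print(centroid(clusters["film"]))
print(best_cluster([1, 0, 0], clusters))
```

Comparing the query with one centroid per cluster, instead of with every item, is what makes the cluster a unit of retrieval.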
Concordance, or inverted list, is a data structure for indexing textual data records by the substantive terms or keywords associated with each record. The inverted list is an index for each keyword containing the location of each occurrence of the keyword in the database.(1)
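A minimal sketch of an inverted list, assuming whitespace-separated words, a small illustrative stop-word list to filter out non-substantive terms, and locations recorded as (record id, word position) pairs. None of these choices come from the cited source.

```python
from collections import defaultdict

# Assumed, illustrative stop-word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the"}

def build_inverted_list(records):
    """Map each keyword to the (record id, word position) of every occurrence."""
    index = defaultdict(list)
    for record_id, text in records.items():
        for position, word in enumerate(text.lower().split()):
            if word not in STOP_WORDS:
                index[word].append((record_id, position))
    return dict(index)

# Hypothetical records keyed by record id.
records = {
    1: "the preservation of film",
    2: "film and video archives",
}
inverted = build_inverted_list(records)
print(inverted["film"])
```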
Conservation is the treatment of archival materials to stabilize them chemically or strengthen them physically, sustaining their survival as long as possible in their original form.(2)
Controlled Vocabulary is a listing of terms that specifies which terms may be used for indexing and the relationships among them, produced by the process of vocabulary control. (4)

Coordinate Indexing is a method of post-coordinate indexing based on the assignment of keywords that capture the concepts covered by a document. These keywords (or word symbols) are then used as the basis for subsequent retrieval of the records.(1)

Copyright is the exclusive right to publish or sell a book, composition, photograph, work of art, software program, etc.  This right is granted by the government for a certain defined time period. 
Database is an organized collection of data, managed by a program that controls how the data is entered, ordered, retrieved, and output. 
Digital Archive is an ideal; a focal point database and storage facility for all formats and media of an organization’s institutional knowledge and history. 
Digitize is to make digital. Digitization converts analog material, such as paper, into digital (computer-readable) form. 
DPI (dots-per-inch) is a measure of the resolution. It can refer to a printer, scanner, or monitor. It is the number of dots in a one-inch line. The more dots per inch, the higher the resolution. 
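The effect of dpi on a scan can be worked out with simple arithmetic. The sketch below assumes a flat scan of a page measured in inches, 24-bit color (3 bytes per pixel), and no compression; the function names are hypothetical.

```python
def scanned_pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a page scanned at the given dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)

def uncompressed_size_bytes(width_px, height_px, bytes_per_pixel=3):
    """Raw file size, assuming 3 bytes (24 bits) per pixel and no compression."""
    return width_px * height_px * bytes_per_pixel

# A letter-size page (8.5 x 11 inches) scanned at 300 dpi.
width_px, height_px = scanned_pixel_dimensions(8.5, 11, 300)
print(width_px, height_px)
print(uncompressed_size_bytes(width_px, height_px))
```

Doubling the dpi doubles each pixel dimension and therefore quadruples the uncompressed size.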
Encoding is the process by which analog tape is converted into a digital file. The file can be produced in a number of different formats: RM is a standard format for RealMedia files, and MPEG is another standard. During encoding, the SMPTE time code is converted into a proxy time code. SMPTE time code from video tape is measured in frames, at 30 frames per second, while proxy time code is measured in milliseconds. This mismatch can create a number of difficulties when indexing proxy files. 
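The time code conversion can be sketched as below. This is an illustration only: it assumes a non-drop-frame time code written as HH:MM:SS:FF at exactly 30 frames per second, and the function name is hypothetical. Real NTSC material often runs at 29.97 frames per second with drop-frame counting, which this sketch does not handle.

```python
def smpte_to_milliseconds(timecode, fps=30):
    """Convert a non-drop-frame HH:MM:SS:FF time code to milliseconds."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame number must be between 0 and {fps - 1}")
    total_seconds = hours * 3600 + minutes * 60 + seconds
    # One frame lasts 1000 / fps milliseconds, which is not a whole number
    # at 30 fps, so the result has to be rounded.
    return total_seconds * 1000 + round(frames * 1000 / fps)

print(smpte_to_milliseconds("00:01:00:15"))
```

Because a single frame lasts about 33.3 milliseconds, frame boundaries rarely fall on whole milliseconds, which is one source of the indexing difficulties mentioned above.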
Faceted Classification is a style of pre-coordinate subject description (typified by Universal Decimal Classification and Ranganathan’s Colon Classification) which provides a flexible system for generating controlled vocabulary subject classification. Techniques and guiding principles are used to build up the vocabulary and the relationships among the terms of the vocabulary rather than a hard and fast classification scheme of subject headings.(1)
GIF (Graphics Interchange Format) is a bit map file format for graphics sometimes identified by the .gif extension on the filename. GIF images support up to 256 colors. 
Imaging is a process, not a product. It combines technology and information management to create a digital archive. 
Imaging Technology is the hardware and software needed to create digital files from analog material. The technology is divided into input devices and output devices. 
Index is an auxiliary data structure used to speed up access to a data set (e.g., a file of records) in which a pointer to each record of the data set is stored. The pointers in the index are accessed on the basis of a key value of each record. The index may actually contain the key values and the pointers, or the key may be used to generate the address of the pointer in the index, perhaps by hashing. The indexing can be used both to provide an order to the data records and to provide direct access to records in the data set.(1) An index is NOT a list of all the terms in a data set (See also Concordance). 
Input (imaging) hardware includes the scanner, monitor/display device, computer network or PC, and storage device. Imaging software comprises capture software that works with the scanner, Optical Character Recognition software, system (which is also connected to the Information Management system), and compression software. 
JPEG (Joint Photographic Experts Group) is a graphics storage format identified by the extension .jpg on the filename. The JPEG format uses lossy compression, a technique that discards some data during compression. 
JPEG 2000 is the ISO answer to an integrated standard format. The proposed format encapsulates the Digital Imaging Group's FlashPix features of resolution independence, size independence, metadata, and an unambiguous color model. 
KWIC (KeyWord In Context) is a simple printed index for textual material in which keywords in the text are sorted alphabetically and presented linearly, surrounded by portions of the preceding and following text for context.(1)
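A KWIC index can be sketched in a few lines. The example assumes whitespace-separated titles, a short illustrative stop-word list, and a context window of three words on each side; these parameters and the function name are choices made for this example.

```python
# Assumed, illustrative stop-word list.
STOP_WORDS = {"a", "and", "in", "of", "the"}

def kwic_index(titles, width=3):
    """Return (preceding text, keyword, following text) entries sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            before = " ".join(words[max(0, i - width):i])
            after = " ".join(words[i + 1:i + 1 + width])
            entries.append((word.lower(), before, word, after))
    # Sort alphabetically by keyword; ties keep their original order.
    entries.sort(key=lambda entry: entry[0])
    return [(before, keyword, after) for _, before, keyword, after in entries]

titles = ["Preservation of Film", "Digital Film Archives"]
for before, keyword, after in kwic_index(titles):
    # Right-align the preceding text so the keywords line up in a column.
    print(f"{before:>20} {keyword} {after}")
```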
KWOC (KeyWord Out of Context) is a simple printed index for textual material in which keywords found in the text are sorted alphabetically and presented linearly, followed by the original string for context. Sometimes the keyword is replaced by a character such as "^" in the context string.(1)
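For comparison, a KWOC index lists each keyword followed by the full original string. The sketch below assumes whitespace-separated titles and an illustrative stop-word list; the optional `mask` argument, a name invented here, replaces the keyword in the context string with a character such as "^".

```python
# Assumed, illustrative stop-word list.
STOP_WORDS = {"a", "and", "in", "of", "the"}

def kwoc_index(titles, mask=None):
    """Return (keyword, context string) pairs sorted alphabetically by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for word in words:
            if word.lower() in STOP_WORDS:
                continue
            if mask is None:
                context = title
            else:
                # Replace the keyword itself with the mask character.
                context = " ".join(mask if w == word else w for w in words)
            entries.append((word.lower(), context))
    return sorted(entries)

titles = ["Preservation of Film", "Digital Film Archives"]
for keyword, context in kwoc_index(titles, mask="^"):
    print(f"{keyword:<14}{context}")
```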
Latent Semantic Index is an approach to automatic indexing that is based on the assumption of an underlying association or correlation of the terms used in documents, and the content of the documents for retrieval purposes. Most of the techniques used to determine the relevant associations begin with the term occurrences from which term similarities and term associations can then be calculated.(1)
Licensing is the act of giving formal legal permission to reproduce or sell a copyrighted work. 
Lossy Compression is a compression technique that discards some data during compression, so the restored file is not identical to the original. 
Media Archives is a type of archives that specializes in storing and maintaining media materials such as photography, digital files, graphics, sound recordings, video and film, etc. 

Media Assets are the digital and traditional media formats (video, files, film negatives, audio, photographic prints, slides, graphic materials) which, in a Digital Archive, have resale or reuse value for an institution. 

Mediagraphical is any integrated multimedia information system that includes a wide range of data types, such as audio, graphical, textual, and pictorial.(1)
Meta is a prefix often used in information science terminology, such as "metadata" and meta knowledge. Meta X means "X about X", so that meta data means "data about data" (e.g. data dictionaries), and meta knowledge means "knowledge about knowledge" (e.g. knowledge structures).(1)
Metadata is the data that describes, for purposes of retrieval or classification, the overall architecture and format of the document or file. It is used extensively on the World Wide Web. Recently, Metadata has been linked with imaging and indexing as a way to create a standard information infrastructure for multi-media databases. 
Micon is a moving icon: an icon that contains a moving video clip, allowing the user to select from a variety of videos based on a sample of each video. Micons could be used to represent other dynamic data sets such as simulations.(1)
Moving Image Document (MID) is a database item defined as a matrix of sequential phenomena (i.e. a string of images plus strings of sounds) that is synchronized by time (a Micon). A major difficulty in retrieval of MIDs is the lack of easily defined units, such as text, which can be used for indexing or abstracting purposes.(1)
MPEG (Moving Picture Experts Group) is a lossy compression format for digital video. It can be identified by the file extension .mpg. 
Object-oriented Database is a database management system that facilitates the management of objects rather than records. Viewing the data as objects, instead of as records, provides more flexibility in the data types used and removes the need to normalize the data. Since objects can contain other objects as sub-components, these databases can implement inheritance hierarchies.(1)
Optical Character Recognition (OCR) is software that works in tandem with the scanner and "recognizes" the characters the scanner "sees" (only letters and numbers in certain fonts) and converts them from the original analog format to digital format. 
Output (Imaging) represents the final function of the Imaging process. Traditionally this means the printed paper; but in the digital world, output has taken on different formats and characteristics. The information can be kept as a digital file, viewed on a monitor, compressed, transferred, or stored. Other output forms of digital information are printed paper, video, photograph, negative, slide, or transparency. 
PCX is a bit map file format supported by many programs. It was developed by ZSoft for PC Paintbrush and can sometimes be identified by the file extension .pcx. 
POPSI (Postulate-based Permuted Subject Indexing) is a string indexing algorithm based on Colon Classification, a faceted classification scheme used widely in India. The indexer assigns the appropriate faceted description, and the string permutations of that description are generated automatically for the index.(1)
Post-coordinate system is a style of indexing in which the relationships of the indexing terms and database entries are not fixed at the time an entry is added to the database, but rather the user can combine and manipulate the indexing terms at query time. The Boolean combination of keywords drawn from the full text of records, which is common to most information retrieval systems, is an extreme example of post-coordinate indexing.(1)
PRECIS (PReserved Context Indexing System) is a string indexing system developed by Derek Austin at the British Library in the early 1970s for subject indexing of the British National Bibliography. The terms in the PRECIS string are arranged and connected by relationships found in the original text or context, rather than from a classification scheme. The human indexer chooses the base string (i.e. terms and the relationships) and the subsequent permutations of the base string are performed by a computer.(1)
Pre-coordinate System is an indexing method that establishes, at the time an entry is added to the database, the access points for that entry (typically bibliographic).(1)
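Combining index terms at query time can be sketched with Python sets. The example assumes each record has already been assigned a set of keywords and that the index maps each keyword to the ids of the records carrying it; the sample records and the helper name `ids` are invented for illustration.

```python
# Hypothetical records, each with its assigned keywords.
records = {
    1: {"film", "preservation"},
    2: {"film", "video"},
    3: {"paper", "preservation"},
}

# Build the keyword -> record ids index once, when entries are added.
index = {}
for record_id, keywords in records.items():
    for keyword in keywords:
        index.setdefault(keyword, set()).add(record_id)

def ids(term):
    """Record ids indexed under a term (empty set if the term is unknown)."""
    return index.get(term, set())

# The terms are only combined when the user asks a question.
film_and_preservation = ids("film") & ids("preservation")  # Boolean AND
film_or_paper = ids("film") | ids("paper")                 # Boolean OR
film_not_video = ids("film") - ids("video")                # Boolean NOT
print(film_and_preservation, film_or_paper, film_not_video)
```

No combination such as "film AND preservation" is stored in advance; the index holds single terms, and the coordination happens in the query.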
Preservation is the activity associated with maintaining archival materials for use, either in their original physical form or in some other format.(2)
Re-purposing is the sharing and re-using of digital and traditional media that was originally intended for graphic designs, ads, manuals, and web sites. Institutions now recognize the value of their media assets, and the need to synthesize their digital images into a collective central digital archive. 
Resolution is often expressed in dpi, or dots-per-inch. The size of a digital image is measured in pixels, as the number of rows by the number of columns. 
RGB (Red Green Blue) is a color scheme for screen display. RGB images can be converted to CMYK for printing. 
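A simple form of the RGB to CMYK conversion is sketched below. It uses the common device-independent formula and assumes 8-bit RGB values (0 to 255) with CMYK values returned as fractions from 0.0 to 1.0. Production printing normally relies on color profiles instead, so this is an approximation for illustration, and the function name is hypothetical.

```python
def rgb_to_cmyk(red, green, blue):
    """Convert 8-bit RGB values to CMYK fractions using the naive formula."""
    if not all(0 <= value <= 255 for value in (red, green, blue)):
        raise ValueError("RGB values must be between 0 and 255")
    if (red, green, blue) == (0, 0, 0):
        # Pure black: only the black ink is used (avoids dividing by zero).
        return (0.0, 0.0, 0.0, 1.0)
    r, g, b = red / 255, green / 255, blue / 255
    black = 1 - max(r, g, b)
    cyan = (1 - r - black) / (1 - black)
    magenta = (1 - g - black) / (1 - black)
    yellow = (1 - b - black) / (1 - black)
    return (cyan, magenta, yellow, black)

print(rgb_to_cmyk(255, 0, 0))
```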
String Indexing System is a form of document indexing characterized by a string or set of indexing terms for each entry. The terms within each string are connected by relationships to each other according to a set of rules for the particular scheme. Typically, a basic string is generated by a human indexer, and subsequently manipulated by a computer, to produce a multiple-access index to the documents.(1)
Thesaurifacet is an indexing tool that combines the alphabetic access of a thesaurus with the hierarchical access of a faceted classification scheme. The two parts complement each other in that the hierarchical relationships are contained in the classification arrangement, while all other relationships are contained in the thesaurus part. Terms may, however, occur in only one hierarchy of the facet component, but secondary relationships can be included by the thesaural relationships.(1)
Thesaurus is a vocabulary tool that provides information about the use of terms, and certain relationships between terms. The relationships normally used between terms include the following: broader term (BT), narrower term (NT), use for or synonymous (UF), related terms (RT), and use (replace this term with some other).(1) (See also Thesaurifacet, Classification and Faceted Classification) 
TIFF (Tagged Image File Format) is a format used most often for archiving images. It can sometimes be identified by the file extension .tif. It is NOT natively supported by most web browsers. 
Vector Image is an image composed in a geometrical formulation that can be reduced or enlarged without losing quality. 
(1) Watters, Carolyn.  (1992).  Dictionary of Information Science & Technology.  San Diego, CA: Academic Press.

(2) DePew, John N.  (1991).  A Library, Media & Archival Preservation Handbook.

(3) Soper, Mary E., Larry N. Osborne, and Douglas L. Zweizig.  (1990).  The Librarian's Thesaurus: A Concise Guide to Library & Information Terms.  Chicago: American Library Association.

(4) American National Standards Institute.  (1984).  American National Standard for Basic Criteria for Indexes: Z39.4.  New York: American National Standards Institute.

