Homepage    The company    Scanning    Technology    Partners and references    Contact
Download        Functionalities        Screenshots

Over the past few years, digitization has become essential, and it is creating a growing need for computer systems that manage texts electronically.

DIGISCRIB, a company specializing in the digitization of books and documents, is encouraged by its partners: the Centre d'Etudes Supérieures de la Renaissance (CESR), with its Virtual Humanistic Libraries (BVH) team, and RE-TranscriPro. It has invested in research on software solutions for encoding, analyzing, managing and manipulating texts and documents after OCR processing or transcription.
This research goes hand in hand with DIGISCRIB's work on Linux-based OCR and image-management tools such as Tesseract and ImageMagick.

Building on the XML/TEI encoding method, and in view of the possibilities it offers and the growing demand it answers, DIGISCRIB has devoted itself to developing text-encoding business software(1).

« EditTEI » is the name of this new text encoder. It is written in Java, which makes it compatible with several platforms (Linux, Windows, Mac, etc.). It is fully trilingual (French, English and Spanish).

This first complete version, « EditTEI 1.6.5 », offers text-editing functionality: a layout for interactive tagging that does not require the user to know or type the XML/TEI tags. Tagging can start from a data header(2), from an existing XML/TEI file(3), or simply from a new file(4).
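To make the idea of tagging "from a new file" concrete, here is a minimal sketch in Java of how plain-text paragraphs could be wrapped in a small XML/TEI document with a header. It is not EditTEI's actual code: the class `TeiSkeleton`, the method `wrap` and the placeholder header sentences are hypothetical, and only the standard Java XML APIs and the public TEI namespace are assumed.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration, not part of EditTEI.
class TeiSkeleton {
    static final String TEI_NS = "http://www.tei-c.org/ns/1.0";

    // Creates a child element in the TEI namespace and attaches it to its parent.
    static Element child(Document doc, Element parent, String name) {
        Element e = doc.createElementNS(TEI_NS, name);
        parent.appendChild(e);
        return e;
    }

    // Wraps a title and plain-text paragraphs in a minimal TEI document.
    static String wrap(String title, String... paragraphs) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element tei = doc.createElementNS(TEI_NS, "TEI");
            doc.appendChild(tei);

            // Header: information about the work, not shown in the text itself.
            Element fileDesc = child(doc, child(doc, tei, "teiHeader"), "fileDesc");
            child(doc, child(doc, fileDesc, "titleStmt"), "title").setTextContent(title);
            // Placeholder sentences (assumptions) for the two other fileDesc parts.
            child(doc, child(doc, fileDesc, "publicationStmt"), "p")
                    .setTextContent("Unpublished draft.");
            child(doc, child(doc, fileDesc, "sourceDesc"), "p")
                    .setTextContent("Transcribed from plain text.");

            // Body: one <p> element per input paragraph; text is escaped on output.
            Element body = child(doc, child(doc, tei, "text"), "body");
            for (String paragraph : paragraphs) {
                child(doc, body, "p").setTextContent(paragraph);
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the TEI document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrap("Sample title", "First paragraph.", "Second paragraph."));
    }
}
```

The user of such a tool only supplies a title and text; the element names are produced by the program, which is the general principle behind tagging without typing the tags.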

The encoder offers commonly used text-editing tools, for example: open, save and print a file; copy, cut and paste text; insert or delete pages; insert special characters.

In addition to the basic text-editing tools, the « EditTEI » business software allows existing XML/TEI tags to be added or deleted and characters to be encoded in ASCII(5) and UTF-8(6), among others. With permission, it may also offer an online correcting dictionary, the « un-tilde-ing » of texts (expanding words that are abbreviated with a tilde) and the concealment of abbreviations on demand.
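As an illustration of what « un-tilde-ing » can mean in practice, the following Java sketch expands a vowel carrying a tilde into the vowel followed by a nasal consonant. It is not EditTEI's algorithm. The class `TildeExpander` and its methods are hypothetical, and the rule used here (insert "m" before m, b or p, otherwise "n") is a simplifying assumption; real expansion depends on language and context. The second method records both forms with the TEI `choice`, `abbr` and `expan` elements, so that a display layer could show or hide the abbreviation on demand.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of EditTEI.
class TildeExpander {
    // A vowel followed by COMBINING TILDE (U+0303), as found after NFD decomposition.
    private static final Pattern TILDE = Pattern.compile("([aeiouAEIOU])\\u0303");

    // Expands every tilde-marked vowel in one word.
    // Assumed rule: "m" before m, b or p; "n" everywhere else.
    static String expandWord(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        Matcher m = TILDE.matcher(decomposed);
        int last = 0;
        while (m.find()) {
            out.append(decomposed, last, m.start()).append(m.group(1));
            char next = m.end() < decomposed.length() ? decomposed.charAt(m.end()) : ' ';
            out.append("mbp".indexOf(Character.toLowerCase(next)) >= 0 ? 'm' : 'n');
            last = m.end();
        }
        out.append(decomposed.substring(last));
        return Normalizer.normalize(out.toString(), Normalizer.Form.NFC);
    }

    // Returns the text with all abbreviations expanded (abbreviations concealed).
    static String expand(String text) {
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            words[i] = expandWord(words[i]);
        }
        return String.join(" ", words);
    }

    // Returns a TEI fragment that keeps both the abbreviated and the expanded form.
    // Assumes the input contains no XML special characters such as & or <.
    static String tagChoices(String text) {
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            String expanded = expandWord(words[i]);
            String unchanged = Normalizer.normalize(words[i], Normalizer.Form.NFC);
            if (!expanded.equals(unchanged)) {
                words[i] = "<choice><abbr>" + words[i] + "</abbr><expan>"
                        + expanded + "</expan></choice>";
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // \u00f5 is o with tilde, \u1ebd is e with tilde.
        System.out.println(expand("c\u00f5me le t\u1ebdps"));
        System.out.println(tagChoices("b\u00f5 jour"));
    }
}
```

Letters that are not vowels, such as the Spanish ñ, are left untouched by this sketch.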

(1) Specialized software developed according to the particular needs of a client.
(2) The set of tagged information about the work. This information is not displayed in the document.
(3) A document in XML/TEI format with or without headers.
(4) A document in text format without headers.
(5) American Standard Code for Information Interchange, a character-encoding scheme based on the ordering of the English alphabet.
(6) 8-bit UCS/Unicode Transformation Format, a variable-length encoding for Unicode.
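
The difference between the two encodings in notes (5) and (6) can be observed with a few lines of Java. This is only an illustrative sketch using the standard `java.nio.charset` API; the class `EncodingDemo` and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of notes (5) and (6).
class EncodingDemo {
    // Number of bytes needed to store the string in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when every character of the string exists in ASCII.
    static boolean isAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // \u00e9 is e with acute accent, \u20ac is the euro sign.
        String[] samples = {"e", "\u00e9", "\u20ac"};
        for (String s : samples) {
            System.out.println(utf8Length(s) + " byte(s) in UTF-8, ASCII: " + isAscii(s));
        }
    }
}
```

An unaccented letter takes one byte in UTF-8, exactly as in ASCII, while accented letters and other symbols take two or more bytes and cannot be represented in ASCII at all.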
