An Integrated Participatory Platform for Human Evaluation of Machine Translation

Jiangping Chen; Olajumoke Azogu; Wenqian Zhao

ABSTRACT

We describe the functions of HeMT, a multilingual participatory platform for Human Evaluation of Machine Translation. HeMT serves three types of users: translators, evaluators, and reviewers. It consists of six major modules: User Management, Manual Translation, User Training, Evaluation, Result Visualization, and Multilingual Lexicon Management. HeMT can be used by the digital library and machine translation communities to conduct manual translation and machine translation evaluation tasks.

Erasmus: An Organization- and User-Centered Dublin Core Metadata Tool

Michael Khoo; Craig MacDonald; Joon Park

ABSTRACT

Digital library interoperability is supported by good quality metadata. The design of metadata creation and management tools is therefore an important component of overall digital library design. A number of factors affect metadata tool usability, including task complexity, interface usability, and organizational context of use. These issues are being addressed in the user-centered design of a metadata tool for the Internet Public Library.

Exploratory Cross Collection Search for Digital Objects

Hui Zhang

ABSTRACT

The idea of exploratory search has received growing attention in the digital library field because of its potential to improve the user experience with the diversified content that is common to many institutions. Furthermore, as digital repositories increasingly serve as the central infrastructure for disseminating and curating digital works, there is a demand for exploratory search applications that provide access to the objects stored in these repositories. In this demo, we present IU Digital Collections Search, a system that provides unified access to Fedora objects across different types and collections through faceted search and browse.
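The faceted search and browse described above can be illustrated with a minimal sketch. It is not the IU Digital Collections Search implementation: the records, the field names (`collection`, `type`), and the title-substring matching are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical records standing in for repository objects; the field names
# ("collection", "type") are illustrative, not the system's actual schema.
OBJECTS = [
    {"id": "obj1", "title": "Campus map", "collection": "Archives", "type": "image"},
    {"id": "obj2", "title": "Campus tour", "collection": "Film", "type": "video"},
    {"id": "obj3", "title": "Campus letters", "collection": "Archives", "type": "text"},
    {"id": "obj4", "title": "Opera rehearsal", "collection": "Music", "type": "video"},
]


def search(objects, query, filters=None):
    """Return objects whose title contains the query and that match every selected facet value."""
    filters = filters or {}
    needle = query.lower()
    return [
        obj for obj in objects
        if needle in obj["title"].lower()
        and all(obj.get(field) == value for field, value in filters.items())
    ]


def facet_counts(results, field):
    """Count how many results fall under each value of one facet field."""
    return Counter(obj[field] for obj in results)


hits = search(OBJECTS, "campus")                      # keyword search across all collections
by_collection = facet_counts(hits, "collection")      # counts shown next to each facet value
narrowed = search(OBJECTS, "campus", {"collection": "Archives"})  # user selects one facet
```

Recomputing the facet counts on each narrowed result set is what lets a user explore across collections and object types without knowing in advance where an item is stored.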

NLM Video Search

John Doyle; Doron Shalvi; Ed Luczak

ABSTRACT

NLM Video Search is a new, unique software tool that offers rapid retrieval of historical public health films created by the U.S. government and in the public domain. NLM Video Search solves the challenging task of accurately searching digital videos and embedded video transcripts. In addition to offering a full-text search of a film’s Section 508-compliant transcript, the tool graphically displays where the search word or phrase occurs within the timeline of a film. The timeline results are clickable for viewing selected video content. Digitized and coded using the H.264 standard to produce quality video in a small file, the video files are delivered progressively over HTTP, allowing the user to rapidly browse within the film without latency.
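As a rough illustration of how transcript hits could be placed on a film's timeline, the sketch below searches a timed transcript and converts each hit to a fractional position. The transcript data, the `(start_seconds, text)` layout, and both function names are assumptions; the abstract does not describe the tool's internal data model.

```python
# Hypothetical timed transcript: (start_seconds, text) pairs.
TRANSCRIPT = [
    (0.0, "This film describes malaria control."),
    (12.5, "Mosquito breeding sites are drained."),
    (30.0, "Malaria cases fell after spraying."),
]


def find_hits(transcript, phrase):
    """Return the start times of segments containing the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [start for start, text in transcript if needle in text.lower()]


def timeline_positions(hit_times, duration):
    """Map hit times to fractions of the film length (0.0 to 1.0) for marker placement."""
    return [t / duration for t in hit_times]


hits = find_hits(TRANSCRIPT, "malaria")
markers = timeline_positions(hits, duration=60.0)
```

A player could draw one clickable marker per fraction and seek to the matching start time when the marker is selected.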

NLM Video Search is based on a combination of open-source and inexpensive commercial multimedia tools augmented with speech recognition technology. It is easily integrated and customizable to fit any digital content, repository or web application.

For more information visit: http://collections.nlm.nih.gov

Research Discovery through Linked Open Data

Paul Albert; Kristi Holmes; Katy Borner; Mike Conlon

ABSTRACT

VIVO is an open source semantic web platform that enables the discovery of research and scholarship across disciplinary and administrative boundaries through interlinked profiles of people and other data relevant to researchers. VIVO allows researchers to highlight areas of expertise, display academic credentials, visualize academic and social networks, and display information such as publications, grants, teaching, service, awards, and more. Profile information can be programmatically imported from authoritative data sources such as institutional sources as well as bibliographic and grants databases. By storing and exposing data as RDF and using standard ontologies, the information in VIVO can either be displayed in a human-readable web page or delivered to other systems as RDF, allowing researcher metadata in VIVO to be harvested, aggregated, and integrated into the Linked Open Data cloud.
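The idea of serving one profile either as a human-readable page or as RDF can be sketched as follows. This is not VIVO code: the subject URI is a made-up example, and the sketch uses only the FOAF vocabulary and hand-written N-Triples output rather than the ontologies and tooling VIVO actually employs.

```python
from html import escape

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape characters that may not appear raw inside an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject_uri, name):
    """Serialize a person profile as two N-Triples statements."""
    lines = [
        f"<{subject_uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f'<{subject_uri}> <{FOAF}name> "{escape_literal(name)}" .',
    ]
    return "\n".join(lines) + "\n"


def to_html(name):
    """Render the same profile as a minimal human-readable fragment."""
    return f"<h1>{escape(name)}</h1>"


# Hypothetical profile URI, used only for illustration.
profile_uri = "http://example.org/individual/n123"
rdf_view = to_ntriples(profile_uri, "Jane Doe")
html_view = to_html("Jane Doe")
```

Because both views derive from the same underlying record, a harvester can request the RDF form while a browser receives the page.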

Structured Audio Content Analysis and Metadata in a Digital Library

David Bainbridge; J. Stephen Downie; Andreas Ehmann

ABSTRACT

This work illustrates how audio content analysis of music and manually assigned structural temporal metadata can be used to form a digital library designed for musicological exploration. In addition to text-based searching and browsing, the document view is enriched with an interactive structured audio timeline that shows ground-truth data representing the logical segments of the song, alongside an automatically generated version for comparison. An interactive self-similarity “heat” map is also displayed. Clicking within the map at a coordinate (x, y) plays the audio simultaneously at time offsets x and y, panned left and right respectively, to make it easier for the listener to separate out the differences. The musicologist can also initiate an audio content based query starting at any point in the song. This produces a ranked result set whose items can be further studied through their respective document views. Alternatively, the musicologist can perform a musical structure search (for example, for songs that contain the structure b, b, c, b, c).
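Two of the ideas above, the self-similarity map and the musical structure search, can be sketched in a few lines. The abstract does not say which audio features or matching rules the system uses, so this sketch assumes cosine similarity between per-frame feature vectors and a literal, contiguous match on segment labels; all names and data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def self_similarity(frames):
    """Build the square matrix whose cell (x, y) compares frame x with frame y."""
    return [[cosine(u, v) for v in frames] for u in frames]


def contains_structure(segments, pattern):
    """Return True if the segment label sequence contains the pattern contiguously."""
    n, m = len(segments), len(pattern)
    return any(segments[i:i + m] == pattern for i in range(n - m + 1))


# Toy feature vectors: frames 0 and 2 are identical, frame 1 is orthogonal to both.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
ssm = self_similarity(frames)

# Toy segment labels for one song, and the example structure from the text.
song = ["a", "b", "b", "c", "b", "c", "a"]
found = contains_structure(song, ["b", "b", "c", "b", "c"])
```

Bright off-diagonal cells in such a matrix mark passages that resemble each other, which is why clicking at (x, y) to hear both offsets together is a natural way to audit the map.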

The Profiles in Science Digital Library: Behind the Scenes

Marie E. Gallagher; Christie Moffatt

ABSTRACT

This demonstration shows the Profiles in Science® digital library. Profiles in Science contains digitized selections from the personal manuscript collections of prominent biomedical researchers, medical practitioners, and those fostering science and health. The Profiles in Science Web site is the delivery mechanism for content derived from the digital library system. The system is designed according to our basic principles for digital library development. The digital library includes the rules and software used for digitizing items, creating and editing database records and performing quality control as well as serving the digital content to the public. Among the types of data managed by the digital library are detailed item-level, collection-level and cross-collection metadata, digitized photographs, papers, audio clips, movies, born-digital electronic files, optical character recognized (OCR) text, and annotations (see Figure 1). The digital library also tracks the status of each item, including digitization quality, sensitivity of content, and copyright. Only items satisfying all required criteria are released to the public through the World Wide Web. External factors have influenced all aspects of the digital library's infrastructure.

The ResultsSpace Collaborative Search Environment

Robert Capra; Jaime Arguello; Annie Chen; Katie Hawthorne; Gary Marchionini; Lee Shaw

ABSTRACT

The ResultsSpace Collaborative Search Environment was developed by our research team to support asynchronous collaborative information retrieval among a small group of collaborators. The tool is designed to promote awareness of collaborators’ searches and the documents they have rated. Collaborative awareness is supported through several mechanisms: an area that shows a history of queries from collaborators, a summary display of collaborators’ ratings next to each search result, and changes in the visual salience of search results based on their aggregate rating from all collaborators. Faceted controls allow users to filter results based on specific ratings (relevant, not relevant, and maybe) and on specific collaborator(s) who have rated an item. In this demonstration, we will describe the features of the system, explain how they are implemented, and give insights into the design rationale.
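The rating-based mechanisms above can be sketched roughly as follows. The abstract does not state how the aggregate rating is computed, so the numeric scores and the mean used here are assumptions, as are the collaborator names and data layout.

```python
# Assumed numeric scores for the three rating labels named in the text.
RATING_SCORES = {"relevant": 1.0, "maybe": 0.5, "not relevant": 0.0}

# Hypothetical ratings: document id -> {collaborator: rating label}.
RATINGS = {
    "doc1": {"ann": "relevant", "bob": "relevant"},
    "doc2": {"ann": "maybe", "bob": "not relevant"},
    "doc3": {"bob": "relevant"},
}


def aggregate(doc_ratings):
    """Mean score over the collaborators who rated the item; None if nobody has."""
    if not doc_ratings:
        return None
    total = sum(RATING_SCORES[label] for label in doc_ratings.values())
    return total / len(doc_ratings)


def filter_docs(ratings, rating=None, collaborator=None):
    """Keep documents with at least one rating that matches the selected facet values."""
    kept = []
    for doc, by_user in ratings.items():
        if any(
            (rating is None or label == rating)
            and (collaborator is None or user == collaborator)
            for user, label in by_user.items()
        ):
            kept.append(doc)
    return kept


salience = {doc: aggregate(by_user) for doc, by_user in RATINGS.items()}
relevant_docs = filter_docs(RATINGS, rating="relevant")
relevant_by_ann = filter_docs(RATINGS, rating="relevant", collaborator="ann")
```

An interface could map the aggregate value to opacity or font weight, so that highly rated results stand out and low-rated ones recede.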

WARCreate - Create Wayback-Consumable WARC Files from Any Webpage

Mat Kelly; Michele C. Weigle

ABSTRACT

The Internet Archive's Wayback Machine is the most common way that typical users interact with web archives. The Internet Archive uses the Heritrix web crawler to transform pages on the publicly available web into Web ARChive (WARC) files, which can then be accessed using the Wayback Machine. Because Heritrix can only access the publicly available web, many personal pages (e.g., password-protected pages, social media pages) cannot be easily archived into the standard WARC format. We have created a Google Chrome extension, WARCreate, that allows a user to create a WARC file from any webpage. Using this tool, content that might have been otherwise lost in time can be archived in a standard format by any user.
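To make the target format concrete, the sketch below assembles a single WARC/1.0 `response` record from a captured HTTP response. It is not WARCreate's code (WARCreate is a Chrome extension); the function name and sample payload are hypothetical, and a complete archive would normally also begin with a `warcinfo` record, which is omitted here.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri, http_payload, when=None):
    """Build one WARC/1.0 'response' record (header, block, record terminator) as bytes."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",  # length of the block in bytes
    ]
    # Header lines end with CRLF; a blank line separates the header from the block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs after the block.
    return head + http_payload + b"\r\n\r\n"


# Hypothetical captured HTTP response (status line, headers, body).
payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
record = warc_response_record(
    "http://example.com/",
    payload,
    when=datetime(2012, 6, 10, 12, 0, 0, tzinfo=timezone.utc),
)
```

Appending such records to a file yields an archive in the same standard format that crawler-produced WARC files use, which is the compatibility the abstract emphasizes.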