Workshop on Scholarly Databases & Data Integration


August 30 and 31, 2006 (please see the agenda for details)

Meeting Place:

Herman B. Wells Library (map), Indiana University
1320 E. 10th St., Wells Library, Media Showing Room - E 174
Bloomington, IN 47405
Indiana University Campus Map »



Katy Börner

Associate Professor of Information Science, SLIS, Indiana University. Project director of the InfoVis CI and Network Workbench. Co-curator of Places & Spaces.

PR^2 | PPT

Miguel Andrade

Scientist, Molecular Medicine, Ottawa Health Research Institute. Assistant Professor, Departments of Medicine, Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa.
PR^2 | PPT

Stacy Kowalczyk

Associate Director for Projects and Services, Digital Library Program, SLIS Ph.D. student, Indiana University.

Barend Mons

Associate Professor in Biosemantics, Erasmus Medical Center and Leiden University Medical Center, the Netherlands, and Knewco, Inc. (www.knewco.com).
PR^2 | PPT

Erik van Mulligen

Chief Technology Officer, Erasmus Medical Center and Leiden University Medical Center, the Netherlands, and Knewco, Inc. (www.knewco.com).

Marc Weeber

Head of Technical Operations USA, Knewco, Inc. (www.knewco.com).
PR^2 | PPT

Workshop Goals & Agenda:

Read White Paper Draft and send comments to katy@indiana.edu.

In recent years, bibliographic databases have become ever more important in scientific research and science management. These databases are essential to the primary work of science: retrieving the correct literature and finding potential collaborators and competitors. Increasingly, however, data and text mining of these databases has become a science in itself, yielding novel discoveries in a multitude of disciplines. For science management, these databases have become essential for connecting new proposals to existing work and for finding reviewers for these proposals. Additionally, measuring the success of both projects and individuals in science has become increasingly important, and bibliographic databases are the key resource for doing so.

Uniquely identifying authors is currently an unmet challenge in all freely accessible and searchable bibliographic databases (e.g., PubMed, CiteSeer, arXiv, Google Scholar). To our knowledge, the only extensive (non-free) bibliographic database with uniquely identified authors is the American Mathematical Society's Mathematical Reviews Database, which has kept track of individual authors since its inception in 1940 (http://www.ams.org/mr-database/mr-authors.html). While this was originally a manual effort, it is today assisted to a certain degree by computational means.

The recent success of Wikipedia, a community effort to establish an encyclopedia, has led to a series of spin-off projects and proposals that have community annotation of data at their core. WikiAuthors is one such proposal, to collectively annotate and disambiguate science authors. Additionally, there have been several papers about algorithms to uniquely identify authors.

The data integration problem associated with the federation and usage of multiple scholarly databases might best be solved by using existing lists of unique authors, institutions, geolocations, etc., together with a combination of automatic data integration and manual data integration via a wiki-like approach.
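As a rough illustration of combining automatic and manual integration, the following minimal Python sketch groups author records from several databases by a crude name key and then lets community-curated assignments override the automatic grouping. All names here (`AuthorRecord`, `normalize`, `integrate`, the `curated` mapping, and the sample records) are hypothetical and chosen for illustration only; this is not the method of any system presented at the workshop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorRecord:
    source: str     # database the record came from, e.g. "PubMed"
    record_id: str  # identifier of the record within that database
    name: str       # author name as "Last, First"


def normalize(name: str) -> str:
    """Crude automatic matching key: lowercase last name plus first initial."""
    last, _, first = name.partition(",")
    first = first.strip()
    return f"{last.strip().lower()} {first[:1].lower()}".strip()


def integrate(records, curated=None):
    """Cluster records into presumed authors.

    `curated` maps (source, record_id) to a manually assigned author
    identifier (the wiki-like step). A curated assignment always takes
    precedence over the automatic name key.
    """
    curated = curated or {}
    clusters = {}
    for rec in records:
        key = curated.get((rec.source, rec.record_id))
        if key is None:
            key = "auto:" + normalize(rec.name)
        clusters.setdefault(key, []).append(rec)
    return clusters


records = [
    AuthorRecord("PubMed", "1", "Smith, John"),
    AuthorRecord("arXiv", "2", "Smith, J."),
    AuthorRecord("CiteSeer", "3", "Smith, Jane"),
]

# Automatic step alone: all three records share the key "smith j".
automatic = integrate(records)

# A community edit states that the CiteSeer record is a different person.
corrected = integrate(records, curated={("CiteSeer", "3"): "author:jane-smith"})
```

In this sketch the automatic step wrongly merges John and Jane Smith, and a single manual annotation separates them again, which is the kind of division of labor between algorithms and community curation described above.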


Tuesday, August 29th, 2006

8:30pm Meeting of the Workshop Organizers at Tutto Bene.

Wednesday, August 30th, 2006

8:00am Light Breakfast
8:30am Introduction by Participants led by Katy Börner (5 min per person/organization)
10:00am Break
10:15am Introduction by participants (continued)
12:00pm Lunch
12:30pm Discussion of Opportunities and Challenges



Potential Futures of Scholarly Data Acquisition, Management & Utilization
Vision talk by Katy Börner, Indiana University
Vision talk by Barend Mons, Knewco, Inc.


Discussion (Led by Erik van Mulligen)


Dinner at Little Tibet

Thursday, August 31st, 2006

8:00am Light Breakfast
8:30am Introduction to Wikipedia Ideas and Technology (Erik Moeller)
Introduction to Author Name Disambiguation (Neil Smalheiser & Vetle Torvik)

10:00am Break
10:30am Software Demos
- Scholarly Database by Gavin La Rowe and Sumeet Ambre
- WiktionaryZ by Erik Moeller
- CIShell by Bruce Herr
- Scopus by Dannien Sherman
- Network Workbench by Bonnie Huang
- Biomedical Visualizations by Ketan Mane
- Discovery Logic Tools by Mike Pollard
11:30am Discussion of Challenges and Opportunities (Led by Marc Weeber)
12:00pm Lunch
1:00pm Breakout Sessions (Led by Miguel Andrade)
2:00pm Breakout Session Reports: 1-standards, 2-funding, 3-community, 4-technical
2:30pm Break
3:00pm Commitments & Discussion of Next Steps (Led by Barend Mons)
4:00pm Adjourn
6:00pm Indian Dinner at Shanti
8:00pm ExArt & Arthur Murray Dance Studio presents

Friday, September 1st, 2006

9:00am Semantic Tagging by Martijn Schuemie
10:00am Scholarly Database by Gavin La Rowe & Sumeet Ambre
11:00am Cyberinfrastructure Shell & Network Workbench by Bruce Herr and Weixia (Bonnie) Huang
12:00pm Working Lunch and Discussion

Saturday, September 2nd, 2006

9:00am Boat Tour and BBQ at Lake Monroe, Bloomington

Participants Attending:

Erik Möller

Editor of infoAnarchy. Webmaster of The Origins of Peace and Violence and Der Humanist. Lead developer of WiktionaryZ and Wikidata, former Chief Research Officer of the Wikimedia Foundation. Wikipedian since 2001 and one of the developers of the underlying MediaWiki software.
PR^2 | PPT

Bruce Herr

Research Staff, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University

Weixia (Bonnie) Huang

Senior System Architect, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.

Sumeet Ambre

SLIS Master Student, Scholarly Database Developer, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.

Israel I. Lederhendler

Director, Division of Information Services, OER, OD
National Institutes of Health, DHHS
Bethesda, Maryland.


Thom Hickey

Vice-President, Bibliometrics Science-Metrix
Chief Scientist, OCLC Research.
PR^2 | PPT

Gavin La Rowe

Research Assistant, Scholarly Database Team Lead, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.

Neil Smalheiser

Assistant Professor, Department of Psychiatry, University of Illinois at Chicago.

James Pringle

Vice President, Development, Thomson Scientific.

John Burgoon

Master Student, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.

Ketan Mane

Ph.D. Candidate, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.

Martijn Schuemie

Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands.

Lokman I. Meho

Assistant Professor of Library and Information Science, SLIS, Indiana University. Researches citation analysis, digital libraries, and information access.
(will not attend 2nd half of Day 2)

Ron Day

Associate Professor of Library and Information Science, SLIS, Indiana University.
Expert in the history, culture and political economy of information, documentation, communication, knowledge, and digital media. roday@indiana.edu

Jeff Krause

Professor of Chemistry, Quantum Theory Project, University of Florida & NSF Directorate on Theoretical and Computational Chemistry.

Interested But Cannot Attend:

Robert C. Bolander

Communications & Programs Manager, OCLC Research

Mark Claydon-Smith

Head of Information Systems, Engineering and Physical Sciences Research Council, Polaris House. Mark.Claydon-Smith@epsrc.ac.uk

Paul Ginsparg

Professor of Physics, Cornell University. Creator of arXiv.org. ginsparg@cornell.edu

Zachary G. Ives

Assistant Professor of Computer & Information Science, University of Pennsylvania.

Albert Mons

Chief Executive Officer, Knewco, Inc.

André Skupin

Department of Geography, San Diego State University

Alon Levy

Professor of Computer Science, University of Washington

Adam Jackson

Board Member, Knewco, Inc.

Beth Plale

Computer Science Department, Indiana University

Vetle Torvik

Research Assistant Professor, Department of Psychiatry, University of Illinois at Chicago.

Chris Rosin

President, Parity Computing, Inc.
PR^2 | PPT

Alex Soojung-Kim Pang

Ph.D., Research Director and Blogger-in-Chief, Institute for the Future



Bloomington has a number of hotels, but the most convenient is the Indiana Memorial Union on campus. A block of rooms has been held for this workshop. When you register with the hotel, please let them know that you are with the "Database Integration Workshop". All participants are responsible for their own transportation and accommodations.


See the contact page for the Cyberinfrastructure for Network Science Center, http://cns.iu.edu/contact.html, or contact Samantha Hale (sjhale@indiana.edu).


This workshop is sponsored by Knewco, Inc., the Digital Library Program of the Indiana University Libraries, the National Science Foundation under Grant No. CHE-0524661, and a James S. McDonnell Foundation grant in the area "Studying Complex Systems" entitled "Modeling the Structure and Evolution of Scholarly Knowledge".

Thank you to our generous sponsors: