{content=

Workshops

Workshop on Scholarly Databases & Data Integration

Date:

August 30 and 31, 2006 (please see the agenda for details)

Meeting Place:

Herman B. Wells Library (map), Indiana University
1320 E. 10th St., Wells Library, Media Showing Room - E 174
Bloomington, IN 47405
Indiana University Campus Map »

Photos:

Organizers:

Katy Börner

Associate Professor of Information Science, SLIS, Indiana University. Project director of the InfoVis CI and Network Workbench. Co-curator of Places & Spaces.
katy@indiana.edu
PR^2 | PPT

Scientist, Molecular Medicine, Ottawa Health Research Institute Assistant, Professor, Departments of Medicine, Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa.
mandrade@ohri.ca
PR^2 | PPT

Stacy Kowalczyk

Associate Director for Projects and Services, Digital Library Program, SLIS Ph.D. student, Indiana University.
skowalcz@indiana.edu
PPT

Barend Mons

Associate Professor in Biosemantics, Erasmus Medical Center and Leiden University Medical Center, the Netherlands & www.knewco.com Knewco, Inc.
bmons@knewco.com
PR^2 | PPT

Erik van Mulligen

Chief Technology Officer, Erasmus Medical Center and Leiden University, Medical Center, the Netherlands. www.knewco.com
Knewco, Inc.
mulligen@knewco.com
PR^2

Marc Weeber

Head of Technical Operations USA, www.knewco.com Knewco, Inc.
marc@weeber.net
PR^2 | PPT

Workshop Goals & Agenda:

Read White Paper Draft and send comments to katy@indiana.edu.

In recent years, bibliographic databases have become ever more important in scientific research and science management. These databases are essential to the primary work of science - retrieving the correct literature and finding potential collaborators and competitors. But increasingly, data and text mining of these databases have become a science in itself with novel discoveries in a multitude of disciplines. For science management, the use of these databases has become essential in connecting new proposals to what is extant and to find reviewers for these proposals. Additionally, measuring success of both projects and individuals in science has become increasingly important, and bibliographic databases are the key resource.

Uniquely identifying authors is currently an unmet challenge in all freely accessible and searchable bibliographic databases (e.g., PubMed, CiteSeer, arXiv, Google Scholar). To our knowledge, the only extensive (non-free) bibliographic database that has uniquely identified authors is the American Mathematical Society(TM)s Mathematical Reviews Database that has kept track of individual authors since its inception in 1940 (http://www.ams.org/mr- database/mr-authors.html). While this was originally a manual effort it is today assisted by computational means to a certain degree.

The recent success of Wikipedia, a community effort to establish an encyclopedia has lead to a series of spin-off projects and proposals that have community annotation of data at their core. WikiAuthors is such a proposal to collectively annotate and disambiguate science authors. Additionally, there have been several papers about algorithms do uniquely identify authors.

The data integration problem associated with the federation and usage of multiple scholarly databases might be best solved by using existing unique author/institutions/geolocations/etc. lists, and a merge of automatic data integration and manual data integration via a Wiki like approach.

Schedule:

Tuesday, August 29th, 2006

8:30pm

Meeting of the Workshop Organizers at Tutto Bene.

Wednesday, August 30th, 2006

8:00am	Light Breakfast
8:30am	Introduction by Participants lead by Katy Börner (5 min per person/organization)
10:00am	Break
10:15am	Introduction by participants (continued)
12:00pm	Lunch
12:30pm	Discussion of Opportunities and Challenges
2:30pm	Break
3:00pm	Potential Futures of Scholarly Data Acquisition, Management & Utilization Vision talk by Katy Börner, Indiana University Vision talk by Barend Mons, Knewco Co.
4:00pm	Discussion (Lead by Erik Erik van Mulligen)
6:00pm	Dinner at Little Tibet

Thursday, August 31st, 2006

8:00am	Light Breakfast
8:30am	Introduction to Wikipedia Ideas and Technology (Erik Moeller) Introduction to Author Name Disambiguation (Neil Smalheiser & Vetle Torvik)
10:00am	Break
10:30am	Software Demos - Scholarly Database by Gavin LaRowe and Sumeet Ambre - WiktionaryZ by Erik Moeller - CIShell by Bruce Herr - Scopus by Dannien Sherman - Network Workbench by Bonnie Huang - Biomedical Visualizations by Ketan Mane - Discovery Logic Tools by Mike Pollard
11:30am	Discussion of Challenges and Opportunities (Lead by Marc Weeber)
12:00pm	Lunch
1:00pm	Breakout Sessions (Lead by Miguel Andrade)
2:00pm	Breakout Session Reports: 1-standards, 2-funding, 3-community, 4-technical
2:30pm	Break
3:00pm	Committments & Discussion of Next Steps (Lead by Barend Mons)
4:00pm	Adjourn
6:00pm	Indian Dinner at Shanti
8:00pm	ExArt & Arthur Murray Dance Studio presents MARIA DE BUENOS AIRES http://buskirkchumley.org/

Friday, September 1st, 2006

9:00am	Semantic Tagging by Martijn Schuemie
10:00am	Scholarly Database by Gavin La Rowe & Sumeet Ambre
11:00am	Cyberinfrastructure Shell & Network Workbench by Bruce Herr and Weixia (Bonnie) Huang
12:00pm	Working Lunch and Discussion

Saturday, September 2nd, 2006

9:00am

Boat Tour and BBQ at Lake Monroe, Bloomington

Participants Attending:

Erik Möller

Editor of infoAnarchy. Webmaster of The Origins of Peace and Violence and Der Humanist. Lead developer of WiktionaryZ and Wikidata, former Chief Research Officer of the Wikimedia Foundation. Wikipedian since 2001 and one of the developers of the underlying MediaWiki software.
moeller@scireview.de
PR^2 | PPT

Ricardo Pietrobon

Assistant Professor of Surgery
Duke University.
pietr007@gmail.com
PR^2

Mike Pollard

Vice President, Discovery Logic.
mikep@DiscoveryLogic.com
PR^2 | PPT

Bruce Herr

Research Staff, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University
bh2@BH2.NET
PR^2

Weixia (Bonnie) Huang

Senior System Architect, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
huangb@indiana.edu

Sumeet Ambre

SLIS Master Student, Scholarly Database Developer, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
sambre@indiana.edu
PR^2

Israel I. Lederhendler

Director, Division of Information Services, OER, OD
National Institutes of Health, DHHS
Bethesda, Maryland.
lederhei@od.nih.gov
PR^2

Thom Hickey

Vice-President, Bibliometrics Science-Metrix
Chief Scientist, OCLC Research.
hickey@oclc.org
PR^2 | PPT

Damien Sherman

Senior Product Manager, Scopus.
D.Sherman@elsevier.com
PR^2 |PPT

Gavin La Rowe

Research Assistant, Scholarly Database Team Lead, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
glarowe@indiana.edu
PR^2

Neil Smalheiser

Assistant Professor, Department of Psychiatry, University of Illinois at Chicago.
smalheiser@psych.uic.edu
PR^2

James Pringle

Vice President, Development, Thomson Scientific.
james.pringle@thomson.com
PR^2

John Burgoon

Master Student, Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
jburgoon@indiana.edu
PR^2

Ketan Mane

Ph.D. Candidate, Cy berinfrastructure for Network Science Center, School of Library and Information Science, Indiana University.
kmane@indiana.edu
PR^2

Martijn Schuemie

Department of Medical informatics, Erasmus MC, Rotterdam, The Netherlands.
m.schuemie@erasmusmc.nl
PPT

Lokman I. Meh

Assistant Professor of Library and Information Science, SLIS, Indiana University. Researches citation analysis, digital libraries, and information access.
meho@indiana.edu
(will not attend 2nd half of Day 2)
PR^2

Ron Day

Associate Professor of Library and Information Science, SLIS, Indiana University.
Expert in the history, culture and political economy of information, documentation, communication, knowledge, and digital media. roday@indiana.edu

Jeff Krause

Professor of Chemistry, Quantum Theory Project, University of Florida & NSF Directorate on Theoretical and Computational Chemistry.

Interested But Cannot Attend:

Robert C. Bolander

Communications & Programs Manager, OCLC Research
bolander@oclc.org

Mark Claydon-Smith

Head of Information Systems Engineering and Physical Sciences Research Council, Polaris House, Mark.Claydon-Smith@epsrc.ac.uk

AnHai Doan

Professor of Computer Science, University of Illinois, Urbana,
anhai@cs.uiuc.edu

Paul Ginsparg

Professor of Physics, Cornell University, Created arXiv.org, ginsparg@cornell.edu

Timothy Hays

NIH - Division of Resource Development and Analysis (DRDA),
thays@mail.nih.gov

Janice Hicks

NSF Directorate for Mathematical & Physical Sciences - Chemistry
jhicks@nsf.gov

Zachary G. Ives

Assistant Professor of Computer & Information Science,University of Pennsylvania,
zives@cis.upenn.edu

Albert Mons

Chief Executive Officer, Knewco, Inc.
amons@knewco.com

André Skupin

Department of Geography, San Diego State University
skupin@mail.sdsu.edu

Alon Levy

Professor of Computer Science, University of Washington
alon@cs.washington.edu

Adam Jackson

Board Member Knewco, Inc
ajackson@argoglobal.com

Beth Plale

Computer Science Department, Indiana University
plale@indiana.edu

Vetle Torvik

Research Assistant Professor, Department of Psychiatry, University of Illinois at Chicago.
vtorvik@uic.edu
PR^2

Chris Rosin

President, Parity Computing, Inc.,
crosin@paritycomputing.com
PR^2 | PPT

Alex Soojung-Kim Pang

Ph.D., Research Director and Blogger-in-Chief, Institute for the Future
apang@IFTF.org

References

Authors, Authors: Thomson Scientific and Elsevier Scopus Search Them Out, Information Today, Inc. July 24, 2006
Automated Name Authority Control and Enhanced Searching in the Levy Collection, D-Lib Magazine. April 2001.
Thomson Scientific Announces Development of full suite of Authorship Tools, Thomson. June 8, 2006
Schijvenaars, B.J., Mons B., Weeber, M., Schuemie, M.J., van Mulligen E.M.,Wain H.M., Kors J.A. (2005). Thesaurus-based disambiguation of gene symbols . BMC Bioinformatics , 6(1), 149.
Jelier, R., Jenster, G., Dorssers, L.C., van der Eijk, C.C., van Mulligen E.M., Mons, B., Kors J.A. (2005). Co-occurrence based meta-analysis of scientific texts: Retrieving biological relationships between genes. Bioinformatics, 21(9), 2049-2058.
Torvik, V.I., Weeber, M., Swanson, D.R. and Smalheiser, N.R. (2003). A Probabilistic Similarity Metric for Medline Records: A model for Author Name Disambiguation. Journal of the American Society for Information Science and Technology, 56 (2). 140-158.
Han, H., Giles, C.L., Zha, H., Li, C., and Tsioutsiouliklis, K. (2004). Two Supervised Learning Approaches for Name Disambiguation in Author Citations. Proceedings of ACM/IEEE Joint Conference on Digital Libraries (JCDL 2004). 296-305.
Han, H., Zha, H., and Giles, C.L. (2005). Name Disambiguation in Author Citations using a k-way Spectral Clustering Method. Proceedings of the International Conference on Digital Libraries.
Newman, M. (2004). Coauthorship Networks and Patterns of Scientific Collaboration. Proceedings of the National Academy of Sciences, 101. 5200-5205.
Malin, B. (2005). Unsupervised Name Disambiguation via Social Network Similarity. Proceedings of the Workshop on Link Analysis, Counterterrorism, and Security, in conjunction with the SIAM International Conference on Data Mining. Newport Beach, CA. 93-102.
Dellavalle, R.P., Hester, E.J., Heilig, L.F., Drake, A.L., Kuntzman, J.W., Graber, M., Hester, E., Schilling, Lisa. (2003). Going, Going, Gone: Lost Internet References. Science 302. 787-788.
Wrenn, J.D., Grissom, J.E., and Conway, T. (2006). E-mail Decay Rates Among Corresponding Authors in MEDLINE. EMBO Reports, 7 (4). 122-127.
Bennet, D.B., and Williams, P. (2006). Name Authority Challenges for Indexing and Abstracting Databases. Evidence Based Library and Information Practice, 1 (1). 37-57.

Travel/Housing:

Bloomington has a number of hotels, but the most convenient is the Indiana Memorial Union on campus. A block of rooms have been held for this workshop. When you register with the hotel, please let them know that you are with the "Database Integration Workshop". All participants are responsible for their own transportation and accomodations.

Directions:

See the contact page for the Cyberinfrastructure for Network Science Center, http://cns.iu.edu/contact.html or contact Samantha Hale (ude.anaidni@elahjs).

Acknowledgments:

This workshop is sponsored by Knewco, Inc., the Digital Library Program of the Indiana University Libraries, the National Science Foundation under Grant No. CHE-0524661, and a James S. McDonnell Foundation grant in the area Studying Complex Systems entitled Modeling the Structure and Evolution of Scholarly Knowledge