Workshop on Scholarly Databases & Data Integration
August 30 and 31, 2006 (please see the agenda for details)
Workshop Goals & Agenda:
Please read the White Paper Draft and send comments to firstname.lastname@example.org.
In recent years, bibliographic databases have become ever more important in scientific research and science management. These databases are essential to the primary work of science: retrieving the relevant literature and finding potential collaborators and competitors. Increasingly, data and text mining of these databases have also become a science in their own right, yielding novel discoveries in a multitude of disciplines. For science management, these databases have become essential for connecting new proposals to existing work and for finding reviewers for those proposals. Additionally, measuring the success of both projects and individuals in science has become increasingly important, and bibliographic databases are the key resource for doing so.
Uniquely identifying authors is currently an unmet challenge in all freely accessible and searchable bibliographic databases (e.g., PubMed, CiteSeer, arXiv, Google Scholar). To our knowledge, the only extensive (non-free) bibliographic database with uniquely identified authors is the American Mathematical Society's Mathematical Reviews Database, which has kept track of individual authors since its inception in 1940 (http://www.ams.org/mr-database/mr-authors.html). Originally a manual effort, this tracking is today partly assisted by computational means.
The recent success of Wikipedia, a community effort to build an encyclopedia, has led to a series of spin-off projects and proposals that have community annotation of data at their core. WikiAuthors is one such proposal, aiming to collectively annotate and disambiguate science authors. Additionally, several papers have been published on algorithms to uniquely identify authors.
The data integration problem associated with federating and using multiple scholarly databases might best be solved by using existing lists of unique authors, institutions, geolocations, etc., combined with a mix of automatic data integration and manual data integration via a wiki-like approach.
Tuesday, August 29th, 2006
|8:30pm||Meeting of the Workshop Organizers at Tutto Bene.|
Wednesday, August 30th, 2006
|8:30am||Introduction by Participants, led by Katy Börner (5 min per person/organization)|
|10:15am||Introduction by participants (continued)|
|12:30pm||Discussion of Opportunities and Challenges|
Discussion (Led by Erik van Mulligen)
Dinner at Little Tibet
Thursday, August 31st, 2006
|8:30am||Introduction to Wikipedia Ideas and Technology (Erik Moeller)
Introduction to Author Name Disambiguation (Neil Smalheiser & Vetle Torvik)
- Scholarly Database by Gavin LaRowe and Sumeet Ambre
- WiktionaryZ by Erik Moeller
- CIShell by Bruce Herr
- Scopus by Dannien Sherman
- Network Workbench by Bonnie Huang
- Biomedical Visualizations by Ketan Mane
- Discovery Logic Tools by Mike Pollard
|11:30am||Discussion of Challenges and Opportunities (Led by Marc Weeber)|
|1:00pm||Breakout Sessions (Led by Miguel Andrade)|
|2:00pm||Breakout Session Reports: 1-standards, 2-funding, 3-community, 4-technical|
|3:00pm||Commitments & Discussion of Next Steps (Led by Barend Mons)|
|6:00pm||Indian Dinner at Shanti|
|8:00pm||ExArt & Arthur Murray Dance Studio presents
MARIA DE BUENOS AIRES
Friday, September 1st, 2006
|9:00am||Semantic Tagging by Martijn Schuemie|
|10:00am||Scholarly Database by Gavin LaRowe & Sumeet Ambre|
|11:00am||Cyberinfrastructure Shell & Network Workbench by Bruce Herr and Weixia (Bonnie) Huang|
|12:00pm||Working Lunch and Discussion|
Saturday, September 2nd, 2006
|9:00am||Boat Tour and BBQ at Lake Monroe, Bloomington|
Editor of infoAnarchy. Webmaster of The Origins of Peace and Violence and Der Humanist. Lead developer of WiktionaryZ and Wikidata, former Chief Research Officer of the Wikimedia Foundation. Wikipedian since 2001 and one of the developers of the underlying MediaWiki software.
Professor of Chemistry, Quantum Theory Project, University of Florida & NSF Directorate on Theoretical and Computational Chemistry.
Interested But Cannot Attend:
Head of Information Systems, Engineering and Physical Sciences Research Council, Polaris House, Mark.Claydon-Smith@epsrc.ac.uk
Department of Geography, San Diego State University
Professor of Computer Science, University of Washington
Computer Science Department, Indiana University
Alex Soojung-Kim Pang
Ph.D., Research Director and Blogger-in-Chief, Institute for the Future
- Authors, Authors: Thomson Scientific and Elsevier Scopus Search Them Out, Information Today, Inc. July 24, 2006
- Automated Name Authority Control and Enhanced Searching in the Levy Collection, D-Lib Magazine. April 2001.
- Thomson Scientific Announces Development of full suite of Authorship Tools, Thomson. June 8, 2006
- Schijvenaars, B.J., Mons, B., Weeber, M., Schuemie, M.J., van Mulligen, E.M., Wain, H.M., and Kors, J.A. (2005). Thesaurus-based disambiguation of gene symbols. BMC Bioinformatics, 6(1), 149.
- Jelier, R., Jenster, G., Dorssers, L.C., van der Eijk, C.C., van Mulligen, E.M., Mons, B., and Kors, J.A. (2005). Co-occurrence based meta-analysis of scientific texts: Retrieving biological relationships between genes. Bioinformatics, 21(9), 2049-2058.
- Torvik, V.I., Weeber, M., Swanson, D.R., and Smalheiser, N.R. (2005). A Probabilistic Similarity Metric for Medline Records: A Model for Author Name Disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140-158.
- Han, H., Giles, C.L., Zha, H., Li, C., and Tsioutsiouliklis, K. (2004). Two Supervised Learning Approaches for Name Disambiguation in Author Citations. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2004), 296-305.
- Han, H., Zha, H., and Giles, C.L. (2005). Name Disambiguation in Author Citations using a k-way Spectral Clustering Method. Proceedings of the International Conference on Digital Libraries.
- Newman, M. (2004). Coauthorship Networks and Patterns of Scientific Collaboration. Proceedings of the National Academy of Sciences, 101, 5200-5205.
- Malin, B. (2005). Unsupervised Name Disambiguation via Social Network Similarity. Proceedings of the Workshop on Link Analysis, Counterterrorism, and Security, in conjunction with the SIAM International Conference on Data Mining, Newport Beach, CA, 93-102.
- Dellavalle, R.P., Hester, E.J., Heilig, L.F., Drake, A.L., Kuntzman, J.W., Graber, M., and Schilling, L.M. (2003). Going, Going, Gone: Lost Internet References. Science, 302, 787-788.
- Wren, J.D., Grissom, J.E., and Conway, T. (2006). E-mail Decay Rates Among Corresponding Authors in MEDLINE. EMBO Reports, 7(4), 122-127.
- Bennett, D.B., and Williams, P. (2006). Name Authority Challenges for Indexing and Abstracting Databases. Evidence Based Library and Information Practice, 1(1), 37-57.
Bloomington has a number of hotels, but the most convenient is the Indiana Memorial Union on campus. A block of rooms has been held for this workshop. When you register with the hotel, please let them know that you are with the "Database Integration Workshop". All participants are responsible for their own transportation and accommodations.
This workshop is sponsored by Knewco, Inc.; the Digital Library Program of the Indiana University Libraries; the National Science Foundation under Grant No. CHE-0524661; and a James S. McDonnell Foundation grant in the area of Studying Complex Systems, entitled "Modeling the Structure and Evolution of Scholarly Knowledge".