Past CNS Talks
Avalanche Dynamics of Online Popularity
Abstract: Traditionally, information and opinions were filtered and amplified by two classes of trusted intermediaries: institutional media and our social networks of friends and family. The advent of social media is disrupting these mechanisms by fostering Web-mediated brokers such as blogs, wikis, folksonomies, and search engines, through which anyone can easily publish and promote content online. This "second age of information" is driven more than ever before by the economy of attention. Popularity (the accumulation of attention) is its measure of success; popular sources have formidable power to impact opinions, culture, and policy, as well as to profit through online advertising. Yet the dynamical processes that drive popularity in our online world are still unclear and largely unexplored. Here we provide for the first time a quantitative, large-scale, longitudinal analysis of the dynamics of different popularity measures for online content. We analyze the evolution of two massive model systems, Wikipedia and an entire country's Web space, finding that the temporal and magnitude behaviors of popularity follow statistical laws typical of critical avalanche processes, such as earthquakes and depinning phenomena. Such statistical features hold across measures, systems, and their histories. To make sense of these empirical results, we offer a model that mimics, with a simple random mechanism, the exogenous shifts of user attention and the ensuing non-linear perturbations in the popularity ranking of online resources. Remarkably, this stylized model recovers the key features observed in the empirical analysis of the two model systems analyzed here.
Joint work with Jacob Ratkiewicz, Santo Fortunato, Alessandro Flammini, and Alessandro Vespignani.
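The abstract does not spell out the ranking model, so the sketch below is purely illustrative (the function name, the Pareto boost, and all parameters are my own assumptions, not the authors' actual mechanism). It perturbs a random popularity vector with heavy-tailed attention shifts and records how many items change rank at each step; the distribution of these "avalanche sizes" is the kind of quantity one would inspect for the critical behavior the talk describes.

```python
import random

def simulate_rank_avalanches(n_items=500, n_steps=2000, seed=42):
    """At each step one random item receives a heavy-tailed attention
    boost; the avalanche size is how many items change rank."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(n_items)]

    def ranking(values):
        # item indices sorted by popularity, most popular first
        return sorted(range(len(values)), key=lambda i: -values[i])

    ranks = ranking(pop)
    sizes = []
    for _ in range(n_steps):
        i = rng.randrange(n_items)
        pop[i] += rng.paretovariate(1.5)  # exogenous attention shift
        new_ranks = ranking(pop)
        sizes.append(sum(1 for a, b in zip(ranks, new_ranks) if a != b))
        ranks = new_ranks
    return sizes

sizes = simulate_rank_avalanches()
```

A single boost can leave the ranking untouched or reshuffle a long stretch of it, which is why the resulting size distribution is broad rather than narrowly peaked.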
An Economic Model of Friendship: Homophily, Minorities and Segregation
Exposing Social Interactions with Active RFID
Abstract: The availability of networked wearable devices is providing new ways to expose mobility and interaction patterns of individuals. In this talk we discuss how the OpenBeacon active RFID platform (http://www.openbeacon.org) was used to create a distributed system that achieves reliable detection of face-to-face interactions between individuals. We provide some details on the hardware platform, on the contact detection strategy, as well as on the real-time visualization of the contact networks we measure. We subsequently report on recent experiments involving 50-100 people at conference gatherings. We discuss the longitudinal analysis of the contact network and show several striking regularities of social contact that emerge from our data. We close by pointing to directions for future research and illustrating a few upcoming large-scale experiments.
See also http://www.sociopatterns.org/.
Illuminating the Fine Print: Visualizing Medication Side-Effects in Complex Multi-drug Regimens
Abstract: Prescription drug use has increased markedly over the past several decades, with nearly half of Americans over 65 taking at least five medications daily. As these numbers grow, physicians are faced with the increasingly complex task of recognizing and addressing any adverse reactions associated with these treatments. Although information on drug side-effects is readily available, its sheer volume can be daunting: The average drug label contains over 75 potential reactions, and many drugs report well over two hundred. In this talk, I will discuss the development of an electronic tool which synthesizes adverse reaction data and provides doctors with clinically useful visualizations at the point of care. We will cover many of the challenges in creating such a system, including integration of quantitative and qualitative data, evaluation and iteration of the visualization approach, and actual implementation into physician workflow.
Faculty, Staff, and Students
Abstract: Open your laptops and demo your software. Bring posters to introduce your research questions and results. So far, the following posters and demos are planned:
—Longitudinal Analysis of Mobility within the American Legal Academy, 1922 to 1989: Visualizations, Network Dynamics, Trends, and Emergent Hierarchies by Peter A. Hook
—Preserving e-Science - Perceptions from the Lab Floor by Stacy Kowalczyk
—Stencil: A Declarative, Generative System for Visualizing Streaming Data by Joseph Cottam and Andrew Lumsdaine
—Effects of Social Organization on Information Transfer in a Zebrafish Group by Cuau Vital & Emilia P. Martins
—TBD by Margaret Clements
See also IV/CNS Open House web site at http://ella.slis.indiana.edu/~katy/gallery/08-openhouse/
InPhOrmed Philosophy: Combining Text Mining and Expert Judgments
Abstract: This talk describes ways of managing the varying expertise of people who supply input to the Indiana Philosophy Ontology (InPhO). Although we exploit some features of Web 2.0, our world is not flat: there are different communities with different levels of expertise. It is very important for an academic project on the Web to protect its expert assets. One has to ensure that expert-generated content cannot be corrupted and that the resulting product is one in which the experts will take pride and feel invested. Simultaneously, it should be possible to use these expert assets to ground more speculative or experimental applications of the data. Public participation can also be used in various ways to help leverage the expert assets. The way to manage this is to keep stratified data, tracking who is who, where they are coming from, and what kind of reliability they have on various topics. Then, one can use software to find structure in the data, and use that structure to collect feedback. This feedback can be used to generate yet more structure. In this iterative fashion we hope to realize the full potential that Web 2.0 holds for scholarly disciplines such as philosophy.
Multiple-Scale Visualization and Modeling of Biological Networks/Pathways
Abstract: Tools for mining and visualizing cell systems have moved beyond static pictures of networks and links, and now capture functional hierarchies and adaptive networks. Integrative frameworks play a critical role in meeting these challenges. The concept of multiple information scales—for example, the protein molecules that form a complex, the participation of that complex in a pathway, the emergence of phenotype from this pathway, and so on—is central to formulating a global view of network dynamics. Here we present a new graph structure, the metagraph, which is able to integrate the context (temporal or modular activity) and the hierarchical organization of cellular agents (molecules and complexes with associated properties and states) in addition to their interactions, with increased performance and network readability. The features of this new type of graph, as well as its applications and implementation in VisANT, will be discussed in detail. VisANT is freely available at http://visant.bu.edu.
See also his paper entitled Towards zoomable multidimensional maps of the cell.
Semantic Web Application: Music Retrieval
Abstract: The vision of the Semantic Web is to lift the current Web into semantic repositories where heterogeneous data can be queried and different services can be mashed up. The Web becomes a platform for integrating data and services. Ontology, or agreed consensus, is the key issue in achieving this. In the cultural heritage area especially, cross-media and cross-archival retrieval have become the rallying cry. The EASAIER project (European Union funded) aims to enable enhanced access to sound archives by providing multiple methods of retrieval, integration with other media archives, and content enrichment. In this talk, I will share with you the development of this project.
Topic Mapping Tools for Biomedical Corpora
Dave Newman, Gully Burns, and Bruce Herr II
Abstract: Biomedical research is supported by several medium-scale corpora that provide a uniform interface to biomedical documents across many subdisciplines. These include (A) Medline (providing access to citations and abstracts for almost all papers published from the 1950s to the present); (B) CRISP (providing access to abstracts from NIH-funded proposals); (C) abstracts from specific large-scale conferences (such as the Society for Neuroscience); and (D) full-text collections (such as ScienceDirect from Elsevier). This presentation describes an approach to the analysis of the text within these corpora to provide navigable web-based maps of these document collections that are intuitive and easy to navigate for end-users (administrators, biologists, and doctors). The underlying analysis is based on Latent Dirichlet Allocation (Topic Modeling), which scales well to collections of millions to billions of documents. We use large-scale graph visualization techniques to build a map of the documents within a corpus using a scalable force-directed layout algorithm. This map then forms the basis of a Google Maps user interface that has additional web support to describe individual documents within the collection. This work (currently supported by NIH) will create tools that allow scientists to evaluate grants from the CRISP database.
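For readers unfamiliar with Latent Dirichlet Allocation, the following is a minimal collapsed Gibbs sampler on a toy corpus; it is an illustration of the general technique, not the presenters' pipeline, and the corpus, hyperparameters, and function name are invented for the example. Production-scale corpora of the sizes mentioned above require optimized, distributed implementations.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustration only)."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # topic totals
    z = []                                             # token assignments
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)  # random initial topic
            zs.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove this token's current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # resample from the conditional topic distribution
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z, ndk

docs = [["gene", "protein", "cell", "gene"],
        ["neuron", "brain", "cortex", "neuron"],
        ["protein", "cell", "gene"],
        ["brain", "neuron", "synapse"]]
z, ndk = lda_gibbs(docs)
```

The per-document topic counts (`ndk`) are what a document-map interface would project into two dimensions: documents with similar topic mixtures land near one another.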
Bibliometric Visualizations and Relevance Theory
Abstract: This session will introduce my recent work with bibliometric data and a version of the tf*idf formula from information retrieval to produce pennant diagrams. These diagrams can be interpreted as confirmations of Dan Sperber & Deirdre Wilson's relevance theory from linguistic pragmatics, which Stephen Harter of Indiana University advocated for information science in 1992. The diagrams can also be interpreted as a complex cognitive model of users of document retrieval systems. My examples will include several pennants chosen to be of interest to members of IU's School of Library and Information Science.
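The talk's pennant diagrams use a variant of tf*idf; as background only, here is the plain textbook form of the formula (the talk's exact weighting scheme is not specified in the abstract, and the toy corpus below is invented):

```python
import math

def tf_idf(term, doc, corpus):
    """Plain tf*idf: term frequency within `doc` times the log inverse
    document frequency across `corpus` (documents are token lists)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)  # docs containing the term
    return tf * (math.log(len(corpus) / df) if df else 0.0)

corpus = [["citation", "map", "citation", "pennant"],
          ["map", "network"],
          ["pennant", "citation", "theory"]]
score = tf_idf("pennant", corpus[0], corpus)
```

In pennant diagrams the two tf*idf factors are plotted as separate axes rather than multiplied, which is what gives the plots their characteristic pennant shape.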
Integrating Scientific Knowledge Databases to Inform Policy Decision Making
Abstract: To fulfill its public health mission, the National Institutes of Health (NIH) needs to develop a process to correlate the research it funds with public health outcomes. However, NIH staff have few tools for evaluating research project outcomes that can be applied across a portfolio of awarded grants. Various individual outcome measures are available through private and public databases, but calculating a broad set of indicators for a portfolio of projects requires a tedious and time-consuming manual effort that is rarely applied. We set out with the goal of developing an electronic scientific portfolio assistant to provide quantitative information for program officers managing research portfolios. We found that it was possible to meet this goal by assembling diverse data sources into a single infrastructure and applying modern business intelligence reporting tools to visualize outcome indicators.
The Semiology of Graphics - Take 2
Abstract: The famous cartographer Jacques Bertin wrote a classic book, "Semiology of Graphics," in 1967. In this book, he analyzed many different types of charts, network diagrams, and maps, and then developed a systematic description of how information is coded in these visual representations. His goal was to describe pictures in terms of the conventions used to depict the information, not in terms of low-level graphics primitives. In this talk I will review Bertin's ideas, and then describe several recent attempts to formally specify information graphics using computers. The formal approach leads to a language of pictures. We have used visual languages to build two major visualization systems, Polaris and Tableau. Formally describing pictures leads to new capabilities, including easy integration with database query languages such as SQL, the ability to describe statistical linear models, and new methods for automatically creating graphical presentations best suited to the data.
Visualization and Analysis of Biological Interaction Networks
Abstract: Cytoscape is an open-source, cross-platform network visualization and analysis application. Cytoscape has its roots in Systems Biology and is therefore well suited for analyzing data from high-throughput experimentation as well as other molecular state information. The central organizing metaphor of Cytoscape is a network (graph), with genes, proteins, and molecules represented as nodes and interactions represented as edges between nodes. The Cytoscape application acts as an extensible framework by providing core functionality to handle common tasks and software interfaces that allow easy extension to support unique needs. The core functionality includes the visualization, layout, and manipulation of networks in addition to data handling services needed for importing, exporting, and managing network data. Cytoscape's raison d'être is its ability to integrate data and map it onto visual attributes of the networks. This functionality allows for rich visualizations that can provide insight into otherwise complicated data. In addition to the core functionality we have an ever-growing library of plugins that extend and enhance Cytoscape's abilities. Cytoscape is open source (free) and is a collaborative effort of the University of California San Diego, the Institute for Systems Biology, Memorial Sloan-Kettering Cancer Center, Agilent Technologies, Unilever, the Institut Pasteur, University of California San Francisco, and the University of Toronto. See http://cytoscape.org for downloads and more detail.
Representational Redescription: From collective computation in cellular automata, to understanding the conceptual properties of gene regulatory networks
Manuel Marques Pita
Abstract: In this seminar, I will introduce Aitana, a cognitively-inspired system that "redescribes" models of complex systems (represented in their implicit form, as state-transition tables). The main use of such redescriptions is to "uncover" the conceptual properties of these models, which reveal knowledge about them that is not accessible on the implicit representational level. The aim of this exploration is to support new ways of conceptualising the phenomenon of emergence, the main characterising feature of complex systems in general. Here, I will focus on exemplar cellular automata (CA) that perform the density classification task (as defined by Mitchell et al., 1994). Conceptual representations of the best known rules for this task will be presented, and I will show how the resulting abstractions can be considered suitable for the formation of "conceptual spaces", wherein rules that perform similar computations are positioned in close proximity. I will end this seminar by discussing, and demonstrating, how Aitana could be used to analyse random Boolean networks, a common architecture used to model gene regulatory networks (GRNs).
The Integration of GIS and Agent-Based Modeling
Abstract: The parallel growing interest in Agent-Based Modeling (ABM), on one hand, and Geographic Information Systems (GIS), on the other, calls for platforms that integrate them both. Current GIS and ABM software do not support each other in a seamless manner, and an integrated platform is needed. In this talk, I introduce Agent Analyst, an extension of ArcGIS, the popular GIS software, which supports ABM. Agent Analyst fully integrates ABM with GIS, and extends the functionalities of the open-source Repast modeling and simulation environment with the spatial capabilities of ArcGIS. Through this integration, GIS experts gain the ability to model behaviors and processes as change and movement over time (e.g., simulate land use and land cover changes, predator-prey interactions, or network flows and congestion), while ABM modelers are able to incorporate detailed real-world environmental data, perform complex spatial analyses, and study how behavior is constrained by space and geography. Furthermore, ABM models can include real-time GIS data feeds for situations such as disaster management, firefighting, or resource management. To illustrate these ideas, I present a few models developed in Agent Analyst.
Citation Counting, Citation Ranking, and h-Index of Human-Computer Interaction Researchers: A Comparison between Scopus and Web of Science
Abstract: This study examines the differences between Scopus and Web of Science in the citation counting, citation ranking, and h-index of 22 top human-computer interaction (HCI) researchers from EQUATOR, a large British Interdisciplinary Research Collaboration project. Results indicate that Scopus provides significantly more coverage of HCI literature than Web of Science, primarily due to coverage of relevant ACM and IEEE peer-reviewed conference proceedings. No significant differences exist between the two databases if citations in journals only are compared. Although broader coverage of the literature does not significantly alter the relative citation ranking of individual researchers, Scopus helps distinguish between the researchers in a more nuanced fashion than Web of Science in both citation counting and h-index. Scopus also generates significantly different maps of citation networks of individual scholars than those generated by Web of Science. The study also presents a comparison of h-index scores based on Google Scholar with those based on the union of Scopus and Web of Science. The study concludes that Scopus can be used as a sole data source for citation-based research and evaluation in HCI, especially when citations in conference proceedings are sought, and that h scores should be manually calculated instead of relying on system calculations. The full version of the paper is available at: http://www.slis.indiana.edu/faculty/meho/meho-rogers.pdf.
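Manually calculating h scores, as the study recommends, is straightforward; the sketch below shows the standard definition. The union-by-maximum shortcut is my own crude proxy for combining Scopus and Web of Science counts, since a true union deduplicates the citing documents themselves rather than the counts.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def union_counts(scopus, wos):
    """Crude per-paper proxy for the Scopus+WoS union: take the larger
    of the two counts (assumes both lists are aligned by paper)."""
    return [max(s, w) for s, w in zip(scopus, wos)]

merged = union_counts([12, 7, 3, 1], [9, 8, 4, 0])  # one count per paper
```

Because each database misses some citing documents, the merged h score can exceed the score from either database alone, which is exactly why the study cautions against relying on any single system's built-in calculation.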
Soul Mate or Chance Fate: Success Rates of Speed Dates
Peter Todd and Thomas T. Hills
Abstract: Theories of mate choice have suggested that human mate selection may be driven by a variety of factors, ranging from finding similar quality partners to finding partners who share similar preferences. Within a more restricted mate search context, such as speed-dating, it may be that these factors do not come into play. In fact, it may be that speedy human mate choice is a largely random process, governed by the simple laws of probability. To find out, we analyzed over 100 speed-dating sessions to see whether the matches that are produced, when both a man and a woman indicate interest in each other, occur any more often than we would expect if choices were made at random. We also looked at whether the set of people that any given individual was interested in had anything in common with the interest-sets of other individuals, or whether the range of choices also appeared random at an individual level. We will discuss the outcome of these analyses and what they imply for human behavior in this domain.
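The null hypothesis described above, that speed-dating matches arise from independent random choices, can be made concrete with a simple binomial baseline. This is an illustrative sketch, not the authors' actual analysis: the function names are invented, and it assumes each side says yes independently with fixed rates p and q.

```python
import math

def expected_random_matches(n_dates, p_yes_a, p_yes_b):
    """Under independent coin-flip choices, each date is a match with
    probability p*q, so the match count is Binomial(n_dates, p*q)."""
    return n_dates * p_yes_a * p_yes_b

def p_at_least(observed, n_dates, p_yes_a, p_yes_b):
    """Probability of seeing >= `observed` matches under that null."""
    r = p_yes_a * p_yes_b
    return sum(math.comb(n_dates, k) * r**k * (1 - r)**(n_dates - k)
               for k in range(observed, n_dates + 1))
```

If the observed number of mutual matches in a session is no more surprising than this baseline predicts, the data are consistent with choice being effectively random at the aggregate level.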
On the origin of novelty and diversity in horned beetles
Abstract: Evolutionary biology offers several frameworks for understanding how complex traits such as legs, eyes or wings may be modified over evolutionary time. However, we know remarkably little about how such traits might originate in the first place. What are the genetic, developmental, and ecological mechanisms, and the interactions between them, that mediate not just the modification of existing traits, but the origin of novel traits and trait diversity? In my talk I explore the genetic, developmental, and ecological underpinnings of a class of traits that is both novel and highly diverse: beetle horns. Several thousand species of beetles express horns, and dramatic variation in size, location, shape, and number of horns exists both within and between species. Most importantly, beetle horns lack obvious homology to other insect traits; instead, they can be viewed as a novel feature that horned beetles invented during their evolutionary history, and which has since undergone one of the most dramatic trait radiations in the animal kingdom. Using a combination of morphological, developmental, genetic and genomic studies, I explore the evolutionary origins of horns, as well as the mechanisms that mediated the subsequent diversification of horn expression across species.
Domain-Specific Search and the Encyclopedic Internet Vision
Abstract: As a learning tool, the Internet holds the promise of instant access to all knowledge everywhere. In practice, however, this is far from coming about, in many ways for political and economic reasons, as outlined in the debates about open access publication and how money should be circulated in the commerce of ideas. There are, however, technological issues that must also be solved. These include developing tools to organize massive data collections in ways that are meaningful to users and fair to the content of the resources themselves, to determine the quality and reliability of these resources, and to provide a workable interface so that users can quickly find what they need, even in cases where they are unaware that the needed resources exist. Complicating matters, Internet resources may change quickly, appear in parallel without detection, or disappear altogether, making human intervention in this process ineffective; hence the need for automated procedures of emergent quality control and organization. In this talk, I will address a two-pronged approach to these problems that involves, on the one hand, the design of an appropriate search space and, on the other, mechanisms for formulating meaningful search requests across this space. Finally, I will acknowledge the scope and importance of the 'interface problem' indicated above without offering a solution for it. For examples, I will draw on two projects in particular, Noesis: Philosophical Research Online and the Indiana Philosophy Ontology Project.
Abstract: The Network for Computational Nanotechnology (NCN) was established in 2002 by the NSF with a mission to create, deploy, and operate a national resource for theory, modeling, and simulation in nanotechnology, to connect users in research, education, design, and manufacturing. Nanotechnology is a broad field, so the NCN has focused its efforts on developing materials for a few focus areas: nanoelectronics, nanoelectromechanical systems, and nanomedicine. Users access these resources from the nanoHUB.org web site. In the 12-month period from October 2006 to September 2007, more than 26,000 users accessed nanoHUB to view a collection of seminars, tutorials, animations, publications, and simulation tools submitted by more than 390 contributors from all over the world. But the nanoHUB is more than just a repository. It is a place where researchers and educators can meet and accomplish real work. The nanoHUB offers integrated, online web meetings via Macromedia Breeze, source code collaboration through its nanoFORGE area, events calendars, and many other services designed to connect researchers and build a community. Most importantly, the nanoHUB connects users to the simulation tools they need for research and education. Users can access more than 50 interactive, graphical tools, and not only launch jobs, but also visualize and analyze the results, all via an ordinary web browser. In the same 12-month period mentioned earlier, more than 5,900 users performed over 226,000 online simulations. The NCN's emphasis on usability has produced a clean interface that makes it easy to use powerful research tools. Although simulation codes can be accessed through a web browser, they are executed on state-of-the-art computational facilities. The nanoHUB has partnered with the TeraGrid and the Open Science Grid to deliver the computational cycles needed by the growing community of nanoHUB users.
The nanoHUB middleware hides much of the complexity of Grid computing, handling authentication, authorization, file transfer, and visualization, and letting the researcher focus on research. This approach also helps educators bring these tools to the classroom, letting them avoid the complexities of Grid computing and focus instead on physics. This talk will start with a live demonstration of nanoHUB and show how it can be used to support collaborative research and educational activities for nanotechnology development.
The Social Dynamics of Online Networks
Abstract: Social scientists routinely collect stores of individual-level data, using surveys and records kept by governments and employers. These data are then aggregated across groups of varying size, from households to nation states. In comparison, we have very limited data about the interactions between people. Social interactions are fleeting and mostly private, making them hard to capture and arduous to hand-code and record. These problems are compounded by the need for repeated observations and by the exponential increase in the number of relations as group size increases. As a result, social dynamics are not systematically documented at the relational level, except in observational and ethnographic studies of small groups. All this is rapidly changing as human interactions move increasingly online, leaving digital records that allow automatic data collection on an unprecedented scale. However, social scientists have been reluctant to embrace the study of what is often characterized as the "virtual world," as if human interaction somehow becomes metaphysical the moment it is mediated by information technologies. While great care must be exercised in generalizing to the offline world, the digital traces of computer-mediated interactions open a window on aspects of social life that have been previously hidden from view. The detailed records of interaction in online communities are unique in human history, providing an exceptional opportunity for research on the formation of communities, a broad topic that includes research questions ranging from recruitment of new members to the emergence, spread, and enforcement of norms. Are new members influenced to join primarily through commitment to shared goals or do they tend to be pulled in by friends who have already joined? 
Are people influenced more by strong ties to close friends who also know one another, as in a tightly clustered network, or are they influenced more by weak ties to acquaintances who do not know one another and thus are more likely to have access to non-redundant information? Are people and organizations attracted to similar others, or does similarity lead to competition and rivalry? Does the dynamic of influence and attraction lead to cultural convergence or differentiation? These are some of the questions for which we are beginning to get answers from data collected from online networks.