The GENESIS Simulator-based Neuronal Database
NEUROINFORMATICS: AN OVERVIEW OF THE HUMAN BRAIN PROJECT
Lawrence Erlbaum Associates, Mahwah, NJ, 57-81 (1997)
EDITORS: S.H. Koslow and M. F. Huerta
D. Beeman1, J.M. Bower2, E. De Schutter3, E.N. Efthimiadis4, N. Goddard5, J. Leigh6
1. Department of Electrical and Computer Engineering, University of Colorado at Boulder
2. Department of Biology, California Institute of Technology
3. Born Bunge Foundation, University of Antwerp - UIA, B2610 Antwerp, Belgium
4. Department of Library and Information Science, University of California at Los Angeles
5. Pittsburgh Supercomputing Center
6. Electronic Visualization Laboratory, University of Illinois at Chicago
As the complexity and amount of neurobiological data continue to increase, neuroscientists are at risk of being inundated with huge amounts of data of unknown accuracy or relevance for the task of understanding the workings of the brain. Below, we present a new approach towards overcoming inherent limitations in conventional databases, including: (a) the accuracy and relevance of the data entered; (b) the problem of conflicting data; (c) data compression; (d) promoting participation in development and use of the database; and (e) the connection between the data and its functional significance.
With support from The Human Brain Project, we are exploring the construction of a brain database based on our existing neural simulation system, GENESIS. GENESIS (the GEneral NEural SImulation System) has been developed as a research tool to provide a standard and flexible means of constructing realistic simulations of biological neural systems. It currently serves as the basis for both instruction and research in a growing number of institutions around the world. The GENESIS object oriented database project involves a novel approach to neural database construction, organization, and interaction. The intention of the project is to develop software tools to make the wealth of information already accumulated within GENESIS about the structural organization of the nervous system more easily accessible and more generally available to both modelers and other neurobiologists.
We believe that a useful neuroscience database should address certain educational needs as well. Students, as well as experienced researchers who are interested in extending their research into new areas, face a formidable challenge when seeking to acquire a "state of the art" understanding of a particular system. Such a database should be useful both for unsophisticated users interested in understanding more about a particular structure or area, and for more sophisticated modelers and experimentalists exchanging information. Thus, our design focuses on methods of providing users at all levels of experience with a convenient means to achieve a basic understanding of what is known in a particular area, and the opportunity to connect to centralized, coordinated information at any level of depth. This also serves the purpose of educating neurobiologists in the use of new tools, including modeling.
Informatics Needs in Computational Neuroscience
The last several years have seen a tremendous growth in the use of computer modeling within neurobiology (Bower, 1992). As experimental data continue to accumulate, it is increasingly clear that detailed physiological and anatomical data alone are not enough to infer how neural circuits work. The resulting combination of modeling and experimental work has led to the creation of the new discipline of computational neuroscience (Eeckman & Bower, 1993). As computer power continues to grow, it is inevitable that the size and detail of neurobiological models will also continue to expand. This poses new problems to be solved for those working in computational neuroscience, and new challenges for informatics research.
As the complexity of constructed models and, we would claim, their predictive power continue to increase, it becomes clearer that an efficient and informed flow of information between modelers and experimentalists will be increasingly important. Traditionally, modelers have obtained the information they need to construct models through the published literature, while those experimentalists interested in modeling have also had to understand models through published accounts. Even for the still relatively small number of neurobiologists who combine modeling and experimental work, a great deal of information is still obtained through the printed literature. However, it is already clear that this awkward form of information exchange will not be able to support the continuing growth in model sophistication (Bower & Koch, 1992) or the potential for modeling to inform experimental research (Hasselmo & Bower, 1993). Instead, we must explore new means of transferring information about the nervous system and nervous system related models.
User survey of informatics needs
During the first year of our participation in the Human Brain Project, we carried out a survey of the informatics needs of practicing neurobiologists, with an emphasis on the needs of those who carried out modeling studies in addition to experimental research. This was achieved through laboratory meetings with neurobiologists, and through their responses to a hypertext electronic questionnaire administered over the World Wide Web. From these questionnaires we were able to isolate a number of tasks that the neurobiologists agreed were an accurate characterization of their research process. These tasks include:
- Experimentation - Electrical activity from neural tissue is recorded and the amplified signals are digitized and stored on a hard disk. Each recording session is stored in a separate flat file.
- Simulation - A simulation is constructed, based on experimental data which is either obtained directly or taken from the literature. In the case of single cell modeling, this consists of building a neuron out of a number of discrete compartments and tuning the properties of the compartments, based on the experimental data. This often involves the addition of a number of ionic channels which are known to exist in the neuron, and the adjustment of the parameters which characterize their behavior. The simulation results are then compared with other experimental findings in order to validate the model.
- Data Analysis - Experimental or simulation data is translated to formats that can be imported into statistical analysis packages. Typically, these translation programs must be written by the researcher. Additional visualization tools may also be applied in order to gain greater insight into the data.
- Journal Research - This is similar to research that goes on in most scientific disciplines. Journals and books are searched for relevant material. References are tracked to obtain further relevant information. Photocopies may be made for particularly relevant material which may include graphical as well as textual data. For example, it may include plots of experimental data which will be used to characterize ionic channels which will be incorporated in the model.
- Personal Note Taking and Data Management - Personal notes and notebooks are kept to log experimental techniques and map the search space of possible research paths. Additional notes consist of diagrams, graphs, equations, simulation parameters for experimental and simulation work. Data management involves the storage of data from experiments, simulations and results of analyses, in flat files for later use.
- Publishing Results - Results are published through standard peer-reviewed mechanisms such as journals and conference proceedings. Various image conversions are performed to transfer graphs from analysis packages and drawing programs to word processors. Bibliography lists have to be compiled.
The following is a compilation of what the neurobiologists felt would be useful to their research:
- It would be useful to integrate the simulation system, and data from experiments, with a data analysis and visualization system. That is, the neurobiologists want a set of tools to convert data from their experiments and simulations to formats that can be read by their analysis and visualization tools.
- It would be useful to maintain an electronic library of neuroscience literature from which some of the parameters used in modeling work can be systematically collected to allow easier access and comparison.
- It would be useful to have a means of maintaining a large collection of modeling data. Multiple simulation models for perhaps the same neuron should be stored to allow comparison. Simulation parameters should be retrievable for other simulation work. Also, it is important that models stored in this collection have references to the literature and information on the experimental techniques from which data were derived. There should be an automatic mechanism for linking modeling data with experimental data and information found in electronic copies of neuroscience literature.
- It would be useful to maintain a personal database of notes, diagrams, equations, experimental and modeling data.
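The third item above, a collection of models whose parameters carry links to the literature and to the experimental technique behind each value, can be illustrated with a small data structure. The following Python fragment is only a sketch under our own assumptions: the class names, field names, and all values are hypothetical placeholders, and it does not describe the format of any existing GENESIS library.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """A literature citation attached to a parameter (placeholder fields)."""
    authors: str
    year: int


@dataclass
class Parameter:
    """One model parameter together with its provenance."""
    name: str
    value: float
    units: str
    source: Reference            # paper the value was taken from
    technique: str = "unknown"   # experimental technique behind the value


@dataclass
class ModelRecord:
    """A stored simulation model for one neuron type."""
    neuron: str
    label: str
    parameters: list = field(default_factory=list)

    def references(self):
        """Return the distinct citations behind this model's parameters."""
        seen = []
        for p in self.parameters:
            if p.source not in seen:
                seen.append(p.source)
        return seen


def models_for(collection, neuron):
    """All stored models of the same neuron, for side-by-side comparison."""
    return [m for m in collection if m.neuron == neuron]


# Hypothetical entries; names and numbers are placeholders, not real data.
ref_a = Reference("Author A", 1990)
ref_b = Reference("Author B", 1992)
collection = [
    ModelRecord("Purkinje cell", "model 1", [
        Parameter("Rm", 10.0, "kOhm cm^2", ref_a, "current injection"),
        Parameter("Cm", 1.0, "uF/cm^2", ref_a, "current injection"),
        Parameter("gbar_Ca", 5.0, "mS/cm^2", ref_b, "voltage clamp"),
    ]),
    ModelRecord("Purkinje cell", "model 2", [
        Parameter("Rm", 30.0, "kOhm cm^2", ref_b, "current injection"),
    ]),
    ModelRecord("pyramidal cell", "model 3", []),
]

purkinje_models = models_for(collection, "Purkinje cell")
```

Because every parameter holds its own citation and technique, a comparison between two stored models of the same neuron can always be traced back to the experiments on which each one rests.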
Challenges Presented by Brain and Behavioral Sciences
The brain and behavioral sciences provide difficult challenges to computer scientists, in areas of information retrieval and database technology, simulation systems, visualization tools, and high performance computing. These challenges stem from two fundamental characteristics of the data involved in brain and behavioral sciences. First, the quantity of collectable (and collected) data is undergoing exponential increase. This increase is due to rapidly developing technologies for data acquisition, including imaging modalities such as functional magnetic resonance imaging (fMRI) and electrical sensing techniques such as multi-electrode multi-unit recording. Second, the data collected in these sciences and the models constructed from them are extremely heterogeneous in nature. This is largely due to the underlying complexity of the systems being examined and the huge ranges of distance (e.g., molecular to organ) and time (microseconds to years) that are of interest. There are few other areas of science that exhibit both of these characteristics to such an extreme.
These characteristics impose difficult requirements for many areas of informatics research. Information retrieval and database technologies are just beginning to seriously address extremely heterogeneous data. In part, the development of object oriented database systems (Grossman & Qin, 1993) has been motivated by the need for databases that can manage heterogeneous data. There has been a large effort in creating simulation systems (including the GENESIS simulator) that provide a modeling environment capable of handling the complex models developed in the brain and behavioral sciences. Visualization techniques are particularly difficult for these sciences. It is often not appropriate to view the data as a multi-dimensional block, since we already know that the systems generating it are highly structured. Scientists are interested in viewing the data at spatial scales spanning many orders of magnitude, as described above, and often in viewing time series at multiple temporal scales.
In common with other sciences that generate large quantities of data that must be analyzed and modeled, behavioral and brain sciences force the development of terabyte scale file systems and computer hardware and software to match. Multi-resident distributed file systems and RAID arrays are two examples of technologies developed to meet the need for large scale storage. A particularly interesting current development is large scale parallel machines and software. Although parallel hardware is being developed for technical reasons, the range of software available to program it will be significantly extended by the demands of the behavioral and brain sciences. The heterogeneity of the data, the range of spatial and temporal scales that must be accommodated, and the volume of data to be dealt with, mean that sophisticated parallel programming tools will need to be developed (Feldman, 1995).
A Simulation-Based Neural Modeling Database
The premise of our solution to the problems and needs described above is that as model sophistication grows, the models themselves become a form of information storage about the nervous system. We believe that such models can serve as a particularly effective mechanism for inter-investigator communication. Like a database, models contain detailed information about a particular phenomenon (e.g. the features necessary to generate a particular neuronal response pattern). Unlike a traditional database, however, models also contain precise information about the relationships between the known facts. Further, by running a simulation, a model can, in effect, internally check the accuracy of the information used to construct it. Finally, modeling results can actually direct the acquisition of the additional information necessary to expand the model and thus the database.
We believe that by providing a means of storing, representing, and transferring information about the nervous system, the increased use of models has the potential to change the structure of communication and understanding within neuroscience itself. This approach deals with the limitations of conventional databases in the following ways:
Knowing where to start
When using a database, it is often difficult to know what sort of initial queries to make. At the beginning of a new research project, a good review article on a particular area can often provide the needed overview and a suitable list of references. However, even assuming that such a paper exists, it may be outdated, and any omissions may not be apparent. By using simulation-based tutorials as an entry point to the database, it is possible to explore the subject at many levels of depth (Beeman, 1994; Bower & Beeman, 1994). This serves the needs of the researcher as well as those of the student. Links to remote sites provide updated information and references. By directly experimenting with the model, the user can quickly spot gaps in what is known about the system and identify fruitful areas for additional research. If the intention is to carry out a modeling study, existing simulations or simulation components may be extracted from the database to be used as a starting point.
Accuracy of data
One of the most difficult aspects of constructing a database is assuring that all of the data entered is accurate. Usually, this requires a moderator capable of certifying the quality of the data. The more inaccurate the stored data, the less useful the database becomes. In the current case, the data from which the database will be derived is contained in numerical simulations already shown to be capable of replicating specific types of brain activity. While determining the absolute accuracy of a particular model or model parameter is a complex process (see below and Bhalla & Bower, 1993), the user of the database at least knows that values are within an appropriate range for the specific behavior modeled. In this way the simulations themselves represent an internal check on the accuracy of the data in the database.
Relevance of data
Another issue related to the quality of stored data involves the question of what data is most relevant and therefore useful to our knowledge of a particular system at a particular time. In other words, just because data can be collected does not necessarily mean that it will help expand our current understanding of a particular neural system. However, if the data has been included in a functioning biological model and can be demonstrated to be necessary to produce the desired output (cf. Bhalla & Bower, 1993), then some basis for relevance can be established. Furthermore, by exploring a particular model it is possible to determine exactly how the information is relevant to that model's output.
Extending the database
While models can help determine the relevance of a particular datum at the moment, they can not rule out the value of a particular type of data in the future. In fact, in our experience, one of the major benefits of modeling is to demonstrate what data must be collected next (Bower, 1991). A simulation based database therefore not only reveals the relevance of existing data, but can also highlight the data that must now be obtained. In this case the potential relevance of new data is indicated even before the data is present in the database.
It is often the case that data obtained by different experimentalists conflict or appear to conflict. Not infrequently these conflicts are difficult to resolve at first glance. Modeling often not only highlights these conflicts, but can also provide an opportunity for their resolution (cf. De Schutter & Bower, 1994a,b), thus serving, again, as a check on the veracity of the data in the database.
Connectedness of data
In many database efforts, it is difficult to know the relationship between data of different types. With a simulation-based database, the relationship between different types of data is apparent in the simulation. Thus, it is possible to make a direct connection between the distribution of ion channels and the diameter of dendrites for example, even if this particular comparison had not previously been anticipated by the database designer.
Many databases suffer from awkward or inflexible display of the data. While tabular presentation is usually relatively easy to set up, it is usually also the least informative way of looking at data. However, it is not always clear what the correct graphical presentation form is for a particular type of data. In contrast, with a simulation-based database, the optimal presentation of the data can often be determined by the structure of the model itself. For example, data concerning the distribution of a particular ion channel in pyramidal cells is already organized by the model into a 3-D image of the cell itself.
Compactness of the data
Ultimately, simulations represent the most compact form of data possible. For example, in principle, a simulation capable of replicating all the features of the biological system it mimics could reconstruct whatever dataset is of interest from first principles. In this case, raw data would not need to be stored at all. While we DO NOT anticipate that this will happen any time soon, it does illustrate the fact that models can be, in effect, a very compact means of representing data. For example, a correctly parameterized Hodgkin-Huxley model can, in principle, contain all the information necessary to reconstruct voltage and current clamp records for a single population of channels.
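The Hodgkin-Huxley example can be made concrete with a short calculation. The Python sketch below is ours and is not taken from GENESIS; it assumes the standard squid-axon potassium channel rate functions in the modern sign convention (rest near -65 mV), and the function names are hypothetical. A handful of constants suffices to regenerate the complete potassium current record for any voltage clamp step, so that the record itself need not be stored.

```python
import math

# Standard Hodgkin-Huxley potassium channel constants (squid axon,
# modern sign convention). These few numbers are the whole "dataset".
G_K = 36.0    # maximal conductance, mS/cm^2
E_K = -77.0   # reversal potential, mV


def alpha_n(v):
    """Opening rate of the n gate (1/ms) at membrane potential v (mV)."""
    x = v + 55.0
    if abs(x) < 1e-7:          # removable singularity at v = -55 mV
        return 0.1
    return 0.01 * x / (1.0 - math.exp(-x / 10.0))


def beta_n(v):
    """Closing rate of the n gate (1/ms)."""
    return 0.125 * math.exp(-(v + 65.0) / 80.0)


def n_inf(v):
    """Steady-state activation at potential v."""
    a, b = alpha_n(v), beta_n(v)
    return a / (a + b)


def tau_n(v):
    """Activation time constant (ms) at potential v."""
    return 1.0 / (alpha_n(v) + beta_n(v))


def clamp_current(v_hold, v_step, times_ms):
    """Potassium current (uA/cm^2) after a voltage step applied at t = 0.

    Under voltage clamp the gate relaxes exponentially from its
    steady state at v_hold to its steady state at v_step.
    """
    n0, n1, tau = n_inf(v_hold), n_inf(v_step), tau_n(v_step)
    record = []
    for t in times_ms:
        n = n1 - (n1 - n0) * math.exp(-t / tau)
        record.append(G_K * n ** 4 * (v_step - E_K))
    return record


# Reconstruct a 20 ms record for a step from -65 mV to 0 mV.
times = [0.1 * i for i in range(201)]
record = clamp_current(-65.0, 0.0, times)
```

The same functions regenerate the record for any other holding and step potential, which is the sense in which the parameterized model is more compact than the collection of raw traces.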
Functional significance of the data
Ultimately, the objective of the Human Brain Project is not simply to store massive amounts of data, but to contribute to our understanding of its significance for human brain function. However, with large and growing databases of the most common type, it is not at all clear how the data is eventually assembled to create some functional understanding. In the current case, however, the basis for exploring the functional implications of the data is built into the design and construction of the database itself. The models that serve as the point of entry for the data are also tools that can be used to understand its significance. As the models become more sophisticated, so does the representation of the data. As the models become more capable, they extend our ability to explore the functional significance of nervous system structure and organization. Thus, there is a direct link between the ultimate objective of acquiring the data and the data acquisition process itself.
GENESIS as a Database Foundation
As we have described above, a realistic simulation is a way of efficiently encapsulating knowledge about a neural system. Specifically, we propose to develop a database using GENESIS, the general neural simulation system that has been developed over the last eight years at Caltech, as a foundation. This simulation system already contains a great deal of information about the structure, organization, and behavior of the nervous system. Also, because the system was designed to be user extensible, new information about the nervous system is being added to GENESIS libraries constantly as modelers construct new simulations, and experimentalists provide new data. We propose to use this information as the foundation for the neuroscience database which we are constructing.
We consider GENESIS as a form of knowledge base because it encapsulates the structure, organization, and computational knowledge of the systems it models. It is also suitable as the nucleus of an object oriented database, because as described below, a GENESIS model is inherently object oriented. However, unlike conventional knowledge base systems which are popularly based on a logic-programming or rule-based system (Ullman, 1988), GENESIS is a simulation-based knowledge base system. That is, information is deduced by the execution of numerical algorithms rather than the chaining of rules in a rule-based system.
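The distinction between chaining rules and executing numerical algorithms can be illustrated with a toy example. The Python sketch below is not GENESIS code; it assumes a leaky integrate-and-fire cell as a stand-in for a realistic model, and the function names and parameter values are hypothetical. The answer to the query "what is the smallest injected current that makes this cell fire?" is stored nowhere; it is deduced by running the simulation repeatedly.

```python
def firing_rate(i_inject_na, t_total_ms=1000.0, dt_ms=0.1):
    """Firing rate (spikes/s) of a leaky integrate-and-fire cell.

    Illustrative parameters: 10 MOhm input resistance, 10 ms time
    constant, rest and reset at -65 mV, threshold at -50 mV.
    """
    v_rest, v_thresh, v_reset = -65.0, -50.0, -65.0   # mV
    r_m, tau_m = 10.0, 10.0                           # MOhm, ms
    v, spikes = v_rest, 0
    for _ in range(int(round(t_total_ms / dt_ms))):
        # Forward Euler step of tau * dV/dt = -(V - V_rest) + R * I
        v += dt_ms * (-(v - v_rest) + r_m * i_inject_na) / tau_m
        if v >= v_thresh:
            spikes += 1
            v = v_reset
    return spikes / (t_total_ms / 1000.0)


def rheobase(lo=0.0, hi=5.0, tol=0.01):
    """Smallest current (nA) that makes the cell fire, found by bisection.

    Each probe of the bisection is a full simulation run.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if firing_rate(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi


threshold_current = rheobase()
```

For this simple cell the result can be checked by hand, since the threshold current is (V_thresh - V_rest) / R = 1.5 nA. For a realistic multi-compartmental model no such closed form exists, and running the simulation is the only way to obtain the answer.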
What is the GENESIS simulator?
GENESIS was specifically designed to allow the construction of biological simulations at many different levels, from sub-cellular components, to whole cells to networks of cells (Bower & Hale, 1991). The ultimate objective was to provide a simulation platform that could support simulations of the nervous system at any level of detail and complexity. GENESIS uses a high-level simulation language to construct neurons and their networks in an object oriented manner. Commands may be issued either interactively to a command prompt, by use of simulation scripts, or through the graphical interface.
Basic organization of GENESIS
The design of the GENESIS simulator and interface is based on a "building block" approach that is fundamentally object oriented. Simulations are constructed of modules/objects which receive inputs, perform calculations on them and then generate outputs. For example, models of single neurons are constructed of small compartments (Segev, Fleshman, & Burke, 1989) which in turn are linked to objects representing variable conductance ion channels. These compartments can then be linked together to form multi-compartmental neurons of any desired level of complexity (Bhalla & Bower, 1993; De Schutter & Bower, 1994a,b). Once constructed, such neurons can be linked together to form neural circuits (Wilson & Bower, 1991; 1992). Neural systems are particularly amenable to this approach because they typically consist of discrete components interacting in quite stereotyped ways and because the different simulations tend to use similar neural components, display routines, numerical integration routines, etc. A particular simulation is set up by writing a sequence of commands in the scripting language that establish the network itself and the graphical interface for a particular simulation. The scripting language and the modules are powerful enough that only a few lines of script can specify a sophisticated simulation.
This object oriented approach is central to the generality and flexibility of the GENESIS system (Bower & Hale, 1991). For example, this modularity means that it is possible to quickly construct a new simulation or to modify an existing simulation by inserting different simulation objects from the existing library of standard simulation components. In this way, individual modules or linked assemblies of modules (such as compartments with channels, entire cells, or networks of cells) may be easily replicated. Each object manages its own variables, and objects communicate with one another through message passing. This makes it easy to extend the simulator by writing new modules without the necessity of making changes to existing modules. In this way, the set of simulation objects available for use within GENESIS continues to grow as the system is used. This growth is reflected in the size and complexity of the libraries of GENESIS objects and simulations that currently exist.
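The building block design can be sketched in a few lines of Python. The fragment below is an illustration under our own assumptions and is not GENESIS script syntax or the GENESIS implementation: the class names, the add_msg method, and the message type names are hypothetical, and the quantities are in arbitrary consistent units. Each element keeps its own variables and reads only the fields that other elements send to it, so a new module (here a current source) can be added without touching the existing ones.

```python
class Element:
    """Base building block: owns its variables, receives named messages."""

    def __init__(self, name):
        self.name = name
        self.incoming = []   # (source element, source field, message type)

    def add_msg(self, source, msg_type, source_field):
        """Register that `source` sends its `source_field` to this element."""
        self.incoming.append((source, source_field, msg_type))

    def collect(self):
        """Read the values sent by connected elements, grouped by type."""
        received = {}
        for source, source_field, msg_type in self.incoming:
            received.setdefault(msg_type, []).append(getattr(source, source_field))
        return received

    def step(self, dt):
        """Advance the element by one time step (default: nothing to do)."""


class LeakChannel(Element):
    """Fixed-conductance channel: receives VOLTAGE, exposes `current`."""

    def __init__(self, name, g, e_rev):
        super().__init__(name)
        self.g, self.e_rev, self.current = g, e_rev, 0.0

    def step(self, dt):
        (v,) = self.collect()["VOLTAGE"]   # expects exactly one sender
        self.current = self.g * (self.e_rev - v)


class Compartment(Element):
    """Isopotential compartment: sums CURRENT and INJECT messages."""

    def __init__(self, name, cm, v_init):
        super().__init__(name)
        self.cm, self.v = cm, v_init

    def step(self, dt):
        msgs = self.collect()
        total = sum(msgs.get("CURRENT", [])) + sum(msgs.get("INJECT", []))
        self.v += dt * total / self.cm     # forward Euler update


class CurrentSource(Element):
    """A module added later; no existing class had to change for it."""

    def __init__(self, name, amplitude):
        super().__init__(name)
        self.amplitude = amplitude


# Assemble a one-compartment cell from the blocks and run it.
soma = Compartment("soma", cm=1.0, v_init=-65.0)
leak = LeakChannel("leak", g=0.1, e_rev=-65.0)
stim = CurrentSource("stim", amplitude=1.0)
leak.add_msg(soma, "VOLTAGE", "v")
soma.add_msg(leak, "CURRENT", "current")
soma.add_msg(stim, "INJECT", "amplitude")

for _ in range(2000):
    for element in (stim, leak, soma):
        element.step(0.1)
```

With these values the compartment settles at e_rev + amplitude / g = -55, which provides a simple check that the messages are wired as intended.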
Current GENESIS libraries are quite extensive and have been contributed to by researchers from all over the world. As such, they represent a continuously expanding detailed description of the nervous system. GENESIS libraries can be divided into two types, those that contain whole simulations of cells or networks, and the simulation object libraries which contain the building blocks from which these simulations are constructed. As with the brain component libraries, these simulations contain detailed information about the organization of different brain regions.
A major focus of GENESIS development has been on its graphical interface (Uhley et al., 1990; Bhalla, 1994), which allows neurobiologists to navigate through the complexities of the GENESIS system. The software that has been developed, referred to as XODUS (X-based Output and Display Utility for Simulators), has been intimately incorporated into the GENESIS system itself and allows modelers to interactively set up and control the simulation as well as display simulation results. XODUS was designed to be highly interactive in setting up, parameterizing, running, and evaluating simulations.
GENESIS and education
From the outset, the education of neurobiologists in the use of simulation tools has been a major focus of GENESIS development. Several complete demonstration simulations have been constructed to illustrate the properties of single cells and parts of cells, through simple neural circuits, to large networks of neurons. A textbook guide to the use of the tutorials as educational tools is being used in many neuroscience and pre-medical courses (Bower & Beeman, 1994). These tutorials are used by other researchers as a starting point for the construction of their own simulations, as well as for undergraduate and graduate teaching.
The Prototype Information Base
We have decided, based on the current design of the database system, to focus the development of the database prototype specifically on the models we have constructed related to the mammalian cerebellum. We have determined that this specific focus will allow us to fully develop a working prototype of the database system. At present, several models exist at the single cell level, which will form the basis for the database. Over the last year, we have completed construction of a systems-level network model within GENESIS which includes major projections to cerebellar folium, crus IIa. The model incorporates physiological data on the structure of cerebellar tactile maps as well as maps for the trigeminal nuclei (Principalis and Interpolaris), thalamic nuclei (VPM and POM) and somatosensory cortex. This latter data was obtained from the literature. Over the last year, we have also begun to develop a network model of cerebellar cortex. This large project will take several years to complete, but will, when finished, allow us to link the systems level model to the single cell models currently complete or being constructed. From the point of view of the database, linking these models together will provide a prototype for a multi-level browsing interface of neurobiological data relevant to the functional organization of the cerebellar cortex.
Usage scenario for the GENESIS object oriented database
The following example illustrates the way in which the database would be used. In this example, we assume that the interface to the multibase system has been implemented on a high-end graphics workstation where a windowing environment and mouse are the standard means of interaction. Let us assume that a neuroscientist is interested in exploring ways in which calcium channels affect the firing of Purkinje cells in the cerebellum.
Figure 1 shows a prototype for the top level interface to the database. After entering the keyword "purkinje", the user is presented with a number of contexts in which Purkinje cells appear in the database, grouped into several categories. From left to right, the categories are (a) behavioral studies (of tactile exploration and cerebellar lesions), (b) the systems level (in this case, what is known about the connections between regions of the cerebellar system), (c) the network level (a model of the cerebellar cortex), (d) the cellular level (Purkinje cell model), (e) the subcellular level (ion channels, etc.), and (f) pertinent references to the literature. These various categories may be accessed both from the top level interface (the Multi-Browser), and from within other categories.
The selection of the De Schutter and Bower (1994a,b) Purkinje cell model leads to a tutorial on the model, with the opportunity to view a description of the model and simulation results of voltage clamp, current injection, synaptic input, and simulated in vivo responses. (See COLOR PLATE) Each of these categories allows an examination of the experimental data for comparison. From within the tutorial, as well as from the subcellular category of the top level interface, the object shown in Figure 2 may be accessed. Here, it is possible to query the model for the parameters used to characterize each of the ten types of channels used in the model (lower left), the distribution of these conductances within the cell model, and references to the papers which provided the descriptions of the channels which were implemented in the model.
The tool box at the lower left provides a way to import and export other data. As it is important to understand the basis of assumptions which were made in the model, there is access to the experimental current injection results which were used to "tune" the model parameters. Links to remote sites are provided so that it is possible to access the full data set, as well as the data used for tuning. Objects from the database may be exported to GENESIS simulations, data analysis tools, and visualization tools. The COMPARE button provides tools for aggregate operations such as making automated comparisons with other objects in the database. For example, comparisons may be made with channels found in another model. Within the tutorial it will be possible to perform searches to answer questions like "what are all the single-cell models which use this type of Ca channel with parameters in this range; what are all the bursting models that were based on spiking behavior with these statistics?" The researcher can also move from a Ca channel in a Purkinje cell to a listing of all the places Ca channels are currently used in models in GENESIS.
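A search of the kind quoted above amounts to a filter over stored model descriptions. The following Python sketch shows one way such a query might look; the record layout, the function names, and every model name and parameter value are hypothetical placeholders and do not correspond to entries in the actual database.

```python
# Hypothetical stored model descriptions (placeholder names and values).
models = [
    {"name": "cell model A", "level": "single cell",
     "channels": {"Ca": {"gbar": 4.5}, "K": {"gbar": 15.0}}},
    {"name": "cell model B", "level": "single cell",
     "channels": {"Ca": {"gbar": 0.5}}},
    {"name": "cell model C", "level": "single cell",
     "channels": {"Na": {"gbar": 120.0}}},
    {"name": "network model D", "level": "network",
     "channels": {"Ca": {"gbar": 4.0}}},
]


def find_models(records, channel, param, low, high, level="single cell"):
    """Names of models at `level` whose `channel` has low <= param <= high."""
    hits = []
    for rec in records:
        chan = rec["channels"].get(channel)
        if rec["level"] != level or chan is None or param not in chan:
            continue
        if low <= chan[param] <= high:
            hits.append(rec["name"])
    return hits


def uses_of(records, channel):
    """Every stored model, at any level, in which `channel` appears."""
    return [rec["name"] for rec in records if channel in rec["channels"]]


in_range = find_models(models, "Ca", "gbar", 1.0, 10.0)
everywhere = uses_of(models, "Ca")
```

The first call corresponds to the range query over single-cell models, and the second to moving from a Ca channel in one cell to a listing of all the models in which Ca channels are used.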
At this point, the user elects to view predictions of in vivo physiological responses to real stimuli, bringing up the display shown in Figure 3. The link at the lower left allows the viewing of experimental extracellular recordings and peri-stimulus time (PST) histograms for comparison with the simulated results. By clicking on a set of results, it is possible to find a reference for the source of these results (De Schutter & Bower, 1994b).
Having explored the single cell level, the user may switch to the cerebellar network level and explore the response of the model of the cerebellar cortex (crus IIa) of the rat to selected input patterns of stimulus. Figure 4 shows the result of a query for the corresponding experimental results. Here, the cursor is used to select tactile stimulation applied to the ipsilateral upper lip area (IUL) in the diagram at the upper right. This results in a display of the PST histogram obtained from recordings from the point marked "X" in crus IIa, which is within the patch that is somatotopically mapped to the IUL. The peaks in the PST histogram correspond to information coming from two different pathways - the direct pathway through the trigeminal nucleus, and the indirect pathway from the somatosensory cortex. A query for the source of the experimental data leads to a reference to the work of Thompson and Bower (1993).
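For readers unfamiliar with the PST histogram, the Python sketch below shows how one is computed from spike times aligned to stimulus onset. It is a generic illustration rather than the analysis code used for the figures; the function name is hypothetical and the spike times are invented so that two response peaks appear, in the manner of the two pathways described above.

```python
def psth(trials, t_start, t_stop, bin_width):
    """Peri-stimulus time histogram in spikes/s.

    `trials` is a list of spike-time lists, one per stimulus
    presentation, with times in ms relative to stimulus onset.
    """
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for spike_times in trials:
        for t in spike_times:
            if t_start <= t < t_stop:
                idx = min(int((t - t_start) / bin_width), n_bins - 1)
                counts[idx] += 1
    # Normalize by number of trials and by bin width in seconds.
    scale = len(trials) * bin_width / 1000.0
    return [c / scale for c in counts]


# Invented spike times (ms after stimulus) for three presentations.
trials = [[5, 12, 31], [6, 30], [7, 33, 48]]
rates = psth(trials, t_start=0, t_stop=50, bin_width=10)
```

Averaging over presentations in this way brings out responses that are time-locked to the stimulus: here the first and fourth bins hold the most spikes, giving an early and a late peak.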
Moving up another level, the user then shifts to the cerebellar system model in order to learn more about the pathways which are involved. Figure 5 shows a simplified representation of the major sensory pathways to the cerebellum. This is an interface to the high level GENESIS systems level model of the cerebellum and the regions projecting to it. This model lets one click on particular regions of either somatosensory cortex, the thalamus, the trigeminal nucleus, the cerebellum, the pons, or the animal's face, and see all the related regions in each structure. It is possible to see where they connect as well as what their receptive fields are. One can also click on different buttons and see the secondary receptive fields. This can be done in the normal animal simulation as well as in the simulation of an animal with lesioned trigeminal nerve. When the trigeminal nerve is cut, these original projections are kept constant and the model may be used to test the hypothesis that the reorganization that is observed experimentally in the cerebellum could happen without any new projections.
The projections are all established based on the reports in the literature of the receptive field size of each different brain area. It is possible to query this data, as well as descriptions of the experimental procedures. By clicking at the lower left of Figure 5, it is possible to query the highest level of the database, and examine data for the behavior of the rat before and after severing the trigeminal nerve.
The user can repeat all of the above for the other models found in the original query in order to make comparisons between them. With all this information available, the researcher can develop a better understanding of the difference between the models as well as their similarities. If the neuroscientist is interested in developing a new model or exploring different mechanisms, then the database will provide information on which sets of parameters are the most appropriate to use in the simulations that will be performed.
Implementation of the GENESIS Database System
At an early stage, we determined that the GENESIS database should be usable by neuroscientists who do not have the GENESIS simulator installed. Although simulation models are the central organizing principle for the database, much of the information in the database can be useful without the need to simulate any of the models in the database. The tutorials described above are generated by the simulator, but in most cases can be explored without the simulator. As we have said, we expect these tutorials to be the main point of entry to the database for early searching in a domain. At the same time, it is important that rich interfaces be provided between the simulator and the database, so that a model established in the simulator can be easily entered into the database, and so that queries to the database about the behavior of models can be satisfied by running the models in the simulator rather than storing every possible behavior characteristic of each model (an impossible task). In addition to these two requirements, we identified three fundamentally different types of information that will be entered in the database. First and foremost are the GENESIS models, which organize all the information in the database. Second are the experimental data on which the models are based and the simulated data the models produce. Third are classical textual information sources: texts, citations, annotated images, etc.
To meet these needs, we settled on a design using a standard HTML (Hypertext Markup Language) browser interface (e.g., Netscape) as a front end to three database systems, subserving the three types of information listed above, which may be integrated into one database eventually. Although HTML browsers do not provide sufficient interface tools to run an entire database, they suit our project for two reasons. First, the tutorials can be easily organized as a set of hypertext linked pages, with hooks to database entries at appropriate points. Second, these browsers represent an enormous investment in graphical user interface design and implementation for all the common platforms (X/Unix, MS-Windows, Macintosh), which we certainly do not want to duplicate. The HTML format does provide the capability to invoke external programs which can in turn invoke their own graphical user interface (GUI) components, and we expect that this will be a major pathway for accessing the database.
The second critical design decision was to use object oriented database technology. Object technology is highly suited to neuroscience data because of the extensive heterogeneity and compositionality of those data, and for this reason many projects in the Human Brain Project are using object technology. However, the GENESIS database has an even more compelling reason to use object oriented techniques. The GENESIS models are the structured information objects at the core of the database, forming the entries which organize all other information in the database. GENESIS models are designed and written as object oriented programs, using standard and user-supplied library objects communicating via stereotypical message types. Thus it is natural to store a GENESIS model as a collection of related objects, for which an object oriented database system is the clear choice. Moreover, since the models and the simulator itself are organized as objects, we can add to each object actions which will write the necessary code to enter the object into the database. Thus much of the work of entering and structuring models in the database can be automated. We plan to use a commercial object oriented database (UniSQL) to implement the database which accesses the GENESIS models and the experimental data. The third subsystem, for textual material, is described below.
A second important reason for mirroring the object structure of a GENESIS model in an object oriented database is that it will allow sophisticated queries to be run not only over each model as an entity but also over the component parts of the model. This is crucial in facilitating a user's understanding of a model found in the database. We expect each class of GENESIS object to define a database schema, and each instance of that class in a model to be entered as an instance of its progenitor's schema.
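The idea that each object class defines a schema, and each element of a model is stored as an instance of that schema, can be sketched in Python as follows. This is only an illustration under stated assumptions: the Compartment and Channel classes, their fields, and the in-memory ObjectStore are hypothetical stand-ins, not GENESIS library objects or the UniSQL interface.

```python
from dataclasses import dataclass, fields


@dataclass
class Compartment:
    """Stand-in for a simulator object class; field names are illustrative."""
    name: str
    length: float
    diameter: float


@dataclass
class Channel:
    """Stand-in for a channel object attached to a compartment."""
    name: str
    gbar: float
    parent: str   # name of the compartment that holds this channel


def schema_for(cls):
    """Derive a schema (field name -> type name) from the class definition."""
    return {f.name: f.type.__name__ for f in fields(cls)}


class ObjectStore:
    """Toy in-memory store: one schema per class, one row per instance."""

    def __init__(self):
        self.schemas = {}     # class name -> schema
        self.instances = {}   # class name -> list of stored rows

    def enter(self, obj):
        cls_name = type(obj).__name__
        if cls_name not in self.schemas:
            # The class defines the schema the first time it is seen.
            self.schemas[cls_name] = schema_for(type(obj))
            self.instances[cls_name] = []
        row = {k: getattr(obj, k) for k in self.schemas[cls_name]}
        self.instances[cls_name].append(row)

    def select(self, cls_name, predicate):
        """Query the component parts of a model, not just the model as a whole."""
        return [r for r in self.instances.get(cls_name, []) if predicate(r)]


store = ObjectStore()
model_elements = [
    Compartment("soma", 20e-6, 20e-6),
    Compartment("dend1", 100e-6, 2e-6),
    Channel("CaP", 4.5, "dend1"),
]
for element in model_elements:
    store.enter(element)

print(store.schemas["Compartment"])
print(store.select("Channel", lambda r: r["parent"] == "dend1"))
```

Because the schema is derived from the class itself, entering a model requires no hand-written table definitions, which is the sense in which much of the work can be automated.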
Interfaces between the simulator and the database
There are two primary interfaces between the simulator and the database. The first allows the database to invoke the simulator, to set up a model derived from a database entry, and then either to hand over control to the simulator's user interface (if the user wishes to manually run the model), or to run the model under the control of the database in order to derive some piece of information about the model's behavior. This information could have been explicitly requested by the user, or it could be implicitly required to satisfy a query. This interface will involve creating some database classes devoted to running the simulator, and some GENESIS library objects devoted to reporting information back to the database.
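A minimal Python sketch of the first interface is given below, assuming a database entry that answers from stored values where it can and otherwise invokes the simulator and caches the result. The ModelEntry class, the property names, and fake_simulator are hypothetical; a real implementation would call GENESIS rather than return a canned number.

```python
class ModelEntry:
    """Toy database entry that derives missing properties by running a model."""

    def __init__(self, name, stored, run_simulation):
        self.name = name
        self._stored = dict(stored)     # values entered with the model
        self._run = run_simulation      # callable(model_name, prop) -> value
        self._derived = {}              # cache of simulator-derived values
        self.runs = 0                   # number of simulator invocations

    def get(self, prop):
        if prop in self._stored:
            return self._stored[prop]
        if prop not in self._derived:
            # Not stored: satisfy the query by running the model once.
            self._derived[prop] = self._run(self.name, prop)
            self.runs += 1
        return self._derived[prop]


def fake_simulator(model_name, prop):
    """Placeholder for a real simulator call; the value is invented."""
    canned = {"mean_rate_hz": 42.0}
    return canned[prop]


entry = ModelEntry("purkinje_demo", {"n_compartments": 4550}, fake_simulator)
print(entry.get("n_compartments"))   # answered from stored data
print(entry.get("mean_rate_hz"))     # triggers one simulator run
print(entry.get("mean_rate_hz"))     # answered from the cache
print(entry.runs)
```

The user sees the same query interface in both cases, whether the request for a behavioral property is explicit or arises implicitly while satisfying another query.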
The second interface is that provided by the simulator to facilitate entry of a model into the database. This will be provided by one or more GENESIS library objects which export user commands for dumping a model into the database. Each standard GENESIS library object involved in a model will be augmented with one or more actions to generate the corresponding database schema and to instantiate the schema in the database for each element derived from that object in the model.
For the experimental and simulated data we will provide schemas that can store and manipulate data of the different types. For example, an electrical recording schema will provide methods for computing statistics of spiking behavior, and these methods can be invoked by the user in queries. For the textual data, we will provide schemas to store the different data types. These will have methods for exploring those data, including statistical text processing techniques and knowledge-based techniques.
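As an illustration of an electrical recording schema with methods for spiking statistics, consider the following Python sketch. The SpikeRecording class and its method names are assumptions made for this example, and the choice of statistics (mean rate and the coefficient of variation of interspike intervals) is ours; the spike times are invented.

```python
from statistics import mean, pstdev


class SpikeRecording:
    """Toy recording object holding spike times (s) and recording duration (s)."""

    def __init__(self, spike_times, duration):
        self.spike_times = sorted(spike_times)
        self.duration = duration

    def mean_rate(self):
        """Mean firing rate in spikes per second."""
        return len(self.spike_times) / self.duration

    def intervals(self):
        """Interspike intervals between consecutive spikes."""
        t = self.spike_times
        return [b - a for a, b in zip(t, t[1:])]

    def isi_cv(self):
        """Coefficient of variation of the interspike intervals, or None if too few spikes."""
        isi = self.intervals()
        if len(isi) < 2:
            return None
        return pstdev(isi) / mean(isi)


rec = SpikeRecording([0.0, 0.5, 1.0, 2.0], duration=4.0)
print(rec.mean_rate())
print(rec.intervals())
print(rec.isi_cv())
```

A query such as "bursting models based on spiking behavior with these statistics" could then be phrased as a filter over the values these methods return.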
The information retrieval subsystem
The document retrieval subsystem has different requirements from the model and data subsystems described above, and will be implemented in a different manner. However, this retrieval system will be deployed using the same user interface described above. Here, we emphasize the word "document" in order to draw attention to the distinction between document and data retrieval. A document is a textual representation of an object, such as an article, book, image, audio recording, or video. For the prototype, the database will cover the literature that is used by the researchers who use GENESIS, i.e., computational neurobiology and the relevant experimental studies.
This subsystem will be a state-of-the-art full-text information retrieval (IR) system using advanced IR techniques, and will be a client/server system that uses SGML (Standard Generalized Markup Language) as the underlying database format in the server search engine. (See, for example, van Herwijnen, 1990.) Any valid SGML Document Type Definition (DTD) could be used as the database description for a file of bibliographic records or full-text documents. The retrieval engine will make use of statistical retrieval techniques based on the probabilistic retrieval model (Robertson et al., 1995), ranked output, and relevance feedback. A number of additional approaches would be utilized, such as Boolean searching, document clustering, citation linking of documents for browsing and searching, and hypermedia linking of documents and other objects outside the database (Fidel & Efthimiadis, 1995).
Users would enter queries as "free-text" (that is, normal English prose) statements of their interest or need for topical searches. No formal "query language" or Boolean logic is imposed on the user. The graphical client interface would also include features such as the ability to accumulate bibliographies of citations seen, an extensive help system, command history and redo, multiple display formats for retrieved records, and the ability to export and use the text of retrieved documents in other applications within the GENESIS system or for word-processing. The use of probabilistic IR techniques to match the users' initial query with a set of documents in the database would result in the retrieval of documents in decreasing ranked order of probable relevance to the users' query. This aids the user in subject focusing and topic/treatment discrimination. The system would also provide for direct probabilistic or Boolean searches of any indexed data elements.
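Ranked retrieval from a free-text query can be sketched in Python as below. The chapter does not specify which term weighting the system would use; this example assumes an Okapi BM25-style weighting, one well-known instance of the probabilistic model, with the commonly used parameter values k1 = 1.2 and b = 0.75 and a non-negative variant of the inverse document frequency. The function names and the three sample documents are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (no stemming or stop list)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs, k1=1.2, b=0.75):
    """Return (doc index, score) pairs in decreasing order of BM25-style score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    df = Counter()                      # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))
    query_terms = set(tokenize(query))
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms weigh more; the 1 + ... form keeps the weight non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((i, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


docs = [
    "Calcium channels in the Purkinje cell dendrite",
    "Oscillations in piriform cortex",
    "Purkinje cell responses to synaptic input",
]
for index, score in rank("calcium channels purkinje", docs):
    print(index, docs[index])
```

The user types ordinary prose and receives documents in decreasing order of estimated relevance; documents sharing no terms with the query are omitted from the ranking.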
We are now beginning the serious stage of user interface design. This includes carefully determining the type(s) and interaction styles for user interface(s); query formulation, query reformulation and query expansion support; browsing support; data visualization for information retrieval. Our objective will be to assure that the overall design reflects the integration of retrieval techniques, knowledge structures and database structure at the user interface. In the next year we will also implement the HTML based interface that can be accessed over the internet. We will use this interface to allow selected users to interact with our ongoing interface design and provide feedback to the database development group.
We will proceed with development of the retrieval subsystem along several lines. First, we will work further with users. This effort includes continued study of the neurobiology user population (modelers and non-modelers) and their information seeking patterns, needs assessment, and user modeling. Second, we will continue database design for this component. This includes defining subject matter, document representations, and record structures. Third, considerable design is necessary in the area of knowledge structures and knowledge bases. We want to be able to take advantage of domain knowledge and develop knowledge structures and knowledge bases for use in indexing and searching. Fourth, we will begin work on the actual retrieval engine itself.
During the next year we plan to implement the commercial object oriented database prototype described above, and will link the interface we have designed to the information retrieval subsystem. We will make the database available first to those in the GENESIS development group, and then to a selected set of outside users for evaluation. Based on the results of this evaluation, we will modify the database structure. This will give us a working version of the GENESIS database built around models related to the mammalian cerebellum.
We also intend to start the process of linking the GENESIS database to some of the other database projects within the Human Brain Project. These include the Shepherd project, which is already using GENESIS and could quite naturally extend our efforts into a prototype related to the olfactory system, and to the project run by Peter Fox in which we will be able to attach human fMRI data on the cerebellum to the GENESIS cerebellar database. In this way we will begin serious exploration of the ways in which our project and several other current projects can cooperate in providing data and information of use to the neuroscience community.
Recommendations for Future Directions
Our work so far suggests some future directions which neuroinformatics technology might take. The management and distribution of neurobiological data in general, and for the Human Brain Project in particular, requires that the wide variety of information be made available in a consistent and reliable format to a wide range of groups, such as neuroscientists, medical doctors, and students, for multiple purposes. The potential of the Internet to provide a reliable means of sharing the Human Brain Project generated data is enormous. The Human Brain Project and its constituent programs can be viewed as the beginning of the development of a digital library for human brain research. We recommend that in the next phase of the Human Brain Project considerations should be made for the interoperability and the integration of all the projects under the umbrella of a Human Brain Project Digital Library.
Efforts to develop digital libraries have already begun (Fox et al., 1995). The development of scientific Human Brain Project related digital libraries (data repositories) can be expected to trigger the development of information retrieval and other tools that provide not just access to published neurobiology related works, but to the information resources (the data and metadata) upon which published works are built. Information systems must provide multiple forms of access to research data, but they must also tie data to additional information resources that typically reside outside scientific databases in the form of overviews describing the origins and contexts of the research, and metadata (i.e., data about the data) indicating the appropriate uses of the data and its availability or distribution.
The addition of metadata, and other information resources on the Human Brain Project data adds value to the data and makes interoperability easier. Metadata standards provide specialist users with information about such things as the content, format, availability, and appropriate uses of data. Data transfer standards facilitate the exchange of data over wide area networks. Communication protocols for networked information retrieval, such as ANSI/NISO Z39.50 (NISO 1992), enable users to query a remote data base over a network. Therefore, we recommend that interdisciplinary and inter-agency efforts should be attempted to develop a Human Brain Project Digital Library, the benefits of which will be shared by those involved in neuroscience research and education.
Obtaining Further Information
Many of the plans described above will have been carried out by the time this chapter is in print. For the latest information regarding the GENESIS Human Brain Project neural database, you may use the URL http://www.bbb.caltech.edu/hbp/database.html to access our World Wide Web server.
Inquiries concerning GENESIS may be addressed by email to firstname.lastname@example.org. We also have a World Wide Web server to provide further information about GENESIS. This will allow you to see "snapshots" of GENESIS simulations and find information about research which has been conducted with GENESIS. The server may be accessed using the URL, http://www.bbb.caltech.edu/GENESIS.
The prototype for the top level interface to the database. Several levels of contexts for the keyword "purkinje" are displayed, ranging from behavioral studies of the animal to subcellular properties of the Purkinje cell, plus pertinent references to the literature.
The tutorial on Purkinje cell channels, accessible from either the top level interface or the Purkinje cell tutorial, provides a basis for queries regarding channel properties. The detailed cell model shown (De Schutter & Bower 1994a,b) uses 4550 compartments to model the dendrites and dendritic spines, and contains 8021 ion channels which fall into 10 categories. The tool box at the lower left provides a way to import and export other data, and to make comparisons with other objects in the database.
Predictions of in vivo physiological responses to applied stimuli are accessed from the Purkinje cell tutorial. Both simulated and experimentally obtained measurements of PST histograms, spike trains, and other data may be viewed.
The result of a query for experimental results showing the response of the cerebellar cortex of a rat to facial stimulation. The ipsilateral upper lip area (IUL) was selected, and the PST histogram displayed is for recordings at the corresponding somatotopically mapped area of crus IIa.
The graphical interface for queries regarding Purkinje cells at the systems level. Selecting particular regions in the diagram of the major sensory pathways to the cerebellum allows one to see both simulated and experimental data for primary and secondary receptive fields under various conditions.
A tutorial on the cerebellar Purkinje cell is used as an entry point to queries on Purkinje cells when using the GENESIS Simulator-based Neuronal Database. Here, the user has selected the Voltage Clamp tutorial. In addition to showing both simulated and experimental voltage clamp results, this tutorial provides a list of suitable keywords for additional searches. For example, "channels" will allow the user to extract information regarding the voltage dependent channels which are found in Purkinje cells. The detailed cell model (De Schutter & Bower 1994a,b) shows a false color representation of the membrane potential throughout the cell, as a result of a previously performed current clamp simulation.
ANSI/NISO Z39.50-1992 (1992). American national standard information retrieval application service definition and protocol specification for open systems interconnection. Bethesda, MD: NISO Press.
Beeman, D. (1994). Simulation-based tutorials for education in computational neuroscience. In F. H. Eeckman (Ed.), Computation in Neurons and Neural Systems (pp. 65-70). Boston: Kluwer Academic Publishers.
Bhalla, U. S. (1994). Advanced XODUS techniques: Simulation visualization. In J. M. Bower & D. Beeman (Eds.), The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System, chapter 20 (pp. 337-362). New York: Springer-Verlag.
Bhalla, U. S. & Bower, J. M. (1993). Exploring parameter space in detailed single neuron models: Simulations of the mitral and granule cells of the olfactory bulb. J. Neurophysiol., 69, 1948-1965.
Bower, J. (1991). Exploring biological neural networks using realistic computer simulations. Naval Research Reviews, 43, 17-22.
Bower, J. & Hale, J. (1991). Exploring neuronal circuits on graphics workstations. Scientific Computing and Automation, March 1991, 35-45.
Bower, J. M. (1992). Modeling the nervous system. Trends Neurosci., 15, 411-412.
Bower, J. M. & Beeman, D. (1994). The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System. New York: Springer-Verlag.
Bower, J. M. & Koch, C. (1992). Experimentalists and modelers: Can we all just get along? Trends Neurosci., 15, 458-461.
De Schutter, E. & Bower, J. M. (1994a). An active membrane model of the cerebellar Purkinje cell I. Simulation of current clamps in slice. J. Neurophysiol., 71, 375-400.
De Schutter, E. & Bower, J. M. (1994b). An active membrane model of the cerebellar Purkinje cell II. Simulation of synaptic responses. J. Neurophysiol., 71, 401-419.
Eeckman, F. H. & Bower, J. M., Eds. (1993). Computation and Neural Systems. Boston: Kluwer Academic Publishers.
Feldman, J. (1995). Universal high performance computing - we have just begun. http://www.icsi.berkeley.edu/Sather/ps/universal.ps.gz.
Fidel, R. & Efthimiadis, E. N. (1995). Terminological knowledge structure for intermediary expert systems. Information Processing and Management, 31, 15-27.
Fox, E., Akscyn, R. M., Furuta, R., & Leggett, J. (1995). Digital libraries. Communications of the ACM, 38(4), 22-28.
Grossman, R. & Qin, X. (1993). PTool: A Software Tool for Working with Persistent Data. Technical Report 93-5, Laboratory for Advanced Computing, University of Illinois at Chicago.
Hasselmo, M. E. & Bower, J. M. (1993). Acetylcholine and memory. Trends Neurosci., 16, 218-222.
Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M. M., & Gatford, M. (1995). Okapi at TREC-3. In The Third Text Retrieval Conference (TREC-3) (pp. 109-126). NIST Special Publication 500-225, CODEN: NSPUE2.
Segev, I., Fleshman, J. W., & Burke, R. E. (1989). Compartmental models of complex neurons. In C. Koch & I. Segev (Eds.), Methods in Neuronal Modeling, chapter 3 (pp. 63-96). Cambridge, MA: MIT Press.
Thompson, J. H. & Bower, J. M. (1993). Electrophysiological dissection of the three excitatory inputs to cerebellar Purkinje cells. In J. M. Bower & F. H. Eeckman (Eds.), Computation and Neural Systems (pp. 349-354). Boston: Kluwer Academic Publishers.
Uhley, J., Wilson, M., Bhalla, U., & Bower, J. (1990). A UNIX, X-windows-based neural network simulation system. In USENIX '90 Conference Proceedings.
Ullman, J. (1988). Principles of Database and Knowledge-base Systems. Computer Science Press.
van Herwijnen, E. (1990). Practical SGML. Boston: Kluwer Academic Publishers.
Wilson, M. & Bower, J. (1991). A computer simulation of oscillatory behavior in primary visual cerebral cortex. Neural Computation, 3, 498-509.
Wilson, M. & Bower, J. M. (1992). Cortical oscillations and temporal interactions in a computer simulation of piriform cortex. J. Neurophysiol., 67, 981-995.