Manuals for GEMS
This manual is intended to help users get started with the GEMS tools. GEMS users will typically only need to refer to the client section. Adding storage to an existing system requires reviewing the storage section, and creating a new system requires the server section.
The GEMS API docs are included for users wishing to access the GEMS classes directly in their application. This is particularly useful for Java and Beanshell developers who desire to gain full control of the GEMS clients. Additionally, a Java API is provided for Chirp.
GEMS client tools
Overview
The GEMS system stores datasets, specifically directory trees, called "configs", on a distributed network of file servers. The files are replicated automatically by the system. Metadata specified by the users is kept in a central database that may be queried for information about stored configs. The GEMS clients allow the user to create, view, and download configs that are stored in the system via a GUI. Tools are also included to insert, query, and retrieve configs programmatically from the shell.
Setup
You have two options: using the JAR file, or performing the client install, which installs the complete program including the scripts.
Browsing GEMS
The GEMSview tool provides a graphical way to view the records stored in GEMS. If using the JAR file method, double-click on the GEMS JAR file or type
> java -jar GEMS.jar &
or, if using the client install,
> GEMSview &
Main window
Storage Map
Each config in GEMS is associated with a Storage Map. The map indicates which Chirp hosts are eligible to store the files in the config, and allows the files to be replicated to distinct, separate clusters to improve survivability.
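As a rough illustration of how a storage map could sort hosts into clusters, the sketch below matches a hostname against the two cluster patterns of the two_universities map used in the examples in this manual. It assumes that map patterns behave like shell globs, which this manual does not state, and cluster_for is a hypothetical helper, not part of GEMS.

```shell
#!/bin/sh
# Hypothetical helper: report which cluster of the "two_universities"
# storage map a Chirp host would fall into.
# Assumption: map patterns are matched like shell globs.
cluster_for() {
    case "$1" in
        *.university1.edu) echo "university1" ;;
        *.university2.edu) echo "university2" ;;
        *)                 echo "none" ;;
    esac
}

cluster_for host1.university1.edu   # prints: university1
cluster_for host7.university2.edu   # prints: university2
cluster_for laptop.example.org      # prints: none
```

A host that matches no pattern is reported as "none" here, which mirrors the idea that only hosts named by the map are eligible to store the config.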
Access Control
Each config in GEMS is associated with an Access Control List (ACL). These ACLs are based on the ACL used in the underlying Chirp system, and are similar to AFS ACLs, for example.
Inserting into GEMS
Overview
Datasets in GEMS are called configs. Each config consists of a directory subtree of files in which the path information is maintained, similarly to tar or zip archives. Each config is associated with a set of metadata tags called params that may be searched with GEMSview or GEMSmatch. Additionally, a Chirp-formatted Access Control List (ACL) is associated with each config. This ACL is propagated to each server that hosts this config. At a minimum, GEMS will grant access to itself and to the submitting user or owner.
Technique
The standard way to insert data into GEMS is to use GEMSput, either by putting all the information on the command line or by creating a GEMSput XML document to specify the operation.
~exp01> GEMSput --owner unix:sorin \
    scientist_lname=sorin app=exp count=1 \
    --file exp.in --file exp.out \
    --acl-entry "unix:*" read \
    --acl-entry "hostname:*.university.edu" read
If using the JAR file method, the command becomes
> java -jar GEMS.jar put
but the arguments remain the same.
<GEMS>
<GEMSput>
<config owner="unix:sorin">
<acl>
<entry principal="hostname:*.university1.edu"
perms="lr" />
<entry principal="hostname:*.university2.edu"
perms="lr" />
<entry principal="unix:*" perms="lrwda" />
</acl>
<params>
<scientist_lname>Sorin</scientist_lname>
<app>exp</app>
</params>
<files>
<file path="./" name="exp.in"
type="exp input" reps="4" io="i" />
<file path="./" name="exp.out"
type="exp output" reps="2" io="o" />
</files>
<host>machine.university.edu</host>
<storage name="two_universities">
<cluster name="university1">
<pattern>*.university1.edu</pattern>
</cluster>
<cluster name="university2">
<pattern>*.university2.edu</pattern>
</cluster>
</storage>
</config>
</GEMSput>
</GEMS>
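For scripted submissions, a document like the one above can be generated instead of written by hand. The following is a minimal sketch, not a definitive template: make_gemsput_xml is a hypothetical helper, it emits only elements and attributes that appear in the example above, and whether GEMSput accepts a file entry without the type and io attributes has not been verified.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal GEMSput document for one file.
# Arguments: owner identity, scientist last name, file name, replica count.
# Values are inserted verbatim, so they must not contain XML special
# characters such as & or <.
make_gemsput_xml() {
    cat <<EOF
<GEMS>
<GEMSput>
<config owner="$1">
<params>
<scientist_lname>$2</scientist_lname>
</params>
<files>
<file path="./" name="$3" reps="$4" />
</files>
</config>
</GEMSput>
</GEMS>
EOF
}

make_gemsput_xml unix:sorin Sorin exp.in 2
```

Since GEMSput reads an XML file from stdin when given "-", the output could in principle be piped straight into it, for example: make_gemsput_xml unix:sorin Sorin exp.in 2 | GEMSput -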
GEMSput Options
| Flag | Description |
|---|---|
| -i <host> | The hostname on which the GEMS services are running. If omitted, defaults to the local machine (127.0.0.1). |
| -p <port> | The port number on which the GEMS server is listening. If omitted, defaults to 7101. |
| <file.xml> | Use this XML file. Only one such file may be submitted at a time. You may combine this file with additional command line options. |
| - | Read XML file from stdin. |
| --owner <identity> | Set the owner of this config to identity, a Chirp-formatted identity. |
| --host <host> | Recommend a target Chirp host for the initial upload. |
| --localhost | Recommend the local Chirp host for the initial upload. |
| <key>=<value> | Adds a param pair to the params list. |
| --reps <count> | All of the following files will have this replica count, up to the next --reps flag. |
| --file <file> | Adds file to the set of files to be uploaded. |
| --find | Adds all files under the current directory and subdirectories to the set of files to be uploaded. |
| --key | Print the config key number that corresponds to this new config. |
| --hosts | Print the name of the Chirp host actually used for the upload. |
| --acl-entry <principal> <perms> | Add an ACL entry. |
| --acl <file.xml> | Specify an ACL file. ACL files may be generated with the GEMSview ACL GUI shown above, or written by hand as standalone files in the format shown in the GEMSput XML example above. The top-level tag in the file must be <acl>. |
| --map <file.xml> | Specify a map file. Storage map files may be generated with the GEMSview map GUI shown above, or written by hand as standalone files in the format shown in the GEMSput XML example above. The top-level tag in the file must be <storage>. |
| --auto-params <prog> <args> | Specify an external program or script that will generate the params for this config. The program must print one param pair per line: the first word is the tag name, and the rest of the line is the value. |
| --debug | Turn on verbose debugging. |
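The output format expected from an --auto-params program (one param pair per line, tag name first, value as the rest of the line) can be illustrated with a small script. This is only a sketch: emit_params and the tag name file_count are made up for illustration, and how GEMSput passes arguments to the program is not shown here.

```shell
#!/bin/sh
# Hypothetical auto-params program: print one param pair per line,
# tag name first, value as the rest of the line.
# $1: application name, $2: directory holding the dataset.
emit_params() {
    n=0
    for f in "$2"/*; do
        # Count regular files only; skip subdirectories.
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "app $1"
    echo "file_count $n"
}

emit_params "${1:-exp}" "${2:-.}"
```

Run in a directory that holds exp.in and exp.out, this would report app exp and a file_count of 2, which GEMSput would then treat as two param pairs.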
Searching in GEMS
Overview
Once configs have been stored in GEMS, users need a way to search for them. GEMSmatch matches sets of params to config keys, which can then be used to download the files. A typical GEMSmatch may produce multiple matches, resulting in multiple config keys. The user may retrieve a full XML-formatted response, or a simple list of keys.
Technique
> GEMSmatch scientist_lname=Sorin --keys
141325
> GEMSmatch --config 141325
which produces the full XML.
<GEMS>
<GEMSmatch>
<params>
<scientist_lname>Sorin</scientist_lname>
<app>exp</app>
<count>1</count>
</params>
</GEMSmatch>
</GEMS>
<GEMS>
<GEMSmatch>
<config key="24858230" />
</GEMSmatch>
</GEMS>
<GEMS>
<GEMSmatch_Response>
<configs>
<config key="24858230" owner="unix:sorin">
<params>
<scientist_lname>Sorin</scientist_lname>
<app>exp</app>
</params>
<files>
<file path="./" name="exp.in"
type="exp input" reps="4" io="i">
<host>machine.university.edu</host>
<host>host2.university.edu</host> </file>
<file path="./" name="exp.out"
type="exp output" reps="2" io="o" >
<host>host1.university.edu</host>
<host>host2.university.edu</host> </file>
</files>
</config>
</configs>
</GEMSmatch_Response>
</GEMS>
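When only the replica locations are of interest, the host names can be pulled out of a response like the one above with standard text tools. The sketch below is a convenience under stated assumptions: list_hosts is a hypothetical helper, and it relies on each host element sitting on its own line, as in the response shown; a real XML parser would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct hosts named in a GEMSmatch
# response read from stdin. Assumes at most one <host> element per line.
list_hosts() {
    sed -n 's/.*<host>\([^<]*\)<\/host>.*/\1/p' | sort -u
}

# Sample input copied from the file entries of the response above.
list_hosts <<'EOF'
<file path="./" name="exp.in" type="exp input" reps="4" io="i">
<host>machine.university.edu</host>
<host>host2.university.edu</host> </file>
<file path="./" name="exp.out" type="exp output" reps="2" io="o" >
<host>host1.university.edu</host>
<host>host2.university.edu</host> </file>
EOF
```

Each host is printed once, in sorted order, even when it holds replicas of several files.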
GEMSmatch Options
| Flag | Description |
|---|---|
| -i <host> | The hostname on which the GEMS services are running. If omitted, defaults to the local machine (127.0.0.1). |
| -p <port> | The port number on which the GEMS server is listening. If omitted, defaults to 7101. |
| <file.xml> | Use this XML file. Only one such file may be submitted at a time. |
| - | Read XML file from stdin. |
| --config <key> | Ignore param pairs, just search for this config key. |
| <key>=<value> | Adds a param pair to the params list for searching. |
| --locate <file> | Print a valid full Chirp-formatted virtual filename for this abstract file. Abstract files are formatted: |
| --hosts | When used with --locate, print all possible host locations for this file. |
| --keys | Print the config key number that corresponds to each matching config. |
| --params | Print the params for each matching config. |
| --files | Print the files and current replica count for each matching config. |
| --acls | Print the ACL for each matching config. |
| --maps | Print the storage map for each matching config. |
| --owners | Print the owner for each matching config. |
| --first <n> | Omit the first n-1 configs. Configs are ordered by the config number. |
| --last <n> | Omit the configs after config n. Configs are ordered by the config number. |
| --debug | Turn on verbose debugging. |
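The combined effect of --first and --last can be pictured as a window over the matches once they are ordered by config number. The sketch below imitates that window on a plain list of keys; it reads n as a position in the ordered result list, which is one interpretation of the table above, and window is a hypothetical helper rather than a GEMS tool.

```shell
#!/bin/sh
# Hypothetical helper: keep positions $1 through $2 (1-based) of the
# keys read from stdin, after ordering them by config number.
window() {
    sort -n | sed -n "$1,$2p"
}

# Four matching keys, windowed as with "--first 2 --last 3".
printf '%s\n' 300 100 200 400 | window 2 3
```

With the four keys above, positions 2 and 3 of the ordered list are kept, so the keys 200 and 300 are printed.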
Retrieving data from GEMS
Overview
Once the user has obtained the config key by using GEMSmatch or GEMSview, the data files may be downloaded by using the GEMSget tool. All or some of the files may be obtained, and will be stored in the specified output directory.
Technique
~/tmp> GEMSget --config 123
This will download all the directories and files in the config and place them under ~/tmp.
GEMSget Options
| Flag | Description |
|---|---|
| -i <host> | The hostname on which the GEMS services are running. If omitted, defaults to the local machine (127.0.0.1). |
| -p <port> | The port number on which the GEMS server is listening. If omitted, defaults to 7101. |
| <file.xml> | Use this XML file. Only one such file may be submitted at a time. |
| - | Read XML file from stdin. |
| --config <key> | The config to download. |
| --file <file> | Download this file. Files that are not named on the command line are not downloaded. |
| --output <directory> | Specify an output directory for the downloads. |
| --auth-mode <mode> | Use this Chirp authentication mode, e.g., unix. |
| --hosts | Print the hosts used to download the files. |
| --debug | Turn on verbose debugging. |
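Because GEMSmatch can print one config key per line with --keys, searching and retrieving can be chained in a script. The following dry-run sketch only prints the GEMSget commands it would issue, one output subdirectory per config; plan_downloads is a hypothetical helper, and whether GEMSget creates a missing output directory is not confirmed here.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read config keys (one per line) from stdin
# and print the GEMSget command that would download each config into its
# own subdirectory of $1. Nothing is executed.
plan_downloads() {
    outdir=$1
    while read -r key; do
        # Skip blank lines in the key list.
        [ -n "$key" ] || continue
        echo GEMSget --config "$key" --output "$outdir/$key"
    done
}

printf '%s\n' 141325 24858230 | plan_downloads ~/tmp
```

On a machine with the client install, the key list could come from a query such as GEMSmatch scientist_lname=Sorin --keys, and removing the echo would run the downloads instead of printing them.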
Deleting data from GEMS
Overview
Once the user has obtained the config key by using GEMSmatch or GEMSview, the whole config may be deleted.
Technique
> GEMSdelete --config 123
This will delete the whole record from the database, and the files in this config will be garbage collected.
GEMSdelete Options
| Flag | Description |
|---|---|
| -i <host> | The hostname on which the GEMS services are running. If omitted, defaults to the local machine. |
| -p <port> | The port number on which the GEMS server is listening. If omitted, defaults to 7101. |
| <file.xml> | Use this XML file. Only one such file may be submitted at a time. |
| - | Read XML file from stdin. |
| --config <key> | The config to delete. |
| --auth-mode <mode> | Use this Chirp authentication mode, e.g., unix. |
| --debug | Turn on verbose debugging. |
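Since a delete removes the whole config, it can be useful to spell out the target server rather than rely on the defaults. The sketch below prints a GEMSdelete command line with the host and port made explicit. GEMS_HOST and GEMS_PORT are made-up environment variables, delete_cmd is a hypothetical helper, and the fallback values are the defaults listed in the option tables in this manual.

```shell
#!/bin/sh
# Hypothetical wrapper: print a GEMSdelete command line with the server
# host and port filled in. GEMS_HOST and GEMS_PORT are illustrative
# environment variables, not settings that GEMS itself reads.
delete_cmd() {
    echo GEMSdelete -i "${GEMS_HOST:-127.0.0.1}" -p "${GEMS_PORT:-7101}" --config "$1"
}

delete_cmd 123
```

Printing the command first gives a chance to check the server and the config key before anything is removed.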
Using a .gemsclient file
Overview
Using a .gemsclient file greatly simplifies many GEMS client operations. Create an XML file like the one shown below, name it .gemsclient, and place it in your home directory. Windows users should put this file in their "Documents and Settings\<username>" directory.
<GEMS>
<GEMSclient>
<chirp>
<user>unix:sorin</user>
<user>hostname:sorin.university.edu</user>
</chirp>
<acl>
<entry principal="*.university.edu" perms="lr" />
</acl>
<storage name="two_universities">
<cluster name="university1">
<pattern>*.university1.edu</pattern>
</cluster>
<cluster name="university2">
<pattern>*.university2.edu</pattern>
</cluster>
</storage>
<servers>
<host>gems.university.edu</host>
</servers>
<GEMSview>
<keys>
<scientist_lname> Sorin </scientist_lname>
</keys>
</GEMSview>
</GEMSclient>
</GEMS>
Providing resources for GEMS
Overview
GEMS allows storage owners to volunteer space to the system on a temporary basis by running a small server, a service that does not require root access. Storage may be revoked at any time. Additionally, GEMS ensures that your disk does not fill up with data, and actually removes GEMS data as the disk becomes full. In short, volunteering storage space to GEMS is safe, administratively easy, non-committal, and does not interfere with disk consumption by regular users.
Setup
UNIX users may volunteer space to GEMS using the method outlined below.
> chirp_server -r /tmp/chirp -u gems.university.edu
> chirp myhost.university.edu
chirp:myhost:/> mkdir /GEMS
chirp:myhost:/> setacl /GEMS hostname:gems.university.edu admin
GEMS services
Overview
The GEMS services are the center of a GEMS installation. The software consists of a Java-based GEMS daemon called GEMSd, which manages client connections and responds to queries. This service manages metadata only: the actual data files are transmitted directly from clients to the Chirp servers. The service requires a Postgres database and Java 1.5.0. It should be run entirely as a non-root user. Installation may also be performed by a non-root user by modifying the installation process below, but the steps shown assume that installation is performed by root.
Setup
~/postgres-src# ./configure --prefix=/opt/pgsql
~/postgres-src# make ; make install
~/postgres-src# chmod a+rx /opt/pgsql
~/postgres-src# chmod a+rx /opt/pgsql/*
~/postgres-src# mkdir /opt/pgsql/data
~/postgres-src# useradd postgres
~/postgres-src# chown postgres /opt/pgsql/data
~/postgres-src# su - postgres
> cd /opt/pgsql
> bin/initdb data
> bin/pg_ctl -D data start
postmaster successfully started
> tar xf GEMS-???-src.tar
> ant -Ddist.dir=/opt/GEMS dist
# useradd gems
# chmod a+rx /opt/GEMS
# chown gems /opt/GEMS/.gemsdconfig
# su - postgres
> /opt/pgsql/bin/createdb GEMSd
> /opt/pgsql/bin/psql GEMSd
GEMSd=# create user gems;
GEMSd=# grant all on database "GEMSd" to gems;
GEMSd=# \q
# su - gems
> /opt/GEMS/bin/DBcreate
> /opt/GEMS/bin/GEMSd -d -f /opt/GEMS/.gemsdconfig
GEMSd Options
| Flag | Description |
|---|---|
| -d | Daemon mode: disable the console. The console is intended for debugging only; this option restricts output to useful log messages, for example: > GEMSd -d > /opt/GEMS/gemsd.log |
| --debug | Displays verbose debugging information, including Chirp operations, SQL statements, etc. |
| -f <file> | Specifies the path to the GEMSd config file. |
| -h | Displays GEMSd options. |
GEMSd Configuration File
An example config file is shown below:
<GEMS>
<GEMSadmin>
<Catalog>
<host>catalog.university.edu:9097</host>
<refresh>5</refresh>
</Catalog>
<ChirpRoot>
/GEMS/
</ChirpRoot>
<Metadatabase>
<host>gems.university.edu</host>
<user>gems</user>
<password />
</Metadatabase>
<GEMSprincipal>
hostname:gems.university.edu
</GEMSprincipal>
<Threads>
...
</Threads>
<Groups>
<group name="A">
<host name="*.A.university.edu" />
</group>
<group name="BCd">
<host name="*.B.university.edu" />
<host name="*.C.university.edu" />
<host name="d.D.university.edu" />
</group>
</Groups>
</GEMSadmin>
</GEMS>