The FoldMiner and LOCK 2 software and web pages require approximately
40MB of disk space. The vast majority of this space (38MB) consists of
example FoldMiner structural similarity searches. We recommend that
you do not delete these files until you are familiar with FoldMiner.
The installation process assumes the existence of a tmp directory on
your system. It must be located in "/". If your tmp directory is in
a different location, make a symbolic link "/tmp" that points to the
actual location (you must do this manually):
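As a minimal sketch, assuming your temporary directory actually lives at /var/tmp (a placeholder; substitute your own location), the link could be created as follows. Creating entries in "/" normally requires root privileges.

```shell
# REAL_TMP is a placeholder for the actual location of your tmp directory.
REAL_TMP=/var/tmp

# Only create the link if nothing named /tmp exists yet.
if [ -e /tmp ] || [ -L /tmp ]; then
    echo "/tmp already exists; nothing to do"
else
    ln -s "$REAL_TMP" /tmp
fi
```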
For most applications, you will want to align your query structure to a database of targets representing most or all known folds. We recommend SCOP as a source of target databases. (See the ASTRAL website for a description of the methods used to select sequence-dissimilar subsets of SCOP and target databases consisting of representatives from various levels of the SCOP hierarchy.) The web page interface assumes that only these SCOP target databases will be used, though you may, of course, use any target database at the command line.
If you choose to use SCOP target databases, you will need to download SCOP PDB files from the FoldMiner distribution website or from the ASTRAL website. You will need approximately 600MB of disk space. You may reduce the time required for LOCK 2 and FoldMiner runs somewhat by gunzipping the files in these directories, which will then require approximately 2.5GB of disk space. The computational time saved is the time required to gunzip the PDB files. The location of the directory containing the SCOP PDB files must be set in the site.defs file (see the SCOP_PDB_DB_LOC variable and explanation), which is described below.
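Before downloading, it may help to confirm that the destination file system has enough free space. The following is a minimal sketch using the standard df utility; the directory and the threshold are illustrative assumptions, not values read from the distribution.

```shell
# free_mb DIR: print the free space, in MB, on the file system holding DIR.
free_mb() {
    # -P forces one output line per file system; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

SCOP_DIR=.        # placeholder: the directory that will hold the SCOP PDB files
NEEDED_MB=2500    # roughly 2.5GB for gunzipped files; use 600 if you keep them gzipped

if [ "$(free_mb "$SCOP_DIR")" -ge "$NEEDED_MB" ]; then
    echo "enough space for the SCOP PDB files"
else
    echo "less than ${NEEDED_MB}MB free in $SCOP_DIR"
fi
```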
Because secondary structure information is obtained using the DSSP
algorithm if it is not included in PDB file headers, you may wish to
create a database of DSSP files for the SCOP PDB files. This will
require an additional 85MB of disk space if the files are gzipped, or
245MB otherwise. You may choose to create this database during the
installation process. (See the MAKE_SCOP_DSSP_DB and SCOP_PDB_DB_LOC
variables and explanations in site.defs.) DSSP files will be gzipped
by default; type:
gunzip -r <dssp files directory>
to gunzip them.
Alternatively, you may create this directory later by typing:
make MAKE_SCOP_DSSP_DB
from within the installation package directory at any time after running the "./configure" command described in the distribution installation section below, regardless of how you set the MAKE_SCOP_DSSP_DB variable in the site.defs file.
You may also wish to create a local copy of the PDB on your system. If
so, its location must be specified in site.defs (see the PDB_DB_LOC
variable and explanation). If you will use PDB structures frequently
in your searches, you may also wish to create a corresponding database
of DSSP files. You may choose to create this database during the
installation process; DSSP files will be gzipped by default.
This will require approximately 420MB of disk space, and will take some time to create. (See the PDB_DSSP_DB_LOC and MAKE_PDB_DSSP_DB
variables and explanations in site.defs.)
Alternatively, you may create this directory later by typing:
make MAKE_PDB_DSSP_DB
from within the installation package directory at any time after running the "./configure" command described in the distribution installation section below, regardless of how you set the MAKE_PDB_DSSP_DB variable in the site.defs file.
Two Perl modules (Scop.pm and Expectation.pm) are included with this distribution. We will provide updated modules on the FoldMiner distribution website as new SCOP releases become available. The modules in this distribution correspond to SCOP release 1.63. (See the SCOP_MODULES_LOC variable and explanation in site.defs to choose the installation directory for these modules.)
Download the file
http://fold.stanford.edu/distributions/FoldMiner/FoldMinerDistribution.tar.gz and
unpack it:
gunzip -c FoldMinerDistribution.tar.gz | tar xvf -
Edit the file site.defs in the FoldMinerDistribution directory to define various
directories and default parameters. Each variable in the file has
associated instructions. If you will not be installing the web
interface, you may safely ignore variables relating to web
directories.
Do NOT include any extra spaces around the equals signs.
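A quick way to catch stray spaces is to search the file for assignments with whitespace next to the equals sign. The sketch below assumes that site.defs contains one NAME=value assignment per line; the function name check_defs is made up for this illustration, and the pattern may need adjusting if your file is laid out differently.

```shell
# check_defs FILE: list (with line numbers) assignment lines that have a
# space or tab immediately before or after the equals sign.
check_defs() {
    # Matches "NAME =value", "NAME= value" and "NAME = value".
    grep -nE '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*([[:space:]]+=|=[[:space:]])' "$1"
}
```

Run "check_defs site.defs" from within the FoldMinerDistribution directory; any line it prints needs its spaces removed. No output means no such lines were found.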
Distribution installation:
Program installation may require root privileges, depending on the installation directories you entered in the site.defs file described in section II.C. Log in as root (if necessary) and cd to
the directory containing the software distribution (FoldMinerDistribution by default).
There are three major installation options:
If you wish to install LOCK 2, FoldMiner, and the web interfaces, type:
./configure
make
make install
For a customized configuration, the GNU configure package allows you
to set certain environment variables, such as CC (your desired C
compiler) and CFLAGS before you run configure.
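For example, the variables can be exported in the shell before configure is run. The compiler name and flag below are illustrative choices, not requirements of this distribution.

```shell
# Illustrative values; pick the compiler and flags appropriate for your system.
CC=gcc
CFLAGS="-O2"
export CC CFLAGS

# Run configure only when the script is present in the current directory.
if [ -x ./configure ]; then
    ./configure
fi
```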
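If you wish to install only the LOCK 2 and FoldMiner command line programs (without the web interfaces), type:
./configure
make
make install_lock2
make install_foldminer
If you wish to install only LOCK 2, type:
./configure
make
make install_lock2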
lock2 -q <queryfile> -t <targetfile>
The arguments <queryfile> and <targetfile> may be complete paths to
PDB files, PDB accession codes (e.g. 1mbd or 1seb-A, where 1seb-A
specifies chain A of 1seb), or SCOP identifiers (e.g. d1dlwa_). PDB
accession codes will work only if you have correctly entered the
location of a local copy of the PDB in the site.defs file, and SCOP
identifiers will work only if you have correctly entered the location
of a local copy of SCOP PDB files in the site.defs file (see section
II.C above).
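The three argument forms can be told apart by shape alone. The sketch below only illustrates the naming conventions described above; it is not how LOCK 2 itself parses its arguments, the function name classify_id is made up, and /data/pdb/pdb1mbd.ent is a placeholder path.

```shell
# classify_id ARG: report which of the three argument forms ARG looks like.
classify_id() {
    case "$1" in
        */*)     echo "path: $1" ;;                                # contains a slash
        ????-?)  echo "PDB id ${1%-?}, chain ${1#????-}" ;;        # e.g. 1seb-A
        d??????) echo "SCOP domain $1" ;;                          # e.g. d1dlwa_
        ????)    echo "PDB id $1" ;;                               # e.g. 1mbd
        *)       echo "unrecognized: $1" ;;
    esac
}

for arg in 1mbd 1seb-A d1dlwa_ /data/pdb/pdb1mbd.ent; do
    classify_id "$arg"
done
```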
To see all available command line options, type:
lock2 -h
lock2 -q 1mbd -t mytargets.file
The p values given in the search.out file are based on a background
score distribution that encompasses all SCOP folds. We have found that
the significance of an alignment is more accurately assessed by
considering the query structure's fold. If your query structure is a
SCOP domain, you can obtain more accurate p values using the script
"calculate_pvalues.pl". This will replace the p values in search.out;
no new files will be created. The usage is as follows:
./calculate_pvalues.pl <query's SCOP identifier or SCOP fold> <full path to
search.out>
The script looks for the LOCK 2 output file specified in the second command line argument; if you have renamed search.out, this script will still function properly.
If your query structure is not a SCOP domain but you wish to calculate
p values for a specific SCOP fold (e.g. if you know your query is a globin), you may do so by providing the SCOP
fold as an argument in place of the query's SCOP identifier. For example:
./calculate_pvalues.pl a.1 myresults/search.out
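SCOP classification strings have the form class.fold.superfamily.family, so the fold is given by the first two fields. If you know only the full classification of a related domain (a.1.1.2 is used here purely for illustration), the fold argument can be derived as in this sketch; the function name scop_fold is made up.

```shell
# scop_fold SCCS: print the class.fold prefix of a SCOP classification string.
scop_fold() {
    echo "$1" | cut -d. -f1-2
}

fold=$(scop_fold a.1.1.2)
echo "$fold"
```

The resulting value can then be passed as the first argument to calculate_pvalues.pl, as in the a.1 example above.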
The script "makePDBfile.pl" can be used to create a PDB file
containing the query structure as chain A and the target structure as
chain B. To run, type:
./makePDBfile.pl <query_target.out>
where "query_target.out" is the alignment file produced by LOCK 2 for the alignment you wish to view (see section II.C, item 1). This will produce a file of the same name with the ".out" extension replaced with a ".pdb" extension.
Load the PDB file in any viewer. We recommend displaying the alignment
as a cartoon diagram. To do this in Rasmol, enter the following commands or choose the equivalent options from the menus:
wireframe off
cartoons
color chain
FoldMiner runs a structural similarity search (using LOCK 2 to perform pairwise structural superpositions) and automatically finds a structural motif that is the basis of the similarity between the query structure and high scoring targets. Algorithmic details can be found in the reference cited at the end of this document.
If you have not already performed LOCK 2 alignments, FoldMiner will do so for you.
If you have already aligned a query protein structure to a database of
targets using LOCK 2, FoldMiner will not redo the structural
alignments. Note that you must have a search.out file (potentially renamed) containing
results for all pairwise alignments to avoid repeating the LOCK 2
alignments. This file is automatically produced each time LOCK 2 is run.
Files containing subsets of SCOP domain identifiers are installed in the "targetdb" subdirectory of the directory in which you have chosen to install the FoldMiner command line interface (specified by the variable FOLDMINERDIR in the site.defs file described in section II.C). See astral.stanford.edu for documentation on these subsets.
We find the file
astral-scopdom-seqres-gd-sel-gs-bib-25-1.63.id, which contains a set
of SCOP domains such that no two have greater than 25% sequence
identity, to be particularly useful.
./FoldMiner.pl -q <query PDB or SCOP id> -t <full path to target database file> [-r <full path to search.out file> -a <alignments directory> -x -e -exclude <exclude string> -lpg]
Query structure's PDB or SCOP id: e.g. d1dlwa_ (a SCOP id), 1dlw (a PDB id),
or 1dlw-A (a PDB id with the chain specified).
If the query structure cannot be found in the local PDB or SCOP
databases specified in site.defs,
you must supply the full path to an *uncompressed* PDB file.
Full path to the target database file.
Each line must contain either a full path to a PDB file or the SCOP or PDB
identifier for a target structure. To specify a single chain of a PDB structure,
append a dash and the chain identifier to the PDB identifier (e.g. chain A of 1dlw
becomes 1dlw-A).
Full path to search.out file (you may rename it), e.g. "./myresults/run1/search.out". If not specified, FoldMiner will use the directory supplied with the -a argument, if specified, or the current working directory otherwise. If you would like FoldMiner to run the LOCK 2 alignments, this argument should be the
desired location of the search.out file. If you specify a directory that does not exist,
it will be created. You may specify a name other than search.out, and FoldMiner will use
this name. If you have not already run LOCK 2 alignments, the alignments will be run and placed in the directory specified by the -a option, or the current working directory if the -a argument is not given.
Expectation. A value of 10 indicates that, on average, 10 false positives will be reported. If not specified, a default value of 10 is used.
x value. If not specified, the default value is 0.75. Extent to
which the query secondary structure elements' conservations
influence the structural similarity search by giving more weight to
highly conserved regions of the query structure. This value must lie
between 0 and 1, where a value of 0 turns off the motif detection. A
value of 1 (or very close to 1) may lead to undesirable behavior and
so is not recommended. More information can be found in the reference
cited at the bottom of this document.
Specifies secondary structure elements to be excluded from analysis. Value is a
comma separated list of start residues, e.g. 8,25. Optional.
Make PDB files for statistically significant alignments. Each PDB file contains the query as chain A and the target as chain B. You may also use the script "makePDBfile.pl" to make alignment PDB files using the syntax makePDBfile.pl <path to alignment file, e.g. query_target.out>
Specifies local (rather than global) alignments
Specifies that alignments should be gzipped, or, if they've already been run, that they are currently gzipped.
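As a sketch of the target database file format described above (one structure per line), the following writes a three-line file that mixes the accepted entry types. The file name and the path /data/structures/mytarget.pdb are placeholders; the two identifiers are examples used elsewhere in this document.

```shell
# Write a small target database file: one structure per line.
cat > mytargets.file <<'EOF'
d1dlwa_
1seb-A
/data/structures/mytarget.pdb
EOF

# Count the targets as a sanity check.
wc -l < mytargets.file
```

Remember that FoldMiner expects the full path to this file as the target database argument.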
To reanalyze results with different parameters, set the options as desired and supply the -a argument and, if necessary, the -r argument. FoldMiner will reanalyze the data without repeating the alignments. Note that results files may be overwritten.
When running alignments, FoldMiner produces the same output files as
described in the "LOCK 2 Output Files" section above. The "search.out" file described in that section is placed in
the location specified in FoldMiner's second command line argument. If
this directory does not exist, it will be created if its parent
directory exists.
There are several FoldMiner output files that are placed in the same directory as the search.out file:
rasmol <full path for query PDB file>
source query_motif.spt
The numerical conservation values calculated by FoldMiner for each secondary structure element. Values range from 0 to 1, where higher values indicate a greater degree of conservation of the secondary structure element's position within the query and its structural homologs. The start and end residue numbers for each secondary structure element are also given.
You may wish to run FoldMiner again on the same set of alignments by excluding certain secondary structure elements using the -exclude option. Weakly conserved secondary structure elements can be excluded altogether to attempt to improve the specificity and sensitivity of the structural similarity search (i.e. to exclude false positives and recruit additional true positives). Alternatively, if a large number of secondary structure elements are weakly conserved (and therefore unlikely to be part of a conserved structural motif), you may wish to exclude the strongly conserved secondary structure elements in order to attempt to identify a second structural motif among the remaining ones. This process is described in more detail in the reference cited at the bottom of this document.
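Building the exclude string by hand is easy for a few elements, but it can also be scripted. The sketch below ASSUMES a hypothetical input layout of three whitespace-separated columns per line (start residue, end residue, conservation value); the actual layout of FoldMiner's conservation output may differ, so adapt the column numbers before use. The function name weak_sses and the 0.5 threshold are likewise made up for illustration.

```shell
# weak_sses FILE THRESHOLD: print a comma separated list of the start
# residues of elements whose conservation value is below THRESHOLD.
# ASSUMED (hypothetical) input layout: "start end conservation" per line.
weak_sses() {
    awk -v t="$2" '
        NF >= 3 && $3 + 0 < t + 0 { list = list sep $1; sep = "," }
        END { print list }
    ' "$1"
}
```

For instance, "weak_sses myvalues.txt 0.5" (myvalues.txt being a placeholder file name) prints a list such as 8,25, which has the form expected by the -exclude option. To exclude the strongly conserved elements instead, reverse the comparison.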