A P Francisco
Code and projects
A graph-based framework to prioritize regulatory players
involved in transcriptional responses within the regulatory network
of an organism, whereby every regulatory path containing genes of
interest is explored and incorporated into the analysis. This tool has been
integrated into both YEASTRACT and the regulatory
snapshots prototype. Check our paper for
algorithmic details and experimental results.
A prototype and yeast related files are available
for evaluation purposes. Source code is available upon request.
Assuming that you have a Java environment available and that all
files in the archive above are in your current directory, you can run
our prototype as follows:
[aplf@darkstar ~]$ java -cp yrank.jar com.yeastract.rank.core.RankIt2 \
0 0 0 1 \
Parameters are as follows:
- ORF to name mapping, only required if vertices in the network
file are named with both ORF and gene names. If you do not need it, then
just use null as parameter.
- Gene association file is used for extracting gene annotations.
If you do not need it or do not have it available, then just use
null as parameter.
- Network parameter allows the user to select the network under
study. In the example above, we use the regulatory network obtained from
the YEASTRACT repository.
Each line in the network file should describe a regulation as
tab-separated fields:
TF_Gene Gene_Target W
The weight W may be omitted, in which case it takes the default
value 1.0. Note that TFs should be identified by their encoding gene.
- Edge weights definition is available for the yeast regulatory
network based on regulation evidence, which can be direct, indirect or
undefined. The first three weights are multiplied by the values in the last
three columns in the network file, respectively, and the fourth weight
is added to the result. In the example, since the first three weights are 0,
i.e., the nature of the evidence is ignored, each edge in the network takes
a weight of 1.0, the value of the fourth weight parameter.
- Heat diffusion coefficient, or thermal conductivity, is a
parameter of the heat kernel ranking algorithm that controls the
diffusion rate (0.25 in the example above). A large value
causes heat to diffuse rapidly, giving preference to regulators located
farther away in the network. Conversely, a small value promotes a slow
diffusion, thus favoring nearby TFs.
- #Iterations is the number of steps used in the discrete
approximation of the heat kernel. For a heat diffusion coefficient lower
than 1, 100 steps, as in the example, ensure an error below 0.005.
- Normalization allows the user to attenuate the bias toward
genes with a large number of regulatory associations. The final ranking
can be refined by degree-normalizing the scores, using as normalization
1 + out-weight (option
1 + in-weight (option
1 + weight sum (option
log(1 + out-weight) (option
log(1 + in-weight) (option
or log(1 + weight sum) (option
For no normalization, use option
none, as in the
example. When applied to the original network graph, these will mitigate
the effect of TFs regulating many genes, genes regulated by many TFs or
- Visualization, you can include the
in the class path and pass
true as parameter to view
the subnetwork where ranked TFs and genes are highlighted.
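To make the roles of the heat diffusion coefficient, #Iterations and normalization parameters more concrete, here is a minimal Java sketch of a generic discrete heat diffusion on a weighted directed graph. It is an illustration under stated assumptions, not the code of the prototype: the class and method names (HeatDiffusionSketch, diffuse, normalizeByOutWeight) are hypothetical, the per-step update (each node passes a fraction coefficient/#Iterations of its heat along its outgoing edges, proportionally to edge weight) is one common discretization and may differ from the one described in the paper, and the direction in which heat flows along regulations is left to the caller.

```java
import java.util.Arrays;

final class HeatDiffusionSketch {

    /**
     * Runs `steps` explicit diffusion steps (assumed scheme, see lead-in).
     * weights[i][j] > 0 means heat can flow from node i to node j.
     * Requires alpha <= steps so that no node sends more heat than it holds.
     */
    static double[] diffuse(double[][] weights, double[] initialHeat,
                            double alpha, int steps) {
        if (steps <= 0 || alpha < 0 || alpha > steps) {
            throw new IllegalArgumentException("need 0 <= alpha <= steps, steps > 0");
        }
        int n = initialHeat.length;
        double rate = alpha / steps; // fraction of heat leaving a node per step

        // Total outgoing weight of each node.
        double[] outWeight = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                outWeight[i] += weights[i][j];
            }
        }

        double[] heat = initialHeat.clone();
        for (int s = 0; s < steps; s++) {
            double[] next = heat.clone();
            for (int i = 0; i < n; i++) {
                if (outWeight[i] == 0.0) {
                    continue; // nodes without outgoing edges keep their heat
                }
                double leaving = rate * heat[i];
                next[i] -= leaving;
                for (int j = 0; j < n; j++) {
                    if (weights[i][j] != 0.0) {
                        next[j] += leaving * weights[i][j] / outWeight[i];
                    }
                }
            }
            heat = next;
        }
        return heat;
    }

    /** Divides each score by 1 + out-weight, one of the normalizations listed above. */
    static double[] normalizeByOutWeight(double[][] weights, double[] scores) {
        double[] result = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            double out = 0.0;
            for (double w : weights[i]) {
                out += w;
            }
            result[i] = scores[i] / (1.0 + out);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical chain 0 -> 1 -> 2 with unit weights; all heat starts at node 0.
        double[][] w = {{0, 1, 0}, {0, 0, 1}, {0, 0, 0}};
        double[] start = {1.0, 0.0, 0.0};
        System.out.println(Arrays.toString(diffuse(w, start, 0.25, 100)));
        System.out.println(Arrays.toString(diffuse(w, start, 3.0, 100)));
    }
}
```

On this toy chain, a coefficient of 0.25 leaves more heat on the node one step away from the source than on the node two steps away, while a coefficient of 3.0 reverses that order, which matches the qualitative behavior described for the heat diffusion coefficient. Note that in this simplified model the last node has no outgoing edges and therefore only accumulates heat.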
We also provide two input examples in the archive above,
where the latter includes expression weights for each gene in the
Please let us know if you need help with anything.
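For readers preparing their own input, the network file format and the edge weights parameter described in the parameter list can be sketched in Java as follows. This is a minimal illustration, not the prototype's parser: the class and method names (NetworkFileSketch, Regulation, parseLine, edgeWeight) and the gene names TF_A and GENE_B are hypothetical, and the sketch assumes that the three evidence values are given in the order direct, indirect, undefined.

```java
final class NetworkFileSketch {

    /** One regulation: the gene encoding the TF, the target gene, and the edge weight. */
    static final class Regulation {
        final String tfGene;
        final String targetGene;
        final double weight;

        Regulation(String tfGene, String targetGene, double weight) {
            this.tfGene = tfGene;
            this.targetGene = targetGene;
            this.weight = weight;
        }
    }

    /** Parses a tab-separated "TF_Gene Gene_Target W" line; W defaults to 1.0 when omitted. */
    static Regulation parseLine(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 2) {
            throw new IllegalArgumentException(
                    "Expected at least TF_Gene and Gene_Target: " + line);
        }
        double weight = cols.length > 2 ? Double.parseDouble(cols[2].trim()) : 1.0;
        return new Regulation(cols[0].trim(), cols[1].trim(), weight);
    }

    /**
     * Combines the four edge weight parameters with the three evidence values:
     * the first three parameters multiply the evidence values (assumed order:
     * direct, indirect, undefined) and the fourth parameter is added.
     */
    static double edgeWeight(double[] params, double direct, double indirect,
                             double undefined) {
        if (params.length != 4) {
            throw new IllegalArgumentException("Expected exactly four weight parameters");
        }
        return params[0] * direct + params[1] * indirect
                + params[2] * undefined + params[3];
    }

    public static void main(String[] args) {
        // Hypothetical regulation without an explicit weight column.
        Regulation r = parseLine("TF_A\tGENE_B");
        System.out.println(r.tfGene + " -> " + r.targetGene + " (" + r.weight + ")");
        // Parameters 0 0 0 1 ignore the evidence values entirely.
        System.out.println(edgeWeight(new double[] {0, 0, 0, 1}, 1, 0, 0));
    }
}
```

With the parameters 0 0 0 1, every edge receives the weight 1.0 regardless of its evidence values, as stated for the example command line.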