Network C libraries - RGraph
Source code of the C libraries for a variety of network calculations, including community finding using simulated annealing, network cartography, and link and network reliability.
Description
This package includes the RGraph libraries, the C libraries for complex network analysis developed mostly by Roger Guimera, with collaboration from the SEES:lab and the Amaral Lab. Some executables, built from the libraries, are also included.
For the libraries to compile, you will NEED TO INSTALL, first the GNU Scientific Libraries (GSL), available from http://www.gnu.org/software/gsl/
In a Unix-like system, you can install the RGraph libraries and the executables by uncompressing the tarball (tar -xzvf rgraph-x.x.x.tar.gz) and running the usual stuff from the rgraph-version directory:
cd rgraph-version
./configure
./make
./make install
(In a Windows system, you will first need to install some sort of "Unix emulation." we have successfully compiled the libraries using either Cygwin or MinGW.)
This will install the libraries in your_default_lib_directory/rgraph and the executables in your_default_bin_directory. To install in a different directory run
./configure --prefix=path_to_install_directory
instead of just "./configure". For other configure options run:
./configure -h
You can uninstall the whole thing by running
./make uninstall
from the installation directory.
You can also test that everything is working by running
./make check
from the installation directory.
librgraph
librgraph is the library itself. You can use it to build your own network analysis programs. Sorry, as of now no documentation is available, but you may want to take a look at the header files and try to figure things out.
netcarto
Given a network, the program netcarto identifies modules---i.e. densely connected groups of nodes in the network---and classifies nodes according to their roles (see references below for precise definitions).
In case you use the results of the program in a publication, please cite the following papers:
Guimera, R. & Amaral, L.A.N., Functional cartography of complex metabolic networks, Nature 433, 895-900 (2005).
Guimera, R. & Amaral, L.A.N., Cartography of complex networks: modules and universal roles, J. Stat. Mech.-Theory Exp., art. no. P02001 (2005).
For the comparison to randomized networks, please cite:
Guimera, R., Sales-Pardo, M. & Amaral, L.A.N., Modularity from fluctuations in random graphs and complex networks, Phys. Rev. E 70, art. no. 025101 (2004).
Input parameters
To run the program, type netcarto.exe in the command line or double-click on the icon in Windows. The program will prompt for the following parameters:
-
Seed for the random number generator: Must be a positive integer. Since the module identification algorithm is stochastic, different runs will yield, in general, slightly different different modules. Two runs with the same seed, though, should give the exact same results.
-
Name of the network file: Name of the file that contains the network. The file must be a list of links with the same format as the test network provided below.
-
Iteration factor (f): At each temperature of the simulated annealing (SA), the program performs fN^2 individual-node updates (involving the movement of a single node from one module to another) and fN collective updates (involving the merging of two modules and the split of a module). The number "f" is the iteration factor. Large values of f (1 or larger) will result, in general, in better results (higher modularities) and longer execution times. The recommended range for f is [0.1, 1], although smaller values may be needed for large and/or dense networks. Note, also, that a minimum number of iterations is imposed at each temperature, so that when f is very small, the minimum number will be used instead of fN^2 or fN.
-
Cooling factor (c): After the desired number of updates is done at a certain temperature T, the system is cooled down to a new temperature T'=cT, where c is the cooling factor. the cooling factor must be strictly larger than 0 and strictly smaller than 1. In general, values close to one will result in better results and longer execution times. Recommended values of the cooling factor f are [0.990, 0.999], although smaller values (0.95 or even 0.9) may be needed for large and/or dense networks.
-
Number of randomizations: After modules and roles are identified in the original network, there is the option to calculate the value of the modularity in a random graph with the same degree (connectivity) distribution as the original network. As discussed in [3], this test is necessary to establish whether the modular structure of the original network is significant or not. Calculation of the modularity for each random network will take approximately the same time as for the original network. If you do not want to run any randomization, just enter 0 here.
Program output
After entering these parameters, the algorithm will start to identify the modules in the network. As the SA proceeds, the program prints three columns on the screen, which indicate the inverse of the temperature, the modularity at that temperature, and the temperature itself, respectively. This provides you with a fast way to check if the process is too slow or, conversely, if it is fast and the accuracy can be increased. The program will stop when the modularity remains unchanged during 25 different temperatures. In general, finding a good partition requires decreasing the temperature several orders of magnitude. Thus, as a rule of thumb, if the time between one temperature and the next is larger than a second, the program will likely take days to complete (this, however, will also depend on the cooling factor). Of course, there is nothing wrong with long runs, as long as you are willing to wait!
When the SA for the original network finishes, the program calculates the role of each node almost instantly and outputs the following files:
-
network.net: a Pajek file containing the giant component of the network (for information on Pajek, visit http://vlado.fmf.uni-lj.si/pub/networks/pajek/).
-
modules.clu: a Pajek partition containing the modules as identified by the algorithm.
-
roles.clu: a Pajek partition containing the roles as identified by the algorithm.
-
modules.dat: A text file containing some basic information about the modules (can be edited with any text editor such as NotePad, or imported in Excel as a csv file). The format of the file is as follows. Each line corresponds to a different module. The first number is an ID number for the module, mostly irrelevant. The second is the number of nodes in the module. The third is the total number of links in the module, the fourth the number of within-module links, and the fifth the number of links from this module to other modules (of course, the third column must be equal to the sum of the fourth and fifth columns). Then there is a "---" and the next columns correspond to the list of nodes in the module. The last line of the file contains the value of the modularity for this partition.
-
roles.dat: A text file containing some basic information about the roles (can be edited with any text editor such as NodePad, or imported in Excel as a csv file). The format of the file is as follows. Each line corresponds to a different role. The first number is the role number. The second is the number of nodes with that role. The third is the total number of links of nodes with that role, the fourth the number of within-role links, and the fifth the number of links from this role to other roles (of course, the third column must be equal to the sum of the fourth and fifth columns). Then there is a "---" and the next columns correspond to the list of nodes with that role.
-
node_prop.dat: A text file with four columns. The first one is the number of the node. The second is the degree (number of links) of the node. The third is the participation coefficient. The fourth one is the within-module relative degree.
After all of this, the program starts to calculate the modularity for as many randomizations as you have selected, printing on the screen, as before, the inverse of the temperature, the modularity, and the temperature. Once all randomizations are done, the program creates one last file that contains the modularity of the original network, the average modularity of the randomized networks, and the standard deviation of the modularity of the randomized networks.
netcarto_cl
netcarto_cl is equivalent to netcarto, but arguments are passed directly from the command line (so that it is easier to script around the executable). Additionally, netcarto_cl allows you to fix the initial temperature. Please read the netcarto documentation to learn more about netcarto_cl.
The arguments need to be passed in the following order:
netcarto_cl net_file_name seed T_ini iteration_factor cooling_factor #_randomizations
T_ini, iteration_factor, and cooling_factor can be set to -1 to use the defaults (2/size_of_network, 1.0, and 0.995, respectively).
bipartmod
Given a bipartite network, the program returns a partition of the nodes in one of the sets of nodes (the 'actors' or the 'teams') obtained according to the algorithm described in:
Guimera, R., Sales-Pardo, M. & Amaral, L.A.N., Module identification in bipartite and directed networks, Phys. Rev. E 76, 036102 (2007)
In case you use the results of the program in a publication, please cite the paper above.
Input parameters
The program will prompt for the following parameters:
-
Seed for the random number generator: Must be a positive integer. Since the module identification algorithm is stochastic, different runs will yield, in general, slightly different different modules. Two runs with the same seed, though, should give the exact same results.
-
Name of the network file: Name of the file that contains the network. The file must be a list of links with the format of the test network provided below.
-
Iteration factor (f): At each temperature of the simulated annealing (SA), the program performs fN^2 individual-node updates (involving the movement of a single node from one module to another) and fN collective updates (involving the merging of two modules and the split of a module). The number "f" is the iteration factor. Large values of f (1 or larger) will result, in general, in better results (higher modularities) and longer execution times. The recommended range for f is [0.1, 1], although smaller values may be needed for large and/or dense networks. Note, also, that a minimum number of iterations is imposed at each temperature, so that when f is very small, the minimum number will be used instead of fN^2 or fN.
-
Cooling factor (c): After the desired number of updates is done at a certain temperature T, the system is cooled down to a new temperature T'=cT, where c is the cooling factor. the cooling factor must be strictly larger than 0 and strictly smaller than 1. In general, values close to one will result in better results and longer execution times. Recommended values of the cooling factor f are [0.990, 0.999], although smaller values (0.95 or even 0.9) may be needed for large and/or dense networks.
-
Invert (0 or 1): If 0 (conversely, 1), the program will identify modules in the first (second) set of nodes, that is, the first (second) column in the input file.
Program output
The program returns a file modules_bipart.dat, which is formally identical to the modules.dat described above for netcarto.
reliability
Given a network observation, the programs in reliability:
1) reliability_links: evaluate the reliability of links
2) reliability_reconstruct: reconstruct the network
In case you use the results of the program in a publication, please cite the following paper:
Guimera, R. & Sales-Pardo, M., Missing and spurious interactions and the reconstruction of complex networks, Proc. Natl. Acad. Sci. USA 106, 22073-22078 (2009).
Input parameters
The programs take two arguments:
-
Name of the network file: Name of the file that contains the network. The file must be a list of links with the format of the test network provided below.
-
Seed for the random number generator: Must be a positive integer. Since the reliability algorithms are stochastic, different runs will yield, in general, slightly different different results. Two runs with the same seed, though, should give the exact same results.
Program output
The "links" program generates two files: missing.dat and spurious.dat. Each of these files has the format:
score12 n1 n2
score13 n1 n3
...
missing.dat contains all scores for links that are not observed in the network. High scores in missing.dat correspond to links that are likely to be missing.
spurious.dat contains all scores for links that are observed in the network. Low scores in spurious.dat correspond to links that are likely to be spurious.
The "reconstruct" program returns a file net_reconstructed.dat with the reconstructed network.
multiblock
Given a single-layer network observation, the programs in multiblock return the reliability of all node pairs using the model described in our PRX 2016 publication (see below):
1) reliability_links_mb: evaluate the reliability of links using the intersection of two Stochastic Block Models
2) reliability_links_mb_gibbs: evaluate the reliability of links using the intersection of two Stochastic Block Models, with a faster gibbs sampling
3) reliability_links_mb_OR: evaluate the reliability of links using the union of two Stochastic Block Models
4) reliability_links_mb_gibbs_OR: evaluate the reliability of links using the union of two Stochastic Block Models, with a faster gibbs sampling
In case you use the results of the program in a publication, please cite the following paper:
Valles-Catala, T., Massucci, F.A., Guimera, R. & Sales-Pardo, M., Multilayer stochastic block models reveal the multilayer structure of complex networks, Phys. Rev. X 6 , 011036 (2016).
Input parameters
The programs take two arguments:
-
Name of the network file: Name of the file that contains the network. The file must be a list of links with the format of the test network provided below.
-
Seed for the random number generator: Must be a positive integer. Since the reliability algorithms are stochastic, different runs will yield, in general, slightly different different results. Two runs with the same seed, though, should give the exact same results.
Program output
The program generates one file: "Name of the network file".AND_scores (OR_scores in case of the OR programs) with all scores for links that are not observed in the network. File has the format:
score12 n1 n2
score13 n1 n3
...
Utils
Additionally, a few utility programs are also compiled and installed.
-
countlinks netA: count the number of links in a network.
-
netcompare netA netA: compares two networks.
-
netprop netA: print a number of properties of a network.
-
netrandomize netA: randomize an undirected unweighted network and print result to standard output.