Some notes on computational analysis of metabolic networks and path lengths within them, from Ma H, Zeng AP, Bioinformatics 2003, 19(2):270-7. PMID: 12538249
Background: Previous work found that metabolic networks have the characteristics of small-world networks (most nodes have low connectivity, while a few have very high connectivity, like the WWW). The average path length (APL) over all metabolite pairs was about the same across 43 organisms, roughly 3.2 (short). However, that analysis allowed common intermediaries such as ATP as nodes, so going from one ATP-using reaction to another counts as a “conversion,” even though the key metabolites of the two reactions would take quite a few steps to interconvert. Ignoring the fact that many reactions are irreversible in vivo also badly breaks this network picture.
This study: They used the KEGG LIGAND database (which has COMPOUND, REACTION, and ENZYME sections) and corrected some mistakes in the KEGG information. Reaction direction is indicated on the KEGG displays, but not in the data (typical). They set a bunch of reactions as irreversible, yielding 2,000 irreversible reactions. In the graph representation, irreversible reactions are called arcs (directed) and reversible ones are called edges (undirected).
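To make the arc/edge distinction concrete, here is a minimal sketch of how such a graph could be stored. The `(substrate, product, reversible)` triple format and the name `build_graph` are my own assumptions for illustration; this is not KEGG's data format or the authors' code.

```python
from collections import defaultdict

def build_graph(reactions):
    """Build a successor map from (substrate, product, reversible) triples.

    The triple format is hypothetical. An irreversible reaction adds one
    arc; a reversible one (an "edge") is stored as two opposite arcs.
    """
    succ = defaultdict(set)
    for substrate, product, reversible in reactions:
        succ[substrate].add(product)      # arc: substrate -> product
        succ[product]                     # register product as a node
        if reversible:
            succ[product].add(substrate)  # edge: add the reverse arc too
    return dict(succ)

# Toy network: A <-> B is reversible, B -> C is irreversible.
toy_reactions = [("A", "B", True), ("B", "C", False)]
graph = build_graph(toy_reactions)
print(graph)
```

Storing an edge as a pair of arcs means everything downstream only has to deal with a directed graph.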
“Current” metabolites, which act as carriers of electrons and functional groups (ATP and the like), must be excluded, except in the cases where they actually are the key pathway constituents (so again, manual curation is necessary). They then looked at whether their network was still small world. Small-world networks of this kind have a power-law distribution of node connectivity (strictly speaking, that is the scale-free property), while random networks have a Poisson distribution. They count the incoming and outgoing connections (in-degree and out-degree) for each metabolite and find that both still follow a power-law distribution. With the current metabolites excluded, the top ten hub metabolites are pretty much the same across several organisms.
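A rough sketch of the connectivity count described above, on a made-up toy arc list. The function names, the metabolite names, and the rule of dropping any arc that touches an excluded metabolite are my assumptions; the paper curates these cases by hand, which a blanket filter like this does not capture.

```python
from collections import Counter

def degree_counts(arcs, excluded=frozenset()):
    """Count in-degree and out-degree per metabolite.

    arcs: iterable of (source, target) pairs (hypothetical format).
    Arcs touching an excluded ("current") metabolite are skipped.
    """
    in_deg, out_deg = Counter(), Counter()
    for src, dst in arcs:
        if src in excluded or dst in excluded:
            continue
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

def degree_histogram(deg):
    """Map each degree value k to the number of metabolites with that degree."""
    return Counter(deg.values())

toy_arcs = [
    ("glucose", "G6P"), ("ATP", "G6P"), ("ATP", "ADP"),
    ("G6P", "F6P"), ("F6P", "G6P"),
]
in_deg, out_deg = degree_counts(toy_arcs, excluded={"ATP", "ADP"})
print(dict(in_deg), dict(out_deg))
print(dict(degree_histogram(in_deg)))
```

On a real network, plotting the histogram on log-log axes is the usual quick check: a power law shows up as a roughly straight line, whereas a Poisson distribution peaks around the mean and falls off sharply.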
They then identified the shortest path length from one metabolite to all reachable metabolites by the ‘breadth first searching method’. Start with a metabolite M. All metabolites directly connected to M are in layer 1. All metabolites directly connected to a metabolite in layer k, but not in any earlier layer, are in layer k+1. The layer number is the path length from M to the metabolites in that layer (and the maximal path length is reached when you run out of new layers). Given irreversible reactions, the path length from A to B is not always the path length from B to A. Using glucose as an example: they were able to reach 386 metabolites from glucose (that feels like fewer than I’d expect, but that probably reflects KEGG’s limitations). The average path length from glucose is 7.68. The average path length for the whole metabolic network is 8.2. The network diameter (longest pathway length) is 23 for E. coli (based on KEGG).
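The layering procedure is ordinary breadth-first search, and a small sketch makes the asymmetry point easy to see. The toy graph, the function names, and the choice to average only over reachable ordered pairs are my assumptions, not necessarily how the authors computed their numbers.

```python
from collections import deque

def bfs_path_lengths(graph, start):
    """Return {metabolite: shortest path length from start} for reachable nodes.

    graph maps each node to the set of nodes it has an arc to. The
    distance assigned to a node is its layer number.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:            # not in any earlier layer
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def apl_and_diameter(graph):
    """Average path length and diameter over reachable ordered pairs only."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    total = pairs = diameter = 0
    for start in nodes:
        for target, d in bfs_path_lengths(graph, start).items():
            if target != start:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return (total / pairs if pairs else 0.0), diameter

# Toy network: A <-> B reversible, B -> C irreversible.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": set()}
print(bfs_path_lengths(toy, "A"))   # C is two steps from A
print(bfs_path_lengths(toy, "C"))   # nothing is reachable from C
print(apl_and_diameter(toy))
```

In the toy graph C is reachable from A but A is not reachable from C, which is the same directional asymmetry that irreversible reactions create in the real network.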
Average path lengths vary a lot between organisms, and not exactly with network scale. Very small networks have a low APL, but that tends to be because the organisms are parasitic and many of their nodes have dropped out (where they piggyback on the parasitized organism). APL by domain: Eukarya (9.57), Archaea (8.5), Bacteria (7.22), Bacteria without parasites (7.73). Diameters, same order: 33.1, 23.4, 20.6.
They suggest that it would be interesting to find the shortcuts that contribute to shorter APL in bacteria. I’m inclined to agree.