Update README.md

Commit d66d7317 authored 7 years ago by Fuchs, Pim
--- a/README.md
+++ b/README.md
+# Requirements
+
+
+Make sure you have the following Python packages installed:
+* TensorFlow
+* NumPy
+* scikit-learn
+* Matplotlib
+* WebLogo
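A quick way to confirm the packages listed above are available is sketched below. The import names are assumptions, since the README lists package names rather than modules; in particular, the WebLogo module name may differ between versions.

```python
from importlib.util import find_spec

# Assumed mapping from package name to import name (not stated in the README).
REQUIRED_MODULES = {
    "TensorFlow": "tensorflow",
    "NumPy": "numpy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "WebLogo": "weblogo",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only looks up each module without importing it, so it stays fast even for large packages such as TensorFlow.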
+
+# Getting started with biological data
+
+## Pre-processing
+In Supp-C.txt and Supp-D.txt you will find the positive (interacting) and negative (non-interacting) samples, respectively. These are the files provided by Pan et al. [1].
+Run the script create_data.py to convert these .txt files into separate TFRecords, which the model uses as input. Two folders should appear: one named Fasta, containing human-readable versions of the individual sequences, and one named Records, containing the binary TFRecords. The script also writes the training and test sets as .txt files. Each line in these files has the format 'seq_id_1 seq_id_2 label', where the sequence ids correspond to TFRecord files in the Records directory.
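The 'seq_id_1 seq_id_2 label' format described above can be read with a few lines of Python. This is a minimal sketch, not the code used by create_data.py or runs.py: the function name `parse_pairs` is hypothetical, whitespace-separated fields are assumed, and the label is kept as a string because the README does not say how it is encoded.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse lines of the form 'seq_id_1 seq_id_2 label' into tuples."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        seq_id_1, seq_id_2, label = fields
        pairs.append((seq_id_1, seq_id_2, label))
    return pairs


# Made-up ids, for illustration only.
example = ["idA idB 1", "", "idC idD 0"]
print(parse_pairs(example))
```

Passing an open file object works as well, since iterating over a file yields its lines.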
+
+## Running the network
+After creating the TFRecords, training set and test set, you can run 'runs.py'. If you wish, you can first edit the hyperparameter settings, which are found in the main() function. After training, a folder is created in the 'Results' directory, containing the following:
+
+* The weights of the network
+* A TensorFlow model of the network, which can be used to reload the model later
+* The convolution filters plotted as WebLogos
+* A .txt file containing performance metrics (AUC-ROC, accuracy, specificity, precision)
+* A .txt file containing the settings used to train the model
+* A prediction heatmap that indicates which filters are relevant for classifying the samples in the test set
+* A ROC plot
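For reference, three of the performance metrics named in the list above (accuracy, specificity, precision) follow directly from the confusion-matrix counts. The sketch below is not taken from runs.py; it assumes binary labels where 1 means interacting and 0 means non-interacting, and the function name is hypothetical. AUC-ROC is left out because it needs prediction scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, specificity and precision for binary labels (1/0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Made-up labels, for illustration only.
metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(metrics)
```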
+
+
+# References
+
+[1] Pan, X. Y., Zhang, Y. N., and Shen, H. B. (2010). Large-scale prediction of human protein-protein interactions from amino acid sequence based on latent topic features. Journal of Proteome Research, 9(10):4992–5001.
\ No newline at end of file