Make sure you have the following Python packages installed:
* TensorFlow
* NumPy
* scikit-learn
* Matplotlib
* WebLogo
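
To confirm that the packages above are available before running anything, a quick check along these lines may help. This is a minimal sketch: the import names in `REQUIRED_MODULES` are assumptions (for example, scikit-learn is imported as `sklearn`, and the WebLogo import name may differ between versions), and `missing_modules` is a hypothetical helper that is not part of this repository.

```python
import importlib.util

# Assumed import names; these can differ from the package names listed above.
REQUIRED_MODULES = ["tensorflow", "numpy", "sklearn", "matplotlib", "weblogo"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Any name reported as missing can then be installed with your package manager of choice.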
# Getting started with biological data
## Pre-processing
In Supp-C.txt and Supp-D.txt you will find the positive (interacting) and negative (non-interacting) samples, respectively. These are the files provided by Pan et al. [1].
Run the script create_data.py to convert these .txt files into separate TFRecords, which the model uses as input. Two folders should appear: one named Fasta, containing human-readable versions of the individual sequences, and one named Records, containing the binary TFRecords. Additionally, the training and test sets are created as .txt files. These files contain entries of the format 'seq_id_1 seq_id_2 label', where the sequence ids correspond to TFRecord files in the Records directory.
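
The 'seq_id_1 seq_id_2 label' format described above can be read with a few lines of Python. The sketch below assumes that the fields are separated by whitespace and that the label is an integer (for example 1 for interacting and 0 for non-interacting); neither detail is confirmed here, and `parse_pairs` is a hypothetical helper rather than a function from this repository.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Parse lines of the form 'seq_id_1 seq_id_2 label' into tuples.

    Assumes whitespace-separated fields and an integer label.
    Blank lines are skipped; malformed lines raise ValueError.
    """
    pairs = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        seq_id_1, seq_id_2, label = fields
        pairs.append((seq_id_1, seq_id_2, int(label)))
    return pairs
```

A file object can be passed directly, for example `with open(path) as handle: pairs = parse_pairs(handle)`, where `path` points at the generated training or test set file.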
## Running the network
After creating the TFRecords, training set, and test set, you can run 'runs.py'. If you wish, you can first edit the hyperparameter settings, which are found in the main() function. After training, a folder is created in the 'Results' directory, containing the following:
* The weights of the network
* The TensorFlow model of the network, which can be used to reload the model at a later time