################################################# Header ####################################################

"""
This code is related to the following publication: Qi Y, Martin JW, Barb AW, 
Thélot F, Yan AK, Donald BR, Oas TG. J Mol Biol. 2018 Sep 14;430(18 Pt B):3412-3426.
doi: 10.1016/j.jmb.2018.06.022. 
We recommend this be used to test the hypothesis and results of the
interdomain orientation of ZLBT-C as presented in this manuscript.

In addition to this method, our lab has developed other computational tools
that can be accessed and used for related protein studies. We encourage users
to explore further applications of these tools in structural biology. Available
at https://github.com/donaldlab.


pdb2quat 1.21
Copyright (C) 2024 Bruce Donald Lab, Duke University
pdb2quat is free software: you can redistribute it and/or modify it under the
terms of the GNU Lesser General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option) any
later version. The two parts of the license are attached below.  There are
additional restrictions imposed on the use and distribution of this open-source
code, including:
• The header from Sec. 1 must be included in any modification or extension of
the code;
• Any publications, grant applications, or patents that use pdb2quat must state
that pdb2quat was used, with a sentence such as "We used the open-source
pdb2quat software to calculate...."
• Any publications, grant applications, or patents that use pdb2quat must cite
our paper. The citations for the various different modules of our software are
described in Sec. 2.


Section 1: Source Header

pdb2quat 1.21
Copyright (C) 2024 Bruce Donald Lab, Duke University
pdb2quat is free software: you can redistribute it and/or modify it under the
terms of the GNU Lesser General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option) any
later version.  pdb2quat is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
more details <http://www.gnu.org/licenses/>.  There are additional restrictions
imposed on the use and distribution of this open-source code, including:
    (A) This header must be included in any modification or extension of the
    code.
    (B) Any publications, grant applications, or patents that use pdb2quat must
    state that pdb2quat was used, with a sentence such as "We used the
    open-source pdb2quat software to calculate...." The citation for the
    various different modules of our software, together with a complete list of
    requirements and restrictions, are found in the document 'license.pdf'
    enclosed with this distribution.

Contact Info:
Bruce Donald
Duke University
Department of Computer Science
Levine Science Research Center(LSRC)
Durham
NC 27708-0129
USA
email: www.cs.duke.edu/brd/
<signature of Bruce Donald>, Sep 4, 2024
Bruce Donald, Professor of Computer Science


Section 2: Citation Requirements:
The citation requirement for this software is:
Continuous Interdomain Orientation Distributions Reveal Components of Binding Thermodynamics.
Qi Y, Martin JW, Barb AW, Thélot F, Yan AK, Donald BR, Oas TG. 
J Mol Biol. 2018 Sep 14;430(18 Pt B):3412-3426.
doi: 10.1016/j.jmb.2018.06.022.

////////////////////////////////////////////////////////////////////////////////////////////
// pdb2quat.py
//
//  Version:           1.21
//
//
// authors:
//    initials    name              organization                email
//   ---------   --------------     ------------------------    ------------------------------
//     EHC        Edward Cheng       Duke University            edward.cheng@duke.edu
//     ACM        Allen C. McBride   Duke University            allen.mcbride@duke.edu
//
// Collaborator:
//    Lab         name              organization                email
//   ---------   ---------------    ------------------------    ------------------------------
//    Oas Lab    Prof. Terry Oas       Duke University            oas@duke.edu
//
////////////////////////////////////////////////////////////////////////////////////////////

The logic behind the frame calculation was provided by Terry Oas. The majority
of the code was written by Edward Cheng and then improved by Allen McBride.
This is original code that calculates a rotation matrix and its corresponding
quaternion from a protein structure. These quantities are helpful in
characterizing protein dynamics and flexibility.
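
As a minimal sketch of the matrix-to-quaternion step, assuming SciPy's Rotation
class and a made-up example rotation (90 degrees about z, not taken from any
structure), the conversion to a scalar-first quaternion can look like this:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Hypothetical example rotation: 90 degrees about the z axis.
rot_matrix = np.array([[0.0, -1.0, 0.0],
                       [1.0,  0.0, 0.0],
                       [0.0,  0.0, 1.0]])

# SciPy returns quaternions in scalar-last order, (x, y, z, w).
quat_xyzw = Rotation.from_matrix(rot_matrix).as_quat()

# Reorder to scalar-first (w, x, y, z), as toScalarFirst does in this script.
quat_wxyz = np.array([quat_xyzw[3], quat_xyzw[0], quat_xyzw[1], quat_xyzw[2]])

print(quat_wxyz)
```

Note that a quaternion and its negation describe the same rotation, so the
overall sign of the printed vector carries no meaning.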
"""


################################### User Guide ########################################################

"""
Prerequisites (can be installed using conda or pip):
    Biopython (https://biopython.org/)
    Numpy (https://numpy.org/)
    Scipy (https://scipy.org/)

This script writes, to standard output, two descriptions of a rotation: as an
orthonormal matrix and as a unit quaternion.  This rotation can be thought of
in a few equivalent ways:
 * As a rotation of the axes aligned with the ZLBT domain onto the axes aligned
 with the C domain frame, in the frame aligned with the ZLBT domain.
 Equivalently, the columns of the rotation matrix are simply the axes aligned
 with the C domain, in the ZLBT-aligned frame. 
 * As a mapping that takes a vector's coordinates with respect to the C domain
 frame to coordinates for this vector with respect to the ZLBT domain frame.
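
The two readings above can be checked numerically. The sketch below uses two
made-up orthonormal frames (assumptions for illustration only, not frames from
a real structure), written with their axes as columns in PDB coordinates:

```python
import numpy as np

# Hypothetical frames, written as columns in global (PDB) coordinates.
# Reference (ZLBT) frame: the global frame rotated 90 degrees about z.
reference_frame = np.array([[0.0, -1.0, 0.0],
                            [1.0,  0.0, 0.0],
                            [0.0,  0.0, 1.0]])
# Rotating (C domain) frame: the global frame rotated 90 degrees about x.
rotating_frame = np.array([[1.0, 0.0,  0.0],
                           [0.0, 0.0, -1.0],
                           [0.0, 1.0,  0.0]])

rotation_matrix = reference_frame.T @ rotating_frame

# Reading 1: column j is the j-th C domain axis expressed in the ZLBT frame.
c_x_in_reference = reference_frame.T @ rotating_frame[:, 0]
assert np.allclose(rotation_matrix[:, 0], c_x_in_reference)

# Reading 2: coordinates in the C domain frame map to ZLBT-frame coordinates.
v_in_c = np.array([1.0, 2.0, 3.0])
v_global = rotating_frame @ v_in_c
v_in_reference = reference_frame.T @ v_global
assert np.allclose(rotation_matrix @ v_in_c, v_in_reference)
```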

Previous versions of this code also described a rotation of the axes of the
ZLBT frame to the axes of the C domain frame; however, this rotation was given
with respect to the global frame defined by the coordinates of a given PDB
file. To recover the old behavior and compute the rotation with respect to the
PDB coordinate frame, one can simply reverse the order of operands in the final
calculation of the rotation matrix. That is, change the line:
    "Rotation_matrix = Reference_frame.T @ Rotating_frame"
back to its old form, 
    "Rotation_matrix = Rotating_frame @ Reference_frame.T".

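The relationship between the two forms can be illustrated with a short sketch.
The frames below are arbitrary orthonormal matrices generated with SciPy
(assumed values, not derived from a PDB file). For orthonormal frames the two
forms differ only by a change of basis, so they share the same rotation angle:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Hypothetical orthonormal frames (columns are axes in PDB coordinates).
reference_frame = Rotation.from_euler('zyx', [30, 45, 60], degrees=True).as_matrix()
rotating_frame = Rotation.from_euler('zyx', [-20, 10, 75], degrees=True).as_matrix()

new_form = reference_frame.T @ rotating_frame   # expressed in the reference frame
old_form = rotating_frame @ reference_frame.T   # expressed in the PDB frame

# The old form carries the reference axes onto the rotating axes in PDB coordinates.
assert np.allclose(old_form @ reference_frame, rotating_frame)

# The two forms are related by a change of basis into the reference frame.
assert np.allclose(reference_frame.T @ old_form @ reference_frame, new_form)

# Consequently both describe a rotation through the same angle.
angle_new = Rotation.from_matrix(new_form).magnitude()
angle_old = Rotation.from_matrix(old_form).magnitude()
assert np.isclose(angle_new, angle_old)
```
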
Usage: python pdb2quat.py FILE [--chainid CHAINID] [--ranges R0 R1 R2 R3 R4 R5 R6 R7] [--axesname STRING] [--debug]

    FILE should contain a list of PDB files, one per line

    CHAINID is, optionally, the chain identifier of the relevant domains (default is 'A')

    R0 through R7 are integers, as follows:
        R0: Start of reference domain helix II
        R1: End of reference domain helix II
        R2: Start of reference domain helix III
        R3: End of reference domain helix III
        R4: Start of rotated domain helix II
        R5: End of rotated domain helix II
        R6: Start of rotated domain helix III
        R7: End of rotated domain helix III
    If not specified, R0 through R7 have the default values 24, 37, 53, 68,
    95, 108, 112, 126.

    STRING is, optionally, the base name of a Python script to be written for
    each file which, when run in PyMol after loading the corresponding
    structure, will draw reference frame and rotated frame axes superimposed on
    that structure. For example, if STRING is 'hello' and if one of the
    structures is named 'world.pdb', then this script will write another script
    named 'hello_world.py'. The user could then load world.pdb in PyMol, then
    enter 'run hello_world.py' at the PyMol command prompt to visualize frame
    axes.
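
The sketch below mirrors the option definitions found at the bottom of this
script and shows how a PyMol script name is derived from STRING and a structure
name. The list-file name 'structures.txt' is a hypothetical placeholder, and
the command line is only an example:

```python
from argparse import ArgumentParser
from pathlib import Path

# Mirror of the options defined at the bottom of this script.
parser = ArgumentParser()
parser.add_argument("filename", type=str)
parser.add_argument("--chainid", default='A', type=str)
parser.add_argument("--ranges", default=[24, 37, 53, 68, 95, 108, 112, 126],
                    type=int, nargs=8)
parser.add_argument("--axesname", default=None, type=str)

# Hypothetical command line; 'structures.txt' is an assumed list-file name.
args = parser.parse_args(["structures.txt", "--chainid", "B", "--axesname", "hello"])

# One PyMol script name is derived per structure listed in the file.
pdbpath = Path("world.pdb")
axes_script = f"{args.axesname}_{pdbpath.stem}.py"

print(args.chainid, args.ranges, axes_script)
```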

Note: Each PDB structure should contain a two-domain protein structure.  Here,
we analyze only Helix II and Helix III of each domain because Helix I has been
shown experimentally to undergo intradomain motion (Deis, et al. Structure 22:1467 (2014)).
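
In the code below, the frame for each domain is built from two SVD fits to
backbone atoms: one to Helix II alone and one to Helices II and III together.
The following is a rough sketch of those two fits on synthetic points. The
ideal helices and all of their dimensions are assumptions made for
illustration, not values from the paper:

```python
import numpy as np

# Synthetic stand-ins for backbone atoms: two ideal helices whose axes lie
# along z, side by side in the y = 0 plane. All numbers are illustrative.
t = np.linspace(0.0, 4.0 * np.pi, 45)
helix_2 = np.column_stack((2.3 * np.cos(t), 2.3 * np.sin(t), 2.0 * t))
helix_3 = helix_2 + np.array([10.0, 0.0, 0.0])

def right_singular_vectors(points):
    # Center the points, then return the rows of V^T from the SVD.
    centered = points - points.mean(axis=0)
    return np.linalg.svd(centered)[2]

# The direction of greatest spread in helix II alone approximates its axis.
helix_2_axis = right_singular_vectors(helix_2)[0]

# The direction of least spread in helices II and III together approximates
# the normal of the plane that best contains both helices.
plane_normal = right_singular_vectors(np.vstack((helix_2, helix_3)))[2]

print(helix_2_axis, plane_normal)
```

The signs of singular vectors are arbitrary, which is why the script fixes the
direction of each axis afterwards using CA atom positions.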
"""

import numpy as np
# Import the submodule explicitly so scipy.spatial.transform is always available
import scipy.spatial.transform
from Bio.PDB import PDBParser
from argparse import ArgumentParser
from pathlib import Path

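def toScalarFirst(q):
    # Reorder a quaternion from scalar-last (x, y, z, w) to scalar-first (w, x, y, z)
    return np.array([q[3], q[0], q[1], q[2]])

def toScalarLast(q):
    # Reorder a quaternion from scalar-first (w, x, y, z) to scalar-last (x, y, z, w)
    return np.array([q[1], q[2], q[3], q[0]])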

def pdbToRot(pdbFilename, ranges, chainid, axesFilename = None, debug = False):
    
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure('structure', pdbFilename)
    model = structure[0]  # assuming first model
    chain = model[chainid]

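    # Define a right-handed coordinate frame for one domain: z' lies along the
    # helix II axis (V2), y' is perpendicular to both z' and V23, and x' = y' x z'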
    def define_coordinate_frame(V2, V23, Cterm_II, Nterm_III, Cterm_III):

        towardNtermOfIII = chain[Nterm_III]["CA"].get_coord() - chain[Cterm_III]["CA"].get_coord()
        if np.dot(V2, towardNtermOfIII) >= 0:
            z_prime = V2
        else:
            z_prime = -V2

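        towardIIfromIII = chain[Cterm_II]["CA"].get_coord() - chain[Nterm_III]["CA"].get_coord()
        # V2 and V23 come from separate fits and need not be exactly perpendicular,
        # so normalize the cross product to keep the frame orthonormal
        zp_cross_V23 = np.cross(z_prime, V23)
        zp_cross_V23 = zp_cross_V23 / np.linalg.norm(zp_cross_V23)
        if np.dot(zp_cross_V23, towardIIfromIII) >= 0:
            y_prime = zp_cross_V23
        else:
            y_prime = -zp_cross_V23

        x_prime = np.cross(y_prime, z_prime)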

        primeFrame = np.column_stack((x_prime, y_prime, z_prime))
        coords = np.column_stack((chain[Nterm_III]["CA"].get_coord(), chain[Cterm_III]["CA"].get_coord()))
        coordsInPrimeFrame = primeFrame.T @ coords
        assert coordsInPrimeFrame[2, 0] >= coordsInPrimeFrame[2, 1]

        return x_prime, y_prime, z_prime

    def writePymolAxes(stream, label, axisSuffix, color, center, x, y, z):
        def axisEndpoint(axis):
            length = 20
            return length * axis + center

        def oneCyl(axis, col):
            colsubstr = f'{col[0]}, {col[1]}, {col[2]}, '
            coords = np.concatenate((center, axisEndpoint(axis)))
            return f'CYLINDER, {np.array2string(coords, separator = ", ").rstrip("]").lstrip("[")}, 0.3, {colsubstr} {colsubstr}'

        print(f'axisobj_{label} = [{oneCyl(x, color)} {oneCyl(y, color)} {oneCyl(z, color)}]', file = stream)
        print(f'cyl_text(axisobj_{label}, plain, {np.array2string(axisEndpoint(x), separator = ", ")}, \'x{axisSuffix}\', 0.2, color={color}, axes=[[3,0,0],[0,3,0],[0,0,3]])', file = stream)
        print(f'cyl_text(axisobj_{label}, plain, {np.array2string(axisEndpoint(y), separator = ", ")}, \'y{axisSuffix}\', 0.2, color={color}, axes=[[3,0,0],[0,3,0],[0,0,3]])', file = stream)
        print(f'cyl_text(axisobj_{label}, plain, {np.array2string(axisEndpoint(z), separator = ", ")}, \'z{axisSuffix}\', 0.2, color={color}, axes=[[3,0,0],[0,3,0],[0,0,3]])', file = stream)
        print(f'cmd.load_cgo(axisobj_{label}, \'{label}\')', file = stream)

    # Function to extract the coordinates of the backbone atoms
    def extract_backbone_coords(start_residue, end_residue):
        coords = []
        for i in range(start_residue, end_residue + 1):
            residue = chain[i]
            for atom_name in ["N", "CA", "C"]:
                coords.append(residue[atom_name].get_coord())
        return np.array(coords)

    # Extract the backbone atom coordinates for the reference and rotating domains
    # Reference H2
    Reference_H2_coords = extract_backbone_coords(ranges[0], ranges[1]) 
    # Reference H3
    Reference_H3_coords = extract_backbone_coords(ranges[2], ranges[3])

    # Rotating H2
    Rotating_H2_coords = extract_backbone_coords(ranges[4], ranges[5])
    # Rotating H3
    Rotating_H3_coords = extract_backbone_coords(ranges[6], ranges[7])

    # Combine H2 and H3 together
    Reference_H2_H3_coords = np.vstack((Reference_H2_coords, Reference_H3_coords))
    Rotating_H2_H3_coords = np.vstack((Rotating_H2_coords, Rotating_H3_coords))

    # Centering the coordinates by subtracting the mean 
    Reference_H2_H3_centered = Reference_H2_H3_coords - np.mean(Reference_H2_H3_coords, axis=0)
    Rotating_H2_H3_centered = Rotating_H2_H3_coords - np.mean(Rotating_H2_H3_coords, axis=0)

    # Perform SVD
    Re_23_U, Re_23_W, Re_23_VT = np.linalg.svd(Reference_H2_H3_centered)
    Ro_23_U, Ro_23_W, Ro_23_VT = np.linalg.svd(Rotating_H2_H3_centered)

    if debug:
        print("Re_23_VT", Re_23_VT)
        print("Ro_23_VT", Ro_23_VT)

    # Extract the third column of V (direction of least variance), which is the
    # normal to the best-fit plane through helices II and III
    Re_VT_H23 = Re_23_VT.T[:, 2]
    Ro_VT_H23 = Ro_23_VT.T[:, 2]

    # Centering the coordinates by subtracting the mean (H2 only)
    Reference_H2_centered = Reference_H2_coords - np.mean(Reference_H2_coords, axis=0)
    Rotating_H2_centered = Rotating_H2_coords - np.mean(Rotating_H2_coords, axis=0)

    # Perform SVD
    Re_2_U, Re_2_W, Re_2_VT = np.linalg.svd(Reference_H2_centered)
    Ro_2_U, Ro_2_W, Ro_2_VT = np.linalg.svd(Rotating_H2_centered)

    if debug:
        print("Re_2_VT", Re_2_VT)
        print("Ro_2_VT", Ro_2_VT)

    # Extract the first column of V (direction of greatest variance), which
    # approximates the helix II axis
    Re_VT_H2 = Re_2_VT.T[:, 0]
    Ro_VT_H2 = Ro_2_VT.T[:, 0]

    if debug:
        print("Re_VT_H2", Re_VT_H2)
        print("Ro_VT_H2", Ro_VT_H2)

    # Calculate the new coordinate frame axes for the domains
    Reference_x_prime, Reference_y_prime, Reference_z_prime = define_coordinate_frame(Re_VT_H2, Re_VT_H23, ranges[1], ranges[2], ranges[3])
    Rotating_x_prime, Rotating_y_prime, Rotating_z_prime = define_coordinate_frame(Ro_VT_H2, Ro_VT_H23, ranges[5], ranges[6], ranges[7])

    if axesFilename:
        with open(axesFilename, mode = 'w') as axesFile:
            print('from pymol.cgo import *', file = axesFile)
            print('from pymol import cmd', file = axesFile)
            print('from pymol.vfont import plain', file = axesFile)
            writePymolAxes(axesFile, 'Reference', '', [1, 1, 1], np.mean(Reference_H2_H3_coords, axis=0), Reference_x_prime, Reference_y_prime, Reference_z_prime)
            writePymolAxes(axesFile, 'Rotating', 'p', [0.5, 0, 0], np.mean(Reference_H2_H3_coords, axis=0), Rotating_x_prime, Rotating_y_prime, Rotating_z_prime)
            writePymolAxes(axesFile, 'Rotating2', 'p', [1, 0, 0], np.mean(Rotating_H2_H3_coords, axis=0), Rotating_x_prime, Rotating_y_prime, Rotating_z_prime)

    if debug:
        print("Reference domain frame:")
        print("X' axis:", Reference_x_prime)
        print("Y' axis:", Reference_y_prime)
        print("Z' axis:", Reference_z_prime)

        print("\nRotating domain frame:")
        print("X' axis:", Rotating_x_prime)
        print("Y' axis:", Rotating_y_prime)
        print("Z' axis:", Rotating_z_prime)

    # Using the coordinate frames, calculate rotation matrix from the reference
    # coordinate system to the rotating coordinate system, in the reference
    # coordinate system frame.

    Reference_frame = np.vstack([Reference_x_prime, Reference_y_prime, Reference_z_prime]).T
    Rotating_frame = np.vstack([Rotating_x_prime, Rotating_y_prime, Rotating_z_prime]).T

    Rotation_matrix = Reference_frame.T @ Rotating_frame

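    if debug:
        # The matrix is expressed in the reference frame, so it is applied on the right
        print("Reference frame, rotated by computed matrix (expected to match the rotating frame):")
        print(Reference_frame @ Rotation_matrix)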

    return Rotation_matrix, Reference_frame

if __name__ == '__main__':

    parser = ArgumentParser()
    parser.add_argument("filename", type=str)
    parser.add_argument("--chainid", default='A', type=str, required=False)
    parser.add_argument("--ranges", default=[24, 37, 53, 68, 95, 108, 112, 126], type=int, nargs=8, required=False)
    parser.add_argument("--axesname", default=None, type=str)
    parser.add_argument("--debug", default=False, action='store_true')
    args = parser.parse_args()

    with open(args.filename) as pdblist:
        for pdbnameline in pdblist:
            if not pdbnameline.strip():
                continue  # skip blank lines in the list file
            pdbpath = Path(pdbnameline.strip())
            axesname = f'{args.axesname}_{pdbpath.stem}.py' if args.axesname else None
            Rotation_matrix, rf = pdbToRot(pdbpath, args.ranges, args.chainid, axesname, args.debug)
            print(f'Rotation matrix for {pdbpath.name}:\n', Rotation_matrix)

            # Convert the rotation matrix to a quaternion
            spRotMat = scipy.spatial.transform.Rotation.from_matrix(Rotation_matrix)
            rot_quat = toScalarFirst(spRotMat.as_quat())
            print(f'Quaternion for {pdbpath.name} (scalar first format):', rot_quat)
            print()
