#import "../template.typ": *

= Exploring post-translational modification data
//    <keyword>identifications</keyword>
//    <keyword>post-translation modification</keyword>
//    <keyword>PTM</keyword>
//    <keyword>peptide vs mass spectrum match</keyword>
//    <keyword>PSM</keyword>
//    <keyword>data exploration</keyword>


This chapter describes, step by step, how the user explores
  post-translational modification (PTM) data in a data exploration session.



== Setting the #xtandem; Run Presets for Phospho-proteomics<sect_phospho-identification-run-settings>

The very first step in starting a phospho-proteomics-based protein
    identification run is to configure the run so that the database search
    engine can model phosphorylated peptides on the basis of the sequence of the
    peptides in the database. That configuration step is started as described in
    @fig:fig_xtpcpp-phospho-identification-run-settings.



#figure(
caption: [#xtandem; settings window for a phospho-proteomics project],
[
#image("../assets/print-xtpcpp-phospho-identification-run-settings.png")
The #guilabel[Choose presets] option in this window
        allows the user to select an #application[#xtandem;] presets
        file to suit the protein identification run. In this example, the chosen
        presets file contains a number of configuration bits specific for a
        phospho-proteomics project.
]
)<fig_xtpcpp-phospho-identification-run-settings>



The configuration of #xtandem; needs to be performed by using the presets
  method, described in @sect_xtandem-parameter-presets and
  following sections. The #guibutton[Edit] button next to the drop-down
  list opens the editor for the presets that configure how the
  #application[#xtandem;] database search engine handles the database.
  The window that opens upon clicking that button has two tabs that
  require the user's attention, as shown in @fig:fig_xtpcpp-phospho-presets-protein-tab and @fig:fig_xtpcpp-phospho-presets-residue-tab.

As described in @sect_loading_existing_xtandem_preset_configurations, it is
  possible to load existing presets in case these were defined already and might
  be reused for a repeated data exploration session on the same data set.



#figure(
caption: [Setting the project to be a phospho-proteomics project],
[
#image("../assets/print-xtpcpp-phospho-presets-protein-tab.png")
The main setting that needs the user's attention in the
        #guilabel[Protein] tab of the #guilabel[Preset editor] window is
        #guilabel[Interpretation of peptide phosphorylation model], which
        must be set to #guilabel[yes]. The question mark icon next to that
        configuration option displays explanatory text on the right-hand side of
        the window.
]
)<fig_xtpcpp-phospho-presets-protein-tab>


#figure(
caption: [Configuring the phosphorylated residues],
[
#image("../assets/print-xtpcpp-phospho-presets-residue-tab.png")
The phosphorylation events are not #guilabel[Fixed
        modifications] because it is not possible to predict that they
        will occur systematically. This is the reason why the phosphorylation
        events are configured as #guilabel[Potential modifications].
]
)<fig_xtpcpp-phospho-presets-residue-tab>



@fig:fig_xtpcpp-phospho-presets-residue-tab shows how to
  configure the potential phosphorylation of selected residues (tyrosine,
  threonine and serine).  Setting the #xtandem; parameters for phosphoproteomics
  analyses involves specifying the mass difference between unmodified and
  modified residues and the nature of the residue that might bear the
  modification. There are two different settings available:


/ Potential modifications: the #guilabel[Y] residue might be phosphorylated, with a net mass increment of #guilabel[79.96633] Da.
/ Potential modifications motif: the #guilabel[S] and #guilabel[T] residues might be phosphorylated (net mass increment of #guilabel[79.96633] Da) or be subject to a neutral phosphoric acid loss (net mass loss of #guilabel[97.9769] Da).



The reason why the potential phosphorylation of tyrosine (Y) is not mentioned
  along with the S and T modifications in the motif setting is that
  phosphorylated tyrosine residues do not undergo a neutral loss of phosphoric
  acid upon collisionally activated dissociation (CID). Phosphorylated serine
  and threonine residues, by contrast, readily lose phosphoric acid upon CID.
  This is why both the phosphorylation and the neutral loss events need to be
  configured together as a PROSITE motif (see the question mark help).
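
The two mass values used in these settings follow from the elemental composition of the groups involved. The following Python sketch recomputes them from standard monoisotopic atomic masses. The serine residue mass and all the variable names are illustrative additions; none of this is part of the #xtandem; configuration itself.

```python
# Monoisotopic atomic masses in Da (standard values).
H = 1.00782503
O = 15.99491462
P = 30.97376200

# Phosphorylation adds HPO3 to the residue.
PHOSPHO = H + P + 3 * O
# The neutral loss removes a whole phosphoric acid molecule, H3PO4.
PHOSPHORIC_ACID = 3 * H + P + 4 * O

# Monoisotopic residue mass of serine (C3H5NO2), used here as an example.
SERINE = 87.03203

phosphoserine = SERINE + PHOSPHO
after_neutral_loss = phosphoserine - PHOSPHORIC_ACID

print(f"phosphorylation increment: {PHOSPHO:.5f} Da")
print(f"phosphoric acid loss:      {PHOSPHORIC_ACID:.5f} Da")
print(f"phosphoserine residue:     {phosphoserine:.5f} Da")
print(f"after neutral loss:        {after_neutral_loss:.5f} Da")
```

The residue left after the neutral loss weighs one water molecule less than unmodified serine, so it cannot be matched as either an unmodified or a phosphorylated residue.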

  


#block-tip[
The loss of a phosphoric acid molecule #emph[(not ion)] is
    called a #quote[neutral loss]. By nature, the lost molecule cannot
    be detected, because it bears no charge. However, the search software may
    detect a negative mass difference between the calculated mass of a
    fragment bearing a phosphoryl group and the measured mass of a product ion.
    In that case, the software may deduce that the fragment was
    phosphorylated before the fragmentation occurred.

The mass spectrometer may or may not be configured to monitor neutral
    phosphoric acid loss. In some instruments, that workflow is not available; however, in
    these instruments a higher-energy collisional dissociation#footnote[In
    Orbitrap analyzer-based instruments, HCD stands for #quote[higher-energy C-trap
    dissociation]. However, a more generic term is often used: #quote[higher-energy
    collisional dissociation].] process elicits two
    fragmentation events: loss of a phosphoric acid molecule and peptide backbone
    dissociation. In this case, the database searching engine (#xtandem;, for us)
    is instructed to monitor the loss of phosphoric acid (that is, a neutral loss)
    on the product ions of the y ion series. In the best cases (best sequence
    coverage by the product ions), it is thus possible to locate the phosphoryl
    group on the peptide.
]


#block-caution(title: "Phospho-proteomics projects often involve label-based quantification")[
The #guilabel[Residue] tab of the #guilabel[#xtandem; preset
    editor] window shown above lists a number of fixed modifications
    that need an explanation, because we'll find them later in other figures.
    The #guilabel[57.02146\@C] modification is the carbamidomethylation
    of cysteine residues. The various 28, 32 and 36 mass increments on lysine
    residues are the di-methylation modifications with (32, 36) or without (28)
    heavy isotopes#footnote[Boersema #emph[et al.] 2009.
    Multiplex peptide stable isotope dimethyl labeling for quantitative
    proteomics. #emph[Nature Protocols].]. The
    exact same mass increments labelled with the #quote[\@\[] notation are
    the same modifications occurring on the N-terminus of the peptide.
] <caution_phosphoproteomics-with-label-based-quantitation>
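
The modification labels discussed in the caution above follow a `mass@site` pattern, with `[` standing for the peptide N-terminus. The following Python sketch splits such a label into its parts. The class and function names are hypothetical, and the N-terminal value used in the example is an illustrative light dimethyl increment, not a value read from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    mass_delta: float  # net mass change in Da
    site: str          # residue letter, or "[" for the peptide N-terminus

    @property
    def is_n_terminal(self) -> bool:
        return self.site == "["


def parse_modification(text: str) -> Modification:
    """Split a 'mass@site' label into its mass delta and target site."""
    mass, separator, site = text.strip().partition("@")
    if not separator or not site:
        raise ValueError(f"not a 'mass@site' label: {text!r}")
    return Modification(float(mass), site)


# Carbamidomethylation of cysteine, as listed in the Residue tab.
carbamidomethyl = parse_modification("57.02146@C")
# Illustrative value (assumption): light dimethyl label on the N-terminus.
n_term_label = parse_modification("28.0313@[")

print(carbamidomethyl, carbamidomethyl.is_n_terminal)
print(n_term_label, n_term_label.is_n_terminal)
```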


Once all the settings have been validated, click the #guilabel[Run]
  button, as described in @sect_running-properly-configured-xtandem-process, to
  start the database search process.

  

== Loading the Protein Identification Results<sect_phospho-loading-protein-identification-results>

The protein identification results for a phospho-proteomics
      project are loaded in the same way as for a non-phospho-proteomics
      project, as described in @sect_loading-protein-identification-results. The main program window shows the same #guilabel[Identifications] tab as
      described before (see @fig:fig_xtpcpp-ptm-project-results-after-loading-main-window).



#figure(
caption: [The main window's #guilabel[Identifications] tab],
[
#image("../assets/print-xtpcpp-ptm-project-results-after-loading-main-window.png")
The #guilabel[Identifications] tab inside of the main program
            window after loading a PTMs-based project's identification results
            files. The #guibutton[View protein list] and #guibutton[View
            MS identification list] buttons perform exactly as described
            earlier. In order to trigger the PTMs-based data exploration session,
            click the #guibutton[View PTM islands] button.
]
)<fig_xtpcpp-ptm-project-results-after-loading-main-window>




When #i2mcq; has finished loading all the results, the #guilabel[Protein
      list] window opens as usual. This window, however, is
      not devoted to the exploration of phospho-proteomics data. To start
      exploring phospho-proteomics data, it is necessary to display a
      window that lists all the post-translational modification islands
      (#quote[PTM islands]). That window opens when the user clicks
      the #guibutton[View PTM islands] button shown in @fig:fig_xtpcpp-ptm-project-results-after-loading-main-window.

      


== Exploring PTM Islands Identification Data<sect_phospho-exploration-ptm-islands-identification-data>

The workflow involved in the scrutiny of phospho-proteomics data is
      comparable to that for a conventional proteomics project, as described in
      @sect_protein-list-window. There are differences,
      however, both in terminology and in the kind of data presented to the user;
      these are reviewed in detail in the following sections. The
      first step is to show the PTM islands, which is the step analogous to
      showing the identified proteins. The second step is to look into a given
      PTM island by scrutinizing the PTM peptides that make it; this step is
      analogous to showing the peptide list for a given protein.




=== The PTM Islands List Window<sect2_phospho-ptm-islands-list-window>

The #guilabel[PTM island list] window in @fig:fig_xtpcpp-ptm-islands-list-window displays, in a table view,
        all the PTM islands that were identified (see @sect_protein-identification-phospho-proteomics).


#footnote[The data presented in the examples below come from an experiment published
            in T. Y. Delormel #emph[et al.] 2022. In vivo
            identification of putative CPK5 substrates in #emph[Arabidopsis
            thaliana]. #emph[Plant Science]. DOI: #link("https://doi.org/10.1016/j.plantsci.2021.111121","https://doi.org/10.1016/j.plantsci.2021.111121").]

            
            


#figure(
caption: [PTM islands list window],
[
#image("../assets/print-xtpcpp-ptm-islands-list-window.png")
The #guilabel[PTM island list] table view has many columns
              that characterize each PTM island, as described in detail in the text
              below.
]
)<fig_xtpcpp-ptm-islands-list-window>



In the table view of all the PTM islands, each row corresponds to an island.
        Note that several rows may appear to describe the same island. They do
        not: although the PTM islands look exactly the same, they were
        identified in proteins that are listed in the protein
        database under different accession numbers (see the
        #guilabel[accession] column of the table view).

        


The columns in the table view hold post-translational modification-specific data described below.

/ Checked: if checked, the corresponding PTM island will be taken into account;

/ PTM island ID: unambiguous identifier for the PTM island. The nomenclature is important and follows a precise syntax according to this scheme: 
ptm$<$letter1$>$$<$number$>$$<$letter2$>$$<$number$>$$<$letter3$>$$<$number$>$,
              with the following meaning:

- $<$letter1$>$$<$number$>$: identifies the group of proteins that share PTM islands;
- $<$letter2$>$$<$number$>$: identifies the PTM island in the above group;
- $<$letter3$>$$<$number$>$: identifies the accession number of the protein in the group above.

/ accession: the accession number of the protein in
              the protein database file;
/ description: the description of the protein in
              the protein database file;
/ ptm positions: the number of positions in the
              PTM island that bear a post-translational modification;
/ spectrum: number of distinct spectra that allowed
              defining this PTM island;
/ PTM spectrum: number of distinct spectra that
              allowed the identification of more than one searched
              post-translational modification;
/ sequence: number of unique peptide sequences
              found in this PTM island;
/ multi PTM: number of spectra that allowed the
              identification of more than one post-translational modification site;
/ start: the position of the PTM island on the
              protein sequence (the first residue of the PTM island);
/ length: the number of residues that encompass the PTM island.
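
The PTM island ID scheme described in the list above can be decomposed mechanically. The following Python sketch does so under the assumption that each of the three parts is one lowercase letter followed by a number; the function and field names are hypothetical and the ID used in the example is only illustrative.

```python
import re
from typing import NamedTuple

# Assumed shape: "ptm" followed by three <letter><number> pairs.
ISLAND_ID_PATTERN = re.compile(r"ptm([a-z]\d+)([a-z]\d+)([a-z]\d+)")


class PtmIslandId(NamedTuple):
    group: str    # group of proteins that share PTM islands
    island: str   # PTM island within that group
    protein: str  # protein accession within that group


def parse_ptm_island_id(island_id: str) -> PtmIslandId:
    """Split a PTM island ID into its group, island and protein parts."""
    match = ISLAND_ID_PATTERN.fullmatch(island_id)
    if match is None:
        raise ValueError(f"unexpected PTM island ID: {island_id!r}")
    return PtmIslandId(*match.groups())


parsed = parse_ptm_island_id("ptma6a1a1")
print(parsed)
```

Two rows that share the same group and island parts but differ in the protein part describe the same island found under different accession numbers.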



In the same way as for the #guilabel[Protein list] window's table
        view of the identified proteins, the cells of the table are
        #quote[active]: clicking onto any cell triggers the opening of a
        window that provides details about the corresponding PTM island.





=== Delving Inside the PTM Island Identification Data<sect_delving-inside-ptm-island-identification-data>

The PTM island list table view, as pictured in @fig:fig_xtpcpp-ptm-islands-list-window, is actually an active matrix
        in which the user can easily display the data that yielded
        any PTM island identification element of the table. This is done by
        clicking any cell of the table in the row of the PTM island to be
        scrutinized.

        
Depending on the column in which the mouse click occurs, one of two
        different windows opens:
        
- Clicking onto a cell in either the #guilabel[accession] or
              #guilabel[description] column opens the #guilabel[Protein
              details] window shown in @fig:fig_xtpcpp-phospho-island-protein-details.
#figure(
caption: [Protein details window],
[
#image("../assets/print-xtpcpp-phospho-island-protein-details.png")
This window provides details about the protein identified as
                  bearing one or more post-translational modifications. The peptides that
                  were matched as PSMs are highlighted in yellow. The
                  #guibutton[AT3G56150] button provides a link to an external
                  resource#footnote[Here, the page #link("https://www.arabidopsis.org/servlets/TairObject?type=locus&name=AT3G56150")[opens in the browser.]]. The other informational data
                  bits are self-explanatory.
]
)<fig_xtpcpp-phospho-island-protein-details>

- Clicking onto a cell in the #guilabel[PTM island ID] column
            opens the #guilabel[PTM peptide list] window for that specific
            island. That window is shown in @fig:fig_xtpcpp-phospho-island-peptide-table-view-window-1 and @fig:fig_xtpcpp-phospho-island-peptide-table-view-window-2
            in the following section (@sect_phospho-ptm-peptides-list-window).




== The PTM Peptides List Window<sect_phospho-ptm-peptides-list-window>

Each PTM island is essentially defined by a set of related peptides that
    sport one or more post-translational modifications. The exploration of the
    PTM peptides thus involves looking into the different peptides of a
    PTM island. A number of characteristics of these peptides are described in
    the following text.

#figure(
caption: [PTM peptides list window (first columns)],
[
#image("../assets/print-xtpcpp-phospho-island-peptide-table-view-window-1.png")
Every PTM island has, associated to it, a list of
        post-translationally modified peptides. This figure illustrates the
        first columns of the table view that lists all the peptides making a PTM
        island. The contextual menu visible near peptide ID
        #guilabel[pepb87b23] will be detailed later (@sect_multi-site-modifications-ambiguities).
]
)<fig_xtpcpp-phospho-island-peptide-table-view-window-1>


#figure(
caption: [PTM peptides list window (last columns)],
[
#image("../assets/print-xtpcpp-phospho-island-peptide-table-view-window-2.png")
Every PTM island has, associated to it, a list of
      post-translationally modified peptides. This figure illustrates the last
      columns of the table view that lists all the peptides making a PTM
      island.
]
)<fig_xtpcpp-phospho-island-peptide-table-view-window-2>




The #guilabel[PTM peptide list] window contains a large set of
  peptide characteristics organized in a number of columns, as described below:

/ peptide ID: the unambiguous identity of the peptide;
/ sample: the file name of the sample in which the
        peptide was found and sequenced;
/ scan: the scan number of the MS/MS spectrum in
        which the precursor peptidic ion was fragmented;
/ RT: the retention time at which the peptide eluted;
/ charge: the charge of the peptidic ion;
/ sequence: the sequence of the peptide. Note that
        the residues that do (or might) bear a post-translational modification
        are printed in red color;
/ modifs: a semicolon-separated list of modified
        positions. Each modification is identified by the net mass change that
        occurs upon the chemical modification. The #guilabel[1Y:32.06 4S:79.97
        15K:32.06] example indicates that the tyrosine at position 1
        is dimethylated, the serine at position 4 is phosphorylated and
        the lysine at position 15 is dimethylated (the dimethyl modification was
        commented above);
/ start: the position of the peptide's first
        residue in the protein sequence;
/ length: the number of residues in the peptide;
/ top Evalue: best Evalue from those calculated for
        different peptide spectrum matches that occurred in a single
        fragmentation scan (see @sect_multi-site-modifications-ambiguities below for a
        thorough description of this situation);
/ theoretical MH+: the calculated mass of the #mh;
        peptidic ion;
/ delta MH+: the difference between the measured
        and the calculated #mh; masses;
/ top hyperscore: best Hyperscore from those
        calculated for different peptide spectrum matches that occurred in a
        single fragmentation scan (see @sect_multi-site-modifications-ambiguities below for a
        thorough description of this situation);
/ top PTM positions: positions of modified residues
        for the peptide having the best Evalue (see above) for the current scan;
/ observed PTM positions: space-separated list of all
        the modified positions found for the current scan.
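
The content of a #guilabel[modifs] cell, as described in the list above, lends itself to simple parsing. The following Python sketch extracts every position, residue and mass change from such a cell, whatever separator is used between the entries. The function and field names are hypothetical, and the 0.01 Da tolerance used to spot phosphorylations is an arbitrary choice for this example.

```python
import re
from typing import NamedTuple

# One entry looks like "4S:79.97": position, residue letter, mass change.
MODIF_PATTERN = re.compile(r"(\d+)([A-Z]):(-?\d+(?:\.\d+)?)")


class SiteModification(NamedTuple):
    position: int      # position of the residue in the peptide
    residue: str       # one-letter residue code
    mass_delta: float  # net mass change in Da


def parse_modifs(cell: str) -> list[SiteModification]:
    """Return all the modifications listed in a 'modifs' cell."""
    return [
        SiteModification(int(position), residue, float(mass))
        for position, residue, mass in MODIF_PATTERN.findall(cell)
    ]


modifications = parse_modifs("1Y:32.06 4S:79.97 15K:32.06")
# Keep the entries whose mass change matches a phosphorylation (79.97 Da).
phosphorylated = [m for m in modifications if abs(m.mass_delta - 79.97) < 0.01]

print(modifications)
print(phosphorylated)
```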
        




//<!--In the PTM peptide list window, choose ptma6a1a1 to get the-->
//<!--case where a modification (phospho) is certain but its position is not (peptide pepb87b23 in the list).-->

//<!--There are three rows for this peptide: it was thus identified in three different-->
//<!--MS2 scans; one row shows a red S and an orange S: there is no-->
//<!--reliable localization for 1 phosphorylation event. Another row shows a-->
//<!--single red S; there, the position is certain with respect to the MS2 data.-->

//<!--When clicking a cell of the red/orange row: submenu offering-->
//<!--to open the peptide details windows for one and the other scan (Ctrl to open-->
//<!--both separately). In this example one sees that it is exactly the same-->
//<!--scan, so it is really not possible to choose. (scan 4473).-->

//<!--With scan 4187, one sees upon clicking that there is no alternative, no-->
//<!--orange on the row -> position established.-->


=== Delving Inside the PTM Peptide Identification Data<sect_delving-inside-phospho-peptide-identification-data>

The #guilabel[PTM peptide list] table view (@fig:fig_xtpcpp-phospho-island-peptide-table-view-window-1) is
    actually an active matrix in which the user can easily display
    the data that yielded any PTM peptide identification element of the table.
    This is done by clicking any cell of the table in the row
    of the peptide to be scrutinized.



==== The Peptide Details Window<sect_phospho-peptide-details-window>

Clicking a cell in the #guilabel[peptide ID] column opens the
      #guilabel[Peptide details] window shown in @fig:fig_xtpcpp-phospho-peptide-ms-ms-details-one-positive-phospho-site. This window is similar to the one described for
      conventional, non-PTM-based projects in @sect_peptide-details-window.


  
  
#figure(
caption: [Peptide details window],
[
#image("../assets/print-xtpcpp-phospho-peptide-ms-ms-details-one-positive-phospho-site.png")
This window displays a large amount of informational data bits that
            characterize the MS/MS spectrum #vs; peptide match (PSM) for the
            peptide ID that was clicked in the #guilabel[PTM peptide
            list] table view window. Almost all the information data
            bits shown in this figure are self-explanatory.
]
)<fig_xtpcpp-phospho-peptide-ms-ms-details-one-positive-phospho-site>

  
  

#block-tip()[
The nomenclature of the product ions in the MS/MS spectrum shown in the
        figure above is simple: when the ion under a given MS/MS spectrum peak is
        the result of a neutral loss of phosphoric acid, the ion is labelled
        #quote[y#emph[x];(-P#emph[z])], with
        #emph[x] being the index of the y ion, and
        (-P#emph[z]) indicating the loss of #emph[z]
        phosphoric acid neutral molecule(s).


When ions actually bear any number of post-translational modifications,
        these are not listed along with the ion series (#emph[b] or
        #emph[y]) and index text because that would quickly become
        unwieldy, from a graphical point of view.
]
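
The position of a neutral-loss peak relative to the intact product ion, as labelled in the tip above, follows from simple arithmetic: each lost phosphoric acid molecule removes 97.9769 Da, divided by the charge of the ion. The following Python sketch illustrates this; the function name and the m/z value in the example are hypothetical.

```python
PHOSPHORIC_ACID = 97.9769  # Da, mass of one neutral H3PO4 molecule


def neutral_loss_mz(mz: float, charge: int, losses: int = 1) -> float:
    """Expected m/z of a product ion after `losses` phosphoric acid losses."""
    if charge < 1:
        raise ValueError("charge must be at least 1")
    if losses < 0:
        raise ValueError("losses cannot be negative")
    # The neutral molecule carries no charge, so the ion keeps its charge
    # and only its mass decreases.
    return mz - losses * PHOSPHORIC_ACID / charge


# Hypothetical y ion observed at m/z 800.0.
print(neutral_loss_mz(800.0, charge=1))
print(neutral_loss_mz(800.0, charge=2))
```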



==== Ambiguities on the Post-Translational Modification Sites <sect_multi-site-modifications-ambiguities>

The #guilabel[PTM peptide list] table view in @fig:fig_xtpcpp-phospho-island-peptide-table-view-window-1 lists
      the #guilabel[pepb87b23] peptide ID three times because that peptide
      was identified in three different MS/MS scans (see the
      #guilabel[scan] column values for the three rows). What is
      interesting about these three rows is that the peptide sequence showing
      the modification position(s) varies from one row to another. In one row,
      the sequence only shows a single red-colored serine residue. That serine
      was positively identified as a phosphorylated residue. In the other two
      rows, one serine is red-colored and the other is orange-colored. For these
      two scans, the position of the phosphorylated residue could not be
      ascertained: the MS/MS data only indicate that one or the other serine
      residue is phosphorylated (but not both).


      
#block-note()[
The coloring of these two serine residues is arbitrary in this case:
        since the PSMs that yielded this same sequence are absolutely identical
        from a score point of view (when opening the peptide details window, one
        can check that the Evalues are identical for the two PSMs), the software
        labels in red the first modification position and in orange all the
        remaining ones.
]



When the user clicks any cell of either of the two rows where the
      location of the phosphorylation event is ambiguous, #i2mcq; shows a
      contextual menu that lists the possible phosphorylation positions.
      The user can then select either menu item to display the
      details of the corresponding peptide.
      

The ambiguity about the phosphorylation site above, on #guilabel[Peptide
      ID] #guilabel[pepb87b23], is interesting. Indeed, when
      the user selects the first item of the contextual menu and then (while keeping
      the #keycap[Ctrl] key pressed) selects the second item of that
      menu, the windows that open show exactly the same informational data bits
      about the MS/MS scan: the two PSMs were calculated by the search engine on
      the basis of the very same MS/MS scan (@fig:fig_xtpcpp-phospho-peptide-ms-ms-details-two-psm-in-one-scan).
      This is why the position could not be ascertained: the engine
      reports that one position is plausible (with a rather low Evalue) and that the
      other position is equally plausible (with the exact same Evalue).
      


  
#figure(
caption: [Two mass spectrum #vs; peptide matches (PSM) in a single MS/MS scan],
[
#image("../assets/print-xtpcpp-phospho-peptide-ms-ms-details-two-psm-in-one-scan.png")
This figure shows the two #guilabel[Peptide details]
            windows that open when the user clicks the first and then
            the second item of the contextual menu shown in the
            #guilabel[PTM peptide list] window (@fig:fig_xtpcpp-phospho-island-peptide-table-view-window-1). In
            both windows, the scan number is the same (#guilabel[4238]),
            demonstrating that both PSMs were computed from the same MS/MS
            spectrum acquisition. Further, the fact that both PSMs have the
            same E-value hints at a low reliability of the phosphorylation site
            localization.
]
)<fig_xtpcpp-phospho-peptide-ms-ms-details-two-psm-in-one-scan>
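
The tie situation described in this section can be summarized in a few lines of Python. This is only a sketch of the behavior as documented here (first tied position in red, remaining tied positions in orange), not the actual #i2mcq; implementation; the class, the function, the residue positions and the Evalue are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Psm:
    scan: int
    phospho_position: int  # candidate phosphorylated position in the peptide
    evalue: float


def color_candidate_sites(psms: list[Psm]) -> dict[int, str]:
    """Color the candidate sites of the best-scoring PSMs of a single scan."""
    if not psms:
        return {}
    if len({psm.scan for psm in psms}) != 1:
        raise ValueError("all PSMs must come from the same MS/MS scan")
    best_evalue = min(psm.evalue for psm in psms)
    # Positions whose PSM ties with the best Evalue, in sequence order.
    tied_positions = sorted(
        psm.phospho_position for psm in psms if psm.evalue == best_evalue
    )
    return {
        position: "red" if index == 0 else "orange"
        for index, position in enumerate(tied_positions)
    }


# Two PSMs computed from the same scan with the exact same Evalue.
colors = color_candidate_sites([Psm(4238, 6, 1e-3), Psm(4238, 4, 1e-3)])
is_ambiguous = len(colors) > 1
print(colors, is_ambiguous)
```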

     

     
     


==== The XIC Viewer Window for the PTM Peptide Details <sect_xic-viewer-window-for-phospho-peptide-details>

One interesting feature of the #guilabel[Peptide details]
      window is the #guilabel[XIC] button (top right), which triggers
      the calculation of an extracted ion current chromatogram, as pictured in
      @fig:fig_xtpcpp-xic-viewer-for-phospho-peptide-details.
      Although very similar to the window described in @sect_xic-viewer-window-for-peptide-details, this
      phospho-proteomics-specific version of the #guilabel[XIC viewer]
      shows specific informational data bits described below.



  
#figure(
caption: [The extracted ion current (XIC) chromatogram viewer window],
[
#image("../assets/print-xtpcpp-phospho-xic-viewer-for-peptide-details.png")
The extracted ion current (XIC) chromatogram viewer shows the
            peptide sequence interspersed with post-translational modifications
            data. The modifications are listed as PSI-MOD OBO accession number
            #guilabel[(MOD:xxxxx)] text elements#footnote[Montecchi-Palazzi #emph[et al.] 2008. PSI-MOD
                community standard for representation of protein modification
                data. #emph[Nature Biotechnology].].
]
)<fig_xtpcpp-xic-viewer-for-phospho-peptide-details>




The phospho-proteomics-specific informational data bits provided in the
      #guilabel[XIC viewer] window (see @fig:fig_xtpcpp-xic-viewer-for-phospho-peptide-details) are
      located right below the top border of the plot frame.  
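
For readers unfamiliar with the notion, an extracted ion current chromatogram is obtained by summing, spectrum after spectrum, the intensity of the peaks found within a narrow m/z window around the ion of interest. The following Python sketch shows the principle only; it is not the computation performed by #i2mcq;, and the data structure, the function name, the 10 ppm default and the peak values are all assumptions made for the example.

```python
from typing import Iterable, NamedTuple


class Spectrum(NamedTuple):
    retention_time: float             # in seconds
    peaks: list[tuple[float, float]]  # (m/z, intensity) pairs


def extract_xic(
    spectra: Iterable[Spectrum], target_mz: float, ppm: float = 10.0
) -> list[tuple[float, float]]:
    """Return (retention time, summed intensity) pairs for the target m/z."""
    tolerance = target_mz * ppm * 1e-6
    xic = []
    for spectrum in spectra:
        intensity = sum(
            (i for mz, i in spectrum.peaks if abs(mz - target_mz) <= tolerance),
            0.0,
        )
        xic.append((spectrum.retention_time, intensity))
    return xic


# Hypothetical spectra acquired at three successive retention times.
spectra = [
    Spectrum(10.0, [(500.2500, 100.0), (600.0, 5.0)]),
    Spectrum(11.0, [(500.2510, 250.0), (500.4, 30.0)]),
    Spectrum(12.0, [(700.0, 1.0)]),
]
xic = extract_xic(spectra, target_mz=500.25)
print(xic)
```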

