#import "../template.typ": *

= The main program window
//    <keyword>graphical user interface</keyword>
//    <keyword>GUI</keyword>


Exploring proteomics data with #i2mcq; mostly involves the following steps:

- Configuration of the #xtandem; external software that runs the database
        searches. These searches produce peptide-spectrum matches (PSMs), which
        lead to peptide identifications and ultimately to protein identifications;
- Configuration of the protein database files (both the organism-specific
        protein databases and optional contaminant-containing databases);        
- Loading of the mass spectrometry data acquisition files (the mzML format
        is recommended);
- Running #xtandem; from within #i2mcq;;
- Loading of the identification results produced during the previous step;
  #block-note[#i2mcq; can also handle peptide-spectrum match data (peptide identification data) produced by other software in the following formats:
- mzIdentML;
- pepXML;
- Mascot DAT files.
]
- Careful scrutiny of the peptide identification results, with optional
        modification of these results;
- Protein inference, that is, protein identification on the basis of the
        peptide identifications. #i2mcq; implements a protein grouping
        algorithm, as described in @fig:fig_xtpcpp-protein-inference-grouping, that leads to
        consolidated protein identifications. The program has an interface
        geared towards fine-tuning the protein grouping process, which leaves the
        user in full control of the stringency with which the final protein
        identification list is generated.


This chapter describes the user interface of the #i2mcq; main window in
  detail, in particular as the starting point for the main tasks briefly
  mentioned above.



== Starting a new #i2mcq; working session<sect_starting-xtpcpp-working-session>

To start a session, run #i2mcq;: the main program window shows up as
    pictured in @fig:fig_xtpcpp-main-program-window.
    

#include "a4a1_figure_main_window.typ"


The main program window contains three buttons that start the following main tasks:

- #guilabel[Run #xtandem; identifications]. See @sect_running-xtandem-identifications.
- #guilabel[Load identification results (mzIdentML, pepXML, Mascot, #xtandem;)]. See @sect_loading-protein-identification-results.
- #guilabel[Load an #i2mcq; project]. See @sect_loading-xtandempipeline-projects.



== Running #xtandem; identifications<sect_running-xtandem-identifications>


To run #xtandem;-based identifications, click the #guilabel[Run X!Tandem identifications]
    button. This opens the window pictured in @fig:fig_xtpcpp-xtandem-configuration-window.
    
#include "a4a2_figure_xtandem_configuration_window.typ"

The configuration of an #xtandem; run entails defining the following:

/ Configure the X!Tandem execution: This setting
          allows one to specify the path to the #xtandem; software program. The
          version of the program, if found, is displayed below (in this case,
          #guilabel[Alanine 2017.2.1.4]). This feature is useful when
          the user wants to test multiple versions of the #xtandem; software.
          
/ Run X!Tandem through HTCondor: Check this box only
          when running #xtandem; over the network on a server that supports
          #application[HTCondor]#footnote[See #link("https://research.cs.wisc.edu/htcondor/").].

<listitem_choose-presets-xtandem-configuration>          
/ Choose presets: This setting defines the
              parameters that #xtandem; must use. Either load known presets
              from the drop-down list widget or edit them (or create a new set) by
              clicking the #guilabel[Edit] button. Note that, to load
              an existing presets file, it might be necessary to point #i2mcq; to
              the directory that contains it. Use the folder icon for
              this, as visible in @fig:fig_xtpcpp-xtandem-presets-configuration-spectrum-tab.

/ Choose database files: Add protein database files
              in the FASTA format. The known proteins of the organism of interest
              must be provided by at least one protein database file (several
              such files can be added). Optionally, any number of protein
              databases containing known contaminant proteins can be added as
              well. Click the #guilabel[Clear list] button to empty the
              database file list and start anew if an error occurred (files
              cannot be removed one at a time).

/ Choose MS data files to process: Add the mass
              spectrometry data files (mzML or mzXML format) to be processed by the
              #xtandem; software. Any number of files can be added to the
              list.
              #block-tip[When using Bruker timsTOF data, click the #guibutton[Add Bruker
                  timsTOF folders] button to select the folders containing this
                  kind of data. Bruker timsTOF data come as two files that must sit
                  in the same directory.]

/ Output directory: This setting specifies the
              directory in which the files output by the #xtandem; process are
              written. #xtandem; writes its identification results to XML files
              that #i2mcq; reads during a later step.

/ Number of threads: This setting defines the
              maximum number of execution threads that #xtandem; may use
              during its run.
              #block-tip[Although #i2mcq; sets the number of execution threads to 1, it is
                beneficial to set it to the highest possible value.]
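
The settings above map onto a handful of #xtandem; input parameters. The following minimal Python sketch, given for illustration only, writes the two small XML files that a stand-alone #xtandem; run reads: an input file and a taxonomy file that maps a taxon name to the FASTA databases. The parameter labels (`list path, default parameters`, `list path, taxonomy information`, `protein, taxon`, `spectrum, path`, `output, path` and `spectrum, threads`) are standard #xtandem; input labels. The helper names, the taxon name `my_organism`, the file naming scheme and all paths are hypothetical, and this is not necessarily how #i2mcq; itself drives #xtandem;.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def note(parent, label, value):
    """Append an X!Tandem <note type="input" label="..."> element."""
    el = ET.SubElement(parent, "note", type="input", label=label)
    el.text = value
    return el


def write_xtandem_run_files(presets, fasta_files, ms_file, output_dir, threads,
                            taxon="my_organism"):
    """Write a taxonomy file and an input file for one MS data file.

    Hypothetical helper: one input file (hence one results file) is written
    per MS data file. Returns the path of the input file.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Taxonomy file: maps the taxon name to the protein FASTA databases
    # (organism-specific and contaminant databases alike).
    taxonomy = ET.Element("bioml", label="x! taxon-to-file matching list")
    taxon_el = ET.SubElement(taxonomy, "taxon", label=taxon)
    for fasta in fasta_files:
        ET.SubElement(taxon_el, "file", format="peptide", URL=str(fasta))
    taxonomy_path = out / "taxonomy.xml"
    ET.ElementTree(taxonomy).write(taxonomy_path, encoding="utf-8",
                                   xml_declaration=True)

    # Input file: points to the presets (default parameters), the taxonomy,
    # the spectra and the results file, and sets the thread count.
    stem = Path(ms_file).stem
    bioml = ET.Element("bioml")
    note(bioml, "list path, default parameters", str(presets))
    note(bioml, "list path, taxonomy information", str(taxonomy_path))
    note(bioml, "protein, taxon", taxon)
    note(bioml, "spectrum, path", str(ms_file))
    note(bioml, "output, path", str(out / f"{stem}.xml"))
    note(bioml, "spectrum, threads", str(threads))
    input_path = out / f"{stem}_input.xml"
    ET.ElementTree(bioml).write(input_path, encoding="utf-8",
                                xml_declaration=True)
    return input_path
```

With such files in place, the #xtandem; executable would be launched once per mass spectrometry data file, with the input file as its single argument.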



== Setting the #xtandem; Run Presets<sect_xtandem-parameter-presets>

The #guilabel[Edit] button of the #guilabel[Choose presets] group box described above opens a
        dialog window in which the user can configure the #xtandem; parameters
        in full detail. That dialog window is pictured in @fig:fig_xtpcpp-xtandem-presets-configuration-spectrum-tab. Only the
        #guilabel[Spectrum] tab is shown, but the interface is similar for
        all the other tabs.


#include "a4a3_figure_tandem_preset_window.typ"




=== Loading existing presets configurations from file<sect_loading_existing_xtandem_preset_configurations>

It is possible to load existing #xtandem; presets, which is useful in
          particular when the samples most often come from the same instrument
          operated with the same configuration. To this end, first point #i2mcq; to
          the directory that contains the presets file of interest (click the
          folder icon at the top right corner of the window shown in @fig:fig_xtpcpp-xtandem-presets-configuration-spectrum-tab). The
          presets files in the chosen directory are automatically detected and
          listed in the drop-down list widget. Select the file of interest
          from that list and click the #guilabel[Load] button.
          
#block-warning[It is compulsory to click the #guibutton[Load]
              button to actually load the contents of the presets file: selecting
              the file name in the drop-down list alone does not update the
              displayed settings.]
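
Assuming that the presets files are #xtandem; parameter files (XML files made of `note` elements of type `input`, each carrying a parameter label and its value), the detection and loading steps could be sketched as follows. The function names and the rule that any `.xml` file in the directory is a candidate presets file are hypothetical. The point of the sketch is that listing a file and loading its contents are two distinct operations, just as choosing a name in the drop-down list and clicking #guibutton[Load] are.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def list_preset_files(directory):
    """Return the XML files of a directory, sorted by name (hypothetical detection rule)."""
    return sorted(p for p in Path(directory).glob("*.xml") if p.is_file())


def load_presets(path):
    """Read the <note type="input" label="..."> elements into a {label: value} dict."""
    root = ET.parse(path).getroot()
    return {
        note.get("label"): (note.text or "").strip()
        for note in root.iter("note")
        # Skip descriptive notes and notes without a parameter label.
        if note.get("type") == "input" and note.get("label")
    }
```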


=== Creating new presets configurations<sect_creating_new_xtandem_preset_configurations>

It is possible to create a new presets file by clicking the
#guibutton[New] button. This opens an input dialog window in which
the user provides a new file name (the text field is prefilled with the
name of the currently loaded file suffixed with #emph[\_copy]).

#block-tip[One interesting feature of the new presets file creation process is
            that, if presets are already loaded, #i2mcq; copies the currently
            displayed settings to the new file. This makes it easy to create
            a variant of an #xtandem; presets file, which eases the search for the
            right #xtandem; parameters for a given sample data set.]
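
Under the same assumption about the presets file format, deriving a variant presets file could be sketched as follows: the loaded file is copied under the name proposed by the dialog (the current name suffixed with `_copy`) and selected parameters are then changed. The `copy_presets` function and its `overrides` argument are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def copy_presets(current_path, overrides=None):
    """Copy a presets file to '<name>_copy.xml', optionally overriding some values."""
    current_path = Path(current_path)
    tree = ET.parse(current_path)
    root = tree.getroot()
    overrides = dict(overrides or {})
    # Change the values of parameters already present in the file.
    for note in root.iter("note"):
        label = note.get("label")
        if note.get("type") == "input" and label in overrides:
            note.text = overrides.pop(label)
    # Parameters absent from the original file are appended.
    for label, value in overrides.items():
        el = ET.SubElement(root, "note", type="input", label=label)
        el.text = value
    new_path = current_path.with_name(current_path.stem + "_copy" + current_path.suffix)
    tree.write(new_path, encoding="utf-8", xml_declaration=True)
    return new_path
```

The original file is left untouched, so several variants can be compared on the same data set.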
            
            
            
            
=== Actual #xtandem; presets configuration<sect_actual-xtandem-presets-configuration>


The dialog window pictured in @fig:fig_xtpcpp-xtandem-presets-configuration-spectrum-tab contains
          a number of tabs that handle various aspects of the #xtandem; run settings.
          Clicking the question mark button next to a parameter displays its
          documentation in the pane on the right-hand side of the window. These
          manual pages are authoritative because they are taken verbatim from
          the #xtandem; software package.

Once the configuration is complete, click the
          #guilabel[OK] button. If the parameters were modified, #i2mcq;
          asks whether they should be saved to the presets file.





=== Running a properly configured #xtandem; process<sect_running-properly-configured-xtandem-process>

Once the #xtandem; settings configuration dialog window has been closed,
          #xtandem; can be run from within #i2mcq; by clicking the
          #guilabel[Run] button at the bottom of the window pictured in @fig:fig_xtpcpp-xtandem-configuration-window.

While the computation is carried out, the program shows the feedback
          dialog window pictured in @fig:fig_xtpcpp-xtandem-run-feedback-window.

#include("a4a4_figure_tandem_feedback_window.typ")


Once the computation is finished, the feedback dialog window closes and
          the user is returned to the main program window (@fig:fig_xtpcpp-main-program-window), where the message pictured in @fig:fig_xtpcpp-xtandem-run-job-finished-window is shown.
          
#include("a4a5_figure_tandem_job_finished.typ")

From the main program window, it is possible to open the #xtandem;
          results file(s) located in the output directory configured above.
          One output file (XML format, #filename_extension[xml] extension) is
          produced per processed mass spectrometry data file. To load the
          results files, first click the button labelled
          #guibutton[Load identification results (mzIdentML, pepXML, Mascot,
          #xtandem;)]. The process is described in @sect_loading-protein-identification-results.

          
== Loading the Protein Identification Results<sect_loading-protein-identification-results>

Loading identification results requires a minimal configuration that
        instructs #i2mcq; how to handle, for example, contaminant
        proteins. This process is pictured in @fig:fig_xtpcpp-load-identification-results-window and is described
        in the following section.


#include "a4a6_figure_load.typ"





=== Identification Data Loading Configuration<sect_configuring-identification-results-loading-parameters>


/ Results handling mode: there are two possibilities:
  / Combine: in this mode, all the identification
                    results coming from the different identification results files are
                    merged into a single set. That single set is the basis for the
                    protein inference step and the identified proteins are listed in
                    a single protein list window.
                    
                    #figure(
caption: "Main program window after loading multiple identification results files in Combine mode",
[
#image("../assets/print-xtpcpp-main-program-window-loaded-multiple-combined-identifications.png")
When loading multiple identification results files in
                            #guilabel[Combine] mode, all the identifications
                            are merged into a single data set, and the
                            identified proteins are listed in a single protein
                            list window.
]
)<fig_xtpcpp-main-program-window-loaded-multiple-combined-identifications>

  / Individual: in this mode, the
                    identification results coming from the various files are kept
                    separate. Thus, the identification results coming from each
                    file undergo a separate protein inference step, and the
                    identified proteins list is displayed for #emph[each
                    single file] in turn. The file whose protein list is to be
                    displayed is selected in the main program window, whose
                    appearance changes accordingly:
                    
                    #figure(
caption: "Selecting a particular identification results file's data set",
[
#image("../assets/print-xtpcpp-main-program-window-loaded-multiple-individual-identifications.png")
When loading multiple identification results files in
                            #guilabel[Individual] mode, the selection of
                            any given identification results file is performed by
                            selecting its name from the drop-down list widget
                            #emph[and] by clicking onto the
                            #guilabel[View protein list] button. Note that
                            some metadata about the identifications are updated
                            beneath the drop-down list widget.
]
)<fig_xtpcpp-main-program-window-loaded-multiple-individual-identifications>
    Right after having selected an identification results file,
                    click the #guilabel[View protein list] button to display
                    the protein identifications list. That list has been obtained by
                    performing the protein inference on the file's
                    identification results (see @sect_protein-inference-psms-to-protein-identities).
                    The window that opens up is described later (see @sect_protein-list-window).

  #block-tip[It is possible to open multiple protein list windows, each
                      showing the identifications from a different file:
                      keep the #keycap[Ctrl] keyboard key pressed
                      while clicking the #guibutton[View protein
                      list] button.]



/ Choose results files: clicking the
                #guilabel[Add files] button opens a file
                selection dialog window from which any number of protein identification
                results files can be selected for loading.

                Note that it is possible to list all the opened protein
                identification results files by clicking the #guilabel[View
                MS identification list] button. The window that opens
                up is described later (see @sect_ms-identifications-list-window).
        


/ Contaminants: there are two possibilities here.
  / Contaminants files: when this radio button
                      widget is selected, the list of contaminant proteins is
                      loaded from the files selected by clicking the
                      #guilabel[Add files] button.
  / Contaminant regular expression: when
                      this radio button widget is selected, a text edit widget
                      replaces the widget listing the contaminant
                      database files. In this text edit widget, the user
                      enters a regular expression that is matched against the
                      accession number field of the protein databases that were
                      used for the protein identification step. This requires
                      specially crafted protein databases in which the
                      accession numbers of the contaminant proteins were tagged
                      with a particular text pattern, which the
                      #guilabel[Contaminant regular expression] must match.

/ Contaminant removal mode: there are two
                possibilities. Contaminant removal is the process by which
                identified proteins that match contaminant proteins (either
                listed in the contaminant database files or matched by the
                regular expression) are disregarded in the later protein
                visualization steps.

  / Protein list: in this mode, as soon as a
                    protein identification loaded from a protein identification
                    results file matches a contaminant protein, it is disregarded.
  / Groups: in this mode, the protein
                    inference process goes all the way through to the
                    determination of the protein groups (see @fig:fig_xtpcpp-protein-inference-grouping). When
                    a protein group has a sub-group that contains a contaminant
                    protein, the whole group is disregarded. This might
                    appear drastic, but our experience is that, most often, the
                    sub-groups of a group identify proteins belonging to the
                    same family. Therefore, if one protein is a contaminant, all
                    the other proteins in the group are assumed to be
                    contaminants as well.
/ Peptide and protein filters: this group box
                widget holds some parameters that configure the way protein
                inference is to be performed.
  / Peptide threshold on: there are two possibilities:
    / #evalue;: all the PSMs having an
                        expectation value (#evalue;) higher than the threshold are
                        disregarded. Enter the threshold in the spin box widget labelled
                        #guilabel[Peptide #evalue;]. A typical value for the
                        #xtandem; engine is 0.05. When more stringent results are
                        desirable, a value of 0.02 should yield satisfactory results.
                        See @sect_computing-peptide-e-value for a
                        detailed explanation of the #evalue; computation.
    / FDR (false discovery rate): the PSMs
                        are filtered so that their false discovery rate does not
                        exceed this value. Enter the value in the spin box widget labelled
                        #guilabel[Peptide FDR]. A typical setting is
                        #guilabel[1%].
                        #block-tip[Using #guilabel[FDR] is most useful when the
                            identification results come from a database searching
                            engine that does not compute an #evalue;. However, it
                            only works if the search was also performed
                            against a decoy database. In #xtandem;, the decoy
                            database is crafted by reversing the protein sequences.
                            In this case, when proteins are identified on the basis
                            of PSMs involving reversed sequences, the protein identity
                            is tagged with the #quote[reversed] string, which
                            might be used with the #guilabel[Contaminant regular
                            expression] setting defined earlier.]
  / Number of peptides per protein: this is
                    the minimal required number of peptides that must be
                    identified as belonging to a given protein in order to
                    consider that protein identity as a valid one. These
                    peptides have to be from non-contaminant proteins, of
                    course.
  / Overall samples: when this box is checked and
                    multiple identification results files are loaded,
                    the #guilabel[Number of peptides per protein]
                    requirement can be fulfilled by looking for peptides in
                    all the loaded files. For example, if one results file
                    provides one peptide for a protein identification and
                    another file provides another peptide (different from the
                    first one) for the same protein, and if the
                    #guilabel[Number of peptides per protein] is 2,
                    then the protein is considered valid. If the box is not
                    checked, the number of peptides requirement must be
                    fulfilled in each results file separately, which is more
                    stringent. A typical value for the #guilabel[Number of
                    peptides per protein] setting is #guilabel[2].
  #block-tip[This setting needs to be checked in at least one case: when a
                      complex peptide mixture is separated by ion-exchange
                      chromatography (typically on a strong cation exchange, or
                      SCX, resin) and the different fractions are
                      analyzed by bottom-up proteomics. The peptides coming from a
                      given protein might be located in different fractions, and
                      thus in different protein identification results files!]
  / Protein Evalue: threshold above which a
                    protein identification is disregarded (see @sect_computing-protein-e-value).                      
  / Protein Evalue (log10): convenience spin
                    box widget for setting the protein #evalue; threshold as its
                    decimal logarithm (for example, -2 for an #evalue; of 0.01).
  / Pep repro: if set to 1, a peptide, to be
                    accounted for, needs to be found in one protein identification
                    results file. If set to a greater number, that peptide
                    needs to be found in at least that number of results files.
                    Each increment of this setting makes the protein
                    identification conditions more stringent.
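
How the peptide and protein filters combine can be illustrated with the following deliberately simplified Python sketch. It is not the #i2mcq; implementation: the `PSM` record and the `valid_proteins` function are hypothetical, contaminants are removed as in the #guilabel[Protein list] mode (no protein grouping is performed), the FDR and protein #evalue; filters are left out and, when #guilabel[Overall samples] is unchecked, a protein is kept if it passes the peptide count in at least one file.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical minimal peptide-spectrum match record."""
    file: str       # identification results file the PSM comes from
    peptide: str    # peptide sequence
    protein: str    # protein accession
    evalue: float   # peptide E-value


def valid_proteins(psms, contaminant_regex, peptide_evalue=0.05,
                   min_peptides=2, overall_samples=False, pep_repro=1):
    """Return the accessions passing simplified filters (illustration only)."""
    contaminant = re.compile(contaminant_regex)
    # 1. Peptide E-value threshold: PSMs with a higher E-value are dropped.
    kept = [p for p in psms if p.evalue <= peptide_evalue]
    # 2. Contaminant removal: drop PSMs whose accession matches the regex.
    kept = [p for p in kept if not contaminant.search(p.protein)]
    # 3. Pep repro: a peptide counts only if it is found in at least
    #    pep_repro distinct results files.
    files_per_peptide = defaultdict(set)
    for p in kept:
        files_per_peptide[p.peptide].add(p.file)
    kept = [p for p in kept if len(files_per_peptide[p.peptide]) >= pep_repro]
    # 4. Distinct peptides per protein, pooled over all files when
    #    overall_samples is True, otherwise counted in each file separately.
    peptides = defaultdict(set)
    for p in kept:
        key = p.protein if overall_samples else (p.file, p.protein)
        peptides[key].add(p.peptide)
    if overall_samples:
        return {prot for prot, peps in peptides.items() if len(peps) >= min_peptides}
    return {prot for (_, prot), peps in peptides.items() if len(peps) >= min_peptides}
```

For instance, when two results files each contribute a different peptide to the same protein, that protein passes a two-peptide requirement only with `overall_samples=True`.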
                    




=== Displaying the MS Identifications List<sect_ms-identifications-list-window>

#figure(
caption: "Displaying the MS identifications list (first columns)",
[
#image("../assets/print-xtpcpp-ms-identifications-list-window-first-columns.png")
This window displays a list of all the files that were involved in
              the #xtandem; run (first columns).
]
)<fig_xtpcpp-ms-identifications-list-window-first-columns>


#figure(
caption: "Displaying the MS identifications list (last columns)",
[
#image("../assets/print-xtpcpp-ms-identifications-list-window-last-columns.png")
This window displays a list of all the files that were involved in
            the #xtandem; run (last columns).
]
)<fig_xtpcpp-ms-identifications-list-window-last-columns>




=== Saving #i2mcq; projects<sect_saving-xtandempipeline-projects>

Once exploration and optional modification of the identification data have
      been performed, the user can save the resulting data set into an #i2mcq;
      project by selecting the #guimenuitem[Save project] menu item
      of the #guimenu[File] menu in the main program window (the
      file name extension should typically be #filename_extension[xpip]). See @sect_loading-xtandempipeline-projects for loading such a
      project.
  
  
=== Loading #i2mcq; projects<sect_loading-xtandempipeline-projects>


Loading #i2mcq; project files (files with the #filename_extension[xpip] extension) is only possible if the user
    has previously:

- Loaded identification results;
- Saved the data to an #i2mcq; project file using the #guimenuitem[Save
          project] menu item of the #guimenu[File] menu in
          the main program window.
          
          
