Skip to content
Snippets Groups Projects
dmgrep 9.51 KiB
Newer Older
  • Learn to ignore specific revisions
  • Karl Fogel's avatar
    Karl Fogel committed
    #!/usr/bin/env python3
    
    # -*- coding: utf-8 -*-
    
    
    # Copyright (c) 2015, 2018 Karl Fogel
    
    # 
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    # 
    # This program is distributed in the hope that it will be useful, but
    # WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    # General Public License for more details.
    # 
    # If you did not receive a copy of the GNU General Public License
    # along with this program, see <http://www.gnu.org/licenses/>.
    
    """Print delimited sections of input that match a given regexp.  
    
    
    Usage:
    
      $ CMD | dmgrep [-d|--delimiter DELIM_REGEXP] [-i|--ignore-case] REGEXP
      -OR-
      $ dmgrep [...] < FILE
    
    Use the '-v' ('--inverse') flag to invert the sense of the match.
    
    Detailed explanation:
    
    
    "Delimited" means there is some sort of single-line delimiter
    separating sections from each other.  This is common in many output
    formats.  For example, 'svn log' puts a line of 72 hyphens between
    each revision's log entry, and 'git log' output starts each commit
    message with a line matching the regexp "^commit [a-z0-9]{40}$".
    
    You might use this script to, say, find all the 'git log' entries that
    match a given regular expression.  The match is performed against the
    leading delimiter as well as against the body text between delimiters
    (this is to handle cases like 'git log' output, where you might want
    to match a pattern against both commit messages themselves and against
    the commit IDs appearing in the delimiters -- see an example below).
    
    Wherever there is a match, the entire matching entry is printed to
    standard output, including the leading delimiter if there was one
    (whether or not the match matched the delimiter itself), but not the
    trailing delimiter, whether or not there was one.  If you understood
    that sentence on the first reading, this program is probably for you.
    
    You can specify a custom delimiter regexp at run-time, but if you
    don't then a default delimiter regexp is used, matching a set of
    well-known delimiters that includes at least the two mentioned above.
    
    Examples:
    
    
      $ git clone git@github.com:OpenTechStrategies/smoke-alarm-portal.git
      $ cd smoke-alarm-portal
    
      $ git log --name-status | dmgrep "[Rr]einstate"
      commit 52964c56964cc454b4239d3cada61e0577db897b
      Author: Noel Taylor <example@opentechstrategies.com>
      Date:   Thu Jun 25 15:26:42 2015 -0500
      
          #22 Reinstate a couple of DCSOps style rules
          
          Commit 3ac3645 commented out a link to the DCSOps stlyesheet as a rule
          there was breaking bootstrap grid behavior. This commit reinstates two
          of those rules (for font family and button color) into our own
          stylesheet.
      
      M	public/smokeAlarmPortal_files/base.css
      $
     
      ### Hmm, interesting -- the commit message mentions another
      ### commit, 3ac3645.  Let's see all commit messages that either
      ### are that commit or mention it.
     
      $ git log --name-status | dmgrep "3ac3645"
      commit 52964c56964cc454b4239d3cada61e0577db897b
      Author: Noel Taylor <example@opentechstrategies.com>
      Date:   Thu Jun 25 15:26:42 2015 -0500
      
          #22 Reinstate a couple of DCSOps style rules
          
          Commit 3ac3645 commented out a link to the DCSOps stlyesheet as a rule
          there was breaking bootstrap grid behavior. This commit reinstates two
          of those rules (for font family and button color) into our own
          stylesheet.
      
      M	public/smokeAlarmPortal_files/base.css
      
      commit 3ac3645bac42f598911c9fc3829202a246bb0218
      Author: Noel Taylor <example@opentechstrategies.com>
      Date:   Thu Jun 25 14:52:51 2015 -0500
      
          #22 Errors now more obvious, yet less disruptive
          
          Red asterisks now mark required form fields. If validation script
          returns error messages, these now appear in place of the asterisks and
          the errored fields gain a red outline. I have also commented out the
          link to the DCSOps stylesheet, as one of the rules was breaking the
          behavior of the bootstrap grid especially for extra-small screen sizes
          but in other ways also.
      
      M	public/smokeAlarmPortal_files/base.css
      M	views/index.jade
      M	views/layout.jade
      $
    
      ### Good -- we now have an overview of that commit and its followups.
    
    Another example, a bit more contrived this time:
    
      $ cat delimited_file.txt
      This file has some delimited text
      by which I mean it is divided into sections
      divided by some known delimiter.
      ----------
      In this case the delimiter is a line of dashes,
      although it could have been anything.
      By the way, this section contains the word "fish".
      ----------
      The delimiter in this example isn't very interesting.
      ----------
      How odd -- this section also contains the word "fish"!
      What a coincidence.
      ----------
      The leading delimiter is printed, but not the trailing.
      ----------
      EOF is treated as another occurrence of the delimiter.
      By the way -- fish!
    
      $ dmgrep -d '-{10}' fish
      In this case the delimiter is a line of dashes,
      although it could have been anything.
      By the way, this section contains the word "fish".
      ----------
      How odd -- this section also contains the word "fish"!
      What a coincidence.
      ----------
      EOF is treated as another occurrence of the delimiter.
      By the way -- fish!
    
    Related programs and historical notes:
    
    I scripted this up at a time when I didn't have easy access to my
    preferred Internet search engines, thanks to the Great Firewall of
    China, and originally called it 'dgrep'.  But later when I could
    easily search the Internet again, I found there was already a program
    named 'dgrep'.  So next I tried 'sgrep' (for "sectioned grep"), but
    not only is that also already taken, the 'sgrep' ("structured grep")
    program by Jani Jaakkola and Pekka Kilpeläinen (btw, what is it with
    the over-representation of Finns among authors of useful utilities?)
    actually does something pretty close to what this script does --
    'sgrep' may even be a superset.  In desperation, I resorted next to my
    first initial, but 'kgrep' is already used too.  Hence 'dmgrep', a
    slightly odd abbreviation for "delimited grep".
    
    After writing this script, I realized it is a generalization of
    http://svn.apache.org/viewvc/subversion/trunk/contrib/client-side/search-svnlog.pl?view=co,
    
    which has been around for many years.
    
    
    Run 'dmgrep --summary' to see a compact summary of usage.
    """
    
    import sys
    import re
    import getopt
    
    
    
    def match_maybe_output(body, regexp, leading_delim, inverse, out):
    
      """If LEADING_DELIM or BODY match REGEXP, print them to OUT.
    BODY does not include any delimiters; REGEXP is a compiled regexp;
    
    LEADING_DELIM is a string; OUT is a file handle (e.g., sys.stdout).
    If INVERSE is true, invert the sense of the match."""
      match = regexp.search(body) or regexp.search(leading_delim)
      if (match and not inverse) or (inverse and not match):
    
        out.write(leading_delim)
        out.write(body)
    
    
    def main():
      delim_re = None
    
      inverse = False
    
      search_re = None
      search_flags = re.MULTILINE | re.DOTALL
    
      try:
    
        (opts, args) = getopt.getopt(sys.argv[1:], "ih?d:v", 
    
                                     ["ignore-case", 
                                      "help",  # same as '--usage'
                                      "usage", 
                                      "summary",
    
                                      "inverse",
    
                                      "delimiter="])
    
    Karl Fogel's avatar
    Karl Fogel committed
      except getopt.GetoptError as err:
    
        sys.stderr.write(str(err))
        sys.stderr.write("\n")
        sys.exit(1)
    
      for opt, optarg in opts:
        if opt in ("-d", "--delimiter"):
          delim_re = re.compile(optarg)
        elif opt in ("-h", "-?", "--help", "--usage"):
    
    Karl Fogel's avatar
    Karl Fogel committed
          print(__doc__)
    
          sys.exit(0)
        elif opt == "--summary":
    
    Karl Fogel's avatar
    Karl Fogel committed
          print("dmgrep [-d|--delimiter DELIM_REGEXP] [-i|--ignore-case] " \
                "REGEXP < INPUT")
          print("")
          print("(Run 'dmgrep --usage' for full usage.)")
    
          sys.exit(0)
        elif opt in ("-i", "--ignore-case"):
          search_flags |= re.IGNORECASE
    
        elif opt in ("-v", "--inverse"):
          inverse = True
    
    Karl Fogel's avatar
    Karl Fogel committed
          print("Unrecognized option '%s'" % opt)
    
          sys.exit(1)
    
      if len(args) < 1:
        sys.stderr.write("ERROR: missing REGEXP argument.\n")
        sys.stderr.write("       (Run with '--usage' flag to see documentation.)\n")
        sys.exit(1)
      elif len(args) > 1:
        sys.stderr.write("ERROR: too many arguments -- " \
                         "expected exactly one REGEXP argument.\n")
        sys.stderr.write("       (Run with '--usage' flag to see documentation.)\n")
        sys.exit(1)
        
      if delim_re is None:
    
        delim_re = re.compile(
            # Revision separator in a series of Subversion log messages.
            "^(-{72}"
            "|"
            # Git commits.
            "commit [a-z0-9]{40})$"
            "|"
            # Output of 'find-dups' (also in 'ots-tools', like 'dmgrep').
            "[a-z0-9]{32} \\([0-9]+ identical files, each [0-9]+ bytes\\):")
    
        
      search_re = re.compile(args[0], flags=search_flags)
    
      accum = ''
      latest_leading_delim = ''
      line = sys.stdin.readline()
      if delim_re.match(line): # if file starts with a delimiter, eat it now
        latest_leading_delim = line
        line = sys.stdin.readline()
      while line != '':
        if delim_re.match(line):
    
          match_maybe_output(accum, search_re, latest_leading_delim, 
                             inverse, sys.stdout)
    
          accum = ''
          latest_leading_delim = line
        else:
          accum += line
        line = sys.stdin.readline()
    
      match_maybe_output(accum, search_re, latest_leading_delim, 
                         inverse, sys.stdout)
    
    
    
    if __name__ == '__main__':
      main()