#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# If you did not receive a copy of the GNU General Public License
# along with this program, see <http://www.gnu.org/licenses/>.
"""Print delimited sections of input that match a given regexp.

Usage:

  $ CMD | dmgrep [-d|--delimiter DELIM_REGEXP] [-i|--ignore-case] REGEXP
        -OR-
  $ dmgrep [...] < FILE

Use the '-v' ('--inverse') flag to invert the sense of the match.

Detailed explanation:
"Delimited" means there is some sort of single-line delimiter
separating sections from each other. This is common in many output
formats. For example, 'svn log' puts a line of 72 hyphens between
each revision's log entry, and 'git log' output starts each commit
message with a line matching the regexp "^commit [a-z0-9]{40}$".
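
As a minimal sketch (not part of this script), the two delimiter
patterns just mentioned can be tried out in Python as follows. The
names SVN_DELIM, GIT_DELIM, and is_delimiter are hypothetical, chosen
only for illustration:

```python
import re

# Hypothetical names for illustration; the patterns come from the text above.
SVN_DELIM = re.compile("^-{72}$")                # a line of 72 hyphens
GIT_DELIM = re.compile("^commit [a-z0-9]{40}$")  # start of a 'git log' entry


def is_delimiter(line):
    # Return True if LINE is one of the two delimiters described above.
    return bool(SVN_DELIM.match(line) or GIT_DELIM.match(line))


print(is_delimiter("-" * 72))
print(is_delimiter("commit 52964c56964cc454b4239d3cada61e0577db897b"))
print(is_delimiter("Author: Noel Taylor"))
```

The first two calls report a delimiter; the third does not.
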
You might use this script to, say, find all the 'git log' entries that
match a given regular expression. The match is performed against the
leading delimiter as well as against the body text between delimiters
(this is to handle cases like 'git log' output, where you might want
to match a pattern against both commit messages themselves and against
the commit IDs appearing in the delimiters -- see an example below).
Wherever there is a match, the entire matching entry is printed to
standard output, including the leading delimiter if there was one
(whether or not the match matched the delimiter itself), but not the
trailing delimiter, whether or not there was one. If you understood
that sentence on the first reading, this program is probably for you.
You can specify a custom delimiter regexp at run-time, but if you
don't then a default delimiter regexp is used, matching a set of
well-known delimiters that includes at least the two mentioned above.
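
The behavior described above can be sketched in a few lines of Python.
This is an illustrative approximation, not the script's actual code:
the names section_matches, matching_sections, and SAMPLE are
hypothetical, and the real program streams from standard input rather
than taking a list of lines.

```python
import re


def section_matches(leading_delim, body, search_re, inverse):
    # A section matches if the regexp is found in its leading delimiter
    # or in its body; INVERSE flips the result.
    found = bool(search_re.search(leading_delim) or search_re.search(body))
    return found != inverse


def matching_sections(lines, delim_re, search_re, inverse=False):
    # Return each matching section as one string: its leading delimiter
    # (if any) followed by its body. The trailing delimiter is excluded.
    kept = []
    leading_delim, body = "", ""
    for index, line in enumerate(lines):
        if delim_re.match(line):
            # A delimiter on the very first line closes no section.
            if index > 0 and section_matches(leading_delim, body,
                                             search_re, inverse):
                kept.append(leading_delim + body)
            leading_delim, body = line, ""
        else:
            body += line
    # End of input is treated as one more delimiter.
    if section_matches(leading_delim, body, search_re, inverse):
        kept.append(leading_delim + body)
    return kept


SAMPLE = '''intro section
----------
this section mentions fish
----------
this one does not
'''

sections = matching_sections(SAMPLE.splitlines(keepends=True),
                             re.compile("-{10}"), re.compile("fish"))
print("".join(sections), end="")
```

Run as-is, this prints the second section preceded by its leading line
of dashes. Passing inverse=True selects the other two sections instead.
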
Examples:
$ git clone git@github.com:OpenTechStrategies/smoke-alarm-portal.git
$ cd smoke-alarm-portal
$ git log --name-status | dmgrep "[Rr]einstate"
commit 52964c56964cc454b4239d3cada61e0577db897b
Author: Noel Taylor <example@opentechstrategies.com>
Date: Thu Jun 25 15:26:42 2015 -0500
#22 Reinstate a couple of DCSOps style rules
Commit 3ac3645 commented out a link to the DCSOps stlyesheet as a rule
there was breaking bootstrap grid behavior. This commit reinstates two
of those rules (for font family and button color) into our own
stylesheet.
M public/smokeAlarmPortal_files/base.css
$
### Hmm, interesting -- the commit message mentions another
### commit, 3ac3645. Let's see all commit messages that either
### are that commit or mention it.
$ git log --name-status | dmgrep "3ac3645"
commit 52964c56964cc454b4239d3cada61e0577db897b
Author: Noel Taylor <example@opentechstrategies.com>
Date: Thu Jun 25 15:26:42 2015 -0500
#22 Reinstate a couple of DCSOps style rules
Commit 3ac3645 commented out a link to the DCSOps stlyesheet as a rule
there was breaking bootstrap grid behavior. This commit reinstates two
of those rules (for font family and button color) into our own
stylesheet.
M public/smokeAlarmPortal_files/base.css
commit 3ac3645bac42f598911c9fc3829202a246bb0218
Author: Noel Taylor <example@opentechstrategies.com>
Date: Thu Jun 25 14:52:51 2015 -0500
#22 Errors now more obvious, yet less disruptive
Red asterisks now mark required form fields. If validation script
returns error messages, these now appear in place of the asterisks and
the errored fields gain a red outline. I have also commented out the
link to the DCSOps stylesheet, as one of the rules was breaking the
behavior of the bootstrap grid especially for extra-small screen sizes
but in other ways also.
M public/smokeAlarmPortal_files/base.css
M views/index.jade
M views/layout.jade
$
### Good -- we now have an overview of that commit and its followups.
Another example, a bit more contrived this time:
$ cat delimited_file.txt
This file has some delimited text
by which I mean it is divided into sections
divided by some known delimiter.
----------
In this case the delimiter is a line of dashes,
although it could have been anything.
By the way, this section contains the word "fish".
----------
The delimiter in this example isn't very interesting.
----------
How odd -- this section also contains the word "fish"!
What a coincidence.
----------
The leading delimiter is printed, but not the trailing.
----------
EOF is treated as another occurrence of the delimiter.
By the way -- fish!
$ dmgrep -d '-{10}' fish < delimited_file.txt
----------
In this case the delimiter is a line of dashes,
although it could have been anything.
By the way, this section contains the word "fish".
----------
How odd -- this section also contains the word "fish"!
What a coincidence.
----------
EOF is treated as another occurrence of the delimiter.
By the way -- fish!
Related programs and historical notes:
I scripted this up at a time when I didn't have easy access to my
preferred Internet search engines, thanks to the Great Firewall of
China, and originally called it 'dgrep'. But later when I could
easily search the Internet again, I found there was already a program
named 'dgrep'. So next I tried 'sgrep' (for "sectioned grep"), but
not only is that also already taken, the 'sgrep' ("structured grep")
program by Jani Jaakkola and Pekka Kilpeläinen (btw, what is it with
the over-representation of Finns among authors of useful utilities?)
actually does something pretty close to what this script does --
'sgrep' may even be a superset. In desperation, I resorted next to my
first initial, but 'kgrep' is already used too. Hence 'dmgrep', a
slightly odd abbreviation for "delimited grep".
After writing this script, I realized it is a generalization of
http://svn.apache.org/viewvc/subversion/trunk/contrib/client-side/search-svnlog.pl?view=co,
Run 'dmgrep --summary' to see a compact summary of usage.
"""
import sys
import re
import getopt
def match_maybe_output(body, regexp, leading_delim, inverse, out):
    """If LEADING_DELIM or BODY match REGEXP, print them to OUT.
    BODY does not include any delimiters; REGEXP is a compiled regexp;
    LEADING_DELIM is a string; OUT is a file handle (e.g., sys.stdout).
    If INVERSE is true, invert the sense of the match."""
    match = regexp.search(body) or regexp.search(leading_delim)
    if (match and not inverse) or (inverse and not match):
        out.write(leading_delim)
        out.write(body)

def main():
    delim_re = None
    search_re = None
    search_flags = re.MULTILINE | re.DOTALL
    inverse = False  # set by '-v' / '--inverse'; must default to False

    try:
        (opts, args) = getopt.getopt(sys.argv[1:], "ih?d:v",
                                     ["ignore-case",
                                      "help",  # same as '--usage'
                                      "usage",
                                      "summary",
                                      "delimiter=",
                                      "inverse",
                                      ])
    except getopt.GetoptError as err:
        sys.stderr.write(str(err))
        sys.stderr.write("\n")
        sys.exit(1)

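    for opt, optarg in opts:
        if opt in ("-d", "--delimiter"):
            delim_re = re.compile(optarg)
        elif opt in ("-h", "-?", "--help", "--usage"):
            # Full documentation: the module docstring.
            print(__doc__)
            sys.exit(0)
        elif opt in ("--summary",):
            print("dmgrep [-d|--delimiter DELIM_REGEXP] [-i|--ignore-case] "
                  "REGEXP < INPUT")
            print("")
            print("(Run 'dmgrep --usage' for full usage.)")
            sys.exit(0)
        elif opt in ("-i", "--ignore-case"):
            search_flags |= re.IGNORECASE
        elif opt in ("-v", "--inverse"):
            inverse = True
        else:
            # getopt already rejects unknown options; this is a safety net.
            sys.stderr.write("ERROR: unhandled option '%s'.\n" % opt)
            sys.exit(1)
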
    if len(args) < 1:
        sys.stderr.write("ERROR: missing REGEXP argument.\n")
        sys.stderr.write(" (Run with '--usage' flag to see documentation.)\n")
        sys.exit(1)
    elif len(args) > 1:
        sys.stderr.write("ERROR: too many arguments -- "
                         "expected exactly one REGEXP argument.\n")
        sys.stderr.write(" (Run with '--usage' flag to see documentation.)\n")
        sys.exit(1)

    if delim_re is None:
        delim_re = re.compile(
            # Revision separator in a series of Subversion log messages.
            "^(-{72}"
            "|"
            # Git commits.
            "commit [a-z0-9]{40})$"
            "|"
            # Output of 'find-dups' (also in 'ots-tools', like 'dmgrep').
            "[a-z0-9]{32} \\([0-9]+ identical files, each [0-9]+ bytes\\):")

    search_re = re.compile(args[0], flags=search_flags)

    accum = ''
    latest_leading_delim = ''
    line = sys.stdin.readline()
    if delim_re.match(line):  # if file starts with a delimiter, eat it now
        latest_leading_delim = line
        line = sys.stdin.readline()
    while line != '':
        if delim_re.match(line):
            match_maybe_output(accum, search_re, latest_leading_delim,
                               inverse, sys.stdout)
            accum = ''
            latest_leading_delim = line
        else:
            accum += line
        line = sys.stdin.readline()
    match_maybe_output(accum, search_re, latest_leading_delim,
                       inverse, sys.stdout)

if __name__ == '__main__':
    main()