When you need to split a text file by lines or columns, there are plenty of ways to do that. But what if each record consists of multiple lines, and the number of lines per record is not fixed?
If every record contains a predictable string that you can use as a record separator, you should consider csplit.
I wanted to monitor system load with thread-level detail using top in batch mode, so I ran this command:
top -H -b -n2 -d10
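where
-H shows individual threads instead of whole processes,
-b runs top in batch mode (plain-text output instead of the interactive screen),
-n2 stops after 2 iterations, and
-d10 waits 10 seconds between iterations.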
This command produces 2 records of information, and the line count of each record depends on the number of processes/threads currently running.
The first iteration of top is printed immediately, while the next one is delayed by 10 seconds, as specified by the argument -d10. That delay is there to obtain CPU metrics averaged over 10 seconds, which is the information I actually wanted. So I needed to discard the first record and log only the second one. Since you can't know in advance how long the first record is, i.e. how many lines it has, you cannot simply use head/tail to cut it off.
Fortunately, there is csplit from the coreutils package!
As each record produced by top starts with the string top - it is dead easy to define a multi-line record separator using a regular expression.
csplit can perform 2 major “pattern actions” on the input file: it either copies or skips everything up to a matching line. An optional +/- line OFFSET fine-tunes how many context lines are copied or ignored.
/REGEXP/[OFFSET]: copy up to, but not including, a matching line
%REGEXP%[OFFSET]: skip to, but not including, a matching line
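As a rough illustration of the two pattern actions, the sketch below runs csplit on a small made-up input. The START marker, the file prefixes (copy-, offset-, skip-) and the temporary directory are assumptions for this demo only; -s merely suppresses the byte counts csplit normally prints.

```shell
# Work in a throwaway directory so the demo files do not clutter anything.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Made-up input: two ordinary lines, a marker line, one more line.
printf 'a\nb\nSTART\nc\n' > input.txt

# /REGEXP/ copies everything before the match into copy-00;
# the matching line and the rest go to copy-01.
csplit -s -f copy- input.txt '/^START/'

# /REGEXP/+1 moves the split point one line down,
# so the marker line itself still lands in offset-00.
csplit -s -f offset- input.txt '/^START/+1'

# %REGEXP% skips everything before the match without creating a file,
# so skip-00 starts at the marker line.
csplit -s -f skip- input.txt '%^START%'

# Show what ended up in each piece.
head copy-* offset-* skip-*
```

Note that the skip variant produces a single file, because skipped lines never get an output file of their own.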
So, coming back to my example, I needed to drop the first record and print the second one. I accomplished that with a csplit command built around the two-part pattern explained below.
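A minimal sketch of an invocation consistent with the two-part pattern explained below. The helper name keep_second_record and the prefix csplit- are assumptions; the prefix is chosen so that the output file is named csplit-00, and the lone - makes csplit read standard input.

```shell
# Hypothetical helper: reads top's batch output on stdin and writes
# only the second record to ./csplit-00.
keep_second_record() {
    # -f csplit-  output file prefix (assumed; yields the name csplit-00)
    # -s          do not print the size of the output file
    # -           read from standard input
    # '%^top%'    skip up to the first line starting with "top"
    # '{1}'       repeat that skip once more
    csplit -s -f csplit- - '%^top%' '{1}'
}

# Intended use; takes about 10 seconds because of -d10:
#   top -H -b -n2 -d10 | keep_second_record
```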
csplit will create one output file, csplit-00, containing only the second record.
Explanation: My pattern consists of 2 parts:
%^top%
- Read and skip any lines up to the record separator top -, which matches right away at line 1, the first line of the first record. Nothing is skipped yet.
{1}
- Repeat the previous action once. The search resumes on the line after the previous match, so csplit reads and skips the first record until top - matches again, at the start of the second record. Then, csplit writes the rest of the buffer to the output file.
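To check the role of each part, the following sketch feeds csplit two made-up records and compares the result with and without the {1} repeat. The record contents and the prefixes once- and twice- are assumptions for this demo.

```shell
# Work in a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Two fake records, each starting with a "top -" line like real top output.
printf 'top - first\nA\nB\ntop - second\nC\n' > records.txt

# Pattern alone: it matches on line 1, nothing is skipped,
# and once-00 receives both records.
csplit -s -f once- records.txt '%^top%'

# Pattern plus {1}: the second search starts after line 1, skips the
# first record, and twice-00 receives only the second record.
csplit -s -f twice- records.txt '%^top%' '{1}'

# Compare the line counts of the two results.
wc -l once-00 twice-00
```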