Using vim to stream-process text

Problem

Imagine we have a one-off task: copying a subset of the files in one S3 bucket to another bucket.

We’re going to use the aws s3 ls s3://a_bucket/path/ command to get the list of files, and then copy only the ones that match a pattern.

Let’s assume the resulting file-list looks something like this:

$ cat in.txt
2013-03-29 14:35:03   10314739 20130314T011325_WCBK_SCHEDULE.XML
2013-03-29 14:35:07      81378 20130314T012706_CBK_TOURNAMENT.XML
2013-03-29 14:35:07   11659596 20130314T012735_CBK_SCHEDULE.XML
2013-03-29 14:35:18        421 20130314T100002_CBK_LIVE.XML
2013-03-29 14:35:18        421 20130314T100028_CBK_LIVE.XML
2013-03-29 14:35:18        452 20130314T100028_CBK_SCORES.XML
2013-03-29 14:35:18        457 20130314T100634_WCBK_SCORES.XML
2013-03-29 14:35:22        421 20130314T131835_CBK_LIVE.XML
2013-03-29 14:35:22   11707386 20130314T131911_CBK_SCHEDULE.XML
2013-03-29 14:35:22        452 20130314T131938_CBK_SCORES.XML

#and 10K more lines to follow.

Now we want to transform this file-list into a script that copies selected files into a destination bucket.

Solution

There are many ways to transform text. This time I’d like to do the job with Vim’s Ex commands, the same way I might otherwise have used sed, awk, Perl, or a Ruby script.
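For comparison, here is roughly what the awk route might look like. This is a minimal sketch, not part of the Vim solution: the function name to_copy_commands is hypothetical, and it assumes the file name is the last whitespace-separated field and starts with a YYYYMMDD date, as in the sample listing.

```shell
# Hypothetical helper: print one "aws s3 cp" command per line of a listing.
# Assumes the file name is the last field and begins with YYYYMMDD.
to_copy_commands() {
  awk -v q="'" '{
    name = $NF                 # last field: the file name
    y = substr(name, 1, 4)     # year
    m = substr(name, 5, 2)     # month
    d = substr(name, 7, 2)     # day
    printf "aws s3 cp %ss3://a_bucket/path/%s/%s/%s/%s%s s3://dest_bucket/path2/\n", q, y, m, d, name, q
  }' "$1"
}
```

It would be called as to_copy_commands ./in.txt > ./out.txt.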

Because I use Vim most of the time, I often think in “Vim motions” when working with text.

Here’s the resulting command:

$ vim                                             \
  -N                                              \
  -u NONE                                         \
  ./in.txt                                        \
  -c ':%norm $Bd^'                                \
  -c ':%norm ^ytTP^3lylpr/2lylpr/2lylpr/'         \
  -c ":%norm Iaws s3 cp 's3://a_bucket/path/"     \
  -c ":%norm \$a'"                                \
  -c ':%norm $a s3://dest_bucket/path2/'          \
  -c ':saveas! ./out.txt'                         \
  -c ':qall!'

Running the command above saves the following text to ./out.txt:

aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T011325_WCBK_SCHEDULE.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T012706_CBK_TOURNAMENT.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T012735_CBK_SCHEDULE.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T100002_CBK_LIVE.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T100028_CBK_LIVE.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T100028_CBK_SCORES.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T100634_WCBK_SCORES.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T131835_CBK_LIVE.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T131911_CBK_SCHEDULE.XML' s3://dest_bucket/path2/
aws s3 cp 's3://a_bucket/path/2013/03/14/20130314T131938_CBK_SCORES.XML' s3://dest_bucket/path2/

#and 10K more lines to follow.
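The command above converts every line of the listing; it does not do the pattern selection mentioned in the problem statement. One way to add that, sketched here under assumptions, is Vim’s :v command, which runs an Ex command on every line that does not match a pattern. The function name filter_listing and the pattern _SCORES\.XML$ are hypothetical, and the -es flag (silent Ex mode) is my addition so the sketch runs without a terminal.

```shell
# Hypothetical helper: filter_listing IN OUT
# Keeps only the listing lines whose file name ends in _SCORES.XML.
# Assumes OUT contains no spaces or characters special to Vim.
filter_listing() {
  # :v/pattern/d  deletes every line that does NOT match the pattern
  vim -N -u NONE -es "$1" \
    -c ':v/_SCORES\.XML$/d' \
    -c ":saveas! $2" \
    -c ':qall!'
}
```

The same -c ':v/…/d' argument could instead be placed ahead of the first :%norm in the main command, so that filtering and rewriting happen in a single run.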

Now we can just source out.txt to get the copying going.
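With 10K generated lines it may be worth a quick check before running them. The sketch below is one cautious variation, not the only way to do it: the function name run_copy_script is hypothetical, and it runs the file with bash in a child shell instead of sourcing it into the current session.

```shell
# Hypothetical helper: run_copy_script SCRIPT
run_copy_script() {
  # Parse the file without executing anything; stop on a syntax error.
  bash -n "$1" || return 1
  # Run the commands in a child shell, leaving the current session untouched.
  bash "$1"
}
```

It would be called as run_copy_script ./out.txt.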

What’s happened

Refer back to how vim is being run above, line by line:
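Here is the same invocation again as an annotated sketch. The notes are my reading of what each argument does. The wrapper function build_copy_script is hypothetical, and the -es flag (silent Ex mode) is an addition so the sketch also runs without a terminal; drop it to match the original command exactly.

```shell
# Hypothetical wrapper: build_copy_script IN OUT
# Assumes OUT contains no spaces or characters special to Vim.
#
# -N        start in 'nocompatible' mode
# -u NONE   skip vimrc files and plugins, so the run is reproducible
# -es       (addition) silent Ex mode, no screen needed
#
# :%norm CMDS runs the Normal-mode keys CMDS on every line:
#   $Bd^      go to the end of the line, back to the start of the last
#             WORD (the file name), delete everything before it
#   ^ytTP     yank the date digits up to the "T", paste a copy in front
#   ^3lylpr/  duplicate the 4th digit and overwrite the copy with "/";
#             2lylpr/ repeats this twice more, giving 2013/03/14/
#   I...      insert the "aws s3 cp 's3://a_bucket/path/" prefix
#   $a'       append the closing quote ($ is escaped because that
#             argument sits inside double quotes)
#   $a ...    append the destination bucket
# :saveas!  write the buffer to OUT, overwriting an existing file
# :qall!    quit without prompting
build_copy_script() {
  vim -N -u NONE -es "$1" \
    -c ':%norm $Bd^' \
    -c ':%norm ^ytTP^3lylpr/2lylpr/2lylpr/' \
    -c ":%norm Iaws s3 cp 's3://a_bucket/path/" \
    -c ":%norm \$a'" \
    -c ':%norm $a s3://dest_bucket/path2/' \
    -c ":saveas! $2" \
    -c ':qall!'
}
```

It would be called as build_copy_script ./in.txt ./out.txt.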

Conclusion
