Channel: Autarchy of the Private Cave » how-to

How to replace newlines with commas, tabs etc (merge lines)

Imagine you need to extract the few lines with missing identifier mappings from a group of files. I have a bunch of files with content similar to this one:

ENSRNOG00000018677 1368832_at 25233
ENSRNOG00000002079 1369102_at 25272
ENSRNOG00000043451 25353
ENSRNOG00000001527 1388013_at 25408
ENSRNOG00000007390 1389538_at 25493

In the example above I need '25353', which has no corresponding affy_probeset_id in the 2nd column.

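The approach is straightforward: merge and deduplicate the files, drop every line that contains a probeset ID ('_at'), and print the second whitespace-separated field, which on the remaining lines is the ID I need: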

  1. sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}'

This outputs a column of required IDs (EntrezGene in this example):

116720
679845
309295
364867
298220
298221
25353
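
To try the pipeline without the original files, here is a minimal sketch that recreates the five sample rows shown earlier in a scratch directory. The file name sample_affy_ensembl.txt and the variable names are made up for this illustration; only the pipeline itself comes from the post.

```shell
# Recreate the sample rows from the post in a scratch directory.
# The file name sample_affy_ensembl.txt is hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/sample_affy_ensembl.txt" <<'EOF'
ENSRNOG00000018677 1368832_at 25233
ENSRNOG00000002079 1369102_at 25272
ENSRNOG00000043451 25353
ENSRNOG00000001527 1388013_at 25408
ENSRNOG00000007390 1389538_at 25493
EOF

# Same pipeline as above: deduplicate, drop rows that carry a probeset ID,
# then print the second field, which on the remaining rows is the EntrezGene ID.
ids=$(sort -u "$workdir"/*_affy_ensembl.txt | grep -v '_at' | awk '{print $2}')
echo "$ids"

# Clean up the scratch directory.
rm -r "$workdir"
```

With only these five rows the single line printed is 25353, the one ID without a probeset.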

However, I need these IDs as a comma-separated list, not as a newline-separated list.

There are several ways to achieve the desired result (only the last command in each pipeline differs):

  1. sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | gawk '$1=$1' ORS=', '
  2. sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | tr '\n' ','
  3. sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':a;N;$!ba;s/\n/, /g'
  4. sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | sed ':q;N;s/\n/, /g;t q'
  5. sort -u *_affy_ensembl.txt | grep -v '_at' | awk '{print $2}' | paste -s -d ","
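
To see what some of these variants actually emit, here is a rough sketch that runs three of them on a tiny input. The three IDs are taken from the output listed earlier. The variable names are made up for illustration. The sed script is the same as in the list but split into -e pieces, a form that tends to be accepted by non-GNU sed as well, and paste is given an explicit "-" operand so it reads standard input.

```shell
# Three sample IDs, one per line, standing in for the pipeline output.
ids=$(printf '%s\n' 116720 679845 25353)

# tr turns every newline into a comma, including the final one.
out_tr=$(printf '%s\n' "$ids" | tr '\n' ',')

# sed joins the lines in its pattern space, so the separator
# appears only between records.
out_sed=$(printf '%s\n' "$ids" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/, /g')

# paste also places its delimiter only between lines.
out_paste=$(printf '%s\n' "$ids" | paste -s -d ',' -)

# Brackets make a trailing separator visible.
printf '[%s]\n' "$out_tr" "$out_sed" "$out_paste"
```

The brackets show that tr leaves a trailing comma, while sed and paste do not.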

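These solutions differ in efficiency and (slightly) in output. Both sed variants read the whole input into the pattern space before replacing newlines with other separators, so they might not be best for large files. tr might be the most efficient, but I haven’t tested that; it also converts the final newline, so its output ends with a trailing comma, and the gawk variant similarly ends with a trailing “, ”. paste cycles through the characters of its delimiter list, using one character per join, so you cannot really get comma-space “, ” separation with it alone.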
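
Since paste cannot emit a two-character separator by itself, one possible workaround is to join with a plain comma and widen it afterwards; a streaming awk one-liner is another option. Both are sketches, not part of the original sources, and the variable names are made up. The sed step assumes the IDs themselves contain no commas.

```shell
# Three sample IDs, one per line, standing in for the pipeline output.
ids=$(printf '%s\n' 116720 679845 25353)

# paste joins with a single comma; sed then widens each comma to ", ".
# The "-" operand makes paste read standard input explicitly.
joined_paste=$(printf '%s\n' "$ids" | paste -s -d ',' - | sed 's/,/, /g')

# awk alternative: print the separator before every record except the first,
# so nothing trails at the end and the input is processed line by line.
joined_awk=$(printf '%s\n' "$ids" |
  awk 'NR > 1 { printf ", " } { printf "%s", $0 } END { print "" }')

echo "$joined_paste"
echo "$joined_awk"
```

Both commands print 116720, 679845, 25353 with no trailing separator.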

Sources: linuxquestions 1 (explains the sed commands used), linuxquestions 2, nixcraft.
