Lab 2 Some Unix tools

The aim of this lab is to look at methods of batch processing data (scene files) using both built-in Unix / Linux tools and Python.

Standard unix tools for finding things

Unix has a number of tools to allow searching and locating files.

The find command at its simplest can be used to search for and print files whose names match a pattern.

find . -name \*.obj

Note that in the above example the wildcard * is escaped as \* so that the shell passes it on to find instead of expanding it itself. It is often easier to put the search string in quotes.

find . -name "*.obj"
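The same kind of search can be sketched in Python, which is handy once the results need further processing. This is a minimal sketch using pathlib from the standard library; the function name find_files is our own and not part of any tool mentioned above.

```python
from pathlib import Path


def find_files(root, pattern):
    """Return a sorted list of files under root whose name matches the glob pattern."""
    # rglob walks the directory tree recursively, like find does by default
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())
```

Calling find_files(".", "*.obj") and printing each entry gives roughly the same listing as the find command above, although the ordering may differ.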

Better searching

This works well for finding files, however usually we want to find data within a file. To do this we can use the standard unix tool grep. Its name comes from the ed command g/re/p (globally search for a regular expression and print matching lines), which has the same effect.

Whilst most systems will have grep installed, I usually use a more modern version called ripgrep or rg. This is installed in the labs, but we need to add it to our paths. In the .profile file add the following

alias rg='/public/devel/2021/bin/rg'

We can then list all files containing a search string as follows.

rg -l "TransformBegin" /public/devel/2021/Renderman24Examples/

The real power of grep and rg comes from the ability to use regular expressions.

Some basic RegEx rules

The following cheatsheet is from https://regexr.com/

Character Classes

Expression	Meaning
`.`	any character except newline
`\w\d\s`	word, digit, whitespace
`\W\D\S`	not word, digit, whitespace
`[abc]`	any of a, b, or c
`[^abc]`	not a, b, or c
`[a-g]`	character between a & g

Anchors

Expression	Meaning
`^abc$`	start / end of the string
`\b\B`	word, not-word boundary

Escaped characters

Expression	Meaning
`\.\*\\`	escaped special characters
`\t\n\r`	tab, linefeed, carriage return

Groups & Lookaround

Expression	Meaning
`(abc)`	capture group
`\1`	backreference to group #1
`(?:abc)`	non-capturing group
`(?=abc)`	positive lookahead
`(?!abc)`	negative lookahead

Quantifiers & Alternation

Expression	Meaning
`a*a+a?`	0 or more, 1 or more, 0 or 1
`a{5}a{2,}`	exactly five, two or more
`a{1,3}`	between one & three
`a+?a{2,}?`	match as few as possible
`ab|cd`	match ab or cd
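A few of these rules can be tried out quickly with Python's re module, which uses a very similar syntax. The sample strings below are made up for illustration.

```python
import re

line = "v 1.0 -2.5 0.25"

# \s+ matches a run of whitespace, so we can split a line into fields
tokens = re.split(r"\s+", line)

# Quantifiers: -? is an optional minus sign, \d+ is one or more digits
numbers = re.findall(r"-?\d+\.\d+", line)

# Alternation inside a group, anchored to the start of the line,
# followed by a word boundary so "vx" would not count
is_vertex_data = re.match(r"^(v|vn|vt)\b", line) is not None

# Backreference: \1 must repeat whatever group 1 captured
doubled = re.search(r"\b(\w+) \1\b", "scale the the mesh")
```

Here doubled.group(1) holds the repeated word, which is a common way of spotting accidental duplicates in text.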

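We can now apply this to rg, for example

rg -l '^(Attr|Trans).*Begin$'  /public/devel/2021/Renderman24Examples/

will list all files in the folder that contain a line starting with either Attr or Trans and ending in Begin (for example AttributeBegin or TransformBegin). Note the round brackets: `[Attr|Trans]` would be a character class matching any single one of those characters, not the two words.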
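A search like this can also be sketched in Python with the re module, using a grouped alternation for the two prefixes. It is only a rough stand-in for rg -l: the function name files_with_match is made up for this example, and unlike rg it does not skip binary files or honour ignore files.

```python
import re
from pathlib import Path


def files_with_match(root, pattern):
    """Return files under root that contain at least one line matching pattern."""
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="ignore" lets us scan files that are not valid text
            with open(path, errors="ignore") as handle:
                if any(regex.search(line.rstrip("\r\n")) for line in handle):
                    matches.append(path)
        except OSError:
            continue  # unreadable file, skip it
    return matches
```

For example files_with_match("/public/devel/2021/Renderman24Examples/", r"^(Attr|Trans).*Begin$") would return the matching paths as a list that a script can loop over.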

(Text) Stream Editing

There are many tools that can work on streams of text (either from the command line or file). This is very useful for editing lots of files and batch processing of data.

The simplest of these tools is sed.

The following python script will generate a CSV style file with windows style line endings

#!/usr/bin/env python
import random,string

def randName() :
    return ''.join(random.choices(string.ascii_letters)+(random.choices(string.ascii_letters + string.digits, k=random.randint(5, 20))))

with open('data.txt','w') as file :
    for i in range(0,200) :
        for l in range(0,random.randint(2,10)+2) :
            file.write(randName()+',')
        file.write('\r\n')

We can use sed to remove the \r line ending using the search and global replace mode

sed 's/\r$//g' data.txt

You will notice that this prints to the standard output, and that each line of the output still ends with a comma (as the program that generates the file always prints a comma after every field).

We can get rid of this using sed and a unix pipe as follows

sed 's/\r$//g' data.txt | sed 's/,$//g'

Finally let's remove the commas and replace them with tabs and re-direct to another file

sed 's/\r$//g' data.txt | sed 's/,$//g' | sed 's/,/\t/g' > final.txt
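For comparison, the three sed steps can be sketched in Python. This is a minimal sketch, not a replacement for the pipeline: the function names clean_line and convert are our own, and rstrip removes every trailing \r and \n whereas the sed expression removes a single \r.

```python
def clean_line(line):
    """Strip the line ending, drop one trailing comma, turn commas into tabs."""
    line = line.rstrip("\r\n")
    if line.endswith(","):
        line = line[:-1]
    return line.replace(",", "\t")


def convert(src, dst):
    """Write a cleaned, tab-separated copy of src to dst with unix line endings."""
    # newline="" stops Python translating the line endings for us
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        for line in fin:
            fout.write(clean_line(line) + "\n")
```

Calling convert("data.txt", "final.txt") should produce the same kind of file as the sed pipeline above.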

awk

Like sed and grep, awk is a tool for filtering data from a stream. It is good for

Text processing
Producing formatted text reports/labels
Performing arithmetic operations on fields of a file
Performing string operations on different fields of a file

Like sed, awk can be used on the command line or with a script. awk also has a more complex programming language and setup.

 echo "hello awk" |  awk 'BEGIN{print "start";}{print $0}END{print "end"}'

The following example uses awk in script mode, which can be very useful. We are going to count the contents of an obj file.

#!/usr/bin/awk -f
# set the counters to zero
BEGIN{
  verts=0;
  normals=0;
  uv=0;
  faces=0;
}
# this gets executed per line
{
  if($1 == "v"){verts++;}
  else if($1 == "vn"){normals++;}
  else if($1 == "vt"){uv++;}
  else if($1 == "f"){faces++;}
}
# this happens at the end
END{
  print "Faces ",faces;
  print "verts ",verts;
  print "UV's ",uv;
  print "normals ",normals;
}

In this case the script is using the -f option, which tells awk to read its program from a file (here the script itself). All files passed in on the command line will be processed as input.

The BEGIN block is used to set up the initial variable states, then per line we check the first column $1 to see if it is one of the obj tokens and add to the matching variable.

Finally the END block is called once finished to print out the results.
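The same BEGIN / per line / END pattern maps quite directly onto Python, which may help when comparing the two approaches. This is a minimal sketch; the function name count_obj_tokens is our own.

```python
from collections import Counter


def count_obj_tokens(lines):
    """Count the v, vn, vt and f lines in an iterable of obj file lines."""
    counts = Counter()  # plays the role of the BEGIN block
    for line in lines:  # the per line block
        fields = line.split()
        if fields and fields[0] in ("v", "vn", "vt", "f"):
            counts[fields[0]] += 1
    return counts  # the caller prints the totals, like the END block
```

Because an open file is an iterable of lines, it can be passed straight in, for example count_obj_tokens(open("sphere.obj")).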

AWK as a filter

The following example takes the input obj file and scales the vertices.

#!/usr/bin/awk -f
# nothing to set up for this filter
BEGIN{
}
# this gets executed per line
{
  if($1 == "v"){
    printf "%s %f %f %f \n", $1,$2*0.5,$3*0.5,$4*0.5 
  }
  else
  {
    print $0
  }
}
# this happens at the end
END{
}

By default this will just print the output to stdout, however we can redirect the output to a new file as follows

./scale.awk test.obj > new.obj

Some more modern versions of awk allow in-place editing, however the version we have installed doesn’t support this so we can use a temporary file.

./scale.awk sphere.obj > tmp.obj ; mv tmp.obj sphere.obj
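The temporary file approach can also be sketched in Python. This is a minimal sketch under a few assumptions: the function names are our own, every v line is assumed to have at least three numeric coordinates, and the rewritten file takes on the permissions of the temporary file.

```python
import os
import tempfile


def scale_line(line, factor):
    """Scale the first three coordinates of a v line, pass other lines through."""
    fields = line.split()
    if fields and fields[0] == "v":
        x, y, z = (float(value) * factor for value in fields[1:4])
        return "v {:f} {:f} {:f}\n".format(x, y, z)
    return line


def scale_in_place(path, factor=0.5):
    """Rewrite path with scaled vertices by going through a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # write the temporary file next to the original so the final rename is cheap
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            tmp.write(scale_line(line, factor))
    # only replace the original once the new file is completely written
    os.replace(tmp.name, path)
```

Replacing the file only after the write has finished means a failure part way through leaves the original obj untouched.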

decimate

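In this example we remove faces based on a count. The first argument is a number N, and every Nth face line is dropped while all other lines are passed through unchanged. If the script were saved as decimate.awk (the file name is only an example) it could be run as ./decimate.awk 2 sphere.obj > decimated.obj

#!/usr/bin/awk -f
# read the limit and set the face counter to zero
BEGIN{
  count = 0
  limit=ARGV[1] 
  ARGV[1] = ""  # need to clear this else it will think it's an input file
}
# this gets executed per line
{
  if($1 == "f"){
    count++;                         # only face lines are counted
    if((count % limit) == 0){next;}  # drop every limit-th face
  }
  print $0
}
# this happens at the end
END{
}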
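One reading of "remove faces based on a count" is to drop every Nth face line, and that reading can be sketched in Python as a generator. The function name decimate and the limit parameter are our own, and vertices that are no longer referenced are left in place.

```python
def decimate(lines, limit):
    """Yield every line except each limit-th face line."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    faces_seen = 0
    for line in lines:
        fields = line.split(None, 1)
        if fields and fields[0] == "f":
            faces_seen += 1
            if faces_seen % limit == 0:
                continue  # drop this face
        yield line
```

Because it is a generator it works on a stream just like awk does, so the output can be written line by line without holding the whole mesh in memory.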

Exercise

The following maya.standalone script is designed to generate really bad maya scenes.

#!/usr/bin/env mayapy

import maya.standalone
import maya.cmds as cmds
import maya.mel as mel
import math
import random
import string
import os
import argparse


def random_point_on_sphere(radius=1,hemisphere=False) :
    xiTheta=random.uniform(0,1)
    temp=2.0*radius*math.sqrt(xiTheta*(1.0 - xiTheta))
    twoPiXiPhi=math.pi*2*random.uniform(0,1)
    x=temp*math.cos(twoPiXiPhi)
    y=temp*math.sin(twoPiXiPhi)
    if hemisphere == True :
        y=abs(y)
    z=radius*(1.0 - 2.0*xiTheta)
    return x,y,z

def randName() :
    """ 
    generate a random name, in python3 we can use choices however in py2 we need a different version 
    return ''.join(random.choice(string.ascii_letters)+(random.choice(string.ascii_letters + string.digits, k=random.randint(5, 20))))
    """
    return ''.join(random.choice(string.ascii_letters)+(random.choice(string.ascii_letters + string.digits)) for i in range(0,random.randint(5,20)))
colours=[(1,0,0),(0,1,0),(0,0,1),(1,1,1)] # red green blue white


if __name__ == '__main__' :

    parser = argparse.ArgumentParser(description='Create random Maya Scenes')
    parser.add_argument('--nscenes' , '-n' ,nargs='?',const=2, default=2,type=int,help='how many scenes to create 2 default')
    parser.add_argument("--fname", "-f", type=str, default='testScene',help="filename")
    parser.add_argument('--maxlights' , '-m' ,nargs='?',const=100, default=100,type=int,help='max lights in scene')

    args = parser.parse_args()


    maya.standalone.initialize(name='python')

    location=os.getcwd()
    for i in range(0,args.nscenes) :
        cmds.file( f=True, new=True )
        cmds.file( rename='{}/{}.{}.ma'.format(location,args.fname,i) )

        for i in range(0,random.randint(10, 200)) :
            x,y,z=random_point_on_sphere(14,hemisphere=True)
            name=randName()
            colour=random.choice(colours)
            cmds.shadingNode('pointLight', asLight=True, name=name)
            cmds.move(x,y,z)
            cmds.rename('pointLight1',  name)
            cmds.setAttr(name+'|'+name+'.color',colour[0],colour[1],colour[2],type = 'double3' )



        commands=[ "polyCone", "polyCube", "polySphere", "polyTorus" ]
        for i in range(0,random.randint(5,200)) :
            mel.eval('{} -n "{}";'.format(commands[random.randint(0,len(commands)-1)],randName()))
            cmds.move(random.uniform(-10,10),0,random.uniform(-10,10))
        # now save scene
        cmds.file( save=True, de=False, type='mayaAscii' )

    print('closing down maya-standalone')
    maya.standalone.uninitialize()

The scenes have really bad naming (randomly generated) and make no use of groups or namespaces.

There are many pointLights in each scene, and these lights are either Red [1,0,0], Green [0,1,0], Blue [0,0,1] or White [1,1,1].

Write some batch processing scripts (I would suggest Python may be best) to search for each of the pointLights and rename them based on the colour. You will need to rename the parent transform as well as the pointLight.

References

https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux

https://www-users.york.ac.uk/~mijp1/teaching/2nd_year_Comp_Lab/guides/grep_awk_sed.pdf

https://learnbyexample.github.io/learn_gnuawk/cover.html


Last updated on Jan 4, 2022