CGRU

Since version 1.6.7 (2012.12.03) the site has moved to cgru.info

FREE AND OPEN SOURCE

AFANASY

RENDER FARM MANAGER


Afanasy is a free and open source tool for controlling remote computing. You can compute something more quickly using a render farm: remote computers connected by a network. Afanasy is designed for parallel computation in computer graphics (3D rendering and 2D compositing). It can compute different frames (or even parts of frames) on several computers simultaneously.

Afanasy provides render farm monitoring. It is important to watch computer resources during the render process. You can see which resource (CPU, memory, network, etc.) a render needs most. It is very useful to know what the farm hosts are doing.

The Afanasy engine simply runs different command lines on hosts and controls the running processes. You can use Afanasy to parallelize any calculation that you can describe (split) through command lines.

Render computer resources monitoring.

You can monitor CPU usage (User, Nice, System, I/O Wait and Load Average), memory, swap and HDD usage, network traffic, and disk I/O operation speed and utilization percentage. This can help you diagnose what slows down the rendering process, especially swap, I/O wait and network traffic.

You can write your own custom resource meters in Python, and Watch will plot their graphs.

Multiple Tasks Renders (clients, slaves, hosts)

A render can run several tasks at the same time.

This is useful for "powerful" hosts with several CPUs. Both renders and tasks have a capacity attribute. Each task checks the available capacity on a render and, if it is enough, the task launches and takes its capacity value from the render.

Tasks can vary capacity 'on-the-fly'
Capacity variation is described through minimum and maximum coefficients. If a task with 1000 capacity units runs on a render with 4000 free capacity, the task can take a capacity coefficient of 4. This coefficient can be used in the task command line (so Mantra can be launched as 'mantra -j 4 ...', or 3000 capacity can be used by one 'houdini' task, for example to generate something that uses only one CPU, and 'mantra -j 3'). Host capacity values are described in 'Farm Setup' (an 'xml' file).
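
The capacity arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name `capacity_coefficient`, its parameters and the rule of taking the largest whole coefficient that fits are assumptions, not Afanasy's actual implementation.

```python
def capacity_coefficient(task_capacity, free_capacity, coeff_min=1, coeff_max=4):
    """Return the largest whole coefficient that fits, or None if the task cannot start.

    All names and the rounding rule are hypothetical; Afanasy may decide differently.
    """
    fits = free_capacity // task_capacity  # whole multiples of the task capacity that fit
    if fits < coeff_min:
        return None  # not enough free capacity, the task has to wait
    return min(fits, coeff_max)


coeff = capacity_coefficient(1000, 4000)
if coeff is not None:
    # Substitute the coefficient into the command line, as in 'mantra -j 4 ...'
    print("mantra -j {} ...".format(coeff))
```

With 1000 capacity units per task and 4000 free units on the render, the sketch picks a coefficient of 4, matching the example in the text.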

Paths Map

Every host can have different paths and a map to translate paths to the server and from the server. With this feature you can set up a multi-OS render farm. You can use different operating systems on farm hosts and submit jobs from any OS to render on any OS.
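
The idea of a two-way paths map can be sketched as a simple prefix substitution. The map contents, the function names `to_server` and `to_client`, and the prefix-based rule are all hypothetical here; the real map format is described in the full documentation.

```python
# Hypothetical map for one Windows host: (client-side prefix, server-side prefix).
PATHS_MAP = [
    ("P:/projects/", "/mnt/projects/"),
    ("T:/temp/", "/var/tmp/"),
]


def to_server(path, paths_map=PATHS_MAP):
    """Translate a client path to the server form by swapping a matching prefix."""
    for client_prefix, server_prefix in paths_map:
        if path.startswith(client_prefix):
            return server_prefix + path[len(client_prefix):]
    return path  # no rule matched, leave the path untouched


def to_client(path, paths_map=PATHS_MAP):
    """Translate a server path back to the client form."""
    for client_prefix, server_prefix in paths_map:
        if path.startswith(server_prefix):
            return client_prefix + path[len(server_prefix):]
    return path
```

A job submitted from this host would have its paths passed through `to_server`, and a task received by the host would have them passed through `to_client`.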

Dependencies

A job can depend on other job(s) of the same user (depend mask) or on job(s) from any user (global depend mask). The job will wait until the other jobs matching this mask are done. The mask is a regular expression (Afanasy uses Qt Regular Expressions, which are Perl-like).

A job block can wait for other block(s) to be done (depend mask). Block tasks can wait for tasks of other blocks to be done (tasks depend mask).
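
A depend mask can be pictured as a filter over job names. The sketch below uses Python's `re` module as a stand-in for Qt regular expressions and assumes the mask must match the whole name; both points, as well as the function name `jobs_to_wait_for`, are assumptions made for illustration.

```python
import re


def jobs_to_wait_for(depend_mask, job_names):
    """Return the job names matched by the mask (whole-name match assumed)."""
    pattern = re.compile(depend_mask)
    return [name for name in job_names if pattern.fullmatch(name)]


user_jobs = ["sim_shot01", "sim_shot02", "render_shot01"]
print(jobs_to_wait_for(r"sim_.*", user_jobs))
```

A render job with the mask `sim_.*` would then wait for both simulation jobs in this list.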

Sub Task Dependence

Tasks can wait for only some part of another task to be done. This is useful for rendering simulations: the first frames of a simulation can start rendering without waiting for the whole simulation to be done.

Errors Solving

Job Errors Hosts List - a block stores the names of hosts where an error occurred and the number of errors for each host. If this number is greater than its limit, the block will 'avoid' this host (it will not run tasks on a render with that hostname).

Tasks Errors Hosts List - the same as described above, but for each task.

Tasks Auto Retry - the maximum number of errors for which a task is retried automatically.

Tasks Maximum Run Time - the run time after which a running task is considered an error.

Errors Forgive Time - the time since a host's last error after which all its errors are forgiven (the host is removed from the error hosts list).

A user defines these default values for new jobs, can override them in each job, and can watch and reset any error hosts list.
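
How an error hosts list with a limit and a forgive time might behave can be sketched as follows. The class `ErrorHostsList`, its attribute names and the use of plain numbers for time are hypothetical and serve only to illustrate the rules listed above.

```python
class ErrorHostsList:
    """Hypothetical per-block (or per-task) record of hosts that produced errors."""

    def __init__(self, errors_limit, forgive_time):
        self.errors_limit = errors_limit  # avoid a host once its count exceeds this
        self.forgive_time = forgive_time  # seconds since the last error to forgive a host
        self.hosts = {}  # host name -> (error count, time of last error)

    def add_error(self, host, now):
        count, _ = self.hosts.get(host, (0, now))
        self.hosts[host] = (count + 1, now)

    def avoids(self, host, now):
        if host not in self.hosts:
            return False
        count, last_error = self.hosts[host]
        if now - last_error >= self.forgive_time:
            del self.hosts[host]  # forgiven: remove from the error hosts list
            return False
        return count > self.errors_limit
```

With `errors_limit=2`, a host is avoided after its third error and becomes usable again once `forgive_time` has passed since its last error.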

Services Limits - Software Licenses Setup

You can describe various limits for services (tasks):
- Maximum number of total service starts on an entire farm.
- Maximum number of hosts which started a service (each host can start several tasks of a service).
- Maximum number of service starts on each host.
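
The three limits above can be combined into one check before a task starts. This is a minimal sketch under assumed names (`can_start`, `running`, `max_total`, `max_hosts`, `max_per_host`); it is not how Afanasy stores or evaluates these limits.

```python
def can_start(service, host, running, max_total, max_hosts, max_per_host):
    """running: list of (service, host) pairs for tasks that are currently running."""
    hosts_running = [h for s, h in running if s == service]
    if len(hosts_running) >= max_total:
        return False  # farm-wide limit reached
    on_this_host = hosts_running.count(host)
    if on_this_host >= max_per_host:
        return False  # per-host limit reached
    if on_this_host == 0 and len(set(hosts_running)) >= max_hosts:
        return False  # would occupy one more host than allowed
    return True
```

For a license that allows two hosts with two starts each, one would call it as `can_start("nuke", host, running, max_total=4, max_hosts=2, max_per_host=2)`.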

Priorities

User priority - a user with greater priority can get more hosts.

Job priority - simply sorts the user's job queue.

Hosts Masks

Users and jobs can have a host names mask for the hosts they are able to run on, and an exclude mask for hosts to avoid.
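
Combining a hosts mask with an exclude mask might look like the sketch below. Python's `re` stands in for Qt regular expressions, and the whole-name match and the meaning of an empty mask are assumptions; `host_allowed` is a hypothetical helper.

```python
import re


def host_allowed(hostname, hosts_mask=None, exclude_mask=None):
    """An empty mask is assumed to mean 'no restriction'."""
    if hosts_mask and not re.fullmatch(hosts_mask, hostname):
        return False  # host is outside the allowed set
    if exclude_mask and re.fullmatch(exclude_mask, hostname):
        return False  # host is explicitly excluded
    return True
```

For example, `host_allowed("r07", "r.*", "r0[5-9]")` rejects the host because it matches the exclude mask even though it matches the hosts mask.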

Maximum Running Tasks Number

Users and jobs can have a limit on the maximum number of running tasks. No new tasks will be started above this limit.

Maximum Running Tasks Per Host

You can limit the number of tasks that a job (and block) starts on the same host.

JSON Protocol

Any information can be sent to and retrieved from the server via JSON. You can send jobs and write a custom GUI.

Python API

You can create and send a job using Python.

Set up services and parsers through Python classes inherited from the base service and parser classes.

Services and Parsers

Service - 'block tasks type', for example: 'hbatch', 'mantra', 'prman', 'nuke', 'maya'.

A service sets up a default task output parser type (which can be overridden later). Services are Python classes. They all inherit from a base 'service' class. A service class describes command line manipulations, such as task capacity (how to specify how many CPUs to use, for example 'mantra -j @AF_CAPACITY@').

Parser - reads task output to calculate running progress. Parsers are Python classes and can be combined (multiple inheritance). For example, 'hbatch_mantra' inherits the 'hbatch' and 'mantra' parsers to listen to tasks in which Houdini generates files for mantra. Such a parser listens for frame switching (when several frames per render are set) and calculates a valid percentage.
A parser can produce errors or warnings if it finds "bad" text, so a render can stop a task with an error if it produced bad output.
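
Combining parsers through multiple inheritance can be illustrated with a self-contained sketch. The classes below (`BaseParser`, `PercentParser`, `ErrorParser`, `CombinedParser`) are hypothetical stand-ins and do not reproduce Afanasy's real base parser class or its method names.

```python
import re


class BaseParser:
    """Hypothetical stand-in for a base parser class; not Afanasy's real API."""

    def __init__(self):
        self.percent = 0
        self.error = False

    def parse(self, text):
        pass  # end of the cooperative super() chain


class PercentParser(BaseParser):
    """Reads progress from lines such as 'progress: 40%'."""

    def parse(self, text):
        super().parse(text)
        found = re.findall(r"(\d+)%", text)
        if found:
            self.percent = min(100, int(found[-1]))


class ErrorParser(BaseParser):
    """Flags the task as failed when 'bad' text appears in the output."""

    def parse(self, text):
        super().parse(text)
        if "ERROR" in text:
            self.error = True


class CombinedParser(PercentParser, ErrorParser):
    """Multiple inheritance: super() chains both parse() methods."""
```

Feeding task output to `CombinedParser().parse(...)` updates both the percentage and the error flag, which is the same pattern a combined parser such as 'hbatch_mantra' relies on.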

Job - Block - Task

A job consists of blocks, which produce tasks. There are two types of blocks: numeric and non-numeric. A numeric block does not store task data; it generates tasks 'on-the-fly' by some rules. Non-numeric blocks contain data for each task. A task inherits most attributes from its block (for example, all block tasks have the same working directory), but some other attributes are individual (commands, for example).

1 000 000 Tasks Job !
As tasks are generated by job blocks "on demand", Afanasy (both the server and the GUI) can handle numeric blocks with a huge number of tasks.
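
Why a numeric block scales to a million tasks can be seen in a small sketch: only the frame range is stored, and the frames of any task are computed from its index. The class `NumericBlock` and its methods are hypothetical, and a frame increment of 1 is assumed for simplicity.

```python
class NumericBlock:
    """Hypothetical numeric block: stores only the frame range, never a task list."""

    def __init__(self, frame_first, frame_last, frames_per_task):
        self.frame_first = frame_first
        self.frame_last = frame_last
        self.frames_per_task = frames_per_task

    def tasks_count(self):
        frames = self.frame_last - self.frame_first + 1
        return -(-frames // self.frames_per_task)  # ceiling division

    def task_frames(self, index):
        """Compute the (first, last) frames of task number `index` on demand."""
        first = self.frame_first + index * self.frames_per_task
        last = min(first + self.frames_per_task - 1, self.frame_last)
        return first, last


block = NumericBlock(1, 1000000, 1)
print(block.tasks_count(), block.task_frames(999999))
```

The memory used by such a block does not depend on the number of tasks, since each task is derived from three integers when it is needed.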

Farm Setup

Map host names to their abilities through patterns described in an XML file. You can describe job and block 'need' properties so that they run only on hosts which satisfy these needs (OS, free memory, etc.). The server can reload this file (and reconfigure itself) "on-the-fly", without a restart and while keeping tasks running.

Watch - minimalistic GUI for Afanasy

Watch is the Afanasy monitor. It can run in three modes: USER, VISOR and GOD.
USER (common mode) - the user can change any of his own parameters and any parameter of his jobs. He can manipulate a render host which was registered with his user name or whose name contains the user's computer name.
VISOR (super user mode) - can do the same as USER, but can also change or remove a job of any user.
GOD (super user mode) - can do anything: change any parameter of any user, job or host; add or delete users; start, stop or restart any render host.

Web Visor

Web interface to the Afanasy database that shows current state tables and draws usage history statistics diagrams.

All calculations and database queries run on the server side; only simple HTML text and PNG images are provided to the client. So the client needs no libraries or plug-ins; any browser will be enough.

SQL Database Connection

The Afanasy server can connect to an SQL database. It accesses the database through Qt classes; you can query the SQL drivers available for your system with the afcmd command line utility. PostgreSQL, MySQL and SQLite are available for most common systems. The driver can be chosen in the configuration file. PostgreSQL (QPSQL) is set as the default.
Afanasy stores job and user information in the database and reads it back on start. So you can shut down Afanasy, and it will restore its state when started again. You can also type any SQL query to get Afanasy information yourself.

Command Line Interface

Afanasy has a command line interface for various purposes. It can connect to the Afanasy server from a remote Linux machine. You can set some server parameters "on-the-fly", change some user, job and host attributes, and query user, job and host attributes and traffic statistics.

JSON Job Example


  {
    "job":
    {
      "name"                : "job name",
      "user_name"           : "jimmy",
      "host_name"           : "host",
      "blocks":[
      {
        "name"              : "Nuke",
        "tasks_name"        : "frames @#@-@#@",
        "service"           : "nuke",
        "parser"            : "nuke",
        "frame_first"       : 1,
        "frame_last"        : 100,
        "frames_per_task"   : 10,
        "frames_inc"        : 2,
        "command"           : "nuke -F@#@,@#@ -x scene.nk -X Write1",
        "working_directory" : "/home/jimmy/work",
        "files"             : "img_L.@####@.jpg;img_R.@####@.jpg"
      }
      ],
      "command_post"        : "deletefiles path/to/some.scene"
    }
  }

* This is an example of a job with a "numeric" block. Tasks will be created automatically on demand, on both the server and the GUI side. By creating tasks "on-the-fly" this way, Afanasy can handle jobs with a huge number of tasks.
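
The same job description can also be built as a Python dictionary and serialized with the standard `json` module, which guarantees syntactically valid JSON. The sketch only produces the text; how it is delivered to the server is not shown here and is covered by the full documentation.

```python
import json

# The job from the example above, expressed as nested Python dictionaries.
job = {
    "job": {
        "name": "job name",
        "user_name": "jimmy",
        "host_name": "host",
        "blocks": [
            {
                "name": "Nuke",
                "tasks_name": "frames @#@-@#@",
                "service": "nuke",
                "parser": "nuke",
                "frame_first": 1,
                "frame_last": 100,
                "frames_per_task": 10,
                "frames_inc": 2,
                "command": "nuke -F@#@,@#@ -x scene.nk -X Write1",
                "working_directory": "/home/jimmy/work",
                "files": "img_L.@####@.jpg;img_R.@####@.jpg",
            }
        ],
        "command_post": "deletefiles path/to/some.scene",
    }
}

text = json.dumps(job, indent=2)  # serialized job description
print(text)
```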

Python Job Example


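   #!/usr/bin/env python

   import af  # Afanasy Python API module

   # A job is a container for blocks
   job = af.Job('example job')

   # A block groups tasks that share attributes such as the working directory
   block = af.Block('block of tasks')
   block.setWorkingDirectory('/home')

   # Each task of a non-numeric block carries its own command
   task = af.Task('simple task')
   task.setCommand('ls -l')

   block.tasks.append(task)
   job.blocks.append(block)

   # Send the job to the Afanasy server
   job.send()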

You can write any custom job directly in Python, or use a GUI to generate jobs. Use already written job generators (see the 'Software Integration' chapter) or write your own.

Farm Patterns Example


   <farm>
      <pattern name="Some Host">
         <mask>.*</mask>
         <description>Unrecognized machine</description>
         <capacity>1200</capacity>
         <maxtasks>2</maxtasks>
         <power>100</power>
         <service name="generic" />
      </pattern>
      <pattern name="Render Host">
         <mask>r.*</mask>
         <description>Some render machine</description>
         <os>linux</os>
         <capacity>2400</capacity>
         <maxtasks>4</maxtasks>
         <power>1000</power>
         <properties>intel nvidiagpu</properties>
         <service name="generic" />
         <service name="nuke" />
         <service name="hbatch" />
         <service name="prman" />
         <service name="mantra" />
         <service name="hbatch_prman" />
         <service name="hbatch_mantra" />
         <service name="maya" />
      </pattern>
   </farm>

When a machine registers on the server, the server finds the last pattern whose mask matches the machine's hostname. So the '.*' mask is put first, at the top, to give any machine some properties.
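
The "last matching pattern wins" rule can be sketched with the Python standard library. The XML below is a shortened version of the example above; the function `find_pattern`, the use of Python's `re` in place of Qt regular expressions and the whole-name match are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

FARM_XML = """
<farm>
   <pattern name="Some Host">
      <mask>.*</mask>
      <capacity>1200</capacity>
   </pattern>
   <pattern name="Render Host">
      <mask>r.*</mask>
      <capacity>2400</capacity>
   </pattern>
</farm>
"""


def find_pattern(hostname, farm_xml=FARM_XML):
    """Return the last <pattern> element whose mask matches the whole hostname."""
    matched = None
    for pattern in ET.fromstring(farm_xml).findall("pattern"):
        if re.fullmatch(pattern.findtext("mask"), hostname):
            matched = pattern  # later patterns override earlier ones
    return matched


print(find_pattern("render01").get("name"))
print(find_pattern("workstation").get("name"))
```

A host named `render01` matches both masks and therefore gets the later "Render Host" pattern, while `workstation` falls back to the generic "Some Host" pattern.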

See full documentation for details.

Watch Renders
watch/renders.png


Watch Jobs
watch/jobs.png


1 000 000 Tasks Job:
watch/million_job.png


Watch Job Tasks
watch/job_tasks_block.png


Watch Users
watch/users.png


Web Visor Statistics Chart - Service Type / Tasks Number
visor/chart_service_tasks.png


Web Visor Statistics Chart - User Name / Jobs Quantity
visor/chart_user_jobs.png
