Steven Tattersall's personal blog
Created on: 2004-10-25
Posted in: SIGGRAPH conference reports
Months late of course, but I finally realised that I could cannibalise the report I did at work and whack my thoughts (for what they are worth) on the web...
Point-based graphics uses clouds of points to model and render objects, rather than triangles, patches or other structures; in theory you then don't have to store as much structural information. It's undergone a rebirth since about 2000, with newer, faster machines and the increasing use of scanning real-world models (cf. the Digital Michelangelo Project), which generates point sets. This course was an introduction to the whole point-based pipeline, from model generation through manipulation to rendering.
This was a pretty impressive demonstration of a couple of researchers' model-capture set-up. They were using only a few cheap cameras and lights; so cheap, in fact, that they kept having to cannibalise their old kit for different experiments after someone “borrowed” half of it. Using controllable lights and lots and lots of camera passes for each light position, they created some pretty impressive scans of their models, as well as huge amounts of data. The actual geometry calculated was quite rough, but with the additional colour and alpha information (they used a cunning multicoloured background system to avoid those classic blue-screen edges), the final results were rather good.
Most of their ideas seemed to be “inspired” by research in image-based rendering. They did, however, have a novel way of calculating their reflectance maps for each point on a model. They measured reflectance per point using hundreds of random images, rather than a fixed special background like other systems. They then tried to build a matrix of values that would map a simplified picture of the random images to the final colour value in each case. This used a least-squares solving algorithm, plus an interesting framework of guessing how the resulting matrix should “look” (it tried to keep coherency within blocks of similar values). As a result they could make a good stab at calculating a new reflectance for any new lighting image they wanted to apply to that object.
As an example, they recorded a fixed view of a rooftop for about a day, alongside the associated image of the sky each time the view was taken (captured with a glass ball). Then, with a bit of number-crunching, they relit the scene with a black sky and a white light whizzing round, without any knowledge of the 3D representation of the actual rooftop view. Very nice!
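As I understood it, the relighting boils down to assuming each output pixel is a linear function of a low-resolution image of the incident light, and solving for that linear map by least squares. Here's a minimal sketch of that idea with made-up sizes and random stand-in data (definitely not the authors' code, which also had the block-coherency prior on the matrix):

```python
# Minimal sketch of the least-squares relighting idea (my reconstruction,
# not the authors' code). Assume each observed image is (approximately) a
# linear function of a low-res image of the incident lighting.
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_light_px, n_scene_px = 200, 64, 1000     # hypothetical sizes
L = rng.random((n_samples, n_light_px))                # lighting images (rows)
T_true = rng.random((n_light_px, n_scene_px))          # unknown transport matrix
I = L @ T_true + 0.01 * rng.standard_normal((n_samples, n_scene_px))  # observations

# Solve I ~= L @ T for T in the least-squares sense.
T_est, *_ = np.linalg.lstsq(L, I, rcond=None)

# Relight the scene under a brand-new lighting image.
new_light = rng.random(n_light_px)
relit_scene = new_light @ T_est                        # predicted pixel values
print(relit_scene.shape)                               # (1000,)
```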
Once you've got your point-cloud, to do anything interesting with it you need to generate a representation of your model. This was a quite mind-blowing intro to how to do it, including a light-speed tour of several different curve-fitting techniques in 30 minutes flat. (I just have the word “hardcore” written in capital letters on my notepad here). It is, however, a really good introduction to curve fitting if you can take the time to look at the slides. There is also a nice iterative way of taking any point and mapping it on to the surface you have defined (see the slides starting with the ones labelled “projection” and “surface definition”).
In short, it turns out they are using piecewise curve approximations within a recursive space-partitioning hierarchy, with implicit surfaces describing whether a point lies on the surface. They can calculate the surfaces more or less on the fly now (this is used in the PointShop3d software described later). Even with some fairly noisy data the results looked quite good, although they may just have been choosing their models carefully.
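The projection operator is the bit that stuck with me, so here's a toy version of it: fit a plane to the Gaussian-weighted neighbourhood of a query point, project onto the plane, repeat. This is my own simplification (the course's moving least-squares definition is richer), with a noisy sphere as stand-in data:

```python
# Toy "project a point onto the point-set surface" step: fit a plane to the
# Gaussian-weighted neighbourhood of the query point, project onto it, repeat.
import numpy as np

def project_to_surface(q, points, h=0.1, iterations=5):
    for _ in range(iterations):
        d = np.linalg.norm(points - q, axis=1)
        w = np.exp(-(d / h) ** 2)                      # Gaussian weights
        centroid = (w[:, None] * points).sum(0) / w.sum()
        centred = points - centroid
        cov = (w[:, None] * centred).T @ centred       # weighted covariance
        normal = np.linalg.eigh(cov)[1][:, 0]          # smallest eigenvector ~ local normal
        q = q - np.dot(q - centroid, normal) * normal  # project onto the local plane
    return q

# noisy samples of the unit sphere, plus a query point off the surface
rng = np.random.default_rng(1)
pts = rng.standard_normal((2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
pts += 0.02 * rng.standard_normal(pts.shape)
print(project_to_surface(np.array([1.2, 0.1, 0.0]), pts, h=0.3))
```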
Interesting, but badly-explained, talk on how to produce a point-based image that didn't look like a nasty voxel model.
What seems to happen is that the point cloud has to be re-sampled in real time (there's a rough sketch of the filtering step after this list), by:
converting it into image space (still in 3d though);
then making an image-space continuous surface over a section, a bit like triangle rasterization. This is done with techniques inspired by signal processing (but not just FFTs, because the points are not distributed uniformly), called “local weighted average filtering”;
filtering this 3d surface to look right on screen;
then regenerating a new point cloud for render.
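To make the “local weighted average filtering” step a bit more concrete, here's a toy of the basic idea: irregular samples averaged onto a grid with Gaussian weights. This is my own Python doodle of the concept, not the EWA maths from the talk, and all the sizes are made up.

```python
# Very loose sketch of "local weighted average filtering": resample irregular
# screen-space points onto a regular grid using Gaussian weights.
import numpy as np

def resample(xy, colour, grid_w, grid_h, radius=1.5):
    accum = np.zeros((grid_h, grid_w, 3))
    weight = np.zeros((grid_h, grid_w))
    for (x, y), c in zip(xy, colour):
        x0, x1 = int(max(0, x - radius)), int(min(grid_w - 1, x + radius))
        y0, y1 = int(max(0, y - radius)), int(min(grid_h - 1, y + radius))
        for py in range(y0, y1 + 1):
            for px in range(x0, x1 + 1):
                w = np.exp(-((px - x) ** 2 + (py - y) ** 2) / (radius ** 2))
                accum[py, px] += w * c
                weight[py, px] += w
    valid = weight > 1e-6
    accum[valid] /= weight[valid][:, None]             # normalise by total weight
    return accum

rng = np.random.default_rng(2)
pts = rng.random((500, 2)) * [64, 48]                  # random screen positions
cols = rng.random((500, 3))                            # per-point colours
print(resample(pts, cols, 64, 48).shape)               # (48, 64, 3)
```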
The filtering technique is similar to Heckbert's texture mip-mapping system, but extended to pre-filter the 3D points, so the blurring on the points depended on screen space (I think – the talk wasn't too clear on this). A comparison against pure mip-mapping actually looked better on the classic checkerboard models, as at mid-range its blurring was only done horizontally, which reduced some of the smearing.
The actual rendering was done by “splatting” lots of 2D ellipses onto the screen, then doing a final pass to normalise the accumulated splats (“splat” is actually the technical term). Unfortunately, it requires three passes on current hardware (an extra one, using some Z-buffer trickery, to prevent too many splats accumulating on the surface), and ironically it has to draw the splats as triangles. So performance isn't currently great.
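And here's the splat-accumulate-normalise bit in the same toy style. The epsilon depth test is my crude stand-in for the Z-buffer trickery they described (the real thing uses a separate depth-only pass), and circular Gaussian splats stand in for the proper screen-space ellipses.

```python
# Rough sketch of splat accumulation plus the final normalisation pass.
import numpy as np

W, H, EPS = 64, 48, 0.05
colour_acc = np.zeros((H, W, 3))
weight_acc = np.zeros((H, W))
depth = np.full((H, W), np.inf)

def splat(x, y, z, colour, radius=2.0):
    x0, x1 = int(max(0, x - radius)), int(min(W - 1, x + radius))
    y0, y1 = int(max(0, y - radius)), int(min(H - 1, y + radius))
    for py in range(y0, y1 + 1):
        for px in range(x0, x1 + 1):
            if z > depth[py, px] + EPS:        # splat well behind the nearest surface
                continue
            depth[py, px] = min(depth[py, px], z)
            w = np.exp(-((px - x) ** 2 + (py - y) ** 2) / radius ** 2)
            colour_acc[py, px] += w * np.asarray(colour)
            weight_acc[py, px] += w

rng = np.random.default_rng(3)
for sx, sy, sz in rng.random((1000, 3)) * [W, H, 10]:
    splat(sx, sy, sz, rng.random(3))

visible = weight_acc > 1e-6
image = np.zeros_like(colour_acc)
image[visible] = colour_acc[visible] / weight_acc[visible][:, None]   # normalise pass
print(image.shape)
```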
Disappointing talk on how to render old-style voxel-like models on current PC hardware. Although LOD with PBR is nice and easy (just render fewer points), trying to avoid holes is tricky. This section showed a way of converting a hierarchical way of rendering objects without holes (recursion – not good for speed) into a flat-sorted one, with a GPU pixel shader to do some very simple Z-culling. Which seemed pretty obvious after about 10 minutes' thought, even for someone brain-damaged such as myself. The results didn't look very good either.
These three talks joined together to form a more cohesive unit. The first detailed how point clouds were split up into patches and then run through FFTs and suchlike to do smoothing; as it turned out, the actual FFTs cost far less than the patch-splitting, which took ages. The second pulled in techniques from the “surface representation” talk earlier to demonstrate how you could actually do some modelling; ironically, of course, doing this means you have to create the implicit structure that people were originally trying to avoid by using point-based graphics! The third had some quite nice demos of a point-based 3D modelling package (http://graphics.ethz.ch/pointshop3d/ - free to download), which did manipulation and point-based CSG operations in (near) real time.
The PBR crowd reminded me a bit of the train-spotters of the SIGGRAPH crowd: hanging on to their unfashionable, but somehow quite nice, theories while everyone else went off to play with the whizzy super-fast triangles. I'd quite like it to come back into fashion, somehow.
A breezy morning filled with an all-round introduction to lots of collision detection methods. Interestingly it didn't deal with collision resolution, which is very tightly bound up with detection (I guess they didn't have time).
Note: the talks actually given didn't always match the conference materials.
A gentle introduction to the whole topic, along with some of the issues and problems you have to solve (time aliasing, inter-penetration depths, discrete vs. continuous detection etc).
There was also a nice set of “general strategies in algorithm design” that acted as pointers to most of the later talks (e.g. cache your results, exploit temporal and spatial coherency).
This was mainly for use in areas like cloth modelling - the presenter was a “cloth expert”. He pointed out that subdividing the collision hierarchy in terms of the polys of the cloth, rather than the space the cloth occupied, was more efficient, as it took better advantage of coherence (this is true for most continuous surfaces, e.g. game navigation meshes). Plus:
a quick introduction to space partitioning techniques (including DOPs)
how to build a “good” hierarchy of areas using heuristics;
the pain of detection of self-intersection; the algorithmic solution is quite elegant
a compact way of storing and calculating whether sections of the meshes are adjacent. This exploited the hierarchical nature of the data structures to allow you to only store each vertex once in the structure and quickly get a result in O(n) time.
The resultant algorithms are very.... slooow. But look good.
(This mainly seemed to actually be about convex polyhedra, and the conference notes reflect this. Cedrick tells me that the “non-convex” part was just handled by having a hierarchy of convex objects – I missed this)
This appeared to be an explanation of bits of I-COLLIDE and other early systems by this team. Rushed talk by Ming Lin about handling a system with quite a few bodies flying around. In essence, they keep a queue of probable future collisions between object pairs to reduce the number of tests to do. Between pairs they use Voronoi diagrams to keep track of the 2 closest features, and use separating axis tests on bounding boxes. There was also mention of a quick lookup table on objects to guess where the closest features between objects would be when starting up. That was about it. From my limited understanding of collision systems, this basic system is quite old.
http://www.google.com/search?q=proximity+queries+%22convex+polyhedra%22
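If it helps, here's roughly how I picture the “queue of probable future collisions” idea, in toy Python: a lower bound on time-to-collision from the current separation and a cap on relative speed, plus a simple axis-aligned box overlap test when a pair comes due. The numbers and the speed cap are invented; the real system tracks closest features with Voronoi regions rather than bounding spheres.

```python
import heapq, math

def aabb_overlap(min_a, max_a, min_b, max_b):
    # boxes are separated if any single axis separates them
    return all(max_a[i] >= min_b[i] and max_b[i] >= min_a[i] for i in range(3))

def earliest_collision_time(centre_a, centre_b, radius_a, radius_b, max_rel_speed):
    gap = math.dist(centre_a, centre_b) - radius_a - radius_b
    return max(0.0, gap) / max_rel_speed   # the pair cannot collide before this time

# hypothetical bodies: (centre, bounding radius, box min, box max)
bodies = [
    ((0, 0, 0), 1.0, (-1, -1, -1), (1, 1, 1)),
    ((5, 0, 0), 1.0, (4, -1, -1), (6, 1, 1)),
    ((1.5, 0, 0), 1.0, (0.5, -1, -1), (2.5, 1, 1)),
]
MAX_REL_SPEED = 2.0

queue = []
for i in range(len(bodies)):
    for j in range(i + 1, len(bodies)):
        t = earliest_collision_time(bodies[i][0], bodies[j][0],
                                    bodies[i][1], bodies[j][1], MAX_REL_SPEED)
        heapq.heappush(queue, (t, i, j))

now = 0.25
while queue and queue[0][0] <= now:            # only test pairs that are "due"
    _, i, j = heapq.heappop(queue)
    if aabb_overlap(bodies[i][2], bodies[i][3], bodies[j][2], bodies[j][3]):
        print("narrow-phase test needed for pair", i, j)
```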
Christer Ericson of SCEA's slow but very useful explanation of how GJK works underneath. I've actually never seen it explained like this before. He also showed how the algorithm can exploit coherence by remembering closest points on the collision test objects to do subsequent tests (very similar to the feature tracking in the previous talk). The slides are on the conference DVDs and are nice and easy to follow.
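For what it's worth, the two bits of GJK that made his explanation click for me were the support mapping and the warm start, so here they are in sketch form (this is nothing like a full GJK implementation, and the caching scheme below is my own simplification of the coherence idea he described):

```python
import numpy as np

def support(vertices, direction):
    """Furthest vertex of a convex point set in the given direction."""
    return vertices[np.argmax(vertices @ direction)]

def minkowski_support(verts_a, verts_b, direction):
    # support of the Minkowski difference A - B, the set GJK actually searches
    return support(verts_a, direction) - support(verts_b, -direction)

cached_direction = {}    # (id_a, id_b) -> last good search direction

def initial_direction(id_a, id_b, centre_a, centre_b):
    # temporal coherence: reuse last frame's direction if we have one
    return cached_direction.get((id_a, id_b), centre_b - centre_a)

cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], float)
d = initial_direction(0, 1, np.zeros(3), np.array([3.0, 0, 0]))
print(minkowski_support(cube, cube + [3, 0, 0], d))
cached_direction[(0, 1)] = d    # would normally store the converged direction
```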
There was actually a bit of controversy in the questions afterwards, as a V-Clip fan-boy had a bit of a go at GJK's robustness.
Exceedingly quick introduction to how to detect continuous collision of triangles, using some genuine “napkin diagrams”. So quick, in fact, that I missed the point of it and had to have Cedrick explain it to me afterwards. Then it was actually quite simple, and led on quite easily from section 1.
A couple of the guys were keen to see this, as it did collisions with full meshes at (what appeared to be) reasonable speed. The main point of interest seemed to be that he was using “interval arithmetic” to calculate position bounds for objects as they proceeded along time-steps, and using this recursively to filter out collision times. That is, he modelled the object flying through space using an approximation function and calculated the bounds of points under this approximation over ever-reducing time steps (there's a toy sketch of the idea after the list below).
Then it looked like he actually fell back on normal triangle-triangle collision after this (i.e. slow). He seemed a bit vague as to how the whole system fitted together too, preferring to concentrate on certain aspects, but in essence it looked like he had a three-stage system:
Bounding boxes using separating axis tests
Interval arithmetic to cull swept objects
Full mesh collision
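As promised, a toy of the interval-arithmetic culling idea (1D, constant velocity, made-up radii). The real thing bounds much richer motion approximations, but the shape of it is the same: if the position bounds of two objects can't touch over a time interval, throw the whole interval away; otherwise split it and recurse.

```python
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def scale(self, k):
        a, b = self.lo * k, self.hi * k
        return Interval(min(a, b), max(a, b))
    def overlaps(self, other, pad=0.0):
        return self.lo <= other.hi + pad and other.lo <= self.hi + pad

def position_bound(p0, v, t):          # bound of x(t) = p0 + v * t over interval t
    return Interval(p0, p0) + t.scale(v)

def collision_times(a, b, t, radius, depth=0, out=None):
    out = [] if out is None else out
    if not position_bound(*a, t).overlaps(position_bound(*b, t), pad=2 * radius):
        return out                      # bounds can't touch: cull the whole interval
    if depth >= 8:                      # narrow enough: hand over to an exact test
        out.append((t.lo, t.hi))
        return out
    mid = 0.5 * (t.lo + t.hi)
    collision_times(a, b, Interval(t.lo, mid), radius, depth + 1, out)
    collision_times(a, b, Interval(mid, t.hi), radius, depth + 1, out)
    return out

# 1D example: two spheres of radius 0.5 flying towards each other
candidates = collision_times((0.0, 1.0), (10.0, -1.0), Interval(0.0, 10.0), 0.5)
print(len(candidates), "candidate sub-intervals, from", candidates[0], "to", candidates[-1])
```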
... or “Adaptively-Sampled Distance Fields”. A rough precalculation of the distance (and optionally the surface normal) to the object, stored around it in a sampled, hierarchical grid structure. Good for things such as hair collision. They can also compress down quite well using recursion. Probably still a bit wasteful on memory for console use though (unless you use lots of similar-shaped objects).
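A 2D toy of the idea, since it's easier to see in code than to describe: sample the distance to a shape at cell corners and only subdivide where interpolating those samples would be too inaccurate. The circle, tolerance and depth limit are all made up.

```python
import math

def dist_to_circle(x, y, cx=0.0, cy=0.0, r=1.0):
    return math.hypot(x - cx, y - cy) - r

def build_adf(x0, y0, size, max_depth=6, tol=0.02, depth=0):
    corners = [dist_to_circle(x0 + dx, y0 + dy)
               for dx in (0, size) for dy in (0, size)]
    centre = dist_to_circle(x0 + size / 2, y0 + size / 2)
    interp = sum(corners) / 4.0        # interpolated estimate at the cell centre
    if depth >= max_depth or abs(interp - centre) < tol:
        return {"x": x0, "y": y0, "size": size, "corners": corners}
    half = size / 2
    return {"children": [build_adf(x0 + dx, y0 + dy, half, max_depth, tol, depth + 1)
                         for dx in (0, half) for dy in (0, half)]}

def count_cells(node):
    if "children" in node:
        return sum(count_cells(c) for c in node["children"])
    return 1

tree = build_adf(-2.0, -2.0, 4.0)
print("leaf cells:", count_cells(tree))   # far fewer than a uniform 64x64 grid
```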
A rushed Ming Lin again, with two approaches to offloading collision work onto a GPU. (Using the GPU seemed to be one of this year's fashionable topics.) One approach used the GPU to render distance fields between objects, but this was slow as you had to get the results back off the GPU for the normal processor to do the rest of the work (i.e. collision resolution), and then often had to do similar tests again back on the GPU.
The other, and apparently better, approach used the GPU to create a potentially-colliding set of objects, and could cull a load of primitives very quickly before the main processor got to work doing all the rest. This meant that the pipeline wasn't as blocked (I think; things were going past very quickly by this point, as the talk was way over its allotted time).
If we are really going to be using lots of CELL processors in future, it might be worth bearing some of this in mind (particularly the approaches that didn't work...)
Quite a scatter-gun approach to collision. Although the morning did cover lots of topics, it didn't do so with much coherency, with lots of bits seeming to be out-of-order or skipped. As a result, it wasn't in-depth enough for an experienced user, or clear enough for a novice like myself.
The course notes are quite exhaustive though and well worth looking at (although they too don't match the actual course too well).
Bayesian learning looks to be another fashionable topic in graphics at the moment, with people using it in all areas from texture generation to animation control. This was a whirlwind tour of Bayesian reasoning, from simple coin-tossing examples to face-shape modelling.
The underlying idea was to use Bayesian theory to best match an approximation to a vector of data (e.g. coin-toss results, a vector of joint positions), so that, given new data, it could predict results (where would the other joints be if I set these N joints to given positions? Will it come up tails next time?). Some impressive advantages were claimed for it, in that it required less tuning than other similar learning/approximation methods.
However, the presenter skipped through quite a lot again, and didn't really set out much of how this reasoning tied in with actual research. So much so that I've had to spend quite a long time reading the conference materials to work out much of the underlying reasoning (even the simple bits), and even that has big gaps in it.
A very popular course, with barely an empty seat in the house, and frustratingly one of the hardest to follow. Again it was a case of trying to pack too much content into too tight a schedule.
Probably the best-presented talk I saw this year, clear and precise.
A method for trying to estimate the skeleton of a model purely from the raw motion capture data. First it looked for markers whose mutual distances didn't vary much, using a method tolerant of the usual motion capture “noise” (RANSAC), and assigned them to the same bone of the skeleton. Then it did a similar thing to fit all the markers to a joint. It also managed to construct a decent skeleton topology by calculating which structure would have the lowest total “cost”, while biasing away from joining joints far from the skeleton centre (this seemed like a bit of black magic to get better results).
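My (much simplified) mental model of the marker-grouping step, in Python: markers whose mutual distances stay nearly constant across frames probably live on the same bone. The RANSAC-ish loop below just picks random seed markers and keeps the biggest consensus set; the paper's actual algorithm is more sophisticated, and the mocap data here is faked.

```python
import numpy as np

def group_markers(frames, trials=50, tol=0.02, rng=np.random.default_rng(0)):
    """frames: array (n_frames, n_markers, 3) of mocap marker positions."""
    n_markers = frames.shape[1]
    unassigned, bones = set(range(n_markers)), []
    while unassigned:
        best = set()
        for _ in range(trials):
            seed = rng.choice(sorted(unassigned))          # random seed marker
            d = np.linalg.norm(frames - frames[:, [seed]], axis=2)
            rigid = {m for m in unassigned if d[:, m].std() < tol}
            if len(rigid) > len(best):                      # keep the biggest consensus set
                best = rigid
        bones.append(sorted(best))
        unassigned -= best
    return bones

# fake data: two rigid clusters of markers, each moving with its own offset
rng = np.random.default_rng(4)
base_a, base_b = rng.random((4, 3)), rng.random((3, 3)) + 5
frames = np.stack([np.vstack([base_a + t, base_b - t]) for t in np.linspace(0, 1, 30)])
frames += 0.003 * rng.standard_normal(frames.shape)
print(group_markers(frames))     # roughly [[0, 1, 2, 3], [4, 5, 6]]
```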
Obviously for games this is of lesser use, because we generally have a fixed skeleton and have to fit the animation to that. But it could be quite a good time-saver for ad-hoc models such as cutscene characters.
Quite a fascinating presentation where the researchers were trying to take the “style” from one set of animations and apply it to a different action. As an example, they took an animation cycle and decomposed certain components of the animation using Independent Component Analysis (ICA). The software would then show you several different aspects of the animation (one window showing the feet moving laterally, another showing the overall posture), applied to a skeleton as the animations played.
Then the user could select one or more of these elements and merge/replace it with a matching element from a different animation. This would produce a new animation with a new “style” applied (e.g. apply “sneakiness” to a walk). It would also let you interpolate between the two style poses to progress from a sneak into a walk, or vice versa.
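Here's roughly the decompose-and-swap flavour of it, using off-the-shelf FastICA on some fake joint-angle curves. The clip data, channel counts and the choice of which component counts as “style” are all invented; in the real system the user makes that choice by eye, and the pipeline is more involved.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(5)
t = np.linspace(0, 4 * np.pi, 400)

def fake_clip(phase, bounce):          # hypothetical 6-channel joint-angle clip
    base = np.stack([np.sin(t + phase * i) for i in range(6)], axis=1)
    base[:, 2] += bounce * np.sin(3 * t)        # a "style-ish" extra wiggle
    return base + 0.01 * rng.standard_normal((len(t), 6))

walk, sneak = fake_clip(0.5, 0.1), fake_clip(0.5, 1.5)

ica = FastICA(n_components=4, random_state=0)
walk_src = ica.fit_transform(walk)              # independent components of the walk
sneak_src = ica.transform(sneak)                # same basis applied to the sneak

blended_src = walk_src.copy()
blended_src[:, 0] = sneak_src[:, 0]             # user picks the "style" component
blended = blended_src @ ica.mixing_.T + ica.mean_   # back to joint angles
print(blended.shape)                            # (400, 6)
```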
Some of the results were a bit stiff, but it might be a way of saving time and producing a rough animation preview for an artist to clean up. Also, user intervention is needed to decide which of the differences really is the “style” component to apply, so the process isn't really automatic.
This presenter had decided to merge about four other areas of research into one. His aim was to transfer animations and rendering from one human to another using just camera capture (i.e. without mocap markers). What this boiled down to was:
Model capture using techniques from image-based rendering (i.e. taking lots of shots of your humans and trying to build models from them);
Joint capture – the humans wiggled each joint in turn while the camera was on them, then it attempted to match points onto each joint;
Markerless motion capture – using this model, it then tried to match its internal image of where it thought joints would be to the actual image it had captured. Then it tried to fit the two together as an error minimization problem;
Image-based rendering – once it had the data, it would attempt to apply the mo-cap onto another human it had captured.
The results were rough, but surprisingly OK in a woolly-voxel-kind-of-way, given the cheap nature of his cameras and the inherent noise in the problems (most of these areas had already been well-researched by others though, as far as I can tell). It also wasn't clear how much of the data had been manually cleaned up.
(aka the “Japanese leg-stretching exercise”).
This was attempting to apply some of the classic rules of anime animation to mo-capped data. In essence this boiled down to: (a) holding poses before actions to emphasize the change of speed; (b) accelerating motion during actions; (c) stretching some joints during actions to exaggerate the force of the action. It wasn't clear if the authors were doing any of this automatically (in the sketch paper it sounds as if the user selects the timeframes, but the exaggeration is done by code); they freely admitted they didn't have much time before presenting the sketch.
A selection of quite off-the-wall ideas. I mainly went to these for some blue-sky thinking about future technologies/peripherals and interface design (something we don't seem to concentrate on that much). It actually turned out to be one of the best sessions I went to, as it opened up quite a lot of ideas.
This team will be forever remembered for having sung the name of their invention to the tune of “YMCA” at the Fast Forward presentations earlier in the week. RFIDs are best known for being the privacy-invading way of stopping people shoplifting razorblades, but here they were being used with a light sensor attached to them.
The basic idea was that using the light sensors and a torch, you could recover the positions of tags attached to, say, boxes in warehouses. Then you could have an interactive 3D display, shone onto the boxes by a hand-held projector, showing boxes that were out of date, computer components that needed replacing, etc. in an “augmented reality” display. It also let you do “drag and drop”-style interaction using the projector, which was quite nice. It felt like it should really have been in the Emerging Technologies hall though.
Interface designer trying to solve the problem of novice users getting lost in directory listings where all the file icons or file names look the same. To get round the problem he was trying to automatically generate unique “doodle” icons, using an L-system seeded from the file name itself. The idea was that people remember unique pictures better than similar pictures or words. He used the analogy of people losing their cars in car parks because cars all look the same.
Although some testing seemed to show some improvement, the author admitted to failure (a first?).
The actual icons generated were quite pretty though. Computer games have used L-systems and similar grammars to generate random names (Populous, for example). I wonder if you could use something like this for browsing, or for spotting users in online games.
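For fun, here's the sort of thing I imagine the doodle generator doing: hash the file name into L-system parameters, expand the grammar, then turtle-walk the result into line segments. Entirely my guess at the scheme, not the author's code, and the file names are made up.

```python
import cmath, hashlib

def doodle(filename, iterations=4):
    h = hashlib.md5(filename.encode()).digest()
    angle = 20 + (h[0] % 70)                              # turn angle in degrees, from the hash
    rule = "F[" + ("+F" if h[1] % 2 else "-F") + "]" + ("F+F" if h[2] % 2 else "FF-F")
    s = "F"
    for _ in range(iterations):                           # expand the L-system
        s = s.replace("F", rule)
    pos, heading, stack, segments = 0 + 0j, 1j, [], []
    turn = cmath.exp(1j * cmath.pi * angle / 180)
    for ch in s:                                          # turtle interpretation
        if ch == "F":
            segments.append((pos, pos + heading))
            pos += heading
        elif ch == "+":
            heading *= turn
        elif ch == "-":
            heading /= turn
        elif ch == "[":
            stack.append((pos, heading))
        elif ch == "]":
            pos, heading = stack.pop()
    return segments

print(len(doodle("quarterly_report_final_v2.doc")))
print(len(doodle("quarterly_report_final_v3.doc")))       # a different doodle
```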
Possibly my personal highlight of the conference. This was using a really simple interface to let users create models and animation using a light pen or equivalent. The user would draw a cartoon-like shape in 2D and the program would generate a 3D skeleton, and allow you to draw detail on your cartoon.
Then, using a gestural interface, you could easily control your character to walk, jump, stomp, shuffle, run, do somersaults, even moonwalk. The results were great, and the sight of a 3-year-old mesmerized by it makes me think it's a really good thing to look at for EyeToy (although the limited EyeToy resolution could be a big problem) for a “build your own adventure” application. Or even just using the gestural interface to control a game character would be neat.
A funky front-end to MatLab which allowed you to draw your diagrams and formulas, notebook-style, on screen, and link them together to control the action. As well as having a handwriting system built in, it was smart enough to be able to link lines and numbers (to denote distances) and so on. While the usefulness was limited by the difficulty of expressing problems in 2d, it still allowed you to model such things as springs and connecting objects. And it looked great, particularly with a “lined-paper” background behind it!
Panel/discussion session. Here “consumer electronics” was generally taken to mean set-top boxes and televisions, and most of the panellists ignored game consoles (at their peril?).
The panellists were:
CEO of EchoStar, who make set-top boxes;
a senior programmer of Starz Encore, making video-on-demand software for set-top boxes;
Evan Hirsch, an ex-EA designer (and now unemployed?);
A Sony employee involved in a lot of the upcoming home networking technology allowing inter-operability between different vendors.
The panel was a bit of a mess with no coherent thoughts. What did become clear was:
Most manufacturers have real trouble designing simple enough interfaces as it is – the usual tales of people not plugging equipment in, calling support with unrelated problems. They're not too bothered about whizzier technology (many set-top boxes have never been replaced from their first iterations, because it's not been needed);
No-one has really thought about interfaces for the next generation of home networks and television inter-operability, even though the industry has invested billions in getting the technology up and running. This is quite scary;
There could be a big gap in the market for consoles to provide user interfaces for inter-operability (my personal conclusion from all this): they are updated frequently and can take advantages of new technology; they generally have good user interfaces; and people are now very comfortable in using them.
Paper couched in buzzwords such as “thought capture”, but really a way of splicing together animations and speech samples for in-game characters to provide variation and matching hand-gestures. The example given was a game NPC from SSX 3, giving you feedback after messing up or completing a race. (EA donated the model and I think they were interested in the technology).
The ideas underneath seemed interesting, including a nice way of inserting words and building basic sentences (e.g. “you <still> need to avoid the nasty spikes” if you hit the spikes several times), but they were hampered by some dodgy sound and acting, which meant that it wasn't initially clear what was going on underneath. Also, the general lack of intonation served to confuse the actual message being given.
One other interesting note was that because it was a computer game character, I wasn't expecting her to be dynamically generating sentences – I was just used to seeing pre-baked setups. As a result, the output generally didn't feel as slick as normal in-game stuff, so it felt disappointing.
People have been trying to generate movement for years by moving joints from one set of constrained positions to another, posed as some kind of constraints problem (e.g. use as little muscle energy or torque as possible to get from pose A to pose B). Unfortunately, computing this kind of thing takes ages because the problem has so many dimensions, typically a multiple of the number of joints of the character.
This paper tried to simplify the problem by looking at similar animations, e.g. similar walks or jumps, and then trying to determine how far you can reduce the dimensionality of the problem without losing visual quality. The main thinking behind this appears to be that over the course of an animation several joints work in sync, so you can collapse these degrees of freedom down. The technique used was Principal Component Analysis, which, looking at the paper, appeared to be very similar to solving a least-squares error problem.
It then used these lower dimensions to solve the constraints problem and generate the animations (this took on the order of minutes to hours, depending on complexity).
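The reduction step itself is simple enough to sketch: run PCA over a pile of related pose vectors and keep only the first few components (the paper then solves its spacetime optimisation in that reduced space). The data below is faked to have a low-dimensional structure; none of the sizes come from the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
n_frames, n_dofs, n_hidden = 300, 50, 5

# hypothetical walk-like data: 50 joint angles driven by 5 underlying signals
hidden = np.stack([np.sin(np.linspace(0, 6, n_frames) * (i + 1)) for i in range(n_hidden)], axis=1)
poses = hidden @ rng.standard_normal((n_hidden, n_dofs)) + 0.01 * rng.standard_normal((n_frames, n_dofs))

mean = poses.mean(axis=0)
_, singular_values, vt = np.linalg.svd(poses - mean, full_matrices=False)
k = 5
basis = vt[:k]                                     # top-k principal directions

reduced = (poses - mean) @ basis.T                 # 300 x 5 instead of 300 x 50
reconstructed = reduced @ basis + mean
print("max reconstruction error:", np.abs(reconstructed - poses).max())
```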
The results varied: jumping, spinning and getting over stepping-stones looked very good, whereas a back-flip looked unnatural (the solver was “too perfect”). You could also try generating anims using unrelated mo-cap data, e.g. using captured walk movement to generate a jump, but that looked all wrong.
This was a very impressive paper and one of the authors won the “Significant New Researcher” award this year. It also links to the course on Bayesian techniques (see above). Last but not least, it also had the longest abbreviation I saw this year (the Scaled Gaussian Process Latent Variable Model, or SGPLVM).
The basic idea was to take a load of mo-cap data from a skeleton and perform Bayesian analysis on it. Using that it could, given some known positions of joints, calculate the probable positions of the other joints, by biasing the IK of the joints towards probable poses for that set of known values. It could also work out the “likelihood” of any given pose by comparing it with its own internal model (e.g. a pose with a leg thrown out at a funny angle would be an unlikely pose).
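To show the “pose prior” shape of the idea (and nothing more), here's a crude stand-in: learn a plain Gaussian over training poses and use the Mahalanobis distance to bias a solver away from unlikely ones. The actual paper uses a far richer latent-variable model (the SGPLVM), and all the data and weights here are invented, so treat this as a caricature.

```python
import numpy as np

rng = np.random.default_rng(7)

# fake training poses: 8 joint angles that mostly move together
latent = rng.standard_normal((500, 2))
train_poses = latent @ rng.standard_normal((2, 8)) + 0.05 * rng.standard_normal((500, 8))

mean = train_poses.mean(axis=0)
cov = np.cov(train_poses, rowvar=False) + 1e-6 * np.eye(8)
cov_inv = np.linalg.inv(cov)

def pose_unlikeliness(pose):
    """Mahalanobis distance: big means 'a leg thrown out at a funny angle'."""
    d = pose - mean
    return float(d @ cov_inv @ d)

def pick_pose(candidates, ik_error):
    # bias the solver: trade off constraint error against pose likelihood
    scores = [ik_error(p) + 0.1 * pose_unlikeliness(p) for p in candidates]
    return candidates[int(np.argmin(scores))]

candidates = [train_poses[0], train_poses[1], rng.standard_normal(8) * 5]
print(pick_pose(candidates, ik_error=lambda p: 0.0))   # avoids the outlier pose
```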
The results were very good. With the same constrained positions it generated convincing, but different, poses for characters with different skeletons. In addition, it helped in motion capture to fill in the gaps when markers are occluded.
It also seemed to not require very much training if the instructor on the Bayesian course was to be believed; you could just throw animation data at it and it would adapt to the data.
On a much simpler level, I wonder if in future we can use our motion capture data to provide initial limits and extra constraint/biasing limits on our real-time IK solvers?
This was trying to generate ways of animating characters moving boxes between pigeon-holes, or opening cupboards. It used a searching algorithm to plan a course for the objects being moved (e.g. plan the path of the box first, or of the doors), using a randomized direction algorithm and some collision detection to rule out impossible paths. Then it tried to use a weighted combination of known poses, plus a bit of IK, to generate a decent animation, plus another round of collision detection to check it hadn't dragged an arm through an obstruction.
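The path-search half is the easier bit to sketch; here's a loosely RRT-shaped toy that grows random collision-free steps towards a goal. The obstacles, step size and goal bias are all made up, and the pose-blending half is skipped entirely.

```python
import math, random

random.seed(0)
OBSTACLES = [((3, 0), (4, 6))]        # axis-aligned boxes: (min, max) corners
START, GOAL, STEP = (1.0, 1.0), (8.0, 5.0), 0.5

def collides(p):
    return any(lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]
               for lo, hi in OBSTACLES)

def plan(max_iters=5000):
    nodes, parents = [START], {0: None}
    for _ in range(max_iters):
        # occasionally aim straight at the goal, otherwise pick a random target
        target = GOAL if random.random() < 0.1 else (random.uniform(0, 10),
                                                     random.uniform(0, 10))
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        dx, dy = target[0] - nodes[i][0], target[1] - nodes[i][1]
        length = math.hypot(dx, dy) or 1.0
        new = (nodes[i][0] + STEP * dx / length, nodes[i][1] + STEP * dy / length)
        if collides(new):
            continue
        nodes.append(new)
        parents[len(nodes) - 1] = i
        if math.dist(new, GOAL) < STEP:       # reached the goal: walk back up the tree
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parents[k]
            return path[::-1]
    return None

print(len(plan() or []), "waypoints")
```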
Some of the results looked quite good; the main problem was that the object being moved often had a very unnatural path, due to the nature of the path solver. Also, generation of the animations was quiiite slooow (about 1–10 minutes per anim). The use of pose databases did, though, mean that you could again get different animations from the same action; so, for instance, tall people were posed differently from short ones.
Full-day course on animating bunches of characters without making it extremely obvious that they were actually quite dumb compared to computer game agents (biased view)
Presented by Daniel Thalmann of the EPFL lab in Switzerland, which focusses on lots of aspects of virtual humans. The most real-time-oriented session of the day, this was a simple introduction to character AI. Some interesting but simple points, e.g. people behave differently in a crowd than they do as individuals. It covered all the usual bases of perception, memory and cheap(ish) rendering. It included yet another explanation of PCA.
However, the talk was let down by the fairly miserable engine demonstration shown. Not only did it chug terribly, the behaviour was extremely unconvincing.
The ILM crowd system was interesting mainly because it was less of a system and more a set of plugins hacked into the Maya particle systems. It appeared that the first crowds they did had to be hacked in because no-one had realised they would be needed for Episode 2. A lot of their effort went into limiting rendering time, so they pre-baked a lot of the rendering of character parts into files (a not inconsiderable amount of data!) and piped them as-is into their RenderMan rendering stream. They also had to do a lot of LOD for large scenes, based on the on-screen character bounding boxes.
As for their behaviour and logic, it seemed to be a lot of specially-coded plugin scripts for each particular scene rather than anything generic. Jurassic Park 3 used boids-type motion for a top-down shot of dinosaurs being buzzed by a plane; Episode 2 used canned probabilistic movement for agents hopping along stairways in the big arena fight scene. Otherwise crowds wouldn't move, or would use very simple flow-based movement (i.e. their movement painted onto a floor).
Most of the animation was handled by very simple flow-networks of animations: an animation cycle just specified the animations that could follow it, with probability distributions. E.g. a walk could be split into slightly different sequential walk cycles to give variety, and wait animations were spliced together from deliberately bland bits of movement (they stressed that blandness was essential, otherwise the observer spots the repeated characteristics of notable animations).
For anyone interested, the conference materials have actual MEL-script included, so you can see what they did.
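The flow-network idea reduces to something like this (in Python rather than MEL, and with made-up clip names and probabilities): each cycle lists its possible successors with probabilities, and you just keep rolling the dice.

```python
import random

random.seed(1)
FLOW = {
    "walk_a": [("walk_a", 0.5), ("walk_b", 0.4), ("idle_1", 0.1)],
    "walk_b": [("walk_a", 0.6), ("walk_b", 0.3), ("idle_2", 0.1)],
    "idle_1": [("idle_2", 0.5), ("walk_a", 0.5)],
    "idle_2": [("idle_1", 0.5), ("walk_b", 0.5)],
}

def play(start, n_cycles):
    clip, sequence = start, [start]
    for _ in range(n_cycles):
        successors, weights = zip(*FLOW[clip])
        clip = random.choices(successors, weights=weights)[0]   # pick the next cycle
        sequence.append(clip)
    return sequence

print(" -> ".join(play("walk_a", 8)))
```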
Dreamworks' systems were, if anything, even simpler. Generally they had very limited movement which was just “move/turn to this interest point if I tell you to”, or simple keyframing of paths.
There was a similar use of cycled animations and LOD, plus simplified head looking where they just distorted points on the head part of the character mesh to point the face in the correct direction.
Any interesting movements, for example, birds eating corn and hopping about, were done by cleverly linking animation cycles to the keyframed paths they followed. The “flap wings” animation would be timed to coincide with a bird moving from point to point.
One interesting part was how they procedurally generated costumes for crowds. Although they tried to give the entities a “digital fashion sense”, it never really worked 100% and they had to use manual overrides a lot. They also tried to generate populations in “couples” (e.g. boy-girl, people with similar interest points) to make them look convincing.
This was a potted history of agent AI for the first half, followed by a big advertisement for his Massive crowd simulator for the second. The two didn't really have much in common.
Massive, the system used in the Lord of the Rings films, was a bit weird. It seemed to be a very elaborate system for building what, in essence, was very simple character behaviour (there didn't seem to be much statefulness, so the characters generally just reacted to impulses in a very limited manner). There were a couple of nice features, like a full dynamics system and per-character image-based vision (each character rendered its view of the scene to a low-res texture, but didn't seem to use the results very much).
The actual core of the logic simulator just seemed to be a fairly simple fuzzy-logic editor with simple outputs that tweaked variables such as speed, direction, animation and so on. It needed massive (sic) networks of logic to get any interesting behaviour, and even that wasn't very complicated.
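My guess at the general flavour of those fuzzy brains, boiled right down: fuzzify a couple of inputs, fire some hand-written rules, and blend them into output tweaks for speed and turning. None of this is from Massive itself; the membership functions and rules are invented.

```python
def triangle(x, lo, peak, hi):
    """Triangular membership: 0 outside [lo, hi], 1 at the peak."""
    if x <= lo or x >= hi:
        return 0.0
    return (x - lo) / (peak - lo) if x < peak else (hi - x) / (hi - peak)

def agent_brain(enemy_distance, enemy_bearing):
    near = triangle(enemy_distance, 0, 0, 10)
    far = triangle(enemy_distance, 5, 20, 40)
    left = triangle(enemy_bearing, -180, -90, 0)
    right = triangle(enemy_bearing, 0, 90, 180)

    # rule strength -> (speed tweak, turn tweak), blended by strength
    rules = [
        (near,             (+1.0,  0.0)),   # if enemy near, charge
        (far,              (-0.5,  0.0)),   # if enemy far, slow down
        (min(near, left),  ( 0.0, -1.0)),   # if near and to the left, turn left
        (min(near, right), ( 0.0, +1.0)),   # if near and to the right, turn right
    ]
    total = sum(strength for strength, _ in rules) or 1.0
    speed = sum(s * out[0] for s, out in rules) / total
    turn = sum(s * out[1] for s, out in rules) / total
    return speed, turn

print(agent_brain(enemy_distance=3.0, enemy_bearing=45.0))
```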
The later talk by one of the WETA team who actually used the software in LOTR was more revealing. They had essentially set up every shot individually, with completely different behaviour scripts each time, and spent huge amounts of time editing out characters going wrong when the producers saw the initial rushes. They also heavily relied on the shot lengths not being more than a few seconds to set up all the simple flow fields, attractors etc. So in essence the bots weren't doing anything remotely intelligent (in contrast to all the hype kicking around about the system), but they were easy to script and manipulate, and because there were 1000s of them kicking about they looked good. The demands for game crowd characters, though, are entirely different.
Crowds in movies don't do much at all. More or less “zombies”. Much simpler than modern game character logic.
They have the advantage of “cheating” - remove/tweak anything that doesn't look right to suit the shot. They do a lot of manual overriding to get things right, a luxury we don't have.
They often add fully-animated characters to distract the eye
Things they didn't do:
Any sort of navigation or path-finding
Any real spatial awareness: nearly all paths were hard-coded as target points or flow fields
Any sort of complicated interaction
Any real use of memory or knowledge in decisions
Anything beyond very limited reactions
Anything remotely real-time!
A rag-bag of talks about games and cartoons.
Not quite an extended advert for EA, but very nearly.
EA look to have a policy of hiring senior staff from the film animation industry. This presenter, Henry LaBounta, had previously been a visual effects supervisor at PDI/DreamWorks. Now he is the art director on SSX.
This was a look at how they polished the game from SSX Tricky to SSX 3, while still more-or-less using the same engine. So it was a run-through of better sky boxes, pre-baked lighting with Mental Ray, depth fogging, light blooms, but all done in a very professional way and switched on and off as you progressed down the mountain. There were also a few pointers as to how they had designed the art style, and their extensive use of concept art as almost “throwaway” material.
One note: they claimed to run only untextured levels until the gameplay was completely sorted. Which is interesting if true!
Slightly rambling overview of the joys of training/managing a huge art team to churn out huge amounts of data (out of 40 artists, only 2 had ever used normal maps before). This presenter claimed that it was better to use a team without previous game experience, as they hadn't yet picked up any bad habits.
There was a heavy emphasis on good tools, ideally with quick on-screen changes for feedback, plus cutting out even small delays in the pipeline: when they calculated it, those small delays accumulated into an enormous amount of wasted time overall.
To keep visual cohesion, they split the whole team into sub-teams, each responsible for a realm. They already had 5 former art directors on the team (!), none of whom wanted to do any management again. With small groups, they were happier to take on some kind of overseeing role.
There was also praise for the programmers because they discussed what technology the artists would want to use, rather than just what the technology could do. Which was a nice point.
Faced with having to generate so much output, the team responsible for “SD GUNDAM FORCE” stripped out several layers of their normal live action and cartoon pipeline (no animatics, layering, offline editing or post production stages any more). In addition, they took a much stronger control of outsourcing to try and get rid of bottlenecks in production. They also had a web-based version control for passing data back and forth between their outsourced studios.
On the tech side, they used motion capture a lot to save hand animation, and developed their own in-house shaders which gave near-real-time preview of animations and modelling.
(Apparently, “Monkey Turn” is a particular manoeuvre in Japanese powerboat racing circles)
This cartoon was a mixture of CG and anime, but the two were not mixed in the same shot. As a result they could work on both in parallel. The main feature of this talk was that the cartoon makers built something very similar to a simple “game” engine using 3ds Max plugins, and generated all boat movement and animation using a spline-based editor. In addition, they wrote their own particle systems to generate the water splashes automatically from the path of the boats and choppiness of the water.
As a result almost nothing was hand-animated, apart from the paths that the boats followed and any additional “polish” effects they wanted to add. Slap in a different background per episode, and they could chunter out this stuff at the rate of one 26-minute episode per week. Result!
I just have the word “ADVERTS” written in my notes for this session. For a start, the 2 EA presentations were about their current projects.
A near word-for-word recap of the previous SSX talk, by the same presenter. I was hanging on expecting some kind of deviation – surely he couldn't be so brazen as to do the same PR job twice? Yes he could.
An engaging demonstration of how they made this mind-boggling CG intro. This used the full range of techniques normally used by the film industry, including a massive mo-cap studio and dozens of professional martial arts stuntmen. The attention to detail and costs involved were extraordinary, and I wonder if they can justify the expense to do this again.
Another extended advert for EA, from a recent refugee from a very senior post at ILM. Basically showed off NFS and the last Bond game and said “you need good tools and the ability to preview stuff in real-time”.
The one everyone had been waiting for... and a bit of a disappointment. Part of the problem was that the presenter's (the concept designer's) delivery was extremely dry. The talk mainly consisted of detailing how he wanted the architecture of City 17 to look realistic and mix styles. There wasn't much new footage, and the examples he showed didn't really show off the nice rendering engine to its full advantage – there were a lot of static screenshots that, while quite realistic, looked rather bland. From the new footage shown, the animation was still a bit ropey: lots of foot-sliding, and rather stilted facial stuff. Ironically, the game, so long in production, was starting to look a bit old-hat!
Christian's panel played to a quite full house. While you got the impression that the panellists were less forthcoming than when they'd had a chat the day before, there were still quite a few ideas kicking around.
The two panellists from EA were looking at it from the perspective of generating masses of content, so focussed on things such as automatic content generation (e.g. landscapes, scanning of model data). One of them also thought that the days of “just hacking a solution” were gone, as with ever-bigger teams it doesn't work any more. [It then occurred to me that this is edging further towards the CGI film industry production model, with more fixed/structured pipelines than current game development, where things can still be radically reworked even at a late stage.] There was also a lot of concern about whether just “super-sizing” existing teams would work any more, or whether you needed a much more structured team model: EA already have different “levels” of art team, and also different sets of tools teams. It certainly seems that EA are working towards having central teams producing technology and others producing content, although they were obviously a bit cagey about revealing details.
Toby Saulnier of Vicarious Visions looked at the problem from the opposite end; she is at a smaller studio trying to keep up with the big boys. It seemed that they wouldn't even try to compete in terms of scale, but were still looking at re-using other people's resources, e.g. Tony Hawk's and Spider-Man assets from Activision, data directly from film studios when doing conversions, or using more outsourcing. She also made a good point about whether producing enormous amounts of content was actually worth it: that is, did it add anything to the game?
One other thing that was obvious in retrospect: similar to cross-platform development, where there is a top-resolution art resource which is scaled down for less powerful machines (e.g. a PC/Xbox original version and a cut-down PS2 version), or to LOD schemes, you could create art assets at extremely high resolution/detail, high enough for future console iterations, and just algorithmically chop the quality down a bit for PS3. That way you wouldn't need to build nearly as many model versions.