Player modeling in TORCS exploiting SVMs and GPUs parallelism
Contents
Part 1: Project profile
Project name
Player modeling in TORCS exploiting SVMs and GPUs parallelism
Project short description
This project aims to build a TORCS driving agent using SVMs (Support Vector Machines) as its main machine learning algorithm. The agent (also called a bot) must be able to race against other agents with comparable or better performance. To make SVM training feasible in reasonable time, the work initially focuses on using the CUDA architecture to rewrite and optimize a well-known SVM toolkit, LibSVM.
Dates
Start date: ~2008/03/01
End date: ~2009/03/01
Website(s)
People involved
Project head(s)
P. L. Lanzi - User:PierLucaLanzi
D. Loiacono - User:DanieleLoiacono
Other Politecnico di Milano people
None.
Students currently working on the project
Fabrizio Bonfanti - User:FabrizioBonfanti
Students who worked on the project in the past
None.
External personnel
None.
Laboratory work and risk analysis
This is a software project, and all required hardware is accessed remotely, so no laboratory work is expected. No risks are associated with this project.
Part 2: Project description
My CUDA implementation is currently under development and has already gone through several major revisions. What follows is a short summary of the most important milestones.
The first implementation
This implementation was the first functional one: it supported c-SVC and was tested on a good set of datasets.
Pros:
- Same results (nu, rho, sv, bsv, prediction %) compared to LibSVM
Cons:
- The full Q matrix (#data x #data) was computed before the actual training
- High memory cost
- High time cost
- Could only handle problems with fewer than ~7000 vectors
- Slower than LibSVM on nearly all the training datasets
The second implementation
Since results weren't as good as I expected, I decided to take another direction, effectively trashing the old work and starting from scratch.
Pros:
- New algorithm, focused on c-SVC, without any of LibSVM's optimizations
- Nearly the same results (nu, rho, sv, bsv, prediction %) as LibSVM
- Rows of Q are now calculated on demand
- Freed a lot of memory
- Can solve problems of nearly any cardinality (limited by GPU memory; the biggest problem solved correctly had ~60000 vectors)
- Reached a 3x speedup on a couple of datasets, but still slower than LibSVM on the vast majority
- Tested on more datasets
Cons:
- Some kind of Q-row caching is needed, but it is not easy to add to this implementation
- Training time depends heavily on the number of iterations, since rows are calculated on demand
- It was still not clear why the speedup is so variable
The third implementation
My assignment changed, and an implementation of epsilon-SVR was required too. Looking at the math behind the two earlier implementations, I saw that there weren't many conceptual differences, so I decided to go back to the first implementation and add the features that were working in the second one. The speedup is still quite variable, but this implementation seems to work better on big problems. I suspect that the main source of speedup is the computation of the rows of Q, so problems with higher cardinality gain a lot in this phase.
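The claim that c-SVC and epsilon-SVR share most of their math can be made concrete: in the LibSVM formulation both reduce to the same dual quadratic program, solved by the same SMO-style decomposition,

$$\min_{\alpha}\ \frac{1}{2}\alpha^{T}Q\alpha + p^{T}\alpha \quad \text{subject to} \quad y^{T}\alpha = \Delta,\qquad 0 \le \alpha_t \le C,\ t = 1,\dots,l,$$

where only the constant vector $p$, the labels $y$, and the construction of $Q$ differ: for c-SVC, $p = -e$ and $Q_{ij} = y_i y_j K(x_i, x_j)$, while for epsilon-SVR the problem doubles in size (variables $\alpha$ and $\alpha^{*}$) and $Q$ is built from blocks $\pm K$.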
Pros:
- epsilon-SVR support added to the c-SVC implementation
- Same results (nu, rho, sv, bsv, prediction %) as LibSVM
- Rows of Q are calculated on demand
- Freed a lot of memory
- Rows of Q are now cached
- Speed gain
- Can solve problems of nearly any cardinality (limited by GPU memory; the biggest problem solved correctly had ~60000 vectors)
- Reached a 4x speedup on a couple of datasets, but still slower than LibSVM on the smaller problems
- Tested on an even larger set of datasets
Cons:
- Some parts of the algorithm are nearly blocking
- Overlapping of data transfers and CUDA kernel execution is not exploited
- The algorithm itself is very serial
- Caching forces the lookup phase (CPU) and the row computation (GPU, when needed) to be serialized
- Training time depends heavily on the number of iterations, since rows are calculated on demand
Now
I'm now working again on the last implementation. I'm measuring times and performance on many datasets, and I'm also studying how the SVM's parameters affect the overall performance.
I'm also investigating whether training multiple SVMs in parallel is feasible. My guess is that it is possible, but given the nature of the algorithm, I don't think the performance gain would be very high.
Part 3: Useful links
TORCS - http://torcs.sourceforge.net/