Thursday, May 7, 2009

A trick for PBS scripts on the USC high performance computing cluster (HPC)

I'm trying to run LDA for different numbers of topics. I wrote a
bunch of m-files with a different number of topics hard-coded in
each, but then I realized that was dumb. It might actually have saved
time this time, but in the long run I wanted to figure out how the
HPC PBS scripts work. Basically, it's possible to pass variables from
the shell into the PBS script (you can't use the standard ENV vars
b/c the job is forked to different machines). The idea I had was to
pass the qsub script the command I want to run (distribute) in the
variable CMD. This takes the responsibility out of the pbs script and
puts it back into a normal shell command, which I think is easier for
my purposes. All that's in the run.pbs script is:


#!/bin/bash

source /usr/usc/matlab/default/setup.sh #puts matlab on the path
cd /auto/rcf-proj3/sn/kazemzad/machineLearningTest #goes to the dir that I want


echo $CMD #assumes $CMD is passed using the -v switch
$CMD #runs $CMD


Here's an example of how to use it for a test matlab script:


qsub -v CMD="matlab -nosplash -nodesktop -r \"2+2,exit\"" run.pbs


Here's an example that puts it all together for a range of numbers of topics:


for x in 10 50 100 200 300 400 500 600 700 800 900 1000; do
    echo $x
    qsub -l walltime=23:59:59,nodes=1:ppn=2 -v CMD="matlab -nosplash -nodesktop -r \"numTopicsExperiment($x),exit\"" run.pbs
done


This just submits all the different experiments with $x as a parameter.
These jobs take 7+ hours to run, so I'm not actually sure yet that it
works. At least it hasn't barfed yet. If I don't post back, assume
it worked and try it yourself if it suits your needs.
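
In the meantime, something like this should work for keeping an eye on the jobs (a sketch; qstat comes with PBS/Torque, and by default PBS writes each job's stdout to <jobname>.o<jobid>, so run.pbs.o12345 and the like):


qstat -u $USER #lists my queued and running jobs
tail run.pbs.o* #peeks at the stdout of any jobs that have produced output so far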

Thursday, March 19, 2009

prtools svm on linux

The last few days I've been trying to get SVM working with PRTools. PRTools seems convenient, but I got an error that qld wasn't found and that it was falling back on the Matlab quadratic programming function instead. I found a forum where they have a download for the qld.c function. It was easy to compile with mex and it seems to be running much faster now. I don't know why they didn't include the .c file to begin with, but I figured it out anyway.... here's the link to the forum: http://prsysdesign.net/index.php/forums/viewthread/22/
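
For the record, the compile step is just the following (a sketch; the path is wherever you keep prtools, and I'm assuming qld.c was downloaded into that directory):


cd ~/matlab/prtools #adjust to wherever prtools lives on your machine
mex qld.c #builds the qld mex binary so prtools finds it instead of falling back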

Tuesday, March 3, 2009

Granule for Leitner cardfile methodology on linux

I started trying granule for keeping flashcards so I can remember stuff better. I have a lot of reading for my two classes and for my research. So far I like it, though I haven't tried other programs. Granule http://granule.sourceforge.net/ is not too fancy. Figuring out the user interface was a tiny bit non-intuitive, but not too hard either. One thing that I had some trouble with was figuring out how to name a particular deck of cards. It turns out you have to do a "save as" to actually rename the deck.

I looked a little for other linux flashcard programs but I couldn't find too many that looked robust (my criterion was for it to be in portage). I found some linux flashcards for reviewing linux commands: http://www.proprofs.com/flashcards/search.php?search=Linux

In the future, I'll post some of the flashcard files I make.

Friday, April 25, 2008

user modeling idea in jelinek, statistical methods for speech processing

There was a short idea for user modeling in Jelinek 2001 that I wanted to make a note of.

"A vocabulary is particular to a speaker and his task in at least four ways..."
1) habits of expression (related to things like level of education)
2) domain of discourse
3) current interests
4) current document/subject

Monday, April 21, 2008

Fuzzy logic progress

Before meeting w/ Prof. Mendel about my fuzzy logic project, I wanted to collect my thoughts here so that our meeting would be more productive.

First, I'll start by going over the methodology that I used. A web survey was used to collect data from a fair number of people (32) across 2 different surveys. The stimuli for the first survey were reported in the class final project [Prof. Mendel's class project]. It used 7 emotion words as stimuli (angry, disgusted, fearful, happy, neutral, sad, surprised) and also 3 modifiers (very, sort of, not). The second experiment [interspeech 2008 submission] used 40 words from mood labels of blogs (the site livejournal.com, to be precise). In the interspeech paper submission, I combined these two experiments to make a computing-with-words type application where the words from the first experiment (excluding the modifiers) were used as inputs and the outputs were the words from the second experiment. The results agreed pretty well intuitively and based on a small evaluation. However, using the averaged midpoints of the type-2 fuzzy sets to compute Euclidean distance gave about the same performance.


Some points that I want to ask him about are:

- are the 3-D approach (valence, activation, and dominance) and the combination methods (sum, product) valid?

- are my conclusions correct: is the fact that Euclidean distance is comparable to the fuzzy/Jaccard distance metric (sketched below) evidence that we don't gain much by moving to type-2 fuzzy logic?
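
For reference, the similarity behind that fuzzy/Jaccard metric is the following (a sketch for type-1 fuzzy sets on a discretized domain; the interval type-2 version sums the analogous min/max terms for both the upper and lower membership functions):

$$ s_J(A,B) = \frac{\sum_i \min(\mu_A(x_i), \mu_B(x_i))}{\sum_i \max(\mu_A(x_i), \mu_B(x_i))} $$

with the distance taken as 1 - s_J(A,B).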


Some other ideas:

- use less data, or at least be more person-specific. This would allow me to see more interpersonal differences, which goes well with my user modeling interests. Also, I think it's clear that the data was a bit noisy, especially in the dominance dimension.

========================================
user modeling for emotions in text idea
========================================

:author: Abe Kazemzadeh
:date: 2008-04-21

This weekend I had an idea about how to implement a user model of emotions in text. Using the liveJournal blog data [Mishne Dissertation], I can try to make a mixture language model, like described in [Stolcke Dissertation]. The user model will be the mixture priors (am I using the terminology correctly?). This will give me a good chance to move away from the tfidf approach into the language modeling framework, which I haven't explored enough.

TODO: make a simple example of the mixture language model, play around with srilm toolkit, apply to data [liveJournal, IEMOCAP]
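
Here's a minimal sketch of the mixture idea with the srilm tools (the file names are made up, and the 0.7 weight is a placeholder for what the user model would actually supply; -lambda weights the first model, the one given by -lm):


ngram-count -order 3 -text general.txt -lm general.lm #background LM from everyone's text
ngram-count -order 3 -text user.txt -lm user.lm #per-user LM
ngram -order 3 -lm user.lm -mix-lm general.lm -lambda 0.7 -ppl heldout.txt #perplexity of the interpolated model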

Wednesday, September 5, 2007

Fuzzy Logic, Tues 4 Sept. 2007

This class extended the previous class and brought up combining fuzzy statements by composition. These can be visualized by a relational matrix or a saggital diagram. These compositions can be in the same or different product spaces (if different, the cartesian product is ussed. Eg's... same product space: "x is far away from y or x is close to y"; different product space "x is close to y and y is near z". Also, the composition of two fuzzy relations can also apply when one of the fuzzy relations is just a fuzzy set. This special case is important in the representation of rules.