
  Deniz Yuret's Homepage

I am an assistant professor in Computer Engineering at Koç University in
Istanbul, working at the Artificial Intelligence Laboratory. Previously I
was at the MIT AI Lab and later co-founded Inquira, Inc. My research is
in natural language processing and machine learning. This year I am
helping organize SemEval-2012, *SEM, CMN and LREC. For prospective
students, here are some research topics, papers, classes, blog posts and
past students.
	





    July 15, 2012


      ACL-EMNLP 2012 Highlights
      


We demonstrated 80% unsupervised part-of-speech induction accuracy in
our EMNLP-2012 paper

using paradigmatic representations of word context and co-occurrence
modeling. Here are some interesting talks I attended at ACL-EMNLP this year.

Check out this tutorial:

Inderjeet Mani; James Pustejovsky
Qualitative Modeling of Spatial Prepositions and Motion Expressions
http://aclweb.org/supplementals/P/P12/P12-4001.Presentation.pdf

And this paper from today on learning language from navigation
instructions and user behavior was interesting:

David Chen
Fast Online Lexicon Learning for Grounded Language Acquisition
http://aclweb.org/anthology-new/P/P12/P12-1045.pdf

All papers can be found on the ACL Anthology page:
http://aclweb.org/anthology-new/P/P12/

Here is work that builds on the CCM (constituent-context model) for
unsupervised parsing and works better on longer sentences:

P12-2004 [bib]: Dave Golland; John DeNero; Jakob Uszkoreit
A Feature-Rich Constituent Context Model for Grammar Induction
http://aclweb.org/anthology-new/P/P12/P12-2004.pdf

Here is another interesting talk about grounded language acquisition,
computer learning to follow instructions in a virtual world. They have
collected nice corpora of instructions and behaviors of people following
those instructions.

P12-2036 [bib]: Luciana Benotti; Martin Villalba; Tessa Lau; Julian Cerruti
Corpus-based Interpretation of Instructions in Virtual Environments
http://aclweb.org/anthology-new/P/P12/P12-2036.pdf


I nominate this as the best paper on computer humor generation at ACL
2012. Note the resources ConceptNet and SentiNet mentioned in this work,
which may be independently useful.

P12-2030 [bib]: Igor Labutov; Hod Lipson
Humor as Circuits in Semantic Networks
http://aclweb.org/anthology-new/P/P12/P12-2030.pdf

We have been working on reordering for SMT. Here is a paper that instead
modifies distortion matrices to allow more flexible reorderings.

P12-1050 [bib]: Arianna Bisazza; Marcello Federico
Modified Distortion Matrices for Phrase-Based Statistical Machine
Translation
http://aclweb.org/anthology-new/P12-1050.pdf

Interesting tree transformation operations.

Transforming Trees to Improve Syntactic Convergence
D. Burkett and D. Klein.
http://aclweb.org/anthology-new/D/D12/D12-1079.pdf

The following paper utilizes n-gram language models in unsupervised
dependency parsing:

D12-1028 [bib]: David Mareček; Zdeněk Žabokrtský
Exploiting Reducibility in Unsupervised Dependency Parsing
http://aclweb.org/anthology-new/D/D12/D12-1028.pdf

Another reordering paper, this one from EMNLP. It mentions a
string-to-tree version of Moses that is publicly available.

D12-1077 [bib]: Graham Neubig; Taro Watanabe; Shinsuke Mori
Inducing a Discriminative Parser to Optimize Machine Translation Reordering
http://aclweb.org/anthology-new/D/D12/D12-1077.pdf

A must-read paper from EMNLP. Our paradigmatic representation would very
likely do better here.

D12-1130 [bib]: Carina Silberer; Mirella Lapata
Grounded Models of Semantic Representation







    June 10, 2012


      KUTalks - Why are we trying to teach computers language?
      


Source: Koç University's YouTube channel.

Labels: Türkçe






    June 04, 2012


      Language visualization
      

This is a presentation on our ongoing language visualization project by
Emre Unal. We thank the Alice project at CMU for giving us the platform
for 3D visualization. This work is inspired by the work of Patrick
Winston's Genesis Group at MIT and Bob Coyne's WordsEye project. We are
also working on going from vision to language, as demonstrated in this
video.
Labels: Notes






    May 24, 2012


      Learning Syntactic Categories Using Paradigmatic Representations
      of Word Context
      


Mehmet Ali Yatbaz, Enis Sert, Deniz Yuret. EMNLP 2012. (Download the
paper, presentation, code, fastsubs paper, lm training data (250MB), wsj
substitute data (1GB), scode visualization demo (may take a few minutes
to load). More up-to-date versions of the code can be found at github.)



*Abstract:* We investigate paradigmatic representations of word context
in the domain of unsupervised syntactic category acquisition.
Paradigmatic representations of word context are based on potential
substitutes of a word in contrast to syntagmatic representations based
on properties of neighboring words. We compare a bigram based baseline
model with several paradigmatic models and demonstrate significant gains
in accuracy. Our best model based on Euclidean co-occurrence embedding
combines the paradigmatic context representation with morphological and
orthographic features and achieves 80% many-to-one accuracy on a 45-tag
1M word corpus.
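The substitute distribution behind a paradigmatic representation can be sketched with a toy bigram model (a simplified stand-in for the language model used in the paper; the corpus and numbers below are purely illustrative):

```python
from collections import Counter

# Toy corpus; the paper uses a much larger corpus and a higher-order LM.
corpus = "the cat sat on the mat the dog sat on the rug".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = sorted(unigrams)

def substitute_distribution(left, right):
    """P(w | left _ right) ∝ P(w | left) * P(right | w) under a bigram model."""
    scores = {}
    for w in vocab:
        p1 = bigrams[(left, w)] / unigrams[left]
        p2 = bigrams[(w, right)] / unigrams[w]
        scores[w] = p1 * p2
    z = sum(scores.values()) or 1.0
    return {w: s / z for w, s in scores.items() if s > 0}

# Paradigmatic representation of the context "the _ sat":
d = substitute_distribution("the", "sat")
print(d)  # {'cat': 0.5, 'dog': 0.5}
```

Words that can fill the same slot ("cat", "dog") end up with similar substitute distributions, which is what the syntactic category learner clusters.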
Labels: Publications






      FASTSUBS: An Efficient Admissible Algorithm for Finding the Most
      Likely Lexical Substitutes Using a Statistical Language Model
      


Deniz Yuret, 2012. (Download the paper, code, and data (1GB). Our
EMNLP-2012 paper uses FASTSUBS to get the best published result in
part-of-speech induction.)

*Abstract:* Lexical substitutes have found use in the context of word
sense disambiguation, unsupervised part-of-speech induction,
paraphrasing, machine translation, and text simplification. Using a
statistical language model to find the most likely substitutes in a
given context is a successful approach, but the cost of a naive
algorithm is proportional to the vocabulary size. This paper presents
the Fastsubs algorithm which can efficiently and correctly identify the
most likely lexical substitutes for a given context based on a
statistical language model without going through most of the vocabulary.
The efficiency of Fastsubs makes large scale experiments based on
lexical substitutes feasible. For example, it is possible to compute the
top 10 substitutes for each one of the 1,173,766 tokens in Penn Treebank
in about 6 hours on a typical workstation. The same task would take
about 6 days with the naive algorithm. An implementation of the
algorithm and a dataset with the top 100 substitutes of each token in
the WSJ section of the Penn Treebank are available from the author's
website at http://goo.gl/jzKH0.
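The naive O(|V|) baseline that Fastsubs improves on can be sketched as follows: score every vocabulary word in the target position under the language model (a toy add-one-smoothed bigram model here; the real algorithm and its priority-queue machinery are described in the paper):

```python
import math
from collections import Counter

# Toy add-one-smoothed bigram LM; Fastsubs works with arbitrary back-off
# n-gram models and avoids scoring the whole vocabulary.
corpus = "a b a c a b a b a c".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(unigrams)

def logp(prev, word):
    """Add-one smoothed bigram log probability."""
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + V))

def naive_substitutes(tokens, i, top=2):
    """The naive baseline: score every vocabulary word in position i."""
    scored = sorted(((logp(tokens[i - 1], w) + logp(w, tokens[i + 1]), w)
                     for w in unigrams), reverse=True)
    return [w for _, w in scored[:top]]

print(naive_substitutes("a b a".split(), 1))  # ['b', 'c']
```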
Labels: Publications






Deniz Yuret
Koç University
İstanbul Turkey
+90-212-338-1724
dyuret@ku.edu.tr












    *Teaching*
    Comp130 Introduction to Programming

    *Curriculum Vitae*
    2002-curr Koç University
    2000-2002 Inquira, Inc.
    1988-2000 B.S., M.S., Ph.D., MIT
    1985-1988 İzmir Fen Lisesi

    *Bibliography*

    *Dissertation*



Showing posts with label *Publications*. Show all posts









    July 30, 2011


      RegMT System for Machine Translation, System Combination, and
      Evaluation
      


Ergun Bicici; Deniz Yuret. /Proceedings of the Sixth Workshop on
Statistical Machine Translation./ pp. 323-329. Edinburgh, Scotland.
July, 2011. (PDF, BIB, Proceedings, Poster)

*Abstract:* We present the results we obtain using our RegMT system,
which uses transductive regression techniques to learn mappings between
source and target features of given parallel corpora and use these
mappings to generate machine translation outputs. Our training instance
selection methods perform feature decay for proper selection of training
instances, which plays an important role to learn correct feature
mappings. RegMT uses L2 regularized regression as well as L1 regularized
regression for sparse regression estimation of target features. We
present translation results using our training instance selection
methods, translation results using graph decoding, system combination
results with RegMT, and performance evaluation with the
F1 measure over target features as a metric for evaluating translation
quality.

Labels: Publications






      Instance Selection for Machine Translation using Feature Decay
      Algorithms
      


Ergun Bicici; Deniz Yuret. /Proceedings of the Sixth Workshop on
Statistical Machine Translation./ pp. 272-283. Edinburgh, Scotland.
July, 2011. (PDF, BIB, Proceedings, Presentation)


*Abstract:* We present an empirical study of instance selection
techniques for machine translation. In an active learning setting,
instance selection minimizes the human effort by identifying the most
informative sentences for translation. In a transductive learning
setting, selection of training instances relevant to the test set
improves the final translation quality. After reviewing the state of the
art in the field, we generalize the main ideas in a class of instance
selection algorithms that use feature decay. Feature decay algorithms
increase diversity of the training set by devaluing features that are
already included. We show that the feature decay rate has a very strong
effect on the final translation quality whereas the initial feature
values, inclusion of higher order features, or sentence length
normalizations do not. We evaluate the best instance selection methods
using a standard Moses baseline using the whole 1.6 million sentence
English-German section of the Europarl corpus. We show that selecting
the best 3000 training sentences for a specific test sentence is
sufficient to obtain a score within 1 BLEU of the baseline, using 5% of
the training data is sufficient to exceed the baseline, and a ~ 2 BLEU
improvement over the baseline is possible by optimally selected subset
of the training data. In out-of-domain translation, we are able to
reduce the training set size to about 7% and achieve a similar
performance with the baseline.
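The core idea, devaluing features that are already covered, can be sketched in a few lines (a simplified greedy selector with made-up sentences; the paper's algorithms differ in feature sets, decay schedules, and scaling):

```python
def fda_select(test_sentence, pool, n_select, decay=0.5):
    """Greedy instance selection with feature decay (simplified sketch).

    Features are the test sentence's word unigrams; whenever a selected
    sentence covers a feature, that feature's value is multiplied by
    `decay`, so later picks favor still-uncovered features (diversity)."""
    value = {f: 1.0 for f in set(test_sentence.split())}
    selected, remaining = [], list(pool)
    for _ in range(min(n_select, len(remaining))):
        best = max(remaining,
                   key=lambda s: sum(value.get(w, 0.0) for w in set(s.split())))
        selected.append(best)
        remaining.remove(best)
        for w in set(best.split()):
            if w in value:
                value[w] *= decay
    return selected

pool = ["the cat sat", "a dog ran", "the dog sat", "cats and dogs"]
print(fda_select("the dog sat on the mat", pool, 2))
# ['the dog sat', 'the cat sat']
```

After "the dog sat" is picked, the features "the", "dog" and "sat" are decayed, so the second pick covers the remaining overlap rather than repeating the first sentence's features.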

Labels: Publications






    October 27, 2010


      Anharmonicity, mode-coupling and entropy in a fluctuating native
      protein
      


A. Kabakçıoğlu, D. Yuret, M. Gür, B. Erman. 2010 Phys. Biol. 7 046005.
doi: 10.1088/1478-3975/7/4/046005 (PDF, PDF, HTML, Online, Hermite code)

*Abstract:* We develop a general framework for the analysis of residue
fluctuations that simultaneously incorporates anharmonicity and
mode-coupling in a unified formalism. We show that both deviations from
the Gaussian model are important for modeling the multidimensional
energy landscape of the protein Crambin (1EJG) in the vicinity of its
native state. The effect of anharmonicity and mode-coupling on the
fluctuational entropy is in the order of a few percent.

Labels: Publications






    August 26, 2010


      Unsupervised Part of Speech Tagging Using Unambiguous Substitutes
      from a Statistical Language Model
      


Mehmet Ali Yatbaz and Deniz Yuret. Coling 2010. pp. 1391--1398. Beijing,
China. (PDF, Poster)
*Abstract:* We show that unsupervised part of speech tagging performance
can be significantly improved using likely substitutes for target words
given by a statistical language model. We choose unambiguous substitutes
for each occurrence of an ambiguous target word based on its context.
The part of speech tags for the unambiguous substitutes are then used to
filter the entry for the target word in the word--tag dictionary. A
standard HMM model trained using the filtered dictionary achieves 92.25%
accuracy on a standard 24,000 word corpus.
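The dictionary-filtering step can be sketched as follows (toy tags and words; the paper derives dictionaries from real data and picks substitutes with a statistical language model):

```python
# Toy word-tag dictionary: "run" is ambiguous, its substitutes are not.
word_tags = {"run": {"NOUN", "VERB"}, "sprint": {"VERB"}, "jog": {"VERB"}}

def filter_entry(word, substitutes, word_tags):
    """Keep only the target's tags supported by an unambiguous substitute."""
    unamb = [s for s in substitutes if len(word_tags[s]) == 1]
    support = set().union(*(word_tags[s] for s in unamb)) if unamb else set()
    return (word_tags[word] & support) or word_tags[word]

print(filter_entry("run", ["sprint", "jog"], word_tags))  # {'VERB'}
```

The filtered dictionary then constrains the HMM tagger's possible tags for each word.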

Labels: Publications






    July 15, 2010


      SemEval-2010 Task 12: Parser Evaluation using Textual Entailments
      


Deniz Yuret, Aydın Han, Zehra Turgut. /Proceedings of the 5th
International Workshop on Semantic Evaluation (SemEval-2010)./ pp.
51--56. July, 2010. Uppsala, Sweden. (PDF, Presentation, Task website,
Proceedings, Journal submission)



*Abstract:* Parser Evaluation using Textual Entailments (PETE) is a
shared task in the SemEval-2010 Evaluation Exercises on Semantic
Evaluation. The task involves recognizing textual entailments based on
syntactic information alone. PETE introduces a new parser evaluation
scheme that is formalism independent, less prone to annotation error,
and focused on semantically relevant distinctions.

Labels: Publications






      L1 Regularized Regression for Reranking and System Combination in
      Machine Translation
      


Ergun Bicici, Deniz Yuret. /Proceedings of the Joint Fifth Workshop on
Statistical Machine Translation and MetricsMATR./ pp. 282--289. July
2010. Uppsala, Sweden. (PDF, Slide, Poster)

*Abstract:* We use L1 regularized transductive regression to learn
mappings between source and target features of the training sets derived
for each test sentence and use these mappings to rerank translation
outputs. We compare the effectiveness of L1 regularization techniques
for regression to learn mappings between features given in a sparse
feature matrix. The results show the effectiveness of using L1
regularization versus L2 used in ridge regression. We show that
regression mapping is effective in reranking translation outputs and in
selecting the best system combinations with encouraging results on
different language pairs.
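The L1-versus-L2 contrast at the heart of this line of work can be illustrated with a generic lasso-versus-ridge sketch (synthetic data; this is not the transductive RegMT setup, just the regularization behavior it relies on):

```python
import numpy as np

def lasso_cd(X, y, lam, iters=200):
    """L1-regularized least squares via coordinate descent (sketch)."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]        # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
    return w

def ridge(X, y, lam):
    """L2-regularized least squares, closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=50)  # only feature 0 matters
w1, w2 = lasso_cd(X, y, lam=5.0), ridge(X, y, lam=5.0)
print(np.round(w1, 2))  # L1 tends to zero out irrelevant features
print(np.round(w2, 2))  # L2 only shrinks them
```

The sparsity of the L1 solution is what makes it attractive for the sparse feature matrices used in reranking and system combination.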

Labels: Publications






    February 21, 2010


      Preprocessing with Linear Transformations that Maximize the
      Nearest Neighbor Classification Accuracy
      


Mehmet Ali Yatbaz and Deniz Yuret. /1st CSE Student Workshop (CSW’10)/,
21 February 2010, Koc Istinye Campus, Istanbul. (PDF, PPT)

*Abstract*
We introduce a preprocessing technique for classification problems based
on linear transformations. The algorithm incrementally constructs a
linear transformation that maximizes the nearest neighbor classification
accuracy on the training set. At each iteration the algorithm picks a
point in the dataset, and computes a transformation
that moves the point closer to points in its own class and/or away from
points in other classes. The composition of the resulting linear
transformations leads to statistically significant improvements in
instance-based learning algorithms.
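A single step of this idea can be illustrated with a rank-one linear map that pulls a chosen point toward a same-class point; the full algorithm composes many such maps and evaluates nearest-neighbor accuracy (the construction below is an illustrative assumption, not the paper's exact update):

```python
import numpy as np

x = np.array([2.0, 0.0])       # the chosen point
same = np.array([1.0, 1.0])    # a point from x's own class
other = np.array([-2.0, 0.5])  # a point from another class

def pull_map(p, target, alpha=0.5):
    """Rank-one linear map M with M @ p == p + alpha * (target - p)."""
    return np.eye(len(p)) + alpha * np.outer(target - p, p) / (p @ p)

M = pull_map(x, same)
X = np.stack([x, same, other]) @ M.T   # apply the SAME map to every point

d_before = np.linalg.norm(x - same)
d_after = np.linalg.norm(X[0] - X[1])
print(d_before, d_after)  # the same-class distance shrinks
```

Because each step is a linear map applied globally, the steps compose into a single linear preprocessing transformation for the whole dataset.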



Labels: Publications






      L1 Regularization for Learning Word Alignments in Sparse Feature
      Matrices
      


Ergun Bicici and Deniz Yuret. /1st CSE Student Workshop (CSW’10)/, 21
February 2010, Koc Istinye Campus, Istanbul. (PDF, Poster)

*Abstract*
Sparse feature representations can be used in various domains. We
compare the effectiveness of regularization techniques for regression to
learn mappings between features given in a sparse feature matrix. We
apply these techniques for learning word alignments commonly used for
machine translation. The performance of the learned mappings is measured
using the phrase table generated on a larger corpus by a
state-of-the-art word aligner. The results show the effectiveness of
using L1 regularization versus the L2 regularization used in ridge
regression.

Labels: Publications






    February 19, 2010


      The Noisy Channel Model for Unsupervised Word Sense Disambiguation
      


Deniz Yuret and Mehmet Ali Yatbaz. /Computational Linguistics, Volume
36, Number 1, March 2010./ (Abstract, PDF)



*Abstract:* We introduce a generative probabilistic model, the noisy
channel model, for unsupervised word sense disambiguation. In our model,
each context C is modeled as a distinct channel through which the
speaker intends to transmit a particular meaning S using a possibly
ambiguous word W. To reconstruct the intended meaning the hearer uses
the distribution of possible meanings in the given context P(S|C) and
possible words that can express each meaning P(W|S). We assume P(W|S) is
independent of the context and estimate it using WordNet sense
frequencies. The main problem of unsupervised WSD is estimating context
dependent P(S|C) without access to any sense tagged text. We show one
way to solve this problem using a statistical language model based on
large amounts of untagged text. Our model uses coarse-grained semantic
classes for S internally and we explore the effect of using different
levels of granularity on WSD performance. The system outputs fine
grained senses for evaluation and its performance on noun disambiguation
is better than most previously reported unsupervised systems and close
to the best supervised systems.
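The decision rule can be sketched with invented numbers standing in for the WordNet-based P(W|S) and the language-model-based P(S|C):

```python
# All probabilities are invented for illustration. In the paper, P(W|S)
# comes from WordNet sense frequencies and P(S|C) from a statistical
# language model; here they are hard-coded.
P_S_given_C = {"money": 0.7, "attention": 0.3}       # context: "interest rate"
P_W_given_S = {("interest", "money"): 0.4,
               ("interest", "attention"): 0.6}

def disambiguate(word, p_s_given_c, p_w_given_s):
    """Pick the sense maximizing P(S|C) * P(W|S)."""
    return max(p_s_given_c, key=lambda s: p_s_given_c[s] * p_w_given_s[(word, s)])

sense = disambiguate("interest", P_S_given_C, P_W_given_S)
print(sense)  # money
```

Here 0.7 × 0.4 = 0.28 beats 0.3 × 0.6 = 0.18, so the strong context evidence overrides the sense-frequency prior.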

Labels: Publications






    December 11, 2009


      Unsupervised morphological disambiguation using statistical
      language models
      


Mehmet Ali Yatbaz and Deniz Yuret. /NIPS 2009 Workshop on Grammar
Induction, Representation of Language and Language Learning./ December
2009. (PDF, Poster)
*Abstract:*
In this paper, we present a probabilistic model for the unsupervised
morphological disambiguation problem. Our model assigns morphological
parses T to the contexts C instead of assigning them to the words W. The
target word W determines the set of possible parses that can be used in
W's context C. To assign the correct morphological parse to W, our model
finds the parse that maximizes P(T|C). The P(T|C) values are estimated
using a statistical language model and the vocabulary of the corpus. The
system performs significantly better than an unsupervised baseline and
its performance is close to a supervised baseline.

Labels: Publications






    August 04, 2009


      Modeling Morphologically Rich Languages Using Split Words and
      Unstructured Dependencies
      


Deniz Yuret and Ergun Bicici. In /the Joint conference of the 47th
Annual Meeting of the Association for Computational Linguistics and the
4th International Joint Conference on Natural Language Processing of the
Asian Federation of Natural Language Processing (ACL-IJCNLP 2009)/ (PDF).



*Abstract:* We experiment with splitting words into their stem and suffix
components for modeling morphologically rich languages. We show that
using a morphological analyzer and disambiguator results in a significant
perplexity reduction in Turkish. We present flexible n-gram models,
Flex-Grams, which assume that the n−1 tokens that determine the
probability of a given token can be chosen anywhere in the sentence
rather than the preceding n − 1 positions. Our final model achieves 27%
perplexity reduction compared to the standard n-gram model.


Labels: Publications






    July 22, 2009


      Morphological cues vs. number of nominals in learning verb types
      in Turkish: The syntactic bootstrapping mechanism revisited
      


A. Engin Ural; Deniz Yuret; F. Nihan Ketrez; Dilara Koçbaş; Aylin C.
Küntay. /Language and Cognitive Processes, 24(10), pp. 1393-1405,
December 2009/ (PDF, PDF, HTML).

Abstract: The syntactic bootstrapping mechanism of verb learning was
evaluated against child-directed speech in Turkish, a language with rich
morphology, nominal ellipsis and free word order. Machine-learning
algorithms were run on transcribed caregiver speech directed to two
Turkish learners (one hour every two weeks between 0;9 and 1;10) of
different socioeconomic backgrounds. We found that the number of
nominals in child-directed utterances plays a small, but significant,
role in classifying transitive and intransitive verbs. Further, we found
that accusative morphology on the noun is a strong cue in clustering
verb types. We also found that verbal morphology (past tense and
bareness of verbs) is useful in distinguishing between different
subtypes of intransitive verbs. These results suggest that syntactic
bootstrapping mechanisms should be extended to include morphological
cues to verb learning in morphologically rich languages.

Keywords: Language development; Turkish; Child-directed speech,
Syntactic bootstrapping; Morphology

Labels: Publications






    March 03, 2009


      Classification of semantic relations between nominals
      


Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney
and Deniz Yuret. /Language Resources and Evaluation, 2009, 43(2),
105-121./ (PDF, PDF, HTML).

Abstract: The NLP community has shown a renewed interest in deeper
semantic analyses, among them automatic recognition of semantic
relations in text. We present the development and evaluation of a
semantic analysis task: automatic recognition of relations between pairs
of nominals in a sentence. The task was part of SemEval-2007, the fourth
edition of the semantic evaluation event previously known as SensEval.
Apart from the observations we have made, the long-lasting effect of
this task may be a framework for comparing approaches to the task. We
introduce the problem of recognizing relations between nominals, and in
particular the process of drafting and refining the definitions of the
semantic relations. We show how we created the training and test data,
list and briefly describe the 15 participating systems, discuss the
results, and conclude with the lessons learned in the course of this
exercise.

Labels: Publications






    October 31, 2008


      Morphological cues vs. number of nominals in learning verb types
      in Turkish: Syntactic bootstrapping mechanism revisited
      


Deniz Yuret, A. Engin Ural, Nihan Ketrez, Dilara Koçbaş and Aylin C.
Küntay. In /The Boston University Conference on Language Development
(BUCLD)/ (Long abstract, poster, PDF)

Abstract: The syntactic bootstrapping mechanism of verb classification
was evaluated against child-directed speech in Turkish, a language with
rich morphology, nominal ellipsis and free word order. Machine-learning
algorithms were run on transcribed caregiver speech (12,276 and 20,687
utterances) directed to two Turkish learners (one hour every two weeks
between 0;9 and 1;10) of different socioeconomic backgrounds. Study 1 found
that the number of nominals in child-directed utterances plays some role
in classifying transitive and intransitive verbs. Study 2 found that
accusative morphology on the noun is a stronger cue in clustering verb
types. Study 3 found that verbal morphology is useful in distinguishing
between different subtypes of intransitive verbs. These results suggest
that syntactic bootstrapping mechanisms should be extended to include
morphological cues to verb learning in morphologically rich languages.

Labels: Publications






    August 16, 2008


      Discriminative vs. Generative Approaches in Semantic Role Labeling
      


Deniz Yuret, Mehmet Ali Yatbaz and Ahmet Engin Ural. In /Proceedings of
The Twelfth Conference on Natural Language Learning (CoNLL-2008)/ (PDF,
ACM)

Abstract: This paper describes the two algorithms we developed for the
CoNLL 2008 Shared Task “Joint learning of syntactic and semantic
dependencies”. Both algorithms start parsing the sentence using the same
syntactic parser. The first algorithm uses machine learning methods to
identify the semantic dependencies in four stages: identification and
labeling of predicates, identification and labeling of arguments. The
second algorithm uses a generative probabilistic model, choosing the
semantic dependencies that maximize the probability with respect to the
model. A hybrid algorithm combining the best stages of the two
algorithms attains 86.62% labeled syntactic attachment accuracy, 73.24%
labeled semantic dependency F1 and 79.93% labeled macro F1 score for the
combined WSJ and Brown test sets.

Labels: Publications






    June 15, 2008


      Smoothing a Tera-word Language Model
      


Deniz Yuret. In /The 46th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies (ACL-08: HLT)/
(Download PDF)

Download the software used in this study: glookup.tgz reads n-gram
patterns (possibly containing wildcards) from stdin, finds their counts
in one pass over the Google Web1T data, and prints the results.
glookup.pl quickly searches for a given pattern in uncompressed Google
Web1T data. Use the first for bulk processing, the second to get a few
counts quickly.
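The pattern-with-wildcards lookup can be sketched over a toy in-memory count table (the real glookup streams the compressed Web1T files in a single pass; the '_' wildcard syntax below is an assumption for this sketch):

```python
# Toy stand-in for the Web1T n-gram counts.
counts = {("the", "cat"): 10, ("the", "dog"): 7, ("a", "cat"): 3}

def wildcard_count(pattern, counts):
    """Sum counts of all n-grams matching pattern; '_' matches any token."""
    pat = pattern.split()
    return sum(c for ngram, c in counts.items()
               if len(ngram) == len(pat)
               and all(p == "_" or p == w for p, w in zip(pat, ngram)))

print(wildcard_count("the _", counts))  # 17
print(wildcard_count("_ cat", counts))  # 13
```

Wildcard counts like these are exactly what back-off smoothing methods need when only high-frequency n-grams are stored.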




Abstract: Frequency counts from very large corpora, such as the Web 1T
dataset, have recently become available for language modeling. Omission
of low frequency n-gram counts is a practical necessity for datasets of
this size. Naive implementations of standard smoothing methods do not
realize the full potential of such large datasets with missing counts.
In this paper I present a new smoothing algorithm that combines the
Dirichlet prior form of (Mackay and Peto, 1995) with the modified
back-off estimates of (Kneser and Ney, 1995) that leads to a 31%
perplexity reduction on the Brown corpus compared to a baseline
implementation of Kneser-Ney discounting.

Labels: Downloads, Publications






    June 28, 2007


      The CoNLL 2007 Shared Task on Dependency Parsing
      


Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson,
Sebastian Riedel and Deniz Yuret. In /Proceedings of the 2007 Joint
Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning (EMNLP-CoNLL)/

Abstract: The Conference on Computational Natural Language Learning
features a shared task, in which participants train and test their
learning systems on the same data sets. In 2007, as in 2006, the shared
task has been devoted to dependency parsing, this year with both a
multilingual track and a domain adaptation track. In this paper, we
define the tasks of the different tracks and describe how the data sets
were created from existing treebanks for ten languages. In addition, we
characterize the different approaches of the participating systems,
report the test results, and provide
a first analysis of these results.

Labels: Publications






    June 23, 2007


      KU: Word Sense Disambiguation by Substitution
      


Deniz Yuret. In /Proceedings of the Fourth International Workshop on
Semantic Evaluations (SemEval-2007)/



Abstract: Data sparsity is one of the main factors that make word sense
disambiguation (WSD) difficult. To overcome this problem we need to find
effective ways to use resources other than sense labeled data. In this
paper I describe a WSD system that uses a statistical language model
based on a large unannotated corpus. The model is used to evaluate the
likelihood of various substitutes for a word in a given context. These
likelihoods are then used to determine the best sense for the word in
novel contexts. The resulting system participated in three tasks in the
SemEval 2007 workshop. The WSD of prepositions task proved to be
challenging for the system, possibly illustrating some of its
limitations: e.g. not all words have good substitutes. The system
achieved promising results for the English lexical sample and English
lexical substitution tasks.

Labels: Publications






      SemEval-2007 Task 04: Classification of Semantic Relations between
      Nominals
      


Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney
and Deniz Yuret. In /Proceedings of the Fourth International Workshop on
Semantic Evaluations (SemEval-2007)/



Abstract: The NLP community has shown a renewed interest in deeper
semantic analyses, among them automatic recognition of relations between
pairs of words in a text. We present an evaluation task designed to
provide a framework for comparing different approaches to classifying
semantic relations between nominals in a sentence. This is part of
SemEval, the 4th edition of the semantic evaluation event previously
known as SensEval. We define the task, describe the training/test data
and their creation, list the participating systems and discuss their
results. There were 14 teams who submitted 15 systems.







    April 11, 2007


      Locally Scaled Density Based Clustering
      


Ergun Biçici and Deniz Yuret. In /International Conference on Adaptive
and Natural Computing Algorithms (ICANNGA 2007), LNCS 4431, Part I,
Springer-Verlag. / (PDF, PS, Presentation, Code, Readme)

Abstract: Density based clustering methods allow the identification of
arbitrary, not necessarily convex regions of data points that are
densely populated. The number of clusters does not need to be specified
beforehand; a cluster is defined to be a connected region that exceeds a
given density threshold. This paper introduces the notion of local
scaling in density based clustering, which determines the density
threshold based on the local statistics of the data. The local maxima of
density are discovered using a k-nearest-neighbor density estimation and
used as centers of potential clusters. Each cluster is grown until the
density falls below a pre-specified ratio of the center point’s density.
The resulting clustering technique is able to identify clusters of
arbitrary shape on noisy backgrounds that contain significant density
gradients. The focus of this paper is to automate the process of
clustering by making use of the local density information for
arbitrarily sized, shaped, located, and numbered clusters. The
performance of the new algorithm is promising as it is demonstrated on a
number of synthetic datasets and images for a wide range of its parameters.
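The core loop described in the abstract can be sketched as follows. This is a rough illustration, not the authors' implementation: density at each point is estimated from the distance to its k-th nearest neighbor, and clusters grow from local density maxima until density drops below a fixed ratio of the center's density.

```python
# Rough sketch of locally scaled density-based clustering.
import numpy as np

def knn_density(X, k):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    kth = np.sort(d, axis=1)[:, k]   # distance to k-th neighbor (index 0 is self)
    return 1.0 / (kth + 1e-12), d

def cluster(X, k=3, ratio=0.5):
    dens, d = knn_density(X, k)
    labels = -np.ones(len(X), dtype=int)
    order = np.argsort(-dens)        # try the densest points as centers first
    cid = 0
    for c in order:
        if labels[c] != -1:
            continue
        threshold = ratio * dens[c]  # locally scaled: relative to this center
        frontier = [c]
        labels[c] = cid
        while frontier:
            i = frontier.pop()
            for j in np.argsort(d[i])[1:k + 1]:   # k nearest neighbors of i
                if labels[j] == -1 and dens[j] >= threshold:
                    labels[j] = cid
                    frontier.append(j)
        cid += 1
    return labels
```

Because the threshold is relative to each center's own density, dense and sparse clusters can coexist in the same dataset.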







    December 01, 2006


      Quantum Mechanical Calculations of Tryptophan and Comparison with
      Conformations in Native Proteins
      


Ersin Yurtsever, Deniz Yuret and Burak Erman. /J. Phys. Chem. A, 2006,
110 (51), pp 13933–13938./ (PDF)

Abstract: We report a detailed analysis of the potential energy surface
of N-acetyl-l-tryptophan-N-methylamide, (NATMA) both in the gas phase
and in solution. The minima are identified using the
density-functional-theory (DFT) with the 6-31g(d) basis set. The full
potential energy surface in terms of torsional angles is spanned
starting from various initial configurations. We were able to locate 77
distinct L-minima. The calculated energy maps correspond to the
intrinsic conformational propensities of the individual NATMA molecule.
We show that these conformations are essentially similar to the
conformations of tryptophan in native proteins. For this reason, we
compare the results of DFT calculations in the gas and solution phases
with native state conformations of tryptophan obtained from a protein
library. In native proteins, tryptophan conformations have strong
preferences for the β sheet, right-handed helix, tight turn, and bridge
structures. The conformations calculated by DFT, the solution-phase
results in particular, for the single tryptophan residue are in
agreement with native state values obtained from the Protein Data Bank.







    November 01, 2006


      The Greedy Prepend Algorithm for Decision List Induction
      


Deniz Yuret and Michael de la Maza. In /Proceedings of the 21st
International Symposium on Computer and Information Sciences (ISCIS
2006). LNCS 4263, Springer-Verlag/
A C implementation of the GPA algorithm with a Weka interface, the
presentation slides, and the paper are available for download.



Abstract: We describe a new decision list induction algorithm called the
Greedy Prepend Algorithm (GPA). GPA improves on other decision list
algorithms by introducing a new objective function for rule selection
and a set of novel search algorithms that allow application to large
scale real world problems. GPA achieves state-of-the-art classification
accuracy on the protein secondary structure prediction problem in
bioinformatics and the English part of speech tagging problem in
computational linguistics. For both domains GPA produces a rule set that
human experts find easy to interpret, a marked advantage in decision
support environments. In addition, we compare GPA to other decision list
induction algorithms as well as support vector machines, C4.5, naive
Bayes, and a nearest neighbor method on a number of standard data sets
from the UCI machine learning repository.
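For readers unfamiliar with decision lists, here is a minimal, simplified sketch of how a learned list is applied; the rule patterns below are hypothetical. GPA itself learns such lists by repeatedly *prepending* the rule with the largest gain in training accuracy, so the most specific rules end up first.

```python
# Decision-list classification: rules are tried in order and the first
# rule whose pattern matches the instance decides the class.

def apply_decision_list(rules, x):
    # rules: list of (pattern, label); a pattern is a dict of feature -> value.
    # An empty pattern is the default rule at the end of the list.
    for pattern, label in rules:
        if all(x.get(f) == v for f, v in pattern.items()):
            return label
    raise ValueError("decision list must end with a default rule")

# A hypothetical POS-tagging style list (most specific rule prepended last):
rules = [
    ({"suffix": "ing", "prev": "is"}, "VBG"),
    ({"suffix": "ing"}, "NN"),
    ({}, "NN"),   # default rule
]
```

Because each prediction is traced to one human-readable rule, such lists are easy to inspect, which is the interpretability advantage the abstract mentions.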







    October 08, 2006


      Secondary structure prediction using decision lists
      


Deniz Yuret and Volkan Kurt. Presentation at the /Protein-Bioinformatics
Workshop at Koç University Center for Computational Biology and
Bioinformatics./









    June 21, 2006


      Clustering Word Pairs to Answer Analogy Questions
      


Ergun Biçici and Deniz Yuret. In /Proceedings of the Fifteenth Turkish
Symposium on Artificial Intelligence and Neural Networks (TAINN 2006)/

Abstract: We focus on answering word analogy questions by using
clustering techniques. The increased performance in answering word
similarity questions can have many possible applications, including
question answering and information retrieval. We present an analysis of
clustering algorithms' performance on answering word similarity
questions. This paper's contributions can be summarized as: (i) casting
the problem of solving word analogy questions as an instance of learning
clusterings of data and measuring the effectiveness of prominent
clustering techniques in learning semantic relations; (ii) devising a
heuristic approach to combine the results of different clusterings for
the purpose of distinctly separating word pair semantics; (iii)
answering SAT-type word similarity questions using our technique.







    June 08, 2006


      Dependency Parsing as a Classification Problem
      


Deniz Yuret. In /Proceedings of the Tenth Conference on Computational
Natural Language Learning (CoNLL-X)/



Abstract: This paper presents an approach to dependency parsing which
can utilize any standard machine learning (classification) algorithm. A
decision list learner was used in this work. The training data provided
in the form of a treebank is converted to a format in which each
instance represents information about one word pair, and the
classification indicates the existence, direction, and type of the link
between the words of the pair. Several distinct models are built to
identify the links between word pairs at different distances. These
models are applied sequentially to give the dependency parse of a
sentence, favoring shorter links. An analysis of the errors, attribute
selection, and comparison of different languages is presented.
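The sequential application of per-distance models can be sketched as below. This is a simplification (the paper's `classify` is a learned decision list over word-pair features): pairs are presented in order of increasing distance so that shorter links are favored, and a word that already has a head is skipped.

```python
# Sketch of link assignment by word-pair classification.

def parse(words, classify):
    head = {}   # dependent index -> head index
    for dist in range(1, len(words)):
        for i in range(len(words) - dist):
            j = i + dist
            link = classify(words[i], words[j], dist)   # None, "L", or "R"
            if link == "R" and j not in head:    # left word heads the right word
                head[j] = i
            elif link == "L" and i not in head:  # right word heads the left word
                head[i] = j
    return head
```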







    June 05, 2006


      Learning Morphological Disambiguation Rules for Turkish
      


Deniz Yuret and Ferhan Türe. In /Proceedings of the Human Language
Technology Conference - North American Chapter of the Association for
Computational Linguistics Annual Meeting (HLT-NAACL 2006)/
The paper, the stand-alone Turkish morphological disambiguator, and the
code used in this paper to train the model are available for download.




Abstract: In this paper, we present a rule based model for morphological
disambiguation of Turkish. The rules are generated by a novel decision
list learning algorithm using supervised training. Morphological
ambiguity (e.g. lives = live+s or life+s) is a challenging problem for
agglutinative languages like Turkish where close to half of the words in
running text are morphologically ambiguous. Furthermore, it is possible
for a word to take an unlimited number of suffixes, therefore the number
of possible morphological tags is unlimited. We attempted to cope with
these problems by training a separate model for each of the 126
morphological features recognized by the morphological analyzer. The
resulting decision lists independently vote on each of the potential
parses of a word and the final parse is selected based on our confidence
in these votes. The accuracy of our model (96%) is slightly above the
best previously reported results which use statistical models. For
comparison, when we train a single decision list on full tags instead of
using separate models on each feature we get 91% accuracy.
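The voting step can be sketched as follows. This is a simplified illustration with hypothetical feature names: one classifier per morphological feature votes on whether that feature should be present, and each candidate parse is scored by the confidence-weighted votes of the features it contains.

```python
# Per-feature voting over candidate morphological parses (simplified).

def pick_parse(candidates, feature_votes):
    # candidates: list of parses, each a set of morphological features
    # feature_votes: feature -> confidence in [0, 1] that the feature is present
    def score(parse):
        return sum(feature_votes.get(f, 0.5) for f in parse) / len(parse)
    return max(candidates, key=score)
```

Scoring features independently avoids training on full tags, whose number is unbounded for an agglutinative language.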








    October 18, 2005


      Method of utilizing implicit references to answer a query
      


Deniz Yuret. /United States Patent 6,957,213/

Abstract: A method of utilizing implicit references to answer a query
includes receiving segments of text, wherein individual segments have
elements. Implicit references are inferred between elements of the
segments. The inferring operation includes identifying implicit
references to antecedent elements. A query is received. In response to
the query, one or more segments are identified as relevant to the query
based at least in part on the implicit references.







    July 21, 2004


      Some experiments with a Naive Bayes WSD system
      


Deniz Yuret. In /Proceedings of SENSEVAL-3, the Third International
Workshop on the Evaluation of Systems for the Semantic Analysis of Text/



Abstract: This document describes the architecture of a WSD system that
participated in the SENSEVAL-3 English all words evaluation exercise.
The system uses two independent statistical models, one based on local
collocations and another based on a bag of words around the target. The
model with the higher confidence provides the final answer for each
instance. Both models use Naive Bayes and supervised training with
different feature sets. The experiments using this system indicate that
the specific smoothing parameters used for Naive Bayes make a big impact
on the performance, smaller context sizes give better accuracy, and that
the bag of words model adds little to the performance.
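The arbitration between the two models reduces to a few lines; the interface below is assumed for illustration. Each model returns a posterior distribution over senses, and the final answer comes from whichever model is more confident about its top sense.

```python
# Pick the answer of the more confident of two sense classifiers.

def combine(posterior_local, posterior_bow):
    # posteriors: dicts mapping sense -> probability
    best = max((posterior_local, posterior_bow),
               key=lambda p: max(p.values()))
    return max(best, key=best.get)
```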







    April 02, 2004


      Relationships Between Amino Acid Sequence and Backbone Torsion
      Angle Preferences
      


Özlem Keskin, Deniz Yuret, Attila Gürsoy, Metin Türkay and Burak Erman.
In /Proteins: Structure, Function, and Bioinformatics. 55(4):992-8./
(PDF)


Abstract: Statistical averages and correlations for backbone torsion
angles of chymotrypsin inhibitor 2 are calculated by using the
Rotational Isomeric States model of chain statistics. Statistical
weights of torsional states of phi-psi pairs, needed for the statistics
of the full chain, are obtained in two different ways: 1) by using
knowledge-based pairwise dependent phi-psi energy maps from Protein Data
Bank (PDB) and 2) by collecting torsion angle data from a large number
of random coil configurations of an all-atom protein model with volume
exclusion. Results obtained by using PDB data show strong correlations
between adjacent torsion angle pairs belonging to both the same and
different residues. These correlations favor the choice of the
native-state torsion angles, and they are strongly context dependent,
determined by the specific amino acid sequence of the protein. Excluded
volume or steric clashes, only, do not introduce context-dependent
phi-psi correlations into the chain that would affect the choice of
native-state torsional angles.







    June 27, 2002


      Omnibase: Uniform Access to Heterogeneous Data for Question
      Answering
      


Boris Katz, Sue Felshin, Deniz Yuret, Ali Ibrahim, Jimmy J. Lin, Gregory
Marton, Alton Jerome McFarland, Baris Temelkuran. In /Birger Andersson,
Maria Bergholtz, Paul Johannesson (Eds.): Natural Language Processing
and Information Systems, 6th International Conference on Applications of
Natural Language to Information Systems, NLDB 2002, Stockholm, Sweden,
June 27-28, 2002, Revised Papers. Lecture Notes in Computer Science 2553
Springer 2002, ISBN 3-540-00307-X. /

Abstract: Although the World Wide Web contains a tremendous amount of
information, the lack of uniform structure makes finding the right
knowledge difficult. A solution is to turn the Web into a virtual
database and to access it through natural language. We built Omnibase, a
system that integrates heterogeneous data sources using an
object-property-value model. With the help of Omnibase, our Start
natural language system can now access numerous heterogeneous data
sources on the Web in a uniform manner, and answers millions of user
questions with high precision.
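The object-property-value idea can be illustrated with a toy lookup; this assumed structure is not Omnibase's actual API. Each data source is wrapped so that a natural-language question ultimately becomes an (object, property) lookup returning a value.

```python
# Toy object-property-value store over heterogeneous "sources".
sources = {
    "imdb": {("Gone with the Wind", "director"): "Victor Fleming"},
    "cia-factbook": {("Turkey", "capital"): "Ankara"},
}

def get(obj, prop):
    # Answer an (object, property) query from whichever wrapper knows it.
    for wrapper in sources.values():
        if (obj, prop) in wrapper:
            return wrapper[(obj, prop)]
    return None
```

The uniform interface is what lets a single question-answering front end treat very different web sources as one virtual database.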







    March 01, 2002


      Alpha-beta-conspiracy search
      


David McAllester and Deniz Yuret. /ICGA Journal Vol. 25, No. 1 - March
2002/ (PDF)


Abstract: We introduce a variant of α-β search in which each node is
associated with two depths rather than one. The purpose of α-β search is
to find strategies for each player that together establish a value for
the root position. A max strategy establishes a lower bound and the min
strategy establishes an upper bound. It has long been observed that
forced moves should be searched more deeply. Here we make the
observation that in the max strategy we are only concerned with the
forcedness of max moves and in the min strategy we are only concerned
with the forcedness of min moves. This leads to two measures of depth
--- one for each strategy --- and to a two-depth variant of α-β called
ABC search. The two-depth approach can be formally derived from
conspiracy theory and the structure of the ABC procedure is justified by
two theorems relating ABC search and conspiracy numbers.





Home 
Subscribe to: Posts (Atom)

My Photo 

Deniz Yuret
Koç University
İstanbul Turkey
+90-212-338-1724
dyuret@ku.edu.tr




  * Home 
  * AI Lab 
  * GitHub 
  * Scholar 




  * Türkçe
     (49)
  * Notes  (41)
  * Publications
     (33)
  * Books  (24)
  * Downloads  (10)
  * Math  (10)
  * Links  (8)
  * Students  (8)
  * Classes  (1)
  * Projects  (1)




    *Teaching* 
    Comp130  Introduction to Programming

    *Curriculum Vitae* 
    2002-curr Koç University 
    2000-2002 Inquira, Inc. 
    1988-2000 B.S., M.S., Ph.D., MIT
    1985-1988 İzmir Fen Lisesi 

    *Bibliography*
    

    *Dissertation*  


  Deniz Yuret's Homepage 

Showing posts with label *Students*. Show all posts




    August 16, 2011


      Ergun Biçici, Ph.D. 2011
      

*The Regression Model of Machine Translation*
Ergun Biçici. Ph.D. Dissertation. Koç University, Department of Computer
Engineering. August, 2011. (PDF, Presentation)


*Abstract:*
Machine translation is the task of automatically finding the translation
of a source sentence in the target language. Statistical machine
translation (SMT) use parallel corpora or bilingual paired corpora that
are known to be translations of each other to find a likely translation
for a given source sentence based on the observed translations. The task
of machine translation can be seen as an instance of estimating the
functions that map strings to strings.

Regression based machine translation (RegMT) approach provides a
learning framework for machine translation, separating learning models
for training, training instance selection, feature representation, and
decoding. We use the transductive learning framework for making the
RegMT approach computationally more scalable and consider the model
building step independently for each test sentence. We develop training
instance selection algorithms that not only make RegMT computationally
more scalable but also improve the performance of standard SMT systems.
We develop training instance selection techniques that improve on
previous work, achieving more accurate RegMT models from the given
parallel training sentences while using fewer training instances.

We introduce L_1 regularized regression as a better model than L_2
regularized regression for statistical machine translation. Our results
demonstrate that sparse regression models are better than L_2
regularized regression for statistical machine translation in predicting
target features, estimating word alignments, creating phrase tables, and
generating translation outputs. We develop evaluation techniques for
measuring the performance of the RegMT model and the quality of the
translations. We use the F_1 measure, which performs well when
evaluating translations into English according to human judgments. F_1
allows us to evaluate the performance of the RegMT models using the
target feature prediction vectors or the learned coefficient matrices,
or a given SMT model using its phrase table, without performing the
decoding step, which can be computationally expensive.
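The key contrast between L_1 and L_2 regularization, sparsity, can be seen in a small generic demo (plain NumPy, not the RegMT code): L_1's soft thresholding zeroes out irrelevant coefficients, while L_2 merely shrinks them.

```python
# L_1 (lasso) vs L_2 (ridge) regularized regression on synthetic data.
import numpy as np

def ridge(X, y, lam):
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def lasso(X, y, lam, iters=200):
    # Plain coordinate descent with soft thresholding.
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]        # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]                     # only 3 features matter
y = X @ w_true + 0.01 * rng.normal(size=100)
w1 = lasso(X, y, lam=5.0)                         # sparse: few nonzero coefs
w2 = ridge(X, y, lam=5.0)                         # dense: all coefs nonzero
```

In a translation setting the analogous payoff is that sparse target-feature predictions are cheaper to decode from and easier to interpret.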

Decoding is dependent on the representation of the training set and the
features used. We use graph decoding on the prediction vectors
represented in n-gram or word sequence counts space found in the
training set. We also decode using Moses after transforming the learned
weight matrix representing the mappings between the source and target
features to a phrase table that can be used by Moses during decoding. We
demonstrate that sparse L_1 regularized regression performs better than
L_2 regularized regression in the German-English translation task and in
the Spanish-English translation task when using small sized training
sets. Graph based decoding can provide an alternative to phrase-based
decoding in translation domains having low vocabulary.






    August 14, 2009


      Önder Eker, M.S. 2009
      

*Parser Evaluation Using Textual Entailments*
Önder Eker. M.S. Thesis. Boğaziçi Üniversitesi Department of Computer
Engineering, August 2009. (PDF)

*Abstract*
Syntactic parsing is a basic problem in natural language processing. It
can be defined as assigning a structure to a sentence. Two prevalent
approaches to parsing are phrase-structure parsing and dependency
parsing. A related problem is parser evaluation. PETE is a
dependency-based evaluation where the parse is represented as a list of
simple sentences, similar to the Recognizing Textual Entailments task.
Each entailment focuses on one relation. A priori training of annotators
is not required. A program generates entailments from a dependency
parse. Phrase-structure parses are converted to dependency parses to
generate entailments. Additional entailments are generated for
phrase-structure coordinations. Experiments are carried out with a
function-tagger. Parsers are evaluated on the set of entailments
generated from the Penn Treebank WSJ and Brown test sections. A
phrase-structure parser obtained the highest score.







    January 28, 2009


      Neşe Aral, M.S. 2009
      



*Dynamics of Gene Regulatory Cell Cycle Network in Saccharomyces Cerevisiae*
Neşe Aral. M.S. Thesis, Koç University Department of Physics, January 2009.

Abstract: In this thesis, the genetic regulatory dynamics within the
cell cycle network of the yeast Saccharomyces Cerevisiae is examined. As
the mathematical approach, an asynchronously updated Boolean network is
used to model the time evolution of the expression level of genes taking
part in the regulation of the cell-cycle. The attractors of the model’s
dynamics and their stability are investigated by means of a stochastic
transition matrix. It is shown that the cell cycle network has unusual
dynamical properties when compared with similar random networks.
Furthermore, an entropy measure is employed to monitor the sequential
evolution of the system. It is observed that the experimentally
identified cell cycle phases G1, S, G2 and M correspond to the stages of
the network where the entropy goes through a local extremum.
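An asynchronously updated Boolean network is easy to sketch; the three-gene example below is purely illustrative and does not reproduce the thesis's yeast cell-cycle rules. At each step one randomly chosen node is updated from its rule, and the state is followed until it settles into an attractor.

```python
# Toy asynchronously updated Boolean network.
import random

def run(rules, state, steps=1000, seed=0):
    rng = random.Random(seed)
    state = dict(state)
    for _ in range(steps):
        node = rng.choice(list(rules))     # asynchronous: one node at a time
        state[node] = rules[node](state)
    return state

# A 3-gene example; once C turns on it stays on and switches A and B off.
rules = {
    "A": lambda s: s["B"] and not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] or s["C"],
}
```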







    August 01, 2008


      Ahmet Engin Ural, M.S. 2008
      


*Evolution of Compositionality with a Bag of Words Syntax*
Ahmet Engin Ural, M.S. Thesis, Koç University Department of Computer
Engineering, August 2008.

In the last two decades, the idea of an emerging and evolving language
has been studied thoroughly. The main question behind such studies is
how a group of humans reaches an agreement on phonology, lexicon and
syntax. Improvements in computational tools have led researchers to
build and test models that are run as computer simulations to answer
this question. Although the models are mere reflections of reality, the
results have often been useful and insightful. This dissertation follows
the same line and proposes a new model, tested with a game-based
simulation methodology. In addition, this work tries to fill a gap in
the study of lexicon compositionality and proposes a plausible
explanation for the transition from single-word naming to multi-word
naming. The results are in line with previous research, such as the
emergence of a stable and communicative language. Moreover,
compositionality in the lexicon is observed with a very simple
bag-of-words syntax. The parameters influencing the results are analyzed
in depth. Even though the model does not match the complexity of the
real world, it hints at insights about the transition from single-word
naming to syntax.


*A selection of Evolution of Language resources*

Biannual Evolution of Language Conference: Evolang 2008

University of Edinburgh, Language Evolution and Computation Research Unit

A literature overview by Simon Kirby

The book Language Evolution by Morten Christiansen and Simon Kirby

A PhD thesis by Joris van Looveren, Design and Performance of
Pre-Grammatical Language Games









    October 01, 2007


      Mehmet Ali Yatbaz, M.S. 2007
      




*Stretch: A Feature Weighting Method for The k Nearest Neighbor Algorithms*
Mehmet Ali Yatbaz. M.S. Thesis, Koç University Department of Computer
Engineering, October 2007.

Abstract: The k nearest neighbor learning algorithm (kNN) is one of the
well studied nonparametric learning algorithms. kNN assumes that the
underlying joint probability density function of the training set is
unknown and it estimates the underlying joint probability density
functions using the labeled data set (training set). Although this is a
realistic assumption for real-world problems, it introduces some
limitations on the predictive accuracy, the storage complexity and the
computational complexity of kNN.

The goal of this thesis is to understand kNN and techniques that are
used to increase the predictive accuracy of kNN. This thesis mainly
focuses on the effect of the irrelevant features on the predictive
accuracy of the kNN and introduces the Stretch method, a new
preprocessing method to increase the predictive accuracy of kNN by doing
linear transformation on the training data matrix. The method
incrementally constructs a linear transformation that maximizes the
nearest neighbor classification accuracy on the training set. At each
iteration the method picks an instance from the data set, and computes a
transformation that moves the instance closer to the instances with the
same category and/or away from the instances in other categories. The
composition of these iterative linear transformations can lead to
statistically significant improvements in kNN learning algorithms.
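A feature-weighting sketch in the spirit of the description above (this is an illustrative Relief-style scheme, not the thesis code): weights on features that separate classes are stretched, while uninformative features keep small weights, changing the nearest-neighbor metric.

```python
# Iteratively adjust per-feature weights from nearest hit/miss differences.
import numpy as np

def stretch_weights(X, y, iters=100, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.ones(d)
    for _ in range(iters):
        i = rng.integers(n)
        dist = np.linalg.norm((X - X[i]) * w, axis=1)
        dist[i] = np.inf
        hit = np.where(y == y[i], dist, np.inf).argmin()   # nearest same class
        miss = np.where(y != y[i], dist, np.inf).argmin()  # nearest other class
        # Grow weights where the miss differs, shrink where the hit differs.
        w += lr * (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit]))
        w = np.clip(w, 0, None)
    return w
```

On data where only the first feature predicts the class, the learned weight vector stretches that feature relative to the noisy one.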







    September 11, 2006


      Bengi Mizrahi, M.S. 2006
      



*Paraphrase Extraction from Parallel News Corpora*
Bengi Mizrahi. M.S. Thesis, Koç University Department of Computer
Engineering, September, 2006.

Different expressions of the same statement are said to be paraphrases
of each other. An example is the phrases 'solved' and 'found a solution
to' in 'Alice solved the problem' and 'Alice found a solution to the
problem'. Paraphrase Extraction is the method of finding and grouping
such paraphrases from free text. Finding equivalent paraphrases and
structures can be very beneficial in a number of NLP applications, such
as Question Answering, Machine Translation, and Multi-text
Summarization, e.g. in Question Answering, alternative questions can be
created using alternative paraphrases. We attack the problem by first
grouping news articles that describe the same event and then collecting
sentence pairs from these articles that are semantically close to each
other, and then finally extracting paraphrases out of these sentence
pairs to learn paraphrase structures. The precision of finding two
equivalent documents turned out to be 0.56 and 0.70 on average, when
matching criterion was strict and flexible, respectively. We tried 9
different evaluation techniques for sentence-level matching. Although
the exact word match count approach had a better precision value than
the n-gram precision count approaches, the paraphrase extraction phase
shows that the latter approaches catch sentence pairs of higher quality
for paraphrase extraction. Our system can extract paraphrases with 0.66
precision when only equivalent document pairs are used as a test set.
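The sentence-matching step can be sketched as a toy n-gram overlap matcher (the details below, including the threshold, are assumed for illustration): sentences from two articles about the same event are paired when their n-gram overlap exceeds a threshold.

```python
# Pair semantically close sentences across two documents by n-gram overlap.

def ngrams(tokens, n=2):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def match_sentences(doc_a, doc_b, n=2, threshold=0.3):
    pairs = []
    for sa in doc_a:
        for sb in doc_b:
            ga, gb = ngrams(sa, n), ngrams(sb, n)
            if ga and gb:
                sim = len(ga & gb) / min(len(ga), len(gb))
                if sim >= threshold:
                    pairs.append((sa, sb))
    return pairs
```

The matched pairs are then what a later phase mines for paraphrase structures.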







    September 21, 2005


      Volkan Kurt, M.S. 2005
      



*Protein Structure Prediction Using Decision Lists*
Volkan Kurt. M.S. Thesis, Koç University Department of Computational
Sciences and Engineering, September 2005.

Proteins are building blocks of life. Structure of these building
blocks plays a vital role in their function, and consequently in the
function of living organisms. Although increasingly effective
methods are being developed to determine protein structure, it is
still easier to determine the amino acid sequence of a protein than
its folded structure, and the gap between the number of known
structures and known sequences is increasing in an accelerating
manner. Structure prediction algorithms may help close this gap.

In this study, we have investigated various aspects of structure
prediction (both secondary and tertiary structure). We have
developed an algorithm (Greedy Decision List learner, or GDL) that
learns a list of pattern based rules for protein structure
prediction. The resulting rule lists are short, human readable and
open to interpretation. The performance of our method in secondary
structure predictions is verified using seven-fold cross validation
on a non-redundant database of 513 protein chains (CB513). The
overall three-state accuracy in secondary structure predictions is
62.5% for single sequence prediction and 69.2% using multiple
sequence alignment. We used GDL to predict tertiary structure of a
protein based on its backbone dihedral angles phi and psi. The
effect of angle representation granularity on the performance of
tertiary structure predictions has been investigated.

Existing structure prediction approaches build increasingly
sophisticated models emphasizing accuracy at the cost of
interpretability. We believe that the simplicity of the GDL models
provides scientific insight into the relationship between local
sequence and structure in proteins.







    September 11, 2005


      Başak Mutlum, M.S. 2005
      



*Word Sense Disambiguation Based on Sense Similarity and Syntactic Context*
Başak Mutlum. M.S. Thesis, Koç University Department of Computer
Engineering, September 2005.

Word Sense Disambiguation (WSD) is the task of determining the
meaning of an ambiguous word within a given context. It is an open
problem that has to be solved effectively in order to meet the needs
of other natural language processing tasks. Supervised and
unsupervised algorithms have been tried throughout the WSD research
history. Up to now, supervised systems have achieved the best
accuracies. However, these systems, together with the first-sense
heuristic, have come to a natural limit. In order to make further
improvements in WSD, the benefits of unsupervised systems should be
examined.

In this thesis, an unsupervised algorithm based on sense similarity
and syntactic context is presented. The algorithm relies on the
intuition that two different words are likely to have similar
meanings if they occur in similar local contexts. With the help of a
principle-based broad coverage parser, a 100-million-word training
corpus is parsed and local context features are extracted based on
some rules. Similarity values between the ambiguous word and the
words that occurred in a similar local context as the ambiguous word
are evaluated. Based on a similarity maximization algorithm,
polysemous words are disambiguated. The performance of the algorithm
is tested on SENSEVAL-2 and SENSEVAL-3 English all-words task data
and an accuracy of 59% is obtained.




