Panasonic Lumix DMC-FX07
About Panasonic Lumix DMC-FX07
Here you can find all about the Panasonic Lumix DMC-FX07, including the charger and other information, for example: accessories, USB cable, price, digital camera, software, manual, review, battery.
Panasonic Lumix DMC-FX07 manual (user guide) is ready to download for free.
At the bottom of the page, users can write a review. If you own a Panasonic Lumix DMC-FX07, please write about it to help other people.
Manual
Download (French): Panasonic Lumix DMC-FX07 Digital Camera, size: 9.5 MB
Download (English)
Check if your language version is available. Most manuals are available in many languages.
Video review
Panasonic Lumix DMC FX07 Video Demo
User reviews and opinions
jake49387, 9:05pm on Monday, October 4th, 2010
Excellent battery life, effective image stabilization, low price, image quality, wide angle, size. Noise a bit high, black case affected by smudges.

JonhSmith, 1:26pm on Thursday, September 16th, 2010
While the FX07 may not be as enticing to users who own a similar, but older, shooter, we feel it warrants a quick look to unearth the differences.

anders.hagman, 5:15pm on Wednesday, September 1st, 2010
Many features, small size, durable, great picture quality, many extras. none ease of use, picture and video quality, ultra-compact size wrist strap installation

slaveofone, 12:38am on Wednesday, July 14th, 2010
I love this little camera. Compact, neat, takes great pictures and video clips. Everyone who sees my camera comments on what a nice camera it is. Great size, great build, great image stabiliser, great pics! Excellent pocket snapper with professional results!

OOO-Wanter, 1:35am on Tuesday, July 13th, 2010
I got this camera to replace my DMC-FX7, which is 5 megapixels but essentially the same camera. Normally the DMC-FX07 costs more. I love the size of this camera, it is easy to carry and the picture quality is outstanding.

jertferd101, 3:00am on Thursday, June 3rd, 2010
Panasonic lumix is such a great mobile. I want to purchase any camera which fulfill my requires. Just bought this camera after our older Canon A75 died after about three or four years of very good service.

FlatAlex, 3:15am on Friday, April 16th, 2010
This camera is the bee's knees! The combination... This camera has a wonderful set of options that include all the regulars and a few extras. I love this camera. Not being very knowledgeable about photography, I find this camera very easy to operate. The camera feels good in my hand.

gaia, 5:35pm on Wednesday, April 14th, 2010
Great Camera! I love this little camera! It is lightweight and durable. I have owned this camera for about 3-4 years now. Good purchase. I had the same model of camera for over a year and when I dropped it (damaging the lens enclosure).
Comments posted on www.ps2netdrivers.net are solely the views and opinions of the people posting them and do not necessarily reflect the views or opinions of us.
Documents

Matching Unstructured Product Offers to Structured Product Specifications
Anitha Kannan (ankannan@microsoft.com), Inmar E. Givoni (inmar@psi.utoronto.ca), Rakesh Agrawal (rakesha@microsoft.com), Ariel Fuxman (arielf@microsoft.com)
Microsoft Research; University of Toronto
ABSTRACT
An e-commerce catalog typically comprises specifications for millions of products. The search engine receives millions of sales offers from thousands of independent merchants that must be matched to the right products. This problem is hard for several reasons. First, unique identifiers are absent in most offers. Second, although the product specifications are well structured, offers are described in the form of free text. Third, offers mention the values of the attributes without providing the corresponding attribute names. Fourth, values of a large number of attributes are often missing from the offer description. Finally, offers may also contain words other than attribute names and values. We present an automated technique for matching unstructured offers to structured product descriptions. A novel aspect of our approach is the semantic parsing of offer descriptions using dictionaries built from the structured catalog. Another novelty is that the matching function we learn factors in not only matches but also mismatches of attribute values as well as missing attribute values. Our approach has been implemented in an experimental search engine and is used to match all the offers received by Bing shopping to the Bing product catalog on a daily basis. We present extensive experimental results from this implementation that demonstrate the effectiveness of the proposed approach.
1. INTRODUCTION
With the increasingly widespread use of the Internet, there has been tremendous growth in the amount of commerce conducted over the Web. A recent Comscore study [1] estimates that yearly retail e-commerce sales in the U.S. alone have topped $100 billion. Nearly seven out of ten consumers said that the Internet has become important in providing them with information to help them make buying decisions. More than 70% of consumers said they are likely to shop online before making an offline purchase.
(Author footnote: Work done while author was an intern at Microsoft Research.)
A comprehensive product catalog is a prerequisite for the effectiveness of an e-commerce search service. Such a catalog at web scale will contain information about every product as well as sales offers from various merchants. For instance, the Bing Shopping catalog (shopping.bing.com) has information on more than five million products and more than ten million offers from upwards of tens of thousands of merchants. The product information consists of various attributes and their corresponding values, stored in a structured record comprised of (attribute name, value) pairs. Many products do not have universally agreed unique identifiers. The product information is obtained from multiple product aggregators (e.g., CNET, PriceGrabber), each of them having only partial but different information. Consequently, the catalog can have multiple data records, each somewhat different from the other, corresponding to the same product.
Similarly, offers come from multiple merchants (e.g., buy.com, gadgettown.com). Generally, there is very little structure in the offers. Typically, an offer consists of a textual description of the product for which the offer is being made. Embedded in the description are some attribute values and sometimes attribute names along with other terms, which the merchant presumes might be sufficient for the offer to be matched to the intended product. Different merchants often use different names for the same attribute. Many offers have no identifier that could be used for matching the offer to the corresponding product. The matching is currently done using rules written by experts, a costly, error-prone, and brittle process. Consequently, many offers are matched incorrectly and millions of offers go unmatched.

Structured Record (Product)
Attribute Name | Attribute Value
category | digital camera
brand | Panasonic
product line | Panasonic Lumix
model | DMC-FX07
sensor resolution | 7 megapixel
color | silver
weight | 132 g
width | 9.4 cm
height | 5.1 cm
depth | 2.4 cm
display: type | LCD display
display: technology | TFT active matrix
display: diagonal size | 2.5 in
audio input type | none
flash memory: form factor | memory stick
flash memory: storage capacity | 8 MB
video input: still image format | JPEG
video input: digital video format | MPEG-1
lens system: optical zoom | 3.6

Unstructured Text (Offer-1): Panasonic Lumix DMC-FX07 digital camera [7.2 megapixel, 2.5, 3.6x optical zoom, LCD monitor]
Unstructured Text (Offer-2): Panasonic DMC-FX07EB digital camera silver
Unstructured Text (Offer-3): Lumix FX07EB-S, 7.2 MP
Figure 1: Structured product record for Panasonic DMC-FX07 digital camera and textual descriptions from three matching offers.

Fig. 1 shows part of the structured record for the Panasonic DMC-FX07 digital camera as well as three merchant offers for this product as they appear in the Bing Shopping catalog. We make the following observations:
While Offer-1 is the most detailed one shown, it still contains only a small part of the information in the structured record. The phrase "Panasonic Lumix" indicates both brand (Panasonic) and product line (Panasonic Lumix). Some of the attribute values only match approximately (7.2 megapixel vs. 7 megapixel, LCD monitor vs. LCD display). The only attribute name present in the offer is optical zoom (called lens system: optical zoom in the structured record). The corresponding values for this attribute are 3.6x vs. 3.6.
Information provided in Offer-2 is largely a subset of what is provided in Offer-1. This offer provides the values of category and brand, but the value of the model has an extra suffix. It additionally provides the value of the color attribute.
Offer-3 is even more interesting. It provides part of the value of the product line (Lumix) and a somewhat different value for sensor resolution (7.2 MP vs. 7 megapixel) as well as model (FX07EB-S vs. DMC-FX07). It provides neither category nor brand information. With respect to Offer-3, note further that Panasonic also makes other 7.2 megapixel Lumix digital cameras (e.g., DMC-TZ3K, DMC-LZ6, and DMC-FX12). Moreover, there is also a field controller product with model number FX07.
Clearly, we have on our hands a hard problem of matching unstructured textual descriptions of products to structured records, for which it is desirable to have an algorithmic solution.

1.1 Problem Description and Highlights of the Solution
We have a large database of product specifications. Each product specification (which we shall interchangeably call product) consists of a set of (attribute name, value) pairs and is represented in the database as a structured record. Some of the attributes can be numeric, while the others can be categorical. The unstructured offer descriptions (which we shall call offers, for short) are comprised of free text. The text has embedded in it some of the values and possibly some attribute names corresponding to one of the products. The text may also contain additional words. The attribute names and values in the text may not precisely match those found in the database. The text does not contain an identifier that uniquely identifies the corresponding product. Different textual descriptions may be provided for the same product. An offer may match more than one product, as only partial descriptions are provided in the offers and because the same real-world product might have multiple representations in the product database. Our goal is to enable automated matching of the offers to corresponding products. Highlights of our proposed solution include:
1. Developing semantic understanding of the offers by leveraging structured information in the database. Specifically, we identify and assign attribute names to the values present in the offers. This semantic parsing serves to identify the product the offer corresponds to.
2. Learning a matching function that finds the product which has the largest probability of match to the given offer. This function is designed to have the following properties:
   - It takes into account matches as well as mismatches in attribute values between offer-product pairs.
   - It differentiates between missing attribute values and mismatch of attribute values.
   - It infers the relative importance of different attributes in the matching.
3. Built-in strategies for the solution to work at web scale. These strategies include:
   - Avoiding domain-specific features in the matching system.
   - Reducing the candidate set of products that can potentially match a given set of offers.
To evaluate the quality of the proposed solution, we perform large-scale experiments using the Bing product catalog and merchant offers. The precision and recall values obtained from these experiments demonstrate the effectiveness of the solution. Our approach has been implemented in a working search engine and is used to match all the offers received by Bing shopping to the Bing product catalog on a daily basis.
The rest of the paper is organized as follows. Section 2 discusses related work. In Section 3, we present our approach. The data sets and metrics we use for evaluating the performance of the algorithm are given in Section 4. We also present experimental results in this section. We conclude with a summary and directions for future work in Section 5.
Algorithm 1 Offline Training
Input:
    U = {u_1, ..., u_N} - a set of offers
    S = {s_1, ..., s_M} - a set of structured product descriptions, M >> N
    M = {(u_i, s_ji)}, i = 1..N, (u_i ∈ U, s_ji ∈ S) - pairs of correctly matched records, one for every u_i
    N = {(u_i, s_ki)}, i = 1..N - similarly, pairs of mismatched records
Output: D - dictionaries, w - algorithm parameters
Preprocess:
    D ← CreateAttributeDictionaries(S) - build dictionaries of attributes and their values using S
Train:
    for all u ∈ U do
        ũ ← SemanticParsing(u, D) - extract plausible parses (Sec. 3.2)
    end for
    for all pairs ∈ M and pairs ∈ N do
        f_i^M ← ExtractSimFeatures(pair_i)
        f_j^N ← ExtractSimFeatures(pair_j) - construct similarity feature vectors for matched and mismatched pairs (Sec. 3.3)
    end for
    w ← arg max LearnToMatch(F(w, f), {f_i^M}, {f_j^N}) - train a function that maps feature vectors to match probability, F(w, f): f → [0, 1] (Sec. 3.4)
Return: w, D
Algorithm 2 Online Matching
Input: u - offer; S, D, w
Output: s* - best matching s ∈ S
    ũ ← SemanticParsing(u, D) - (Sec. 3.2)
Blocking:
    [k_i] ← top attributes with largest weights in w
    S' ← subset of S with ∃i (ũ.val(k_i) = s.val(k_i))
    for all s_i ∈ S' do
        f_i ← ExtractSimFeatures(ũ, s_i, K) - (Sec. 3.3)
        P(match(s_i, u)) ← F(w, f_i) - matching score of a pair (Sec. 3.5)
    end for
Return: s* = arg max over s_i of P(match(u, s_i)) - best matching score of all pairs (Sec. 3.5)
2. RELATED WORK
The problem of matching records has been studied under various topics including record linkage [2, 3, 4, 5], duplicate detection [6, 7], entity resolution [8, 9, 10], and merge/purge [11]. While our work continues this rich lineage of work, there are distinguishing traits in our setting that call for fresh approaches and techniques. For instance, while the work of Newcombe [4] (later formalized by Fellegi and Sunter in [3]) pioneered the probabilistic approach to matching, their work (and much of the subsequent record linkage literature) tacitly assumes that the data to be matched consists of properly structured records with a well-defined schema. The work on duplicate detection, merge/purge, and entity resolution is also targeted at structured and properly segmented records. At the other end of the spectrum, the work in natural language processing [12] focuses on the detection of mentions of the same entity in free text. In contrast, in matching offers to products, there are components from both bodies of work: the offers consist of only free text, while the products are properly structured under a given schema.
Much of the prior work has relied on the presence of values for all attributes in the data records, and the goal has been to design similarity metrics either at the entire record level [13, 14] or at the attribute level, which are subsequently combined to measure record-level match [15, 16]. This assumption is not valid in our setting. Since offers are free text, their tokens need to be mapped to attributes. However, some tokens may not map to any attribute (e.g., the token "monitor" in Offer-1 of Fig. 1), and when they do map, they can be ambiguously mapped to multiple attributes (e.g., the token "Panasonic" in Offer-1 of Fig. 1). So, unlike in previous settings, the matching algorithm needs to disambiguate among the multiple possible interpretations of the offers. These problems also arise in the context of understanding queries [17, 18]. In [17], a probabilistic model is introduced to identify the annotation of a query which corresponds to the best explanation of that query. In our work, we propose the notion of an optimal parse, which is defined with respect to a product the offer will be matched against.
While prior work has focused primarily on the computation of weights for value matches and mismatches for the different fields of a record [15, 16], the explicit modeling of missing values has not received much attention. An exception is [19], wherein a comparison feature vector is augmented to encode presence/absence of values. However, this approach does not explicitly penalize mismatches at the attribute level, and therefore does not leverage a strong signal for matching offers to products.
Specifically in the commerce domain, Bilenko et al. [20] proposed techniques for clustering merchant offers, but assume that the offers have structured information. The challenges of matching unstructured offers to structured product specifications are not present when clustering structured offers.
3. METHODOLOGY
3.1 Problem Statement
We have a database $S$ of product descriptions, represented as structured records. Every structured record $s \in S$ consists of a set of (attribute name, value) pairs. The attributes can be numeric or categorical. We receive an unstructured offer $u$ as input, which is a concise free-text description that specifies values for a subset of the attributes in $S$ in an arbitrary manner. The text may also contain additional words. Our objective is to match $u$ to one or more structured records in $S$. We use the metrics of precision and recall for judging the quality of the matching system.
We take a probabilistic approach and find the product $s \in S$ that has the largest probability of match to the given offer $u$. Our matcher is learned in an offline stage (Algorithm 1). For this, we postulate a small training set $U$ of unstructured offers. Each $u \in U$ has been matched to one structured record in $S$ (set M). We also have mismatched records from $S$, one for every $u \in U$ (set N). In the subsequent online stage (Algorithm 2), new offers are matched one at a time. We first do candidate selection, and then choose the best matched product amongst the candidates by applying the learned model. We next describe the key components required for the two algorithms, corresponding to learning the matcher in the offline stage and matching the offers in the online stage.

(a) Maximal Parsing of Unstructured Record
Offer u: Panasonic Lumix DMC-FX07 digital camera [7.2 megapixel, 2.5, 3.6x optical zoom, LCD monitor]
Tagged attributes: brand, product line, model, optical sensor resolution, display diagonal size, optical zoom, display type
(b) Products
Product s1: Attribute Type | Attribute Value
brand | Panasonic
product line | Panasonic Lumix
model | DMC-FX05
display diagonal size | 2.5 in
display type | LCD
Product s2: Attribute Type | Attribute Value
brand | Panasonic
model | DMC-FX07
optical sensor resolution | 7.2 megapixel
optical zoom | 3.6x
Figure 2: (a) An offer u (b) Two products s1 and s2 and the optimal parses of u.
3.2 Semantic Parsing
Our matching algorithm is based on understanding the semantics of the offer descriptions and using those semantics to aid in matching. Thus, the first step in matching is semantic parsing, in which the semantics present in the offer are extracted. This is operationalized as a three-stage process: tagging the offer with attributes, identifying plausible parses based on the tags, and finally obtaining an optimal parse. We describe each of these steps below.
Tagging: The tagging step identifies attribute names present in the offer and associates with them all strings in the offer that can be assigned to them. Let $A$ represent all the attributes present in the structured product descriptions $S$. We first build an inverted index $D$ from $S$ such that $D(v)$ returns the attribute name $a \in A$ associated with string $v$. Given an offer $u$, let $Z_u$ represent the set of all n-grams (up to $n = 4$) present in $u$. Then, the tagging step identifies the set of all (attribute name, value) pairs in $u$:
$$R_u = \{\, \langle a, \{ v \mid v \in Z_u,\ D(v) = a \} \rangle \mid a \in A \,\} \qquad (1)$$
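The tagging step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_inverted_index`, `ngrams`, `tag_offer`), the simple punctuation handling, and the two toy product records are assumptions made for illustration. The index here maps a string to a set of attribute names, since one string could in principle belong to several attributes.

```python
from collections import defaultdict

def build_inverted_index(products):
    """Map each lowercased attribute value to the attribute names it occurs under."""
    index = defaultdict(set)
    for record in products:
        for name, value in record.items():
            index[value.lower()].add(name)
    return index

def ngrams(tokens, max_n=4):
    """Yield every contiguous n-gram (joined by spaces) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def tag_offer(offer, index, max_n=4):
    """Return {attribute name: set of candidate values found in the offer}."""
    cleaned = offer.lower()
    for ch in ",[]":
        cleaned = cleaned.replace(ch, " ")  # naive punctuation handling (assumption)
    tags = defaultdict(set)
    for gram in ngrams(cleaned.split(), max_n):
        for name in index.get(gram, ()):
            tags[name].add(gram)
    return dict(tags)

# Toy catalog (hypothetical records, loosely modeled on Fig. 1 and Fig. 2).
products = [
    {"brand": "Panasonic", "product line": "Panasonic Lumix", "model": "DMC-FX07"},
    {"brand": "Panasonic", "product line": "Lumix", "model": "DMC-FX05"},
]
index = build_inverted_index(products)
offer = ("Panasonic Lumix DMC-FX07 digital camera "
         "[7.2 megapixel, 2.5, 3.6x optical zoom, LCD monitor]")
tags = tag_offer(offer, index)
print(tags)
```

With this toy catalog, the product line attribute receives two candidate values ("lumix" and "panasonic lumix"), which is the kind of ambiguity the later parsing steps resolve.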
Fig. 2(a) shows an offer for a digital camera and the output of the tagging step. A portion of the output from this step is $\{ \langle \text{brand}, \{\text{panasonic}\} \rangle, \langle \text{product line}, \{\text{lumix}, \text{panasonic lumix}\} \rangle \}$, where brand and product line are two of the identified attributes, with brand having a single value, panasonic, and product line having a set of two values, {lumix, panasonic lumix}.
Plausible parse: Given the tagging, a plausible parse of an offer is defined to be a particular combination of all attributes identified in the tagging step such that each attribute is associated with exactly one value. Thus, if each attribute $a$ has $k_a$ values, then there are exactly $\prod_a k_a$ plausible parses of the offer. Typically, $k_a$ is small and thus only a small number of parses are plausible. The example in Fig. 2(a) has a single value for six of the seven identified attributes, and the product line attribute has two values. Thus, this offer has two plausible parses, one in which lumix is the product line and another in which panasonic lumix is the product line, while the values of the other attributes do not change between parses. Multiple plausible parses arise because of ambiguities in the data. Therefore, we maintain these plausible parses until the offer is paired with a product, which gives rise to the optimal parse of the offer with respect to that product.
Optimal parse: When an offer is paired with a product, $\langle u, s \rangle \in$ M (also for pairs in N), we use the parse of the offer that is best in agreement with the product. By best in agreement we mean the parse in which the maximum number of attributes agree in their values in $u$ and $s$. We call this plausible parse the optimal parse of the offer with respect to the product. Note that the optimal parse is defined only with respect to a product. Different products can give rise to a different choice of plausible parse as optimal. Continuing with our example using Fig. 2(b), the optimal parse of $u$ corresponding to product s1 is the plausible parse with Panasonic Lumix as the product line. When $u$ is paired with s2, both plausible parses are optimal since s2 does not have a product line specified.
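The plausible-parse and optimal-parse definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tagging output is given as a literal dictionary, values are compared as lowercased strings, and the helper names (`plausible_parses`, `agreement`, `optimal_parses`) are hypothetical.

```python
from itertools import product as cartesian

def plausible_parses(tags):
    """Enumerate parses: every tagged attribute gets exactly one candidate value."""
    names = sorted(tags)
    choices = [sorted(tags[name]) for name in names]
    return [dict(zip(names, combo)) for combo in cartesian(*choices)]

def agreement(parse, record):
    """Count attributes whose value in the parse equals the product's value."""
    return sum(1 for name, value in parse.items()
               if record.get(name, "").lower() == value)

def optimal_parses(tags, record):
    """Return every plausible parse that agrees with the product on the most attributes."""
    parses = plausible_parses(tags)
    best = max(agreement(p, record) for p in parses)
    return [p for p in parses if agreement(p, record) == best]

# Tagging output for the offer of Fig. 2(a), reduced to three attributes.
tags = {
    "brand": {"panasonic"},
    "product line": {"lumix", "panasonic lumix"},
    "model": {"dmc-fx07"},
}
# Products s1 and s2 as in Fig. 2(b), reduced to the same attributes.
s1 = {"brand": "Panasonic", "product line": "Panasonic Lumix", "model": "DMC-FX05"}
s2 = {"brand": "Panasonic", "model": "DMC-FX07"}

parses = plausible_parses(tags)
best_s1 = optimal_parses(tags, s1)
best_s2 = optimal_parses(tags, s2)
print(len(parses), best_s1, len(best_s2))
```

As in the text, the number of plausible parses is the product of the per-attribute candidate counts (here 1 x 2 x 1), a single parse is optimal for s1, and both parses tie for s2 because s2 has no product line.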
3.3 Similarity Feature Vectors
Similarity between an offer and a product is measured in terms of their similarity on the values of the attributes present in them. Since products have a large number of attributes, we choose the subset of these attributes that are present in offers. In particular, using Eqn. 1, we select attributes $K$ such that
$$K = \Big\{\, k \in A : \frac{1}{|U|} \sum_{u \in U} I[R_u(k) \neq \emptyset] > 0 \,\Big\} \qquad (2)$$
where $R_u(k)$ represents the values of attribute $k$ found in $R_u$ (according to Eqn. 1), $I[t]$ is the indicator function, and the expression $[R_u(k) \neq \emptyset]$ is defined to be true if a valid value for attribute $k$ is found in $u$.
We would like the similarity function defined over $K$ to take into account not only the match in values of certain attributes, but also to reflect mismatches or missing values in either products or offers. The function should penalize mismatches differently from missing values; in fact, a mismatching value is a stronger indicator that the corresponding offer and product do not match. In addition, an attribute that is frequently missing reflects its lower importance for matching. With these design considerations, we define the similarity feature vector as follows. Let $\tilde{u}$ represent the optimal parse of offer $u$ with respect to product $s$. Then, for the pair $\langle \tilde{u}, s \rangle$, we compute a similarity feature vector $f$ by determining similarity levels between $\tilde{u}$ and $s$ for each attribute. Let $\tilde{u}.val(k_j)$ and $s.val(k_j)$ represent the value of some attribute $k_j$ from $\tilde{u}$ and $s$, respectively. The similarity between $\tilde{u}$ and $s$ for attribute $k_j$ is defined to be:
$$f_j = \begin{cases} 0 & \text{if } \tilde{u}.val(k_j) = \emptyset \text{ OR } s.val(k_j) = \emptyset \\ (-1)^{I\left[\frac{|\tilde{u}.val(k_j) - s.val(k_j)|}{\max(\tilde{u}.val(k_j),\, s.val(k_j))} > \lambda\right]} & \text{if } k_j \text{ is a numeric attribute} \\ (-1)^{I[\tilde{u}.val(k_j) \neq s.val(k_j)]} & \text{otherwise} \end{cases} \qquad (3)$$
where $I[z]$ is the indicator function. When either the optimal parse of the offer or the product has a missing value for an attribute, the corresponding feature value is 0, unlike when the values mismatch, in which case the value is -1. This enables penalizing the matching score differently when $\tilde{u}$ or $s$ is missing an attribute value than when they disagree on that attribute.
For categorical attributes, we use binary loss since the offer descriptions typically do not have typographical errors (perhaps due to the fact that they are shown on merchants' websites). However, numeric attributes frequently have imprecise values because of round-off errors (e.g., 7 MP vs. 7.2 MP) or differences in conversion factors (1 GB = 1000 MB or 1 GB = 1024 MB). After some experimentation, we set $\lambda$ to 0.1 to provide a less sensitive measure of similarity than that of the binary loss. This parameter is held at 0.1 across all categories. If desired, it can be learned using cross validation. Another possibility is to set $\lambda$ to zero, imposing the stringent condition that numeric attribute values should also match exactly.
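A small Python sketch of the per-attribute similarity feature may help make the three cases concrete: 0 for a missing value, -1 for a mismatch, +1 for agreement, with a relative tolerance for numeric attributes. The function names, the use of `None` for a missing value, and the assumption that values are already normalized (lowercased strings, non-negative numbers) are illustrative choices, not part of the paper.

```python
def similarity_feature(offer_value, product_value, numeric=False, tolerance=0.1):
    """Return 0 if either value is missing, -1 on mismatch, +1 on agreement."""
    if offer_value is None or product_value is None:
        return 0
    if numeric:
        largest = max(offer_value, product_value)
        if largest == 0:
            # Both values are zero (values assumed non-negative), so they agree.
            return 1
        relative_gap = abs(offer_value - product_value) / largest
        mismatch = relative_gap > tolerance
    else:
        mismatch = offer_value != product_value
    return -1 if mismatch else 1

def feature_vector(parse, record, attributes, numeric_attributes, tolerance=0.1):
    """Build the similarity feature vector over the selected attributes."""
    return [
        similarity_feature(parse.get(k), record.get(k),
                           numeric=k in numeric_attributes, tolerance=tolerance)
        for k in attributes
    ]

# Hypothetical optimal parse and product record with pre-normalized values.
attributes = ["brand", "sensor resolution", "color", "optical zoom"]
numeric_attributes = {"sensor resolution", "optical zoom"}
parse = {"brand": "panasonic", "sensor resolution": 7.2, "color": "black"}
record = {"brand": "panasonic", "sensor resolution": 7.0,
          "color": "silver", "optical zoom": 3.6}

features = feature_vector(parse, record, attributes, numeric_attributes)
strict = feature_vector(parse, record, attributes, numeric_attributes, tolerance=0.0)
print(features, strict)
```

With a tolerance of 0.1, 7.2 and 7.0 count as agreeing; with a tolerance of zero the same pair becomes a mismatch, which is the stricter exact-match setting.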
3.4 Matching Function
We would like a matching function that can provide a probabilistic score of match between an offer and a product, so that the best matching product for an offer is the one that has the largest probability. In addition, as the number of attributes in $S$ is large and not all attributes are present in the offers, the function needs to automatically infer the attributes that are required to be matched, and also learn the relative importance between them. We find that binary logistic regression conveniently lends itself to satisfying these two criteria. Given some labeled data of good and bad matches, and features that measure similarity between the attributes, it can automatically learn the relative importance between the attributes, and in turn provide a function that measures the match between an offer and a product in terms of a probabilistic score. Hence, we use binary logistic regression of the form:
$$F(w, f) = P(y = 1 \mid f, w) = \frac{1}{1 + \exp\{-(b + f^T w)\}} \qquad (4)$$
The logistic regression learns a mapping from the similarity feature vector $f$ to a binary label $y$ through the logistic function. The parameter $w$ is the weight vector, wherein each component $w_j$ measures the relative importance of the feature $f_j$ for predicting the label $y$. We have with us all matched and mismatched training pairs; let $f_i = [f_{i1}, f_{i2}, \ldots, f_{i|K|}]$ be the feature vector for pair $i$. Let $\{F, Y\} = \{(f_1, y_1), \ldots, (f_N, y_N)\}$ be the set of feature vectors along with their corresponding binary labels. Here, $y_i = 1$ indicates that the $i$th pair is a match; otherwise $y_i = 0$. Logistic regression maximizes an objective function which is the conditional log-likelihood of the training data, $P(Y \mid F, w)$:
$$\arg\max_w \log P(Y \mid F, w) = \arg\max_w \sum_{i=1}^{N} \log P(y_i \mid f_i, w) \qquad (5)$$
where $P(y_i = 1 \mid f_i, w)$ is defined by Eq. 4. Note that a feature with positive weight will affect the score by increasing the probability of match for a pair with agreement on the feature, by decreasing the score in the case of a mismatch, and by leaving the score unaffected in the case of a missing value.

3.5 Online Matching
During the online phase, we are given a previously unseen offer $u$, and the goal is to identify the best matching product $s \in S$. The scoring function learned during the offline phase provides the probability of match for a pair $\langle u, s \rangle$. Naively, we can find the best match by pairing $u$ with every $s \in S$, calculating the pair match score, and choosing the $s$ that results in the highest score. However, such naive pairing costs $O(|S|)$ operations for each offer. Instead, building upon the work in record linkage [2, 21] and merge/purge [11], we design a staged blocking strategy.
We note that the products are usually categorized into a taxonomy. Therefore, in the first stage, we use a classifier to categorize the given offer into a category node in this taxonomy. This reduces the candidate set to only those products that belong to the offer category.
Further, within the category, we would like to reduce the number of candidate products to match against the offer. For that, we make the following observation. The goal of the matching process is to match the offer to the product that has the largest matching score. To obtain this large score, a product needs to agree especially on the values of the attributes that contribute large weights to the matching function. Using this insight, in the second stage, we further reduce the candidate set by identifying those top-weighted attributes that can potentially give a matching score of at least $\theta$ in Equation 4. In detail, after identifying attributes in the offer using the method in Sec. 3.2, we choose attributes in descending order of weights until the following condition is satisfied:
$$\sum_j f_j w_j \geq \log\frac{\theta}{1 - \theta} - b \qquad (6)$$
where weights are ordered so that $w_j \geq w_{j+1}$ for all $j$. This equation can be derived in a straightforward way from Equation 4 by rearranging terms and taking the log. Here, $\theta$ corresponds to $P(y = 1 \mid f, w)$. This set of attributes $\{k_j\}$ is then used to retrieve products such that the retrieved products match on the value of at least one of the attributes in $\{k_j\}$. The union of all these products becomes the candidate set of products. Note that this candidate set is a superset of the products that can potentially match the offer, since we consider all products that have at least one matching attribute value within this attribute set. In our experiments, we use $\theta = 0.5$ as it is the midpoint on the probability scale.
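The scoring function of Eq. 4 and the attribute selection of Eq. 6 can be sketched as follows. This is a hedged illustration rather than the deployed system: the weights and bias are made-up numbers, the helper names are hypothetical, matching features are taken to be $f_j = 1$ when selecting blocking attributes, and the loop always keeps at least one attribute even when the target is already non-positive.

```python
import math

def match_probability(features, weights, bias):
    """Eq. 4: logistic function applied to b + f^T w."""
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))

def blocking_attributes(offer_attributes, weights, bias, theta=0.5):
    """Add attributes in descending weight order until Eq. 6 holds (with f_j = 1)."""
    target = math.log(theta / (1.0 - theta)) - bias
    chosen, total = [], 0.0
    for name in sorted(offer_attributes, key=lambda a: weights[a], reverse=True):
        chosen.append(name)
        total += weights[name]
        if total >= target:
            break
    return chosen

def candidate_set(products, parse, chosen):
    """Keep products that agree with the offer on at least one chosen attribute."""
    return [
        record for record in products
        if any(record.get(k, "").lower() == parse.get(k) for k in chosen)
    ]

# Hypothetical learned parameters (not values reported in the paper).
weights = {"model": 3.0, "brand": 1.5, "product line": 1.0, "color": 0.5}
bias = -4.0
parse = {"brand": "panasonic", "model": "dmc-fx07", "color": "silver"}
products = [
    {"brand": "Panasonic", "model": "DMC-FX05"},
    {"brand": "Canon", "model": "EOS 40D"},
    {"brand": "Panasonic", "model": "DMC-FX07", "color": "silver"},
]

chosen = blocking_attributes(parse.keys(), weights, bias)
candidates = candidate_set(products, parse, chosen)
score = match_probability([1, 1], [weights["model"], weights["brand"]], bias)
print(chosen, len(candidates), round(score, 3))
```

With $\theta = 0.5$ the log-odds term is zero, so the target reduces to $-b$; here the model and brand weights together reach it, and products sharing either value with the offer form the candidate set.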
4. EVALUATION
The matching system described in the previous section has been implemented in a working experimental search engine and is used to match all the offers received by Bing shopping to the Bing product catalog on a daily basis. In this section, we present performance results from experiments using this implementation.

Baseline Algorithms
In the absence of an algorithm directly applicable to our problem formulation, we define two baselines, COSINE and TFIDF, for comparison. They are inspired by the work in record linkage on measuring token-level similarity [13, 14, 20].
COSINE: This baseline uses cosine similarity as the measure of agreement between the offer and product. Similarity is measured between the frequency distributions of tokens in $u$ and $s$ [20]. As $s$ is structured, it is first converted into a string by concatenating the content of the record. The $s$ having the largest cosine similarity with $u$ is taken as the best match.
TFIDF: This baseline uses the tf-idf weighted cosine similarity as the measure of agreement between the offer and product. Each token is associated with the tf-idf score defined by $\log(\mathrm{TF}(token) + 1) \cdot \log(\mathrm{IDF}(token))$ [14]. Here, TF(token) is the frequency of the token in the offer/product and IDF(token) is its corresponding inverse document frequency; IDF(token) is computed across all the products in the category. Similarity is measured between the normalized tf-idf scores of tokens in $u$ and $s$. Note that while COSINE treats all tokens as equally useful, TFIDF weighs them inversely to their popularity. Thus, a token such as 40d (corresponding to the model number of a canon eos 40d digital camera) will have a higher tf-idf score than digital camera, which is ubiquitous in the digital camera category. Typically, tokens that are unique, such as model numbers and brands, are those that are useful in matching. Since TFIDF can choose such unique tokens, it can serve as a good baseline for our approach.

4.3 Data Sets
We use a subset of the Bing Shopping catalog in our evaluation. The products belong to 87 categories related to electronics (e.g., televisions, mp3 players), computing (e.g., desktop computers, laptops), and cameras and accessories (e.g., digital cameras, camera accessories). We had a labeled set of 40,000 offers from these categories, each labeled with the corresponding matched products. There were on average 460 offers per category; the smallest category had 50 offers. For each category, we randomly sampled 100 offers (20 if the number of offers was less than 200) and used them for training the matcher. We used whatever offers were left within a category as the test set for that category. Thus, the training set for each category had at most 100 samples. The test set size varied from category to category as we made use of all the available samples, but in no case was the test set smaller than 30 samples. Results are shown using 5-fold cross validation.
Offer description: imation 4gb nano usb 2.0 flash drive - 4gb - usb - external
Products: (1) imation nano flash drive usb flash drive - 4 gb; (2) imation 2gb usb 2.0 clip flash drive
Table 1: Sample offer and two products from the memory cards category.
Note that other than using a separate training set for each category, we do not use any category-specific features, and the same code is used for different categories. We use a small training set (100 offers per category), which can be curated easily from successful matches (e.g., those with high click-through) in a running system. We also do not have any parameters that need to be tuned. These characteristics are critical for a solution to work at web scale.
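The COSINE baseline described earlier can be sketched on the Table 1 sample. This is an illustrative sketch only: the tokenizer, the function names, and the way the two product strings are separated are assumptions made here, not details given in the text.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Lowercase tokens of letters, digits and dots; this tokenizer is an assumption.
    return re.findall(r"[a-z0-9.]+", text.lower())


def cosine(a, b):
    """Cosine similarity between the token frequency vectors of two strings."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def best_match(offer, products):
    # The product string with the largest cosine similarity is the best match.
    return max(products, key=lambda p: cosine(offer, p))


offer = "imation 4gb nano usb 2.0 flash drive - 4gb - usb - external"
# Product records already concatenated into strings, as the baseline requires.
products = [
    "imation nano flash drive usb flash drive - 4 gb",
    "imation 2gb usb 2.0 clip flash drive",
]
for p in products:
    print(round(cosine(offer, p), 3), p)
```

Under this tokenizer, the offer's "4gb" and the first product's "4 gb" share no token, so the capacity contributes nothing to the similarity score, while generic tokens such as "usb" and "drive" dominate it.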
[Figure 3(a): Performance of LWMM. Axes: recall (x), precision (y).]
4.4 Experiment 1: LWMM Performance
We start by presenting precision and recall values for different categories. Fig. 3(a) shows the scatter plot of the precision-recall values that the LWMM algorithm exhibits. Each circle corresponds to a category, and the area of the circle is proportional to the test set size. We have labeled some of the categories. The macro average precision is 80% while the macro average recall is 50%. By way of comparison, Fig. 3(b) (resp. Fig. 3(c)) gives the ratio of the F-measure achieved by LWMM to the F-measure achieved by COSINE (resp. TFIDF). The macro average precision and recall for COSINE are 50% and 37%, respectively. The corresponding numbers for TFIDF are 54% and 42%. We observe that LWMM has high performance over a range of categories and performs better than both COSINE and TFIDF. The next three subsections present further findings.
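For reference, per-category precision, recall and F-measure, and their macro averages, can be computed as in the sketch below. The category names and counts are hypothetical and chosen only for illustration; they are not results from the experiments.

```python
def precision_recall_f(correct, returned, relevant):
    """Precision, recall and F-measure for one category.

    correct  : number of returned matches that are right
    returned : number of offers for which the matcher returned a product
    relevant : number of offers that have a true match
    """
    p = correct / returned if returned else 0.0
    r = correct / relevant if relevant else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def macro_average(values):
    # Unweighted mean over categories, regardless of category size.
    values = list(values)
    return sum(values) / len(values)


# Hypothetical (correct, returned, relevant) counts per category.
counts = {"category_a": (90, 100, 120), "category_b": (8, 8, 100)}
scores = {name: precision_recall_f(*c) for name, c in counts.items()}

macro_p = macro_average(s[0] for s in scores.values())
macro_r = macro_average(s[1] for s in scores.values())
print(scores)
print(macro_p, macro_r)
```

Because the macro average weights every category equally, a small category influences it as much as a large one, which is why the scatter plot additionally encodes test set size as circle area.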
[Figure 3(b): LWMM vs. COSINE. Axes: F-measure for COSINE (x), F-measure for LWMM (y).]
[Figure 3(c): LWMM vs. TF-IDF. Axes: F-measure for TFIDF (x), F-measure for LWMM (y).]
Figure 3: LWMM Performance
Difference in performance of LWMM over various categories
We see in Fig. 3(a) that LWMM has very high performance on categories such as digital cameras and televisions, but the performance is lower on categories such as bags & cases and memory cards. This difference can be understood by examining how complete the information is for the products in the corresponding categories. As described in Section 3.2, an important component of our matching system is semantic parsing, which makes use of attribute dictionaries compiled from attribute data for the products. This means that the quality of the attribute dictionaries depends on the quality of the product data. For categories such as digital cameras and televisions, the product data is nearly complete, which enables high quality semantic parsing and leads to LWMM's high performance. In comparison, note that the memory cards category has low recall and precision. Table 1 shows an offer and two products from this category. The distinguishing attribute for correct matching in this category is the capacity of the memory card. However, in our product catalog, the value for this attribute is missing in all but two products (out of 7500 products).
Improvement in LWMM over COSINE
In Fig. 3(b), COSINE's F-measure is much lower than LWMM's for most of the categories. By examining instances where the matches differ, we found that the main reason is that many tokens in the offers are not distinguishing attribute names or values, but generic terms. Since cosine similarity weighs all tokens equally, it is unable to perform effective matching. This demonstrates the importance of performing semantic parsing of the offer description and using only the relevant tokens (attribute names and values) in the matching.
Improvement in LWMM over TFIDF
TFIDF weighs the relative importance of the tokens, but only as measured by the frequency of their presence in the product collection. It is not cognizant of which tokens are semantically important. Thus, the token canon gets downweighted because there are very many canon cameras. However, canon as the brand is a very good indicator of a product's identity. Hence, it is important to identify the attributes that are present in the offers and their relative importance for better matching.
We investigated further the categories where COSINE and TFIDF do better than LWMM. These categories include hubs, mouse pads and tripods. We found that the primary reason is that these categories have very few structured attributes. For example, tripods have only two attributes: brand and height. Additionally, many tripods have the same value for these attributes. Our matching function thus finds that it is matching too many products for an offer and refuses to return a product. LWMM thus ends up with 100% precision and 8% recall for tripods. COSINE and TFIDF, on the other hand, take a chance on the name of the product and have higher recall (and a higher F-measure). Similar observations hold for hubs and mouse pads.
Two points are in order here. First, our deployed system is designed to be conservative; it is willing to sacrifice recall for precision. More importantly, this analysis points to selectively adopting a hybrid strategy. When a category is impoverished with respect to the number of attributes required to explain its products, a hybrid scheme that uses both structured information and token representation should be used.
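The downweighting of a frequent brand token can be illustrated with a plain inverse document frequency computation. This is a sketch under stated assumptions: the toy catalog and model tokens are hypothetical, and the unsmoothed `log(N / df)` form is one common idf variant, not necessarily the one used in the experiments.

```python
import math


def idf(token, documents):
    """Unsmoothed inverse document frequency: log(N / df)."""
    df = sum(1 for doc in documents if token in doc.split())
    return math.log(len(documents) / df) if df else 0.0


# Hypothetical toy catalog: three products share the brand token "canon".
catalog = [
    "canon a100",
    "canon b200",
    "canon c300",
    "nikon d400",
]

print(idf("canon", catalog))  # log(4/3), small: the brand is common
print(idf("a100", catalog))   # log(4), large: the model token is rare
```

The brand token receives a low weight purely because it is frequent, even though agreement on brand is a strong signal of product identity.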
[Figure 4(a): LWMM vs. EW. Axes: F-measure for EW (x), F-measure for LWMM (y).]
[Figure 4(b): LWMM vs. EW (Digital Cameras). Axes: recall (x), precision (y).]
4.5 Experiment 2: Importance of learning weights
Fig. 4(a) shows the scatter plot of the F-measure of LWMM against the F-measure of EW. Clearly, learning weights improves matching. We notice that the gains are much larger for some categories than for others. To understand this difference, we drill down on two categories: digital cameras and televisions. Fig. 4(b) shows the precision-recall values for digital cameras. For this category, there were seven attributes present in at least one offer during the offline training phase. These attributes were brand, model, product line, color, resolution, optical zoom, viewfinder type and video input type. EW weighs these attributes equally. At low recall, EW insists that all key attributes agree on their values, and hence the precision is high. However, as we increase recall by reducing the number of attributes required to agree, precision drops. This is expected, since certain combinations of attributes provide spurious matches (e.g., agreement on the color and resolution of a camera). On the other hand, by
4.6 Experiment 3: Importance of treating mismatching attributes differently from missing attributes
Fig. 5 shows the scatter plot comparing the F-measures of LWMM and LW. We can see that there is a gain in F-measure for all categories when we treat missing attributes differently from mismatched values. The gain is less pronounced for high economic value categories such as digital cameras than for low economic value categories such as batteries. The reason is that, for the former, merchants have an incentive to provide better offer descriptions containing all the attributes needed for matching. Hence, missing attributes become less of an issue.
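The distinction between a missing and a mismatched attribute can be sketched with a simple additive score. This is only one possible reading: the matching function itself is defined earlier in the paper, and the linear form, the function name `attribute_score`, the weight values, and the assumption that the variant without this distinction treats a missing attribute as a mismatch are all hypothetical.

```python
def attribute_score(offer_attrs, product_attrs, weights, missing_as_mismatch):
    """Sum per-attribute contributions for one (offer, product) pair.

    weights[attr] = (w_match, w_mismatch, w_missing). All values are
    hypothetical and are not taken from the paper.
    """
    score = 0.0
    for attr, (w_match, w_mismatch, w_missing) in weights.items():
        offer_val = offer_attrs.get(attr)
        if offer_val is None:
            # Attribute absent from the offer description.
            score += w_mismatch if missing_as_mismatch else w_missing
        elif offer_val == product_attrs.get(attr):
            score += w_match
        else:
            score += w_mismatch
    return score


weights = {
    "brand": (2.0, -3.0, -0.2),
    "model": (3.0, -4.0, -0.5),
    "color": (0.5, -0.5, 0.0),
}
product = {"brand": "canon", "model": "a100", "color": "black"}
offer_missing_color = {"brand": "canon", "model": "a100"}
offer_wrong_color = {"brand": "canon", "model": "a100", "color": "red"}

# Missing treated like a mismatch: both offers score the same.
print(attribute_score(offer_missing_color, product, weights, True))   # 4.5
print(attribute_score(offer_wrong_color, product, weights, True))     # 4.5
# Missing treated separately: the offer that merely omits color is not penalized.
print(attribute_score(offer_missing_color, product, weights, False))  # 5.0
```

In this toy setting, separating the two cases matters only when offers omit attributes, which is consistent with the observation that the gain is smaller for categories where offer descriptions are more complete.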
4.7 Experiment 4: Scalability
deviation variation of the average of number of candidates across the offers in that category. We can see that the candidate set size is much smaller than the number of products in most of the categories.
5. CONCLUSIONS AND FUTURE WORK
We studied the problem of matching unstructured textual descriptions to structured data records, which arises in the context of matching sales offers to product specifications on e-commerce websites. The product specifications include both numeric and categorical data. The key distinguishing characteristics of our solution are: (i) semantic understanding of offer descriptions using attribute dictionaries built automatically from the structured product specifications contained in the product catalog; (ii) a matching function that considers not only matches but also mismatches of attribute values and missing attribute values, and that learns the relative importance of the attributes; and (iii) avoidance of domain-specific features and use of staged blocking strategies so that the solution works at web scale.
We performed extensive experiments using the Bing Shopping catalog to understand the performance characteristics of the proposed solution. The experimental results show that the proposed approach scores high on F-measure and consistently beats the baseline approaches for product categories that have a reasonably rich attribute structure and good data. They also point to the desirability of hybrid solutions that additionally make use of classical text matching techniques for attribute-impoverished product categories. The methodology we employed for analyzing the experimental results might also be of interest to those building and analyzing web scale systems.
There are a number of potential future research directions. Currently, the training data we use has positive examples (matched pairs), and we obtain negative examples by randomly pairing offers with non-matched products. Ideally, we would like negative examples that are similar to positive examples yet mismatched, so that the learning algorithm can tease out subtle differences between matching and non-matching pairs. One possibility would be to obtain such negative examples by using active learning, taking cues from [7].
Another research direction is to learn to infer inter-attribute correlations, which can be helpful when the number of key attributes is large. Finally, it will be interesting to apply the proposed techniques to other application domains (e.g., travel, health) where there is a strong need for matching unstructured text to structured data records.
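The random-pairing scheme for negative examples mentioned above can be sketched as follows. The function name `random_negatives`, the `per_offer` parameter, the fixed seed and the identifiers in the example are assumptions introduced for illustration.

```python
import random


def random_negatives(matched_pairs, product_ids, per_offer=1, seed=0):
    """Pair each offer with randomly chosen products it is NOT matched to."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    matched = {}
    for offer_id, product_id in matched_pairs:
        matched.setdefault(offer_id, set()).add(product_id)

    negatives = []
    for offer_id, positives in matched.items():
        # Exclude every product labeled as a match for this offer.
        candidates = [p for p in product_ids if p not in positives]
        k = min(per_offer, len(candidates))
        negatives.extend((offer_id, p) for p in rng.sample(candidates, k))
    return negatives


# Hypothetical labeled matches and catalog.
positives = [("offer1", "p1"), ("offer2", "p2"), ("offer2", "p3")]
catalog_ids = ["p1", "p2", "p3", "p4"]
negatives = random_negatives(positives, catalog_ids)
print(negatives)
```

Negatives drawn this way are usually easy to tell apart from true matches, which is the limitation that motivates looking for harder negatives through active learning.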
6. REFERENCES
[1] G. Fulgoni, State of the U.S. Retail Economy in Q2 2009, Comscore, Tech. Rep., August 20 2009.
[2] W. E. Winkler, Overview of record linkage and current research directions, Bureau of the Census, Tech. Rep., 2006.
[3] I. P. Fellegi and A. B. Sunter, A theory for record linkage, Journal of the American Statistical Association, vol. 64, no. 328, pp. 1183-1210, 1969.
[4] H. B. Newcombe, M. J. Kennedy, S. J. Axford, and A. P. James, Automatic linkage of vital records, Science, vol. 130, pp. 954-959, October 1959.
[5] P. Ravikumar and W. W. Cohen, A hierarchical graphical model for record linkage, in UAI, 2004.
[6] A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios, Duplicate record detection: A survey, IEEE Trans. on Knowl. and Data Eng., vol. 19, no. 1, pp. 1-16, 2007.
[7] S. Sarawagi and A. Bhamidipaty, Interactive deduplication using active learning, in KDD, 2002, pp. 269-278.
[8] R. Agrawal, R. Bayardo, C. Faloutsos, J. Kiernan, R. Rantzau, and R. Srikant, Auditing compliance with a hippocratic database, in VLDB, 2004, pp. 516-527.
[9] O. Benjelloun, H. Garcia-Molina, D. Menestrina, Q. Su, S. E. Whang, and J. Widom, Swoosh: a generic approach to entity resolution, The VLDB Journal, vol. 18, no. 1, pp. 255-276, 2009.
[10] S. Singh, K. Schultz, and A. McCallum, Bi-directional joint inference for entity resolution and segmentation using imperatively-defined factor graphs, in ECML-PKDD, 2009, pp. 414-429.
[11] M. A. Hernández and S. J. Stolfo, The merge/purge problem for large databases, in SIGMOD, 1995, pp. 127-138.
[12] R. Mitkov, Anaphora Resolution. Longman, 2002.
[13] A. Monge and C. Elkan, The field-matching problem: algorithm and application, in KDD, 1996.
[14] W. Cohen, Integration of heterogeneous databases without common domains using queries based on textual similarity, in SIGMOD, 1998, pp. 202-212.
[15] M. Cochinwala, V. Kurien, G. Lalk, and D. Shasha, Efficient data reconciliation, Information Sciences, vol. 137, no. 1-4, pp. 1-15, 2001.
[16] M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg, Adaptive name matching in information integration, IEEE Intelligent Systems, vol. 18, no. 5, pp. 16-23, 2003.
[17] N. Sarkas, S. Paparizos, and P. Tsaparas, Structured annotations of web queries, in SIGMOD, 2010, pp. 771-782.
[18] K. Q. Pu and X. Yu, Keyword query cleaning, in PVLDB, 2008, pp. 909-920.
[19] J. N. S. D'Andrea Du Bois, A solution to the problem of linking multivariate documents, Journal of the American Statistical Association, vol. 64, no. 325, pp. 163-174, March 1969.
[20] M. Bilenko, S. Basu, and M. Sahami, Adaptive product normalization: Using online learning for record linkage in comparison shopping, in ICDM, 2005, pp. 58-65.
[21] A. McCallum, K. Nigam, and H. L. Ungar, Efficient clustering of high-dimensional data sets with application to reference matching, in Knowledge Discovery and Data Mining, 2000, pp. 169-178.








1. Panasonic Lumix DMC FX01, DMC FX07 Replacement Battery Charger (Incl. Car and European Plug Adapters)
2. SD SDHC MEMORY CARD 4GB for Panasonic Lumix DMC FX07 FX10 FX100 FX12 FX3 FX30 FX33 FX50 FX55 FZ18 FZ50 FZ50 FZ8 L1 L10 LS60 LS70 LS75 LX2 LZ2 LZ6 LZ7 TZ2 TZ3 Digital Camera
3. (Many Color Available) Kroo Camera Case for Panasonic Lumix DMC Series + Screen Protector Kit (Many Color Available)
4. Underwater Waterproof Case for Panasonic Lumix DMC FX7,07,8,9,10
5. Hard Nylon Black Camera Zip Case for Panasonic Lumix DMC FS20, FS3, FS5, FX01, FX07, FX10, FX12 + Ultra Flex Tripod + Screen Protector + Wisdom Courage Wristband
6. Dicapac Waterproof Digital Camera Case for Panasonic Lumix



