OpenML
Online-Retail-II-UCI

Status: active; Format: ARFF; License: CC0: Public Domain; Visibility: public; Uploaded 23-03-2022 by Onur Yildirim
  • Computer Systems Machine Learning
Context

This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion giftware. Many customers of the company are wholesalers.

Content

Attribute Information:

InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantity of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).
CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
Country: Country name. Nominal. The name of the country where a customer resides.

Acknowledgements

Chen, D., Sain, S.L., and Guo, K. (2012), Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208.

Chen, D., Guo, K., and Ubakanma, G. (2015), Predicting customer profitability over time based on RFM time series, International Journal of Business Forecasting and Marketing Intelligence, Vol. 2, No. 1, pp. 1-18.

Chen, D., Guo, K., and Li, Bo (2019), Predicting Customer Profitability Dynamically over Time: An Experimental Comparative Study, 24th Iberoamerican Congress on Pattern Recognition (CIARP 2019), Havana, Cuba, 28-31 Oct, 2019.

Laha Ale, Ning Zhang, Huici Wu, Dajiang Chen, and Tao Han, Online Proactive Caching in Mobile Edge Computing Using Bidirectional Deep Recurrent Neural Network, IEEE Internet of Things Journal, Vol. 6, Issue 3, pp. 5520-5530, 2019.

Rina Singh, Jeffrey A. Graves, Douglas A. Talbert, William Eberle, Prefix and Suffix Sequential Pattern Mining, Industrial Conference on Data Mining 2018: Advances in Data Mining. Applications and Theoretical Aspects, pp. 309-324, 2018.
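The cancellation rule in the attribute description (an invoice code starting with the letter 'c') can be sketched in a few lines of Python. This is only an illustration: the `InvoiceLine` class, the `is_cancellation` helper, and the two sample rows are hypothetical and not taken from the data set, and the check is made case-insensitive on the assumption that the prefix may appear in either case.

```python
from dataclasses import dataclass


@dataclass
class InvoiceLine:
    """One transaction line; a hypothetical container for illustration."""
    invoice: str      # invoice code, kept as text because of the letter prefix
    stock_code: str   # product (item) code
    quantity: int     # quantity of the item on this line
    price: float      # unit price in sterling


def is_cancellation(invoice: str) -> bool:
    """Return True if the invoice code starts with 'c' (either case)."""
    return invoice.strip().lower().startswith("c")


# Made-up sample rows, not real records from the data set.
sample = [
    InvoiceLine("489434", "85048", 12, 6.95),
    InvoiceLine("C489449", "22087", -12, 2.95),
]

cancelled = [line.invoice for line in sample if is_cancellation(line.invoice)]
print(cancelled)
```

Keeping the invoice code as a string rather than converting it to an integer is what allows the prefix check to work at all.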

8 features

Invoice (string): 53628 unique values, 0 missing
StockCode (string): 5305 unique values, 0 missing
Description (string): 5697 unique values, 4474 missing
Quantity (numeric): 1057 unique values, 0 missing
InvoiceDate (string): 47635 unique values, 0 missing
Price (numeric): 2807 unique values, 0 missing
Customer_ID (numeric): 5942 unique values, 243007 missing
Country (string): 43 unique values, 0 missing
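Per-feature counts of unique and missing values like those listed above could be computed along the following lines. This is a minimal sketch, assuming rows are available as dictionaries and that a missing value is represented by `None` or an empty string; the `column_summary` helper and the sample rows are hypothetical.

```python
def column_summary(rows, column):
    """Count unique non-missing values and missing values for one column.

    Assumption: a value counts as missing if it is None or an empty string.
    """
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {"unique": len(set(present)), "missing": len(values) - len(present)}


# Made-up sample rows, not real records from the data set.
rows = [
    {"Invoice": "489434", "Description": "PINK CHERRY LIGHTS", "Customer_ID": 13085},
    {"Invoice": "489434", "Description": "WHITE CHERRY LIGHTS", "Customer_ID": 13085},
    {"Invoice": "489500", "Description": None, "Customer_ID": None},
]

for name in ("Invoice", "Description", "Customer_ID"):
    print(name, column_summary(rows, name))
```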

19 properties

Number of instances (rows) of the dataset: 1067371
Number of attributes (columns) of the dataset: 8
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 247481
Number of instances with at least one value missing: 243007
Number of numeric attributes: 3
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 37.5
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 22.77
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 2.9
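The derived percentages in the property list can be cross-checked against the raw counts with simple arithmetic. The sketch below uses only numbers that appear on this page; the variable names are illustrative, and taking the percentage of missing values over instances times attributes is an assumption that happens to agree with the listed 2.9.

```python
# Counts taken from the property and feature lists on this page.
n_instances = 1_067_371
n_attributes = 8
n_numeric_attributes = 3
n_missing_values = 247_481
n_rows_with_missing = 243_007
missing_description = 4_474
missing_customer_id = 243_007

# Only two features have missing values, so they should sum to the total.
assert missing_description + missing_customer_id == n_missing_values

pct_numeric = 100 * n_numeric_attributes / n_attributes
pct_rows_with_missing = 100 * n_rows_with_missing / n_instances
# Assumed denominator: every cell in the table (instances x attributes).
pct_missing_values = 100 * n_missing_values / (n_instances * n_attributes)

print(pct_numeric)
print(round(pct_rows_with_missing, 2))
print(round(pct_missing_values, 1))
```

Since the count of rows with at least one missing value equals the number of missing Customer_ID values, every row that lacks a Description presumably also lacks a Customer_ID, although the page does not state this directly.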

0 tasks
