in the mass of clients, manually analyze their behavior and, based on the results, try to program the rules for generating offers. But this approach raises many questions: Have we defined the traits correctly? Have we identified all the significant types? How many characteristic representatives of each type have we chosen? How accurate are our conclusions about the causal relationships in the behavior of the people we observe? If we could analyze in detail the behavior of a large number of randomly selected customers (thousands or even tens of thousands), we could probably answer all of these doubts positively and in good conscience.

But such a volume of work cannot be done manually, and this is where machine learning will help us.

Clustering

So, we need to identify the characteristic types of customers. What is the easiest thing to do here? Our computer loves numbers. Let's write down the characteristics of a client as an ordered set of features. For example: age, gender, region of residence, number of years as a customer of the bank, presence of deposits, presence of cards, presence of loans, balance in deposits, balance on cards, and debt on loans. In practice, of course, there can be many more such features.

Some of them are numerical in nature, and this is good for a computer. The features gender and region of residence, however, are categorical: they can be represented by a value from a fixed list. They can be decomposed into a set of smaller features, such as "the client lives in Moscow", "the client lives in Novosibirsk", and so on, with the values yes or no. We then encode yes as the number 1 and no as the number 0. This reduces the categorical features to numerical ones. Now we can replace our original feature set with a vector of numbers (x_1, ..., x_n). This vector can be viewed as a point in a multidimensional space. If we plot all of our clients in this space, we will probably be able to see them gather into clumps with the naked eye.
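The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Client record, its field names, the GENDERS and REGIONS lists, and the sample values are all hypothetical and not taken from any real bank's data model.

```python
from dataclasses import dataclass

# Hypothetical fixed lists of categories (assumptions for illustration only).
GENDERS = ["female", "male"]
REGIONS = ["Moscow", "Novosibirsk", "Other"]


@dataclass
class Client:
    # Field names are illustrative; they mirror the features listed in the text.
    age: int
    gender: str
    region: str
    years_with_bank: float
    has_deposits: bool
    has_cards: bool
    has_loans: bool
    deposit_balance: float
    card_balance: float
    loan_debt: float


def one_hot(value, categories):
    """Split one categorical feature into yes/no features encoded as 1.0/0.0."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


def to_vector(client):
    """Turn a client record into the numeric vector (x_1, ..., x_n)."""
    return (
        [float(client.age)]
        + one_hot(client.gender, GENDERS)
        + one_hot(client.region, REGIONS)
        + [
            float(client.years_with_bank),
            float(client.has_deposits),  # True -> 1.0, False -> 0.0
            float(client.has_cards),
            float(client.has_loans),
            float(client.deposit_balance),
            float(client.card_balance),
            float(client.loan_debt),
        ]
    )


# A made-up client, used only to show the shape of the result.
client = Client(
    age=34,
    gender="female",
    region="Novosibirsk",
    years_with_bank=5,
    has_deposits=True,
    has_cards=True,
    has_loans=False,
    deposit_balance=120000.0,
    card_balance=15000.0,
    loan_debt=0.0,
)
vector = to_vector(client)
print(len(vector), vector)
```

Each client becomes a point with 13 coordinates here: one for age, two for gender, three for region, and seven for the remaining features. With a longer list of regions the vector simply grows by one coordinate per region.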

